Skip to content

Conversation

@fuweid
Copy link
Member

@fuweid fuweid commented Oct 28, 2024

The containerd-shim creates pipes and passes them to the init container as stdin, stdout, and stderr for logging purposes. By default, these pipes are owned by the root user (UID/GID: 0/0). The init container can access them directly through inheritance.

However, if the init container attempts to open any files pointing to these pipes (e.g., /proc/1/fd/2, /dev/stderr), it will encounter a permission issue since it is not the owner. To avoid this, we need to align the ownership of the pipes with the init process.

Fixes: #10598

@fuweid
Copy link
Member Author

fuweid commented Oct 28, 2024

ping @rata @ctrox

@fuweid fuweid requested review from dmcgowan and samuelkarp October 28, 2024 20:30
@fuweid fuweid force-pushed the fixissue-10598 branch 2 times, most recently from 9ea1b2f to 8eebe2f Compare October 28, 2024 20:50
@fuweid
Copy link
Member Author

fuweid commented Oct 28, 2024

hmm. failed when using crun. checking

@fuweid fuweid marked this pull request as draft October 29, 2024 06:08
@fuweid fuweid force-pushed the fixissue-10598 branch 4 times, most recently from 2f01bba to ef09f27 Compare October 30, 2024 04:48
@fuweid
Copy link
Member Author

fuweid commented Oct 30, 2024

kindly ping @AkihiroSuda I can pass that new test in my local and in ubuntu github action env. However, it failed in vagrant box. I don't have env to support nested vm yet. Any thought about this? Is it related to selinux profile to block read on proc? Thanks

@fuweid fuweid force-pushed the fixissue-10598 branch 2 times, most recently from ca18a1b to c3fc548 Compare October 30, 2024 20:56
@fuweid fuweid marked this pull request as ready for review October 30, 2024 22:09
@dosubot dosubot bot added the area/cri Container Runtime Interface (CRI) label Oct 30, 2024
@fuweid
Copy link
Member Author

fuweid commented Oct 30, 2024

kindly ping @AkihiroSuda @dmcgowan @samuelkarp it's ready for review.

@fuweid
Copy link
Member Author

fuweid commented Oct 30, 2024

/retest

Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I've reviewed the code and also tested it locally and solves the problem here. The k8s fails seem unrelated to this

Sorry, I was sick, I'm just coming back :)

}
}

func TestIssue10598(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you have verified this test fails before the patch, right? As there are several subtle things, I'd like to confirm this.

@rata
Copy link
Contributor

rata commented Nov 1, 2024

/retest

@rata
Copy link
Contributor

rata commented Nov 1, 2024

But for some reason for other PRs doesn't seem to be failing. I can't see anything weird in the logs, just it didn't find a container in the kube-system namespace, the readiness is false. But the runtime is not using userns and the container seems running just fine.

Anyone has any clues?

@AkihiroSuda
Copy link
Member

/retest

@AkihiroSuda
Copy link
Member

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/containerd_containerd/10906/pull-containerd-node-e2e/1852346152444235776

E2eNode Suite: [It] [sig-node] Summary API [NodeConformance] when querying /stats/summary should report resource usage through the stats api expand_less | 3m19s
-- | --
{ failed [FAILED] Timed out after 180.000s. [...]

@AkihiroSuda
Copy link
Member

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/containerd_containerd/10906/pull-containerd-k8s-e2e-ec2/1852415523602567168

{  instance i-0e64fd3802bff9ea1 is not running}

https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/containerd_containerd/10906/pull-containerd-k8s-e2e-ec2/1852415523602567168/artifacts/logs/i-0e64fd3802bff9ea1/containerd.log

[...]
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.338344642Z" level=info msg="Container 14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8: CDI devices from CRI Config.CDIDevices: []"
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.346766435Z" level=info msg="CreateContainer within sandbox \"3109094df28fc9e1e303fe7203517f74e19d04784f64a6c76fd9145b937a58f8\" for &ContainerMetadata{Name:kube-controller-manager,Attempt:5,} returns container id \"14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8\""
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.347120265Z" level=info msg="StartContainer for \"14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8\""
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.347518622Z" level=debug msg="Start writing stream \"stdout\" to log file \"/var/log/pods/kube-system_kube-controller-manager-ip-172-31-11-145.ec2.internal_cb3761f0f7b0ce36637118ce88517d86/kube-controller-manager/5.log\""
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.347562254Z" level=debug msg="Start writing stream \"stderr\" to log file \"/var/log/pods/kube-system_kube-controller-manager-ip-172-31-11-145.ec2.internal_cb3761f0f7b0ce36637118ce88517d86/kube-controller-manager/5.log\""
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.348326684Z" level=info msg="connecting to shim 14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8" address="unix:///run/containerd/s/2f7b648f9009da454c6f1e405c35375cc88d57de5e3ba196d3435fa375bb4a33" protocol=ttrpc version=3
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.398820672Z" level=debug msg="failed to delete task" error="rpc error: code = NotFound desc = container not created: not found" id=14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.398895893Z" level=error msg="copy shim log after reload" error="read /proc/self/fd/41: file already closed"
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.399409914Z" level=error msg="Failed to pipe stdout of container \"14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8\"" error="reading from a closed fifo"
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.399462439Z" level=debug msg="Finish piping stdout of container \"14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8\""
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.399483901Z" level=debug msg="Finish redirecting stream \"stdout\" to log file \"/var/log/pods/kube-system_kube-controller-manager-ip-172-31-11-145.ec2.internal_cb3761f0f7b0ce36637118ce88517d86/kube-controller-manager/5.log\""
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.399471851Z" level=error msg="Failed to pipe stderr of container \"14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8\"" error="reading from a closed fifo"
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.399527284Z" level=debug msg="Finish piping stderr of container \"14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8\""
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.399559370Z" level=debug msg="Finish redirecting stream \"stderr\" to log file \"/var/log/pods/kube-system_kube-controller-manager-ip-172-31-11-145.ec2.internal_cb3761f0f7b0ce36637118ce88517d86/kube-controller-manager/5.log\""
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.399588450Z" level=debug msg="Finish redirecting log file \"/var/log/pods/kube-system_kube-controller-manager-ip-172-31-11-145.ec2.internal_cb3761f0f7b0ce36637118ce88517d86/kube-controller-manager/5.log\", closing it"
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.401997333Z" level=error msg="StartContainer for \"14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8\" failed" error="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/14fc7e6f2d112f1f5b7f96a55ab7763b41962853ee4a1f1d64549061b57af2f8/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH"
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.739042062Z" level=info msg="RemoveContainer for \"de6ef68dd1c2ffc63ffcbea978c59f69a857ec54319b0a9a290117d5721ee487\""
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.744233072Z" level=info msg="RemoveContainer for \"de6ef68dd1c2ffc63ffcbea978c59f69a857ec54319b0a9a290117d5721ee487\" returns successfully"
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.891237094Z" level=debug msg="schedule snapshotter cleanup" snapshotter=overlayfs
Nov 01 18:36:40 ip-172-31-11-145 containerd[16097]: time="2024-11-01T18:36:40.892908607Z" level=debug msg="removed snapshot" key=k8s.io/116/de6ef68dd1c2ffc63ffcbea978c59f69a857ec54319b0a9a290117d5721ee487 snapshotter=overlayfs

exec: \"runc\": executable file not found in $PATH is quite weird 🤔

@samuelkarp
Copy link
Member

/retest

@fuweid fuweid force-pushed the fixissue-10598 branch 3 times, most recently from 261ad4e to 7b69b48 Compare November 7, 2024 01:08
@fuweid
Copy link
Member Author

fuweid commented Nov 7, 2024

kindly ping @dmcgowan @AkihiroSuda @samuelkarp @rata

Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

I've tested manually and this indeed solves the issue

@fuweid fuweid added the cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch label Nov 11, 2024
}
}

func TestIssue10598(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment and the issue link to explain what is being tested here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Please take a look.

The containerd-shim creates pipes and passes them to the init container as
stdin, stdout, and stderr for logging purposes. By default, these pipes are
owned by the root user (UID/GID: 0/0). The init container can access them
directly through inheritance.

However, if the init container attempts to open any files pointing to these
pipes (e.g., /proc/1/fd/2, /dev/stderr), it will encounter a permission issue
since it is not the owner. To avoid this, we need to align the ownership of
the pipes with the init process.

Fixes: containerd#10598

Signed-off-by: Wei Fu <fuweid89@gmail.com>
@fuweid
Copy link
Member Author

fuweid commented Nov 19, 2024

/retest

@dmcgowan dmcgowan added this pull request to the merge queue Nov 20, 2024
Merged via the queue into containerd:main with commit 480e85d Nov 20, 2024
56 checks passed
@fuweid fuweid deleted the fixissue-10598 branch November 20, 2024 06:28
@fuweid
Copy link
Member Author

fuweid commented Nov 20, 2024

/cherrypick release/2.0

@k8s-infra-cherrypick-robot

@fuweid: new pull request created: #11035

Details

In response to this:

/cherrypick release/2.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@austinvazquez austinvazquez added cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch and removed cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch labels Nov 21, 2024
@lorenz
Copy link
Contributor

lorenz commented Nov 28, 2024

This change breaks runtimes which don't use runc's option format (like gVisor).

Error: failed to create containerd task: failed to create shim task: unsupported option type "containerd.runc.v1.Options"

Using WithUIDOwner/WithGIDOwner is only possible with a runc-style runtime, otherwise either the runtime returns an error (if that's the first option) or the function returns an error on task creation as it can only work on runc's option format.

@rata
Copy link
Contributor

rata commented Nov 29, 2024

@lorenz thanks for catching this! I'm currently traveling. I'll take a look soon to see if I can come up with something.

@fuweid do you have any ideas on how to solve this?

@fuweid
Copy link
Member Author

fuweid commented Dec 3, 2024

Hi @lorenz

containerd uses runc/options as task option. IMO, it's API proto and the containerd client SDK is using this proto.

Checked https://github.com/google/gvisor/blob/4971756d8d9b7b9f1954a1357348f3dd2b9d4a68/pkg/shim/runsc/service.go#L245.
Maybe gvisor-shim should align with this proto.

You could file an issue as well if you have any question.

Thanks

@lorenz
Copy link
Contributor

lorenz commented Dec 3, 2024

a) This is a breaking change inside the 2.0 release, this is IMO unacceptable to be released as is. It also affects pods not using user namespaces, so it doesn't even just break use cases enabled by this change.
b) If containerd policy is that you use the runc options proto that should be hardcoded in the proto and the options type renamed to be non-runc-specific. Right now this extensibility is explicitly codified in containerd's API and is being used.

gVisor has a fully incompatible option concept, it's not just some minor changes/renames, they store fundamentally different things in it.

@fuweid
Copy link
Member Author

fuweid commented Dec 3, 2024

a) This is a breaking change inside the 2.0 release, this is IMO unacceptable to be released as is. It also affects pods not using user namespaces, so it doesn't even just break use cases enabled by this change.

For the spec, I don't think we make breaking change here. If you can take a look on release 1.6 (check the below url), we, containerd, move the proto to another folder.

https://github.com/containerd/containerd/blob/release/1.6/runtime/v2/runc/options/oci.proto

Checked that gvisor again https://github.com/google/gvisor/blob/master/g3doc/user_guide/containerd/configuration.md

If you pass the TypeUrl with runsc, it might work.

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
  TypeUrl = "io.containerd.runsc.v1.options"
  ConfigPath = "/etc/containerd/runsc.toml"

Again, suggest file new issue to describe how to reproduce your issue. Thanks


Edited.

ping @lorenz just in case that miss the notification since this is closed pull request.

@lorenz
Copy link
Contributor

lorenz commented Dec 3, 2024

For the spec, I don't think we make breaking change here. If you can take a look on release 1.6 (check the below url), we, containerd, move the proto to another folder.

This works with the 2.0 release, but stopped working after this CL.

If you pass the TypeUrl with runsc, it might work.

The config I'm testing with already sets these options today as I need to set some gVisor-specific options.

@fuweid
Copy link
Member Author

fuweid commented Dec 3, 2024

This works with the 2.0 release, but stopped working after this CL.

Could you please create issue for this? I will take a look with reproduce steps. thanks

Edited.


Ping @lorenz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) area/runtime Runtime cherry-picked/2.0.x PR commits are cherry picked into the release/2.0 branch kind/bug size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v2.0.0-rc.3 user namespaces stdio permissions

9 participants