Description
In recent versions of containerd 2 when using user namespaces, the setgroups syscall has started failing with EPERM from inside the constructed user namespace.
The cause appears to be that /proc/self/setgroups has been set to deny.
#10611 and #10607 look like the likely suspects to me; they change how the gid map is established in the user namespace provided to the container. Prior to #10611 we were writing to /proc/pid/gid_map ourselves instead of using the Go stdlib, and nothing touched /proc/pid/setgroups at all, so it was left at its default of allow.
Switching to the stdlib had the subtle side-effect of a deny getting written to /proc/{cloned-pid}/setgroups by Go's forkAndExecInChild / forkAndExecInChild1 unless SysProcAttr.GidMappingsEnableSetgroups is true.
cc: @AkihiroSuda @fuweid @rata
Steps to reproduce the issue
- Start a container with userns enabled
- Read /proc/self/setgroups from the container and observe that its value is deny. Alternatively, attempt to call setgroups, such as with sudo, from within the container.
Describe the results you received and expected
Expect setgroups to be allowed, as in non-user namespaced containers and as in namespaced containers before the changes from late August.
What version of containerd are you using?
current main
Any other relevant information
#10741 seems to be about what's needed; it resolves the issue for me.
Show configuration if it is related to CRI plugin.
No response