-
Notifications
You must be signed in to change notification settings - Fork 2.2k
rootless: fix Docker-in-LXD regression #1833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@cyphar PTAL |
utils_linux.go
Outdated
| return system.GetParentNSeuid() != 0 || system.RunningInUserNS(), nil | ||
| // if system.RunningInUserNS() == true: regardless of whether the userns was created by a regular | ||
| // user or the root (in the initial namespace), we need to enable the rootless mode. | ||
| return os.Geteuid() != 0 || system.RunningInUserNS(), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense -- because the parent namespace's euid is not actually use for permission checks. However, I think that this check is still not entirely correct: os.Geteuid() == 0 is too strict of a check -- I think we should be checking whether the current process has CAP_SYS_ADMIN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, on second thought, I'm not sure we should always automatically enable rootless mode when running in userns. Maybe this has been breaking Docker-in-LXD usecase? (Although the corresponding commit is not used in Docker yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always ask @brauner. They do have some runc patches from memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better to just have , as in the initial revision of the code?return system.GetParentNSeuid() != 0
#1688 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @giuseppe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, looks like my commit in #1688 breaks Docker in LXD (sorry):
$ lxc launch ubuntu:18.04 foo -c security.nesting=true
$ lxc file push /usr/local/sbin/runc foo/usr/bin/docker-runc
...
foo# docker run -it --rm busybox
docker: Error response from daemon: OCI runtime create failed: cannot specify gid= mount options for unmapped gid in rootless containers: unknown.
Could you suggest logic for isRootless?
For LXD compatibility, we could remove userns detection from isRootless(), but then we will break img/buildkit/buildah/podman probably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, lemme think. I'll get back to you. I just need to finish something else first. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I need to know some stuff:
- Do you have a
fork()/clone()+exec()model whereby the child process is the container and runs in a new set of namespaces or something else? - Where is
IsRootless()supposed to be called? In the parent namespaces or the child namespaces (presupposing a particular model in this question so feel free to reformulate as approriate)? - What decisions are based on
IsRootless()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a fork()/clone()+ exec() model whereby the child process is the container and runs in a new set of namespaces or something else?
img and RootlessKit (used by BuildKit, containerd PR and Docker POC) do this.
I think Buildah and Podman do similar stuff but haven't looked into their code deeply yet.
In addition to user namespace, we sets CLONE_NEWNS and optionally CLONE_NEWNET.
However, unlike Docker-in-LXD, we don't do chroot nor pivot_root.
Also, we keep environment variables such as $USER.
i. e. we we execute runc as UID=0 but $USER="penguin" in userns+mntns(+netns).
Where is IsRootless() supposed to be called? In the parent namespaces or the child namespaces (presupposing a particular model in this question so feel free to reformulate as approriate)?
Child namespaces, from POV of img.
But note that executing runc in the initial namespace (as non-root UID) needs to be also allowed.
What decisions are based on IsRootless()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just saw your answers. Thanks.
However, unlike Docker-in-LXD, we don't do chroot nor pivot_root.
Out of curiosity, only for the fully unprivileged case?
d107f1d to
f1906fb
Compare
|
Updated PR, please refer to step 1 of #1837 |
| // Corner cases: | ||
| // * When runc is executed in userns via Docker-in-rootless-Docker ("rootless dind"), as the root in the contaienr, isRootless() returns false, and unlikely to work. The dockerd in the container would need to specify runc --rootless explicitly in this case. (And user would need to launch dockerd with --rootless explicitly, probably) | ||
| // * When runc is executed in "rootless dind", as a non-root in the contaienr, isRootless() returns true and likely to work. | ||
| u := os.Getenv("USER") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The USER environment variable seems to be set by the login process (see man login(1). Garden runs as server (i.e. it is not logging in) and therefore USER is not set when runc is invoked which makes isRootless return false regardless of whether we want rootless or not (os.Getenv returns an empty string if the var is not set)
In order to support both cases via checking the USER value only if it is set and fallback to the effective user ID if not, i.e. something like
func isRootless(context *cli.Context) (bool, error) {
if context != nil {
...
}
user, userSet := os.LookupEnv("USER")
if userSet {
return user != "" && user != "root", nil
}
return system.GetParentNSeuid() != 0 || system.RunningInUserNS(), nil
}What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danail-branekov this looks like a safer version, I think it makes sense to fallback to the euid when USER is not set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, but it doesn't fix Docker-in-LXD regression, because Docker-in-LXD executes runc without $USER, as euid=0 in userns, but it expects isRootless() to return false...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Garden runs as server
How do you run rootless garden as a server? via systemd --user?
systemd --user seems to set $USER, at least on Ubuntu 18.04?
suda@suda-ws01:~$ systemctl show-environment
LANG=en_US.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binsuda@suda-ws01:~$ systemctl --user show-environment
HOME=/home/suda
LANG=en_US.UTF-8
LOGNAME=suda
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
SHELL=/bin/bash
USER=suda
XDG_RUNTIME_DIR=/run/user/1001There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We run garden with monit, which doesn't set USER. We also run the server as a user with maximum user id that does not exist in /etc/passwd/ or elsewhere, so it would be best if we could run runc in rootless mode without setting $USER. On the other hand we can always run with the --rootless flag, so I guess this isn't too big of a problem for us.
To clarify, in the issue you said When runc is executed in userns via Docker-in-LXD, isRootless() returns false, because LXD would set $USER to "root". - I don't understand in which case $USER is being set in Docker-in-LXD here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because LXD would set $USER to "root”
sorry this was my wrong assumption
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because LXD would set $USER to "root”
sorry this was my wrong assumption
So if Docker-in-LXD does NOT set $USER at all (i.e. $USER is either NOT on the environment, or set to en empty string when runc is run), then your PR would make isRootless return false, i.e. it would run rootful. Is that what you want to achieve? Is that what fixes the regression?
If yes, could you please update the comment in the code to something like
When runc is executed in userns via Docker-in-LXD, isRootless() returns false, because LXD would not set $USER.
so that we do not get confused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated. also added EUID check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LXD does not set anything at all unless explicitly instructed via environment variables. As soon as the basic container setup is done everything is in the hands of the binary that is executed. In the regular case for LXD and LXC this means a full-blown init system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated. also added EUID check
Oh, great! The EUID check makes our CF Garden acceptance tests quite happy 👍
Thanks!
|
I also made more precise version ("Step 2" of #1837), but very huge change: AkihiroSuda@dc7a32a So I suggest merging this PR (#1833) first if it doesn't break CloudFoundry. |
f1906fb to
076b3b1
Compare
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
076b3b1 to
018082a
Compare
|
Sorry, I obviously lack context and this more curiosity than anything else but since @cyphar and I had discussion about this before. :) I still don't fully understand why there is the conceptual necessity to have something like It'll just know what's up and manage your permissions accordingly: |
The current implementation also does a bunch of stuff that is not needed at all when we are already in userns, but these stuff will be removed in "Step 2" of #1837 |
|
ping @opencontainers/runc-maintainers |
|
ping @opencontainers/runc-maintainers PTAL |
|
This logic is getting more and more complex everytime it changes |
|
The logic will be simplified in "Step 2" of #1837 |
| // isRootless() returns false, and unlikely to work, unless cgroups is configured. | ||
| // The dockerd in the container would need to specify `runc --rootless` explicitly in this case. | ||
| // (And user would need to launch dockerd with --rootless explicitly, probably) | ||
| u := os.Getenv("USER") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks quite fragile, would be possible to add here the check from the point 2.2 in your proposal or wouldn't it help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry what is the point 2.2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the second bullet point in step 2:
"Switch the cgroup manager to libcontainer.RootlessCgroupfs"
We should detect cgroup availability explicitly by probably trying mkdir /sys/fs/cgroup/foo/bar or something similar. I guess the overhead is negligible.
Or just remove libcontainer.RootlessCgroupfs manager and ignore all errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to just close this PR and open the step2 PR now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine by me, but I think some runc maintainer should ensure it can get reviewed promptly. If that is not possible, this much simpler PR can be an intermediate workaround
|
Opened the "Step 2" PR. PTAL. I'm closing this "Step 1" PR. |

Signed-off-by: Akihiro Suda suda.akihiro@lab.ntt.co.jp
"Step 1" of #1837