Skip to content

Conversation

@AkihiroSuda
Copy link
Member

@AkihiroSuda AkihiroSuda commented Jul 2, 2018

Signed-off-by: Akihiro Suda suda.akihiro@lab.ntt.co.jp

"Step 1" of #1837

@AkihiroSuda
Copy link
Member Author

@cyphar PTAL

utils_linux.go Outdated
return system.GetParentNSeuid() != 0 || system.RunningInUserNS(), nil
// if system.RunningInUserNS() == true: regardless of whether the userns was created by a regular
// user or the root (in the initial namespace), we need to enable the rootless mode.
return os.Geteuid() != 0 || system.RunningInUserNS(), nil
Copy link
Member

@cyphar cyphar Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense -- because the parent namespace's euid is not actually use for permission checks. However, I think that this check is still not entirely correct: os.Geteuid() == 0 is too strict of a check -- I think we should be checking whether the current process has CAP_SYS_ADMIN.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, on second thought, I'm not sure we should always automatically enable rootless mode when running in userns. Maybe this has been breaking Docker-in-LXD usecase? (Although the corresponding commit is not used in Docker yet)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always ask @brauner. They do have some runc patches from memory.

Copy link
Member Author

@AkihiroSuda AkihiroSuda Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to just have return system.GetParentNSeuid() != 0, as in the initial revision of the code?
#1688 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@AkihiroSuda AkihiroSuda Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looks like my commit in #1688 breaks Docker in LXD (sorry):

$ lxc launch ubuntu:18.04 foo -c security.nesting=true
$ lxc file push /usr/local/sbin/runc foo/usr/bin/docker-runc
...
foo# docker run -it --rm busybox
docker: Error response from daemon: OCI runtime create failed: cannot specify gid= mount options for unmapped gid in rootless containers: unknown.

Could you suggest logic for isRootless?

For LXD compatibility, we could remove userns detection from isRootless(), but then we will break img/buildkit/buildah/podman probably.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, lemme think. I'll get back to you. I just need to finish something else first. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I need to know some stuff:

  • Do you have a fork()/clone()+ exec() model whereby the child process is the container and runs in a new set of namespaces or something else?
  • Where is IsRootless() supposed to be called? In the parent namespaces or the child namespaces (presupposing a particular model in this question so feel free to reformulate as approriate)?
  • What decisions are based on IsRootless()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a fork()/clone()+ exec() model whereby the child process is the container and runs in a new set of namespaces or something else?

img and RootlessKit (used by BuildKit, containerd PR and Docker POC) do this.

I think Buildah and Podman do similar stuff but haven't looked into their code deeply yet.

In addition to user namespace, we sets CLONE_NEWNS and optionally CLONE_NEWNET.

However, unlike Docker-in-LXD, we don't do chroot nor pivot_root.
Also, we keep environment variables such as $USER.
i. e. we we execute runc as UID=0 but $USER="penguin" in userns+mntns(+netns).

Where is IsRootless() supposed to be called? In the parent namespaces or the child namespaces (presupposing a particular model in this question so feel free to reformulate as approriate)?

Child namespaces, from POV of img.

But note that executing runc in the initial namespace (as non-root UID) needs to be also allowed.

What decisions are based on IsRootless()?

#1837

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just saw your answers. Thanks.

However, unlike Docker-in-LXD, we don't do chroot nor pivot_root.

Out of curiosity, only for the fully unprivileged case?

@AkihiroSuda AkihiroSuda changed the title rootless: simplify isRootless() rootless: fix Docker-in-LXD regression Jul 3, 2018
@AkihiroSuda
Copy link
Member Author

Updated PR, please refer to step 1 of #1837

// Corner cases:
// * When runc is executed in userns via Docker-in-rootless-Docker ("rootless dind"), as the root in the contaienr, isRootless() returns false, and unlikely to work. The dockerd in the container would need to specify runc --rootless explicitly in this case. (And user would need to launch dockerd with --rootless explicitly, probably)
// * When runc is executed in "rootless dind", as a non-root in the contaienr, isRootless() returns true and likely to work.
u := os.Getenv("USER")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The USER environment variable seems to be set by the login process (see man login(1). Garden runs as server (i.e. it is not logging in) and therefore USER is not set when runc is invoked which makes isRootless return false regardless of whether we want rootless or not (os.Getenv returns an empty string if the var is not set)

In order to support both cases via checking the USER value only if it is set and fallback to the effective user ID if not, i.e. something like

func isRootless(context *cli.Context) (bool, error) {
  if context != nil {
      ...
  }

  user, userSet := os.LookupEnv("USER")
  if userSet {
    return user != "" && user != "root", nil
  }

  return system.GetParentNSeuid() != 0 || system.RunningInUserNS(), nil
}

What do you think?

cc @ostenbom @teddyking @julz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danail-branekov this looks like a safer version, I think it makes sense to fallback to the euid when USER is not set

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but it doesn't fix Docker-in-LXD regression, because Docker-in-LXD executes runc without $USER, as euid=0 in userns, but it expects isRootless() to return false...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Garden runs as server

How do you run rootless garden as a server? via systemd --user?

systemd --user seems to set $USER, at least on Ubuntu 18.04?

suda@suda-ws01:~$ systemctl show-environment
LANG=en_US.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
suda@suda-ws01:~$ systemctl --user show-environment
HOME=/home/suda
LANG=en_US.UTF-8
LOGNAME=suda
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
SHELL=/bin/bash
USER=suda
XDG_RUNTIME_DIR=/run/user/1001

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run garden with monit, which doesn't set USER. We also run the server as a user with maximum user id that does not exist in /etc/passwd/ or elsewhere, so it would be best if we could run runc in rootless mode without setting $USER. On the other hand we can always run with the --rootless flag, so I guess this isn't too big of a problem for us.

To clarify, in the issue you said When runc is executed in userns via Docker-in-LXD, isRootless() returns false, because LXD would set $USER to "root". - I don't understand in which case $USER is being set in Docker-in-LXD here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because LXD would set $USER to "root”

sorry this was my wrong assumption

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because LXD would set $USER to "root”
sorry this was my wrong assumption

So if Docker-in-LXD does NOT set $USER at all (i.e. $USER is either NOT on the environment, or set to en empty string when runc is run), then your PR would make isRootless return false, i.e. it would run rootful. Is that what you want to achieve? Is that what fixes the regression?

If yes, could you please update the comment in the code to something like

When runc is executed in userns via Docker-in-LXD, isRootless() returns false, because LXD would not set $USER.

so that we do not get confused?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. also added EUID check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LXD does not set anything at all unless explicitly instructed via environment variables. As soon as the basic container setup is done everything is in the hands of the binary that is executed. In the regular case for LXD and LXC this means a full-blown init system.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. also added EUID check

Oh, great! The EUID check makes our CF Garden acceptance tests quite happy 👍

Thanks!

@AkihiroSuda
Copy link
Member Author

I also made more precise version ("Step 2" of #1837), but very huge change: AkihiroSuda@dc7a32a

So I suggest merging this PR (#1833) first if it doesn't break CloudFoundry.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
@brauner
Copy link
Contributor

brauner commented Jul 5, 2018

Sorry, I obviously lack context and this more curiosity than anything else but since @cyphar and I had discussion about this before. :) I still don't fully understand why there is the conceptual necessity to have something like IsRootless() in a container runtime. @lxc has support for fully unprivileged containers for a long time (which is what you cover with the term "rootless") but it is not a different concept. It is just a specific point in a permissions-space that is managed continuously. So if you instruct @lxc with:

lxc.idmap = u 1000 1000 1
lxc.idmap = g 1000 1000 1

lxc.init.uid = 1000
lxc.init.gid = 1000

It'll just know what's up and manage your permissions accordingly:
asciicast

@AkihiroSuda
Copy link
Member Author

I still don't fully understand why there is the conceptual necessity to have something like IsRootless() in a container runtime.

  • When IsRootless() == true, runc ignores cgroup permission errors, because generally we don't have the write access to cgroups. (And we don't use lxcfs or pam_cgfs.)

  • When IsRootless() == true, runc uses $XDG_RUNTIME_DIR instead of /run.

The current implementation also does a bunch of stuff that is not needed at all when we are already in userns, but these stuff will be removed in "Step 2" of #1837
(POC: https://github.com/AkihiroSuda/runc/commits/decompose-rootless)

@AkihiroSuda
Copy link
Member Author

ping @opencontainers/runc-maintainers

@dqminh
Copy link
Contributor

dqminh commented Jul 28, 2018

LGTM

Approved with PullApprove

@kolyshkin
Copy link
Contributor

ping @opencontainers/runc-maintainers PTAL

@crosbymichael
Copy link
Member

This logic is getting more and more complex everytime it changes

@AkihiroSuda
Copy link
Member Author

The logic will be simplified in "Step 2" of #1837
AkihiroSuda@dc7a32a

// isRootless() returns false, and unlikely to work, unless cgroups is configured.
// The dockerd in the container would need to specify `runc --rootless` explicitly in this case.
// (And user would need to launch dockerd with --rootless explicitly, probably)
u := os.Getenv("USER")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks quite fragile, would be possible to add here the check from the point 2.2 in your proposal or wouldn't it help?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what is the point 2.2?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the second bullet point in step 2:

"Switch the cgroup manager to libcontainer.RootlessCgroupfs"

We should detect cgroup availability explicitly by probably trying mkdir /sys/fs/cgroup/foo/bar or something similar. I guess the overhead is negligible.
Or just remove libcontainer.RootlessCgroupfs manager and ignore all errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to just close this PR and open the step2 PR now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine by me, but I think some runc maintainer should ensure it can get reviewed promptly. If that is not possible, this much simpler PR can be an intermediate workaround

@AkihiroSuda
Copy link
Member Author

Opened the "Step 2" PR. PTAL.
#1862

I'm closing this "Step 1" PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants