
@AkihiroSuda (Member) commented Jan 15, 2018

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>

The changes mostly set up default paths under $HOME and provide a rootless OCI spec generator.
There is substantially no change to the daemon.


Rootless mode (Experimental)

Requirements:

  • runc (May 30, 2018) or later
  • Some distros, such as Debian (excluding Ubuntu) and Arch Linux, require echo 1 > /proc/sys/kernel/unprivileged_userns_clone
  • newuidmap and newgidmap need to be installed on the host. These commands are provided by the uidmap package on most distros.
  • /etc/subuid and /etc/subgid should contain at least 65536 sub-IDs, e.g. penguin:231072:65536.
  • To run inside a Docker container as a non-root USER, docker run --privileged is still required. See also Jessie's blog: https://blog.jessfraz.com/post/building-container-images-securely-on-kubernetes/
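The sub-ID requirement above can be checked before starting. The following Go sketch parses the name:start:count format shown in the example entry and reports whether a user has at least 65536 sub-IDs. The helpers findSubIDRange and hasEnoughSubIDs are hypothetical, not part of containerd; the sketch only matches entries by user name (real files may also list numeric UIDs) and only considers the first matching entry.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// subIDRange is one "name:start:count" entry from /etc/subuid or /etc/subgid.
type subIDRange struct {
	Name  string
	Start uint64
	Count uint64
}

// findSubIDRange scans the file contents and returns the first entry for user.
// Blank lines, comment lines, and malformed lines are skipped.
func findSubIDRange(contents, user string) (subIDRange, bool) {
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Split(line, ":")
		if len(fields) != 3 || fields[0] != user {
			continue
		}
		start, err1 := strconv.ParseUint(fields[1], 10, 32)
		count, err2 := strconv.ParseUint(fields[2], 10, 32)
		if err1 != nil || err2 != nil {
			continue
		}
		return subIDRange{Name: user, Start: start, Count: count}, true
	}
	return subIDRange{}, false
}

// hasEnoughSubIDs reports whether user has at least minCount sub-IDs.
func hasEnoughSubIDs(contents, user string, minCount uint64) bool {
	r, ok := findSubIDRange(contents, user)
	return ok && r.Count >= minCount
}

func main() {
	subuid := "penguin:231072:65536\n"
	fmt.Println(hasEnoughSubIDs(subuid, "penguin", 65536)) // true
}
```

To check the real files, their contents could be read with os.ReadFile("/etc/subuid") (and likewise /etc/subgid) and passed in.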

Daemon-side remarks:

  • The data dir will be set to /home/$USER/.local/share/containerd by default.
  • The address will be set to /run/user/$UID/containerd/containerd.sock by default.
  • The CRI plugin is not supported yet.
  • The overlayfs snapshotter is not supported except on Ubuntu-flavored kernels. The native snapshotter should work on non-Ubuntu kernels.
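A minimal sketch of how the default data dir and socket address described above could be derived, assuming $HOME is /home/$USER. defaultRootlessPaths is a hypothetical helper, not containerd's code. It takes the home directory and the host UID as parameters because, inside the user namespace, os.Getuid() reports the mapped root (0) rather than the host UID such as 1001.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// rootlessDefaults holds the two defaults listed in the daemon-side remarks.
type rootlessDefaults struct {
	Root    string // data dir
	Address string // gRPC socket path
}

// defaultRootlessPaths is a hypothetical helper: home is the user's $HOME and
// hostUID is the unprivileged UID on the host (not the mapped root inside the userns).
func defaultRootlessPaths(home string, hostUID int) rootlessDefaults {
	return rootlessDefaults{
		Root:    filepath.Join(home, ".local", "share", "containerd"),
		Address: filepath.Join("/run/user", fmt.Sprint(hostUID), "containerd", "containerd.sock"),
	}
}

func main() {
	d := defaultRootlessPaths("/home/penguin", 1001)
	fmt.Println(d.Root)    // /home/penguin/.local/share/containerd
	fmt.Println(d.Address) // /run/user/1001/containerd/containerd.sock
}
```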

Go client library remarks:

  • oci.WithRootless() removes the cgroups configuration from the spec. However, you can still set cgroups configuration after calling oci.WithRootless(), provided the permission bits on the cgroup filesystem have been preconfigured.
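The ordering behavior described above (options applied after oci.WithRootless() can re-add cgroups settings) can be illustrated with self-contained stand-in types. Spec, Linux, LinuxResources, SpecOpt, withRootlessSketch, and withCgroupsPath below are simplified hypothetical substitutes for the real runtime-spec types and containerd options, and the exact fields the real oci.WithRootless() clears may differ from this sketch.

```go
package main

import "fmt"

// Simplified stand-ins for the parts of the OCI runtime spec touched here.
// These are NOT the real github.com/opencontainers/runtime-spec types.
type LinuxResources struct {
	CPUShares *uint64
}

type Linux struct {
	CgroupsPath string
	Resources   *LinuxResources
}

type Spec struct {
	Linux *Linux
}

// SpecOpt mutates a spec in place, like a spec option.
type SpecOpt func(*Spec) error

// withRootlessSketch drops the cgroups configuration, since an unprivileged
// runc cannot create cgroup directories unless permissions were delegated.
func withRootlessSketch(s *Spec) error {
	if s.Linux != nil {
		s.Linux.CgroupsPath = ""
		s.Linux.Resources = nil
	}
	return nil
}

// withCgroupsPath re-adds a cgroups path; this only helps if the cgroup
// filesystem permissions were preconfigured for the user.
func withCgroupsPath(p string) SpecOpt {
	return func(s *Spec) error {
		if s.Linux == nil {
			s.Linux = &Linux{}
		}
		s.Linux.CgroupsPath = p
		return nil
	}
}

// applyOpts applies options in order, so later options win.
func applyOpts(s *Spec, opts ...SpecOpt) error {
	for _, o := range opts {
		if err := o(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	shares := uint64(512)
	s := &Spec{Linux: &Linux{CgroupsPath: "/default/foo", Resources: &LinuxResources{CPUShares: &shares}}}
	_ = applyOpts(s, withRootlessSketch, withCgroupsPath("/user.slice/foo"))
	fmt.Println(s.Linux.CgroupsPath, s.Linux.Resources == nil) // /user.slice/foo true
}
```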

Network namespace remarks:

  • To set up NAT between the host and the containers, you need either a TAP device with a usermode network stack (slirp) or a SUID helper. See RootlessKit.

Usage

Terminal 1:

$ unshare -U -m
unshared$ echo $$ > /tmp/pid

Unsharing mountns (and userns) is required for mounting filesystems without real root privileges.

Terminal 2:

$ id -u
1001
$ grep $(whoami) /etc/subuid
penguin:231072:65536
$ grep $(whoami) /etc/subgid
penguin:231072:65536
$ newuidmap $(cat /tmp/pid) 0 1001 1 1 231072 65536
$ newgidmap $(cat /tmp/pid) 0 1001 1 1 231072 65536

Terminal 1:

unshared# containerd

Terminal 2:

$ nsenter -U -m -t $(cat /tmp/pid)
unshared# ctr -a /run/user/1001/containerd/containerd.sock images pull docker.io/library/debian:latest
unshared# ctr -a /run/user/1001/containerd/containerd.sock run -t --rm --rootless --net-host docker.io/library/debian:latest foo
foo#

Usage (RootlessKit)

RootlessKit can be used to execute unshare and newuidmap/newgidmap in one step.
RootlessKit also supports unsharing the network namespace with usermode NAT such as VPNKit
and libvdeplug_slirp.

Terminal 1:

The following example is tested with RootlessKit 20b0fc24b305b031a61ef1a1ca456aadafaf5e77.

$ rootlesskit --state-dir=/tmp/foo --net=vpnkit --copy-up=/etc containerd

Terminal 2:

$ nsenter -U -m -n -t $(cat /tmp/foo/child_pid)
unshared# ctr -a /run/user/1001/containerd/containerd.sock images pull docker.io/library/debian:latest
unshared# ctr -a /run/user/1001/containerd/containerd.sock run -t --rm --rootless --net-host docker.io/library/debian:latest foo
foo#

@codecov-io commented Jan 15, 2018

Codecov Report

Merging #2006 into master will increase coverage by 0.17%.
The diff coverage is 69.56%.


@@            Coverage Diff             @@
##           master    #2006      +/-   ##
==========================================
+ Coverage      45%   45.18%   +0.17%     
==========================================
  Files          92       95       +3     
  Lines        9412     9481      +69     
==========================================
+ Hits         4236     4284      +48     
- Misses       4493     4512      +19     
- Partials      683      685       +2
Flag      Coverage Δ
#linux    49.44% <73.84%> (+0.21%) ⬆️
#windows  41.28% <0%> (-0.03%) ⬇️

Impacted Files                        Coverage Δ
cio/io.go                             33.33% <ø> (ø) ⬆️
oci/spec_opts_linux.go                0% <0%> (ø)
oci/spec_opts_nolinux.go              0% <0%> (ø)
rootless/specconv/specconv_linux.go   77.41% <77.41%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 39b6ba8...1fb7231.

func WithRootless(_ context.Context, _ Client, _ *containers.Container, s *specs.Spec) error {
	specconv.ToRootless(s)
	// without removing CgroupsPath, runc fails:
	// "process_linux.go:279: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/cpuset/default: permission denied\""
@AkihiroSuda (Member Author) Jan 15, 2018

@crosbymichael (Member)

Let's wait for the other dependent PRs to be merged before force-vendoring code that does not exist in mainline projects. It just invites merge errors if people don't pay attention.

Only open PRs when they can be reviewed and are actionable; otherwise they just sit there and add to our PR list when we cannot do anything with them.

@stevvooe (Member)

@AkihiroSuda Is there an issue for this capturing the main blockers for rootless? I think there may be some others, such as how snapshotters are handled and security around that.

@AkihiroSuda (Member Author)

> Is there an issue for this capturing the main blockers for rootless?

The only blocker I see for this PR is opencontainers/runc#1688, which is required for allowing unprivileged users to mount filesystems using mountns+userns.

I think we can reopen and merge this PR when opencontainers/runc#1688 gets merged.

Other issues/PRs are only required for advanced use cases such as running apt.

> I think there may be some others, such as how snapshotters are handled and security around that.

The overlay snapshotter works in an unprivileged userns on Ubuntu, but not on other distros (Ubuntu kernel patch).

So I'd suggest using the native snapshotter for rootless mode.

@AkihiroSuda (Member Author)

Reopened PR. (Thank you @crosbymichael for merging opencontainers/runc#1688)

@AkihiroSuda (Member Author)

CI failure will be fixed via opencontainers/runc#1808

@crosbymichael (Member)

@AkihiroSuda merged the fix in 1808

@AkihiroSuda AkihiroSuda force-pushed the rootless branch 6 times, most recently from abc411e to 079266c May 31, 2018 07:34
@AkihiroSuda (Member Author)

Weird CI failure, but it seems unrelated:

$ script/validate/vendor
2018/05/31 07:38:10 Collecting initial packages
2018/05/31 07:38:10 Download dependencies
2018/05/31 07:38:15 Starting whole vndr cycle because no package specified
github.com/containerd/cri: Err: exit status 128, out: fatal: reference is not a tree: 0d01163f9cbe4f353ca1b844230eeab37cd04f35

@AkihiroSuda (Member Author)

rebased

@dmcgowan (Member) Jun 11, 2018

Can we run this in Init() so it doesn't need to be checked every time? My understanding is that this won't change during the lifetime of the daemon.
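A minimal sketch of the suggestion to evaluate the check once in Init() and cache the result, assuming (as the comment says) that the value does not change while the daemon runs. The service type, its Init and Handle methods, and the detect callback are hypothetical; the code under review is not shown in this thread.

```go
package main

import "fmt"

// service is a hypothetical component standing in for the code under review.
type service struct {
	rootless bool // computed once in Init; assumed constant for the daemon's lifetime
}

// Init runs the environment check a single time at startup.
func (s *service) Init(detect func() bool) {
	s.rootless = detect()
}

// Handle is a hypothetical per-request path that reads the cached value
// instead of re-running the check.
func (s *service) Handle() string {
	if s.rootless {
		return "rootless"
	}
	return "rootful"
}

func main() {
	calls := 0
	detect := func() bool { calls++; return true }
	var s service
	s.Init(detect)
	for i := 0; i < 3; i++ {
		s.Handle()
	}
	fmt.Println(s.Handle(), calls) // rootless 1
}
```

If the check should stay lazy instead of running at startup, sync.Once is the usual Go alternative.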

@AkihiroSuda (Member Author)

Addressed @dmcgowan's comment.

@AkihiroSuda AkihiroSuda force-pushed the rootless branch 5 times, most recently from 10dabc1 to 1cf60bc July 2, 2018 07:30
@AkihiroSuda (Member Author) commented Jul 2, 2018

Refactored as in moby/buildkit#479 and in moby/buildkit#486
(EDIT: added moby/buildkit#486)

@AkihiroSuda AkihiroSuda force-pushed the rootless branch 2 times, most recently from 3d98d8d to f62396a July 2, 2018 10:52
@AkihiroSuda AkihiroSuda force-pushed the rootless branch 2 times, most recently from 7a0311a to 86b4941 July 4, 2018 10:19
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
@AkihiroSuda (Member Author)

will reopen and rebase when opencontainers/runc#1862 gets merged

@Callisto13 (Contributor) commented Sep 3, 2018

Hey @AkihiroSuda!
We (Garden/CloudFoundry) are wondering if you would consider modifying the check in cmd/containerd/command/main.go so that, when certain plugins are disabled, it does not insist on starting the server from within a user namespace where an unprivileged user is mapped to root. I am assuming that you enforced this requirement in order to perform rootfs mounts, which we don't rely on containerd for (please correct me if I am wrong; we are disabling a lot of plugins, not just the snapshotter). We implemented this very naively (by deleting that check) in a recent spike and were able to pass all our container lifecycle tests while running the containerd server and making client calls as a non-root user that is not mapped in a user namespace. Please let me know your thoughts :)

}
app.Action = func(context *cli.Context) error {
	if runtime.GOOS == "linux" && os.Geteuid() != 0 {
		return errors.New("rootless mode requires the daemon to be executed as the mapped root in a user namespace")
Member Author

@Callisto13 do you mean removing this check?

@Callisto13 (Contributor) Sep 3, 2018

Yes, that's the one.
We removed it because we didn't need to enforce it for our rootless case. It may be necessary for other use cases, but we didn't need to start the server in a new namespace or use oci.WithRootless() in our client.
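To make the request concrete, here is the daemon check quoted above rewritten as a testable Go function. The requireMappedRoot parameter is hypothetical: it models the Garden request to skip the check when the mount-dependent plugins are disabled, and it is not an existing containerd option. In the daemon, goos and euid would come from runtime.GOOS and os.Geteuid().

```go
package main

import (
	"errors"
	"fmt"
)

// errNotMappedRoot reuses the error text of the check in cmd/containerd/command/main.go.
var errNotMappedRoot = errors.New("rootless mode requires the daemon to be executed as the mapped root in a user namespace")

// checkDaemonUser is a testable form of the check. requireMappedRoot is a
// hypothetical switch that would be false when no plugin needs mounts.
func checkDaemonUser(goos string, euid int, requireMappedRoot bool) error {
	if !requireMappedRoot {
		return nil
	}
	if goos == "linux" && euid != 0 {
		return errNotMappedRoot
	}
	return nil
}

func main() {
	fmt.Println(checkDaemonUser("linux", 1001, true))  // prints the error message
	fmt.Println(checkDaemonUser("linux", 1001, false)) // <nil>
	fmt.Println(checkDaemonUser("linux", 0, true))     // <nil>
}
```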

Member Author

Sure, I'll rebase and remove the check from this PR when
opencontainers/runc#1862 gets merged.

Contributor

Cool, thanks!

Contributor

Hey @AkihiroSuda,
just sharing with you all the hacky changes we made to your PR branch. I've added some notes to explain things, but ping me with any questions. Using these changes on top of your PR, Garden was able to use containerd as an unprivileged user (for the basic create/exec/delete lifecycle; other things may yet fall out).
masters-of-cats@242e830
:)

Member Author

Thanks, will look into it 👍

(off-topic: Is your ContainerCamp slide deck available online?)

@Callisto13 (Contributor) Sep 12, 2018

The slides should be made available by the organisers when they publish the videos... I think.
Or I can just share the slides (they are in Google Drive) using the email on your profile.

Edit: or I will just make them public. Nothing secret or special about them anyway :)

Member Author

Thanks, could you send me the link to the slides, if you don't mind? 🙏

@AkihiroSuda AkihiroSuda mentioned this pull request Nov 6, 2018
@AkihiroSuda (Member Author)

refreshed PR as #2766
