conf: create separate peer group for container's root #4229

Merged
stgraber merged 3 commits into lxc:master from brauner:rootfs.propagate.shared
Nov 29, 2022
@brauner brauner commented Nov 24, 2022

Finally, we turn the rootfs into a shared mount. Note that this doesn't reestablish mount propagation with the host's mount namespace. Instead, we create a new peer group.

We're doing this because most workloads rely on the rootfs being a shared mount. For example, systemd daemons like systemd-udevd run in their own mount namespace. Their mount namespace has been made a dependent mount (MS_SLAVE) with the host rootfs as its dominating mount. This means new mounts on the host propagate into the respective services.

This is broken if we leave the container's rootfs a dependent mount. In that case, both the container's rootfs and the service's rootfs will be dependent mounts with the host's rootfs as their dominating mount. So if you were to mount over the rootfs from the host, it would not just propagate into the container's mount namespace, it would also propagate into the service.
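
To see which case a given mount is in, the propagation state can be read from the optional fields of `/proc/self/mountinfo` (`shared:N` for membership in peer group N, `master:N` for a dependent mount whose dominating peer group is N). The following is a minimal sketch of such a check, not part of this PR; the names `Propagation`, `parse_mountinfo_line`, and `describe` are made up for illustration, and other optional tags such as `propagate_from` or `unbindable` are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Propagation:
    """Propagation state of one mount (hypothetical helper type)."""
    mount_point: str
    shared: Optional[int] = None  # peer group ID if the mount is shared
    master: Optional[int] = None  # dominating peer group ID if dependent


def parse_mountinfo_line(line: str) -> Propagation:
    """Extract shared:/master: tags from one /proc/<pid>/mountinfo line."""
    fields = line.split()
    # Optional fields start at index 6 and end at the lone "-" separator.
    sep = fields.index("-", 6)
    info = Propagation(mount_point=fields[4])
    for tag in fields[6:sep]:
        key, _, value = tag.partition(":")
        if key == "shared":
            info.shared = int(value)
        elif key == "master":
            info.master = int(value)
    return info


def describe(info: Propagation) -> str:
    """Render the propagation state as a short human-readable string."""
    if info.shared is not None and info.master is not None:
        return (f"shared (peer group {info.shared}), "
                f"dependent on peer group {info.master}")
    if info.shared is not None:
        return f"shared (peer group {info.shared})"
    if info.master is not None:
        return f"dependent on peer group {info.master}"
    return "private"
```

Inside a container, feeding the line for `/` from `/proc/self/mountinfo` to `parse_mountinfo_line` and comparing the reported peer group IDs with those seen on the host should show whether the rootfs sits in its own peer group.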

@brauner brauner force-pushed the rootfs.propagate.shared branch 2 times, most recently from 68485b0 to 86cd616 November 24, 2022 08:38
@brauner brauner requested a review from stgraber November 24, 2022 08:38
Finally, we turn the rootfs into a shared mount. Note that this
doesn't reestablish mount propagation with the host's mount
namespace. Instead, we create a new peer group.

We're doing this because most workloads rely on the rootfs being a
shared mount. For example, systemd daemons like systemd-udevd run in
their own mount namespace. Their mount namespace has been made a
dependent mount (MS_SLAVE) with the host rootfs as its dominating
mount. This means new mounts on the host propagate into the
respective services.

This is broken if we leave the container's rootfs a dependent mount.
In that case, both the container's rootfs and the service's rootfs
will be dependent mounts with the host's rootfs as their dominating
mount. So if you were to mount over the rootfs from the host, it
would not just propagate into the container's mount namespace, it
would also propagate into the service. That's nonsense semantics for
nearly all relevant use-cases. Instead, establish the container's
rootfs as a separate peer group mirroring the behavior on the host.

Signed-off-by: Christian Brauner (Microsoft) <christian.brauner@ubuntu.com>

brauner commented Nov 24, 2022

Jenkins: test this please

brauner commented Nov 24, 2022

jenkins: test this please

brauner commented Nov 28, 2022

jenkins: test this please

@brauner brauner force-pushed the rootfs.propagate.shared branch from 86cd616 to 4f5e5cc November 29, 2022 09:02
@stgraber

jenkins: test this please

@stgraber

jenkins: test this please

@brauner brauner force-pushed the rootfs.propagate.shared branch from 4f5e5cc to 01ae6d4 November 29, 2022 19:59
@lxc-jenkins

Testsuite passed

@stgraber stgraber merged commit b16e4ea into lxc:master Nov 29, 2022
cmatsuoka added a commit to cmatsuoka/craft-parts that referenced this pull request Mar 9, 2023
Address shared mount issues affecting the /dev mount in chroots. This
is a result of lxc/lxc#4229 (the container rootfs became a shared
mount, meaning that unmounts propagate through the shared group and
the original mounts are unmounted too).

See canonical/rockcraft#195 for details.

Signed-off-by: Claudio Matsuoka <claudio.matsuoka@canonical.com>
cmatsuoka added a commit to canonical/craft-parts that referenced this pull request Mar 9, 2023
Address shared mount issues affecting the /dev mount in chroots. This
is a result of lxc/lxc#4229 (the container rootfs became a shared
mount, meaning that unmounts propagate through the shared group and
the original mounts are unmounted too).

See canonical/rockcraft#195 for details.

Signed-off-by: Claudio Matsuoka <claudio.matsuoka@canonical.com>
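
One way a workload can avoid this kind of unmount propagation is to switch a mount to private propagation before unmounting it. The sketch below shows the underlying mount(2) call through `ctypes`; it is an illustration under assumptions, not the code from craft-parts or LXC. The helper names `propagation_flags` and `set_propagation` are hypothetical, the flag values are the Linux `MS_*` constants, and the call needs CAP_SYS_ADMIN (and may additionally be denied by an AppArmor profile).

```python
import ctypes
import ctypes.util
import os

# Linux mount(2) propagation flags (values from <linux/mount.h>).
MS_REC = 0x4000
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

_PROPAGATION = {"private": MS_PRIVATE, "slave": MS_SLAVE, "shared": MS_SHARED}


def propagation_flags(mode: str, recursive: bool = True) -> int:
    """Build the mount(2) flags for a propagation change (hypothetical helper)."""
    flags = _PROPAGATION[mode]
    if recursive:
        flags |= MS_REC  # apply to the whole subtree under the mount
    return flags


def set_propagation(path: str, mode: str, recursive: bool = True) -> None:
    """Change the propagation type of an existing mount.

    Roughly what `mount --make-rprivate <path>` does for mode="private".
    Raises OSError on failure, e.g. EPERM or EACCES when not permitted.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    flags = ctypes.c_ulong(propagation_flags(mode, recursive))
    # Source, filesystem type and data are ignored for propagation changes.
    ret = libc.mount(None, os.fsencode(path), None, flags, None)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
```

For example, calling `set_propagation(mount_point, "private")` on a bind mount inside a chroot before unmounting it should keep the unmount from reaching the other members of the peer group; here `mount_point` stands for whatever path the workload mounted.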
mihalicyn added a commit to mihalicyn/lxc that referenced this pull request Mar 31, 2023
There is a long story behind this. Many years ago, Stéphane Graber
discovered an issue with AppArmor mount rules.

Since commit
lxc@7f2b132
("apparmor: Update mount states handling"), changing mount propagation
flags has been prohibited, because adding rules that allow changing
mount propagation gives a user inside the container the ability
to mount anything [1].

With modern systemd versions this problem has become more critical than
before. For instance, ArchLinux containers fail to start unless the
nesting AppArmor profile is enabled (because the nesting profile
effectively allows all mounts). Of course, that's a security issue.

We've also enabled sharing on the container rootfs:
lxc#4229

Many workloads now need to change the propagation flag to
private (see canonical/craft-parts#400).

Issue:
$ lxc-start -F archlinux-test

systemd 253-1-arch running in system mode (+PAM +AUDIT -SELINUX -APPARMOR -IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP -SYSVINIT default-hierarchy=unified)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Arch Linux!

bpf-lsm: BPF LSM hook not enabled in the kernel, BPF LSM not supported
Failed to remount root directory as MS_SLAVE: Permission denied
(sd-gens) failed with exit status 1.
[!!!!!!] Failed to start up manager.
Exiting PID 1...

Workaround (unsafe):
$ lxc-start -s lxc.apparmor.allow_nesting=1 -s lxc.apparmor.profile=generated -F arch-test

John Johansen (AppArmor maintainer) and the LXD team worked on a fix [2].
It has already been merged into the stable AppArmor 3.0 and 3.1 branches.
There is no stable AppArmor version tag for it yet, but I think it will
be in AppArmor version 3.0.10.

See also:
[1] https://bugs.launchpad.net/apparmor/+bug/1597017
[2] https://gitlab.com/apparmor/apparmor/-/merge_requests/333

Fixes: lxc#4280

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>