Skip to content

Use v2 capabilities in layer archives#41724

Merged
AkihiroSuda merged 1 commit intomoby:masterfrom
EricMountain:dishonoured-capabilities
Jan 13, 2021
Merged

Use v2 capabilities in layer archives#41724
AkihiroSuda merged 1 commit intomoby:masterfrom
EricMountain:dishonoured-capabilities

Conversation

@EricMountain
Copy link
Copy Markdown
Contributor

- What I did

Fixes #41723

When building images in a user-namespaced container, v3 capabilities are
stored including the root UID of the creator of the user-namespace.

This UID does not make sense outside the build environment however. If
the image is run in a non-user-namespaced runtime, or if a user-namespaced
runtime uses a different UID, the capabilities requested by the effective
bit will not be honoured by execve(2) due to this mismatch.

Instead, we convert v3 capabilities to v2, dropping the root UID on the
fly.

- How I did it

Patched ReadSecurityXattrToTarHeader() to automatically convert v3 capabilities to v2 by switching the version identifier and dropping the root UID data.

- How to verify it

This reproducer can be used.

Compared with the output in issue #41723 this produces:

    default: + docker run --rm capabilities-built-with-no-userns:1.0 /bin/bash -c '(/usr/local/bin/sleep-test infinity & ); sleep 1; grep Cap /proc/$(pgrep sleep-test)/status'
    default: CapInh:	00000000a80425fb
    default: CapPrm:	0000000000000400
    default: CapEff:	0000000000000400
    default: CapBnd:	00000000a80425fb
    default: CapAmb:	0000000000000000
    default: + docker run --rm capabilities-built-with-userns:1.0 /bin/bash -c '(/usr/local/bin/sleep-test infinity & ); sleep 1; grep Cap /proc/$(pgrep sleep-test)/status'
    default: CapInh:	00000000a80425fb
    default: CapPrm:	0000000000000400
    default: CapEff:	0000000000000400
    default: CapBnd:	00000000a80425fb
    default: CapAmb:	0000000000000000

So we see that in the 2nd case also execve(2) honoured the effective bit.

- Description for the changelog

Capabilities in image layers are stored in v2 format even when built inside a non-root user-namespace.

- A picture of a cute animal (not mandatory but encouraged)

image

@justincormack
Copy link
Copy Markdown
Contributor

Hi, I would rather this went via buildkit first as that is where we are doing the work in build. We might port it to the legacy builder, but this is not where we are largely doing build development work.

@EricMountain
Copy link
Copy Markdown
Contributor Author

EricMountain commented Nov 27, 2020

@justincormack , thanks I'll take a look. Would you be able to point me to where in buildkit this should go?

Trying to get to grips with the buildkit code: IIUC buildkit leverages containerd to write out image layers, correct? Would this be the right place to make the equivalent change?

@EricMountain
Copy link
Copy Markdown
Contributor Author

@justincormack I've come to realise BuildKit is using the same block of code to build the tarballs, so this PR is also the fix for BuildKit. Or am I missing something?

@thaJeztah
Copy link
Copy Markdown
Member

@tonistiigi PTAL

@AkihiroSuda
Copy link
Copy Markdown
Member

Could you add integration tests?

@thaJeztah
Copy link
Copy Markdown
Member

@dmcgowan @estesp would containerd's archive package also need similar changes? https://github.com/containerd/containerd/tree/master/archive

@EricMountain
Copy link
Copy Markdown
Contributor Author

@AkihiroSuda is there an image I can use in the integration tests that would support the getcap -n option? I checked debian:buster, one of the frozen images already used in integration tests, however it doesn't support -n.

I might be able to get away without this, however it would be a lot uglier: basically setcap in a build, then tar the file in a run (hoping debian:buster tar knows about serialising security.capabibilities) and check the version in the serialised data.

@thaJeztah
Copy link
Copy Markdown
Member

Alpine's libcap looks to have it, but don't think we include it in the frozen images.

@EricMountain
Copy link
Copy Markdown
Contributor Author

EricMountain commented Dec 1, 2020

@thaJeztah in integration tests, are we supposed to use only the frozen images, or is it considered OK to use e.g. Alpine for cases such as this? From what I've seen, integration tests only use scratch or frozen images, so my guess was I'm not supposed to use anything else.

And if I should only use frozen images, would it be acceptable for me to add Alpine to the list?

@EricMountain
Copy link
Copy Markdown
Contributor Author

Integration test implemented in #41740. Switching this PR to draft for now as it will require an update to the test once that has merged.

@EricMountain EricMountain marked this pull request as draft December 2, 2020 14:12
@EricMountain EricMountain force-pushed the dishonoured-capabilities branch from 72bf86a to 01d3a17 Compare December 22, 2020 14:41
When building images in a user-namespaced container, v3 capabilities are
stored including the root UID of the creator of the user-namespace.

This UID does not make sense outside the build environment however. If
the image is run in a non-user-namespaced runtime, or if a user-namespaced
runtime uses a different UID, the capabilities requested by the effective
bit will not be honoured by `execve(2)` due to this mismatch.

Instead, we convert v3 capabilities to v2, dropping the root UID on the
fly.

Signed-off-by: Eric Mountain <eric.mountain@datadoghq.com>
@EricMountain EricMountain force-pushed the dishonoured-capabilities branch from 01d3a17 to 95eb490 Compare December 23, 2020 13:17
@EricMountain EricMountain marked this pull request as ready for review December 23, 2020 13:33
@EricMountain
Copy link
Copy Markdown
Contributor Author

Test in #41740 merged, the fix is now ready for review.

@EricMountain
Copy link
Copy Markdown
Contributor Author

Hi @thaJeztah , @AkihiroSuda might I get a review now the integration test PR has merged? Thanks!

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ping @AkihiroSuda @tonistiigi PTAL

@thaJeztah
Copy link
Copy Markdown
Member

do we need similar changes in containerd as well, @AkihiroSuda ?

@AkihiroSuda
Copy link
Copy Markdown
Member

do we need similar changes in containerd as well, @AkihiroSuda ?

Better to have, but not urgent

@AkihiroSuda AkihiroSuda merged commit 327daef into moby:master Jan 13, 2021
@thaJeztah
Copy link
Copy Markdown
Member

do we need similar changes in containerd as well, @AkihiroSuda ?

Better to have, but not urgent

@EricMountain perhaps you're interested in contributing to containerd as well? (Or to open a tracking issue, describing the problem)?

@thaJeztah thaJeztah added this to the 20.10.3 milestone Jan 13, 2021
@EricMountain
Copy link
Copy Markdown
Contributor Author

@EricMountain perhaps you're interested in contributing to containerd as well? (Or to open a tracking issue, describing the problem)?

@thaJeztah sure, I'll take a look.

@EricMountain EricMountain deleted the dishonoured-capabilities branch January 13, 2021 10:36
@thaJeztah thaJeztah modified the milestones: 20.10.3, 21.xx Feb 2, 2021
@tonistiigi
Copy link
Copy Markdown
Member

@AkihiroSuda Do we need to cherry-pick this? Is this breaking existing users? Should we strip to v3 part on extraction if it is node specific.

@thaJeztah
Copy link
Copy Markdown
Member

@AkihiroSuda ^ could you have a look?

@AkihiroSuda
Copy link
Copy Markdown
Member

Seems safe to cherry-pick, but if we can have v21.xx soon, I guess cherry-picking is not necessary.

@AkihiroSuda
Copy link
Copy Markdown
Member

Cherry-picked as #42352

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Capabilities not honoured when image was built under userns

5 participants