[20.10 backport] Use v2 capabilities in layer archives #42352

Merged
cpuguy83 merged 1 commit into moby:20.10 from AkihiroSuda:cherrypick-41724
Jun 1, 2021

@AkihiroSuda
Member

Cherry-pick #41724


- What I did

Fixes #41723

When building images in a user-namespaced container, v3 capabilities are
stored including the root UID of the creator of the user-namespace.

However, this UID has no meaning outside the build environment. If
the image is run in a non-user-namespaced runtime, or if a user-namespaced
runtime uses a different UID, the capabilities requested by the effective
bit will not be honoured by execve(2) due to this mismatch.

Instead, we convert v3 capabilities to v2 on the fly, dropping the root
UID.

- How I did it

Patched ReadSecurityXattrToTarHeader() to automatically convert v3 capabilities to v2 by switching the version identifier and dropping the root UID data.

- How to verify it

This reproducer can be used.

Compared with the output in issue #41723 this produces:

    default: + docker run --rm capabilities-built-with-no-userns:1.0 /bin/bash -c '(/usr/local/bin/sleep-test infinity & ); sleep 1; grep Cap /proc/$(pgrep sleep-test)/status'
    default: CapInh:	00000000a80425fb
    default: CapPrm:	0000000000000400
    default: CapEff:	0000000000000400
    default: CapBnd:	00000000a80425fb
    default: CapAmb:	0000000000000000
    default: + docker run --rm capabilities-built-with-userns:1.0 /bin/bash -c '(/usr/local/bin/sleep-test infinity & ); sleep 1; grep Cap /proc/$(pgrep sleep-test)/status'
    default: CapInh:	00000000a80425fb
    default: CapPrm:	0000000000000400
    default: CapEff:	0000000000000400
    default: CapBnd:	00000000a80425fb
    default: CapAmb:	0000000000000000

So in the 2nd case (the image built with userns), execve(2) also honoured the effective bit: CapPrm and CapEff are 0x400 in both runs.
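
To read the masks in the output above, the hexadecimal values can be decoded into bit numbers. The sketch below is an illustration only; `capBits` is a hypothetical helper name. For `0000000000000400` it yields bit 10, which `linux/capability.h` defines as `CAP_NET_BIND_SERVICE`.

```go
package main

import (
	"math/bits"
	"strconv"
)

// capBits is a hypothetical helper: it parses a hexadecimal mask as printed
// in /proc/<pid>/status and returns the numbers of the set bits, lowest first.
func capBits(hexMask string) ([]int, error) {
	mask, err := strconv.ParseUint(hexMask, 16, 64)
	if err != nil {
		return nil, err
	}
	var set []int
	for mask != 0 {
		bit := bits.TrailingZeros64(mask) // index of the lowest set bit
		set = append(set, bit)
		mask &^= uint64(1) << uint(bit) // clear that bit and continue
	}
	return set, nil
}
```

Calling `capBits("00000000a80425fb")` on the CapBnd value lists the bits of the bounding set in the same way.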

- Description for the changelog

Capabilities in image layers are stored in v2 format even when built inside a non-root user-namespace.


When building images in a user-namespaced container, v3 capabilities are
stored including the root UID of the creator of the user-namespace.

This UID does not make sense outside the build environment however. If
the image is run in a non-user-namespaced runtime, or if a user-namespaced
runtime uses a different UID, the capabilities requested by the effective
bit will not be honoured by `execve(2)` due to this mismatch.

Instead, we convert v3 capabilities to v2, dropping the root UID on the
fly.

Signed-off-by: Eric Mountain <eric.mountain@datadoghq.com>
(cherry picked from commit 95eb490)
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@AkihiroSuda
Member Author

cc @EricMountain @tonistiigi

Member

@thaJeztah thaJeztah left a comment

LGTM

@mrueg
Contributor

mrueg commented May 31, 2021

This seems to be the last open item for the v20.10.7 milestone. Since v20.10.7 includes a new version of runc that has a security fix, I was wondering if there is anything we can do to help get the release out?

Member

@cpuguy83 cpuguy83 left a comment

LGTM
