Skip to content

c8d: Fix image commit with userns mapping#46938

Closed
rumpl wants to merge 1 commit intomoby:masterfrom
rumpl:fix-userns-capabilities
Closed

c8d: Fix image commit with userns mapping#46938
rumpl wants to merge 1 commit intomoby:masterfrom
rumpl:fix-userns-capabilities

Conversation

@rumpl
Copy link
Member

@rumpl rumpl commented Dec 14, 2023

- What I did

Fixed image commit with userns mapping

The remapping in the commit code was in the wrong place, we would create a diff and then remap the snapshot, but the descriptor created in "CreateDiff" was still pointing to the old snapshot, we now remap the snapshot before creating a diff. Also make sure we don't lose any capabilities, they used to be lost after the chown.

Somewhat related to #41724

- How I did it

  • did the remap before creating a diff on commit
  • save the xattrs before chowning the file and resetting them after

- How to verify it

The test TestBuildUserNamespaceValidateCapabilitiesAreV2 should pass now

$ make DOCKER_GRAPHDRIVER=overlayfs TEST_FILTER=TestBuildUserNamespaceValidateCapabilitiesAreV2 TEST_INTEGRATION_USE_SNAPSHOTTER=1 TEST_IGNORE_CGROUP_CHECK=1 test-integration

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@rumpl rumpl added area/images Image Distribution kind/bugfix PR's that fix bugs containerd-integration Issues and PRs related to containerd integration labels Dec 14, 2023
@rumpl rumpl added this to the 25.0.0 milestone Dec 14, 2023
@rumpl rumpl self-assigned this Dec 14, 2023
Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines -334 to -336
if cerrdefs.IsAlreadyExists(err) {
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check was removed in the new location, that's correct?

Comment on lines +117 to +120
if !errors.Is(err, syscall.EOPNOTSUPP) {
return err
}
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if we must break here, or continue with the next attribute; is it possible some extended attributes are not supported, and we would now be skipping the remaining ones?

I also recall this PR from @corhere where we were silently swallowing errors, and wondering if discarding attributes is the right thing to do;

Also looking at this existing code, which is not ignoring these;

func copyXattr(srcPath, dstPath, attr string) error {
data, err := system.Lgetxattr(srcPath, attr)
if err != nil {
if errors.Is(err, syscall.EOPNOTSUPP) {
// Task failed successfully: there is no xattr to copy
// if the source filesystem doesn't support xattrs.
return nil
}
return err
}
if data != nil {
if err := system.Lsetxattr(dstPath, attr, data, 0); err != nil {
return err
}
}
return nil
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EOPNOTSUPP errors is returned if Extended attributes are not supported by the filesystem, or are disabled. so if one fails, all will fail, that's why I break. The copyXattr is ingoring it, it returns nil if it can't copy the xattrs.

Regarding Cory's PR, Cory is right, we should fail. But his PR doesn't cover all the cases, as you can see with the copyXattr case you just linked. If I fail here, do we want an escape hatch like the one in Cory's PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EOPNOTSUPP errors is returned if Extended attributes are not supported by the filesystem, or are disabled. so if one fails, all will fail,

Gotcha! It wasn't immediately clear to me if this would "fail for some (but not all)" or always for all.

If I fail here, do we want an escape hatch like the one in Cory's PR?

As the containerd image store is "new territory", and assuming this would only happen in a "broken" setup (unsupported filesystem, so not "because user-NS"), I suggest that we should not provide an escape-hatch out of the box. Worst case, we add one later if there's many complaints, but (as that escape hatch communicates through its ridiculous i_want_broken_containers name) we really shouldn't.

w.r.t. "xattrs are supported", I'm wondering if this is also information that snapshotters should provide and/or if they should somehow detect this and mark themselves as "not supported". I guess some of that may be more complicated with containerd being less opinionated (storage location and such, which may mean "any location could be used?")

Based on the above;

  • I think we should probably bail out if it's not supported (return an error that extended attributes are not supported).
  • If we do want to "continue", at least it should be logged, and we should have a comment in code describing the intent "if the first one fails.. all of them will fail; it's not supported). So that it's clear that the intent is not to "skip some".

xattrs := make(map[string][]byte, len(xattrKeys))

for _, xattr := range xattrKeys {
data, err := sysx.LGetxattr(path, xattr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does continuity's sysx.LGetxattr() differ from our pkg/system.LGetxattr() ? I haven't dug into it, but curious why one is used instead of the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two are the same, I'm using sysx here because it's the only one that has LListxattr so I thought I should use the same package for everything

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using sysx here because it's the only one that has LListxattr so I thought I should use the same package for everything

Ah! So I noticed that this is the first use of this package in our code, and everything else uses pkg/system. We should look at that;

  • consider if "continuity" is the right place for this (I know it's become a bit of a grab-bag of things)
  • check what the origin of the "continuity" code is (was it forked from our pkg/system or vice-versa?)
  • consider if we should move that functionality of pkg/system elsewhere (perhaps a module in https://github.com/moby/sys)
  • in either case, consolidate those packages if they really overlap 90%, deprecate one (adding temporary aliases) and remove after.

From a quick look (see the copyXattr function I linked above), this appears to be because we only copy specific attributes;

if err := copyXattr(srcPath, dstPath, "security.capability"); err != nil {

return copyXattr(srcPath, dstPath, "trusted.overlay.opaque")

☝️ that leaves me the question; must we copy all attributes, or is this one of those cases where we should only copy specific ones?

For additional fun and giggles; https://github.com/tonistiigi/fsutil has a copyXattrs (which uses sysx) https://github.com/tonistiigi/fsutil/blob/797bd683d5a28a87bea5421f51db1e112ac2edd4/copy/copy_nowindows.go#L15C1-L32.

Comment on lines +136 to +138
if !errors.Is(err, syscall.EOPNOTSUPP) {
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess technically the error return should be here (not when reading the attributes), but in practice they should be symmetrical (you cannot either read and write them), but if we decide to produce an error, we should do for both. (similar to https://github.com/tonistiigi/fsutil/blob/797bd683d5a28a87bea5421f51db1e112ac2edd4/copy/copy_nowindows.go#L15C1-L32)

Copy link
Member

@laurazard laurazard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

Don't merge yet; some things we discussed on Slack;

  • add a comment describing "why" we're re-applying the xattrs (and why they were removed in the first place)
  • consider a hard failure if we fail to apply the xattrs (instead of carrying on)

Also to be looked at; if we should always apply xattrs unfiltered or if there's only specific ones we can safely re-apply after UID/GID changed (but that may need more looking)

@thaJeztah
Copy link
Member

@rumpl if you have some time, could you look at the above? (perhaps we can get this one moving then)

The remapping in the commit code was in the wrong place, we would create
a diff and then remap the snapshot, but the descriptor created in
"CreateDiff" was still pointing to the old snapshot, we now remap the
snapshot before creating a diff. Also make sure we don't lose any
capabilities, they used to be lost after the chown.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
@rumpl rumpl force-pushed the fix-userns-capabilities branch from 38d267b to cf5a3bc Compare December 20, 2023 16:31
@rumpl
Copy link
Member Author

rumpl commented Dec 20, 2023

@rumpl if you have some time, could you look at the above? (perhaps we can get this one moving then)

I just pushed an update but asked @dmcgowan to take a closer look, this code works but creates too many layers...

@dmcgowan
Copy link
Member

I just pushed an update but asked @dmcgowan to take a closer look, this code works but creates too many layers...

My main concern is that the CreateDiff shouldn't have the side effect of altering the provided snapshot. The snapshot name gets provided as an active snapshot with the id mapping, then after return the snapshot name is a committed snapshot with the id mapping removed. It shouldn't be necessary to commit any of the snapshots since the end result just needs to be the diff and the diff logic just needs the mounts from an active snapshot.

Some solutions I can think of here ranging from less effort to more effort

  1. Just apply the id unmap directly in the passed in active snapshot. This is not ideal from a functional perspective because it alters the snapshot, but if the current logic works and the container snapshot will not be re-used, then it is the simplest change with similar end result to the current code.
  2. Create a new active snapshot from the same parent as the passed in snapshot then do an fs copy and id unmap from current snapshot to the new one. (This is similar to the Snapshot function considered on the Snapshotter which takes in an active snapshot and returns an active snapshot with the same filesystem at the point Snapshot. This allows changes to be made on an active snapshot while preserving the original state without adding a layer. It is ironic Snapshot was never added but we haven't had a strong need for it since normally diff is enough).
  3. Update the differ or add a differ implementation which is able to consider the id mapping when calculating or applying the diff. The caller can pass in the id mapping as diff options and the diff service could make sure the options are appropriately handled. This is probably the best long term solution and goes along with the same reasoning as to why Snapshot from 2 was never added.

@rumpl
Copy link
Member Author

rumpl commented Jan 2, 2024

Closing this one in favor of #46972

@rumpl rumpl closed this Jan 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/images Image Distribution containerd-integration Issues and PRs related to containerd integration kind/bugfix PR's that fix bugs

Projects

Development

Successfully merging this pull request may close these issues.

5 participants