Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

virtcontainers: set private propagation in rootfs#980

Merged
bergwolf merged 1 commit intokata-containers:masterfrom
devimc:topic/left_mount_points
Jan 21, 2019
Merged

virtcontainers: set private propagation in rootfs#980
bergwolf merged 1 commit intokata-containers:masterfrom
devimc:topic/left_mount_points

Conversation

@devimc
Copy link
Copy Markdown

@devimc devimc commented Dec 5, 2018

When overlay is used as storage driver, kata runtime creates a new bind mount
point to the merged directory, that way this directory can be shared with the
VM through 9p. By default the mount propagation is shared, that means mount
events are propagated, but umount events not, to deal with this problem and to
avoid left mount points in the host once container finishes, the mount
propagation of bind mounts should be set to private.

Depends-on: github.com/kata-containers/tests#971

fixes #794

Signed-off-by: Julio Montes julio.montes@intel.com

@devimc
Copy link
Copy Markdown
Author

devimc commented Dec 5, 2018

/test

devimc pushed a commit to devimc/kata-tests that referenced this pull request Dec 5, 2018
Unskip docker cp test to check mount points are not left
after running docker cp.

Depends-on: github.com/kata-containers/runtime#980

fixes kata-containers#970

Signed-off-by: Julio Montes <julio.montes@intel.com>
return fmt.Errorf("Could not bind mount %v to %v: %v", absSource, destination, err)
}

if err := syscall.Mount("none", destination, "", syscall.MS_PRIVATE, ""); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious to really understand how this solves the problem, and how is the PRIVATE flag helping?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amshinde
Copy link
Copy Markdown
Member

amshinde commented Dec 5, 2018

Nice. So this means we do not strictly need to create a new mount namespace anymore @devimc .

Copy link
Copy Markdown
Member

@amshinde amshinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@devimc
Copy link
Copy Markdown
Author

devimc commented Dec 5, 2018

@amshinde

Nice. So this means we do not strictly need to create a new mount namespace anymore @devimc .

it depends, if we want to improve the isolation/security we have to, because currently host's rootfs is available/visible for qemu,shim and proxy, this change is just to fix the issues with docker cp

@sboeuf
Copy link
Copy Markdown

sboeuf commented Dec 6, 2018

@devimc by using this private mount, are we sure we're not breaking the mount propagation?

@devimc
Copy link
Copy Markdown
Author

devimc commented Dec 6, 2018

@kata-containers/runtime please take a look

@caoruidong
Copy link
Copy Markdown
Member

Why umount events are not propagated? For mount propagation, mount and umount events should be same.

@devimc
Copy link
Copy Markdown
Author

devimc commented Dec 7, 2018

@caoruidong

Why umount events are not propagated?

It depends of the propagation type: shared, private or slave, see http://man7.org/linux/man-pages/man2/mount.2.html

please take a look to kata-containers/tests#971

@caoruidong
Copy link
Copy Markdown
Member

I see shared is “Mount and unmount events
immediately under this mount point will propagate to the other
mount points that are members of this mount's peer group.”

@raravena80
Copy link
Copy Markdown
Member

@devimc any updates in this PR?

devimc pushed a commit to devimc/kata-tests that referenced this pull request Jan 7, 2019
Unskip docker cp test to check mount points are not left
after running docker cp.

Depends-on: github.com/kata-containers/runtime#980

fixes kata-containers#970

Signed-off-by: Julio Montes <julio.montes@intel.com>
@devimc devimc force-pushed the topic/left_mount_points branch 2 times, most recently from 75326f5 to 71cdab5 Compare January 7, 2019 16:29
@devimc
Copy link
Copy Markdown
Author

devimc commented Jan 7, 2019

/test

@devimc
Copy link
Copy Markdown
Author

devimc commented Jan 7, 2019

added Depends-on: github.com/kata-containers/tests#971 to make sure this PR fixes #794

@devimc
Copy link
Copy Markdown
Author

devimc commented Jan 7, 2019

@caoruidong could you please add more details?, I'd like to understand your concerns

@jodh-intel
Copy link
Copy Markdown

Thanks for raising this @devimc. This looks like a delicate issue. If I'm reading the man pages correctly, I think the following summarises what this PR does (but please confirm @devimc et al :)...

Currently (No MS_PRIVATE):

Operation Host outcome Guest outcome
unmount bind-mounted sandbox dir bind mount removed bind mount NOT removed (bug)
host creates sub-mount below bind-mounted sandbox dir mount created mount visible
guest creates new sub-mount below sandbox dir mount visible mount created

New behaviour (using MS_PRIVATE):

Operation Host outcome Guest outcome
unmount bind-mounted sandbox dir bind mount removed bind mount removed (yay!)
host creates sub-mount below bind-mounted sandbox dir mount created mount NOT visible
guest creates new sub-mount below sandbox dir mount NOT visible mount created

Clearly, we're happy for the bug scenario to be fixed. The question is, are we happy for the change in behaviour for the other two rows in the tables? I think the answer is "yes" since:

  • the host context should not be trying to modify the existing container mounts.
  • If the guest creates new sub-mounts, those should not be visible to the host.

Both of these two scenarios can (and really should) be handled directly using "docker run -v from:to" or equivalent.

@jodh-intel
Copy link
Copy Markdown

/cc @bergwolf, @sboeuf.

@egernst
Copy link
Copy Markdown
Member

egernst commented Jan 7, 2019

@mcastelino PTAL?

@jodh-intel thanks for the table - this helps a lot. I agree with your assessment.

@caoruidong
Copy link
Copy Markdown
Member

@devimc this PR lgtm. It's just to my knowledge mount and umount events are propagated equally in a mount ns. Maybe I'm wrong.

@sboeuf
Copy link
Copy Markdown

sboeuf commented Jan 8, 2019

@jodh-intel
Thanks for the analysis! I have a few comments:

If the guest creates new sub-mounts, those should not be visible to the host.

I agree with that, I don't see why bind mount inside the guest should be propagated back to the host.

the host context should not be trying to modify the existing container mounts.

I'm not sure about that. What about the use case where you do something like docker run -v /foo/bar:/foo/bar busybox, and later you bind mount something into /foo/bar from the host? Don't you expect this mount to be propagated?

Now, talking about the initial issue described by @devimc:

When overlay is used as storage driver, kata runtime creates a new bind mount
point to the merged directory, that way this directory can be shared with the
VM through 9p. By default the mount propagation is shared, that means mount
events are propagated, but umount events not, to deal with this problem and to
avoid left mount points in the host once container finishes, the mount
propagation of bind mounts should be set to private.

It looks to me that using MS_PRIVATE will simply workaround the root cause. IIUC, the problem we're trying to solve here is to make sure everything is properly unmounted when the container is being stopped and removed. Using MS_PRIVATE by not propagating anything will allow for a simpler de-init path since nothing has been previously propagated. But it sounds like the root cause is about our runtime code not being able to unmount what we've been mounting on the create/start path.

Maybe I'm missing one piece here, but I'd like to make sure it's obvious for everybody which issue we're solving here, and how we solve it.

When overlay is used as storage driver, kata runtime creates a new bind mount
point to the merged directory, that way this directory can be shared with the
VM through 9p. By default the mount propagation is shared, that means mount
events are propagated, but umount events not, to deal with this problem and to
avoid left mount points in the host once container finishes, the mount
propagation of bind mounts should be set to private.

Depends-on: github.com/kata-containers/tests#971

fixes kata-containers#794

Signed-off-by: Julio Montes <julio.montes@intel.com>
@devimc devimc force-pushed the topic/left_mount_points branch from 71cdab5 to b029e44 Compare January 8, 2019 19:15
@devimc
Copy link
Copy Markdown
Author

devimc commented Jan 8, 2019

@sboeuf

I'm not sure about that. What about the use case where you do something like docker run -v /foo/bar:/foo/bar busybox, and later you bind mount something into /foo/bar from the host? Don't you expect this mount to be propagated?

/foo/bar is shared using 9p, not the rootfs, as far as I know, mounts events are not propagated through 9p, for example

$ mkdir -p /tmp/dir/{bar,foo}
$ sudo mount --bind /tmp/dir/{bar,foo}
$ touch /tmp/dir/bar/file
$ ls -R /tmp/dir/{bar,foo}
/tmp/dir/bar:
file

/tmp/dir/foo:
file

$ docker run --rm -ti --runtime=kata-runtime -v /tmp/dir:/tmp/dir debian bash -c 'ls -R /tmp/dir/{bar,foo}'
/tmp/dir/bar:
file

/tmp/dir/foo:

in the container foo is a directory not a mount point, hence it's another limitation of 9p

But it sounds like the root cause is about our runtime code not being able to unmount what we've been mounting on the create/start path.

nop, the runtime doesn't mount anything, docker does it, for example -v /foo/bar:/foo/bar is mounted and unmounted by docker in the overlay fs each time cp is used, mount events are propagated to our bind mount point but not unmount events, even using MS_SLAVE those events are not propagated, having said that the solution I propose is to don't propagate any event (mount/umount), we don't need them since volumes are shared through 9p and not using the rootfs like runc does.

@devimc devimc mentioned this pull request Jan 17, 2019
bergwolf added a commit to bergwolf/kata-runtime that referenced this pull request Jan 18, 2019
`docker cp` might bind mount some files/dirs under container rootfs
without notifying runtime but expect runtime to unmount them. We need
to unmount them otherwise docker will fail to clean up containers.

Depends-on: github.com/kata-containers#980

Signed-off-by: Peng Tao <bergwolf@gmail.com>
@bergwolf
Copy link
Copy Markdown
Member

bergwolf commented Jan 18, 2019

I think the rational behind the patch is that docker cp is really a dirty hack that it bind-mounts host dir under container rootfs without notifying the runtime in any means. So it makes sense to set the rootfs mount private to avoid being propagated by docker cp. For volumes OTOH, we use dedicated 9pfs mountpoint for each of them and they can still propagate new mounts to the guest properly.

LGTM! Thanks @devimc!

Approved with PullApprove

@sboeuf
Copy link
Copy Markdown

sboeuf commented Jan 18, 2019

/test

@sboeuf
Copy link
Copy Markdown

sboeuf commented Jan 18, 2019

Thanks @bergwolf and @devimc for your explanations. Let's move forward with this PR and merge it when the CI turns green!

@bergwolf
Copy link
Copy Markdown
Member

jenkins-ci-fedora-vsocks and jenkins-metrics-ubuntu-16-04 are broken unrelated to this PR. Merging.

@bergwolf bergwolf merged commit 0c09d2b into kata-containers:master Jan 21, 2019
chavafg pushed a commit to chavafg/tests-1 that referenced this pull request Feb 27, 2019
Unskip docker cp test to check mount points are not left
after running docker cp.

Depends-on: github.com/kata-containers/runtime#980

fixes kata-containers#970

Signed-off-by: Julio Montes <julio.montes@intel.com>
@devimc devimc deleted the topic/left_mount_points branch April 8, 2019 14:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mount points are left after performing a docker cp

8 participants