Set daemon root to use shared propagation#36096
Conversation
daemon/daemon_unix.go
Outdated
There was a problem hiding this comment.
If someone has set MountFlags=Private on docker.service in systemd, won't this result in dockerd root being converted to shared, thereby negating the operator's preference to isolate docker mounts? If so, and that was their intent, we should recommend they switch to MountFlags=slave.
There was a problem hiding this comment.
With MountFlags=Private, this would till be disconnected from the parent mount namespace even though we've set it to shared.
So new mounts would not leak in or out (to/from the parent).
This change sets an explicit mount propagation for the daemon root. This is useful for people who need to bind mount the docker daemon root into a container. Since bind mounting the daemon root should only ever happen with at least `rlsave` propagation (to prevent the container from holding references to mounts making it impossible for the daemon to clean up its resources), we should make sure the user is actually able to this. Most modern systems have shared root (`/`) propagation by default already, however there are some cases where this may not be so (e.g. potentially docker-in-docker scenarios, but also other cases). So this just gives the daemon a little more control here and provides a more uniform experience across different systems. Signed-off-by: Brian Goff <cpuguy83@gmail.com>
1954c67 to
a510192
Compare
|
LGTM |
|
Where is this mount cleaned up?
Why do we want to allow that? We are setting the parent directories of graph-dir and container dir to private/unbindable (in part) to avoid these accidental leaks and possible EBUSY. If something outside is tracking the graphdriver mounts then the reference counter in Docker gets confused and will delete mounted data (or at least in some versions in aufs fail doing that). |
Nice catch, I'll follow up to clean that up on shutdown.
It's already alllowed today, lots of things do it including Kube, cadavisor, and even docker-ee.
We pretty much need to stop setting We could do unbindable... and maybe we should still... but I worry about this breaking people. I definitely don't think we should set the graph dir itself to unbindable because this would prevent any sort of way to do analysis on the graph driver dir from a container. This would mean setting up a new sub-dir (like we did for IPC and secret mounts on containers) and adjusting the graphdriver implementation to use it, but also be backwards compatible for old containers (live restore at least). |
|
Also, maybe missing #36055 as context to also fix issues with people bind mounting the daemon root. |
This change sets an explicit mount propagation for the daemon root.
This is useful for people who need to bind mount the docker daemon root
into a container.
Since bind mounting the daemon root should only ever happen with at
least
rlsavepropagation (to prevent the container from holdingreferences to mounts making it impossible for the daemon to clean up its
resources), we should make sure the user is actually able to this.
Most modern systems have shared root (
/) propagation by defaultalready, however there are some cases where this may not be so
(e.g. potentially docker-in-docker scenarios, but also other cases).
So this just gives the daemon a little more control here and provides
a more uniform experience across different systems.