Conversation
9d58803 to
37f7c16
Compare
|
Does it replace volume plugin? |
|
@AkihiroSuda this does not replace volume plugins -- it's a different layer from volume plugins. Volume plugins provide some volume file system at a source location. Mount point plugins can then act on this data (or change the source of the mount) before a container starts and after a container stops (but before a return code/exit event is issued). The current implementation allows for a stack of mount point plugins to participate in host bind mounts and volume mounts. The design should be able to accommodate interposing on tmpfs, layers, config, network, and secrets mounts as well (but none of these are implemented yet). I recommend reading #33048 for the motivation and background of this feature. |
cd05492 to
ed6d965
Compare
|
I'm not sure what to do about: Should Perhaps this highlights an issue with sharing the Should |
47272ab to
da6b434
Compare
|
I've rebased this PR and moved the implementation from |
|
The types moved to |
|
Why does |
|
|
|
In #32989 the proposed new location for I think you're right about the intent of |
da6b434 to
9a11d2e
Compare
|
OK, I've put all the new modules under |
7ac780b to
83b68b2
Compare
thaJeztah
left a comment
There was a problem hiding this comment.
just some nits found while reading
docs/extend/plugins_mountpoint.md
Outdated
There was a problem hiding this comment.
missing trailing comma (hence the "red" highlighting because it's invalid JSON)
docs/extend/plugins_mountpoint.md
Outdated
docs/extend/plugins_mountpoint.md
Outdated
docs/extend/plugins_mountpoint.md
Outdated
|
ping @johnstep to have a look from the windows side of things |
|
I've been prototyping with this PR and I think that the mount point filtering should be much more expressive (e.g. to allow filtering on volume labels) and the attach response message should be more uniform. In particular, I think both should use datatypes with the same shape as the attach request mount point struct. Regardless of those changes, I believe the basic design of this changeset can be reviewed/approved. I will be off the grid from June 15 until July 7 but @yallop will be available to answer questions. |
cpuguy83
left a comment
There was a problem hiding this comment.
In general I'm good with design, though I would like to see the pieces split up a bit such that we have mount middleware, and then plugins that can act as mount middleware rather than just mount plugins.
Likewise the plugin API should reflect that the plugins are implementing mountpoint middleware rather than a mount plugin.
docs/extend/plugins_mountpoint.md
Outdated
There was a problem hiding this comment.
Can we define a struct to encapsulate this rather than string parsing?
There was a problem hiding this comment.
Yes, the filters need to be more expressive than they currently are and packing expressions into strings is not really appropriate (was it ever?). I'm working on this now.
volume/mountpoint/api.go
Outdated
There was a problem hiding this comment.
Can we use the types from api/types/mount?
There was a problem hiding this comment.
We could but eventually this enumeration will be bigger than the one in api/types/mount. For instance, layers and secrets could also be interposed on. This implementation currently doesn't support tmpfs so perhaps that variant should be removed from this type.
There was a problem hiding this comment.
This doesn't seem like a very good reason to create yet another type...
volume/mountpoint.go
Outdated
There was a problem hiding this comment.
I'm not sure if I understand the use of the clock terminology here. Is this an ordinal describing the order in which the mounts were applied?
There was a problem hiding this comment.
Yes, the field doc is Clock is a positive integer used to ensure mount detachments occur in the correct order. Because interposition is targeted based on mount point properties, different mounts into the same container can have different stacks of participating plugins. To unwind the plugins in the correct order without maintaining a separate index, we use this ordinal. See https://github.com/moby/moby/pull/33375/files#diff-077bf4d53f2bf30ff32bdb1c2c2c70daR208 for the unwinding routine.
There was a problem hiding this comment.
@dsheets It might be good to have some commentary on why this works, over using the separate index approach or just a stack. I don't think I quite understand the intended behavior here.
There was a problem hiding this comment.
At a glance, this does seem more complicated than other methods. I'm not understanding the unwind code very well.
Previously, these values which are pointers to strings would be dereferenced to have their (aliased) strings written to. If these fields were not specified in the plugin configuration, the values would be pointers and dereferencing would fail. This removes the use of the aliased string pointer field and eliminates any mutation side effects. Signed-off-by: David Sheets <dsheets@docker.com>
Signed-off-by: David Sheets <dsheets@docker.com>
… store Signed-off-by: David Sheets <dsheets@docker.com>
Necessary until moby#33658 is merged. Signed-off-by: David Sheets <dsheets@docker.com>
This requires an update to swarmkit so the swarm protobuf protocol includes the consistency field. Signed-off-by: David Sheets <dsheets@docker.com>
Previously, volume.MountPoint.Setup would return the path at which the mount would appear but the volume.MountPoint object itself might not contain this path (e.g. in the case of anonymous volume mounts). Now, .Setup does not return the path and, instead, guarantees the postcondition that, after a successful call, .Source will contain the path to the mount point. Signed-off-by: David Sheets <dsheets@docker.com>
Signed-off-by: David Sheets <dsheets@docker.com>
Signed-off-by: David Sheets <dsheets@docker.com>
Mount point plugins are just one type of mount point middleware, now. This will allow us to create internal mount point middleware that performs similar functions to mount point plugins without all of the plugin machinery including IPC. daemon/volumes: split volume.MountPoint.Setup into Realize and Setup Realize ensures that there is a path to the mount point file system whereas Setup performs finalization of a mount point including creation of the mount point path if it does not exist. volume: moved source path existence validation to mount failure This simplifies mount spec validation by removing the file system state input side effect of checking whether the source path exists. Instead, this error condition is naturally handled during bind mounting. To achieve this, I introduce a new flag on volume.MountPoint, CreateSourceIfMissing, that indicates that the source path should be created if it is missing. We use this flag to decide whether to idtools.MkdirAllAndChownNew rather than always executing that function and relying on the early spec validation to prevent --mount requests from creating directory hierarchies. I also simplified volume.MountPoint.Setup to remove the optional checking function as this check can now be performed accurately immediately after volume.MountPoint.Realize for Unix-like systems. This change was necessary in order to support mount point plugins that implement full file system interposition as the requested source should not be checked early in that case and, instead, the effective source should be used for mounting (or fail). daemon/cluster/executor/container: don't precheck bind mount sources Checking bind mount source directories for existence interferes with mount point plugins and is unnecessary as automatic creation of source directories only occurs with 'raw' mount specifications (e.g. `-v /foo:/bar`) and not with structured specifications. daemon/cluster/executor/container: add mount point plugins to info Also, fix an authz plugin type name reporting bug. volume/mountpoint: add emergency detach instructions for middleware In the case where middleware is unreachable when it is time to detach an interposed mount, emergency detach instructions can be used to prevent mount leaks or file system debris from accumulating or causing further issues. Without emergency detach instructions, many mount point middleware chains could fail to detach if a plugin is unreachable (even if later plugins plowed ahead, they may be unable to clean up their mounts or directories if the unresponsive plugin used them). With emergency detach instructions, many mount point plugin chains can safely and completely detach without _requiring_ responsive plugins. For resource tracking purposes, we still prefer to contact plugins for detach. Signed-off-by: David Sheets <dsheets@docker.com>
Signed-off-by: David Sheets <dsheets@docker.com>
Signed-off-by: David Sheets <dsheets@docker.com>
A missing bind source on container creation is OK. The failure for a missing bind source should only occur much later after any mount point middleware has participated and then only if the the mount point has its CreateSourceIfMissing flag set (e.g. from -v). Signed-off-by: David Sheets <dsheets@docker.com>
Signed-off-by: David Sheets <dsheets@docker.com>
5f2f54d to
8a40244
Compare
| sysInfo = sysinfo.New(true) | ||
|
|
||
| testEnv.Print() | ||
| setupSuite() |
There was a problem hiding this comment.
This is going to break windows CI (I just realized that these suites aren't running on windows CI but that should be fixed shortly).
It looks like mountpoint_windows_test.go is missing which needs to define these.
| mountPointController[2] = mountPointControllerForLocalVolumeBindMounts() | ||
| mountPointController[3] = mountPointControllerForAnonymousBindMounts() | ||
| mountPointController[4] = mountPointControllerForAllBindMounts() | ||
| mountPointController[5] = mountPointControllerForMountsIntoContainersWithoutAllowLabel() |
There was a problem hiding this comment.
It seems that there are 6 controllers being setup, but some tests also modify the controllers. I believe not every test uses every controller, is that right?
I think this is going to be difficult to maintain. Instead of having everything setup for every test could you have a function like:
func setupPlugin(controllers ...testMountPointController) { ... }and call that on each test with just the controllers that it needs? That should also remove the need to modify the static controllers from individual tests, and will hopefully remove the indexing using numbers as well.
setupPlugin() can still add them to a static slice, and teardown can iterate over the slice to perform cleanup.
|
Could someone please review the design and implementation of this PR? A few additional test cases and some minor clean-up will be forthcoming but nothing is blocking a review, afaik. |
|
cc @thaJeztah @cpuguy83 @vieux what is the status for this one ? 👼 |
|
FWIW, we (NVIDIA, for GPU enablement in containers) would not benefit from this new type of plugin, I believe. Having support for standardized OCI hooks would be better for us. As described in #28837, such hooks could either be registered to the docker daemon (like we do for runtimes), or passed to |
|
@flx42 I think hooks are basically deprecated in favor of the create/start split that is available in runc.... how to expose that from docker I'm really not certain.... right now there isn't even such a split in the moby codebase yet. |
|
@cpuguy83 What do you mean "deprecated"? Aren't they part of the OCI runtime spec? |
|
@flx42 Sure it's part of the spec, but it basically the preference is to use create/start instead of hooks. Deprecated doesn't mean gone. |
|
@anusha-ragunathan it would be interesting to investigate whether this approach could work for @flx42 as well. |
|
@cpuguy83 what is the status here ? |
|
If this is something we need to have, it would be good if someone who is interested in having this merged to take it over. I'm going to go ahead and close this one (which has tons of merge conflicts as is) and move discussion back to the open issue. Thanks! 👼 |
- What I did
Designed and implemented mount point interposition plugins as (mostly) described by #33048. This PR implements that proposal but improves the mount point filtering system to be more general and accurate. In particular, filters loaded from mount point plugins are now DNF and include negation (necessary to select local volume bind mounts and anonymous local volume mounts).
I'm moderately happy with how the implementation turned out.
I'm not sure what the Windows story is.
I'm not sure how I should adjust the tests to account for the new API-tests-only policy. Are there good examples somewhere that I should follow? Or a transition PR that I can crib from?
This is my first major contribution to the Docker engine and the most golang I have written in a single patch. I'm very happy to take pointers on both Docker code base best practice and golang best practice.
Container error conditions, daemon restart/reload, and plugin server assumptions were some of my biggest hurdles.
In terms of design, one of my remaining reservations is the level of expressivity of the mount point plugin filter 'language'. It supports what is mostly an equality predicate (set membership for values) and negation. I can see the case for more expressive predicates like path prefix (useful for
--opt device=/prefix/rest) but this would require expanding the predicate language further (currently negation is a barely acceptable special case).- How I did it
I read a lot of the volume/mount/bind related code and then copied a lot of the structure of the AuthZ plugin subsystem and hooked into the places that seemed relevant from reading and testing.
- How to verify it
I wrote some tests in bd35f31. I plan to write some more to test the plugin V2 dynamics (similar to
integration-cli/docker_cli_authz_plugin_v2_test.go) but the relevant client package isn't published yet (it's in this PR) and I haven't worked out how to fake-vendor it, yet.I also recommend reading 9d58803 to understand the design.
- Description for the changelog
Mount point plugins allow bind mounts and volume mounts to be selectively intercepted
- A picture of a cute animal (not mandatory but encouraged)
Lontra felina, or the marine otter, is believed to be the smallest marine mammal on Earth. It is endangered and can be found along the western coast of South America.
/cc @cpuguy83 @thaJeztah @yallop @dnephin