daemon: support alternative runtimes MVP#43800
Conversation
|
How about an integration test? |
|
Let me post my comment from the other PR here for consideration #43736 (comment)
|
And from a perspective of the dockerd<->containerd interface, the One of my motivations for wanting the daemon to support using ContainerD Runtimes without needing to first explicitly inform the daemon of their existence is to remove a significant hurdle to packaging, installing and using alternative runtimes with Docker. Distributors could package runtimes purely as ContainerD Runtimes with no Docker awareness, and users would still be able to use the runtimes with Docker immediately after installation. And the daemon could dynamically discover which runtimes are available so they could be listed in
As they say, with great power comes great responsibility. Containerd only accepts absolute paths, not relative, so it should hopefully be a red flag if someone instructs you to specify a runtime of |
4fdadd1 to
a5f43f0
Compare
193b197 to
4cf6771
Compare
| "-E", "XDG_RUNTIME_DIR="+d.rootlessXDGRuntimeDir, | ||
| "-E", "HOME="+d.rootlessUser.HomeDir, | ||
| "-E", "PATH="+os.Getenv("PATH"), | ||
| "--preserve-env", | ||
| "XDG_RUNTIME_DIR="+d.rootlessXDGRuntimeDir, | ||
| "HOME="+d.rootlessUser.HomeDir, |
There was a problem hiding this comment.
Note for reviewers: the sudo -E flag takes no arguments. sudo -E HOME=/home/rootlessuser is parsed as two separate flags: preserve all environment variables, and set the HOME variable. The extra -E flags were redundant. The only functional change I have made is to stop overriding the PATH variable.
There was a problem hiding this comment.
What was the reason for removing PATH from the list? Depending on the environment, PATH may be a protected env-var, and therefore reset to its default (which can lead to fun situations; see moby/term#21 (comment))
There was a problem hiding this comment.
I was hoping to pass through PATH from the os/exec.Cmd.Args so the path override in my integration test (daemon.WithEnvVars) would Just Work in rootless mode. Guess I'll need to find a different way to accomplish that.
There was a problem hiding this comment.
I guess that would still be possible 🤔 as in; when it would run sudo -E PATH=/foo:/bar PATH=/bar/baz, the second PATH would override the former (default "inherited") path, correct?
daemon/runtime_unix.go
Outdated
| // it would resolve to a binary named "containerd-shim---". | ||
| // https://github.com/containerd/containerd/blob/11ded166c15f92450958078cd13c6d87131ec563/runtime/v2/manager.go#L297-L317 | ||
| // https://github.com/containerd/containerd/blob/11ded166c15f92450958078cd13c6d87131ec563/runtime/v2/shim/util.go#L83-L93 | ||
| return !filepath.IsAbs(name) && !strings.ContainsRune(name, '/') && strings.ContainsRune(name, '.') |
There was a problem hiding this comment.
This path would always be "local", for containerd, right? (i.e., no need to take cross-platform into account) otherwise we would need something like docker/cli#1990
I think ideally;
- we should consume some function from containerd that performs the validation, so that we don't diverge
- perhaps create a tracking issue to contribute such a function in containerd (i.e. based on this code
- wondering if the containerd validation is (per the above) too relaxed, i.e., does it want to allow
con....tai-----nerd-🤣🐳-shim...---as name? (something likeRestrictedNameChars, possibly tweaked to not allow consecutive separators may be good?
There was a problem hiding this comment.
This path would always be "local", for containerd, right? (i.e., no need to take cross-platform into account) otherwise we would need something like docker/cli#1990
Yes. The barriers to having dockerd operate on a containerd instance over the network include requiring a network filesystem which transparently allows FIFOs in exported shares to be read from and written to (standard container streams) and all the assumptions in dockerd that any observations of system state, such as /proc/mounts, are reflective of the observations from containerd.
I think ideally;
- we should consume some function from containerd that performs the validation, so that we don't diverge
- perhaps create a tracking issue to contribute such a function in containerd (i.e. based on this code
This validation function diverges from containerd's validation by design so a function which reports whether containerd would consider the name valid would be of rather limited use to us. I had initially implemented this check in terms of shim.BinaryName but it seemed so trivial a function that all the extra transitive packages vendored in seemed so wasteful. And yet I still managed to mischaracterize the function Fair point about divergence: the validation it performs has recently been made more restrictive than the check in the containerd version we currently vendor. I'll switch back to importing it so we pick up any validation improvements whenever we bump containerd.
- wondering if the containerd validation is (per the above) too relaxed, i.e., does it want to allow
con....tai-----nerd-🤣🐳-shim...---as name? (something likeRestrictedNameChars, possibly tweaked to not allow consecutive separators may be good?
I don't see the need as I don't see a way to exploit the validation. It can't be used to smuggle a path or a path traversal in as any names containing / bytes are blocked. It makes sense to apply limitations such as RestrictedNameChars when assigning a name to an entity, but this is a case of referencing an entity by its preassigned name. I say let users play stupid games with runtime names if they want; they'll be told that the runtime is not installed when they try to start the container, same as would happen when specifying the name of any other not-installed runtime.
daemon/runtime_unix.go
Outdated
| } | ||
|
|
||
| // isValidC8dRuntimeName tests whether name | ||
| // is a well-formed ContainerD runtime name, |
There was a problem hiding this comment.
note that the canonical name for containerd is all lowercase (containerd)
There was a problem hiding this comment.
🤦 I must've mixed it up with runC
There was a problem hiding this comment.
I don't think there's an "official" statement, but afaik, it's all lowercase (but I know some people use the capital D) - as we're using lowercase everywhere, best to stick with it 😅
| "-E", "XDG_RUNTIME_DIR="+d.rootlessXDGRuntimeDir, | ||
| "-E", "HOME="+d.rootlessUser.HomeDir, | ||
| "-E", "PATH="+os.Getenv("PATH"), | ||
| "--preserve-env", | ||
| "XDG_RUNTIME_DIR="+d.rootlessXDGRuntimeDir, | ||
| "HOME="+d.rootlessUser.HomeDir, |
There was a problem hiding this comment.
I guess that would still be possible 🤔 as in; when it would run sudo -E PATH=/foo:/bar PATH=/bar/baz, the second PATH would override the former (default "inherited") path, correct?
4797242 to
54de464
Compare
thaJeztah
left a comment
There was a problem hiding this comment.
overall looks good; left some thoughts / ramblings
| if !isPermissibleC8dRuntimeName(name) { | ||
| return nil, errdefs.InvalidParameter(errors.Errorf("unknown or invalid runtime name: %s", name)) | ||
| } | ||
| return &types.Runtime{Shim: &types.ShimConfig{Binary: name}}, nil |
There was a problem hiding this comment.
Slightly on the fence if it would be more idiomatic to change isPermissibleC8dRuntimeName to return the error; something like;
if err := validateRuntimeName(name); err != nil {
return nil, err
}Of course at that point it's debatable if it should also handle converting it to a types.Runtime, so something like;
return getContainerdRuntime(name)There was a problem hiding this comment.
This PR is also laying some groundwork for the follow-up PR to enable a containerd runtime to be set as the default runtime. That function is not returning an error directly because the appropriate error to return is context-dependent. A different error message is more appropriate when validating the configured default runtime.
Of course at that point it's debatable if it should also handle converting it to a
types.Runtime, so something like;return getContainerdRuntime(name)
That hypothetical getContainerdRuntime function would be a near drop-in replacement for (*Daemon).getRuntime, having the same signature and everything. And it would almost seem to work, too. But it would be subtly broken to call it anywhere other than (*Daemon).getRuntime as it would disregard any configured runtimes. I would prefer to have only one function which maps a name to a runtime so there is no possibility to choose wrong.
There was a problem hiding this comment.
But it would be subtly broken to call it anywhere other than (*Daemon).getRuntime as it would disregard any configured runtimes.
Overall I'm not too concerned (for a non-exported function, which more commonly come with some guidance). Was mostly considering it to be nice if it would return a nicely formatted error with the right type ("invalid name"), and generally err can be used as a "boolean" for validation.
Not a blocker though.
| @@ -116,7 +117,10 @@ func (daemon *Daemon) rewriteRuntimePath(name, p string, args []string) (string, | |||
| func (daemon *Daemon) getRuntime(name string) (*types.Runtime, error) { | |||
There was a problem hiding this comment.
There's also a Windows equivalent of this function; do we need to implement this logic for Windows as well (when using containerd as runtime?
There was a problem hiding this comment.
I don't think so, at least not until an alternative to containerd-shim-runhcs-v1 exists. And I'm not sure one ever will exist, since runhcs covers all the use cases (Windows & Linux containers with Process & VM isolation) and I suspect implementing alternative runtimes would be tricky without access to the Windows source code.
There was a problem hiding this comment.
My thinking here was that if Windows containerd has the same semantics, we could expose the --runtime on Windows, and have containerd deal with it (once an alternative comes to exist).
| return &types.Runtime{Shim: &types.ShimConfig{Binary: name}}, nil | ||
| } | ||
|
|
||
| if len(rt.Args) > 0 { |
There was a problem hiding this comment.
small observation that all code below is only applied to runtimes configured through daemon.json; for clarity we could consider reversing the if rt == nil { to if rt != nil { with those options below in it.
That said, as I see it correctly;
- setting a runtime through
daemon.jsonwithout specifyingShim, would result indefaultV2ShimConfig()to be applied. Whereas passing the same through--runtimewould not. Which means that I cannot "just" specifyio.containerd.FOOBAR.v2indaemon.json(as that would apply the defaultOptions, assuming it's a variant ofruncshim? Argscan be passed throughdaemon.json(but obviously not through--runtime
Not sure what the best migration path is to allow "just" specifying a shim through daemon.json, and/or if some of those options should be marked deprecated and/or produce an error. (input/ideas welcome)
There was a problem hiding this comment.
small observation that all code below is only applied to runtimes configured through
daemon.json; for clarity we could consider reversing theif rt == nil {toif rt != nil {with those options below in it.
rt == nil is an error condition so its flow is indented.
That said, as I see it correctly;
- setting a runtime through
daemon.jsonwithout specifyingwould result inShim,defaultV2ShimConfig()to be applied. Whereas passing the same through--runtimedockerd --add-runtimewould not. Which means that I cannot"just"specifyio.containerd.FOOBAR.v2indaemon.json(as that would apply the defaultOptions, assuming it's a variant ofruncshimCLI binary with the file nameio.containerd.FOOBAR.v2)Argscan be passed throughdaemon.json(but obviously not through--runtime--add-runtime
FTFY. runtimes.<name>.Shim cannot be specified in daemon.json as that struct field is tagged json:"-".
Not sure what the best migration path is to allow "just" specifying a shim through daemon.json, and/or if some of those options should be marked deprecated and/or produce an error. (input/ideas welcome)
That's the neat part; you don't! "Just" specify the shim name as the runtime when creating the container. As noted in the PR description, I will be following up with a PR to allow default-runtime to be set to a shim name. No migration path is needed as daemon.json has exactly the same semantics as it always had. People can continue to define "runtimes" which use runC-compatible binaries with the io.containerd.runc.v2 containerd runtime, same as they always have done. I see no need to deprecate that functionality, either.
There was a problem hiding this comment.
That's the neat part; you don't! "Just" specify the shim name as the runtime when creating the container. As noted in the PR description, I will be following up with a PR to allow default-runtime to be set to a shim name.
Ah, right, yes; I was thinking "with this PR, it's possible both through daemon.json and through --runtime to specify a shim.
But currently, daemon.json will continue to be "only runtimes that use the runc shim" (but alternative binary)
| @@ -707,8 +707,8 @@ func verifyPlatformContainerSettings(daemon *Daemon, hostConfig *containertypes. | |||
| hostConfig.Runtime = daemon.configStore.GetDefaultRuntimeName() | |||
There was a problem hiding this comment.
Not introduced in this PR, but this looks somewhat odd in a verifyXX function; if I see it correctly, this may (due to hostConfig being passed by reference) be mutating the config, both on create and start (and possible other locations that call verifyContainerSettings()).
Possibly the intent was to bake the config into the container's config, but if it's called in multiple locations, it may actually have become mutable 🤔
There was a problem hiding this comment.
This is mutating the config. I cringed when I first noticed it.
There was a problem hiding this comment.
Same! Yes, we should look into that; it's probably ok to perform it when creating a container (set default if not set to "bake" it into the config), but definitely feels fishy for start, and definitely should be outside of a "verify" function.
It may have been for migrating existing container configs after updating docker engine, but if that was the reason, we should have it part as an explicit step during "restore" (where I think we previously did such migrations).
Contrary to popular belief, the OCI Runtime specification does not
specify the command-line API for runtimes. Looking at containerd's
architecture from the lens of the OCI Runtime spec, the _shim_ is the
OCI Runtime and runC is "just" an implementation detail of the
io.containerd.runc.v2 runtime. When one configures a non-default runtime
in Docker, what they're really doing is instructing Docker to create
containers using the io.containerd.runc.v2 runtime with a configuration
option telling the runtime that the runC binary is at some non-default
path. Consequently, only OCI runtimes which are compatible with the
io.containerd.runc.v2 shim, such as crun, can be used in this manner.
Other OCI runtimes, including kata-containers v2, come with their own
containerd shim and are not compatible with io.containerd.runc.v2.
As Docker has not historically provided a way to select a non-default
runtime which requires its own shim, runtimes such as kata-containers v2
could not be used with Docker.
Allow other containerd shims to be used with Docker; no daemon
configuration required. If the daemon is instructed to create a
container with a runtime name which does not match any of the configured
or stock runtimes, it passes the name along to containerd verbatim. A
user can start a container with the kata-containers runtime, for
example, simply by calling
docker run --runtime io.containerd.kata.v2
Runtime names which containerd would interpret as a path to an arbitrary
binary are disallowed. While handy for development and testing it is not
strictly necessary and would allow anyone with Engine API access to
trivially execute any binary on the host as root, so we have decided it
would be safest for our users if it was not allowed.
It is not yet possible to set an alternative containerd shim as the
default runtime; it can only be configured per-container.
Signed-off-by: Cory Snider <csnider@mirantis.com>
54de464 to
547da0d
Compare
|
What's up with the vendoring on this? I would not expect any changes to vendor/. |
Looks like it's because of the |
|
|
|
Ah, I totally missed that shim import. Seems like we can copy this one. |
That's what I did in the first revision! @thaJeztah wasn't happy about it: #43800 (comment) |
Yes, I guess there's pros/cons (no risk of diverging, but more code vendored).
Happy to hear your thoughts! |
| return &types.Runtime{Shim: &types.ShimConfig{Binary: name}}, nil | ||
| } | ||
|
|
||
| if len(rt.Args) > 0 { |
There was a problem hiding this comment.
That's the neat part; you don't! "Just" specify the shim name as the runtime when creating the container. As noted in the PR description, I will be following up with a PR to allow default-runtime to be set to a shim name.
Ah, right, yes; I was thinking "with this PR, it's possible both through daemon.json and through --runtime to specify a shim.
But currently, daemon.json will continue to be "only runtimes that use the runc shim" (but alternative binary)
| @@ -116,7 +117,10 @@ func (daemon *Daemon) rewriteRuntimePath(name, p string, args []string) (string, | |||
| func (daemon *Daemon) getRuntime(name string) (*types.Runtime, error) { | |||
There was a problem hiding this comment.
My thinking here was that if Windows containerd has the same semantics, we could expose the --runtime on Windows, and have containerd deal with it (once an alternative comes to exist).
| @@ -707,8 +707,8 @@ func verifyPlatformContainerSettings(daemon *Daemon, hostConfig *containertypes. | |||
| hostConfig.Runtime = daemon.configStore.GetDefaultRuntimeName() | |||
There was a problem hiding this comment.
Same! Yes, we should look into that; it's probably ok to perform it when creating a container (set default if not set to "bake" it into the config), but definitely feels fishy for start, and definitely should be outside of a "verify" function.
It may have been for migrating existing container configs after updating docker engine, but if that was the reason, we should have it part as an explicit step during "restore" (where I think we previously did such migrations).
| if !isPermissibleC8dRuntimeName(name) { | ||
| return nil, errdefs.InvalidParameter(errors.Errorf("unknown or invalid runtime name: %s", name)) | ||
| } | ||
| return &types.Runtime{Shim: &types.ShimConfig{Binary: name}}, nil |
There was a problem hiding this comment.
But it would be subtly broken to call it anywhere other than (*Daemon).getRuntime as it would disregard any configured runtimes.
Overall I'm not too concerned (for a non-exported function, which more commonly come with some guidance). Was mostly considering it to be nice if it would return a nicely formatted error with the right type ("invalid name"), and generally err can be used as a "boolean" for validation.
Not a blocker though.
Contrary to popular belief, the OCI Runtime specification does not specify the command-line API for runtimes. Looking at containerd's architecture from the lens of the OCI Runtime spec, the shim is the OCI Runtime and runC is "just" an implementation detail of the
io.containerd.runc.v2runtime. When one configures a non-default runtime in Docker, what they're really doing is instructing Docker to create containers using theio.containerd.runc.v2runtime with a configuration option telling the runtime that the runC binary is at some non-default path. Consequently, only OCI runtimes which are compatible with theio.containerd.runc.v2shim, such as crun, can be used in this manner. Other OCI runtimes, including kata-containers v2, come with their own containerd shim and are not compatible withio.containerd.runc.v2. As Docker has not historically provided a way to select a non-default runtime which requires its own shim, runtimes such as kata-containers v2 could not be used with Docker.Allow other ContainerD shims to be used with Docker; no daemon configuration required. If the daemon is instructed to create a container with a runtime name which does not match any of the configured or stock runtimes, it passes the name along to ContainerD verbatim. A user can start a container with the kata-containers runtime, for example, simply by calling
Runtime names which ContainerD would interpret as a path to an arbitrary binary are disallowed. While handy for development and testing it is not strictly necessary and would allow anyone with Engine API access to trivially execute any binary on the host as root, so we have decided it would be safest for our users if it was not allowed.
It is not yet possible to set an alternative ContainerD shim as the default runtime; it can only be configured per-container. Doing so properly, with validation and configuration reload handling, is quite the nontrivial undertaking with the current state of the daemon configuration logic. That will be addressed in a subsequent PR.
- What I did
- How I did it
- How to verify it
Unit and integration tests cover the new functionality.
To manually verify:
$PATHdocker run --rm --runtime io.containerd.my-runtime.v1 alpine lsThe
io.containerd.runc.v2shim is aliased toio.containerd.runc-alias.v2in the development container image as a convenience for manual testing.- Description for the changelog
docker infoand cannot be set as the default runtime at this time.- A picture of a cute animal (not mandatory but encouraged)