Add CRI plugin config from source containerd config to drop-in file#1311
Conversation
53fd2be to
af457bc
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR updates the nvidia-ctk-installer to include existing runtime configurations from the source containerd config when creating drop-in files. Instead of starting with an empty config, the drop-in file now preserves all runtime configurations already present in the source's 'plugins."io.containerd.grpc.v1.cri".containerd.runtimes' section.
- Modified the destination config to start with existing runtime configurations from source
- Added a new function
getBaseDropInConfigTreeto extract and copy runtime configurations - Updated all test expectations to include existing runtime configurations in drop-in files
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/config/engine/containerd/containerd.go | Implements the core logic to copy existing runtime configs to drop-in files |
| pkg/config/engine/containerd/config_test.go | Updates test expectations to include existing runtime configurations |
| cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go | Updates integration test expectations to verify runtime config preservation |
| // The destinationConfig is a minimal config with the same options | ||
| // as the source config. The starting content of the destinationConfig | ||
| // is the entire plugins."io.containerd.grpc.v1.cri".containerd.runtimes | ||
| // section of the source config. This is needed due to how containerd | ||
| // merges configuration from multiple files. | ||
| // Reference: https://github.com/containerd/containerd/issues/5837 |
There was a problem hiding this comment.
Given that newer containerd versions (v2.1) do seem to do what we want, did we want to make this optional somehow?
There was a problem hiding this comment.
I have a local change that I can create a PR from as a follow-up. Let's not overcomplicate this now though.
af457bc to
2c06db4
Compare
2c06db4 to
e1c213e
Compare
Pull Request Test Coverage Report for Build 17971376989Details
💛 - Coveralls |
|
Initial testing of these changes looks good. Using the GPU Operator (built from NVIDIA/gpu-operator#1710) I have verified several of the tests captured in https://github.com/NVIDIA/nvidia-container-toolkit/blob/b3ce3a4f3855764208b4b14004620d27a947110a/cmd/nvidia-ctk-installer/container/runtime/containerd/config_test.go. |
| if cfg.UseDropInConfig() { | ||
| return cfg.CleanupDropInConfig() | ||
| } |
There was a problem hiding this comment.
@elezar note, I remove the nvidia runtimes first before a call to CleanupDropInConfig()since the ConfigWithDropIn implementation of RemoveRuntime removes runtimes from both the top-level config and the drop-in file.
There was a problem hiding this comment.
Thanks. The other option would be to remove the runtimes in the top-level config when ADDING them to the drop in, but that may be more complex than it needs to be.
e1c213e to
dafe333
Compare
| } | ||
|
|
||
| func (c *Config) UseDropInConfig() bool { | ||
| return false |
There was a problem hiding this comment.
Question: Why is this always false? for containerd?
There was a problem hiding this comment.
It is false for the Config implementation, and true for the ConfigWithDropIn implemenation.
Thinking about it now, this method is not very useful. I believe we can just call CleanupDropInConfig directly and this can be a no-op for all cases except for ConfigWithDropIn.
There was a problem hiding this comment.
Yes. I think the check was only needed if we called this BEFORE we removed the runtimes. Since we're doing that first now, we can just call it unconditionally.
There was a problem hiding this comment.
(and yes. I'm sorry, it's late).
There was a problem hiding this comment.
I have left this as-is for now. Feel free to clean this up further as you see fit.
I spent some time trying to remove the UseDropInConfig method entirely (and unconditionally calling CleanupDropInConfig), but this did not end up being as straightforward as I would have liked.
There was a problem hiding this comment.
Could something like #1311 (comment) work? If I recall correctly this is what you proposed initially. This is definitely simpler than trying to "clear" the config after the fact.
| } | ||
|
|
||
| func (c *ConfigWithDropIn) CleanupDropInConfig() { | ||
| c.Interface = &Config{Tree: toml.NewEmpty()} |
There was a problem hiding this comment.
Should this not just be:
| c.Interface = &Config{Tree: toml.NewEmpty()} | |
| c.Tree = toml.NewEmpty() |
What about the configOptions which are used to update the config later? (I understand that we only ever call this AFTER we're done updating the config, but this may not always be the case).
There was a problem hiding this comment.
We aren't able to do this without some refactoring. Currently, an engine.Interface is a member of ConfigWithDropIn and not a Config.
There was a problem hiding this comment.
Sorry. I think I'm missing some important details here after a very long day. I will have a look again after some rest.
pkg/config/engine/crio/crio.go
Outdated
| } | ||
|
|
||
| func (c *Config) CleanupDropInConfig() { | ||
| c.Tree = toml.NewEmpty() |
There was a problem hiding this comment.
If this works for crio then i think we should be able to use it for containerd too.
pkg/config/engine/api.go
Outdated
| EnableCDI() | ||
| GetRuntimeConfig(string) (RuntimeConfig, error) | ||
| RemoveRuntime(string) error | ||
| UseDropInConfig() bool |
There was a problem hiding this comment.
Just as a note to self. We should probably clean up this interface / compose the other two interfaces that we have introduced.
dafe333 to
6e7220f
Compare
| if o.DropInConfig == "" { | ||
| return nil | ||
| } | ||
| // When a drop-in config is used, we remove the drop-in file explicitly. | ||
| // This is require for cases where we may have to include other contents | ||
| // in the drop-in file and as such it may not be empty when we flush it. | ||
| err = os.Remove(o.DropInConfig) | ||
| if err != nil && !errors.Is(err, os.ErrNotExist) { | ||
| return fmt.Errorf("failed to remove drop-in config file: %w", err) | ||
| } |
There was a problem hiding this comment.
@cdesiniotis given the discussions around the additional functions to the interface, I thought it would actually be cleaner to explicitly remove the file here instead of a more complex implementation at a lower level.
I'm going to push out an rc including these changes be we can always revisit if there's something that I missed.
There was a problem hiding this comment.
Removing the file is definitely cleaner and simplifies implementation. I am onboard as long as we are sure the conditional immediately preceding this is sufficient.
There was a problem hiding this comment.
For use cases where we are modifying the config directly -- or drop ins are not supported -- a user SHOULD be setting DropInConfig to "" explicitly. This would be the case for docker, or for v1 containerd configs.
The check for DropInConfig == "" also mirrors what we do when deciding whether to flush to the drop-in config or the "regular" config.
Are the situations that you're concerned about?
There was a problem hiding this comment.
Okay, I see that this is aligned with the code that flushes the config, so I believe this addresses my question.
For use cases where we are modifying the config directly -- or drop ins are not supported -- a user SHOULD be setting DropInConfig to "" explicitly. This would be the case for docker, or for v1 containerd configs.
Ah, I wasn't aware that the intention was for users to set --drop-in-config="" explicitly when using v1 containerd configs... As long as we cover this in our documentation, this should be fine. Most should be using v2 or v3 configs at this point anyways.
6e7220f to
887b7f6
Compare
This change updates the drop-in file contents that the nvidia-ctk-installer creates for containerd. In addition to adding nvidia runtimes, the nvidia-ctk-installer also includes all existing configuration present in the source containerd configuration for the CRI plugin. That is, all configuration already present in the 'plugins."io.containerd.grpc.v1.cri"' section will be added to our drop-in file. This is needed due to how containerd merges configuration from multiple files. See containerd/containerd#5837 Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
887b7f6 to
598c632
Compare
| if o.DropInConfig == "" { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Question -- if o.DropInConfig != "" is it always the case that we are using a drop-in file? Even if this option is specified, my understanding is that a drop-in file would not be used for 1) v1 containerd configs, or 2) cri-o when the hook config mode is configured.
This change updates the drop-in file contents that the nvidia-ctk-installer creates for containerd. In addition to adding nvidia runtimes, the nvidia-ctk-installer also includes all existing configuration present in the source containerd configuration for the CRI plugin. That is, all configuration already present in the 'plugins."io.containerd.grpc.v1.cri"' section will be added to our drop-in file.
This is needed due to how containerd merges configuration from multiple files. See containerd/containerd#5837