Skip to content

Conversation

@djdongjin
Copy link
Member

@djdongjin djdongjin commented Jan 27, 2025

Fix #10710, this PR adds a new cri CNI config bin_dirs that is similar to bin_dir but accepts a list of bin dirs. This PR includes:

commit 1: support his new bin_dirs field:

  1. Add this new bin_dirs field;
  2. In ValidateRuntimeConfig, make sure only one of bin_dir and bin_dirs is set.

commit 2: make bin_dirs as default, and migrate bin_dir to bin_dirs:

  1. Change default to setting bin_dirs instead of bin_dir.
  2. Mark bin_dir as DEPRECATED.
  3. In ConfigMigration, migrate bin_dir in version 2 to bin_dirs in version 3.
  4. Add a deprecation warning for bin_dir.

@k8s-ci-robot
Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@djdongjin djdongjin force-pushed the multiple-cni-dir branch 4 times, most recently from ffe6a8c to dadc06f Compare January 28, 2025 18:38
@djdongjin djdongjin changed the title Support multiple cni plugin dirs Support multiple cni plugin bin dirs Jan 28, 2025
@djdongjin djdongjin marked this pull request as ready for review January 28, 2025 19:10
@dosubot dosubot bot added area/cri Container Runtime Interface (CRI) kind/feature labels Jan 28, 2025
@djdongjin djdongjin force-pushed the multiple-cni-dir branch 2 times, most recently from be0abf4 to d248d2d Compare February 5, 2025 17:06
@djdongjin djdongjin requested a review from dcantah February 5, 2025 17:07
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to find the reason we never did this..

anyhow I don't see a good reason to not support it outside we've been talking about switching to rpc and this would be an intermediate step in that case.

Let's add the validation and do some integration testing to make sure it's all tied in well with the go-cni and cni libs should work as cni already has the plugin dir list but still let's test it

@djdongjin
Copy link
Member Author

Let's add the validation and do some integration testing

@mikebrow Thanks for the review and suggestion, will add tests to verify the config-side change and the new behavior of supporting multiple bin dirs.

Also will mark NetworkPluginBinDir as Deprecated and migrate its usage in containerd :)

@MikeZappa87
Copy link
Contributor

trying to find the reason we never did this..

anyhow I don't see a good reason to not support it outside we've been talking about switching to rpc and this would be an intermediate step in that case.

Let's add the validation and do some integration testing to make sure it's all tied in well with the go-cni and cni libs should work as cni already has the plugin dir list but still let's test it

I remember us talking about this years ago. I’ll take a look at this once I’m home

@MikeZappa87
Copy link
Contributor

This looks good. Not sure why we didn't do this earlier? Since this function already takes a slice of string. https://github.com/containerd/go-cni/blob/f4736bb1d1b10293d9c484cbf79254a5efa1e020/opts.go#L44

@djdongjin
Copy link
Member Author

/retest

@djdongjin
Copy link
Member Author

djdongjin commented Feb 10, 2025

Hi @mikebrow , I added tests checking that the two fields cannot be used together, and that we migrated bin_dir to bin_dirs during config loading.

Currently we use bin_dir in our integration test config setup and use it for all integration test. So this change should be covered E2E by existing integration tests (i.e.,

  1. use bin_dir in config;
  2. move bin_dir to bin_dirs in config loading;
  3. cri passes the bin_dirs slice to go-cni directly.

PTAL, let me know if you have any other comments. thanks!

cni.WithPluginConfDir(dir),
cni.WithPluginMaxConfNum(max),
cni.WithPluginDir([]string{c.config.NetworkPluginBinDir}))
cni.WithPluginDir(c.config.NetworkPluginBinDirs))
Copy link
Member Author

@djdongjin djdongjin Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikebrow actually I just realized one thing: we cannot switch the default here yet, as the old field is not yet removed.

Say if a user only uses bin_dir, we will actually set this default bin_dirs, causing both fields being used and fail our validation. (it's kind of similar to we cannot set the default config_path yet as we haven't removed the deprecated mirros. #6488).

Giving this, how about let's keep the default value unchanged? And do the switch when we remove the old field? (we will still add a deprecation warning saying bin_dir is deprecated)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait... actually the opposite also stands. If we keep the default bin_dir unchanged. If a user set up bin_dirs instead. We will end up in the same situation 😅

Copy link
Member Author

@djdongjin djdongjin Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

essentially we cannot figure out a field is set by default or explicitly by user.

We can do something like this:

  1. move the default to bin_dirs here (keep this change).
  2. if bin_dir is not empty, log a deprecation warning. then check if len(bin_dirs) == 1 && bin_dirs[0] == "/opt/cni/bin"
    2.1. if yes, that means bin_dirs is from default and we can proceeed by merging the two.
    2.2. if no, that means the user set up both bin_dir and bin_dirs, we should error in this case.

WDYT @mikebrow

Copy link
Member

@mikebrow mikebrow Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way it works.. just to confirm...
if the config.toml is unset (file is not present in the path) we go to the compiled in defaults.
if the config.toml is set we use what we get from the config.toml (no matter what version is used as defined in the config) and migrate to the latest version
if the request is made to generate the config defaults as output containerd config default we output that

the concern about "default" should not be an issue unless the field doesn't work when unset... but even then we should not be .. "merging" defaults over unset fields with the exception of there is no config.toml at all

Copy link
Member

@mikebrow mikebrow Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so .. if bin_dir is set in a prior version.. migrate to bin_dirs[] to become the current version ..

pls check my math :-) may require a version bump for the config for 2.1

Copy link
Member Author

@djdongjin djdongjin Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so .. if bin_dir is set in a prior version.. migrate to bin_dirs[] to become the current version

yep that's right, no issue if a user migrates from verson 2 to 3, as bin_dirs doesn't exist there. but the issue is in latest config version 3, we have both bin_dir and bin_dirs. And there is no migration if version is already latest.

Say a user has bin_dir = /my/cni/bin in their version=3 config. Previously it will just overwrite the bin_dir's default value.

Since we move default from bin_dir to bin_dirs, now the user has below at the end of config loading:

bin_dir = /my/cni/bin`     // from user's config
bin_dirs = ['opt/cni/bin'] // from our new default.

This will break our config validation which doesn't allow both fields non-empty. To the user it shouldn't break, becault they still only set up bin_dir only. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the config.toml is set we use what we get from the config.toml (no matter what version is used as defined in the config) and migrate to the latest version

It will be:

  1. migrate toml content first
  2. get the default config value
  3. insert toml content into the config struct (e.g., overwrite default values)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so .. if bin_dir is set in a prior version.. migrate to bin_dirs[] to become the current version

yep that's right, no issue if a user migrates from verson 2 to 3, as bin_dirs doesn't exist there. but the issue is in latest config version 3, we have both bin_dir and bin_dirs. And there is no migration if version is already latest.

Say a user has bin_dir = /my/cni/bin in their version=3 config. Previously it will just overwrite the bin_dir's default value.

Since we move default from bin_dir to bin_dirs, now the user has below at the end of config loading:

bin_dir = /my/cni/bin`     // from user's config
bin_dirs = ['opt/cni/bin'] // from our new default.

This will break our config validation which doesn't allow both fields non-empty. To the user it shouldn't break, becault they still only set up bin_dir only. :)

nod.. needs a new config "version" to migrate to otherwise user will have to do a manual migration.. to 3.0 containerd 2.0 to 3.0++ containerd 2.1 .. surprised we didn't do a bump yet missed we don't have a 4.0 config for 2.1 or some other process to handle new fields minor migrations for exclusive fields.. @samuelkarp @dmcgowan ... so hold off on merging till we get a migration solution from 2.0 to 2.1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could hold off on the migration feature until there is a bump.. keeping it a "manual" operation .. leave the default as it was.. originally.. add the new field description in to the docs with the deprecation.. indicating the old field will be migrated with the next version of the config.. something to that effect

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikebrow could you PTAL the latest patch? especially this file: https://github.com/containerd/containerd/pull/11311/files#diff-e8d6dbb2c75752c173c570eabb9961d5b0f3273b682788f2d7eab2f5ffd5d7d6

This should handle the case where a user-set bin_dir unexpectedly conflict with the default bin_dirs.

I think this will make the change backward compatible with 2.0, i.e., regardless if user set bin_dir or not, there is no behavior difference. And when user uses this new bin_dirs, we can still enforce the restriction that you cannot set bin_dir/bin_dirs at the same time.

@djdongjin djdongjin changed the title Support multiple cni plugin bin dirs [WIP] Support multiple cni plugin bin dirs Mar 20, 2025
@djdongjin djdongjin force-pushed the multiple-cni-dir branch 4 times, most recently from bc48d42 to 1e40278 Compare March 20, 2025 19:57
@djdongjin
Copy link
Member Author

/test pull-containerd-node-e2e

@djdongjin djdongjin changed the title [WIP] Support multiple cni plugin bin dirs Support multiple cni plugin bin dirs Mar 20, 2025
@mikebrow
Copy link
Member

/test pull-containerd-node-e2e

1 similar comment
@djdongjin
Copy link
Member Author

/test pull-containerd-node-e2e

@djdongjin djdongjin requested a review from AkihiroSuda March 21, 2025 00:28
@djdongjin
Copy link
Member Author

/test pull-containerd-node-e2e

@djdongjin
Copy link
Member Author

/retest

@djdongjin djdongjin requested a review from mikebrow March 21, 2025 15:53
@djdongjin
Copy link
Member Author

/test pull-containerd-node-e2e

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found a small nit

To make it as DEPRECATED, this PR does the following:

1. Changes config default to use `NetworkPluginBinDirs`;
2. Mark `NetworkPluginBinDir` as deprecated (in config version 3);
3. Add config migration from 2 to 3, which migrates `bin_dir`
  in version 2 to `bin_dirs` in version 3.

Signed-off-by: Jin Dong <djdongjin95@gmail.com>

[wip] add deprecation warning

Signed-off-by: Jin Dong <djdongjin95@gmail.com>
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@djdongjin
Copy link
Member Author

/test pull-containerd-node-e2e

@mikebrow mikebrow added this pull request to the merge queue Mar 24, 2025
Merged via the queue into containerd:main with commit c9edaba Mar 24, 2025
59 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Mar 24, 2025
mansikulkarni96 added a commit to mansikulkarni96/containerd that referenced this pull request Dec 4, 2025
containerd 2.1.0

Welcome to the v2.1.0 release of containerd!

The first minor release of containerd 2.x focuses on continued stability alongside
new features and improvements. This is the first time-based released for containerd.
Most the feature set and core functionality has long been stable and harderened in production
environments, so now we transition to a balance of timely delivery of new functionality
with the same high confidence in stability and performance.

* Add no_sync option to boost boltDB performance on ephemeral environments ([containerd#10745](containerd#10745))
* Add content create event ([containerd#11006](containerd#11006))
* Erofs snapshotter and differ ([containerd#10705](containerd#10705))

* Update CRI to use transfer service for image pull by default ([containerd#8515](containerd#8515))
* Support multiple cni plugin bin dirs ([containerd#11311](containerd#11311))
* Support container restore through CRI/Kubernetes ([containerd#10365](containerd#10365))
* Add OCI/Image Volume Source support ([containerd#10579](containerd#10579))
* Enable Writable cgroups for unprivileged containers ([containerd#11131](containerd#11131))
* Fix recursive RLock() mutex acquisition ([containerd/go-cni#126](containerd/go-cni#126))
* Support CNI STATUS Verb ([containerd/go-cni#123](containerd/go-cni#123))

* Retry last registry host on 50x responses ([containerd#11484](containerd#11484))
* Multipart layer fetch ([containerd#10177](containerd#10177))
* Enable HTTP debug and trace for transfer based puller ([containerd#10762](containerd#10762))
* Add support for unpacking custom media types  ([containerd#11744](containerd#11744))
* Add dial timeout field to hosts toml configuration ([containerd#11106](containerd#11106))

* Expose Pod assigned IPs to NRI plugins ([containerd#10921](containerd#10921))

* Support multiple uid/gid mappings ([containerd#10722](containerd#10722))
* Fix race between serve and immediate shutdown on the server ([containerd/ttrpc#175](containerd/ttrpc#175))

* Update FreeBSD defaults and re-organize platform defaults ([containerd#11017](containerd#11017))

* Postpone cri config deprecations to v2.2 ([containerd#11684](containerd#11684))
* Remove deprecated dynamic library plugins ([containerd#11683](containerd#11683))
* Remove the support for Schema 1 images ([containerd#11681](containerd#11681))

Please try out the release binaries and report any issues at
https://github.com/containerd/containerd/issues.

* Derek McGowan
* Phil Estes
* Akihiro Suda
* Maksym Pavlenko
* Jin Dong
* Wei Fu
* Sebastiaan van Stijn
* Samuel Karp
* Mike Brown
* Adrien Delorme
* Austin Vazquez
* Akhil Mohan
* Kazuyoshi Kato
* Henry Wang
* Gao Xiang
* ningmingxiao
* Krisztian Litkey
* Yang Yang
* Archit Kulkarni
* Chris Henzie
* Iceber Gu
* Alexey Lunev
* Antonio Ojea
* Davanum Srinivas
* Marat Radchenko
* Michael Zappa
* Paweł Gronowski
* Rodrigo Campos
* Alberto Garcia Hierro
* Amit Barve
* Andrey Smirnov
* Divya
* Etienne Champetier
* Kirtana Ashok
* Philip Laine
* QiPing Wan
* fengwei0328
* zounengren
* Adrian Reber
* Alfred Wingate
* Amal Thundiyil
* Athos Ribeiro
* Brian Goff
* Cesar Talledo
* ChengyuZhu6
* Chongyi Zheng
* Craig Ingram
* Danny Canter
* David Son
* Fupan Li
* HirazawaUi
* Jing Xu
* Jonathan A. Sternberg
* Jose Fernandez
* Kaita Nakamura
* Kohei Tokunaga
* Lei Liu
* Marco Visin
* Mike Baynton
* Qiyuan Liang
* Sameer
* Shiming Zhang
* Swagat Bora
* Teresaliu
* Tony Fang
* Tõnis Tiigi
* Vered Rosen
* Vinayak Goyal
* bo.jiang
* chriskery
* luchenhan
* mahmut
* zhaixiaojuan

* **github.com/Microsoft/hcsshim**                                                 v0.12.9 -> v0.13.0-rc.3
* **github.com/cilium/ebpf**                                                       v0.11.0 -> v0.16.0
* **github.com/containerd/cgroups/v3**                                             v3.0.3 -> v3.0.5
* **github.com/containerd/containerd/api**                                         v1.8.0 -> v1.9.0
* **github.com/containerd/continuity**                                             v0.4.4 -> v0.4.5
* **github.com/containerd/go-cni**                                                 v1.1.10 -> v1.1.12
* **github.com/containerd/imgcrypt/v2**                                            v2.0.0-rc.1 -> v2.0.1
* **github.com/containerd/otelttrpc**                                              ea5083fda723 -> v0.1.0
* **github.com/containerd/platforms**                                              v1.0.0-rc.0 -> v1.0.0-rc.1
* **github.com/containerd/ttrpc**                                                  v1.2.6 -> v1.2.7
* **github.com/containerd/typeurl/v2**                                             v2.2.2 -> v2.2.3
* **github.com/containernetworking/cni**                                           v1.2.3 -> v1.3.0
* **github.com/containernetworking/plugins**                                       v1.5.1 -> v1.7.1
* **github.com/containers/ocicrypt**                                               v1.2.0 -> v1.2.1
* **github.com/davecgh/go-spew**                                                   d8f796af33cc -> v1.1.1
* **github.com/fsnotify/fsnotify**                                                 v1.7.0 -> v1.9.0
* **github.com/go-jose/go-jose/v4**                                                v4.0.4 -> v4.0.5
* **github.com/google/go-cmp**                                                     v0.6.0 -> v0.7.0
* **github.com/grpc-ecosystem/grpc-gateway/v2**                                    v2.22.0 -> v2.26.1
* **github.com/klauspost/compress**                                                v1.17.11 -> v1.18.0
* **github.com/mdlayher/socket**                                                   v0.4.1 -> v0.5.1
* **github.com/moby/spdystream**                                                   v0.4.0 -> v0.5.0
* **github.com/moby/sys/user**                                                     v0.3.0 -> v0.4.0
* **github.com/opencontainers/image-spec**                                         v1.1.0 -> v1.1.1
* **github.com/opencontainers/runtime-spec**                                       v1.2.0 -> v1.2.1
* **github.com/opencontainers/selinux**                                            v1.11.1 -> v1.12.0
* **github.com/pelletier/go-toml/v2**                                              v2.2.3 -> v2.2.4
* **github.com/petermattis/goid**                                                  4fcff4a6cae7 **_new_**
* **github.com/pmezard/go-difflib**                                                5d4384ee4fb2 -> v1.0.0
* **github.com/prometheus/client_golang**                                          v1.20.5 -> v1.22.0
* **github.com/prometheus/common**                                                 v0.55.0 -> v0.62.0
* **github.com/sasha-s/go-deadlock**                                               v0.3.5 **_new_**
* **github.com/smallstep/pkcs7**                                                   v0.1.1 **_new_**
* **github.com/stretchr/testify**                                                  v1.9.0 -> v1.10.0
* **github.com/tchap/go-patricia/v2**                                              v2.3.1 -> v2.3.2
* **github.com/urfave/cli/v2**                                                     v2.27.5 -> v2.27.6
* **github.com/vishvananda/netlink**                                               v1.3.0 -> 0e7078ed04c8
* **github.com/vishvananda/netns**                                                 v0.0.4 -> v0.0.5
* **go.etcd.io/bbolt**                                                             v1.3.11 -> v1.4.0
* **go.opentelemetry.io/auto/sdk**                                                 v1.1.0 **_new_**
* **go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc**  v0.56.0 -> v0.60.0
* **go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp**                v0.56.0 -> v0.60.0
* **go.opentelemetry.io/otel**                                                     v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/exporters/otlp/otlptrace**                            v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc**              v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp**              v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/metric**                                              v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/sdk**                                                 v1.31.0 -> v1.35.0
* **go.opentelemetry.io/otel/trace**                                               v1.31.0 -> v1.35.0
* **go.opentelemetry.io/proto/otlp**                                               v1.3.1 -> v1.5.0
* **golang.org/x/crypto**                                                          v0.28.0 -> v0.36.0
* **golang.org/x/exp**                                                             aacd6d4b4611 -> 2d47ceb2692f
* **golang.org/x/mod**                                                             v0.21.0 -> v0.24.0
* **golang.org/x/net**                                                             v0.30.0 -> v0.38.0
* **golang.org/x/oauth2**                                                          v0.22.0 -> v0.27.0
* **golang.org/x/sync**                                                            v0.8.0 -> v0.14.0
* **golang.org/x/sys**                                                             v0.26.0 -> v0.33.0
* **golang.org/x/term**                                                            v0.25.0 -> v0.30.0
* **golang.org/x/text**                                                            v0.19.0 -> v0.23.0
* **golang.org/x/time**                                                            v0.3.0 -> v0.7.0
* **google.golang.org/genproto/googleapis/api**                                    5fefd90f89a9 -> 56aae31c358a
* **google.golang.org/genproto/googleapis/rpc**                                    324edc3d5d38 -> 56aae31c358a
* **google.golang.org/grpc**                                                       v1.67.1 -> v1.72.0
* **google.golang.org/protobuf**                                                   v1.35.1 -> v1.36.6
* **k8s.io/api**                                                                   v0.31.2 -> v0.32.3
* **k8s.io/apimachinery**                                                          v0.31.2 -> v0.32.3
* **k8s.io/apiserver**                                                             v0.31.2 -> v0.32.3
* **k8s.io/client-go**                                                             v0.31.2 -> v0.32.3
* **k8s.io/cri-api**                                                               v0.31.2 -> v0.32.3
* **k8s.io/kubelet**                                                               v0.31.2 -> v0.32.3
* **k8s.io/utils**                                                                 18e509b52bc8 -> 3ea5e8cea738
* **sigs.k8s.io/json**                                                             bc3834ca7abd -> 9aa6b5e7a4b3
* **sigs.k8s.io/structured-merge-diff/v4**                                         v4.4.1 -> v4.4.2
* **tags.cncf.io/container-device-interface**                                      v0.8.0 -> v1.0.1
* **tags.cncf.io/container-device-interface/specs-go**                             v0.8.0 -> v1.0.0

Previous release can be found at [v2.0.0](https://github.com/containerd/containerd/releases/tag/v2.0.0)
* `containerd-<VERSION>-<OS>-<ARCH>.tar.gz`:         ✅Recommended. Dynamically linked with glibc 2.35 (Ubuntu 22.04).
* `containerd-static-<VERSION>-<OS>-<ARCH>.tar.gz`:  Statically linked. Expected to be used on Linux distributions that do not use glibc >= 2.35. Not position-independent.

In addition to containerd, typically you will have to install [runc](https://github.com/opencontainers/runc/releases)
and [CNI plugins](https://github.com/containernetworking/plugins/releases) from their official sites too.

See also the [Getting Started](https://github.com/containerd/containerd/blob/main/docs/getting-started.md) documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Allow support for multiple CNI directories

9 participants