Skip to content

network driver: add nodeSelector to CiliumNetworkDriverConfig CRD#43797

Closed
bersoare wants to merge 41 commits intocilium:feature/dra-driverfrom
bersoare:pr/bersoare/configselector
Closed

network driver: add nodeSelector to CiliumNetworkDriverConfig CRD#43797
bersoare wants to merge 41 commits intocilium:feature/dra-driverfrom
bersoare:pr/bersoare/configselector

Conversation

@bersoare
Copy link
Copy Markdown
Contributor

@bersoare bersoare commented Jan 15, 2026

With the Cilium Network Driver, we're likely dealing with specialized HW (ex: sr-iov capabiel nic), and such resources might not be present on all the nodes on a cluster. In other words, nodes may or may not need to run the Network Driver at all.

By adding a nodeSelector to the config we can control which nodes (or set of nodes) will receive a configuration and start the network driver.

add nodeSelector to CiliumNetworkDriverConfig CRD

pippolo84 and others added 30 commits December 5, 2025 11:51
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
network-driver is a new feature introduced that enables
exposing unmanaged network interfaces to pods.

adding sr-iov interface support and basic netdev configuration
inside the pod

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Add back missing volume mount for /var/run/nri. Without this the plugin
is unable to connect to the NRI socket and fails with the following
error:

```
msg="NRI plugin failed" module=agent.controlplane.network-driver error="failed to connect to NRI service: dial unix /var/run/nri/nri.sock: connect: no such file or directory" name=dummy.cilium.k8s.io
```

Fixes: e663e3c ("network-driver: add new network-driver feature pkg")

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
If the start of the NRI plugin fails, wait a second before retrying to
restart it. This avoid consuming too much CPU in case of a persistent
failure.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
In commit e663e3c the network driver has been reworked to add a
modular structure.  Add a module to support dummy devices and an example
configuration to start the dumme devices support.

Related: e663e3c ("network-driver: add new network-driver feature pkg")

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
as part of allowing the Network Driver to be configured
by an operator, add the CRD spec so we can
have configurations deployed to the cluster

adding the CRD definitions, as well as the generated code

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
collect the CRD for the network driver configuration
with the sysdump command

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
NRI and DRA frameworks are not available in
older kube versions, so perform a version check for mounting
the paths.

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
will likely fail for older versions. ensure
we're in at least kube v1.34

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
maps make it hard to change the json tag for the keys,
and using an array makes the CRD looks cleaner as
we can decouple the field name from the json tag without
having to implement a custom marshaler.

also adding some default values that were missing

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
no longer use the config types we were using previously,
since now we want to get the config from the crd

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
adjusting the current config validation to validate
the struct from the CRD

removed the config types the code was previously using

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
get and use configuration from the CRD

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
agent now waits for a config at any point in time and starts
the network driver if one ever arrives.

also encapsulated the containerd and dra initialization logic
to make the start sequence more readable. no functional changes
were made

updates to configuration aren't handled yet.

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
continue reading from the channel
even after the first configuration is received,
and log any further update attempts

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
check for an empty config before
accessing the config definitions (and avoid a nil
pointer access)

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
In case of an agent restart without a change to the
CiliumNetworkDriverConfig we are going to receive a Sync after an Upsert
for the resource. In that case, we should read the last available config
version and enable the driver accordingly.  Conversely, if a new version
of the CiliumNetworkDriverConfig is upserted after the initial sync, we
should handle that version.

The commit updates the loop reading the events to correctly handle both
scenarios (upsert + sync and sync + upsert) while discarding further
attempts of updating the config.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
with this we can write tests more easily
without the need of an sr-iov capable nic

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
using netlink attributes to find interface name
and pf name. some error handling improvements

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
added some more test coverage, now ensuring that
the attributes match the expectation

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
this is so we can access and read/write device sr-iov attributes

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
now using a single sysfs path to simplify the mount structure
small refactor to encapsulate the file reading/writing logic
added some additional unit tests to cover sriov vf setup

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
sysfs mock in the sr-iov package has circular links,
and grep complains about it.
this commit excludes the testdata directory from the grep command

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
This is useful for a subsequent commit where we need to JSON marshal and
marshal each devices to restore the driver status after a restart.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
In order to allow the restoration of all device allocations after an
agent restart, all device types should implement binary marshalling and
unmarshalling. The Device interface leaves flexibility about how the
devices should be encoded, but for the sake of simplicity both Dummy and
SR-IOV devices relies on JSON.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Add support for device allocations restore after a restart.

This is done writing in the ResourceClaim Status (namely in the
Status.AllocatedDeviceStatus[].Data field) the binary serialization of
each allocated device and, after an agent restart, listing all the
ResourceClaims referenced by the local pods and unmarshalling all the
allocations found.

Moreover, the interface name and IP addresses of each allocated device
is also written in Status.AllocatedDeviceStatus[].NetworkData for
improved visibility of the configuration parameters.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Use the correct DeviceManagerTypeDummy type when loading the dummy
manager configuration from the CiliumNetworkDriverConfig resource.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Copy link
Copy Markdown
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed as a sig-k8s. I have some concerns about how we handle the conflicting nodeSelector. I think it's just a matter of defining how we handle that. Once we define it, it's also worth adding it to the CRD's field description and the documentation.

@bersoare bersoare marked this pull request as draft January 20, 2026 08:52
add selector field to the CiliumNetworkDriverConfig CRD

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
select a config based on the nodeSelector field

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
use a more deterministic config selection logic
1- selects most specific match - more selectors wins
2- if tie, choose the oldest one
3- fallback to config with no selectors

Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
@bersoare bersoare force-pushed the pr/bersoare/configselector branch from 076d5e7 to cc70079 Compare January 20, 2026 17:02
@bersoare
Copy link
Copy Markdown
Contributor Author

@YutaroHayakawa , thanks for the feedback. i implemented some configuration selection logic in cc70079#diff-f1e0656d95af1cea68f8bca36588e5e2ab3e6e2b7affa572ae0f9a0d6bbe224dR84-R107

basically, the agent now

  • collects all configurations on startup
  • after a sync, call selectConfig which picks a config among the ones we received and send it back to the channel
  • if updates are received afterwards, we don't handle it and log an error (right now we dont allow configuration updates)

let me know what you think.

@bersoare bersoare marked this pull request as ready for review January 20, 2026 17:06
cfg = ev.Object.Spec.DeepCopy()
upserted = true
if ev.Object.Spec.NodeSelector == nil {
cfgs[ev.Object.GetUID()] = ev.Object.DeepCopy()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Since this map only contains the single kind of resource, UID is overkill. The name would uniquely identifies the resource among same kind.

Comment on lines 254 to 258
// discard updates if we already handled a config
if handled {
driver.logger.InfoContext(
ctx, "config received, but we already have one",
)
Copy link
Copy Markdown
Member

@YutaroHayakawa YutaroHayakawa Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but looks like here we ignore any update for the resource after we are done with the initial list. It also doesn't handle the resource deletion. When the cfg == nil, it provides nil configuration to the channel and never accept any configuration even if users provide it. Are you sure this is intended?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaict, the resoure deletion is handled as the config is removed from the map, and the rest of the code outside of the select block still runs.
yes, right now we don't allow further updates once we find a config.
the cfg == nil case looks wrong, as you pointed out. we should only pass a config if it is non nil. i'll fix that

Copy link
Copy Markdown
Member

@YutaroHayakawa YutaroHayakawa Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, right now we don't allow further updates once we find a config.

I think this behavior is very tricky in some state transition case. For example.

  1. User created the configuration. It is selected by the node.
  2. User updated the configuration. The change doesn't take effect as we ignore the update.
  3. User restarted the agent. Now the change takes effect.

Another case would be:

  1. User create the configuration. It is selected by the node.
  2. User deleted the configuration. The change doesn't take effect as we just delete the configuration from the internal map and don't do any cleanup for the exposed resources.
  3. User create another configuration potentially with different settings. The stale resources are still there. What happens in this case?

@bersoare
Copy link
Copy Markdown
Contributor Author

moving this to draft until we make a decision on how to handle configuration selection/updates

@pippolo84 pippolo84 force-pushed the feature/dra-driver branch 3 times, most recently from 63c1cd0 to 215ff6c Compare February 2, 2026 09:17
@pippolo84 pippolo84 force-pushed the feature/dra-driver branch 2 times, most recently from 69cee71 to ffb12cc Compare February 10, 2026 10:16
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 13, 2026
@bersoare
Copy link
Copy Markdown
Contributor Author

superseded by #44761

@bersoare bersoare closed this Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cilium-cli This PR contains changes related with cilium-cli dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants