
[network-driver]: Add configuration management in the operator #44501

Draft
pippolo84 wants to merge 8 commits into cilium:feature/dra-driver from
pippolo84:pr/pippolo84/network-driver-operator-config

@pippolo84
Member

-- WORK IN PROGRESS --

@pippolo84 pippolo84 force-pushed the pr/pippolo84/network-driver-operator-config branch from 5bc9d75 to b051947 on March 24, 2026 18:17
@pippolo84 pippolo84 force-pushed the feature/dra-driver branch 3 times, most recently from a842d82 to 70966b4 on March 25, 2026 16:14
@pippolo84 pippolo84 force-pushed the pr/pippolo84/network-driver-operator-config branch 2 times, most recently from 7796edf to c187d7f on March 26, 2026 15:28
@pippolo84 pippolo84 requested a review from bersoare March 26, 2026 15:28
@pippolo84 pippolo84 added the area/dra-plugin label (Impacts the Cilium Network Driver DRA plugin.) on Mar 26, 2026
CiliumNetworkDriverNodeConfigList was not listed among the known types
and therefore it was not registered by the operator.

Fixes: e166ccf ("network driver: add CRDs for cluster and node configs")

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
…atus

Report conditions in CiliumNetworkDriverClusterConfig Status. This
makes it possible to report to a cluster operator a conflict between
driver cluster configurations that select the same set of nodes.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
The network driver module in the Cilium Operator will be responsible for
handling the CiliumNetworkDriverClusterConfig objects. The operator will
ingest all the driver cluster configurations and will produce a
CiliumNetworkDriverNodeConfig for each node selected by a cluster
configuration. In case of conflicting cluster configurations (that is,
multiple cluster configurations selecting the same set of nodes), the
operator will not create any node configuration and will instead report
an error condition in the CiliumNetworkDriverClusterConfig Status.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
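The planning step described above can be sketched in Go as follows. Everything here is a simplified, hypothetical model: ClusterConfig, Node and plan are illustrative names, selectors are reduced to exact label matches (like a matchLabels-only selector), and "selecting the same set of nodes" is interpreted as selecting at least one common node, which is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterConfig is a hypothetical, simplified CiliumNetworkDriverClusterConfig.
type ClusterConfig struct {
	Name        string
	MatchLabels map[string]string
}

// Node is a hypothetical, simplified CiliumNode.
type Node struct {
	Name   string
	Labels map[string]string
}

func (c ClusterConfig) selects(n Node) bool {
	for k, v := range c.MatchLabels {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

// plan returns the desired node configs (node name -> owning cluster config)
// and the sorted names of the conflicting cluster configs, i.e. those that
// select at least one node also selected by another cluster config.
// Conflicting configs produce no node configs at all.
func plan(configs []ClusterConfig, nodes []Node) (map[string]string, []string) {
	selectedBy := map[string][]string{} // node name -> cluster config names
	for _, n := range nodes {
		for _, c := range configs {
			if c.selects(n) {
				selectedBy[n.Name] = append(selectedBy[n.Name], c.Name)
			}
		}
	}
	conflicting := map[string]bool{}
	for _, owners := range selectedBy {
		if len(owners) > 1 {
			for _, o := range owners {
				conflicting[o] = true
			}
		}
	}
	desired := map[string]string{}
	for node, owners := range selectedBy {
		if len(owners) == 1 && !conflicting[owners[0]] {
			desired[node] = owners[0]
		}
	}
	names := make([]string, 0, len(conflicting))
	for name := range conflicting {
		names = append(names, name)
	}
	sort.Strings(names)
	return desired, names
}

func main() {
	nodes := []Node{
		{Name: "n1", Labels: map[string]string{"pool": "a"}},
		{Name: "n2", Labels: map[string]string{"pool": "b"}},
	}
	configs := []ClusterConfig{
		{Name: "cfg-a", MatchLabels: map[string]string{"pool": "a"}},
		{Name: "cfg-b", MatchLabels: map[string]string{"pool": "b"}},
		{Name: "cfg-b2", MatchLabels: map[string]string{"pool": "b"}},
	}
	desired, conflicts := plan(configs, nodes)
	fmt.Println(desired, conflicts)
}
```

Note that in this model a config with an empty selector matches every node, so it conflicts with any other config.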
The network-driver operator module needs to watch the CiliumNodes to:

- create a CiliumNetworkDriverNodeConfig object for each new node that
  is selected by an already applied CiliumNetworkDriverClusterConfig
- delete the CiliumNetworkDriverNodeConfig object for each node that has
  been deleted

This commit adds a k8s reflector to push any change to the CiliumNodes
k8s objects into the k8s-cilium-nodes stateDB table. The logic to
handle these changes will be added in a subsequent commit.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
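To make the reflector idea concrete, here is a minimal, self-contained Go sketch of what it does conceptually: drain watch events and mirror them into a table that other jobs can read. The Event, Table and runReflector names are hypothetical; this is not Cilium's k8s reflector or the stateDB API, just an in-memory stand-in.

```go
package main

import "fmt"

// EventType enumerates the kinds of changes a watch can deliver.
type EventType int

const (
	Upsert EventType = iota
	Delete
)

// Event is a hypothetical watch event carrying an object keyed by name.
type Event[T any] struct {
	Type EventType
	Name string
	Obj  T
}

// Table is a minimal in-memory stand-in for a stateDB table: it stores the
// latest version of each object and bumps a revision on every change, so
// consumers can tell that something happened since they last looked.
type Table[T any] struct {
	rows     map[string]T
	Revision uint64
}

func NewTable[T any]() *Table[T] { return &Table[T]{rows: map[string]T{}} }

func (t *Table[T]) Get(name string) (T, bool) {
	v, ok := t.rows[name]
	return v, ok
}

// runReflector drains events until the channel is closed and mirrors each
// change into the table.
func runReflector[T any](events <-chan Event[T], t *Table[T]) {
	for ev := range events {
		switch ev.Type {
		case Upsert:
			t.rows[ev.Name] = ev.Obj
		case Delete:
			delete(t.rows, ev.Name)
		}
		t.Revision++
	}
}

func main() {
	type labels = map[string]string
	events := make(chan Event[labels], 3)
	events <- Event[labels]{Type: Upsert, Name: "n1", Obj: labels{"pool": "a"}}
	events <- Event[labels]{Type: Upsert, Name: "n2", Obj: labels{"pool": "b"}}
	events <- Event[labels]{Type: Delete, Name: "n1"}
	close(events)

	nodes := NewTable[labels]()
	runReflector(events, nodes)
	_, ok := nodes.Get("n1")
	fmt.Println(ok, nodes.Revision)
}
```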
…ctor

The network driver operator module needs to watch the
CiliumNetworkDriverClusterConfigs to:

- Upsert a CiliumNetworkDriverNodeConfig for each node selected by the
  applied CiliumNetworkDriverClusterConfig object
- Delete all the CiliumNetworkDriverNodeConfigs that were generated by a
  deleted CiliumNetworkDriverClusterConfig

This commit adds a k8s reflector to push any change to the
CiliumNetworkDriverClusterConfig k8s objects into the
k8s-netdriver-cluster-config stateDB table. The logic to handle these
changes will be added in a subsequent commit.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
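For the delete case, the operator needs to know which node configs a given cluster config produced. A minimal sketch, assuming each generated node config records the name of its owner (in a real cluster this could be an owner reference or a label; how the PR tracks it is not shown here). NodeConfig and deleteOwnedBy are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeConfig is a hypothetical, simplified CiliumNetworkDriverNodeConfig
// that records which cluster config generated it.
type NodeConfig struct {
	NodeName string
	Owner    string // name of the generating CiliumNetworkDriverClusterConfig
}

// deleteOwnedBy removes every node config generated by the given cluster
// config and returns the sorted names of the affected nodes.
// Deleting map entries while ranging over the map is safe in Go.
func deleteOwnedBy(nodeConfigs map[string]NodeConfig, owner string) []string {
	var removed []string
	for node, nc := range nodeConfigs {
		if nc.Owner == owner {
			delete(nodeConfigs, node)
			removed = append(removed, node)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	ncs := map[string]NodeConfig{
		"n1": {NodeName: "n1", Owner: "cfg-a"},
		"n2": {NodeName: "n2", Owner: "cfg-b"},
		"n3": {NodeName: "n3", Owner: "cfg-a"},
	}
	fmt.Println(deleteOwnedBy(ncs, "cfg-a"), len(ncs))
}
```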
Add a stateDB table for the internal model of
CiliumNetworkDriverNodeConfig. This will allow the driver configuration
manager, to be added in a subsequent commit, to reconcile the current
state of the node-specific configurations with the changes observed in
both cluster configurations and nodes. The reconciliation is done by
a reconciler that writes the k8s node configurations according to the
desired state in the table. The Prune operation allows removing stale
CiliumNetworkDriverNodeConfig objects left over from a previous run.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
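The desired-state/reconciler split, including why pruning must wait until the table is initialized, can be illustrated with a small in-memory sketch. The Ops interface, fakeAPI and reconcile are hypothetical and only loosely modeled on the Update/Delete/Prune operations mentioned above; they are not the stateDB reconciler API.

```go
package main

import "fmt"

// NodeConfig is a hypothetical, simplified CiliumNetworkDriverNodeConfig.
type NodeConfig struct {
	NodeName string
	Owner    string
}

// Ops is a hypothetical set of operations applied to the k8s API.
type Ops interface {
	Update(nc NodeConfig) error
	Delete(nodeName string) error
	Prune(desired map[string]NodeConfig) error
}

// fakeAPI stands in for the API server's set of node config objects.
type fakeAPI struct{ objs map[string]NodeConfig }

func (f *fakeAPI) Update(nc NodeConfig) error { f.objs[nc.NodeName] = nc; return nil }

func (f *fakeAPI) Delete(nodeName string) error { delete(f.objs, nodeName); return nil }

// Prune deletes every object that is not part of the desired state, e.g.
// node configs written by a previous operator run.
func (f *fakeAPI) Prune(desired map[string]NodeConfig) error {
	for name := range f.objs {
		if _, ok := desired[name]; !ok {
			delete(f.objs, name)
		}
	}
	return nil
}

// reconcile pushes the desired state, applies explicit deletions and, only
// once the desired table has been fully initialized, prunes leftovers.
// Pruning earlier would delete legitimate configs not yet recomputed.
func reconcile(ops Ops, desired map[string]NodeConfig, deleted []string, initialized bool) error {
	for _, nc := range desired {
		if err := ops.Update(nc); err != nil {
			return err
		}
	}
	for _, name := range deleted {
		if err := ops.Delete(name); err != nil {
			return err
		}
	}
	if initialized {
		return ops.Prune(desired)
	}
	return nil
}

func main() {
	api := &fakeAPI{objs: map[string]NodeConfig{"stale": {NodeName: "stale", Owner: "gone"}}}
	desired := map[string]NodeConfig{"n1": {NodeName: "n1", Owner: "cfg-a"}}
	_ = reconcile(api, desired, nil, false) // snapshot not ready: no pruning yet
	fmt.Println(len(api.objs))
	_ = reconcile(api, desired, nil, true) // table initialized: stale object pruned
	fmt.Println(len(api.objs))
}
```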
The network driver configuration manager watches changes in
CiliumNetworkDriverClusterConfig and CiliumNode to reconcile the state
of CiliumNetworkDriverNodeConfig accordingly. The manager is composed of
three independent jobs:

1) the first one rebuilds the initial snapshot of the node
   configurations and, once finished, it initializes the driver node
   configuration table to enable the pruning of stale node configurations
2) the second one handles changes to the CiliumNodes and reconciles the
   node specific configurations accordingly
3) the third one handles changes to the
   CiliumNetworkDriverClusterConfigs and reconciles the node specific
   configurations accordingly

If multiple cluster configurations select the same set of nodes, a
conflict error is reported in the conflicting
CiliumNetworkDriverClusterConfig. In this case, no change is propagated
to the already applied node configuration: this should avoid
unnecessary network driver restarts.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
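The per-node decision, including the rule that a conflict leaves the already applied node configuration untouched, could be expressed as a small pure function. A minimal sketch; resolve and its string-based inputs are hypothetical simplifications.

```go
package main

import "fmt"

// resolve decides which cluster config should own a node's config.
// selecting lists the cluster configs whose selector matches the node and
// current is the owner of the node config applied today ("" if none).
// It returns the owner to apply ("" meaning no node config) and whether
// the node is in conflict. On conflict the current owner is kept as-is, so
// an applied node config is not rewritten and no config is newly created.
func resolve(selecting []string, current string) (owner string, conflict bool) {
	switch len(selecting) {
	case 0:
		return "", false
	case 1:
		return selecting[0], false
	default:
		return current, true
	}
}

func main() {
	owner, conflict := resolve([]string{"cfg-a", "cfg-b"}, "cfg-a")
	fmt.Println(owner, conflict)
}
```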
Separate the IPAM part of the driver and the configuration management
part into two different groups, and include both in the Cilium operator
network driver module.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
@pippolo84 pippolo84 force-pushed the pr/pippolo84/network-driver-operator-config branch from c187d7f to f994557 on March 26, 2026 17:10
continue
}

if _, _, err := p.NodeConfigs.Insert(wtxn, &driverNodeConfig{
Contributor

when the operator restarts, we end up with another insert for a config that already exists. it'd be great if we could prevent this, as it causes some (avoidable) churn on the kube API every time the operator is restarted
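One way to address this, sketched under assumptions: compare the desired spec with what the API server already holds and skip the write when they are equal. NodeConfigSpec, writer and upsertIfChanged are hypothetical; the comparison uses reflect.DeepEqual, whereas Kubernetes code often uses apimachinery's equality.Semantic.DeepEqual for API types.

```go
package main

import (
	"fmt"
	"reflect"
)

// NodeConfigSpec is a hypothetical stand-in for the spec of a
// CiliumNetworkDriverNodeConfig.
type NodeConfigSpec struct {
	Owner  string
	Config map[string]string
}

// writer wraps a fake API server and counts the writes it performs.
type writer struct {
	existing map[string]NodeConfigSpec // what the API server already holds
	writes   int
}

// upsertIfChanged only writes when the stored spec differs from the desired
// one, so re-inserting the same desired state after an operator restart
// does not generate API traffic. It reports whether a write happened.
func (w *writer) upsertIfChanged(node string, desired NodeConfigSpec) bool {
	if cur, ok := w.existing[node]; ok && reflect.DeepEqual(cur, desired) {
		return false
	}
	w.existing[node] = desired
	w.writes++
	return true
}

func main() {
	w := &writer{existing: map[string]NodeConfigSpec{
		"n1": {Owner: "cfg-a", Config: map[string]string{"mtu": "1500"}},
	}}
	// Simulates the operator recomputing the same state after a restart.
	wrote := w.upsertIfChanged("n1", NodeConfigSpec{Owner: "cfg-a", Config: map[string]string{"mtu": "1500"}})
	fmt.Println(wrote, w.writes)
}
```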

))

p.JobGroup.Add(job.OneShot(
"network-driver-node-handler",
Contributor

when the node labels change and the node no longer matches any config, the previous config selection is not cleaned up
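A sketch of the missing cleanup path, assuming equality-only selectors and a map from node name to owning cluster config (both hypothetical simplifications): on a label change the node is re-matched, and if nothing selects it anymore its entry is dropped instead of being left stale.

```go
package main

import "fmt"

// Selector is a hypothetical equality-based label selector.
type Selector map[string]string

func (s Selector) Matches(labels map[string]string) bool {
	for k, v := range s {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// onNodeLabelsChanged re-matches a node after a label update and updates
// nodeConfigs (node name -> owning cluster config) accordingly. It returns
// the resulting owner, or "" if the node config was removed.
func onNodeLabelsChanged(node string, labels map[string]string, selectors map[string]Selector, nodeConfigs map[string]string) string {
	var matching []string
	for name, sel := range selectors {
		if sel.Matches(labels) {
			matching = append(matching, name)
		}
	}
	switch len(matching) {
	case 0:
		// No config selects the node anymore: drop the stale node config.
		delete(nodeConfigs, node)
		return ""
	case 1:
		nodeConfigs[node] = matching[0]
		return matching[0]
	default:
		// Conflict: leave the existing node config untouched.
		return nodeConfigs[node]
	}
}

func main() {
	selectors := map[string]Selector{"cfg-a": {"pool": "a"}, "cfg-b": {"pool": "b"}}
	nodeConfigs := map[string]string{"n1": "cfg-a"}
	owner := onNodeLabelsChanged("n1", map[string]string{"pool": "c"}, selectors, nodeConfigs)
	fmt.Printf("owner=%q remaining=%d\n", owner, len(nodeConfigs))
}
```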

var conflictErrs []error

for ciliumNode := range ciliumNodes {
if !clusterCfg.NodeSelector.Matches(labels.Set(ciliumNode.Labels)) {
Contributor

in case we have a config without a selector, it would be marked as conflicting. do we want to allow the user to specify a catch-all config, instead of requiring selectors?
also, i wonder if we could support config overrides by the user (manually creating a node config targeting a node), with the operator handling that gracefully
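One possible answer to both questions, sketched with hypothetical precedence rules (not something this PR implements): a manual per-node override wins, then a single config with a specific selector, then a single catch-all config (empty selector). Only multiple matches at the same level count as a conflict. ClusterConfig and pick are illustrative names.

```go
package main

import "fmt"

// ClusterConfig is a hypothetical model where an empty MatchLabels means
// "catch-all" rather than "conflicts with everything".
type ClusterConfig struct {
	Name        string
	MatchLabels map[string]string
}

func (c ClusterConfig) catchAll() bool { return len(c.MatchLabels) == 0 }

func (c ClusterConfig) matches(labels map[string]string) bool {
	for k, v := range c.MatchLabels {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// pick returns the config to apply to a node with the given labels.
// override is the name of a user-created per-node config ("" if none).
func pick(labels map[string]string, override string, configs []ClusterConfig) (string, error) {
	if override != "" {
		return override, nil
	}
	var specific, fallback []string
	for _, c := range configs {
		switch {
		case c.catchAll():
			fallback = append(fallback, c.Name)
		case c.matches(labels):
			specific = append(specific, c.Name)
		}
	}
	// Check the more specific level first; a lone match at a level wins.
	for _, level := range [][]string{specific, fallback} {
		if len(level) == 1 {
			return level[0], nil
		}
		if len(level) > 1 {
			return "", fmt.Errorf("conflicting configs: %v", level)
		}
	}
	return "", nil
}

func main() {
	configs := []ClusterConfig{
		{Name: "default"},
		{Name: "gpu", MatchLabels: map[string]string{"gpu": "true"}},
	}
	cfg, err := pick(map[string]string{"gpu": "true"}, "", configs)
	fmt.Println(cfg, err)
}
```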

))

p.JobGroup.Add(job.OneShot(
"network-driver-cluster-config-handler",
Contributor

noticed that when we have a conflicting config and then delete the other configs, we do not reconcile and clear the conflict.
in addition to that, selection does not seem to be triggered again, so when we delete the old config, we never re-evaluate the current state to create a new CiliumNetworkDriverNodeConfig if necessary
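A common way to avoid this class of bug is to make the handler level-triggered: after any create, update or delete of a cluster config, recompute the whole desired state from the configs that still exist. A minimal sketch with hypothetical types (Selector, State, recompute); the prev argument keeps the currently applied owner for nodes that are still in conflict, matching the rule described in the commit message.

```go
package main

import "fmt"

// Selector is a hypothetical equality-based label selector.
type Selector map[string]string

// State is the recomputed outcome of a reconciliation pass.
type State struct {
	NodeConfigs map[string]string // node name -> owning cluster config
	Conflicting map[string]bool   // cluster config name -> has a Conflict condition
}

func matches(sel Selector, labels map[string]string) bool {
	for k, v := range sel {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// recompute rebuilds the desired state from the cluster configs that still
// exist (name -> selector). Because nothing is carried over except the
// applied owner of nodes still in conflict, deleting one side of a conflict
// clears the Conflict mark on the survivor and lets it take over the node.
func recompute(configs map[string]Selector, nodes map[string]map[string]string, prev map[string]string) State {
	st := State{NodeConfigs: map[string]string{}, Conflicting: map[string]bool{}}
	for node, labels := range nodes {
		var sel []string
		for name, s := range configs {
			if matches(s, labels) {
				sel = append(sel, name)
			}
		}
		switch {
		case len(sel) == 1:
			st.NodeConfigs[node] = sel[0]
		case len(sel) > 1:
			for _, n := range sel {
				st.Conflicting[n] = true
			}
			// Keep the applied config, but only if its owner still exists.
			if owner, ok := prev[node]; ok {
				if _, exists := configs[owner]; exists {
					st.NodeConfigs[node] = owner
				}
			}
		}
	}
	return st
}

func main() {
	nodes := map[string]map[string]string{"n1": {"pool": "a"}}
	configs := map[string]Selector{"old": {"pool": "a"}, "new": {"pool": "a"}}
	st := recompute(configs, nodes, map[string]string{"n1": "old"})
	fmt.Println(st.NodeConfigs["n1"], st.Conflicting["new"])
	delete(configs, "old") // the user removes one side of the conflict
	st = recompute(configs, nodes, st.NodeConfigs)
	fmt.Println(st.NodeConfigs["n1"], st.Conflicting["new"])
}
```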


Labels

area/dra-plugin: Impacts the Cilium Network Driver DRA plugin.
dont-merge/needs-release-note-label: The author needs to describe the release impact of these changes.


3 participants