[network-driver]: Add configuration management in the operator #44501
pippolo84 wants to merge 8 commits into cilium:feature/dra-driver from
Force-pushed from 5bc9d75 to b051947.
Force-pushed from a842d82 to 70966b4.
Force-pushed from 7796edf to c187d7f.
CiliumNetworkDriverNodeConfigList was not listed among the known types and was therefore not registered by the operator.

Fixes: e166ccf ("network driver: add CRDs for cluster and node configs")
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
…atus

Report conditions in the CiliumNetworkDriverClusterConfig Status. This makes it possible to report to a cluster operator a conflict between driver cluster configurations that select the same set of nodes.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
The network driver module in the Cilium Operator will be responsible for handling CiliumNetworkDriverClusterConfig objects. The operator will ingest all the driver cluster configurations and will produce a CiliumNetworkDriverNodeConfig for each node selected by a cluster configuration. In case of conflicting cluster configurations (that is, multiple cluster configurations selecting the same set of nodes), the operator will not create any node configuration and will instead report an error condition in the CiliumNetworkDriverClusterConfig Status.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
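The desired-state computation described above can be sketched as a pure function over cluster configs and nodes. This is a minimal illustration, assuming equality-only label selectors and treating any overlap (a node selected by more than one config) as a conflict that blocks every node config of the configs involved. `ClusterConfig`, `Node`, `matches` and `desiredNodeConfigs` are hypothetical names, not the PR's types.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterConfig is a hypothetical, simplified CiliumNetworkDriverClusterConfig
// with an equality-based node selector.
type ClusterConfig struct {
	Name     string
	Selector map[string]string
}

// Node is a hypothetical, simplified CiliumNode.
type Node struct {
	Name   string
	Labels map[string]string
}

// matches reports whether every selector key/value pair is present in lbls.
// An empty selector matches every node.
func matches(sel, lbls map[string]string) bool {
	for k, v := range sel {
		if got, ok := lbls[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// desiredNodeConfigs returns node name -> owning cluster config, plus the
// sorted names of conflicting configs. A config is conflicting if any node
// it selects is also selected by another config; conflicting configs
// produce no node configs at all.
func desiredNodeConfigs(cfgs []ClusterConfig, nodes []Node) (map[string]string, []string) {
	selectedBy := map[string][]string{} // node name -> names of matching configs
	for _, n := range nodes {
		for _, c := range cfgs {
			if matches(c.Selector, n.Labels) {
				selectedBy[n.Name] = append(selectedBy[n.Name], c.Name)
			}
		}
	}
	conflicting := map[string]bool{}
	for _, owners := range selectedBy {
		if len(owners) > 1 {
			for _, o := range owners {
				conflicting[o] = true
			}
		}
	}
	desired := map[string]string{}
	for node, owners := range selectedBy {
		if len(owners) == 1 && !conflicting[owners[0]] {
			desired[node] = owners[0]
		}
	}
	names := make([]string, 0, len(conflicting))
	for name := range conflicting {
		names = append(names, name)
	}
	sort.Strings(names)
	return desired, names
}

func main() {
	cfgs := []ClusterConfig{
		{Name: "gpu", Selector: map[string]string{"pool": "gpu"}},
		{Name: "zone-a", Selector: map[string]string{"zone": "a"}},
		{Name: "cpu", Selector: map[string]string{"pool": "cpu"}},
	}
	nodes := []Node{
		{Name: "n1", Labels: map[string]string{"pool": "gpu", "zone": "a"}},
		{Name: "n2", Labels: map[string]string{"pool": "gpu", "zone": "b"}},
		{Name: "n3", Labels: map[string]string{"pool": "cpu", "zone": "b"}},
	}
	desired, conflicts := desiredNodeConfigs(cfgs, nodes)
	fmt.Println(desired, conflicts) // map[n3:cpu] [gpu zone-a]
}
```

Here "gpu" and "zone-a" overlap on n1, so neither produces node configs (not even "gpu" for n2), while "cpu" is applied normally.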
The network-driver operator module needs to watch the CiliumNodes to:
- create a CiliumNetworkDriverNodeConfig object for each new node that is selected by an already applied CiliumNetworkDriverClusterConfig
- delete the CiliumNetworkDriverNodeConfig object for each node that has been deleted

This commit adds a k8s reflector that pushes every change to the CiliumNode k8s objects into the k8s-cilium-nodes stateDB table. The logic to handle these changes will be added in a subsequent commit.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
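The two cases listed above map naturally onto a loop that drains a stream of node change events. This sketch assumes a hypothetical `nodeChange` event (latest labels plus a deletion flag, loosely modeled on what a reflector-fed table exposes) delivered over a Go channel; the real module consumes stateDB changes, whose API is not reproduced here. Label changes on nodes that already have a node config are deliberately out of scope.

```go
package main

import "fmt"

// nodeChange is a hypothetical change event: the node's latest labels plus
// a flag telling whether the node was deleted.
type nodeChange struct {
	Name    string
	Labels  map[string]string
	Deleted bool
}

// matches reports whether every selector key/value pair is present in lbls.
func matches(sel, lbls map[string]string) bool {
	for k, v := range sel {
		if got, ok := lbls[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// handleNodeChanges covers the two cases from the commit message: a new node
// selected by an applied cluster config gets a node config, and a deleted
// node loses its node config. applied maps non-conflicting cluster config
// names to their selectors; nodeCfgs maps node name -> owning config.
func handleNodeChanges(changes <-chan nodeChange, applied map[string]map[string]string, nodeCfgs map[string]string) {
	for ch := range changes {
		if ch.Deleted {
			delete(nodeCfgs, ch.Name)
			continue
		}
		if _, exists := nodeCfgs[ch.Name]; exists {
			continue // updates to existing nodes are not handled in this sketch
		}
		for name, sel := range applied {
			// Applied configs do not overlap, so at most one can match.
			if matches(sel, ch.Labels) {
				nodeCfgs[ch.Name] = name
				break
			}
		}
	}
}

func main() {
	applied := map[string]map[string]string{"gpu": {"pool": "gpu"}}
	nodeCfgs := map[string]string{"old": "gpu"}
	changes := make(chan nodeChange, 3)
	changes <- nodeChange{Name: "n1", Labels: map[string]string{"pool": "gpu"}}
	changes <- nodeChange{Name: "n2", Labels: map[string]string{"pool": "cpu"}}
	changes <- nodeChange{Name: "old", Deleted: true}
	close(changes)
	handleNodeChanges(changes, applied, nodeCfgs)
	fmt.Println(nodeCfgs) // map[n1:gpu]
}
```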
…ctor

The network driver operator module needs to watch the CiliumNetworkDriverClusterConfigs to:
- upsert a CiliumNetworkDriverNodeConfig for each node selected by an applied CiliumNetworkDriverClusterConfig object
- delete all CiliumNetworkDriverNodeConfig objects that were generated by a deleted CiliumNetworkDriverClusterConfig

This commit adds a k8s reflector that pushes every change to the CiliumNetworkDriverClusterConfig k8s objects into the k8s-netdriver-cluster-config stateDB table. The logic to handle these changes will be added in a subsequent commit.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
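Deleting "all node configs generated by a cluster config" requires each node config to remember which cluster config produced it. The sketch below assumes that ownership is tracked as a node-name-to-config-name map (a hypothetical simplification; real objects might use a label or an ownerReference) and leaves conflict detection out. On upsert it also drops node configs the config owned for nodes it no longer selects, e.g. after a selector change.

```go
package main

import (
	"fmt"
	"sort"
)

// matches reports whether every selector key/value pair is present in lbls.
func matches(sel, lbls map[string]string) bool {
	for k, v := range sel {
		if got, ok := lbls[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// upsertClusterConfig makes cfg the owner of every node it selects and drops
// the node configs it owned for nodes it no longer selects. nodeCfgs maps
// node name -> owning cluster config name; nodes maps node name -> labels.
// Conflicts with other configs are not detected in this sketch.
func upsertClusterConfig(nodeCfgs map[string]string, cfg string, sel map[string]string, nodes map[string]map[string]string) {
	for node, lbls := range nodes {
		if matches(sel, lbls) {
			nodeCfgs[node] = cfg
		} else if nodeCfgs[node] == cfg {
			delete(nodeCfgs, node)
		}
	}
}

// deleteClusterConfig removes every node config generated by cfg and returns
// the affected node names, sorted.
func deleteClusterConfig(nodeCfgs map[string]string, cfg string) []string {
	var removed []string
	for node, owner := range nodeCfgs {
		if owner == cfg {
			delete(nodeCfgs, node) // deleting while ranging over a map is safe in Go
			removed = append(removed, node)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	nodes := map[string]map[string]string{
		"n1": {"pool": "gpu"},
		"n2": {"pool": "gpu"},
		"n3": {"pool": "cpu"},
	}
	nodeCfgs := map[string]string{}
	upsertClusterConfig(nodeCfgs, "gpu", map[string]string{"pool": "gpu"}, nodes)
	upsertClusterConfig(nodeCfgs, "cpu", map[string]string{"pool": "cpu"}, nodes)
	fmt.Println(deleteClusterConfig(nodeCfgs, "gpu"), nodeCfgs) // [n1 n2] map[n3:cpu]
}
```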
Add a stateDB table for the internal model of CiliumNetworkDriverNodeConfig. This will allow the driver configuration manager (added in a subsequent commit) to reconcile the current state of the node-specific configurations with the changes observed in both cluster configurations and nodes. The reconciliation is done through a reconciler that writes the k8s node configurations according to the desired state in the table. The Prune operation allows removing stale CiliumNetworkDriverNodeConfig objects left over from a previous run.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
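The interplay between the desired-state table, the reconciler and pruning can be sketched with plain maps. This is a minimal illustration, not the stateDB reconciler: `fakeAPI` is a hypothetical stand-in for the API server's CiliumNetworkDriverNodeConfig objects, and the `initialized` flag models the point after which the desired table is complete enough for pruning to be safe.

```go
package main

import (
	"fmt"
	"sort"
)

// fakeAPI is a hypothetical stand-in for the API server's set of
// CiliumNetworkDriverNodeConfig objects: object name -> spec.
type fakeAPI struct {
	objects map[string]string
}

// reconcile drives the API objects toward the desired table: it writes
// missing or outdated objects and, once the table is initialized, prunes
// objects that are not desired (e.g. left over from a previous run).
// It returns the sorted names of written and pruned objects.
func reconcile(desired map[string]string, api *fakeAPI, initialized bool) (written, pruned []string) {
	for name, spec := range desired {
		if cur, ok := api.objects[name]; !ok || cur != spec {
			api.objects[name] = spec
			written = append(written, name)
		}
	}
	// Pruning before initialization could delete objects that simply have
	// not been re-added to the desired table yet.
	if initialized {
		for name := range api.objects {
			if _, ok := desired[name]; !ok {
				delete(api.objects, name)
				pruned = append(pruned, name)
			}
		}
	}
	sort.Strings(written)
	sort.Strings(pruned)
	return written, pruned
}

func main() {
	api := &fakeAPI{objects: map[string]string{"n1": "v1", "stale": "v1"}}
	desired := map[string]string{"n1": "v1", "n2": "v2"}

	w, p := reconcile(desired, api, false)
	fmt.Println(w, p) // [n2] []
	w, p = reconcile(desired, api, true)
	fmt.Println(w, p) // [] [stale]
}
```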
The network driver configuration manager watches changes in CiliumNetworkDriverClusterConfig and CiliumNode objects and reconciles the state of CiliumNetworkDriverNodeConfig accordingly. The manager is composed of three independent jobs:
1) the first one rebuilds the initial snapshot of the node configurations and, once finished, initializes the driver node configuration table to enable the pruning of stale node configurations
2) the second one handles changes to the CiliumNodes and reconciles the node-specific configurations accordingly
3) the third one handles changes to the CiliumNetworkDriverClusterConfigs and reconciles the node-specific configurations accordingly

If multiple cluster configurations select the same set of nodes, a conflict error is reported in the conflicting CiliumNetworkDriverClusterConfig. In this case, no change is propagated to the already applied node configuration; this should reduce spurious, unwanted network driver restarts.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
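The three-job layout can be sketched with goroutines and a channel that is closed once the initial snapshot is in place; pruning blocks on that channel. All names here (`event`, `store`, `runManager`) are hypothetical, and cluster config changes are flattened into per-node events for brevity. One assumption goes beyond the commit message: the two event jobs also wait for the snapshot, so that it cannot overwrite newer state; the PR may order things differently.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// event is a hypothetical change to one node's desired config.
type event struct {
	node, owner string
	deleted     bool
}

// store is a hypothetical, mutex-protected stand-in for the node config table.
type store struct {
	mu          sync.Mutex
	cfgs        map[string]string // node name -> owning cluster config
	initialized chan struct{}     // closed once the initial snapshot is loaded
}

func newStore() *store {
	return &store{cfgs: map[string]string{}, initialized: make(chan struct{})}
}

func (s *store) apply(ev event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if ev.deleted {
		delete(s.cfgs, ev.node)
	} else {
		s.cfgs[ev.node] = ev.owner
	}
}

// prune blocks until the snapshot is loaded, then returns (sorted) the
// existing object names that are no longer desired.
func (s *store) prune(existing []string) []string {
	<-s.initialized
	s.mu.Lock()
	defer s.mu.Unlock()
	var stale []string
	for _, name := range existing {
		if _, ok := s.cfgs[name]; !ok {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

// runManager starts the three jobs and waits until both event channels close.
func runManager(s *store, snapshot map[string]string, nodeEvents, cfgEvents <-chan event) {
	var wg sync.WaitGroup
	consume := func(events <-chan event) {
		defer wg.Done()
		<-s.initialized // assumption: live events apply on top of the snapshot
		for ev := range events {
			s.apply(ev)
		}
	}
	wg.Add(3)
	go func() { // job 1: rebuild the snapshot, then mark the table initialized
		defer wg.Done()
		for node, owner := range snapshot {
			s.apply(event{node: node, owner: owner})
		}
		close(s.initialized)
	}()
	go consume(nodeEvents) // job 2: CiliumNode changes
	go consume(cfgEvents)  // job 3: CiliumNetworkDriverClusterConfig changes
	wg.Wait()
}

func main() {
	s := newStore()
	nodeEvents := make(chan event, 1)
	cfgEvents := make(chan event, 1)
	nodeEvents <- event{node: "n2", owner: "gpu"}
	cfgEvents <- event{node: "n1", deleted: true}
	close(nodeEvents)
	close(cfgEvents)
	runManager(s, map[string]string{"n1": "gpu"}, nodeEvents, cfgEvents)
	fmt.Println(s.prune([]string{"n1", "n2", "leftover"})) // [leftover n1]
}
```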
Separate the IPAM part of the driver and the configuration management part into two different groups, and include both in the Cilium operator network driver module.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Force-pushed from c187d7f to f994557.
```go
		continue
	}

	if _, _, err := p.NodeConfigs.Insert(wtxn, &driverNodeConfig{
```
when the operator restarts, we end up with another write for a config that already exists. it'd be great if we could prevent this, as it causes some (avoidable) churn in the kube api on every operator restart
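One way to avoid the restart churn is to compare the desired entry with what is already stored and skip the insert when nothing changed. The sketch assumes the table (or a cache in front of it) is seeded from the objects already present in the API server at startup; a map stands in for the stateDB table, and `nodeConfig` and `upsertIfChanged` are hypothetical names. Using a comparable struct lets `==` do the equality check.

```go
package main

import "fmt"

// nodeConfig is a hypothetical, comparable model of a node config.
type nodeConfig struct {
	Node  string
	Owner string
	Spec  string
}

// upsertIfChanged stores want only when it differs from the current entry,
// returning true when a write (and therefore an API update) is needed.
func upsertIfChanged(table map[string]nodeConfig, want nodeConfig) bool {
	if cur, ok := table[want.Node]; ok && cur == want {
		return false
	}
	table[want.Node] = want
	return true
}

func main() {
	// After a restart, the table is seeded from the objects already in the API server.
	table := map[string]nodeConfig{
		"n1": {Node: "n1", Owner: "gpu", Spec: "v1"},
	}
	fmt.Println(upsertIfChanged(table, nodeConfig{Node: "n1", Owner: "gpu", Spec: "v1"})) // false
	fmt.Println(upsertIfChanged(table, nodeConfig{Node: "n1", Owner: "gpu", Spec: "v2"})) // true
}
```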
```go
	))

	p.JobGroup.Add(job.OneShot(
		"network-driver-node-handler",
```
when a node's labels change so that it no longer matches any config, the existing config selection for that node is not cleaned up
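A fix for the label-change case is to re-evaluate the node's owner on every update, not only on creation, and to delete the node config when nothing matches anymore. The sketch assumes equality-only selectors and a hypothetical `applied` map of non-conflicting configs; `reconcileNode` also reports whether anything changed, so unrelated label edits do not trigger a write.

```go
package main

import "fmt"

// matches reports whether every selector key/value pair is present in lbls.
func matches(sel, lbls map[string]string) bool {
	for k, v := range sel {
		if got, ok := lbls[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// reconcileNode re-evaluates a single node after any label change and
// returns true if its node config had to be created, updated or removed.
// applied maps non-conflicting cluster config names to their selectors.
func reconcileNode(nodeCfgs map[string]string, applied map[string]map[string]string, node string, lbls map[string]string) bool {
	owner := ""
	for name, sel := range applied {
		if matches(sel, lbls) {
			owner = name
			break // applied configs do not overlap, so at most one matches
		}
	}
	cur, had := nodeCfgs[node]
	switch {
	case owner == "" && had:
		delete(nodeCfgs, node) // no longer selected: clean up the stale config
		return true
	case owner != "" && owner != cur:
		nodeCfgs[node] = owner
		return true
	}
	return false
}

func main() {
	applied := map[string]map[string]string{"gpu": {"pool": "gpu"}}
	nodeCfgs := map[string]string{"n1": "gpu"}
	// n1 is relabeled and no longer matches any applied config.
	changed := reconcileNode(nodeCfgs, applied, "n1", map[string]string{"pool": "cpu"})
	fmt.Println(changed, len(nodeCfgs)) // true 0
}
```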
```go
	var conflictErrs []error

	for ciliumNode := range ciliumNodes {
		if !clusterCfg.NodeSelector.Matches(labels.Set(ciliumNode.Labels)) {
```
in case we have a config without a selector, it would be marked as conflicting. do we want to allow the user to specify a catch-all config, instead of requiring selectors?
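One possible rule for a catch-all config is "most specific selector wins": an empty selector becomes a default that any config with a matching selector overrides, and a conflict is raised only when equally specific configs match the same node. This is a hypothetical policy, not what the PR implements. Counting selector terms is a crude specificity measure; a stricter rule would only prefer a selector that is a superset of the other.

```go
package main

import "fmt"

// ClusterConfig is a hypothetical config with an equality-based selector.
type ClusterConfig struct {
	Name     string
	Selector map[string]string
}

// matches reports whether every selector key/value pair is present in lbls.
// An empty or nil selector matches every node.
func matches(sel, lbls map[string]string) bool {
	for k, v := range sel {
		if got, ok := lbls[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// pickConfig chooses the config for a node with labels lbls. The matching
// selector with the most key/value terms wins, so an empty selector acts as
// a catch-all default. conflict is true only when two or more equally
// specific configs match; name is "" when nothing matches.
func pickConfig(cfgs []ClusterConfig, lbls map[string]string) (name string, conflict bool) {
	best := -1
	for _, c := range cfgs {
		if !matches(c.Selector, lbls) {
			continue
		}
		n := len(c.Selector)
		switch {
		case n > best:
			best, name, conflict = n, c.Name, false
		case n == best:
			conflict = true
		}
	}
	if conflict {
		return "", true
	}
	return name, false
}

func main() {
	cfgs := []ClusterConfig{
		{Name: "default"}, // no selector: catch-all
		{Name: "gpu", Selector: map[string]string{"pool": "gpu"}},
	}
	fmt.Println(pickConfig(cfgs, map[string]string{"pool": "gpu"})) // gpu false
	fmt.Println(pickConfig(cfgs, map[string]string{"pool": "cpu"})) // default false
}
```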
also, i wonder if we could support config overrides by the user (i.e. manually creating a node config that targets a node) and have the operator handle that gracefully
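Supporting user overrides mostly means the operator must only touch node configs it manages. The sketch marks operator-managed objects with a hypothetical `ManagedBy` value (on real objects this could be a label or an ownerReference) and leaves any other node config alone, both when writing and when deleting. `managedByOperator`, `applyDesired` and `removeDesired` are illustrative names only.

```go
package main

import "fmt"

// managedByOperator is a hypothetical marker value; on real objects it could
// be carried by a label or an ownerReference.
const managedByOperator = "cilium-operator"

// nodeConfig is a hypothetical model of a CiliumNetworkDriverNodeConfig.
type nodeConfig struct {
	Owner     string // generating cluster config, empty for user-created objects
	ManagedBy string
}

// applyDesired writes the operator's config for node unless a user-created
// override already exists, in which case the override is left untouched.
func applyDesired(cfgs map[string]nodeConfig, node, owner string) bool {
	if cur, ok := cfgs[node]; ok && cur.ManagedBy != managedByOperator {
		return false
	}
	cfgs[node] = nodeConfig{Owner: owner, ManagedBy: managedByOperator}
	return true
}

// removeDesired deletes the node config only if the operator manages it.
func removeDesired(cfgs map[string]nodeConfig, node string) bool {
	if cur, ok := cfgs[node]; ok && cur.ManagedBy == managedByOperator {
		delete(cfgs, node)
		return true
	}
	return false
}

func main() {
	cfgs := map[string]nodeConfig{
		"n1": {ManagedBy: "user"}, // manually created override
	}
	fmt.Println(applyDesired(cfgs, "n1", "gpu"), applyDesired(cfgs, "n2", "gpu")) // false true
	fmt.Println(removeDesired(cfgs, "n1"), removeDesired(cfgs, "n2"))             // false true
}
```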
```go
	))

	p.JobGroup.Add(job.OneShot(
		"network-driver-cluster-config-handler",
```
noticed that when a config is marked as conflicting and we then delete the other configs it conflicts with, we do not reconcile it and clear the conflict.
in addition, it seems that selection is not triggered again: when we delete the old config, we never re-evaluate the current state to create a new CiliumNetworkDriverNodeConfig if necessary
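A targeted fix is to requeue, on every cluster config deletion, the remaining configs that shared nodes with the deleted one, and run selection and conflict detection for them again. The sketch assumes equality-only selectors; `configsToRequeue` and the types are hypothetical. A simpler alternative would be to recompute the selection for all configs on every delete, at the cost of more work per event.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterConfig is a hypothetical config with an equality-based selector.
type ClusterConfig struct {
	Name     string
	Selector map[string]string
}

// Node is a hypothetical, simplified CiliumNode.
type Node struct {
	Name   string
	Labels map[string]string
}

// matches reports whether every selector key/value pair is present in lbls.
func matches(sel, lbls map[string]string) bool {
	for k, v := range sel {
		if got, ok := lbls[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// configsToRequeue returns, sorted, the remaining cluster configs that share
// at least one node with a config that was just deleted. Re-running
// selection for them lets the operator clear a stale conflict condition and
// create the node configs that the conflict had blocked.
func configsToRequeue(deleted ClusterConfig, remaining []ClusterConfig, nodes []Node) []string {
	var shared []Node
	for _, n := range nodes {
		if matches(deleted.Selector, n.Labels) {
			shared = append(shared, n)
		}
	}
	var out []string
	for _, c := range remaining {
		for _, n := range shared {
			if matches(c.Selector, n.Labels) {
				out = append(out, c.Name)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	nodes := []Node{
		{Name: "n1", Labels: map[string]string{"pool": "gpu", "zone": "a"}},
		{Name: "n2", Labels: map[string]string{"pool": "cpu", "zone": "b"}},
	}
	deleted := ClusterConfig{Name: "gpu", Selector: map[string]string{"pool": "gpu"}}
	remaining := []ClusterConfig{
		{Name: "zone-a", Selector: map[string]string{"zone": "a"}}, // conflicted with "gpu" on n1
		{Name: "cpu", Selector: map[string]string{"pool": "cpu"}},
	}
	fmt.Println(configsToRequeue(deleted, remaining, nodes)) // [zone-a]
}
```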
-- WORK IN PROGRESS --