network driver: add nodeSelector to CiliumNetworkDriverConfig CRD#43797
network driver: add nodeSelector to CiliumNetworkDriverConfig CRD#43797bersoare wants to merge 41 commits intocilium:feature/dra-driverfrom
Conversation
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
network-driver is a new feature introduced that enables exposing unmanaged network interfaces to pods. adding sr-iov interface support and basic netdev configuration inside the pod Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Add back missing volume mount for /var/run/nri. Without this the plugin is unable to connect to the NRI socket and fails with the following error: ``` msg="NRI plugin failed" module=agent.controlplane.network-driver error="failed to connect to NRI service: dial unix /var/run/nri/nri.sock: connect: no such file or directory" name=dummy.cilium.k8s.io ``` Fixes: e663e3c ("network-driver: add new network-driver feature pkg") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
If the start of the NRI plugin fails, wait a second before retrying to restart it. This avoid consuming too much CPU in case of a persistent failure. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
In commit e663e3c the network driver has been reworked to add a modular structure. Add a module to support dummy devices and an example configuration to start the dumme devices support. Related: e663e3c ("network-driver: add new network-driver feature pkg") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
as part of allowing the Network Driver to be configured by an operator, add the CRD spec so we can have configurations deployed to the cluster adding the CRD definitions, as well as the generated code Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
collect the CRD for the network driver configuration with the sysdump command Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
NRI and DRA frameworks are not available in older kube versions, so perform a version check for mounting the paths. Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
will likely fail for older versions. ensure we're in at least kube v1.34 Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
maps make it hard to change the json tag for the keys, and using an array makes the CRD looks cleaner as we can decouple the field name from the json tag without having to implement a custom marshaler. also adding some default values that were missing Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
no longer use the config types we were using previously, since now we want to get the config from the crd Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
adjusting the current config validation to validate the struct from the CRD removed the config types the code was previously using Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
get and use configuration from the CRD Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
agent now waits for a config at any point in time and starts the network driver if one ever arrives. also encapsulated the containerd and dra initialization logic to make the start sequence more readable. no functional changes were made updates to configuration aren't handled yet. Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
continue reading from the channel even after the first configuration is received, and log any further update attempts Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
check for an empty config before accessing the config definitions (and avoid a nil pointer access) Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
In case of an agent restart without a change to the CiliumNetworkDriverConfig we are going to receive a Sync after an Upsert for the resource. In that case, we should read the last available config version and enable the driver accordingly. Conversely, if a new version of the CiliumNetworkDriverConfig is upserted after the initial sync, we should handle that version. The commit updates the loop reading the events to correctly handle both scenarios (upsert + sync and sync + upsert) while discarding further attempts of updating the config. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
with this we can write tests more easily without the need of an sr-iov capable nic Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
using netlink attributes to find interface name and pf name. some error handling improvements Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
added some more test coverage, now ensuring that the attributes match the expectation Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
this is so we can access and read/write device sr-iov attributes Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
now using a single sysfs path to simplify the mount structure small refactor to encapsulate the file reading/writing logic added some additional unit tests to cover sriov vf setup Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
sysfs mock in the sr-iov package has circular links, and grep complains about it. this commit excludes the testdata directory from the grep command Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
This is useful for a subsequent commit where we need to JSON marshal and marshal each devices to restore the driver status after a restart. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
In order to allow the restoration of all device allocations after an agent restart, all device types should implement binary marshalling and unmarshalling. The Device interface leaves flexibility about how the devices should be encoded, but for the sake of simplicity both Dummy and SR-IOV devices relies on JSON. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Add support for device allocations restore after a restart. This is done writing in the ResourceClaim Status (namely in the Status.AllocatedDeviceStatus[].Data field) the binary serialization of each allocated device and, after an agent restart, listing all the ResourceClaims referenced by the local pods and unmarshalling all the allocations found. Moreover, the interface name and IP addresses of each allocated device is also written in Status.AllocatedDeviceStatus[].NetworkData for improved visibility of the configuration parameters. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Use the correct DeviceManagerTypeDummy type when loading the dummy manager configuration from the CiliumNetworkDriverConfig resource. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
YutaroHayakawa
left a comment
There was a problem hiding this comment.
Reviewed as a sig-k8s. I have some concerns about how we handle the conflicting nodeSelector. I think it's just a matter of defining how we handle that. Once we define it, it's also worth adding it to the CRD's field description and the documentation.
add selector field to the CiliumNetworkDriverConfig CRD Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
select a config based on the nodeSelector field Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
use a more deterministic config selection logic 1- selects most specific match - more selectors wins 2- if tie, choose the oldest one 3- fallback to config with no selectors Signed-off-by: Bernardo Soares <20172413+bersoare@users.noreply.github.com>
076d5e7 to
cc70079
Compare
|
@YutaroHayakawa , thanks for the feedback. i implemented some configuration selection logic in cc70079#diff-f1e0656d95af1cea68f8bca36588e5e2ab3e6e2b7affa572ae0f9a0d6bbe224dR84-R107 basically, the agent now
let me know what you think. |
| cfg = ev.Object.Spec.DeepCopy() | ||
| upserted = true | ||
| if ev.Object.Spec.NodeSelector == nil { | ||
| cfgs[ev.Object.GetUID()] = ev.Object.DeepCopy() |
There was a problem hiding this comment.
Nit: Since this map only contains the single kind of resource, UID is overkill. The name would uniquely identifies the resource among same kind.
| // discard updates if we already handled a config | ||
| if handled { | ||
| driver.logger.InfoContext( | ||
| ctx, "config received, but we already have one", | ||
| ) |
There was a problem hiding this comment.
Not related to this PR, but looks like here we ignore any update for the resource after we are done with the initial list. It also doesn't handle the resource deletion. When the cfg == nil, it provides nil configuration to the channel and never accept any configuration even if users provide it. Are you sure this is intended?
There was a problem hiding this comment.
afaict, the resoure deletion is handled as the config is removed from the map, and the rest of the code outside of the select block still runs.
yes, right now we don't allow further updates once we find a config.
the cfg == nil case looks wrong, as you pointed out. we should only pass a config if it is non nil. i'll fix that
There was a problem hiding this comment.
yes, right now we don't allow further updates once we find a config.
I think this behavior is very tricky in some state transition case. For example.
- User created the configuration. It is selected by the node.
- User updated the configuration. The change doesn't take effect as we ignore the update.
- User restarted the agent. Now the change takes effect.
Another case would be:
- User create the configuration. It is selected by the node.
- User deleted the configuration. The change doesn't take effect as we just delete the configuration from the internal map and don't do any cleanup for the exposed resources.
- User create another configuration potentially with different settings. The stale resources are still there. What happens in this case?
|
moving this to draft until we make a decision on how to handle configuration selection/updates |
63c1cd0 to
215ff6c
Compare
69cee71 to
ffb12cc
Compare
|
This pull request has been automatically marked as stale because it |
|
superseded by #44761 |
With the Cilium Network Driver, we're likely dealing with specialized HW (ex: sr-iov capabiel nic), and such resources might not be present on all the nodes on a cluster. In other words, nodes may or may not need to run the Network Driver at all.
By adding a nodeSelector to the config we can control which nodes (or set of nodes) will receive a configuration and start the network driver.