Cilium can miss NodeDelete events from Kubernetes, e.g. when nodes are deleted during an upgrade while the Cilium agent is down. This leaks resources (e.g. xfrm states/policies, node IDs) if a Cilium subsystem that subscribes to node events doesn't implement reconciliation/garbage collection. #26298 (comment) offers a handful of steps to reproduce the bug.
Some subsystems, e.g. the WireGuard agent, already implement garbage collection for themselves to a certain degree. However, any feature that depends on the linuxNodeHandler for cleanup suffers from the issue described above.
This issue should track the design and development of a reconciler that keeps the set of nodes known to the Cilium agent in sync with the nodes currently in the Kubernetes cluster. The reconciler should trigger the cleanup of node-specific resources and unify garbage collection across all subsystems that subscribe to the NodeManager.
An implementation approach could be based on the Generic Reconciler that was merged into the Cilium code base for 1.16.
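
To make the intended behavior concrete, here is a minimal sketch of the core reconciliation step: compare the nodes the agent believes exist against the nodes the cluster reports, then replay a delete for every node that is no longer present. It is plain Go and does not use the Generic Reconciler API. The `NodeHandler` interface, `staleNodes`, `reconcile`, and `recordingHandler` are hypothetical names chosen for illustration, not existing Cilium types.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeHandler is a hypothetical stand-in for a subsystem that subscribes to
// node events. It is NOT Cilium's actual handler interface.
type NodeHandler interface {
	NodeDelete(name string) error
}

// staleNodes returns, in sorted order, every name in known that is absent
// from cluster. These are the nodes whose delete event was missed.
func staleNodes(known, cluster []string) []string {
	live := make(map[string]struct{}, len(cluster))
	for _, n := range cluster {
		live[n] = struct{}{}
	}
	var stale []string
	for _, n := range known {
		if _, ok := live[n]; !ok {
			stale = append(stale, n)
		}
	}
	sort.Strings(stale)
	return stale
}

// reconcile replays a delete for each stale node on every handler. A node is
// reported as cleaned only if all handlers succeeded; failures are collected
// so that a later reconciliation round can retry them.
func reconcile(known, cluster []string, handlers []NodeHandler) (cleaned []string, errs []error) {
	for _, name := range staleNodes(known, cluster) {
		ok := true
		for _, h := range handlers {
			if err := h.NodeDelete(name); err != nil {
				errs = append(errs, fmt.Errorf("node %s: %w", name, err))
				ok = false
			}
		}
		if ok {
			cleaned = append(cleaned, name)
		}
	}
	return cleaned, errs
}

// recordingHandler is a toy handler that only remembers which nodes it was
// asked to clean up.
type recordingHandler struct {
	deleted []string
}

func (r *recordingHandler) NodeDelete(name string) error {
	r.deleted = append(r.deleted, name)
	return nil
}

func main() {
	// State restored by the agent after a restart (assumed input).
	known := []string{"node-a", "node-b", "node-c"}
	// Nodes currently reported by the Kubernetes API (assumed input).
	cluster := []string{"node-a", "node-c"}

	h := &recordingHandler{}
	cleaned, errs := reconcile(known, cluster, []NodeHandler{h})
	fmt.Println("cleaned:", cleaned, "errors:", len(errs))
}
```

A real implementation would additionally need to run this on agent startup and periodically, and to wait until the node list from Kubernetes is fully synced before treating a node as stale. Otherwise an incomplete list would cause live nodes to be cleaned up.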