Description
When using chained CNI plugins, if pod creation fails because of an error in a later CNI plugin, an IP address allocated by an earlier plugin in the chain can be leaked on every sandbox-creation attempt. Consequently, a large number of IP addresses can be allocated and never cleaned up from a single pod creation in a relatively short amount of time, until no IP addresses remain available.
Steps to reproduce the issue
This occurs when the following sequence of events happens:
- During sandbox creation, `CNI ADD` is called for the CNI plugin chain.
- The first plugin (e.g., `host-local` IPAM) successfully allocates an IP address.
- A subsequent plugin in the chain fails the `CNI ADD` operation, causing the overall `setupPodNetwork` step to fail.
- Containerd then attempts to clean up the failed sandbox.
- During cleanup, `teardownPodNetwork` is called, which invokes `CNI DEL` on the plugin chain.
- The same CNI plugin that failed on `ADD` also fails on `DEL` (in practice this may be transient, e.g., the CNI provider may be temporarily unable to process the request).
- Because the teardown failed, and the initial setup also failed (`CNIResult` was never populated), containerd ignores the teardown error and considers network cleanup finished.
- As a result, `CNI DEL` is never sent to the first plugin, and the IP address it allocated is permanently leaked. Kubelet retries creating the pod, leading to multiple leaked IPs from a single pod creation attempt.
Describe the results you received and expected
In the case illustrated above, a Kubernetes node will experience IP leakage and, in the worst case, exhaust all available IPs on the node in a short amount of time.
More generally, containerd appears fragile to churn/errors in CNI plugins, especially plugins used in a chained configuration.
Ask: the logic that swallows the `cleanupErr` should be reconsidered. Even if the initial setup failed, a teardown error indicates that some resources might be left behind, and cleanup should be retried or handled more robustly.
What version of containerd are you using?
Containerd v1.7.22+ or v1.6.37+.
Any other relevant information
The skipping behavior for teardownPodNetwork was added in #10744 and got cherrypicked into 1.7 and 1.6.
Show configuration if it is related to CRI plugin.
N/A