
IP Address Leak with Chained CNI Plugins When Teardown Fails #12130

@MrHohn

Description


When using chained CNI plugins, if a pod fails to be created due to an error in a later CNI plugin, an IP address can be leaked from an earlier plugin in the chain on every iteration of sandbox creation. Consequently, a large number of IP addresses can be allocated and never cleaned up from a single pod creation in a relatively short amount of time, until no IP addresses are available.

Steps to reproduce the issue

This occurs when the following sequence of events happens:

  1. During sandbox creation, CNI ADD is called for the CNI plugin chain.
  2. The first plugin (e.g., host-local IPAM) successfully allocates an IP address.
  3. A subsequent plugin in the chain fails the CNI ADD operation. This causes the overall setupPodNetwork step to fail.
  4. Containerd then attempts to clean up the failed sandbox.
  5. During cleanup, teardownPodNetwork is called, which invokes CNI DEL on the plugin chain.
  6. The same CNI plugin that failed on ADD also fails on DEL (note that in practice this might be transient, e.g. the CNI provider may happen to be overloaded and unable to process the request).
  7. Because the teardown failed, and the initial setup also failed (CNIResult was never populated), containerd ignores the teardown error and considers network cleanup finished.
  8. As a result, the CNI DEL command is never sent to the first plugin, and the IP address it allocated is permanently leaked. Kubelet retries creating the pod, leading to multiple leaked IPs for a single pod creation attempt.

Describe the results you received and expected

In the case illustrated above, a Kubernetes node will experience IP leakage and, in the worst case, burn through all available IPs on the node in a short amount of time.

More generally, it seems containerd is fragile in the face of churn/errors in CNI plugins, especially ones used in a chained pattern.

Ask: The logic that swallows the cleanupErr should be reconsidered. Even if the initial setup failed, a teardown error indicates that some resources might be left behind, and cleanup should be retried or handled more robustly.

What version of containerd are you using?

Containerd v1.7.22+ or v1.6.37+.

Any other relevant information

The skipping behavior for teardownPodNetwork was added in #10744 and got cherrypicked into 1.7 and 1.6.

Show configuration if it is related to CRI plugin.

N/A


Labels

area/cri (Container Runtime Interface (CRI)), kind/bug
