
IP Address Leak with Chained CNI Plugins When Teardown Fails #12130

@MrHohn

Description


When using chained CNI plugins, if a pod fails to be created due to an error in a later CNI plugin, an IP address can be leaked from an earlier plugin in the chain on every iteration of sandbox creation. Consequently, a large number of IP addresses can be allocated and never cleaned up from a single pod creation in a relatively short amount of time, until no IP addresses are available.

Steps to reproduce the issue

This occurs when the following sequence of events happens:

  1. During sandbox creation, CNI ADD is called for the CNI plugin chain.
  2. The first plugin (e.g., host-local IPAM) successfully allocates an IP address.
  3. A subsequent plugin in the chain fails the CNI ADD operation. This causes the overall setupPodNetwork step to fail.
  4. Containerd then attempts to clean up the failed sandbox.
  5. During cleanup, teardownPodNetwork is called, which invokes CNI DEL on the plugin chain.
  6. The same CNI plugin that failed on ADD also fails on DEL (note that in practice this might be transient, e.g. the CNI provider may happen to be overloaded and unable to process the request).
  7. Because the teardown failed, and the initial setup also failed (CNIResult was never populated), containerd ignores the teardown error and considers network cleanup finished.
  8. As a result, the CNI DEL command is never sent to the first plugin, and the IP address it allocated is permanently leaked. Kubelet retries creating the pod, leading to multiple leaked IPs for a single pod creation attempt.

Describe the results you received and expected

In the case illustrated above, a Kubernetes node will experience IP leakage and, in the worst case, burn through all available IPs on the node in a short amount of time.

More generally, it seems containerd is fragile in the face of churn/errors in CNI plugins, especially ones used in a chained pattern.

Ask: The logic that swallows the cleanupErr should be reconsidered. Even if the initial setup failed, a teardown error indicates that some resources might be left behind, and cleanup should be retried or handled more robustly.

What version of containerd are you using?

Containerd v1.7.22+ or v1.6.37+.

Any other relevant information

The skipping behavior for teardownPodNetwork was added in #10744 and got cherrypicked into 1.7 and 1.6.

Show configuration if it is related to CRI plugin.

N/A


Labels

area/cri (Container Runtime Interface (CRI)), kind/bug
