Closed
Description
We see a problem in production where containerd may leak IP addresses on the node.
Steps to reproduce the issue:
- When pod network setup is quite slow, `RunPodSandbox` may time out or fail;
- Once `RunPodSandbox` fails, it tries to tear down the pod network in a defer;
- However, because CNI is slow, the teardown also fails;
- At this point, the pod sandbox is gone, but the network has not been properly torn down.
Proposed solution
We should probably change how RunPodSandbox works.
It should:
- Create the sandbox container first;
- Set up the network for the sandbox container;
- Create the sandbox container task.
In this way, when there is any issue in `RunPodSandbox`, we can still try to clean up in the defer. However, if any cleanup step fails, the sandbox container on disk still represents the sandbox, and kubelet will try to guarantee that it is properly cleaned up eventually.
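
The proposed ordering can be sketched roughly as follows. This is only an illustration of the idea, not containerd's actual code: `Store`, `Steps`, `ErrTimeout`, and this simplified `RunPodSandbox` signature are all hypothetical stand-ins. The key point is that the sandbox record is created before the network is set up, and the deferred cleanup removes the record only if the network teardown succeeded.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTimeout is a hypothetical error standing in for a slow or failing CNI call.
var ErrTimeout = errors.New("CNI operation timed out")

// Store is a hypothetical stand-in for the persisted sandbox container records.
type Store struct {
	records map[string]bool
}

// NewStore returns an empty record store.
func NewStore() *Store {
	return &Store{records: make(map[string]bool)}
}

// Has reports whether a sandbox record still exists.
func (s *Store) Has(id string) bool {
	return s.records[id]
}

// Steps bundles hypothetical hooks for the operations named in the proposal.
type Steps struct {
	SetupNetwork    func(id string) error
	TeardownNetwork func(id string) error
	CreateTask      func(id string) error
}

// RunPodSandbox is a simplified sketch of the proposed ordering:
// record first, then network, then task.
func RunPodSandbox(store *Store, id string, steps Steps) (retErr error) {
	// 1. Create the sandbox container record first, so the sandbox
	//    stays visible even if a later step or the cleanup fails.
	store.records[id] = true

	defer func() {
		if retErr == nil {
			return
		}
		// Best-effort cleanup. If the teardown fails, keep the record
		// so a later stop/remove request can retry the teardown.
		if err := steps.TeardownNetwork(id); err != nil {
			retErr = fmt.Errorf("%w; network teardown failed, sandbox %q kept for later cleanup: %v", retErr, id, err)
			return
		}
		delete(store.records, id)
	}()

	// 2. Set up the network for the sandbox container.
	if err := steps.SetupNetwork(id); err != nil {
		return fmt.Errorf("set up network: %w", err)
	}

	// 3. Create the sandbox container task.
	if err := steps.CreateTask(id); err != nil {
		return fmt.Errorf("create task: %w", err)
	}
	return nil
}
```

In this sketch, a run where both `SetupNetwork` and `TeardownNetwork` return `ErrTimeout` leaves the record in `Store`, which models the case where kubelet can still see the sandbox and ask for it to be cleaned up later.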