Skip to content

Conversation

@tnqn
Copy link
Contributor

@tnqn tnqn commented Jun 4, 2021

Similar to other deferred cleanup operations, teardownPodNetwork should use a different context as the original context may have expired, otherwise CNI wouldn't been invoked, leading to leak of network resources, e.g. IP addresses.

Signed-off-by: Quan Tian qtian@vmware.com

Fixes #5438

Similar to other deferred cleanup operations, teardownPodNetwork should
use a different context as the original context may have expired,
otherwise CNI wouldn't been invoked, leading to leak of network
resources, e.g. IP addresses.

Signed-off-by: Quan Tian <qtian@vmware.com>
@k8s-ci-robot
Copy link

Hi @tnqn. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tnqn
Copy link
Contributor Author

tnqn commented Jun 4, 2021

We saw IP addresses were leaked when using Containerd 1.4.4 and Antrea as the CNI, the symptom is that containerd cannot create sandbox container successfully after running a while, failed because of the error:

msg="Failed to destroy network for sandbox \"27902e644b711ed94ac9699f3f588e98a5d28d1dd6a8a25c2f6e829f2df49da3\"" error="netplugin failed with no error message: context canceled"
msg="RunPodSandbox for &PodSandboxMetadata{Name:nginx-rs-new-7d95d89897-nqkcl,Uid:1f5c3c02-b06e-4985-9a65-46f654b0fc57,Namespace:default,Attempt:0,} failed, error" error="failed to start sandbox container task \"27902e644b711ed94ac9699f3f588e98a5d28d1dd6a8a25c2f6e829f2df49da3\": context canceled: unknown"

The reason that it failed to start sandbox container task may be opencontainers/runc#2865 and may have been fixed by opencontainers/runc#2871.
But on CNI side it didn't receive any CNI DEL call when containerd logged "netplugin failed with no error". I think it's because the context already expired so it failed immediately. #5438 had the same symptom with Weave as the CNI.
Other cleanup operations use a new context to avoid this issue, I think it should apply to CNI cleanup as well.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jun 4, 2021

Build succeeded.

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on green

@dims
Copy link
Member

dims commented Jun 7, 2021

/ok-to-test

@dims
Copy link
Member

dims commented Jun 7, 2021

@fuweid let's make sure k8s CI jobs go green as well.

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp estesp merged commit 7d77b51 into containerd:master Jun 7, 2021
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

late with my LGTM.. thx!

@estesp estesp added cherry-picked/1.5.x PR commits are cherry-picked into release/1.5 branch and removed cherry-pick/1.5.x labels Jun 10, 2021
estesp added a commit that referenced this pull request Jun 11, 2021
[release/1.5] cherry pick #5569 to release 1.5 - Fix cleanup context of teardownPodNetwork
@fuweid fuweid added cherry-picked/1.4.x PR commits are cherry picked into the release/1.4 branch and removed cherry-picked/1.4.x PR commits are cherry picked into the release/1.4 branch labels Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/1.5.x PR commits are cherry-picked into release/1.5 branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CRI fails to invoke CNI plugin to teardown network when RunPodSandbox times out

6 participants