Skip to content

Fix cleanup context of teardownPodNetwork#5569

Merged
estesp merged 1 commit into
containerd:masterfrom
tnqn:teardown-network-timeout
Jun 7, 2021
Merged

Fix cleanup context of teardownPodNetwork#5569
estesp merged 1 commit into
containerd:masterfrom
tnqn:teardown-network-timeout

Conversation

@tnqn

@tnqn tnqn commented Jun 4, 2021

Copy link
Copy Markdown
Contributor

Similar to other deferred cleanup operations, teardownPodNetwork should use a different context as the original context may have expired, otherwise CNI wouldn't been invoked, leading to leak of network resources, e.g. IP addresses.

Signed-off-by: Quan Tian qtian@vmware.com

Fixes #5438

Similar to other deferred cleanup operations, teardownPodNetwork should
use a different context as the original context may have expired,
otherwise CNI wouldn't been invoked, leading to leak of network
resources, e.g. IP addresses.

Signed-off-by: Quan Tian <qtian@vmware.com>
@k8s-ci-robot

Copy link
Copy Markdown

Hi @tnqn. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tnqn

tnqn commented Jun 4, 2021

Copy link
Copy Markdown
Contributor Author

We saw IP addresses were leaked when using Containerd 1.4.4 and Antrea as the CNI, the symptom is that containerd cannot create sandbox container successfully after running a while, failed because of the error:

msg="Failed to destroy network for sandbox \"27902e644b711ed94ac9699f3f588e98a5d28d1dd6a8a25c2f6e829f2df49da3\"" error="netplugin failed with no error message: context canceled"
msg="RunPodSandbox for &PodSandboxMetadata{Name:nginx-rs-new-7d95d89897-nqkcl,Uid:1f5c3c02-b06e-4985-9a65-46f654b0fc57,Namespace:default,Attempt:0,} failed, error" error="failed to start sandbox container task \"27902e644b711ed94ac9699f3f588e98a5d28d1dd6a8a25c2f6e829f2df49da3\": context canceled: unknown"

The reason that it failed to start sandbox container task may be opencontainers/runc#2865 and may have been fixed by opencontainers/runc#2871.
But on CNI side it didn't receive any CNI DEL call when containerd logged "netplugin failed with no error". I think it's because the context already expired so it failed immediately. #5438 had the same symptom with Weave as the CNI.
Other cleanup operations use a new context to avoid this issue, I think it should apply to CNI cleanup as well.

@theopenlab-ci

theopenlab-ci Bot commented Jun 4, 2021

Copy link
Copy Markdown

Build succeeded.

@fuweid fuweid left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on green

@dims

dims commented Jun 7, 2021

Copy link
Copy Markdown
Member

/ok-to-test

@dims

dims commented Jun 7, 2021

Copy link
Copy Markdown
Member

@fuweid let's make sure k8s CI jobs go green as well.

@estesp estesp left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp estesp merged commit 7d77b51 into containerd:master Jun 7, 2021

@mikebrow mikebrow left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

late with my LGTM.. thx!

@estesp estesp added cherry-picked/1.5.x PR commits are cherry-picked into release/1.5 branch and removed cherry-pick/1.5.x labels Jun 10, 2021
estesp added a commit that referenced this pull request Jun 11, 2021
[release/1.5] cherry pick #5569 to release 1.5 - Fix cleanup context of teardownPodNetwork
@fuweid fuweid added cherry-picked/1.4.x PR commits are cherry picked into the release/1.4 branch and removed cherry-picked/1.4.x PR commits are cherry picked into the release/1.4 branch labels Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/1.5.x PR commits are cherry-picked into release/1.5 branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CRI fails to invoke CNI plugin to teardown network when RunPodSandbox times out

6 participants