Add check for CNI plugins before tearing down pod network#10744
mxpv merged 1 commit into containerd:main
Conversation
Signed-off-by: Sameer <sameer.saeed@live.ca>
Hi @sameersaeed. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label.

I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
/test pull-containerd-k8s-e2e-ec2
Additionally, the following needs to be fixed in a follow-up PR, as the deferred teardown will also fail and "block" the netns removal: https://github.com/containerd/containerd/blob/release/1.6/pkg/cri/server/sandbox_run.go#L312. @fuweid FYI. We probably should not be deferring teardown before the network bring-up, or if we do, we should ignore the teardown failure when setup fails. The same applies to the 1.6 and 1.7 release branches.
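The ordering issue described above can be sketched in a few lines. This is a minimal, self-contained illustration, not containerd's actual code: `setupPodNetwork` and `teardownPodNetwork` are hypothetical stand-ins that both fail, mimicking a host with no CNI plugins initialized.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for containerd's network setup/teardown; both fail
// here, mimicking a host with no CNI plugins initialized.
func setupPodNetwork() error    { return errors.New("cni plugin not initialized") }
func teardownPodNetwork() error { return errors.New("cni plugin not initialized") }

// runWithEarlyDefer mirrors the problematic ordering: teardown is deferred
// before setup has succeeded, so a failed setup still triggers a teardown
// that also fails and can block netns removal.
func runWithEarlyDefer() (setupErr, teardownErr error) {
	defer func() {
		teardownErr = teardownPodNetwork()
	}()
	setupErr = setupPodNetwork()
	return
}

// runWithLateDefer only registers the teardown once setup has succeeded,
// so a setup failure leaves nothing to tear down.
func runWithLateDefer() (setupErr, teardownErr error) {
	if setupErr = setupPodNetwork(); setupErr != nil {
		return
	}
	defer func() {
		teardownErr = teardownPodNetwork()
	}()
	return
}

func main() {
	_, earlyTeardownErr := runWithEarlyDefer()
	_, lateTeardownErr := runWithLateDefer()
	fmt.Println("early defer ran a failing teardown:", earlyTeardownErr != nil) // true
	fmt.Println("late defer ran a teardown:", lateTeardownErr != nil)           // false
}
```

With the early defer, a failed setup still runs (and fails) teardown; deferring only after a successful setup avoids that entirely.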
Hi folks, posting here for more visibility: I just filed a bug, #12130, that is closely related to this change. Would appreciate your input there, thanks!
This is a follow-up of both containerd#10744 and containerd#10839, where teardownPodNetwork may be skipped if setting up networks failed in the first place. In the situation illustrated in containerd#12130, for the chained CNI pattern the skipping logic can be problematic and lead to network resource leakage (e.g. the Pod IP). This PR attempts to make the sandbox network handling more robust by skipping the network teardown only when the error is related to CNI initialization, while allowing a teardown retry to happen if setup already partially allocated network resources.

Signed-off-by: Zihong Zheng <zihongz@google.com>
Currently, locking behavior occurs if a user creates a sandbox pod and then tries to stop or remove it without having any CNI plugins initialized on their system.
Sample flow

1. Create `pod-config.json`, per the crictl docs
2. Run the sandbox pod and receive a CNI plugin error
3. The pod shows up as NotReady
4. Locking behavior: the pod cannot be stopped or removed due to the CNI plugin error
Changes made
Current code:
containerd/internal/cri/server/sandbox_stop.go
Lines 114 to 116 in db97449
Added a minor change to the above sandbox_stop.go code so that the pod network teardown is only performed if CNIResult returns a valid value. This avoids the locking behavior, so pods can be stopped and/or removed as expected.
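The shape of that guard can be sketched as follows. This is a simplified, hypothetical model rather than the actual `sandbox_stop.go` change: `cniResult`, `sandbox`, and `stopPodNetwork` here are stand-in names, with only the `CNIResult != nil` check mirroring the idea in the PR.

```go
package main

import "fmt"

// cniResult is a hypothetical placeholder for the result of a successful
// network setup; containerd stores something similar on the sandbox.
type cniResult struct{}

type sandbox struct {
	CNIResult *cniResult // nil when the network was never brought up
}

// stopPodNetwork sketches the guard: only attempt teardown when CNIResult
// holds a valid value, so a pod whose network setup never succeeded can
// still be stopped and removed instead of getting stuck.
func stopPodNetwork(s *sandbox) string {
	if s.CNIResult == nil {
		return "teardown skipped"
	}
	return "teardown performed"
}

func main() {
	fmt.Println(stopPodNetwork(&sandbox{}))                        // teardown skipped
	fmt.Println(stopPodNetwork(&sandbox{CNIResult: &cniResult{}})) // teardown performed
}
```

Because the teardown is skipped when no network was ever set up, the CNI-plugin error can no longer block `StopPodSandbox`/`RemovePodSandbox` for a pod that failed during creation.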