cri: fix containerd panic when can't find sandbox extension #11576
estesp merged 1 commit into containerd:main from
Conversation
|
Hi @ningmingxiao. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Force-pushed from 20abca1 to ad9d389
Force-pushed from ad9d389 to 90c144b
Force-pushed from 7d11626 to abe6dfe
    err := sbx.GetExtension(podsandbox.MetadataKey, &metadata)
    if err != nil {
        if errors.Is(err, errdefs.ErrNotFound) {
            err = c.client.SandboxStore().Delete(ctx, sbx.ID)
Can we add an error log here? Something like "could not recover sandbox X: missing metadata key"
Nod. We need to log the error message that was being returned below ("failed to get metadata for stored sandbox %q: %w", sbx.ID, err) in this ErrNotFound case as well, so we always get that output when we run into it.
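The suggestion above could look roughly like the sketch below. Everything here is a hypothetical stand-in for the containerd internals (errNotFound for errdefs.ErrNotFound, a toy sandbox type and store), showing only the pattern: log the underlying error before deleting the unrecoverable sandbox, instead of panicking.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for errdefs.ErrNotFound.
var errNotFound = errors.New("not found")

// sandbox is a hypothetical stand-in for the stored sandbox object.
type sandbox struct {
	ID   string
	meta map[string]string
}

// getExtension mimics sbx.GetExtension: it fails with errNotFound when
// the metadata key was never persisted (e.g. containerd stopped mid-write).
func getExtension(s sandbox, key string) (string, error) {
	v, ok := s.meta[key]
	if !ok {
		return "", fmt.Errorf("extension %q: %w", key, errNotFound)
	}
	return v, nil
}

// recoverSandboxes drops sandboxes whose metadata is missing instead of
// panicking, emitting a log line with the underlying error before each delete.
func recoverSandboxes(store []sandbox) (deleted, logs []string) {
	for _, sbx := range store {
		if _, err := getExtension(sbx, "metadata"); err != nil {
			if errors.Is(err, errNotFound) {
				// Log first so the failure is always visible, then delete.
				logs = append(logs, fmt.Sprintf(
					"failed to get metadata for stored sandbox %q: %v; removing it", sbx.ID, err))
				deleted = append(deleted, sbx.ID)
			}
		}
	}
	return deleted, logs
}

func main() {
	store := []sandbox{
		{ID: "good", meta: map[string]string{"metadata": "ok"}},
		{ID: "corrupt", meta: map[string]string{}},
	}
	deleted, logs := recoverSandboxes(store)
	fmt.Println(deleted)
	for _, l := range logs {
		fmt.Println(l)
	}
}
```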
|
Is there some operation that needs to be done in a transaction or something? Ideally it wouldn't be possible to introduce an inconsistency between stores that later needs to be cleaned up, just by stopping containerd at the wrong time. Feels like all these error handling PRs are just to clean up after a problem that shouldn't be occurring in the first place? |
We don't have transaction support across two stores. There is WIP to ensure we only modify one store for both the object and the object's metadata; if memory serves, the split was introduced to support new sandbox types. Additionally there was a bug in CNI network teardown of a pod that caused the deferred removal of a partially created pod to fail; stop pod was also broken when the network was never successfully set up in the first place. Having no CNI at all, or just an error in the CNI setup/teardown, would cause that problem. This was before we had optional support for doing the minimum of loopback internally. A number of PRs are all trying to fix the same problem. |
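The inconsistency described above can be shown in miniature. The two maps below are toy stand-ins for the two stores (not containerd code): without a transaction spanning both, stopping the process between the two writes leaves an object in one store with no metadata in the other, which is exactly the state later recovery code has to clean up.

```go
package main

import "fmt"

// stores models two independent stores with no transaction spanning them.
type stores struct {
	sandboxStore  map[string]bool // the sandbox object
	metadataStore map[string]bool // the object's metadata
}

// createSandbox writes to both stores; crashAfterFirst simulates
// containerd being stopped between the two writes.
func (s *stores) createSandbox(id string, crashAfterFirst bool) {
	s.sandboxStore[id] = true
	if crashAfterFirst {
		return // "crash": metadata never written
	}
	s.metadataStore[id] = true
}

// inconsistent reports sandbox IDs present in the sandbox store but
// missing from the metadata store.
func (s *stores) inconsistent() []string {
	var out []string
	for id := range s.sandboxStore {
		if !s.metadataStore[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	s := &stores{sandboxStore: map[string]bool{}, metadataStore: map[string]bool{}}
	s.createSandbox("a", false)
	s.createSandbox("b", true) // simulated crash mid-create
	fmt.Println(s.inconsistent())
}
```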
|
Unrelated, but I feel like the errors in this section are switched: containerd/internal/cri/server/sandbox_run.go, lines 221 to 227 in d20482f. Unless I'm misunderstanding, the first operation should log an
|
Do you have a link to the issue or PR for this one? Is it fixed in containerd 2.0.4? |
|
#10744 — yes, fixed in all releases; see the linked PRs. |
Original: https://github.com/containerd/containerd/blob/release/1.5/pkg/cri/server/sandbox_run.go#L372-L388 |
|
For real, what is going on here? This needs to be merged and backported to 2.0 ASAP. We have been shipping b9ab7a3 with k3s and rke2 for several months. |
|
/ok-to-test |
|
closing and reopening to attempt to reset the test results that have a bad link / can't be restarted... |
|
/cherry-pick release/2.1 |
|
/cherry-pick release/2.0 |
|
@estesp: new pull request created: #12076 |
|
@estesp: new pull request created: #12077 |
|
Can you review my PR #11967? @mikebrow |
fix #10848
I added some sleep at
To reproduce: create a pod while containerd is in the sleep (sleep 60), then restart containerd; the panic will happen.