Skip to content

Conversation

@ningmingxiao
Copy link
Contributor

@ningmingxiao ningmingxiao commented Mar 21, 2025

fix container events sometimes lost when restart containerd

@mikebrow @AkihiroSuda @mxpv

@k8s-ci-robot
Copy link

Hi @ningmingxiao. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dosubot dosubot bot added area/cri Container Runtime Interface (CRI) kind/bug labels Mar 21, 2025
@ningmingxiao ningmingxiao changed the title cri:fix containerd event lost when container exited at the same time … cri:fix container event can't be handled when restart Mar 21, 2025
@ningmingxiao ningmingxiao changed the title cri:fix container event can't be handled when restart cri:fix container events sometimes lost when restart containerd Mar 21, 2025
@ningmingxiao ningmingxiao force-pushed the fix_event_lost_new3 branch 2 times, most recently from 9eea9ec to 62a5780 Compare March 21, 2025 10:45
@ningmingxiao ningmingxiao force-pushed the fix_event_lost_new3 branch 2 times, most recently from 0a62d75 to 74a9943 Compare March 21, 2025 13:26
@mikebrow
Copy link
Member

/ok-to-test

@ningmingxiao
Copy link
Contributor Author

/retest

@ningmingxiao
Copy link
Contributor Author

/retest

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm :-)

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the tightening of the exit monitor window on restart..

That said, I still think we need to clean up the containers on restart that are not restartable. For example, can't load metadata.. those containers need to be cleaned up.

@ningmingxiao
Copy link
Contributor Author

That cleanup should create another pr.

Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
@ningmingxiao
Copy link
Contributor Author

/retest

@ningmingxiao
Copy link
Contributor Author

@fuweid @mxpv can this pr be merged ?

@ningmingxiao ningmingxiao force-pushed the fix_event_lost_new3 branch 2 times, most recently from 279819a to ead5c1e Compare March 31, 2025 09:16
@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Mar 31, 2025
@fuweid fuweid added this pull request to the merge queue Mar 31, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 31, 2025
@fuweid fuweid added this pull request to the merge queue Mar 31, 2025
Merged via the queue into containerd:main with commit af01360 Mar 31, 2025
114 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Mar 31, 2025
@SergeyKanzhelev
Copy link
Contributor

@mikebrow is this PR critical enough to warrant the cherry pick?

@samuelkarp samuelkarp added the cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch label Apr 7, 2025
@samuelkarp
Copy link
Member

samuelkarp commented Apr 7, 2025

/cherrypick release/2.0

@mikebrow
Copy link
Member

mikebrow commented Apr 7, 2025

@mikebrow is this PR critical enough to warrant the cherry pick?

I'd like to see the test added first, and possibly another deeper fix. This does not eliminate the window of opportunity. We might have to change our init process up a bit to remove this possibility entirely. Should be a way to ensure the shim waits to notify the exit event or it is cached until we have properly completed our restart.

@k8s-infra-cherrypick-robot

@samuelkarp: new pull request created: #11657

Details

In response to this:

/cherrypick release/2.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@samuelkarp
Copy link
Member

@mikebrow That's fair. I'll remove the cherry-pick label and close the PR it just made.

@samuelkarp samuelkarp removed the cherry-pick/2.0.x Change to be cherry picked to release/2.0 branch label Apr 7, 2025
@mikebrow
Copy link
Member

mikebrow commented Apr 7, 2025

or at least before we merge the cherry into 2.0.. let's get some legs under this issue

@mikebrow
Copy link
Member

mikebrow commented Apr 7, 2025

@mikebrow That's fair. I'll remove the cherry-pick label and close the PR it just made.

if you'd like to "test" it on 2.0 that won't hurt :-)

@SergeyKanzhelev
Copy link
Contributor

@ningmingxiao were you planning to work on it?

@ningmingxiao
Copy link
Contributor Author

I want to add some test, I can add some sleep in load container and then and then kill pidof container. but can't find a good way to add some sleep in testcae.I have to change containerd code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) kind/bug ok-to-test size/S

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

7 participants