
@abel-von
Contributor

For remote sandbox controllers, the controller process may restart, so we have to retry when the error indicates a gRPC disconnection.

The remote sandbox controller may restart, so the Wait call should be retried
if it fails with a gRPC disconnection error.

Signed-off-by: Abel Feng <fshb1988@gmail.com>
@k8s-ci-robot

Hi @abel-von. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Burning1020
Member

/ok-to-test

@kzys
Member

kzys commented May 15, 2024

/retest

@abel-von
Contributor Author

abel-von commented May 17, 2024

/cc @mxpv @mikebrow @dmcgowan @fuweid

@abel-von
Contributor Author

/cc @fuweid

@k8s-ci-robot k8s-ci-robot requested a review from fuweid May 17, 2024 01:38
@cpuguy83
Member

I like this, and I might argue that it wouldn't hurt to do this in all cases, not just for the remote sandbox controller.

@cpuguy83
Member

For reference, see https://github.com/cpuguy83/containerd-shim-systemd-v1, which does not really make sense as a sandbox controller but would likely be worth it to implement, even just for this change.

        retryInterval time.Duration = 128
    )
    for {
        resp, err = s.client.Wait(ctx, &api.ControllerWaitRequest{SandboxID: sandboxID})
Member

Would you please add a failpoint test case to check whether this becomes an infinite loop for the ttrpc shim? Thanks.

Contributor Author

I think we may need to add a mocked remote sandbox controller first, and then we can add a failpoint test case for this.

Member

Let's add that test in the follow-up.

@fuweid fuweid added this pull request to the merge queue May 29, 2024
Merged via the queue into containerd:main with commit 5d2c988 May 29, 2024
@abel-von abel-von mentioned this pull request Jul 17, 2024
19 tasks

6 participants