-
Notifications
You must be signed in to change notification settings - Fork 3.8k
sandbox: remove sandboxStore from podsandbox controller #9463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @abel-von. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/ok-to-test |
0e49613 to
6208f4d
Compare
sandbox/controller.go
Outdated
| Stop(ctx context.Context, sandboxID string, opts ...StopOpt) error | ||
| // Wait blocks until sandbox process exits. | ||
| Wait(ctx context.Context, sandboxID string) (ExitStatus, error) | ||
| Wait(ctx context.Context, sandboxID string) (<-chan ExitStatus, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you change the definition here. You should change the comment as well.
Would you please add more comment about why it becomes async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/cri/server/podsandbox/store.go
Outdated
| return nil | ||
| } | ||
|
|
||
| func (s *Store) Get(id string) (*types.PodSandbox, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool seems duplicate information here, since that Save ensures the p is not nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, the existing is based on whether the return value is nil
| ch := make(chan sandbox.ExitStatus, 1) | ||
| go func() { | ||
| // because we don't set timeout of Wait, so it will not return err | ||
| status, _ := podSandbox.Wait(ctrdutil.NamespacedContext()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use ctx here. The caller should ensure that context should not share the same scope with RunPodSandbox one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
6208f4d to
b7bd664
Compare
fuweid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That change is too large. It seems it mixed bugfix and refactor.
If possible, please separate changes into several pull requests.
sandbox/controller.go
Outdated
| // Stop will stop sandbox instance | ||
| Stop(ctx context.Context, sandboxID string, opts ...StopOpt) error | ||
| // Wait blocks until sandbox process exits. | ||
| // Wait asynchronously waits for the sandbox to exit, and sends the exit code to the returned channel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment should be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted the change.
| }) | ||
|
|
||
| if err != nil { | ||
| return sandbox.ExitStatus{}, fmt.Errorf("failed to wait sandbox %s: %w", sandboxID, errdefs.FromGRPC(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to change this line?
pkg/cri/server/sandbox_status.go
Outdated
| // In most cases, controller.Status() with verbose=true should have SandboxInfo in the return, | ||
| // but if controller.Status() returns a NotFound error, | ||
| // we should fallback to get SandboxInfo from cached sandbox itself. | ||
| func toCRISandboxInfo(sandbox sandboxstore.Sandbox) (map[string]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please change this name to, like toDeletedCRISandboxInfo? toCRISandboxInfo looks weird because it's just used when sandbox is not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/cri/server/sandbox_run.go
Outdated
| exitCh, err := sandboxClient.Wait(util.NamespacedContext()) | ||
| if err != nil { | ||
| log.G(ctx).WithError(err).Error("failed to wait for sandbox exit") | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not correct. Please return err here. The named return error is retErr instead of err.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change this the next PR as I reverted commit of sandboxClient.
pkg/cri/server/restart.go
Outdated
| } | ||
|
|
||
| // Convert the old pause Container to the new Sandbox | ||
| info, _ := sb.Container.Info(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if err is not nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as the commit is reverted, will consider this in the next PR
|
|
||
| for _, sb := range c.sandboxStore.List() { | ||
| sb := sb | ||
| go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it bugfix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be considered as a bugfix, before this PR, the recover.go in podsandbox will handle the task exit and update the sandbox in the cri cache. Now as in this pr we removed the sandboxStore in podsandbox controller, it can not update the sandbox in cri cache anymore. so we have to call Wait of sandbox and startSandboxExitMonitor here.
I think it is also neccessary if we have more sandbox controllers other than podsandbox. we have to call Wait of sandbox and monitor it by ourselves.
| if err != nil { | ||
| if !errdefs.IsNotFound(err) { | ||
| return fmt.Errorf("failed to get sandbox container: %w", err) | ||
| return fmt.Errorf("failed to get podSandbox container: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/podSandbox/pod sandbox/g
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| sandboxID, err) | ||
| podSandbox := c.store.Get(sandboxID) | ||
| if podSandbox == nil { | ||
| return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you consider return ErrNotFound here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, and let cri server log and ignore the ErrNotFound.
client/sandbox.go
Outdated
| func (s *sandboxClient) Wait(ctx context.Context) (<-chan api.ExitStatus, error) { | ||
| ch := make(chan api.ExitStatus, 1) | ||
| go func() { | ||
| status, err := s.client.SandboxController(s.metadata.Sandboxer).Wait(ctx, s.ID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest to close ch in defer and remain that UnknownExitStatus if Wait returns error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
7f4589d to
0daa015
Compare
8e06d18 to
2b18128
Compare
ok, I revert the pr of refator to use Sandbox Client. and will submit another PR with that refactor. |
pkg/cri/server/sandbox_status.go
Outdated
| } | ||
| } | ||
|
|
||
| // toCRISandboxInfo converts cached sandbox to CRI sandbox status response info map. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align name with function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| case <-sandbox.Stopped(): | ||
| return nil | ||
| } | ||
| func (c *Controller) waitSandboxStop(ctx context.Context, sandbox *types.PodSandbox) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this function and call sandbox.Wait(ctx) directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| // TODO: how to backoff | ||
| c.cri.BackOffEvent(id, e) | ||
| } | ||
| podSandbox.Exit(*containerd.NewExitStatus(exitStatus, exitedAt, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's correct. Line 107 has backoff event. And then the code still goes into set Exit.
And the Exit has close the channel. I think it might cause leak issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the codes of backoff event here because the startSandboxExitMonitor is responsible to handle backoff when sandbox exit event handle failed. So I removed the backoff logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to call podSandbox.Exit when the error is not a context error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to wait #9598 is merged, and then the backoff event can be handled again by the podsandbox controller.
| podSandbox := types.NewPodSandbox(info.ID, sandboxstore.Status{State: sandboxstore.StateUnknown}) | ||
| podSandbox.Metadata = metadata | ||
| podSandbox.Runtime = info.Runtime | ||
| _ = c.store.Save(podSandbox) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return c.store.Save(podSandbox)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/cri/server/podsandbox/recover.go
Outdated
| }() | ||
| } | ||
|
|
||
| _ = c.store.Save(podSandbox) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it only return error when podSandbox is nil and as we know it is not nil, so it will not return error. anyway, I add the check error logic in case that Save function returns some other error in the future.
a8a35f1 to
7daa93e
Compare
|
/retest-required |
21741d6 to
ec695af
Compare
| dctx, dcancel := context.WithTimeout(dctx, handleEventTimeout) | ||
| defer dcancel() | ||
| event := &eventtypes.TaskExit{ExitStatus: exitStatus, ExitedAt: protobuf.ToTimestamp(exitedAt)} | ||
| if err := handleSandboxTaskExit(dctx, p, event); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it going to shadow named return err?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is for purpose because we don't want to return the handle task error here, anyway, I changed the name here from err to cleanErr, so to make this clear.
Signed-off-by: Abel Feng <fshb1988@gmail.com>
ec695af to
7dadd5f
Compare
fuweid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| dctx, dcancel := context.WithTimeout(dctx, handleEventTimeout) | ||
| defer dcancel() | ||
| event := &eventtypes.TaskExit{ExitStatus: exitStatus, ExitedAt: protobuf.ToTimestamp(exitedAt)} | ||
| if cleanErr := handleSandboxTaskExit(dctx, p, event); cleanErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some test cases to see what if fail to cleanup task and more than 2 StopPodSandbox requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-O ^^^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely decoupled now... still would like to see extra test cases, this is one of the touchiest areas of the code base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I will add some more tests here
| if taskStatus.Status == containerd.Running { | ||
| // Wait for the task for sandbox monitor. | ||
| // wait is a long running background request, no timeout needed. | ||
| status.State = sandboxstore.StateReady |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we setting ready if there is still a window for failure to wait for the container task (see return below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abel-von ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply, I think you are right, I have to set the status to NotReady before return, I will submit another PR to do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For decoupling of podsandbox controller to cri service, podsandbox controller should create its own cache of PodSandbox object.