sandbox: use sandboxService in CRI plugin instead of calling controller API directly #9617

abel-von · 2024-01-09T02:14:23Z

To keep the code clean, CRI can store a Sandbox client for each sandbox in the cache and call the client's APIs directly, so that we don't need to look up the Controller every time we call the sandbox controller APIs.

k8s-ci-robot · 2024-01-09T02:14:32Z

Hi @abel-von. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

abel-von · 2024-01-09T02:16:35Z

needs rebase after #9463 is merged

abel-von · 2024-01-09T06:36:28Z

@fuweid @dmcgowan @mxpv

mikebrow

See the questions in the comments :-)

mikebrow · 2024-01-09T17:47:31Z

pkg/cri/server/restart.go

 			}
-			exitCh <- *containerd.NewExitStatus(exit.ExitStatus, exit.ExitedAt, nil)
-		}()
+			// Sandbox was in running state, but its task has been deleted,


ready/notready/unknown could it be in unknown here after restart? should we always overwrite here...

Since the sandbox client clearly returns NotFound, we can be sure that the sandbox no longer exists, so I think we can set the state to not ready.

if it was unknown before restart... does that mean unknown is not valid after restart.. or are we removing unknown...

I'm probably missing something.

I think the status of a sandbox is a little simpler than that of a container: when we call ListPodSandbox, we can only get one of two states, Ready or NotReady. We only set the sandbox status to Unknown temporarily in the CRI cache during RunPodSandbox, and we do not store that state in the db. So when containerd restarts, it checks the status of the sandbox by calling the Status or Wait APIs of the sandbox controller.
We can also see in the code of recover.go and restart.go that the sandbox status is set to Ready or NotReady, with no third state.

are you saying this:
[https://github.com/containerd/containerd/blob/main/pkg/cri/server/podsandbox/sandbox_run.go#L285-L288](https://github.com/containerd/containerd/blob/main/pkg/cri/store/sandbox/status.go#L76-L78)

only applies to cache.. hmm is that new and we just didn't update the state diagram?

Ok I missed the move from sandbox having a store that was actually stored to an in memory cache only..

hmm...

missed this change: #7401

Should I be concerned here:
https://github.com/containerd/containerd/blob/main/pkg/cri/server/podsandbox/sandbox_run.go#L265-L288
about not using a lock on this cached map (writing directly to the map contents to update state from unknown to ready, and other changes, without doing some sort of lock first via an update api?)

Guessing my confusion is more about the in-between status of the cri sandbox store and its meta information, and the moving of that to the new sandbox meta...

about not using a lock on this cached map (writing directly to the map contents to update state from unknown to ready, and other changes, without doing some sort of lock first via an update api?

I guess you are right. I refactored the PodSandbox status code and added a commit to this PR; please take a look.

please take a look again. @mikebrow

pkg/cri/server/restart.go

pkg/cri/server/sandbox_run.go

dmcgowan · 2024-01-16T19:03:21Z

call the APIs in the client directly, so that we don't need to find the Controller every time we need to call APIs of sandbox controller.

I'm hesitant on this one. We want to decouple the plugins from the client package rather than add more to client to clean up plugins. It seems we are missing a service interface for controllers that should be in a plugin. The client container interface is not a good example to follow, and it is also one we want to build a service interface around.

abel-von · 2024-01-30T16:50:01Z

It seems we are missing a service interface for controllers that should be in a plugin.

Removed the client dependency and added some methods to the sandboxService in cri. Also fixed the bug that PinnedImages has no struct tag. Please take a look.

abel-von · 2024-01-30T16:50:34Z

/cc @dmcgowan

abel-von · 2024-02-07T01:30:07Z

@dmcgowan @mxpv Do we still need to discuss this PR, given that I no longer use the Sandbox client?

Burning1020 · 2024-02-22T11:26:24Z

@mxpv Does it still need discussion? We discussed it at the community meeting on Jan 25, before this label was added: https://docs.google.com/document/d/1Q8KyVJd26oAQ3MafbnkVBgJCopPl2Bw45H9L9br9Vus/edit#heading=h.3f10hgwx9mc

mxpv · 2024-02-22T20:08:31Z

internal/cri/server/sandbox_service.go

+func (c *criSandboxService) SandboxStatus(ctx context.Context, sandboxer string, sandboxID string, verbose bool) (sandbox.ControllerStatus, error) {
+	sbController, ok := c.sandboxers[sandboxer]
+	if !ok {
+		return sandbox.ControllerStatus{}, fmt.Errorf("failed to get sandbox controller by %s", sandboxer)


I'd suggest introducing a helper so you don't have to replicate the same error message in every function:

```go
func (c *criSandboxService) controller(name string) (sandbox.Controller, error) {
	sbController, ok := c.sandboxers[name]
	if !ok {
		return nil, fmt.Errorf("sandbox controller %q not found", name)
	}
	return sbController, nil
}
```

And then you could just return an error:

```go
controller, err := c.controller(sandboxer)
if err != nil {
	return nil, err
}
```

There is already an exported method named SandboxController. I was hesitant to call it because we would still need to check the error after each call, so it didn't seem to reduce the line count or the complexity, and I gave up on calling it.

I think we can just call the SandboxController.

func (c *criSandboxService) SandboxStatus(ctx context.Context, sandboxer string, sandboxID string, verbose bool) (sandbox.ControllerStatus, error) {
	ctrl, err := c.SandboxController(sandboxer)
	if err != nil {
		return sandbox.ControllerStatus{}, err
	}
	return ctrl.Status(ctx, sandboxID, verbose)
}

mxpv

One minor suggestion, but overall it looks good to me once the CI is green.

fuweid

Left some comments. Basically, it looks good.

fuweid · 2024-02-23T03:49:30Z

internal/cri/server/podsandbox/controller_test.go

 	assert.Equal(t, s.State, sandboxstore.StateReady.String())

-	sb.Exit(*containerd.NewExitStatus(exitStatus, exitedAt, nil))
+	if err := sb.Exit(exitStatus, exitedAt); err != nil {


Suggest using assert.NoError(t, sb.Exit(exitStatus, exitedAt)).

fuweid · 2024-02-25T08:32:58Z

internal/cri/server/sandbox_service.go

+}
+
+func (c *criSandboxService) CreateSandbox(ctx context.Context, info sandbox.Sandbox, opts ...sandbox.CreateOpt) error {
+	sbController, ok := c.sandboxers[info.Sandboxer]


It's not related to this pull request. But just my two cents, I think Controller name is better than Sandboxer.

fuweid · 2024-02-25T08:39:08Z

internal/cri/server/service.go

 	}

 	// Initialize pod sandbox controller
 	// TODO: Get this from options, NOT client


Consider removing this TODO.

fuweid · 2024-02-25T08:46:15Z

internal/cri/server/sandbox_service.go

+func (c *criSandboxService) SandboxStatus(ctx context.Context, sandboxer string, sandboxID string, verbose bool) (sandbox.ControllerStatus, error) {
+	sbController, ok := c.sandboxers[sandboxer]
+	if !ok {
+		return sandbox.ControllerStatus{}, fmt.Errorf("failed to get sandbox controller by %s", sandboxer)


I think we can just call the SandboxController.

func (c *criSandboxService) SandboxStatus(ctx context.Context, sandboxer string, sandboxID string, verbose bool) (sandbox.ControllerStatus, error) {
	ctrl, err := c.SandboxController(sandboxer)
	if err != nil {
		return sandbox.ControllerStatus{}, err
	}
	return ctrl.Status(ctx, sandboxID, verbose)
}

fuweid · 2024-02-25T08:48:38Z

internal/cri/server/podsandbox/types/podsandbox_test.go

+	assert.Equal(t, p.Status.Get().State, sandbox.StateUnknown)
+	assert.Equal(t, p.ID, "test")
+	p.Metadata = sandbox.Metadata{ID: "test", NetNSPath: "/test"}
+	assert.Equal(t, p.Metadata.NetNSPath, "/test")


It seems we don't need to verify the assignment here

fuweid · 2024-02-25T08:49:24Z

internal/cri/server/podsandbox/types/podsandbox_test.go

+		return status, nil
+	})
+	if err != nil {
+		t.Fatalf("Update pod sandbox status failed %v", err)


require.NoError is equivalent to t.Fatalf here.

fuweid · 2024-02-25T08:51:52Z

internal/cri/server/podsandbox/types/podsandbox_test.go

+		assert.Equal(t, exitTime, exitAt)
+	}()
+	time.Sleep(time.Second)
+	if err := p.Exit(uint32(128), exitAt); err != nil {


Suggest synchronizing with and draining the two goroutines after calling p.Exit.
Otherwise the test can easily become flaky, and go test might detect a race condition and mark it as a failure.

fuweid · 2024-02-25T09:01:34Z

internal/cri/server/podsandbox/controller.go

 		event := &eventtypes.TaskExit{ExitStatus: exitStatus, ExitedAt: protobuf.ToTimestamp(exitedAt)}
-		if cleanErr := handleSandboxTaskExit(dctx, p, event); cleanErr != nil {
+		if err := handleSandboxTaskExit(dctx, p, event); err != nil {
+			// TODO will backoff the event to the controller's own EventMonitor, but not cri's,


Not sure I understand your comment correctly: maybe we should let the CRI plugin retry all the failures, so that we don't need to implement a separate retry for the shim-type sandbox controller?

handleSandboxExit in the CRI plugin only updates the sandbox status in its sandboxStore and should ignore the implementation details of the sandbox (so it should not know whether there is a task or a shim for the sandbox). I think it is the podsandbox controller's responsibility to clean up the legacy sandbox, and that is what we do in handleSandboxTaskExit.

The splitting of the podsandbox EventMonitor out of the CRI plugin is done in #9598, and the retry of the sandbox container task cleanup is done in that PR as well.

fuweid

LGTM

k8s-ci-robot added needs-ok-to-test size/XXL labels Jan 9, 2024

abel-von force-pushed the sandbox-plugin-0109 branch from 75994f9 to c6b25ae Compare January 9, 2024 02:28

k8s-ci-robot added size/L and removed size/XXL labels Jan 9, 2024

abel-von mentioned this pull request Jan 9, 2024

Sandbox API work continued #9431

Open

19 tasks

mikebrow reviewed Jan 9, 2024

abel-von force-pushed the sandbox-plugin-0109 branch from c6b25ae to c536aa8 Compare January 10, 2024 02:11

abel-von requested a review from mikebrow January 10, 2024 03:05

abel-von force-pushed the sandbox-plugin-0109 branch from c536aa8 to 1bd9df7 Compare January 12, 2024 03:57

k8s-ci-robot added needs-rebase size/XL and removed size/L labels Jan 12, 2024

abel-von force-pushed the sandbox-plugin-0109 branch from 1bd9df7 to 70107d1 Compare January 12, 2024 03:59

k8s-ci-robot removed the needs-rebase label Jan 12, 2024

abel-von force-pushed the sandbox-plugin-0109 branch 2 times, most recently from c39a539 to 6558b99 Compare January 12, 2024 06:16

k8s-ci-robot added the needs-rebase label Jan 20, 2024

abel-von force-pushed the sandbox-plugin-0109 branch from 6558b99 to e968bf6 Compare January 30, 2024 16:32

k8s-ci-robot added size/L and removed size/XL labels Jan 30, 2024

abel-von force-pushed the sandbox-plugin-0109 branch from e968bf6 to a6d9316 Compare January 30, 2024 16:47

k8s-ci-robot requested a review from dmcgowan January 30, 2024 16:50

k8s-ci-robot removed the needs-rebase label Jan 31, 2024

k8s-ci-robot added the needs-rebase label Feb 3, 2024

abel-von force-pushed the sandbox-plugin-0109 branch from a6d9316 to 721d1fb Compare February 4, 2024 07:43

k8s-ci-robot removed the needs-rebase label Feb 4, 2024

mxpv added the status/needs-discussion Needs discussion and decision from maintainers label Feb 5, 2024

dims added the area/cri Container Runtime Interface (CRI) label Feb 7, 2024

abel-von mentioned this pull request Feb 21, 2024

sandbox: add event monitor for podsandbox controller #9598

Merged

abel-von changed the title ~~sandbox: use Sandbox client in CRI plugin instead of controller~~ sandbox: use sandboxService in CRI plugin instead of calling controller API directly Feb 21, 2024

k8s-ci-robot added the needs-rebase label Feb 22, 2024

mxpv reviewed Feb 22, 2024

mxpv approved these changes Feb 22, 2024

abel-von force-pushed the sandbox-plugin-0109 branch from 721d1fb to a576130 Compare February 23, 2024 02:05

k8s-ci-robot removed the needs-rebase label Feb 23, 2024

abel-von force-pushed the sandbox-plugin-0109 branch from a576130 to 20c511d Compare February 23, 2024 03:25

fuweid reviewed Feb 25, 2024

abel-von added 3 commits February 26, 2024 10:10

sandbox: add methods to sandboxService

0f1d274

so that we cri service don't have to get sandbox controller everytime it needs to call sandbox controller api. Signed-off-by: Abel Feng <fshb1988@gmail.com>

sandbox: optimize the lock in PodSandbox

a0b73ae

Signed-off-by: Abel Feng <fshb1988@gmail.com>

sandbox: add struct tags for PinnedImages

a60e52f

Signed-off-by: Abel Feng <fshb1988@gmail.com>

abel-von force-pushed the sandbox-plugin-0109 branch from 20c511d to a60e52f Compare February 26, 2024 02:17

fuweid approved these changes Feb 28, 2024

fuweid added this pull request to the merge queue Feb 28, 2024

fuweid removed the status/needs-discussion Needs discussion and decision from maintainers label Feb 28, 2024

fuweid removed this pull request from the merge queue due to a manual request Feb 28, 2024

fuweid added ok-to-test and removed needs-ok-to-test labels Feb 28, 2024

fuweid added this pull request to the merge queue Feb 28, 2024

Merged via the queue into containerd:main with commit 2cdf012 Feb 28, 2024
