Support unknown state by Random-Liu · Pull Request #1037 · containerd/cri

Random-Liu · 2019-02-05T08:18:07Z

This PR adds support for an unknown state. When a container or sandbox fails to load, we no longer skip it; instead, it is loaded in the unknown state.
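A minimal sketch of the recovery behavior described above, assuming a simplified in-memory store. The names `State`, `Container`, `loadContainer` and `recoverContainers` are hypothetical stand-ins, not the actual `pkg/server/restart.go` code. A container that fails to load is kept in the unknown state rather than dropped:

```go
package main

import (
	"errors"
	"fmt"
)

// State is a simplified, hypothetical stand-in for the internal container state.
type State string

const (
	StateRunning State = "RUNNING"
	StateUnknown State = "UNKNOWN"
)

// Container is a hypothetical in-memory record held by the container store.
type Container struct {
	ID    string
	State State
}

// loadContainer stands in for the real recovery logic. It fails for any ID in
// broken, simulating a container whose task status cannot be read.
func loadContainer(id string, broken map[string]bool) (Container, error) {
	if broken[id] {
		return Container{}, errors.New("failed to get task status")
	}
	return Container{ID: id, State: StateRunning}, nil
}

// recoverContainers loads every container. A container that fails to load is
// kept in StateUnknown instead of being skipped, so it can still be stopped
// and its resources released later.
func recoverContainers(ids []string, broken map[string]bool) []Container {
	var out []Container
	for _, id := range ids {
		c, err := loadContainer(id, broken)
		if err != nil {
			fmt.Printf("failed to load container %q, loading it in unknown state: %v\n", id, err)
			c = Container{ID: id, State: StateUnknown}
		}
		out = append(out, c)
	}
	return out
}

func main() {
	for _, c := range recoverContainers([]string{"a", "b"}, map[string]bool{"b": true}) {
		fmt.Println(c.ID, c.State)
	}
}
```

Keeping the record, even with an unknown state, is what lets a later stop call find it and release its resources.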

This PR also defines a state machine for containers and sandboxes. Under this state machine, a container or sandbox in the unknown state can only be stopped, which ensures that the resources associated with it are correctly released.
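A rough sketch of such a state machine as a transition table, assuming hypothetical `State`/`Event` types and a reduced set of transitions; the PR's actual tables may contain more states and edges. The point illustrated is that `Unknown` accepts only `Stop`:

```go
package main

import "fmt"

// State and Event are hypothetical names used only for this sketch.
type State string
type Event string

const (
	Created State = "CREATED"
	Running State = "RUNNING"
	Exited  State = "EXITED"
	Unknown State = "UNKNOWN"

	Start Event = "start"
	Stop  Event = "stop"
)

// transitions lists the allowed (state, event) pairs. Unknown accepts only
// Stop, which is what guarantees its resources get released.
var transitions = map[State]map[Event]State{
	Created: {Start: Running},
	Running: {Stop: Exited},
	Unknown: {Stop: Exited},
}

// next returns the state reached by applying e in state s, or an error if
// the transition is not allowed.
func next(s State, e Event) (State, error) {
	if to, ok := transitions[s][e]; ok {
		return to, nil
	}
	return s, fmt.Errorf("%s is not allowed in state %s", e, s)
}

func main() {
	for _, e := range []Event{Start, Stop} {
		to, err := next(Unknown, e)
		fmt.Println(e, to, err)
	}
}
```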

This is an important bug fix; I hope we can cherry-pick it into release/1.2 as well.

Signed-off-by: Lantao Liu <lantaol@google.com>

Random-Liu · 2019-02-05T08:31:24Z

pkg/server/restart.go

-			return nil, err
-		}
-		defer func() {
+	err = func() error {


No logic change, only moved the logic into a function.

Random-Liu · 2019-02-05T08:31:55Z

pkg/server/restart.go

-	} else {
-		// Task is found. Get task status.
-		s, err = t.Status(ctx)
+	s, err := func() (sandboxstore.Status, error) {


No logic change, just moved the logic into a function.

mikebrow

See comments.

mikebrow · 2019-02-05T15:13:38Z

pkg/store/sandbox/status.go

+
 // State is the sandbox state we use in containerd/cri.
-// It has init state defined.
+// It has init and unknown state defined.


This comment is confusing, given that other states are defined as well.

I want to express that there are no init and unknown in the CRI sandbox state, but we have them for internal state management. :)

Let me rephrase it a little bit.

Changed to "It includes init and unknown, which are internal states not defined in CRI."

mikebrow · 2019-02-05T15:28:13Z

pkg/server/container_remove.go

+		// Do not remove container if it's still running or unknown.
 		if status.State() == runtime.ContainerState_CONTAINER_RUNNING {
-			return status, errors.New("container is still running")
+			return status, errors.New("container is still running, need stop first")


nit: "need stop first" -> "need to stop first"

mikebrow · 2019-02-05T15:28:29Z

pkg/server/container_remove.go

+			return status, errors.New("container is still running, need stop first")
+		}
+		if status.State() == runtime.ContainerState_CONTAINER_UNKNOWN {
+			return status, errors.New("container state is unknown, need stop first")


nit: "need stop first" -> "need to stop first"
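The two checks in this diff amount to a single guard: a container in either the running or the unknown state must be stopped before removal. A sketch, assuming a hypothetical string-based `ContainerState` in place of `runtime.ContainerState`, and using the reworded messages:

```go
package main

import (
	"errors"
	"fmt"
)

// ContainerState is a hypothetical stand-in for runtime.ContainerState.
type ContainerState string

const (
	ContainerCreated ContainerState = "CREATED"
	ContainerRunning ContainerState = "RUNNING"
	ContainerExited  ContainerState = "EXITED"
	ContainerUnknown ContainerState = "UNKNOWN"
)

// checkRemovable returns an error if the container must be stopped before it
// can be removed: both running and unknown containers need to stop first.
func checkRemovable(s ContainerState) error {
	switch s {
	case ContainerRunning:
		return errors.New("container is still running, need to stop first")
	case ContainerUnknown:
		return errors.New("container state is unknown, need to stop first")
	default:
		return nil
	}
}

func main() {
	for _, s := range []ContainerState{ContainerCreated, ContainerRunning, ContainerExited, ContainerUnknown} {
		fmt.Printf("%s: %v\n", s, checkRemovable(s))
	}
}
```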

mikebrow · 2019-02-05T17:07:09Z

pkg/server/container_status.go

+		// CRI doesn't allow CreatedAt == 0.
+		info, err := container.Container.Info(ctx)
+		if err != nil {
+			return nil, errors.Wrap(err, "failed to get CreatedAt in unknown state")


Suggest sending info to the debug log.

Suggest outputting the actual info.State rather than presuming it's unknown.

mikebrow · 2019-02-05T17:14:05Z

pkg/server/container_stop.go

-	if state != runtime.ContainerState_CONTAINER_RUNNING {
+	if state != runtime.ContainerState_CONTAINER_RUNNING &&
+		state != runtime.ContainerState_CONTAINER_UNKNOWN {
 		logrus.Infof("Container to stop %q is not running, current state %q",


Suggest: "is not running" -> "must be in running or unknown state, current state is %q"
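A sketch of the widened stop check with the suggested log wording, assuming a hypothetical string-based `ContainerState` and helper name `needsStop`. Running and unknown containers both go through the stop path; anything else is a logged no-op:

```go
package main

import (
	"fmt"
	"log"
)

// ContainerState is a hypothetical stand-in for runtime.ContainerState.
type ContainerState string

const (
	ContainerRunning ContainerState = "RUNNING"
	ContainerExited  ContainerState = "EXITED"
	ContainerUnknown ContainerState = "UNKNOWN"
)

// needsStop reports whether StopContainer should act on the container.
// Running and unknown containers are both stopped; any other state is
// logged and treated as a no-op.
func needsStop(id string, state ContainerState) bool {
	if state != ContainerRunning && state != ContainerUnknown {
		log.Printf("Container to stop %q must be in running or unknown state, current state %q", id, state)
		return false
	}
	return true
}

func main() {
	fmt.Println(needsStop("c1", ContainerUnknown))
	fmt.Println(needsStop("c2", ContainerExited))
}
```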

mikebrow · 2019-02-05T18:09:52Z

pkg/server/sandbox_list_test.go

 		},
+		"sandbox state unknown": {
+			state:         sandboxstore.StateUnknown,
+			expectedState: runtime.PodSandboxState_SANDBOX_NOTREADY,


From the state machine, it looks like this should be expected to be ready? Please confirm.

In the internal state machine, StateUnknown simply means the state is unknown.
In CRI there is only a NOTREADY state, but it is good enough for handling unknown:

Kubelet will not continue using the sandbox, and will restart it.

Before restarting, kubelet will keep trying to stop the previous NOTREADY sandbox until it succeeds.

So we can just return the internal unknown state as NOTREADY in CRI.
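The mapping described above, sketched with hypothetical stand-ins for `sandboxstore.State` and `runtime.PodSandboxState`. Only a ready sandbox is reported as READY. Mapping the internal init state to NOTREADY as well is an assumption of this sketch:

```go
package main

import "fmt"

// SandboxState is a hypothetical stand-in for the internal sandbox state,
// including the internal-only init and unknown states.
type SandboxState string

const (
	StateInit     SandboxState = "INIT"
	StateReady    SandboxState = "READY"
	StateNotReady SandboxState = "NOTREADY"
	StateUnknown  SandboxState = "UNKNOWN"
)

// CRISandboxState stands in for runtime.PodSandboxState, which only has
// READY and NOTREADY.
type CRISandboxState string

const (
	SandboxReady    CRISandboxState = "SANDBOX_READY"
	SandboxNotReady CRISandboxState = "SANDBOX_NOTREADY"
)

// toCRIState reports only a ready sandbox as READY. Unknown (and, in this
// sketch, init) map to NOTREADY, which makes kubelet stop the sandbox
// before recreating it.
func toCRIState(s SandboxState) CRISandboxState {
	if s == StateReady {
		return SandboxReady
	}
	return SandboxNotReady
}

func main() {
	for _, s := range []SandboxState{StateInit, StateReady, StateNotReady, StateUnknown} {
		fmt.Println(s, "->", toCRIState(s))
	}
}
```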

mikebrow · 2019-02-05T18:13:55Z

pkg/server/sandbox_status.go

+		// CRI doesn't allow CreatedAt == 0.
+		info, err := sandbox.Container.Info(ctx)
+		if err != nil {
+			return nil, errors.Wrap(err, "failed to get CreatedAt in unknown state")


get CreatedAt from the container sandbox in unknown state

mikebrow · 2019-02-05T18:14:13Z

pkg/server/sandbox_status_test.go

 		},
+		"sandbox state unknown": {
+			state:         sandboxstore.StateUnknown,
+			expectedState: runtime.PodSandboxState_SANDBOX_NOTREADY,


Ditto on the earlier question.

mikebrow · 2019-02-05T18:14:55Z

pkg/server/sandbox_stop.go

+	state := sandbox.Status.Get().State
+	if state == sandboxstore.StateReady || state == sandboxstore.StateUnknown {
 		if err := c.stopSandboxContainer(ctx, sandbox); err != nil {
 			return nil, errors.Wrapf(err, "failed to stop sandbox container %q", id)


Please add the state for debugging purposes.

mikebrow · 2019-02-05T18:18:35Z

pkg/server/sandbox_stop.go

+	// Handle unknown state.
+	// The cleanup logic is the same with container unknown state.
+	if state == sandboxstore.StateUnknown {
+		status := unknownExitStatus()


Ditto: use a get func here.

Random-Liu · 2019-02-05T19:06:38Z

@mikebrow Addressed most comments, and replied to some of them.

mikebrow

/LGTM

Signed-off-by: Lantao Liu <lantaol@google.com>

k8s-ci-robot · 2019-02-05T19:56:36Z

New changes are detected. LGTM label has been removed.

Random-Liu · 2019-02-05T19:57:00Z

Just squashed commits. Reapplying LGTM based on #1037 (review).

mikebrow

/LGTM

Cherrypick #1037 release 1.2

Change StateUnknown to StateInit

bfd25c8

Signed-off-by: Lantao Liu <lantaol@google.com>

Random-Liu assigned mikebrow Feb 5, 2019

k8s-ci-robot added the size/XL label Feb 5, 2019

Random-Liu added this to the v1.2 milestone Feb 5, 2019

Random-Liu commented Feb 5, 2019


Random-Liu mentioned this pull request Feb 5, 2019

Fix potential containerd panic. containerd/containerd#2976

Merged

Random-Liu force-pushed the support-unknown-state branch from 90b16a0 to 6c31d26 February 5, 2019 08:44

k8s-ci-robot added size/XXL and removed size/XL labels Feb 5, 2019

Random-Liu mentioned this pull request Feb 5, 2019

'failed to reserve sandbox name' error after hard reboot #1014

Closed

mikebrow reviewed Feb 5, 2019


Random-Liu force-pushed the support-unknown-state branch from 4b2ba34 to abdfc53 February 5, 2019 19:29

mikebrow approved these changes Feb 5, 2019


k8s-ci-robot added the lgtm label Feb 5, 2019

Random-Liu added 4 commits February 5, 2019 11:56

Add state machine for sandbox and container

4dc6f6d

Signed-off-by: Lantao Liu <lantaol@google.com>

Support unknown state for sandbox and container

83af4da

Signed-off-by: Lantao Liu <lantaol@google.com>

Add integration test for unknown state

f8b3450

Signed-off-by: Lantao Liu <lantaol@google.com>

Update containerd to 5ba368748b0275d8f45f909413d94738992f0050.

c27a12d

Signed-off-by: Lantao Liu <lantaol@google.com>

Random-Liu force-pushed the support-unknown-state branch from abdfc53 to c27a12d February 5, 2019 19:56

k8s-ci-robot removed the lgtm label Feb 5, 2019

Random-Liu added the lgtm label Feb 5, 2019

mikebrow approved these changes Feb 5, 2019


Random-Liu merged commit 7c2498d into containerd:master Feb 5, 2019

Random-Liu deleted the support-unknown-state branch February 5, 2019 22:06

Random-Liu mentioned this pull request Feb 5, 2019

Cherrypick #1037 release 1.2 #1038

Merged

Random-Liu added a commit that referenced this pull request Feb 6, 2019

Merge pull request #1038 from Random-Liu/cherrypick-#1037-release-1.2

c3cf754

Cherrypick #1037 release 1.2

Random-Liu mentioned this pull request Feb 6, 2019

Better handle CreatedAt timestamp for unknown container. #1039

Closed

Random-Liu mentioned this pull request Jun 13, 2022

Setup pod network after creating the sandbox container containerd/containerd#5904

Merged

3 participants
