Skip to content
This repository was archived by the owner on Mar 9, 2022. It is now read-only.

Support unknown state#1037

Merged
Random-Liu merged 5 commits intocontainerd:masterfrom
Random-Liu:support-unknown-state
Feb 5, 2019
Merged

Support unknown state#1037
Random-Liu merged 5 commits intocontainerd:masterfrom
Random-Liu:support-unknown-state

Conversation

@Random-Liu
Copy link
Copy Markdown
Member

@Random-Liu Random-Liu commented Feb 5, 2019

Fixes #1014.

This PR added unknown state support. When a container/sandbox fails to be loaded, we'll not skip it directly. The sandbox/container will be loaded in unknown state instead.

This PR also defined the state machine for container and sandbox. Based on the state machine, a container/sandbox in unknown state can only be stopped. This ensures that the resource associated with the unknown container/sandbox is correctly released.

This is an important bug fix, I hope we can cherrypick it into release/1.2 as well.

Signed-off-by: Lantao Liu <lantaol@google.com>
@Random-Liu Random-Liu added this to the v1.2 milestone Feb 5, 2019
return nil, err
}
defer func() {
err = func() error {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No logic change, only moved the logic into a function.

} else {
// Task is found. Get task status.
s, err = t.Status(ctx)
s, err := func() (sandboxstore.Status, error) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No logic change, just moved the logic into a function.

Copy link
Copy Markdown
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments..


// State is the sandbox state we use in containerd/cri.
// It has init state defined.
// It has init and unknown state defined.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confusing comment given there are other states defined..

Copy link
Copy Markdown
Member Author

@Random-Liu Random-Liu Feb 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to express that there are no init and unknown in the CRI sandbox state, but we have them for internal state management. :)

Let me rephrase it a little bit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "It includes init and unknown, which are internal states not defined in CRI."

// Do not remove container if it's still running or unknown.
if status.State() == runtime.ContainerState_CONTAINER_RUNNING {
return status, errors.New("container is still running")
return status, errors.New("container is still running, need stop first")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to stop first

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return status, errors.New("container is still running, need stop first")
}
if status.State() == runtime.ContainerState_CONTAINER_UNKNOWN {
return status, errors.New("container state is unknown, need stop first")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to stop first

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// CRI doesn't allow CreatedAt == 0.
info, err := container.Container.Info(ctx)
if err != nil {
return nil, errors.Wrap(err, "failed to get CreatedAt in unknown state")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest sending info to debug log...

suggest outputting the actual info.State vs presuming it's unknown...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if state != runtime.ContainerState_CONTAINER_RUNNING {
if state != runtime.ContainerState_CONTAINER_RUNNING &&
state != runtime.ContainerState_CONTAINER_UNKNOWN {
logrus.Infof("Container to stop %q is not running, current state %q",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not running -> must be in running or unknown state, current state is %q

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

},
"sandbox state unknown": {
state: sandboxstore.StateUnknown,
expectedState: runtime.PodSandboxState_SANDBOX_NOTREADY,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the state machine it looks like this should be expected ready? please confirm.

Copy link
Copy Markdown
Member Author

@Random-Liu Random-Liu Feb 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the internal state machine, StateUnknown is just state unknown.
In CRI there is only NOTREADY state, but it is good enough for unknown handling:

  1. Kubelet will not continue using the sandbox, and will restart it.
  2. Before restart kubelet will always try to stop the previous NOTREADY sandbox until success.

So we can just return the internal unknown state as NOTREADY in CRI.

// CRI doesn't allow CreatedAt == 0.
info, err := sandbox.Container.Info(ctx)
if err != nil {
return nil, errors.Wrap(err, "failed to get CreatedAt in unknown state")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get CreatedAt from the container sandbox in unknown state

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

},
"sandbox state unknown": {
state: sandboxstore.StateUnknown,
expectedState: runtime.PodSandboxState_SANDBOX_NOTREADY,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto on earlier question.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

state := sandbox.Status.Get().State
if state == sandboxstore.StateReady || state == sandboxstore.StateUnknown {
if err := c.stopSandboxContainer(ctx, sandbox); err != nil {
return nil, errors.Wrapf(err, "failed to stop sandbox container %q", id)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add state pls for debug purposes..

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Handle unknown state.
// The cleanup logic is the same with container unknown state.
if state == sandboxstore.StateUnknown {
status := unknownExitStatus()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for use a get func here

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Random-Liu
Copy link
Copy Markdown
Member Author

@mikebrow Addressed most comments, and replied some of them.

@Random-Liu Random-Liu force-pushed the support-unknown-state branch from 4b2ba34 to abdfc53 Compare February 5, 2019 19:29
Copy link
Copy Markdown
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM

Signed-off-by: Lantao Liu <lantaol@google.com>
Signed-off-by: Lantao Liu <lantaol@google.com>
Signed-off-by: Lantao Liu <lantaol@google.com>
Signed-off-by: Lantao Liu <lantaol@google.com>
@Random-Liu Random-Liu force-pushed the support-unknown-state branch from abdfc53 to c27a12d Compare February 5, 2019 19:56
@k8s-ci-robot
Copy link
Copy Markdown

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Feb 5, 2019
@Random-Liu
Copy link
Copy Markdown
Member Author

Just squashed commits. Repply LGTM based on #1037 (review)

@Random-Liu Random-Liu added the lgtm label Feb 5, 2019
Copy link
Copy Markdown
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM

@Random-Liu Random-Liu merged commit 7c2498d into containerd:master Feb 5, 2019
@Random-Liu Random-Liu deleted the support-unknown-state branch February 5, 2019 22:06
Random-Liu added a commit that referenced this pull request Feb 6, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

'failed to reserve sandbox name' error after hard reboot

3 participants