Conversation
Signed-off-by: Lantao Liu <lantaol@google.com>
| return nil, err | ||
| } | ||
| defer func() { | ||
| err = func() error { |
There was a problem hiding this comment.
No logic change, only moved the logic into a function.
| } else { | ||
| // Task is found. Get task status. | ||
| s, err = t.Status(ctx) | ||
| s, err := func() (sandboxstore.Status, error) { |
There was a problem hiding this comment.
No logic change, just moved the logic into a function.
90b16a0 to
6c31d26
Compare
pkg/store/sandbox/status.go
Outdated
|
|
||
| // State is the sandbox state we use in containerd/cri. | ||
| // It has init state defined. | ||
| // It has init and unknown state defined. |
There was a problem hiding this comment.
confusing comment given there are other states defined..
There was a problem hiding this comment.
I want to express that there are no init and unknown in the CRI sandbox state, but we have them for internal state management. :)
Let me rephrase it a little bit.
There was a problem hiding this comment.
Changed to "It includes init and unknown, which are internal states not defined in CRI."
pkg/server/container_remove.go
Outdated
| // Do not remove container if it's still running or unknown. | ||
| if status.State() == runtime.ContainerState_CONTAINER_RUNNING { | ||
| return status, errors.New("container is still running") | ||
| return status, errors.New("container is still running, need stop first") |
pkg/server/container_remove.go
Outdated
| return status, errors.New("container is still running, need stop first") | ||
| } | ||
| if status.State() == runtime.ContainerState_CONTAINER_UNKNOWN { | ||
| return status, errors.New("container state is unknown, need stop first") |
pkg/server/container_status.go
Outdated
| // CRI doesn't allow CreatedAt == 0. | ||
| info, err := container.Container.Info(ctx) | ||
| if err != nil { | ||
| return nil, errors.Wrap(err, "failed to get CreatedAt in unknown state") |
There was a problem hiding this comment.
suggest sending info to debug log...
suggest outputting the actual info.State vs presuming it's unknown...
pkg/server/container_stop.go
Outdated
| if state != runtime.ContainerState_CONTAINER_RUNNING { | ||
| if state != runtime.ContainerState_CONTAINER_RUNNING && | ||
| state != runtime.ContainerState_CONTAINER_UNKNOWN { | ||
| logrus.Infof("Container to stop %q is not running, current state %q", |
There was a problem hiding this comment.
is not running -> must be in running or unknown state, current state is %q
| }, | ||
| "sandbox state unknown": { | ||
| state: sandboxstore.StateUnknown, | ||
| expectedState: runtime.PodSandboxState_SANDBOX_NOTREADY, |
There was a problem hiding this comment.
from the state machine it looks like this should be expected ready? please confirm.
There was a problem hiding this comment.
In the internal state machine, StateUnknown is just state unknown.
In CRI there is only NOTREADY state, but it is good enough for unknown handling:
- Kubelet will not continue using the sandbox, and will restart it.
- Before restart kubelet will always try to stop the previous
NOTREADYsandbox until success.
So we can just return the internal unknown state as NOTREADY in CRI.
pkg/server/sandbox_status.go
Outdated
| // CRI doesn't allow CreatedAt == 0. | ||
| info, err := sandbox.Container.Info(ctx) | ||
| if err != nil { | ||
| return nil, errors.Wrap(err, "failed to get CreatedAt in unknown state") |
There was a problem hiding this comment.
get CreatedAt from the container sandbox in unknown state
| }, | ||
| "sandbox state unknown": { | ||
| state: sandboxstore.StateUnknown, | ||
| expectedState: runtime.PodSandboxState_SANDBOX_NOTREADY, |
pkg/server/sandbox_stop.go
Outdated
| state := sandbox.Status.Get().State | ||
| if state == sandboxstore.StateReady || state == sandboxstore.StateUnknown { | ||
| if err := c.stopSandboxContainer(ctx, sandbox); err != nil { | ||
| return nil, errors.Wrapf(err, "failed to stop sandbox container %q", id) |
There was a problem hiding this comment.
add state pls for debug purposes..
pkg/server/sandbox_stop.go
Outdated
| // Handle unknown state. | ||
| // The cleanup logic is the same with container unknown state. | ||
| if state == sandboxstore.StateUnknown { | ||
| status := unknownExitStatus() |
|
@mikebrow Addressed most comments, and replied some of them. |
4b2ba34 to
abdfc53
Compare
Signed-off-by: Lantao Liu <lantaol@google.com>
Signed-off-by: Lantao Liu <lantaol@google.com>
Signed-off-by: Lantao Liu <lantaol@google.com>
Signed-off-by: Lantao Liu <lantaol@google.com>
abdfc53 to
c27a12d
Compare
|
New changes are detected. LGTM label has been removed. |
|
Just squashed commits. Repply LGTM based on #1037 (review) |
Cherrypick #1037 release 1.2
Fixes #1014.
This PR added unknown state support. When a container/sandbox fails to be loaded, we'll not skip it directly. The sandbox/container will be loaded in unknown state instead.
This PR also defined the state machine for container and sandbox. Based on the state machine, a container/sandbox in unknown state can only be stopped. This ensures that the resource associated with the unknown container/sandbox is correctly released.
This is an important bug fix, I hope we can cherrypick it into release/1.2 as well.