Inconsistent state on pod termination #2438

@lbernail

Description

We just had an issue with containerd: an application was killed several times by the OOM killer because it reached its cgroup memory limit. Containers on the host are now in an inconsistent state:

  • they look ok according to crictl ps
  • crictl exec fails with "cannot exec in a stopped state: unknown"
  • ctr -n k8s.io t ls hangs without any output
  • ps auxf shows many containerd-shim processes without any child process (or sometimes with only the pause container)
  • runc --root /run/containerd/runc/k8s.io list shows some containers in stopped state
  • the associated containerd-shim process is still running without any child
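The childless-shim symptom above can be checked mechanically rather than by reading ps auxf output. The following is a minimal Python sketch, not a containerd tool: it assumes a Linux host with /proc mounted, and the helper names read_process_table and childless_shims are hypothetical. It relies on the kernel truncating the command name in /proc/<pid>/stat to 15 characters, which is exactly the length of "containerd-shim".

```python
import os


def read_process_table(proc_root="/proc"):
    """Return {pid: (ppid, comm)} built from /proc/<pid>/stat (Linux only)."""
    table = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
            # comm is wrapped in parentheses and may itself contain ')' or spaces,
            # so split on the first '(' and the last ')'.
            start, end = stat.index("("), stat.rindex(")")
            comm = stat[start + 1:end]
            fields = stat[end + 2:].split()
            ppid = int(fields[1])  # fields[0] is the state, fields[1] the parent pid
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry was unreadable during the scan
        table[int(entry)] = (ppid, comm)
    return table


def childless_shims(table, prefix="containerd-shim"):
    """Return sorted pids of shim processes that are not the parent of any process."""
    parents = {ppid for ppid, _ in table.values()}
    return sorted(
        pid
        for pid, (_, comm) in table.items()
        if comm.startswith(prefix) and pid not in parents
    )
```

Calling childless_shims(read_process_table()) on an affected node should list the pids of the orphaned shims. A shim that still parents only the pause container is not reported by this sketch, since it does have a child.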

It seems that sometimes, when a container process is OOM-killed because it has reached its cgroup memory limit, the containerd state becomes inconsistent. Once this has happened, it is no longer possible to delete containers. When trying to delete a pod, the containerd logs show:

  • containerd tries to stop it (StopContainer)
  • stop container xx timed out
  • then error="an error occurs during waiting for container xxx to stop: wait container xxx is cancelled"
  • the container is stopped but not removed

Steps to reproduce the issue:

  1. Run Kubernetes using containerd as the CRI runtime
  2. Create a pod with a memory limit
  3. In the pod, allocate more memory than the limit
  4. After several OOM kills, it should no longer be possible to interact with containerd
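Step 3 can be driven by a small memory hog running as the container's main process. The following is a minimal Python sketch under assumptions that are not from the original report: the function name allocate, the chunk size, and the pause between allocations are all illustrative. It fills each chunk with non-zero bytes so that the pages are actually touched and counted against the cgroup limit.

```python
import time

MIB = 1024 * 1024


def allocate(chunk_mib, max_mib, pause_s=0.0):
    """Hold at least max_mib MiB of touched memory, allocated chunk_mib at a time.

    Returns the list of chunks so the caller keeps them alive.
    """
    chunks = []
    held = 0
    while held < max_mib:
        # Multiplying a non-zero byte writes every page of the new buffer.
        chunks.append(b"\x01" * (chunk_mib * MIB))
        held += chunk_mib
        if pause_s:
            time.sleep(pause_s)  # slow down so the growth is visible in metrics
    return chunks
```

For example, running allocate(16, 512, pause_s=0.1) in a pod whose memory limit is well below 512 MiB should get the process OOM-killed before the loop finishes; with a restart policy that restarts the container, this repeats and produces the several OOM kills described in step 4.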

Describe the results you received:
containerd seems to be stuck in an inconsistent state and is no longer able to fulfill CRI requests

Describe the results you expected:
containerd should clean up OOM-killed containers and remain consistent

Output of containerd --version:

containerd --version
containerd github.com/containerd/containerd v1.1.0 209a7fc3e4a32ef71a8c7b50c68fc8398415badf
