docker-runc does not terminate and leaves docker-shim hanging in 17.12 #36010

@phsiao

Description

This seems to be new to 17.12.

docker-runc sometimes hangs and has to be killed before the rest of the system works properly again. Because we run Kubernetes, the most noticeable symptom for us is that kubelet on the host starts to report PLEG timeouts. It appears that the docker-shim responsible for that runc and its container stops responding.

We can still interact with docker for the most part, and I don't believe we see issues other than kubelet not being able to report container events. docker ps -a shows the container ID with status created (docker ps does not list it), but docker inspect on the container hangs. docker-containerd-ctr shows the container in the stopped state.

Another strange thing is that, so far, the affected container has always been Kubernetes' infra container. I have yet to see any other container left in this state.

Steps to reproduce the issue:

I can't reproduce it reliably, but it happens several times a day across 150 or so docker installations.

Describe the results you received:

docker-runc does not terminate, and docker inspect <cid> hangs.

Describe the results you expected:

docker-runc should terminate, and docker inspect <cid> should not hang.
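Since docker inspect <cid> blocks indefinitely on an affected container, one way to probe for the condition without hanging the caller is to run the command under a timeout. The following is a minimal sketch, not part of any Docker tooling: the helper name command_hangs and the 10-second default are assumptions chosen for illustration.

```python
import subprocess


def command_hangs(argv, timeout_s=10.0):
    """Return True if the command does not finish within timeout_s seconds.

    Hypothetical helper: the name and the default timeout are illustrative.
    subprocess.run() kills the child process when the timeout expires.
    """
    try:
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True
    return False


# Example use against a container ID taken from `docker ps -a`
# (the ID below is a placeholder, not a real container):
#   if command_hangs(["docker", "inspect", "<cid>"]):
#       print("docker inspect did not return; container may be affected")
```

A timeout only shows that the command was slow, so a hit should be confirmed by hand before acting on it.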

Additional information you deem important (e.g. issue happens only occasionally):

Issue happens occasionally, but killing the hung docker-runc process restores the system.
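To find candidates for the hung process, one option is to scan /proc for long-lived processes with the expected name. This is a rough sketch under assumptions: the function name and the age threshold are hypothetical, and it relies on the process name in /proc/<pid>/comm matching "docker-runc" as in this report. It only lists PIDs and kills nothing.

```python
import os


def find_old_processes(name, min_age_s):
    """Return PIDs whose comm equals `name` and whose age is >= min_age_s.

    Hypothetical helper for illustration. Age is derived from the system
    uptime and the process start time (field 22 of /proc/<pid>/stat).
    """
    clk_tck = os.sysconf("SC_CLK_TCK")
    with open("/proc/uptime") as f:
        uptime_s = float(f.read().split()[0])

    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() != name:
                    continue
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning

        # comm may contain spaces or parentheses, so split after the last ')'.
        fields = stat[stat.rfind(")") + 2:].split()
        start_ticks = int(fields[19])  # field 22 overall, index 19 here
        age_s = uptime_s - start_ticks / clk_tck
        if age_s >= min_age_s:
            matches.append(int(entry))
    return sorted(matches)


# Example: docker-runc processes older than five minutes (threshold is a guess).
#   print(find_old_processes("docker-runc", 300))
```

A healthy docker-runc invocation is normally short-lived, so any PID this returns is worth checking with strace before killing it.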

Here is what strace of the stuck docker-runc looks like:

# strace -p 43620
strace: Process 43620 attached
openat(AT_FDCWD, "/var/run/docker/runtime-runc/moby/fc2da052c68f4f0a120184ad8eea49fec3e903dd6da9db848bb722f94fb25ba4/exec.fifo", O_RDONLY|O_CLOEXEC
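The unfinished openat() call above is consistent with general FIFO behaviour on Linux: opening a FIFO with O_RDONLY blocks until some process opens the same FIFO for writing. The sketch below demonstrates only that kernel behaviour with a temporary FIFO. It is not runc code, and the function name and timings are assumptions for illustration.

```python
import os
import tempfile
import threading


def demo_fifo_open_blocks(wait_s=0.5):
    """Return (blocked_before_writer, finished_after_writer)."""
    tmpdir = tempfile.mkdtemp()
    fifo_path = os.path.join(tmpdir, "exec.fifo")
    os.mkfifo(fifo_path)

    opened = threading.Event()

    def reader():
        # Same flags as in the strace output; blocks until a writer appears.
        fd = os.open(fifo_path, os.O_RDONLY | os.O_CLOEXEC)
        opened.set()
        os.close(fd)

    t = threading.Thread(target=reader, daemon=True)
    t.start()

    # With no writer, the reader should still be stuck in open().
    blocked_before_writer = not opened.wait(wait_s)

    # Opening the write end lets the blocked open() return.
    wfd = os.open(fifo_path, os.O_WRONLY)
    finished_after_writer = opened.wait(5.0)
    os.close(wfd)
    t.join(5.0)

    os.remove(fifo_path)
    os.rmdir(tmpdir)
    return blocked_before_writer, finished_after_writer


if __name__ == "__main__":
    print(demo_fifo_open_blocks())
```

If the process expected to open the write end never does so, the reader stays blocked in open() indefinitely, which would match the symptom in this report.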

Output of docker version:

Client:
 Version:	17.12.0-ce
 API version:	1.35
 Go version:	go1.9.2
 Git commit:	c97c6d6
 Built:	Wed Dec 27 20:10:14 2017
 OS/Arch:	linux/amd64

Server:
 Engine:
  Version:	17.12.0-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.2
  Git commit:	c97c6d6
  Built:	Wed Dec 27 20:12:46 2017
  OS/Arch:	linux/amd64
  Experimental:	false

Output of docker info:

Containers: 15
 Running: 13
 Paused: 0
 Stopped: 2
Images: 13
Server Version: 17.12.0-ce
Storage Driver: devicemapper
 Pool Name: docker-images
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Udev Sync Supported: true
 Data Space Used: 3.437GB
 Data Space Total: 797.8GB
 Data Space Available: 794.4GB
 Metadata Space Used: 8.884MB
 Metadata Space Total: 566.2MB
 Metadata Space Available: 557.3MB
 Thin Pool Minimum Free Space: 79.78GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.13-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 56
Total Memory: 125.8GiB
Name: bnbvxk2
ID: L3JD:VTHD:IQKY:E5IJ:TA3F:PBKH:XYYQ:APCB:LWL3:SHVZ:QKMU:ACJ3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 techops.site=slukd1
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true

Additional environment details (AWS, VirtualBox, physical, etc.):

Servers are all bare-metal running CentOS 7.4.1708 with kernel 4.14.11 to 4.14.13.

Labels: area/runtime, kind/bug, version/17.12