-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Description
Description
When a container exits the state change will bubble up through libcontainerd into the daemon causing the container to be reset. As part of this it will close the LogDriver associated with the container and the driver will close all of its children. We have observed a race condition between closing a reader and receiving the final data from the log.
followLogs will receive a IN_MODIFY event while it is trying to close the fileWatcher. This causes a deadlock as https://github.com/docker/docker/blob/master/vendor/src/gopkg.in/fsnotify.v1/inotify.go#L266 is trying to put the modify event on the Events channel, but because https://github.com/docker/docker/blob/master/daemon/logger/jsonfilelog/read.go#L235 is trying to process the close it cannot take the event off the channel. Remove is blocked because it is waiting on a condition to be broadcast https://github.com/docker/docker/blob/master/vendor/src/gopkg.in/fsnotify.v1/inotify.go#L157, this would be triggered by receiving an IN_IGNORE event https://github.com/docker/docker/blob/master/vendor/src/gopkg.in/fsnotify.v1/inotify.go#L289, but due to the code stuck trying to push the IN_MODIFY onto the Events channel this is never processed.
We initially saw this behaviour with v1.12.1, but have replicated it building the daemon from master. Additionally, we first observed this using the HTTP API to follow the logs, but the same behaviour can been seen with docker logs -f
Here is a gist that should show you the order of events that is causing this deadlock: https://gist.github.com/tombooth/e38175f07233c683704528cf406c3120
Steps to reproduce the issue:
- Run the attached script, eventually the
docker logs -fcommand will hang
#!/bin/bash
while true
do
container_id="$(docker run -d ubuntu:latest /bin/bash -e -c "echo -n \"foo\"")"
docker logs -f "${container_id}"
done
Describe the results you received:
The logs function hung
Describe the results you expected:
It should have exited
Additional information you deem important (e.g. issue happens only occasionally):
This doesn't occur on every execution of the the loop block. On my local machine, running Linux in a container, it seems to hang one in two-three times.
Output of docker version:
Client:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 18:29:41 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 18:29:41 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 209
Running: 1
Paused: 0
Stopped: 208
Images: 109
Server Version: 1.12.2
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 505
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-45-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.859 GiB
Name: vagrant
ID: QXD3:X4JO:E52F:CWCS:IY3J:RN36:EV36:I2KK:7JH4:WOJ3:MSL5:7O7P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8
Additional environment details (AWS, VirtualBox, physical, etc.):
We've seen this on AWS, in VMWare Fusion and on a physical machine.