Skip to content

Race condition closing a JSONFileLogger reader #27779

@tombooth

Description

@tombooth

Description

When a container exits the state change will bubble up through libcontainerd into the daemon causing the container to be reset. As part of this it will close the LogDriver associated with the container and the driver will close all of its children. We have observed a race condition between closing a reader and receiving the final data from the log.

followLogs will receive a IN_MODIFY event while it is trying to close the fileWatcher. This causes a deadlock as https://github.com/docker/docker/blob/master/vendor/src/gopkg.in/fsnotify.v1/inotify.go#L266 is trying to put the modify event on the Events channel, but because https://github.com/docker/docker/blob/master/daemon/logger/jsonfilelog/read.go#L235 is trying to process the close it cannot take the event off the channel. Remove is blocked because it is waiting on a condition to be broadcast https://github.com/docker/docker/blob/master/vendor/src/gopkg.in/fsnotify.v1/inotify.go#L157, this would be triggered by receiving an IN_IGNORE event https://github.com/docker/docker/blob/master/vendor/src/gopkg.in/fsnotify.v1/inotify.go#L289, but due to the code stuck trying to push the IN_MODIFY onto the Events channel this is never processed.

We initially saw this behaviour with v1.12.1, but have replicated it building the daemon from master. Additionally, we first observed this using the HTTP API to follow the logs, but the same behaviour can been seen with docker logs -f

Here is a gist that should show you the order of events that is causing this deadlock: https://gist.github.com/tombooth/e38175f07233c683704528cf406c3120

Steps to reproduce the issue:

  1. Run the attached script, eventually the docker logs -f command will hang
#!/bin/bash

while true
do
        container_id="$(docker run -d ubuntu:latest /bin/bash -e -c "echo -n \"foo\"")"
        docker logs -f "${container_id}"
done

Describe the results you received:
The logs function hung

Describe the results you expected:
It should have exited

Additional information you deem important (e.g. issue happens only occasionally):
This doesn't occur on every execution of the the loop block. On my local machine, running Linux in a container, it seems to hang one in two-three times.

Output of docker version:

Client:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 18:29:41 2016
OS/Arch: linux/amd64

Server:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 18:29:41 2016
OS/Arch: linux/amd64

Output of docker info:

Containers: 209
Running: 1
Paused: 0
Stopped: 208
Images: 109
Server Version: 1.12.2
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 505
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-45-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.859 GiB
Name: vagrant
ID: QXD3:X4JO:E52F:CWCS:IY3J:RN36:EV36:I2KK:7JH4:WOJ3:MSL5:7O7P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):

We've seen this on AWS, in VMWare Fusion and on a physical machine.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions