-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Description
Description
I ran into a sporadic API consumer hang when reading stdout/stderr from a very short-executing container. The API consumer calls ContainerCreate, ContainerAttach, ContainerStart, ContainerWait in order. When the problem occurs, the server does not write the container stdout/stderr to the ContainerAttach connection, and after the ContainerWait call completes, the server does not close the ContainerAttach connection, causing the API consumer to hang.
By inspection I found the following suspected race conditions in https://github.com/docker/docker/blob/master/daemon/attach.go:
-
ContainerAttach returns "200 OK" to the caller at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L37 before the stdout/stderr streams are actually attached to the container at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L61.
During this window, it is actually possible for a container to be started and to run to completion. If this happens, the API consumer doesn't receive the container stdout/stderr, and the consumer also misses its connection being closed as part of the container teardown in https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/container/stream/streams.go#L90.
The easiest way I've found to reproduce this requires modifying the server: insert
time.Sleep(10 * time.Second)before https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L61 and then compile and run the test program below.Note: as an additional protection measure IMO it would be worth making broadcaster.Unbuffered stateful and unreusable to protect against new Writers being
Add()ed afterClean()has been called. -
I suspect
containerAttach(https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L76) to be racy: logs are harvested and returned at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L77-L106, then I believe there is a window during which any new logs would be lost to the consumer beforeAttachsucceeds at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L131. At the moment, I believe a caller who sets logs = true and stream = true is not guaranteed not to lose logging data. -
I suspect the invocation of
container.IsPaused()at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L32 must be racy because I don't see any protection against the container state changing in parallel before the containerAttach completes.
I believe that if these issues are valid, they indicate that overall the attach and logging logic needs some rework to be more closely integrated into pre-existing container state / synchronisation primitives, and consideration also needs to be given about if/how to atomically read past logs and start streaming without missing data.
Steps to reproduce the issue:
- Insert
time.Sleep(10 * time.Second)before https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L61; compile and run Docker server docker pull busybox- Compile and run the following golang program:
package main
import (
"context"
"os"
"github.com/docker/docker/api/types"
"github.com/docker/docker/api/types/container"
"github.com/docker/docker/api/types/network"
"github.com/docker/docker/client"
"github.com/docker/docker/pkg/stdcopy"
)
func main() {
ctx := context.Background()
cli, err := client.NewEnvClient()
if err != nil {
panic(err)
}
c, err := cli.ContainerCreate(ctx, &container.Config{Image: "busybox", Entrypoint: []string{"sh", "-c", "echo hello world"}}, &container.HostConfig{}, &network.NetworkingConfig{}, "")
if err != nil {
panic(err)
}
hi, err := cli.ContainerAttach(ctx, c.ID, types.ContainerAttachOptions{Stream: true, Stdout: true})
if err != nil {
panic(err)
}
ch := make(chan error, 1)
go func() {
_, copyerr := stdcopy.StdCopy(os.Stdout, os.Stdout, hi.Reader)
ch <- copyerr
}()
err = cli.ContainerStart(ctx, c.ID, types.ContainerStartOptions{})
if err != nil {
panic(err)
}
code, err := cli.ContainerWait(ctx, c.ID)
if err != nil {
panic(err)
}
err = <-ch
if err != nil {
panic(err)
}
os.Exit(int(code))
}Describe the results you received:
no output; hang
Describe the results you expected:
output "hello world", exit 0.
Output of docker version:
Client:
Version: 1.14.0-dev
API version: 1.26
Go version: go1.7.4
Git commit: 9d88498
Built: Fri Dec 9 20:01:40 2016
OS/Arch: linux/amd64
Server:
Version: 1.14.0-dev
API version: 1.26 (minimum version 1.12)
Go version: go1.7.4
Git commit: 9d88498
Built: Fri Dec 9 20:01:40 2016
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 1.14.0-dev
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 51371867a01c467f08af739783b8beafc154c4d7
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.2.3-300.fc23.x86_64
Operating System: Fedora 23 (Twenty Three)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.797 GiB
Name: openshiftdev
ID: YHFD:22Z2:JM6H:AUT2:XA72:6POC:DFYQ:JTFZ:KE3Y:QPUM:ZY33:DOVO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false