Skip to content

Suspected race conditions in daemon/attach.go #29285

@jim-minter

Description

@jim-minter

Description

I ran into a sporadic API consumer hang when reading stdout/stderr from a very short-executing container. The API consumer calls ContainerCreate, ContainerAttach, ContainerStart, ContainerWait in order. When the problem occurs, the server does not write the container stdout/stderr to the ContainerAttach connection, and after the ContainerWait call completes, the server does not close the ContainerAttach connection, causing the API consumer to hang.

By inspection I found the following suspected race conditions in https://github.com/docker/docker/blob/master/daemon/attach.go:

  1. ContainerAttach returns "200 OK" to the caller at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L37 before the stdout/stderr streams are actually attached to the container at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L61.

    During this window, it is actually possible for a container to be started and to run to completion. If this happens, the API consumer doesn't receive the container stdout/stderr, and the consumer also misses its connection being closed as part of the container teardown in https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/container/stream/streams.go#L90.

    The easiest way I've found to reproduce this requires modifying the server: insert time.Sleep(10 * time.Second) before https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L61 and then compile and run the test program below.

    Note: as an additional protection measure IMO it would be worth making broadcaster.Unbuffered stateful and unreusable to protect against new Writers being Add()ed after Clean() has been called.

  2. I suspect containerAttach (https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L76) to be racy: logs are harvested and returned at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L77-L106, then I believe there is a window during which any new logs would be lost to the consumer before Attach succeeds at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L131. At the moment, I believe a caller who sets logs = true and stream = true is not guaranteed not to lose logging data.

  3. I suspect the invocation of container.IsPaused() at https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L32 must be racy because I don't see any protection against the container state changing in parallel before the containerAttach completes.

I believe that if these issues are valid, they indicate that overall the attach and logging logic needs some rework to be more closely integrated into pre-existing container state / synchronisation primitives, and consideration also needs to be given about if/how to atomically read past logs and start streaming without missing data.

Steps to reproduce the issue:

  1. Insert time.Sleep(10 * time.Second) before https://github.com/docker/docker/blob/9d884986f5c001cacb60aa3c50036575ed2dd22d/daemon/attach.go#L61; compile and run Docker server
  2. docker pull busybox
  3. Compile and run the following golang program:
package main

import (
	"context"
	"os"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/api/types/network"
	"github.com/docker/docker/client"
	"github.com/docker/docker/pkg/stdcopy"
)

func main() {
	ctx := context.Background()

	cli, err := client.NewEnvClient()
	if err != nil {
		panic(err)
	}

	c, err := cli.ContainerCreate(ctx, &container.Config{Image: "busybox", Entrypoint: []string{"sh", "-c", "echo hello world"}}, &container.HostConfig{}, &network.NetworkingConfig{}, "")
	if err != nil {
		panic(err)
	}

	hi, err := cli.ContainerAttach(ctx, c.ID, types.ContainerAttachOptions{Stream: true, Stdout: true})
	if err != nil {
		panic(err)
	}

	ch := make(chan error, 1)
	go func() {
		_, copyerr := stdcopy.StdCopy(os.Stdout, os.Stdout, hi.Reader)
		ch <- copyerr
	}()

	err = cli.ContainerStart(ctx, c.ID, types.ContainerStartOptions{})
	if err != nil {
		panic(err)
	}

	code, err := cli.ContainerWait(ctx, c.ID)
	if err != nil {
		panic(err)
	}

	err = <-ch
	if err != nil {
		panic(err)
	}

	os.Exit(int(code))
}

Describe the results you received:
no output; hang

Describe the results you expected:
output "hello world", exit 0.

Output of docker version:

Client:
 Version:      1.14.0-dev
 API version:  1.26
 Go version:   go1.7.4
 Git commit:   9d88498
 Built:        Fri Dec  9 20:01:40 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.14.0-dev
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.4
 Git commit:   9d88498
 Built:        Fri Dec  9 20:01:40 2016
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.14.0-dev
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 51371867a01c467f08af739783b8beafc154c4d7
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.2.3-300.fc23.x86_64
Operating System: Fedora 23 (Twenty Three)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.797 GiB
Name: openshiftdev
ID: YHFD:22Z2:JM6H:AUT2:XA72:6POC:DFYQ:JTFZ:KE3Y:QPUM:ZY33:DOVO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions