Skip to content

Starting container fails with 'System error: read parent: connection reset by peer' #14203

@mfrister

Description

@mfrister

On our CI server, we run tests in Docker containers using docker-compose. We link 2-15 containers during one run. We ensure that test jobs running concurrently have different docker-compose project names.

Since we upgraded to Docker 1.7.0 (from 1.6), docker-compose to 1.3.1 (from 1.2), and started killing containers instead of stopping them (faster, remove them anyway), we twice had containers failing to start with the following message from compose:

Creating x_db_1...
Creating x_1...
Cannot start container 10bbc5af8ec0d3bb39b207a6474ec70a0954bff01ff94389684a8b9f52df6067: [8] System error: read parent: connection reset by peer

/var/log/docker.log contains the following:

time="2015-06-25T10:29:44.322521665+02:00" level=info msg="POST /v1.18/containers/19aec1ddb8a5cd771771f16a1f8929bb58eea2cf7e877425a7812f6c6e5756a2/start" 
time="2015-06-25T10:29:44.690044235+02:00" level=warning msg="signal: killed" 
time="2015-06-25T10:29:44.915997839+02:00" level=error msg="Handler for POST /containers/{name:.*}/start returned error: Cannot start container 19aec1ddb8a5cd771771f16a1f8929bb58eea2cf7e877425a7812f6c6e5756a2: [8] System error: read parent: connection reset by peer" 
time="2015-06-25T10:29:44.916111471+02:00" level=error msg="HTTP Error" err="Cannot start container 19aec1ddb8a5cd771771f16a1f8929bb58eea2cf7e877425a7812f6c6e5756a2: [8] System error: read parent: connection reset by peer" statusCode=500 

The container is created, but doesn't start. Trying to manually start it using docker start fails with the same error message. Memory is available and the kernel log doesn't show any message from the OOM killer.

Restarting docker temporarily solves the problem, so I assume this is a problem with docker itself, not with docker-compose.

docker version:

Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): linux/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64

docker info:

Containers: 30
Images: 451
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 517
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-0.bpo.4-amd64
Operating System: Debian GNU/Linux 7 (wheezy)
CPUs: 4
Total Memory: 15.52 GiB
Name: <hostname>
ID: HYYT:WNZW:UPU7:VI2O:HUTP:EZVV:2MQ2:WCRJ:3SHJ:LZXF:MVLS:P3XC
WARNING: No memory limit support
WARNING: No swap limit support

uname -a:

<hostname> 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1~bpo70+1 (2015-04-27) x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.): Physical machine

Steps to Reproduce:

  1. docker-compose run ..., kill and rm 50-200 (estimated) containers with links using docker-compose
  2. At some point, docker fails to start a container.

Actual Results: Starting the container fails.

Expected Results: Container starts.

Additional info:

Kubernetes seemed to have a similar problem, apparently some sort of race. The issue also links to a few occurences where people had a similar problem (mainly on IRC).

Metadata

Metadata

Assignees

Labels

kind/bugBugs are bugs. The cause may or may not be known at triage time so debugging may be needed.priority/P2Normal priority: default priority applied.

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions