Description
Output of docker version:
Client:
Version: 1.10.3
API version: 1.22
Go version: go1.5.3
Git commit: 20f81dd
Built: Thu Mar 10 15:54:52 2016
OS/Arch: linux/amd64
Server:
Version: 1.10.3
API version: 1.22
Go version: go1.5.3
Git commit: 20f81dd
Built: Thu Mar 10 15:54:52 2016
OS/Arch: linux/amd64
Output of docker info:
Containers: 3605
Running: 3
Paused: 0
Stopped: 3602
Images: 95
Server Version: 1.10.3
Storage Driver: aufs
Root Dir: /data/docker/aufs
Backing Filesystem: xfs
Dirs: 7313
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
Volume: local
Network: null host overlay bridge
Kernel Version: 4.2.0-34-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 188.9 GiB
Name: docker3
ID: DHJL:NC3K:JSL4:AB3M:DSIA:5NX3:GRBK:4BKF:J4I5:OROS:52EY:BRBU
Debug mode (server): true
File Descriptors: 264
Goroutines: 964
System Time: 2016-03-24T18:03:14.608035906+01:00
EventsListeners: 4
Init SHA1: 14bbd54b64d2a737269118ce4f1613787e2d0ce8
Init Path: /usr/lib/docker/dockerinit
Docker Root Dir: /data/docker
WARNING: No swap limit support
Cluster store: consul://beta-consul.REDACTED:8500/docker-beta
Cluster advertise: REDACTED:2375
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical.
Steps to reproduce the issue:
- Create a Marathon application which spawns Docker containers on an overlay network with -p 8888 --net=busybox-net1, image busybox, command sleep 60 (Mesos prepends sh -c to commands), without additional options. This is a basic container setup without bells and whistles.
- Scale to 100 container instances across 4 nodes and wait for some time to pass. Containers will exit and get started again, as expected.
- After some time (usually less than 6 hours), one by one, nodes will be unable to run containers in this network, verified by manual execution on the command line:
docker3# docker run -d --net=busybox-net1 --name=busybox-test-12341238 busybox sleep 600
54255b316c4f0b54701bd4268f8b37044078e0260ab2065dce862dce45fb513a
docker: Error response from daemon: subnet sandbox join failed for "10.140.140.0/24": error creating vxlan interface: file exists.
Debug log of an error: https://gist.github.com/isavcic/afa10cfecce0f760ad32
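The container churn that Marathon produces in the steps above could be approximated on a single node with a plain loop. This is only a sketch: it assumes the overlay network busybox-net1 already exists and that docker run exits non-zero when the daemon reports an error. The churn function and the DOCKER, NETWORK, and INTERVAL variables are hypothetical names chosen for illustration, and the loop has not been confirmed to trigger the failure.

```shell
#!/bin/sh
# Hypothetical reproduction helper, not part of the original setup.
DOCKER="${DOCKER:-docker}"          # overridable so the loop can be dry-run
NETWORK="${NETWORK:-busybox-net1}"  # overlay network from the report
INTERVAL="${INTERVAL:-2}"           # assumed pause between starts, in seconds

churn() {
  # $1 = maximum number of containers to start
  max="$1"
  i=1
  while [ "$i" -le "$max" ]; do
    # Same options as the Marathon app: published port, overlay network,
    # busybox image, sleep 60. Stop at the first failed start.
    if ! out=$($DOCKER run -d -p 8888 --net="$NETWORK" busybox sleep 60 2>&1); then
      echo "run $i failed: $out"
      return 1
    fi
    i=$((i + 1))
    sleep "$INTERVAL"
  done
  echo "started $max containers without error"
}
```

Calling `churn 10000` on a node would keep starting short-lived containers until one fails, then print the daemon's error message.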
Describe the results you received:
docker3# docker run -d --net=busybox-net1 --name=busybox-test-12341238 busybox sleep 600
54255b316c4f0b54701bd4268f8b37044078e0260ab2065dce862dce45fb513a
docker: Error response from daemon: subnet sandbox join failed for "10.140.140.0/24": error creating vxlan interface: file exists.
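To tell an affected node from a healthy one without reading the output by eye, the error text above can be matched directly. A minimal sketch, where is_vxlan_exists_error is a hypothetical helper name and the match string is copied from the error shown in this report:

```shell
# Hypothetical helper: succeeds if the given docker output contains
# the vxlan error from this report, fails otherwise.
is_vxlan_exists_error() {
  case "$1" in
    *"error creating vxlan interface: file exists"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage on a node (foreground run with --rm so that the probe
# container is removed afterwards):
#   out=$(docker run --rm --net=busybox-net1 busybox true 2>&1)
#   if is_vxlan_exists_error "$out"; then echo "node affected"; fi
```

Matching on the message rather than only on the exit status keeps other daemon errors from being counted as this particular failure.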
Describe the results you expected:
On another node, which is currently unaffected, docker run works without issues:
docker4# docker run -d --net=busybox-net1 --name=busybox-test-12341239 busybox sleep 600
a2ba83dd315b2a269fedc1ea62ed6f70d545d631744d627fb0a8ad10f5d54b5a
Debug log of a successful run: https://gist.github.com/isavcic/f17afd54fd3f75b8c70c
Additional information you deem important (e.g. issue happens only occasionally):
Lines containing GET /v1.15/containers were omitted from the debug log snippets because they occur all the time. After running service docker stop && service docker start on the affected nodes, everything works again.
I will leave the affected nodes in their current state; if I need to provide additional information, let me know.
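The omission of the polling requests from the debug log can be reproduced with a fixed-string filter. A small sketch, where filter_debug_log is a hypothetical helper name and the log path in the usage comment is an assumption for an Ubuntu 14.04 host:

```shell
# Hypothetical helper: read a daemon debug log on stdin and drop the
# lines for the frequent container polling requests.
filter_debug_log() {
  # -F treats the pattern as a literal string, -v inverts the match.
  grep -vF 'GET /v1.15/containers'
}

# Possible usage (log location is an assumption, adjust as needed):
#   filter_debug_log < /var/log/upstart/docker.log
```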
Thanks.