
Containers on overlay network and "error creating vxlan interface: file exists" #21482

@isavcic

Description


Output of docker version:

Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 3605
 Running: 3
 Paused: 0
 Stopped: 3602
Images: 95
Server Version: 1.10.3
Storage Driver: aufs
 Root Dir: /data/docker/aufs
 Backing Filesystem: xfs
 Dirs: 7313
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: null host overlay bridge
Kernel Version: 4.2.0-34-generic
Operating System: Ubuntu 14.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 188.9 GiB
Name: docker3
ID: DHJL:NC3K:JSL4:AB3M:DSIA:5NX3:GRBK:4BKF:J4I5:OROS:52EY:BRBU
Debug mode (server): true
 File Descriptors: 264
 Goroutines: 964
 System Time: 2016-03-24T18:03:14.608035906+01:00
 EventsListeners: 4
 Init SHA1: 14bbd54b64d2a737269118ce4f1613787e2d0ce8
 Init Path: /usr/lib/docker/dockerinit
 Docker Root Dir: /data/docker
WARNING: No swap limit support
Cluster store: consul://beta-consul.REDACTED:8500/docker-beta
Cluster advertise: REDACTED:2375

Additional environment details (AWS, VirtualBox, physical, etc.):

Physical.

Steps to reproduce the issue:

  1. Create a Marathon application that spawns Docker containers on an overlay network with the options -p 8888 --net=busybox-net1, image busybox and command sleep 60 (Mesos prepends sh -c to commands), and no additional options. This is a basic container setup without bells and whistles.
  2. Scale to 100 container instances across 4 nodes and let it run for a while. Containers exit and are started again, as expected.
  3. After some time (usually less than 6 hours), the nodes become unable to run containers on this network, one by one. This can be verified by starting a container manually on the command line:
    docker3# docker run -d --net=busybox-net1 --name=busybox-test-12341238 busybox sleep 600
    54255b316c4f0b54701bd4268f8b37044078e0260ab2065dce862dce45fb513a
    docker: Error response from daemon: subnet sandbox join failed for "10.140.140.0/24": error creating vxlan interface: file exists.
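
The churn that Marathon generates can be approximated from a single node. Below is a minimal Python sketch, not part of the original report, assuming the docker CLI is on PATH and busybox-net1 already exists. It starts short-lived busybox containers on the overlay network in a loop and stops at the first attempt whose output contains the error text reported here. The function names, the random container-name suffix and the 5-second delay are hypothetical choices. Because Docker 1.10 rejects --rm combined with -d, the exited containers are left behind and need periodic cleanup.

```python
import subprocess
import time
import uuid
from typing import Optional

# Substring of the daemon error quoted in this report.
VXLAN_EXISTS = "error creating vxlan interface: file exists"


def classify(returncode: int, stdout: str, stderr: str) -> str:
    """Classify one `docker run -d` attempt as 'ok', 'vxlan_exists' or 'other_error'."""
    # Check the error text first: docker run -d may print a container ID and
    # then fail to start it, as in the failing run quoted in this report.
    if VXLAN_EXISTS in stderr or VXLAN_EXISTS in stdout:
        return "vxlan_exists"
    if returncode == 0:
        return "ok"
    return "other_error"


def try_run(network: str = "busybox-net1") -> str:
    """Start one short-lived busybox container on the overlay network and classify the outcome."""
    # Hypothetical naming scheme; any unique container name works.
    name = "busybox-test-" + uuid.uuid4().hex[:8]
    proc = subprocess.run(
        ["docker", "run", "-d", "--net=" + network, "--name=" + name,
         "busybox", "sleep", "60"],
        capture_output=True,
        text=True,
    )
    return classify(proc.returncode, proc.stdout, proc.stderr)


def reproduce(attempts: int = 1000, delay_s: float = 5.0) -> Optional[int]:
    """Return the 1-based attempt at which the vxlan error first appears, or None."""
    for attempt in range(1, attempts + 1):
        if try_run() == "vxlan_exists":
            return attempt
        time.sleep(delay_s)
    return None
```

Running reproduce() on an affected node would be expected to return 1; on a healthy node it keeps creating containers until the error appears or the attempts run out.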

Debug log of an error: https://gist.github.com/isavcic/afa10cfecce0f760ad32

Describe the results you received:

docker3# docker run -d --net=busybox-net1 --name=busybox-test-12341238 busybox sleep 600
54255b316c4f0b54701bd4268f8b37044078e0260ab2065dce862dce45fb513a
docker: Error response from daemon: subnet sandbox join failed for "10.140.140.0/24": error creating vxlan interface: file exists.

Describe the results you expected:

On another node that is currently unaffected, docker run works without issues:
docker4# docker run -d --net=busybox-net1 --name=busybox-test-12341239 busybox sleep 600
a2ba83dd315b2a269fedc1ea62ed6f70d545d631744d627fb0a8ad10f5d54b5a

Debug log of a successful run: https://gist.github.com/isavcic/f17afd54fd3f75b8c70c

Additional information you deem important (e.g. issue happens only occasionally):

Lines containing GET /v1.15/containers were omitted from the debug log snippets because they occur constantly. After service docker stop && service docker start on an affected node, everything works again.

I will leave the affected nodes in their current state; if you need any additional information, please let me know.

Thanks.

Metadata


    Labels: area/networking, kind/bug
