Overlay network completely unstable on 17.11 #35592

@zigmund

Description

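Multiple network issues on a 17.11 swarm: published ports and overlay networks disappear from nodes some time after deployment.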

Some published ports disappeared a while after the service was deployed:

zigmund@docker-m1.alahd.kz.dev:~$ docker service inspect sys-docker-lb | jq ".[0].Endpoint.Spec"
{
  "Mode": "vip",
  "Ports": [
    {
      "Protocol": "tcp",
      "TargetPort": 8888,
      "PublishedPort": 8888,
      "PublishMode": "ingress"
    },
    {
      "Protocol": "tcp",
      "TargetPort": 8889,
      "PublishedPort": 8889,
      "PublishMode": "ingress"
    }
  ]
}
zigmund@docker-m1.alahd.kz.dev:~$ sudo netstat -tapn | grep LISTEN
tcp        0      0 0.0.0.0:10050           0.0.0.0:*               LISTEN      1104/zabbix_agentd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      888/sshd        
tcp6       0      0 :::10050                :::*                    LISTEN      1104/zabbix_agentd
tcp6       0      0 :::2375                 :::*                    LISTEN      896/dockerd     
tcp6       0      0 :::2377                 :::*                    LISTEN      896/dockerd     
tcp6       0      0 :::7946                 :::*                    LISTEN      896/dockerd     
tcp6       0      0 :::30000                :::*                    LISTEN      896/dockerd     
tcp6       0      0 :::30001                :::*                    LISTEN      896/dockerd     
tcp6       0      0 :::30002                :::*                    LISTEN      896/dockerd     
tcp6       0      0 :::22                   :::*                    LISTEN      888/sshd
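
The mismatch above can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the `.Endpoint.Spec` JSON and the `netstat -tapn` text have already been captured as strings, and the function names (`published_ports`, `listening_ports`, `missing_ports`) are made up for illustration. It also assumes that a published ingress port should show up as a LISTEN socket on the node, as ports 30000-30002 do in the output above.

```python
import json

# Sample inputs copied (abridged) from the output in this report.
SPEC = """{"Mode": "vip", "Ports": [
  {"Protocol": "tcp", "TargetPort": 8888, "PublishedPort": 8888, "PublishMode": "ingress"},
  {"Protocol": "tcp", "TargetPort": 8889, "PublishedPort": 8889, "PublishMode": "ingress"}]}"""

NETSTAT = """\
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      888/sshd
tcp6       0      0 :::2377                 :::*                    LISTEN      896/dockerd
tcp6       0      0 :::30000                :::*                    LISTEN      896/dockerd
"""

def published_ports(endpoint_spec_json):
    """Return the sorted PublishedPort values from `.Endpoint.Spec` JSON."""
    spec = json.loads(endpoint_spec_json)
    return sorted(p["PublishedPort"] for p in (spec.get("Ports") or []))

def listening_ports(netstat_output):
    """Return the set of local TCP ports in LISTEN state from `netstat -tapn` text."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: proto, recv-q, send-q, local address, foreign address, state, pid/program
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            # Local address looks like "0.0.0.0:22" or ":::2377"; the port follows the last colon.
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports

def missing_ports(endpoint_spec_json, netstat_output):
    """Return published ports that have no LISTEN socket on the node."""
    listening = listening_ports(netstat_output)
    return [p for p in published_ports(endpoint_spec_json) if p not in listening]

print(missing_ports(SPEC, NETSTAT))  # [8888, 8889]
```

Running this periodically on each node would show when a port drops out, which may help narrow down the timing.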

Some overlay networks disappeared from some workers a while after the service was deployed, even though multiple containers on those workers are still using those networks:

zigmund@docker-w4.alahd.kz.dev:~$ docker inspect 7ee981a9f6bb | jq ".[0].NetworkSettings.Networks"
{
  "dev-net1": {
    "IPAMConfig": {
      "IPv4Address": "10.253.0.70"
    },
    "Links": null,
    "Aliases": [
      "7ee981a9f6bb"
    ],
    "NetworkID": "681a6sb6qw4hpm1ia0ew0i8vd",
    "EndpointID": "72f8f86ccffaf895b641131a7cc85f9c5069d10d14e25d7e3382453166825199",
    "Gateway": "",
    "IPAddress": "10.253.0.70",
    "IPPrefixLen": 24,
    "IPv6Gateway": "",
    "GlobalIPv6Address": "",
    "GlobalIPv6PrefixLen": 0,
    "MacAddress": "02:42:0a:fd:00:46",
    "DriverOpts": null
  },
  "stack-iagent-iag-1179-olx_internal": {
    "IPAMConfig": {
      "IPv4Address": "10.0.3.19"
    },
    "Links": null,
    "Aliases": [
      "7ee981a9f6bb"
    ],
    "NetworkID": "gljuv1tigjiex3f3ldsbc7pvl",
    "EndpointID": "998b724be97286417a3e7cf39a02239abb2d7e07be89081baa52d6538d2c35cd",
    "Gateway": "",
    "IPAddress": "10.0.3.19",
    "IPPrefixLen": 24,
    "IPv6Gateway": "",
    "GlobalIPv6Address": "",
    "GlobalIPv6PrefixLen": 0,
    "MacAddress": "02:42:0a:00:03:13",
    "DriverOpts": null
  }
}
zigmund@docker-w4.alahd.kz.dev:~$ docker network ls
NETWORK ID          NAME                           DRIVER              SCOPE
5d60bfb01064        bridge                         bridge              local
9d43a1206727        docker_gwbridge                bridge              local
7db7a6fabb3a        host                           host                local
dji8om8m2pzi        ingress                        overlay             swarm
81f975167792        none                           null                local
itcun61mv59l        stack-iagent-master_internal   overlay             swarm

For comparison, the networks on a manager:

zigmund@docker-m1.alahd.kz.dev:~$ docker network ls
NETWORK ID          NAME                                           DRIVER              SCOPE
f625a38baaea        bridge                                         bridge              local
681a6sb6qw4h        dev-net1                                       overlay             swarm
ab04ee31f9b6        docker_gwbridge                                bridge              local
217274a17e93        host                                           host                local
dji8om8m2pzi        ingress                                        overlay             swarm
fec3a4031436        none                                           null                local
ufzq8mppid4w        stack-api-kolesa-api-1146-phiremock_backend    overlay             swarm
xkopm3d2to0o        stack-api-krisha-api-1146-phiremock_backend    overlay             swarm
n8cnhtpx4mzx        stack-api-market-api-1146-phiremock_backend    overlay             swarm
jea413490gap        stack-iagent-bugfix-iag-1064_internal          overlay             swarm
zge94zci9vdr        stack-iagent-bugfix-iag-1097_internal          overlay             swarm
bibufh1st2ie        stack-iagent-bugfix-iag-807-landing_internal   overlay             swarm
baa9ixtprd73        stack-iagent-bugfix-iag-808-landing_internal   overlay             swarm
uty2gtr22g9j        stack-iagent-bugfix-iag-988_internal           overlay             swarm
586pdzpzxmzn        stack-iagent-iag-1103_internal                 overlay             swarm
xyucbskx6bat        stack-iagent-iag-1119_internal                 overlay             swarm
ksnw6zwo33hv        stack-iagent-iag-1146_internal                 overlay             swarm
gljuv1tigjie        stack-iagent-iag-1179-olx_internal             overlay             swarm
h5lqdzdg7cl3        stack-iagent-iag-928_internal                  overlay             swarm
itcun61mv59l        stack-iagent-master_internal                   overlay             swarm
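
The comparison between a container's attached networks and the node's network list can be sketched as follows. This is an illustrative helper, not something from the original report: it assumes the `.NetworkSettings.Networks` JSON and the `docker network ls` text are available as strings, and it matches networks by treating the truncated ID in the first column of `docker network ls` as a prefix of the full `NetworkID`. The function names are hypothetical.

```python
import json

# Sample inputs copied (abridged) from the output in this report.
NETWORKS = """{
  "dev-net1": {"NetworkID": "681a6sb6qw4hpm1ia0ew0i8vd"},
  "stack-iagent-iag-1179-olx_internal": {"NetworkID": "gljuv1tigjiex3f3ldsbc7pvl"}}"""

WORKER_LS = """\
NETWORK ID          NAME                           DRIVER              SCOPE
5d60bfb01064        bridge                         bridge              local
dji8om8m2pzi        ingress                        overlay             swarm
itcun61mv59l        stack-iagent-master_internal   overlay             swarm
"""

MANAGER_LS = """\
NETWORK ID          NAME                                           DRIVER              SCOPE
681a6sb6qw4h        dev-net1                                       overlay             swarm
gljuv1tigjie        stack-iagent-iag-1179-olx_internal             overlay             swarm
"""

def attached_networks(networks_json):
    """Map network name -> full NetworkID from `.NetworkSettings.Networks` JSON."""
    return {name: cfg["NetworkID"] for name, cfg in json.loads(networks_json).items()}

def local_network_ids(network_ls_output):
    """Return the truncated IDs from the first column of `docker network ls`."""
    lines = network_ls_output.strip().splitlines()
    return {line.split()[0] for line in lines[1:] if line.strip()}  # skip the header row

def missing_networks(networks_json, network_ls_output):
    """Return names of attached networks that the node's network list does not contain."""
    local_ids = local_network_ids(network_ls_output)
    return [
        name
        for name, full_id in sorted(attached_networks(networks_json).items())
        if not any(full_id.startswith(short) for short in local_ids)
    ]

print(missing_networks(NETWORKS, WORKER_LS))   # ['dev-net1', 'stack-iagent-iag-1179-olx_internal']
print(missing_networks(NETWORKS, MANAGER_LS))  # []
```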

Steps to reproduce the issue:

  1. Form a swarm cluster of a few Docker 17.11 nodes.
  2. Deploy some services (~50).
  3. Wait a little.
  4. Watch Docker networking break: published ports and overlay networks disappear.
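
Step 2 could be scripted roughly as below. This is only a sketch under stated assumptions: the report does not say which images, names, or ports were used, so the service names (`repro-N`), the network name (`repro-net`), the image (`nginx:alpine`), the target port 80, and the starting port 30000 are all hypothetical placeholders. The overlay network is assumed to exist already. The function only builds the command lines; it does not run them.

```python
def service_create_commands(count, network, image, first_port=30000):
    """Build one `docker service create` argv list per service.

    All names and ports here are hypothetical placeholders, not taken from the report.
    """
    commands = []
    for i in range(count):
        port = first_port + i  # one distinct published port per service
        commands.append([
            "docker", "service", "create",
            "--name", f"repro-{i}",
            "--network", network,
            "--publish", f"published={port},target=80",
            image,
        ])
    return commands

if __name__ == "__main__":
    # Print the commands for review; pipe them to a shell on a manager node to run them.
    for argv in service_create_commands(50, "repro-net", "nginx:alpine"):
        print(" ".join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere and makes it easy to adjust the placeholders before deploying.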

Describe the results you received:
Unusable swarm network (overlay, published ports).

Describe the results you expected:
Stable swarm network.

Additional information you deem important (e.g. issue happens only occasionally):
The first issues appeared after I updated the swarm from 17.10 to 17.11, so I decided to form a clean new swarm: I broke up the old one, wiped /var/lib/docker on all nodes, and formed a new one. This did not help.

Output of docker version:

Client:
 Version:      17.11.0-ce
 API version:  1.34
 Go version:   go1.8.3
 Git commit:   1caf76c
 Built:        Mon Nov 20 18:37:39 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.11.0-ce
 API version:  1.34 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   1caf76c
 Built:        Mon Nov 20 18:36:09 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 17.11.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: gelf
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: moelv396xsep51hmkqx18ozqj
 Is Manager: true
 ClusterID: os8oryt2yez9mxqy83jn8cv7s
 Managers: 3
 Nodes: 7
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.9.243.1
 Manager Addresses:
  10.9.243.1:2377
  10.9.243.2:2377
  10.9.243.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 992280e8e265f491f7a624ab82f3e238be086e49
runc version: 0351df1c5a66838d0c392b4ac4cf9450de844e2d
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.13.0-17-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 988.6MiB
Name: docker-m1.alahd.kz.dev
ID: F3CI:R4DV:UPQB:JYXY:XAOT:R7FJ:7L7K:X7OG:GKTK:RS2L:IJXT:THVN
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):
Ubuntu 16.04.3 LTS 4.13.0-17-generic, Proxmox VM.
