Description
I have a Docker Swarm cluster running version 1.12.3, with 3 managers (in drain mode) and 10 workers.
I created 2 overlay networks with the commands:
docker network create -d overlay network1
docker network create -d overlay network2
and deployed several services, some attached to network1 and some to network2.
One service (serviceA) on network1 communicates with another service (serviceB) on the same network through the internal service-name load balancer.
After a couple of days we scaled serviceA and noticed that some of the serviceB instances (tasks) were unreachable.
After investigating the problem, we found that the value of the parameter
"com.docker.network.driver.overlay.vxlanid_list"
for network1 is not consistent across the cluster: on several workers it was 257 instead of 258.
On the managers and most of the workers the value for network1 is 258, and for network2 it is 257.
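One way to read the VXLAN ID on a given host is docker network inspect; the --format expression below is a sketch that assumes the driver option is exposed under the network's Options map, as it appears to be on 1.12:

docker network inspect --format '{{index .Options "com.docker.network.driver.overlay.vxlanid_list"}}' network1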
Steps to reproduce the issue:
- docker network create -d overlay network1
- docker network create -d overlay network2
- docker service create ... serviceA
- docker service create ... serviceB
- docker service scale serviceA=4
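For completeness, a hedged expansion of the service commands above (the service names match ours, but the image names and replica count are illustrative placeholders, not our real configuration):

docker service create --name serviceA --network network1 myorg/serviceA
docker service create --name serviceB --network network1 --replicas 2 myorg/serviceB
docker service scale serviceA=4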
Describe the results you received:
Communication from serviceA to serviceB sometimes works and sometimes does not:
telnet serviceB 8080
telnet: connect to address serviceB: Connection refused
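Because the service VIP load-balances across tasks, whether a given attempt fails seems to depend on which task (and therefore which host) it lands on. A hedged way to test the tasks individually, assuming the tasks.&lt;service&gt; DNS name is available on this version, run from inside a serviceA container (the container name and task IP are placeholders):

docker exec -it serviceA.1.<task-id> sh
nslookup tasks.serviceB   # one A record per serviceB task
telnet <task-ip> 8080     # try each task IP directly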
Describe the results you expected:
telnet serviceB 8080
Connected to serviceB
Additional information you deem important (e.g. issue happens only occasionally):
There is no firewall between the servers.
We initially installed 1.12.1 and later upgraded to 1.12.3.
The OS is Red Hat Enterprise Linux 7.3.
As noted above, the problem occurs intermittently; whether a connection fails depends on the value of "com.docker.network.driver.overlay.vxlanid_list" on the worker host where the destination task is running.
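A quick way to spot the mismatch cluster-wide is to compare the option on every node. The loop below is a sketch that assumes SSH access to each host; the hostnames are placeholders:

for host in worker1 worker2 worker3; do
  printf '%s: ' "$host"
  ssh "$host" 'docker network inspect network1 | grep vxlanid_list'
done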
Output of docker version:
Client:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built:
OS/Arch: linux/amd64
Server:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built:
OS/Arch: linux/amd64
Output of docker info:
Unfortunately I do not have the output available now
Additional environment details (AWS, VirtualBox, physical, etc.):
All the hosts are virtual machines running on VMware.