Wrong ipvs address weight during a deployment process #41896

@IMMORTALxJO

Description

Hello,

Sometimes my Docker Swarm cluster doesn't update the weights of IPVS addresses during a deployment.
It should set weight=0 during a rollout, but it skips some endpoints. As a result I get a wrong load-balancer configuration and errors during the deployment:

### Old containers are waiting for graceful shutdown
  -> 10.0.108.161:0               Masq    0      0          0
  -> 10.0.108.162:0               Masq    1      0          3 <- Problem
  -> 10.0.108.163:0               Masq    0      0          0
  -> 10.0.108.164:0               Masq    0      0          0

### New containers are ready
  -> 10.0.108.165:0               Masq    1      0          4
  -> 10.0.108.166:0               Masq    1      0          2
  -> 10.0.108.167:0               Masq    1      0          4
  -> 10.0.108.168:0               Masq    1      0          5

The more containers in the service and the more network peers, the greater the chance of hitting the error. On a service with 32 replicas across 20 nodes I hit it every time.

I've started investigating, and it looks like I've found a race condition between the BulkSync procedure and the logic of the DisableService method.

Let me explain with an example:

  1. NodeA disabled EndpointA via the DisableService method ( https://github.com/moby/libnetwork/blob/fa125a3512ee0f6187721c88582bf8c4378bd4d7/sandbox.go#L712 ), which set ServiceDisabled=true on the endpoint record locally, incremented the LTime, and sent an UPDATE event to all network peers
  2. NodeA started BulkSync with NodeB
  3. NodeB received the updated value of EndpointA from the bulkSync procedure, processed it as a CREATE event, and saved it in its local state
  4. NodeB received the UPDATE event from step 1 but ignored it, since EndpointA with the same LTime already existed in networkdb ( https://github.com/moby/libnetwork/blob/d0951081b35fa4216fc4f0064bf065beeb55a74b/networkdb/delegate.go#L174 )
  5. NodeA stopped the container, removed the endpoint, and sent a DELETE event to all network peers
  6. NodeB received the DELETE event and removed the endpoint from IPVS

Normally, UPDATE event processing sets weight=0 to disable the endpoint, but CREATE event processing has no logic for the ServiceDisabled=true flag. Since the UPDATE event was ignored, the IPVS address kept weight=1 until it was deleted.
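The asymmetry can be shown with a small sketch. The function names (`weightFor`, `weightForFixed`) and the record type are mine, not libnetwork's; the point is that honoring ServiceDisabled on the CREATE path as well would close the window.

```go
package main

import "fmt"

// endpointRecord carries the one field that matters here; it is an
// illustrative stand-in for the real endpoint table record.
type endpointRecord struct {
	ServiceDisabled bool
}

// weightFor models the behavior described above: only the UPDATE
// handler maps ServiceDisabled to IPVS weight 0, while the CREATE
// handler always programs weight 1.
func weightFor(event string, r endpointRecord) int {
	if event == "UPDATE" && r.ServiceDisabled {
		return 0
	}
	return 1
}

// weightForFixed is a hypothetical symmetric handler that honors the
// flag regardless of how the record arrived.
func weightForFixed(event string, r endpointRecord) int {
	if r.ServiceDisabled {
		return 0
	}
	return 1
}

func main() {
	disabled := endpointRecord{ServiceDisabled: true}
	fmt.Println(weightFor("CREATE", disabled))      // 1: the bug
	fmt.Println(weightForFixed("CREATE", disabled)) // 0: expected
}
```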

In these situations the logs contain two "handleEpTableEvent ADD" entries for a single endpoint, because bulkSync processes the record as a newly added endpoint:

time="2021-01-19T06:24:16.721195937Z" level=debug msg="handleEpTableEvent ADD 96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6 R:{test-app_php-fpm.3.y8k0h8wumjhbvrsqd8w66s1wy test-app_php-fpm 21ivc1mb008pv9gu2if59go9k 10.0.108.8 10.0.108.72 [] [php-fpm] [fcec15cea6a3] false}"
time="2021-01-19T06:39:50.899346960Z" level=debug msg="handleEpTableEvent ADD 96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6 R:{test-app_php-fpm.3.y8k0h8wumjhbvrsqd8w66s1wy test-app_php-fpm 21ivc1mb008pv9gu2if59go9k 10.0.108.8 10.0.108.72 [] [php-fpm] [fcec15cea6a3] true}"
time="2021-01-19T06:40:51.504938421Z" level=debug msg="handleEpTableEvent DEL 96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6 R:{test-app_php-fpm.3.y8k0h8wumjhbvrsqd8w66s1wy test-app_php-fpm 21ivc1mb008pv9gu2if59go9k 10.0.108.8 10.0.108.72 [] [php-fpm] [fcec15cea6a3] true}"

I've also collected the tableEvents log from a node. Endpoint 96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6 never changed its weight to 0 in IPVS, so the service was unstable during the deployment:

## Common events related to online endpoint from bulk sync
2021-01-19T06:38:10.886180117Z {Type:CREATE,LTime:93557892,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:38:40.905904777Z {Type:CREATE,LTime:93557892,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:39:10.885634958Z {Type:CREATE,LTime:93557892,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:39:41.193478062Z {Type:CREATE,LTime:93557892,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}

## The deploy process started: node 1b906f7d2be0 set ServiceDisabled=true and updated the LTime of the endpoint record
## The CREATE event from bulk sync arrived earlier than the UPDATE event from gossip
2021-01-19T06:39:50.899209788Z {Type:CREATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
## The CREATE event ignored the ServiceDisabled=true flag and didn't update the IPVS table

## The following events are ignored because their LTime is the same as before;
## during this window the endpoint still has weight 1 in the IPVS table
2021-01-19T06:39:51.372175542Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:39:51.520065112Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:39:52.240751438Z {Type:CREATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:39:53.174523912Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:01.415950175Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:01.920180779Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:02.098081468Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:05.270438843Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:05.953138848Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:06.920765495Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:07.142009496Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:08.304224480Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:10.999052920Z {Type:CREATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:11.372201674Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:13.022040640Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:14.097906052Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:16.670842605Z {Type:UPDATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}
2021-01-19T06:40:42.339699998Z {Type:CREATE,LTime:93561596,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:0}

## The application started seeing timeouts and connection errors: the container was already DOWN but still had weight 1 in IPVS
## The container shut down, the LTime changed, and the record disappeared from IPVS
2021-01-19T06:40:51.504647244Z {Type:DELETE,LTime:93561883,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:1800}
2021-01-19T06:40:51.520386920Z {Type:DELETE,LTime:93561883,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:1800}
2021-01-19T06:40:51.710730260Z {Type:DELETE,LTime:93561883,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:1800}
2021-01-19T06:40:51.791670427Z {Type:DELETE,LTime:93561883,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:1800}
2021-01-19T06:40:51.992225524Z {Type:DELETE,LTime:93561883,NodeName:1b906f7d2be0,NetworkID:f312pn3rfujkpupqzeyumtvpm,TableName:endpoint_table,Key:96309316d9ca369cfae8bd02f9ecfa4cc468e4fd60cd0e33863ec33ef4a0bdd6,ResidualReapTime:1800}
...

Can you help me figure out the problem?

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:52 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:23 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 22
  Running: 17
  Paused: 0
  Stopped: 5
 Images: 20
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: mn8g3iocbtq39mw8wtp1hqry6
  Is Manager: true
  ClusterID: bknezdiqnpzy2nwj4vj0imxwz
  Managers: 3
  Nodes: 25
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 10.10.10.1
  Manager Addresses:
   10.10.10.1:2377
   10.10.10.2:2377
   10.10.10.3:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.19.0-0.bpo.8-amd64
 Operating System: Debian GNU/Linux 9 (stretch)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.24GiB
 Name: s88-swarm-manager-03
 ID: QJN7:73T3:BB2Q:ADZD:J36N:EXDQ:IUFF:73JN:3R6C:SBU7:6JOJ:S7NP
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 492
  Goroutines: 745
  System Time: 2021-01-19T11:54:20.986673668Z
  EventsListeners: 19
 Registry: https://index.docker.io/v1/
 Labels:
  provider=generic
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
physical cluster with 22 workers and 3 managers
