Description
On service update, restart, etc., some old tasks stay semi-alive:
zigmund@docker-m1.alahd.kz.dev:~$ docker service ps stack-iagent-bugfix-iag-808-landing_app-cli
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
nmup4588wtny stack-iagent-bugfix-iag-808-landing_app-cli.1 registry:5000/stack/iagent/app@sha256:29255041db3e95d8159447a4fdbe3a111b3857309296365c427e303b3de81726 docker-w4.alahd.kz.dev Running Running 9 minutes ago
yut8tf0hyfat \_ stack-iagent-bugfix-iag-808-landing_app-cli.1 registry:5000/stack/iagent/app@sha256:29255041db3e95d8159447a4fdbe3a111b3857309296365c427e303b3de81726 docker-w1.alahd.kz.dev Shutdown Rejected 9 minutes ago "Failed joining stack-iagent-b…"
i2ogh9fo27ou \_ stack-iagent-bugfix-iag-808-landing_app-cli.1 registry:5000/stack/iagent/app@sha256:29255041db3e95d8159447a4fdbe3a111b3857309296365c427e303b3de81726 docker-w2.alahd.kz.dev Shutdown Shutdown 9 minutes ago
ruj69i7cqpi5 \_ stack-iagent-bugfix-iag-808-landing_app-cli.1 registry:5000/stack/iagent/app@sha256:29255041db3e95d8159447a4fdbe3a111b3857309296365c427e303b3de81726 docker-w4.alahd.kz.dev Shutdown Rejected 3 hours ago "Failed joining stack-iagent-b…"
Take task i2ogh9fo27ou, which swarm reports as Shutdown, and look for it on its worker (docker-w2). The container is still up:
zigmund@docker-w2.alahd.kz.dev:~$ docker ps | grep i2ogh9fo27ou
c28c0b79670c registry:5000/stack/iagent/app "/bin/sh -c ${ENTRYP…" 12 minutes ago Up 11 minutes stack-iagent-bugfix-iag-808-landing_app-cli.1.i2ogh9fo27ouxxjdqw5gom21c
The zombie task's container cannot be killed or removed by hand:
zigmund@docker-w2.alahd.kz.dev:~$ docker rm -f c28c0b79670c
Error response from daemon: Could not kill running container c28c0b79670c6aa998bb560cd6cf4251365187feb46453f6cb49819d86dfeede, cannot remove - Cannot kill container c28c0b79670c6aa998bb560cd6cf4251365187feb46453f6cb49819d86dfeede: process c28c0b79670c6aa998bb560cd6cf4251365187feb46453f6cb49819d86dfeede not found: not found
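A quick way to see whether the daemon's view matches the host is to look up the PID that dockerd recorded for the container and check whether that PID still exists. This is only a sketch: `check_container_pid` is a hypothetical helper, not a Docker command, and it has not been tried against this exact zombie state. It only reads `docker inspect` output and `/proc`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): compare the PID that dockerd has
# recorded for a container with what the kernel exposes under /proc.
# Usage: check_container_pid <container-id>
check_container_pid() {
  # .State.Pid is 0 when the daemon believes the container is not running.
  pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 2
  if [ "$pid" -gt 0 ] 2>/dev/null && [ -d "/proc/$pid" ]; then
    echo "$1: pid $pid is present on this host"
  else
    echo "$1: pid $pid not found on this host"
    return 1
  fi
}
```

On docker-w2 this would be run as `check_container_pid c28c0b79670c`. A "not found" result would line up with the `process ... not found` error above; a live PID would mean the process exists but containerd has lost track of it.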
Many of our services failed to update, with tasks rejected because a gateway endpoint already exists in docker_gwbridge:
zigmund@docker-m1.alahd.kz.dev:~$ docker service ps stack-iagent-bugfix-iag-808-landing_app-web --no-trunc
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
javaqsim2m9ujw7j6go5wcneu stack-iagent-bugfix-iag-808-landing_app-web.1 registry:5000/stack/iagent/app@sha256:29255041db3e95d8159447a4fdbe3a111b3857309296365c427e303b3de81726 docker-w2.alahd.kz.dev Shutdown Rejected 14 minutes ago "Failed joining stack-iagent-bugfix-iag-808-landing_internal-endpoint to sandbox stack-iagent-bugfix-iag-808-landing_internal-sbox: container stack-iagent-bugfix-iag-808-landing_internal-sbox: endpoint create on GW Network failed: endpoint with name gateway_stack-iagent already exists in network docker_gwbridge"
ygpnrkf9ncjaq9yi2trxmdpvv \_ stack-iagent-bugfix-iag-808-landing_app-web.1 registry:5000/stack/iagent/app@sha256:29255041db3e95d8159447a4fdbe3a111b3857309296365c427e303b3de81726 docker-w2.alahd.kz.dev Shutdown Rejected 14 minutes ago "Failed joining stack-iagent-bugfix-iag-808-landing_internal-endpoint to sandbox stack-iagent-bugfix-iag-808-landing_internal-sbox: container stack-iagent-bugfix-iag-808-landing_internal-sbox: endpoint create on GW Network failed: endpoint with name gateway_stack-iagent already exists in network docker_gwbridge"
xjysl4jnpanz7390mo3iq3ty9 \_ stack-iagent-bugfix-iag-808-landing_app-web.1 registry:5000/stack/iagent/app@sha256:29255041db3e95d8159447a4fdbe3a111b3857309296365c427e303b3de81726 docker-w2.alahd.kz.dev Shutdown Shutdown 3 hours ago
z89ryyydrezpov6fg3d7kha2k \_ stack-iagent-bugfix-iag-808-landing_app-web.1 registry:5000/stack/iagent/app@sha256:29255041db3e95d8159447a4fdbe3a111b3857309296365c427e303b3de81726 docker-w2.alahd.kz.dev Shutdown Failed 3 hours ago "task: non-zero exit (1)"
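The error text itself may hint at why these tasks are rejected. The sandbox container is named `stack-iagent-bugfix-iag-808-landing_internal-sbox`, and the endpoint it tries to create on `docker_gwbridge` is `gateway_stack-iagent`, which is `gateway_` followed by exactly the first 12 characters of that name. If that is how the name is built (an assumption inferred from this one message, not from Docker documentation), every stack whose name starts with `stack-iagent` would try to create the same gateway endpoint. A small sketch to check a list of sandbox names for such collisions; `gw_endpoint_name` and `find_gw_collisions` are hypothetical helpers:

```shell
#!/bin/sh
# Assumption (inferred from the error message above): the docker_gwbridge
# endpoint for a sandbox is "gateway_" + the first 12 characters of the
# sandbox's container ID.
gw_endpoint_name() {
  # %.12s truncates the argument to at most 12 characters.
  printf 'gateway_%.12s\n' "$1"
}

# Reads sandbox names on stdin, one per line, and prints each derived
# gateway endpoint name that more than one sandbox would share.
find_gw_collisions() {
  while IFS= read -r name; do
    gw_endpoint_name "$name"
  done | sort | uniq -d
}
```

For example, feeding it `stack-iagent-bugfix-iag-808-landing_internal-sbox` together with a second, made-up sandbox name such as `stack-iagent-iag-1119_internal-sbox` prints `gateway_stack-iagent`, because both names truncate to `stack-iagent`.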
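Sometimes swarm itself shows zombie tasks: the service reports 3/1 replicas, and older tasks with desired state Shutdown are still Running: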
zigmund@docker-m1.alahd.kz.dev:~$ docker service ls | grep stack-iagent-iag-1119_nginx
ngn4p1knm40g stack-iagent-iag-1119_nginx replicated 3/1 registry:5000/stack/iagent/nginx@sha256:2859e18e47b45211cf0cc062c4e9bc136cb01339e446a484726de310219db09d
zigmund@docker-m1.alahd.kz.dev:~$ docker service ps stack-iagent-iag-1119_nginx
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
jelsqcbrvd56 stack-iagent-iag-1119_nginx.1 registry:5000/stack/iagent/nginx@sha256:2859e18e47b45211cf0cc062c4e9bc136cb01339e446a484726de310219db09d docker-w2.alahd.kz.dev Running Running 24 minutes ago
jpz79v1808pu \_ stack-iagent-iag-1119_nginx.1 registry:5000/stack/iagent/nginx@sha256:2859e18e47b45211cf0cc062c4e9bc136cb01339e446a484726de310219db09d docker-w2.alahd.kz.dev Shutdown Running 25 minutes ago
znar2fbopy71 \_ stack-iagent-iag-1119_nginx.1 registry:5000/stack/iagent/nginx@sha256:2859e18e47b45211cf0cc062c4e9bc136cb01339e446a484726de310219db09d docker-w2.alahd.kz.dev Shutdown Rejected 26 minutes ago "Failed creating stack-iagent-…"
i0yowxygqid0 \_ stack-iagent-iag-1119_nginx.1 registry:5000/stack/iagent/nginx@sha256:2859e18e47b45211cf0cc062c4e9bc136cb01339e446a484726de310219db09d docker-w3.alahd.kz.dev Shutdown Running 2 hours ago
fuhh6f8gg6bg \_ stack-iagent-iag-1119_nginx.1 registry:5000/stack/iagent/nginx@sha256:2859e18e47b45211cf0cc062c4e9bc136cb01339e446a484726de310219db09d docker-w1.alahd.kz.dev Shutdown Running 3 hours ago
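The zombie tasks above are the ones whose desired state is Shutdown while the current state is still Running. A sketch that lists them for one service, assuming the CLI in use supports `--format` on `docker service ps` with the `.DesiredState` and `.CurrentState` placeholders; `filter_zombie_tasks` and `find_zombie_tasks` are hypothetical names:

```shell
#!/bin/sh
# Reads "ID|NODE|DESIRED|CURRENT" lines on stdin and prints "ID NODE" for
# every task that should be shut down but still reports Running.
filter_zombie_tasks() {
  awk -F '|' '$3 == "Shutdown" && $4 ~ /^Running/ { print $1, $2 }'
}

# Usage: find_zombie_tasks <service-name>
# The "|" separator avoids splitting on the spaces inside CurrentState
# (e.g. "Running 25 minutes ago").
find_zombie_tasks() {
  docker service ps "$1" --no-trunc \
    --format '{{.ID}}|{{.Node}}|{{.DesiredState}}|{{.CurrentState}}' |
    filter_zombie_tasks
}
```

For example, `find_zombie_tasks stack-iagent-iag-1119_nginx`. It does not catch the earlier case of task i2ogh9fo27ou, where swarm reports Shutdown/Shutdown but the container is still up; that one only shows in `docker ps` on the worker.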
We have to update services over and over. At this point, swarm on 17.11 is completely useless for us.
Steps to reproduce the issue:
- Form a swarm cluster of a few 17.11 Docker nodes.
- Deploy some services (~50).
- Deploy, update, and restart services; do what you usually do with swarm.
- Get zombie tasks and task allocation failures.
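The steps above can be scripted roughly as follows. This is a sketch under assumptions not taken from the report: the overlay network `repro-net`, the service names `repro-1` to `repro-N`, and the `nginx:alpine` image are placeholders, and the affected cluster deploys real stacks instead.

```shell
#!/bin/sh
# Rough reproduction sketch. Network name, service names and image are
# placeholders, not taken from the affected cluster.

# Creates an overlay network and <count> single-replica services on it.
create_services() {
  count="$1"
  docker network create --driver overlay repro-net
  i=1
  while [ "$i" -le "$count" ]; do
    docker service create --detach --name "repro-$i" --network repro-net nginx:alpine
    i=$((i + 1))
  done
}

# Forces a rolling restart of every service, even with an unchanged spec.
force_update_all() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    docker service update --detach --force "repro-$i"
    i=$((i + 1))
  done
}
```

Run `create_services 50` once, then call `force_update_all 50` repeatedly and watch `docker service ls` and `docker service ps` for replica counts above the desired value or tasks rejected with "Failed joining ...".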
Describe the results you received:
Broken service updates.
Describe the results you expected:
Working service updates.
Additional information you deem important (e.g. issue happens only occasionally):
We recreated the swarm from scratch twice and tried kernel versions from 4.4 to 4.13 (I read somewhere that the instability might be kernel-related), with no luck.
I don't know whether this is related to the general network instability reported in #35592.
Output of docker version:
Client:
Version: 17.11.0-ce
API version: 1.34
Go version: go1.8.3
Git commit: 1caf76c
Built: Mon Nov 20 18:37:39 2017
OS/Arch: linux/amd64
Server:
Version: 17.11.0-ce
API version: 1.34 (minimum version 1.12)
Go version: go1.8.3
Git commit: 1caf76c
Built: Mon Nov 20 18:36:09 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 17.11.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: gelf
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: moelv396xsep51hmkqx18ozqj
Is Manager: true
ClusterID: os8oryt2yez9mxqy83jn8cv7s
Managers: 3
Nodes: 7
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.9.243.1
Manager Addresses:
10.9.243.1:2377
10.9.243.2:2377
10.9.243.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 992280e8e265f491f7a624ab82f3e238be086e49
runc version: 0351df1c5a66838d0c392b4ac4cf9450de844e2d
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.8.0-58-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.953GiB
Name: docker-m1.alahd.kz.dev
ID: F3CI:R4DV:UPQB:JYXY:XAOT:R7FJ:7L7K:X7OG:GKTK:RS2L:IJXT:THVN
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional environment details (AWS, VirtualBox, physical, etc.):
Ubuntu 16.04.3 LTS 4.8.0-58-generic, Proxmox VM.