Description
In Docker 17.05.0-ce (#30261), the --update-failure-action=rollback option was introduced. If an update with the options --update-order=start-first --update-failure-action=rollback --update-parallelism=0 --rollback-order=start-first fails, the previously running, healthy containers are replaced with new ones as part of the rollback, instead of keeping the old ones. That causes a window of downtime if an update fails and is rolled back.
Steps to reproduce the issue:
- Create two images which sleep for 10 seconds before running nginx. One has 1 health check retry, the other has 15. The health check interval is 1 second for each.
cat > main.sh << EOF
#!/bin/sh -ex
sleep 10
touch /health
nginx -g "daemon off;"
EOF
chmod +x main.sh
cat > Dockerfile.1retry << EOF
FROM nginx:alpine
HEALTHCHECK --interval=1s --retries=1 CMD test -r /health || exit 1
COPY main.sh /main.sh
ENTRYPOINT /main.sh
CMD ["nginx", "-g", "daemon off;"]
EOF
docker build -t 10s:1retry -f Dockerfile.1retry .
cat > Dockerfile.15retries << EOF
FROM 10s:1retry
HEALTHCHECK --interval=1s --retries=15 CMD test -r /health || exit 1
EOF
docker build -t 10s:15retries -f Dockerfile.15retries .
- Create the service and wait for it to be healthy:
docker service create --detach=false --name=10s --replicas=2 --update-monitor=1s --update-failure-action=rollback --update-order=start-first --rollback-order=start-first --rollback-monitor=1s --publish 80:80 10s:15retries nginx -g "daemon off;"
- Start separate shells for watch -n 1 docker ps, watch -n 1 docker service ps 10s, watch -n 1 docker service inspect --pretty 10s, watch -n 1 curl -sS localhost, and docker service logs -f 10s to see what's going on
- Update the service with the 1 retry image and observe downtime:
docker service update --detach=false --image 10s:1retry --rollback-order=start-first --rollback-parallelism=1 --update-parallelism=0 --rollback-order=start-first 10s
Describe the results you received:
- The containers/tasks running prior to the update are stopped and removed as part of the rollback
- There is an extended window of downtime, as experienced when curling the ingress port
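For anyone reproducing this, the point at which the rollback kicks in can also be logged with a timestamp instead of being read off a watch window. The following is only a sketch: the helper names update_state and wait_for_update_end are hypothetical, and it assumes the service's UpdateStatus.State field is exposed through docker service inspect --format on this version.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original report.

update_state() {
    # $1 = service name. Prints the update state reported by the manager,
    # e.g. "updating", "rollback_started" or "rollback_completed".
    # Assumption: the service object carries an UpdateStatus.State field.
    docker service inspect \
        --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$1"
}

wait_for_update_end() {
    # $1 = service name, $2 = maximum number of polls.
    # Logs one timestamped line per poll and returns 0 once the update or
    # rollback has reached a terminal or paused state, 1 on timeout.
    tries=0
    while [ "$tries" -lt "$2" ]; do
        state=$(update_state "$1")
        echo "$(date +%T) state=$state"
        case "$state" in
            completed|paused|rollback_completed|rollback_paused) return 0 ;;
        esac
        tries=$((tries + 1))
        sleep "${INTERVAL:-1}"
    done
    return 1
}

# Example (run in a separate shell right after starting the update):
#   wait_for_update_end 10s 120
```

Comparing these timestamps with the output of docker service ps 10s should show whether the old tasks are shut down before or after the state switches to a rollback state.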
I attached a GIF of what I experienced.

Describe the results you expected:
- The healthy containers running prior to the update should simply stay in place
- I should not experience any downtime when curling the ingress port
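To put a number on the downtime rather than eyeballing watch -n 1 curl, a polling loop along these lines could be used. It is a minimal sketch, not something I ran for the report: the function names probe_once and monitor, and the URL, PROBE and INTERVAL variables, are hypothetical, and the URL is assumed from --publish 80:80.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original report.
URL="${URL:-http://localhost:80/}"   # assumed from --publish 80:80
# -f makes curl exit non-zero on HTTP errors, --max-time bounds each probe.
PROBE="${PROBE:-curl -fsS -o /dev/null --max-time 1}"

probe_once() {
    # Returns 0 when the request succeeds, non-zero otherwise.
    $PROBE "$URL" >/dev/null 2>&1
}

monitor() {
    # $1 = number of probes to send.
    # Prints a timestamped "DOWN" line per failed probe, then a summary.
    total=$1
    failed=0
    i=0
    while [ "$i" -lt "$total" ]; do
        if ! probe_once; then
            failed=$((failed + 1))
            echo "$(date +%T) DOWN"
        fi
        i=$((i + 1))
        sleep "${INTERVAL:-1}"
    done
    echo "failed=$failed total=$total"
}

# Example: start this just before running docker service update
#   monitor 120
```

With the expected behaviour, the summary line would report zero failed probes across the failed update and its rollback.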
Additional information you deem important (e.g. issue happens only occasionally):
Nope.
Output of docker version:
Client:
Version: 17.06.0-ce
API version: 1.30
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:23:31 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.0-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:19:04 2017
OS/Arch: linux/amd64
Experimental: true
Output of docker info:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 53
Server Version: 17.06.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: j3bnlf3fdmrfmtth4cmsmfbrf
Is Manager: true
ClusterID: vaiauv39cxsnfq0r14014lwi5
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Root Rotation In Progress: false
Node Address: 10.108.103.132
Manager Addresses:
10.108.103.132:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.12.1-041201-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.55GiB
Name: lx64pc0265
ID: 3FQU:UDEZ:SNTQ:AE34:UABY:5FO2:DRHR:LC2U:RY7E:NMNL:PM65:DYH5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://localhost:3128/
Https Proxy: http://localhost:3128/
No Proxy: localhost,127.0.0.0/8
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical bare metal workstation. Yes I know I'm on Linux 4.12.1, but that's hardly likely to be the issue here.