Description

Hi.
I'm testing how many Docker health checks a single Docker node can handle.
As far as I know, a Docker health check runs via docker exec, so I think its performance depends on how fast the system can fork processes.
I tested a simple health check that calls wget on localhost with interval 1s and timeout 1s, on a docker service with 20 replicas (20 health checks per second).
At some point, all of the containers' health checks failed. After re-scheduling, the health checks succeed again, but the failures recur after a certain time.
The daemon logs contain many messages like "context deadline exceeded" or "context canceled".
Is this normal behavior when using Docker health checks, or are 20 docker exec calls per second too harsh a workload?
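Since the question hinges on how fast the host can fork processes, a rough probe can help put a number on it. This is a hypothetical helper script (not anything Docker provides): it spawns N short-lived processes and times them, which gives a loose upper bound on the fork+exec rate available to docker exec.

```shell
#!/bin/sh
# Hypothetical fork-rate probe: spawn N short-lived processes and time
# them, to estimate how many fork+exec operations per second the host
# can sustain. Requires GNU date for nanosecond resolution (%N).
N=200
start=$(date +%s%N)     # nanoseconds since epoch
i=0
while [ "$i" -lt "$N" ]; do
  /bin/true &           # fork+exec a trivial process in the background
  i=$((i + 1))
done
wait                    # wait for all background processes to exit
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
echo "forked $N processes in ${elapsed_ms} ms"
```

Note that docker exec is far heavier than a bare fork (it goes through the daemon, containerd, and runc), so the real sustainable health-check rate will be much lower than this probe suggests.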
Steps to reproduce the issue:
My System
CPU : Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
MEM: 48G
DISTRIBUTION : CentOS Linux release 7.2.1511 (Core)
KERNEL : 4.7.2 (mainline)
Health Check Command
# docker service create --health-cmd "wget -q -s http://localhost:80 || exit 1" --health-interval 1s --health-timeout 1s --name health-check --replicas 20 nginx:alpine

Describe the results you received:
# docker service ps health-check
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
kzn2jwipwy58 health-check.1 nginx:alpine cdev-r01n002-lb.shipdock Running Running 9 seconds ago
411mzt9ash7o \_ health-check.1 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 16 seconds ago
i0oxjj7rwlot health-check.2 nginx:alpine cdev-r01n002-lb.shipdock Running Running 9 seconds ago
dj05zx2oa6dv \_ health-check.2 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 17 seconds ago
knr5d3smc4jk health-check.3 nginx:alpine cdev-r01n002-lb.shipdock Running Running 8 seconds ago
raxzla6gpu5i \_ health-check.3 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 15 seconds ago
i7pe6lpr1d9y health-check.4 nginx:alpine cdev-r01n002-lb.shipdock Running Running 9 seconds ago
vpwinsaii1ay \_ health-check.4 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 16 seconds ago
kl8sbv8pvsip health-check.5 nginx:alpine cdev-r01n002-lb.shipdock Running Running 12 seconds ago
875vjel7ik3f \_ health-check.5 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 19 seconds ago
sx16mfdgq3kw health-check.6 nginx:alpine cdev-r01n002-lb.shipdock Running Running 10 seconds ago
2dbenb1bci0u \_ health-check.6 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 17 seconds ago
pt7yf7ebwlp5 health-check.7 nginx:alpine cdev-r01n002-lb.shipdock Running Running 10 seconds ago
71g93zxon8yx \_ health-check.7 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 17 seconds ago
jmtsq0w8i59i health-check.8 nginx:alpine cdev-r01n002-lb.shipdock Running Running 9 seconds ago
tb7q0tq0izzl \_ health-check.8 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 16 seconds ago
r8n97vkrkf63 health-check.9 nginx:alpine cdev-r01n002-lb.shipdock Running Running about a minute ago
d9ipqlfsywt4 health-check.10 nginx:alpine cdev-r01n002-lb.shipdock Running Running 8 seconds ago
bl6appmeb9ye \_ health-check.10 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 15 seconds ago
9q5asbsxddyz health-check.11 nginx:alpine cdev-r01n002-lb.shipdock Running Running 12 seconds ago
3zqldue0cwov \_ health-check.11 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 19 seconds ago
sx1q30spukuo health-check.12 nginx:alpine cdev-r01n002-lb.shipdock Running Running about a minute ago
gf8elji3uj9g health-check.13 nginx:alpine cdev-r01n002-lb.shipdock Running Running 8 seconds ago
wisxy9a8hj1f \_ health-check.13 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 15 seconds ago
wdgi8heatk9k health-check.14 nginx:alpine cdev-r01n002-lb.shipdock Running Running 10 seconds ago
uzsn0x1ggkzk \_ health-check.14 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 17 seconds ago
q7b80ep11pzq health-check.15 nginx:alpine cdev-r01n002-lb.shipdock Running Running 9 seconds ago
xalmhfshb6mg \_ health-check.15 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 17 seconds ago
gahlpp9zym91 health-check.16 nginx:alpine cdev-r01n002-lb.shipdock Running Running 8 seconds ago
3gs91m7tcf27 \_ health-check.16 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 15 seconds ago
rly2zb1qdf54 health-check.17 nginx:alpine cdev-r01n002-lb.shipdock Running Running 9 seconds ago
3iq7y4588iti \_ health-check.17 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 16 seconds ago
wl0650ivcvgl health-check.18 nginx:alpine cdev-r01n002-lb.shipdock Running Running 9 seconds ago
bt84s1bgtx8p \_ health-check.18 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 17 seconds ago
s094c05s5p6a health-check.19 nginx:alpine cdev-r01n002-lb.shipdock Running Running 9 seconds ago
rwpfwwa3daqd \_ health-check.19 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 16 seconds ago
rqhuwmak9pdj health-check.20 nginx:alpine cdev-r01n002-lb.shipdock Running Running 9 seconds ago
xhd6459w4zpr \_ health-check.20 nginx:alpine cdev-r01n002-lb.shipdock Shutdown Complete 17 seconds ago

Describe the results you expected:
I expected a single Docker node to be able to handle more than 20 health checks per second, but instead there are many errors.
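As a side note, the same probe used in the service create command above could also be declared in the image itself via a HEALTHCHECK instruction. This is only a sketch: it assumes nginx:alpine ships busybox wget, and it uses -O /dev/null to discard the response instead of the -s flag used above.

```dockerfile
# Sketch: equivalent health check baked into the image instead of
# passed via --health-cmd (assumes nginx:alpine ships busybox wget).
FROM nginx:alpine
HEALTHCHECK --interval=1s --timeout=1s \
  CMD wget -q -O /dev/null http://localhost:80 || exit 1
```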
Additional information you deem important (e.g. issue happens only occasionally):
It looks like the health checks fail when the libcontainerd run queue is full.
Output of docker version:
# docker version
Client:
Version: 17.05.0-ce
API version: 1.29
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:10:29 2017
OS/Arch: linux/amd64
Server:
Version: 17.05.0-ce
API version: 1.29 (minimum version 1.12)
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:10:29 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
# docker info
Containers: 104
Running: 10
Paused: 0
Stopped: 94
Images: 21
Server Version: 17.05.0-ce
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: 6wtha553yeabvzvm2i1f57yqq
Is Manager: true
ClusterID: qwc97sxpolg33q2aahp5g1oe2
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Node Address: 10.113.129.10
Manager Addresses:
10.113.129.10:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.7.2-Docker
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 46.74GiB
Name: cdev-r01n002-lb.shipdock
ID: DH3W:ODJJ:KY3X:GCUT:EYNM:EKZZ:EQLG:WVWK:S427:7WHV:GIGJ:EFWK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: bridge-nf-call-ip6tables is disabled
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical Machine