How many Docker health checks can a single Docker node handle? #33933

@albamc

Hi.

Description

I'm testing how many Docker health checks a single Docker node can handle.
As far as I know, a health check runs through docker exec, so I think its performance depends on how fast the system can fork processes.
I tested a health check that calls wget on localhost with interval 1s and timeout 1s, on a docker service with replicas set to 20 (20 health checks per second).
At some point, the health checks of all containers failed. After re-scheduling, the health checks succeed again, but the same failure repeats after a certain time.
The logs contain many messages like "context deadline exceeded" or "context cancelled".
Is this normal when using Docker health checks, or is creating 20 docker exec processes per second too harsh a condition?
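
To put a number on how often those messages appear, the daemon log can be filtered for them. The following is only a sketch: the function name `count_exec_errors` is made up for illustration, and it matches both the "cancelled" spelling quoted above and the single-l "canceled" spelling, since the exact wording in the log may differ.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that mention the two errors
# quoted above. Reads log text on stdin and prints both counts.
count_exec_errors() {
    awk '
        /context deadline exceeded/ { deadline++ }
        /context cancell?ed/        { canceled++ }   # matches either spelling
        END { printf "deadline_exceeded=%d canceled=%d\n", deadline, canceled }
    '
}
```

On a systemd host such as the CentOS 7 machine described here, and assuming the daemon logs to the journal under the unit name `docker.service`, it could be fed with `journalctl -u docker.service --since "10 minutes ago" | count_exec_errors`.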

Steps to reproduce the issue:

  1. My System
    CPU : Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
    MEM: 48G
    DISTRIBUTION : CentOS Linux release 7.2.1511 (Core)
    KERNEL : 4.7.2 (main line)

  2. Health Check Command

# docker service create --health-cmd "wget -q -s http://localhost:80 || exit 1" --health-interval 1s --health-timeout 1s --name health-check --replicas 20 nginx:alpine
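
The load this command puts on the node can be estimated as replicas divided by the health interval. The sketch below does only that arithmetic; `checks_per_second` is a hypothetical helper, and the result is an upper bound, because as far as I understand the interval is counted from the end of one check to the start of the next.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many health-check execs per second
# a node has to start. Args: replicas on the node, interval in seconds.
checks_per_second() {
    awk -v r="$1" -v i="$2" 'BEGIN { printf "%.1f\n", r / i }'
}

checks_per_second 20 1    # the service above: prints 20.0
checks_per_second 20 10   # same replicas, 10s interval: prints 2.0
```

Comparing the two figures shows how much a longer `--health-interval` would reduce the number of exec processes created per second.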

Describe the results you received:

# docker service ps health-check
ID                  NAME                  IMAGE               NODE                       DESIRED STATE       CURRENT STATE                ERROR               PORTS
kzn2jwipwy58        health-check.1        nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 9 seconds ago                            
411mzt9ash7o         \_ health-check.1    nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 16 seconds ago                          
i0oxjj7rwlot        health-check.2        nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 9 seconds ago                            
dj05zx2oa6dv         \_ health-check.2    nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 17 seconds ago                          
knr5d3smc4jk        health-check.3        nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 8 seconds ago                            
raxzla6gpu5i         \_ health-check.3    nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 15 seconds ago                          
i7pe6lpr1d9y        health-check.4        nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 9 seconds ago                            
vpwinsaii1ay         \_ health-check.4    nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 16 seconds ago                          
kl8sbv8pvsip        health-check.5        nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 12 seconds ago                           
875vjel7ik3f         \_ health-check.5    nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 19 seconds ago                          
sx16mfdgq3kw        health-check.6        nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 10 seconds ago                           
2dbenb1bci0u         \_ health-check.6    nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 17 seconds ago                          
pt7yf7ebwlp5        health-check.7        nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 10 seconds ago                           
71g93zxon8yx         \_ health-check.7    nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 17 seconds ago                          
jmtsq0w8i59i        health-check.8        nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 9 seconds ago                            
tb7q0tq0izzl         \_ health-check.8    nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 16 seconds ago                          
r8n97vkrkf63        health-check.9        nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running about a minute ago                       
d9ipqlfsywt4        health-check.10       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 8 seconds ago                            
bl6appmeb9ye         \_ health-check.10   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 15 seconds ago                          
9q5asbsxddyz        health-check.11       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 12 seconds ago                           
3zqldue0cwov         \_ health-check.11   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 19 seconds ago                          
sx1q30spukuo        health-check.12       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running about a minute ago                       
gf8elji3uj9g        health-check.13       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 8 seconds ago                            
wisxy9a8hj1f         \_ health-check.13   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 15 seconds ago                          
wdgi8heatk9k        health-check.14       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 10 seconds ago                           
uzsn0x1ggkzk         \_ health-check.14   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 17 seconds ago                          
q7b80ep11pzq        health-check.15       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 9 seconds ago                            
xalmhfshb6mg         \_ health-check.15   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 17 seconds ago                          
gahlpp9zym91        health-check.16       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 8 seconds ago                            
3gs91m7tcf27         \_ health-check.16   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 15 seconds ago                          
rly2zb1qdf54        health-check.17       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 9 seconds ago                            
3iq7y4588iti         \_ health-check.17   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 16 seconds ago                          
wl0650ivcvgl        health-check.18       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 9 seconds ago                            
bt84s1bgtx8p         \_ health-check.18   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 17 seconds ago                          
s094c05s5p6a        health-check.19       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 9 seconds ago                            
rwpfwwa3daqd         \_ health-check.19   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 16 seconds ago                          
rqhuwmak9pdj        health-check.20       nginx:alpine        cdev-r01n002-lb.shipdock   Running             Running 9 seconds ago                            
xhd6459w4zpr         \_ health-check.20   nginx:alpine        cdev-r01n002-lb.shipdock   Shutdown            Complete 17 seconds ago                          
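
A listing like the one above can be summarized by counting the current tasks and the replaced history entries (the rows marked with `\_`). This is a rough sketch; `summarize_tasks` is a made-up name, and it assumes the column layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: summarize `docker service ps` output read from stdin.
# Rows whose second field is "\_" are history entries for replaced tasks.
summarize_tasks() {
    awk '
        NR == 1 && $1 == "ID" { next }             # skip the header row
        $2 == "\\_"           { replaced++; next } # replaced (history) task
        NF > 0                { current++ }        # current task for a slot
        END { printf "current=%d replaced=%d\n", current, replaced }
    '
}
```

Usage would be `docker service ps health-check | summarize_tasks`. For the listing above it would report 20 current and 18 replaced tasks, since only health-check.9 and health-check.12 have no history entry.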

Describe the results you expected:
I expected a Docker node to handle more than 20 health checks per second, but there are many errors.

Additional information you deem important (e.g. issue happens only occasionally):
It looks like the health checks fail when the libcontainerd run queue is full.

Output of docker version:

# docker version
Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:29 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:29 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

# docker info
Containers: 104
 Running: 10
 Paused: 0
 Stopped: 94
Images: 21
Server Version: 17.05.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: 6wtha553yeabvzvm2i1f57yqq
 Is Manager: true
 ClusterID: qwc97sxpolg33q2aahp5g1oe2
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.113.129.10
 Manager Addresses:
  10.113.129.10:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.7.2-Docker
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 46.74GiB
Name: cdev-r01n002-lb.shipdock
ID: DH3W:ODJJ:KY3X:GCUT:EYNM:EKZZ:EQLG:WVWK:S427:7WHV:GIGJ:EFWK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical Machine
