Healthcheck consumes a lot of CPU with Windows Server containers #33096

@georgyturevich

Hello,

I created a Docker image with a simple Python Flask web application that responds to HTTP requests after a configurable sleep time.

def hello_world():
    out = strftime("%Y-%m-%d %H:%M:%S", gmtime())
    out = out + '; Flask Dockerized; Sleeping: ' + os.environ['WEB_SLEEP_TIME'] + ' second(s) ... '

    sleep(float(os.environ['WEB_SLEEP_TIME']))
    
    out = out + strftime("%Y-%m-%d %H:%M:%S", gmtime())

    return out

This image also has a healthcheck with interval, timeout, and retries attributes.

HEALTHCHECK --interval=30s --timeout=6s --retries=3 CMD powershell.exe -executionpolicy bypass -nologo -noprofile c:\app\healthcheck.ps1

where healthcheck.ps1 performs an HTTP request against the application:

try {
    curl http://localhost:5000/ -UseBasicParsing
    Write-Output 0
}
catch {
    Write-Output 1
}
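
For comparison, here is a minimal sketch of the same probe logic in Python. It is not part of the image; the function name `probe` and the timeout value in the usage note are hypothetical. Docker determines health from the exit status of the HEALTHCHECK command (0 is healthy, 1 is unhealthy), so the sketch returns a status code that a wrapper script could pass to `sys.exit`.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str, timeout_s: float) -> int:
    """Return 0 if the URL answers with a 2xx status within timeout_s, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed response, or HTTP error status.
        return 1
```

A wrapper script could end with `sys.exit(probe("http://localhost:5000/", 5.0))`, assuming a Python interpreter is available inside the container.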

I pushed this image to georgyturevich/win_healthcheck:healthcheck_win_flask. The full code is available at https://github.com/georgyturevich/win_healthcheck/tree/master/image

I ran three experiments:

  1. Run 120 containers without a healthcheck (--no-healthcheck):
1..120 | foreach {
    echo "$(date): Starting $_";
    docker run -d -l "win_healthcheck=true" -e "WEB_SLEEP_TIME=0" --no-healthcheck georgyturevich/win_healthcheck:healthcheck_win_flask
    echo "$(date): Started $_. Sleeping ...";
}
  2. Run 120 containers with a healthcheck timeout (8 seconds) shorter than the web page sleep time (10 seconds), so every container was unhealthy and the Docker daemon executed the health check once every 30 seconds:
1..120 | foreach {
    echo "$(date): Starting $_";
    docker run -d -l "win_healthcheck=true" -e "WEB_SLEEP_TIME=10" --health-timeout 8s georgyturevich/win_healthcheck:healthcheck_win_flask
    echo "$(date): Started $_. Sleeping ...";
}
  3. Run 120 containers with a long healthcheck timeout (60 seconds) and interval (60 seconds), so every container was healthy and the Docker daemon executed the health check only once every 60 seconds:
1..120 | foreach {
    echo "$(date): Starting $_";
    docker run -d -l "win_healthcheck=true" -e "WEB_SLEEP_TIME=0" --health-timeout 60s --health-interval 60s georgyturevich/win_healthcheck:healthcheck_win_flask
    echo "$(date): Started $_. Sleeping ...";
}

After all containers were started, I observed the following steady CPU consumption:

  1. Without healthcheck: 2-7% (avg 3.68%)
  2. Unhealthy case with short health check timeout: 23-100% (avg ~ 87%)
  3. Healthy case with long timeout and interval: 23-96% (avg ~ 56%)

This is a server with 64 virtual CPU cores and 256 GiB of memory. The load from the 2nd and 3rd cases should be very small and should not cause such high CPU consumption.
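
To put the expected load in numbers, here is a rough back-of-the-envelope sketch. It assumes that each container schedules its next check one interval after the previous check completes, and that in the unhealthy case every check runs for the full 8-second timeout before being stopped. The function names are hypothetical and exist only for this illustration.

```python
def launch_rate(containers: int, interval_s: float, check_duration_s: float = 0.0) -> float:
    """Health check launches per second across all containers."""
    # One check per container every (interval + check duration) seconds.
    return containers / (interval_s + check_duration_s)


def concurrent_checks(containers: int, interval_s: float, check_duration_s: float) -> float:
    """Average number of health checks in flight at any moment."""
    return launch_rate(containers, interval_s, check_duration_s) * check_duration_s


# Upper bounds, treating each check as instantaneous.
print(launch_rate(120, 30))  # 4.0 launches per second with a 30 s interval
print(launch_rate(120, 60))  # 2.0 launches per second with a 60 s interval

# Unhealthy case: every check is assumed to last the full 8 s timeout.
print(concurrent_checks(120, 30, 8))  # roughly 25 checks in flight on average
```

Under these assumptions the host starts at most about four powershell.exe processes per second, which is why the observed CPU usage on 64 cores looks disproportionate.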

Thanks!

Output of docker version:

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Fri May  5 15:36:11 2017
 OS/Arch:      windows/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.24)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Fri May  5 15:36:11 2017
 OS/Arch:      windows/amd64
 Experimental: false

Output of docker info:

Containers: 142
 Running: 120
 Paused: 0
 Stopped: 22
Images: 875
Server Version: 17.05.0-ce
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: l2bridge l2tunnel nat null overlay transparent
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 14393 (14393.953.amd64fre.rs1_release_inmarket.170303-1614)
Operating System: Windows Server 2016 Datacenter
OSType: windows
Architecture: x86_64
CPUs: 64
Total Memory: 256GiB
Name: EC2..skiped..C87
ID: VJVM:..skiped..:G6EE:UXQT
Docker Root Dir: E:\docker_storage_1_13
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: -1
 Goroutines: 1249
 System Time: 2017-05-08T21:40:06.265291Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
It is an AWS m4.16xlarge instance.
