dockerd doesn't kill healthcheck processes after timeout #43737

@seleznev

Description

dockerd already has logic to gracefully stop healthcheck processes after the timeout expires (/daemon/exec.go#L277-L291).

However, this logic appears to be completely broken: the daemon.containerd.SignalProcess() call (/daemon/exec.go#L279) is passed a context that has already been canceled. SignalProcess() just returns a "context canceled" error and does nothing, so the healthcheck process is never signaled.

Steps to reproduce the issue:

  1. Create Dockerfile:
    FROM ubuntu:22.04
    
    HEALTHCHECK --interval=5s --timeout=5s \
       CMD ["sleep", "3600"]
    
    CMD ["sleep", "infinity"]
    
  2. Build image:
    docker build --tag=healthcheck-test .
    
  3. Start container:
    docker run -d --rm --name=healthcheck-test healthcheck-test
    
  4. Wait for several healthcheck intervals:
    sleep 30
    
  5. Check processes in the container:
    docker exec healthcheck-test ps axuf
    
  6. Clean up:
    docker rm --force healthcheck-test # remove container
    docker rmi healthcheck-test # remove image
    

Describe the results you received:

More than one sleep 3600 process:

$ docker build --tag=healthcheck-test .
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM ubuntu:22.04
 ---> 27941809078c
Step 2/3 : HEALTHCHECK --interval=5s --timeout=5s    CMD ["sleep", "3600"]
 ---> Running in 248a9dcfaa6f
Removing intermediate container 248a9dcfaa6f
 ---> 16d09d0a1b09
Step 3/3 : CMD ["sleep", "infinity"]
 ---> Running in 55ef832b3170
Removing intermediate container 55ef832b3170
 ---> 7e8b71425a0a
Successfully built 7e8b71425a0a
Successfully tagged healthcheck-test:latest
$ docker run -d --rm --name=healthcheck-test healthcheck-test
41e8e2eb21d0bdd485e647c6ec1273474b19ba616d284d48d53ea607edd96841
$ sleep 30
$ docker exec healthcheck-test ps axuf
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          25  0.0  0.0   7060  1664 ?        Rs   12:01   0:00 ps axuf
root          19  0.2  0.0   2788  1036 ?        Ss   12:01   0:00 sleep 3600
root          13  0.0  0.0   2788  1036 ?        Ss   12:00   0:00 sleep 3600
root           7  0.0  0.0   2788  1056 ?        Ss   12:00   0:00 sleep 3600
root           1  0.0  0.0   2788  1108 ?        Ss   12:00   0:00 sleep infinity

Describe the results you expected:

Zero or one sleep 3600 process:

$ docker exec healthcheck-test ps axuf
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          25  0.0  0.0   7060  1664 ?        Rs   12:01   0:00 ps axuf
root          19  0.2  0.0   2788  1036 ?        Ss   12:01   0:00 sleep 3600
root           1  0.0  0.0   2788  1108 ?        Ss   12:00   0:00 sleep infinity

Additional information you deem important (e.g. issue happens only occasionally):

N/A

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:02:57 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:03 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 10
  Running: 1
  Paused: 0
  Stopped: 9
 Images: 119
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.13.0-51-generic
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.34GiB
 Name: uk-ubnt-61
 ID: BXVR:65LG:FDYH:IX7Q:U2LT:LQI6:P5B5:ZFEG:EIMS:WPWM:D3ND:RN4H
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 90
  Goroutines: 74
  System Time: 2022-06-22T15:03:06.064800737+03:00
  EventsListeners: 0
 Username: 2gis
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  docker-hub.2gis.ru:5444
  127.0.0.0/8
 Registry Mirrors:
  https://docker-registry-proxy.2gis.io/
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

N/A

Labels: kind/bug, status/claimed, version/20.10