Issue with containerStop and long running tasks prematurely exiting #45731
### Description
It seems that behavior has changed in Docker 23+ with regard to containerStop and stopping long-running containers via the API.
The current behavior appears to be that when sending a POST to /stop?t=3600, the connection does not return a 204 until the container has fully stopped, so in theory the connection stays open for up to an hour. If the connection is dropped, the container is killed.
Here's an example log of initiating the request and cancelling the curl.
Jun 12 15:24:41 dg11 dockerd[1263909]: time="2023-06-12T15:24:41.686151349Z" level=debug msg="Calling POST /v1.41/containers/1e70fb6382cb/stop?t=3600"
Jun 12 15:24:41 dg11 dockerd[1263909]: time="2023-06-12T15:24:41.686276060Z" level=debug msg="Sending kill signal 15 to container 1e70fb6382cb3d323f92ba2477d9430ea870f341e61af5623d02dc5056d6af7e"
Jun 12 15:30:52 dg11 dockerd[1263909]: time="2023-06-12T15:30:52.095812601Z" level=info msg="Container failed to exit within 1h0m0s of signal 15 - using the force" container=1e70fb6382cb3d323f92ba2477d9430ea870f341e61af5623d02dc5056d6af7e
Jun 12 15:30:52 dg11 dockerd[1263909]: time="2023-06-12T15:30:52.095878637Z" level=debug msg="Sending kill signal 9 to container 1e70fb6382cb3d323f92ba2477d9430ea870f341e61af5623d02dc5056d6af7e"
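For reference, the blocking semantics can be reproduced from any HTTP client, not just curl. Below is a minimal Python sketch (assuming the default socket path `/var/run/docker.sock` and API version v1.43; the helper names are mine, not part of the Docker SDK) that issues the same stop request with a client-side read timeout deliberately longer than the grace period, since on Docker 23+ the 204 only arrives once the container has actually exited:

```python
import http.client
import socket


def stop_path(container_id: str, timeout_s: int, api_version: str = "v1.43") -> str:
    """Build the containerStop endpoint path for a given grace period."""
    return f"/{api_version}/containers/{container_id}/stop?t={timeout_s}"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to the Docker daemon over its Unix socket."""

    def __init__(self, socket_path: str, read_timeout: float):
        super().__init__("localhost", timeout=read_timeout)
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def stop_container(container_id: str, grace_s: int = 3600) -> int:
    # The read timeout must exceed the grace period: on Docker 23+ the 204
    # is only sent after the container has exited, and dropping the
    # connection early is what triggers the immediate kill described above.
    conn = UnixHTTPConnection("/var/run/docker.sock", read_timeout=grace_s + 60)
    conn.request("POST", stop_path(container_id, grace_s))
    return conn.getresponse().status  # 204 once the container has stopped
```

Any client with a default request timeout shorter than `t` (as Nomad's appears to be) will hit the same failure mode.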
Docker 20.10 seems to return the 204 status code before the container has stopped; on Docker 23+ our Nomad client times out while sending the POST request.
### Reproduce

Start a long-running container that emulates something like HAProxy draining open connections, then tell it to stop:
curl -v --unix-socket /var/run/docker.sock -X POST http://localhost/v1.43/containers/1e70fb6382cb/stop?t=3600
If you cancel the curl request, the container is killed immediately.
This is the client code we initially noticed the bug with: on Docker 23+ the request times out, while on 20.10 our stop behaves as expected. https://github.com/fsouza/go-dockerclient/blob/main/container_stop.go
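As a possible workaround while this is clarified, the signal can be decoupled from the connection by using the /kill and /wait endpoints instead of /stop: if I understand the API docs correctly, /kill returns as soon as the signal is delivered, so a dropped connection afterwards cannot escalate to SIGKILL, while /wait blocks until the container exits and can safely be retried. A sketch of the paths involved (helper names are hypothetical, not from the issue):

```python
def kill_path(container_id: str, signal: str = "SIGTERM", api_version: str = "v1.43") -> str:
    # POST here sends the signal and returns immediately (204); the daemon
    # does not escalate to SIGKILL if this connection is later dropped.
    return f"/{api_version}/containers/{container_id}/kill?signal={signal}"


def wait_path(container_id: str, api_version: str = "v1.43") -> str:
    # POST here blocks until the container exits and returns its status
    # code as JSON; unlike /stop, reconnecting and retrying is harmless.
    return f"/{api_version}/containers/{container_id}/wait"
```

The same two-step flow works from curl: POST the kill path first, then poll or block on the wait path with whatever client timeout is convenient.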
### Expected behavior
Is this the expected behavior of this endpoint?
I plan to open an issue with the Nomad team as well, since that is the client where I originally noticed the problem, but I wanted to check first whether this is expected behavior, as it seems to have changed between Docker versions.
### docker version
```bash
Client: Docker Engine - Community
Version: 24.0.2
API version: 1.43
Go version: go1.20.4
Git commit: cb74dfc
Built: Thu May 25 21:52:22 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.2
API version: 1.43 (minimum version 1.12)
Go version: go1.20.4
Git commit: 659604f
Built: Thu May 25 21:52:22 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.21
GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
nvidia:
Version: 1.1.7
GitCommit: v1.1.7-0-g860f061
docker-init:
Version: 0.19.0
GitCommit: de40ad0
```

### docker info

```bash
Client: Docker Engine - Community
Version: 24.0.2
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.10.5
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.18.1
Path: /usr/libexec/docker/cli-plugins/docker-compose
scan: Docker Scan (Docker Inc.)
Version: v0.23.0
Path: /usr/libexec/docker/cli-plugins/docker-scan
Server:
Containers: 46
Running: 27
Paused: 0
Stopped: 19
Images: 56
Server Version: 24.0.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: 3lm7d688ghjrw9zr0sqpmkdrx
Is Manager: true
ClusterID: 87z2x0y7sdhccy5ch5wzepop7
Managers: 1
Nodes: 1
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.0.5.10
Manager Addresses:
10.0.5.10:2377
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc version: v1.1.7-0-g860f061
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
Kernel Version: 5.4.0-137-generic
Operating System: Ubuntu 20.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 125.8GiB
Name: dg11
ID: b1793812-e978-432b-a86d-5f548d7da15f
Docker Root Dir: /var/lib/docker
Debug Mode: true
File Descriptors: 295
Goroutines: 535
System Time: 2023-06-12T15:32:00.856319114Z
EventsListeners: 1
Username:
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
Live Restore Enabled: false
```

### Additional Info

No response