Description
- This is a bug report
- This is a feature request
- I searched existing issues before opening this one
Expected behavior
The expected behavior is that Docker starts successfully if you set the default runtime.
Actual behavior
Docker fails to start: dockerd panics with a nil pointer dereference during startup (see the journal output below).
Steps to reproduce the behavior
Install Docker CE on Ubuntu 18.04 as per the official docs.
Observe that the docker service starts successfully with systemctl status docker.service.
Set up /etc/docker/daemon.json as per the nvidia-container-runtime docs:
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
}
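Before restarting, it can help to rule out the trivial cause that the configured runtime binary is missing. The following is a minimal sketch, not part of the original report; the helper name check_runtime_binary is hypothetical, and the path in the usage comment is the one from the daemon.json above.

```shell
# Hypothetical helper: report whether a runtime binary referenced from
# daemon.json exists as an executable file (and is not a directory).
check_runtime_binary() {
  path="$1"
  if [ -x "$path" ] && [ ! -d "$path" ]; then
    echo "ok: $path"
  else
    echo "missing or not executable: $path"
    return 1
  fi
}

# Usage (path taken from the daemon.json above):
#   check_runtime_binary /usr/bin/nvidia-container-runtime
```

In this report the binary is present (the runtime works when selected explicitly, as shown further down), so the check only excludes one possible cause; it does not explain the crash.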
Restart the service with systemctl restart docker.service and observe its failure:
$ sudo systemctl restart docker
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
Check the details with journalctl -u docker:
dec 21 12:51:15 foo001linux systemd[1]: Stopping Docker Application Container Engine...
dec 21 12:51:15 foo001linux dockerd[13609]: time="2020-12-21T12:51:15.941688599+01:00" level=info msg="Processing signal 'terminated'"
dec 21 12:51:15 foo001linux dockerd[13609]: time="2020-12-21T12:51:15.942673639+01:00" level=error msg="Sending SIGTERM to plugin failed with error: no such container"
dec 21 12:51:15 foo001linux dockerd[13609]: time="2020-12-21T12:51:15.942969089+01:00" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
dec 21 12:51:15 foo001linux dockerd[13609]: time="2020-12-21T12:51:15.943295639+01:00" level=info msg="Daemon shutdown complete"
dec 21 12:51:15 foo001linux systemd[1]: Stopped Docker Application Container Engine.
dec 21 12:51:15 foo001linux systemd[1]: Starting Docker Application Container Engine...
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.988506011+01:00" level=info msg="Starting up"
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.989081451+01:00" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resol
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.989656211+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.989672021+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.989685861+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.989694091+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.990297612+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.990319632+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.990342272+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <
dec 21 12:51:15 foo001linux dockerd[15126]: time="2020-12-21T12:51:15.990355202+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
dec 21 12:51:16 foo001linux dockerd[15126]: panic: runtime error: invalid memory address or nil pointer dereference
dec 21 12:51:16 foo001linux dockerd[15126]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x56476e7cb5e1]
dec 21 12:51:16 foo001linux dockerd[15126]: goroutine 157 [running]:
dec 21 12:51:16 foo001linux dockerd[15126]: github.com/docker/docker/plugin/executor/containerd.(*Executor).Create(0xc000886840, 0xc0002a7e80, 0x40, 0xc0001d0050, 0x9, 0xc0008081e0, 0xc0009bca00, 0x0, 0x0, 0xc0
dec 21 12:51:16 foo001linux dockerd[15126]: /go/src/github.com/docker/docker/plugin/executor/containerd/containerd.go:71 +0xa1
dec 21 12:51:16 foo001linux dockerd[15126]: github.com/docker/docker/plugin.(*Manager).enable(0xc000806180, 0xc0001cfb80, 0xc000992460, 0x1, 0x5f, 0x0)
dec 21 12:51:16 foo001linux dockerd[15126]: /go/src/github.com/docker/docker/plugin/manager_linux.go:64 +0x5b9
dec 21 12:51:16 foo001linux dockerd[15126]: github.com/docker/docker/plugin.(*Manager).reload.func1(0xc0006dd980, 0xc000806180, 0xc000992460, 0xc0001cfb80)
dec 21 12:51:16 foo001linux dockerd[15126]: /go/src/github.com/docker/docker/plugin/manager.go:254 +0x25a
dec 21 12:51:16 foo001linux dockerd[15126]: created by github.com/docker/docker/plugin.(*Manager).reload
dec 21 12:51:16 foo001linux dockerd[15126]: /go/src/github.com/docker/docker/plugin/manager.go:214 +0x31d
dec 21 12:51:16 foo001linux systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
dec 21 12:51:16 foo001linux systemd[1]: docker.service: Failed with result 'exit-code'.
dec 21 12:51:16 foo001linux systemd[1]: Failed to start Docker Application Container Engine.
dec 21 12:51:18 foo001linux systemd[1]: docker.service: Service hold-off time over, scheduling restart.
dec 21 12:51:18 foo001linux systemd[1]: docker.service: Scheduled restart job, restart counter is at 1.
dec 21 12:51:18 foo001linux systemd[1]: Stopped Docker Application Container Engine.
dec 21 12:51:18 foo001linux systemd[1]: Starting Docker Application Container Engine...
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.156373810+01:00" level=info msg="Starting up"
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.156761790+01:00" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resol
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.157316660+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.157330980+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.157350020+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.157359320+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.158229260+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.158262950+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.158288340+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <
dec 21 12:51:18 foo001linux dockerd[15181]: time="2020-12-21T12:51:18.158301400+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
dec 21 12:51:18 foo001linux dockerd[15181]: panic: runtime error: invalid memory address or nil pointer dereference
dec 21 12:51:18 foo001linux dockerd[15181]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x55ce99f5a5e1]
dec 21 12:51:18 foo001linux dockerd[15181]: goroutine 63 [running]:
dec 21 12:51:18 foo001linux dockerd[15181]: github.com/docker/docker/plugin/executor/containerd.(*Executor).Create(0xc000b871a0, 0xc000a34e80, 0x40, 0xc0001c6050, 0x9, 0xc00085e000, 0xc000a14980, 0x0, 0x0, 0xc0
dec 21 12:51:18 foo001linux dockerd[15181]: /go/src/github.com/docker/docker/plugin/executor/containerd/containerd.go:71 +0xa1
dec 21 12:51:18 foo001linux dockerd[15181]: github.com/docker/docker/plugin.(*Manager).enable(0xc000490180, 0xc0003f5340, 0xc000b853e0, 0x1, 0x5f, 0x55ce9cd69d40)
dec 21 12:51:18 foo001linux dockerd[15181]: /go/src/github.com/docker/docker/plugin/manager_linux.go:64 +0x5b9
dec 21 12:51:18 foo001linux dockerd[15181]: github.com/docker/docker/plugin.(*Manager).reload.func1(0xc000b5e4a0, 0xc000490180, 0xc000b853e0, 0xc0003f5340)
dec 21 12:51:18 foo001linux dockerd[15181]: /go/src/github.com/docker/docker/plugin/manager.go:254 +0x25a
dec 21 12:51:18 foo001linux dockerd[15181]: created by github.com/docker/docker/plugin.(*Manager).reload
dec 21 12:51:18 foo001linux dockerd[15181]: /go/src/github.com/docker/docker/plugin/manager.go:214 +0x31d
dec 21 12:51:18 foo001linux systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
dec 21 12:51:18 foo001linux systemd[1]: docker.service: Failed with result 'exit-code'.
dec 21 12:51:18 foo001linux systemd[1]: Failed to start Docker Application Container Engine.
dec 21 12:51:20 foo001linux systemd[1]: docker.service: Service hold-off time over, scheduling restart.
dec 21 12:51:20 foo001linux systemd[1]: docker.service: Scheduled restart job, restart counter is at 2.
dec 21 12:51:20 foo001linux systemd[1]: Stopped Docker Application Container Engine.
dec 21 12:51:20 foo001linux systemd[1]: Starting Docker Application Container Engine...
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.407439810+01:00" level=info msg="Starting up"
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.407844090+01:00" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resol
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.408370670+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.408384970+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.408403290+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.408412540+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.409249821+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.409274591+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.409294801+01:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock <nil> 0 <nil>}] <nil> <
dec 21 12:51:20 foo001linux dockerd[15237]: time="2020-12-21T12:51:20.409308561+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
dec 21 12:51:20 foo001linux dockerd[15237]: panic: runtime error: invalid memory address or nil pointer dereference
dec 21 12:51:20 foo001linux dockerd[15237]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x55660a4f75e1]
dec 21 12:51:20 foo001linux dockerd[15237]: goroutine 130 [running]:
dec 21 12:51:20 foo001linux dockerd[15237]: github.com/docker/docker/plugin/executor/containerd.(*Executor).Create(0xc000737860, 0xc0008c09c0, 0x40, 0xc00018a1b7, 0x9, 0xc0009d8000, 0xc000888260, 0x0, 0x0, 0xc0
dec 21 12:51:20 foo001linux dockerd[15237]: /go/src/github.com/docker/docker/plugin/executor/containerd/containerd.go:71 +0xa1
dec 21 12:51:20 foo001linux dockerd[15237]: github.com/docker/docker/plugin.(*Manager).enable(0xc00085c000, 0xc0008022c0, 0xc000804fa0, 0x1, 0x5f, 0x0)
dec 21 12:51:20 foo001linux dockerd[15237]: /go/src/github.com/docker/docker/plugin/manager_linux.go:64 +0x5b9
dec 21 12:51:20 foo001linux dockerd[15237]: github.com/docker/docker/plugin.(*Manager).reload.func1(0xc0008ac4f0, 0xc00085c000, 0xc000804fa0, 0xc0008022c0)
dec 21 12:51:20 foo001linux dockerd[15237]: /go/src/github.com/docker/docker/plugin/manager.go:254 +0x25a
dec 21 12:51:20 foo001linux dockerd[15237]: created by github.com/docker/docker/plugin.(*Manager).reload
dec 21 12:51:20 foo001linux dockerd[15237]: /go/src/github.com/docker/docker/plugin/manager.go:214 +0x31d
dec 21 12:51:20 foo001linux systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
dec 21 12:51:20 foo001linux systemd[1]: docker.service: Failed with result 'exit-code'.
dec 21 12:51:20 foo001linux systemd[1]: Failed to start Docker Application Container Engine.
dec 21 12:51:22 foo001linux systemd[1]: docker.service: Service hold-off time over, scheduling restart.
dec 21 12:51:22 foo001linux systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
dec 21 12:51:22 foo001linux systemd[1]: Stopped Docker Application Container Engine.
dec 21 12:51:22 foo001linux systemd[1]: docker.service: Start request repeated too quickly.
dec 21 12:51:22 foo001linux systemd[1]: docker.service: Failed with result 'exit-code'.
dec 21 12:51:22 foo001linux systemd[1]: Failed to start Docker Application Container Engine.
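The journal above repeats the same crash three times, so the relevant part is easy to miss. As a rough sketch (the function name first_panic is hypothetical), the first panic and its stack trace can be isolated with awk. Piping journalctl with --no-pager should also avoid the line truncation visible in the excerpt above, although that is an assumption about how the excerpt was captured.

```shell
# Hypothetical helper: read journal text on stdin and print the first
# line containing "panic:" plus every following line, stopping after
# the first systemd[1] line that follows the panic.
first_panic() {
  awk '/panic:/ { p = 1 } p { print } p && /systemd\[1\]/ { exit }'
}

# Usage:
#   journalctl -u docker.service --no-pager | first_panic
```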
Now modify daemon.json to exclude the default-runtime setting:
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
Restart Docker with systemctl restart docker.service and observe that it starts successfully. If you put back the default runtime setting again, it fails again.
At this point you may also notice that Docker tries to restart itself too often, so even if you remove the default runtime, you might not be able to restart Docker right away, because systemd blocks it for a while:
dec 21 12:56:10 foo001linux systemd[1]: Failed to start Docker Application Container Engine.
dec 21 12:56:12 foo001linux systemd[1]: docker.service: Start request repeated too quickly.
(This highlights another problem: RestartSec=2 should definitely be increased to 15 seconds or so in /lib/systemd/system/docker.service. This ticket is not about that; I just want to point it out so that you don't run into it, as I did. Always check the failure cause with journalctl -fu docker.service before concluding that your config is wrong.)
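An alternative to editing the packaged unit file is a systemd drop-in, which survives package upgrades. This is a sketch of that swapped-in approach, not what the report itself did; the helper name write_restart_dropin is hypothetical, and the value 15 is simply the figure suggested above.

```shell
# Hypothetical helper: write a drop-in that overrides RestartSec.
# $1: drop-in directory, $2: delay in seconds.
write_restart_dropin() {
  dir="$1"
  seconds="$2"
  mkdir -p "$dir" || return 1
  printf '[Service]\nRestartSec=%s\n' "$seconds" > "$dir/override.conf"
}

# Usage (as root):
#   write_restart_dropin /etc/systemd/system/docker.service.d 15
#   systemctl daemon-reload
# If the unit is already blocked by the start rate limit, clear it:
#   systemctl reset-failed docker.service
#   systemctl start docker.service
```

systemctl reset-failed also resets the unit's start rate limit counter, so it should remove the need to wait out the block.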
Now that you have successfully started Docker with the runtime being defined but without being set as the default, confirm that the runtime is actually operable when set explicitly during use:
$ docker run -it --rm nvidia/cuda:10.2-base
root@229a967be4dc:/# nvidia-smi
bash: nvidia-smi: command not found
root@229a967be4dc:/# exit
$ docker run -it --rm --runtime=nvidia nvidia/cuda:10.2-base
root@4736d5de3809:/# nvidia-smi
Mon Dec 21 12:06:16 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.95.01 Driver Version: 440.95.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 1030 On | 00000000:26:00.0 On | N/A |
| 38% 48C P0 N/A / 30W | 1691MiB / 1998MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
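The same comparison can be run non-interactively, which is convenient when repeating it across Docker versions. A minimal sketch: the function name gpu_check and the DOCKER override variable are hypothetical, while the image and the --runtime flag are the ones used above.

```shell
# DOCKER can be overridden, e.g. to point at a wrapper; defaults to docker.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: run nvidia-smi in a throwaway container.
# $1: optional runtime name (e.g. "nvidia"); empty uses the daemon default.
gpu_check() {
  if [ -n "${1:-}" ]; then
    "$DOCKER" run --rm --runtime="$1" nvidia/cuda:10.2-base nvidia-smi
  else
    "$DOCKER" run --rm nvidia/cuda:10.2-base nvidia-smi
  fi
}

# Usage:
#   gpu_check          # expected to fail while runc is the default runtime
#   gpu_check nvidia   # expected to print the nvidia-smi table
```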
Check for the available docker-ce versions:
$ apt-cache madison docker-ce
docker-ce | 5:20.10.1~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:20.10.0~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:19.03.14~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
...
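To avoid copying version strings by hand, the madison output can be filtered for a given version prefix. This is only a sketch: the function name pick_version is hypothetical, and it assumes the output format shown above, with the newest matching version listed first.

```shell
# Hypothetical helper: read `apt-cache madison` output on stdin and print
# the first version string that starts with the prefix given as $1.
pick_version() {
  awk -F'|' -v p="$1" '{ gsub(/ /, "", $2) } index($2, p) == 1 { print $2; exit }'
}

# Usage:
#   v="$(apt-cache madison docker-ce | pick_version '5:19.03.')"
#   sudo apt install "docker-ce=$v" "docker-ce-cli=$v"
#   sudo apt-mark hold docker-ce docker-ce-cli   # keep apt from upgrading again
```

Installing docker-ce-cli at the same version and holding both packages are additions beyond the steps in this report; the downgrades below install docker-ce only.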
Try downgrading to 20.10.0:
sudo apt install docker-ce=5:20.10.0~3-0~ubuntu-bionic
And observe that the Docker service still fails to start with the default runtime set. Downgrade to 19.03.14:
sudo apt install docker-ce=5:19.03.14~3-0~ubuntu-bionic
And observe that the Docker service starts successfully even with the default runtime set. Now your container output will behave as it should:
$ docker run -it --rm nvidia/cuda:10.2-base
root@2369cae10959:/# nvidia-smi
Mon Dec 21 12:31:12 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.95.01 Driver Version: 440.95.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 1030 On | 00000000:26:00.0 On | N/A |
| 38% 51C P0 N/A / 30W | 1479MiB / 1998MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
This replaces the previous bash: nvidia-smi: command not found error.
We've been using this configuration for over a year now. To me it seems like a regression in Docker CE 20.10, since both 20.10.0 and 20.10.1 fail while 19.03.14 works. Please advise.
Output of docker version:
Client: Docker Engine - Community
Version: 20.10.1
API version: 1.41
Go version: go1.13.15
Git commit: 831ebea
Built: Tue Dec 15 04:34:59 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.1
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: f001486
Built: Tue Dec 15 04:32:40 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Output of docker info:
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.0-docker)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 38
Server Version: 20.10.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-58-generic
Operating System: Ubuntu 18.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 31.37GiB
Name: adas001linux
ID: QE7M:3DIU:APRD:YDF2:XT4U:KBOI:JVFI:3ERX:P5QV:F6G4:EKWU:T2OP
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
WARNING: No blkio weight support
WARNING: No blkio weight_device support
Additional environment details (AWS, VirtualBox, physical, etc.)
Ubuntu 18.04.5 with all updates installed, on several physical computers.