Description
containerd stays in the "activating" state for more than 15 minutes when the systemd unit type is set to "notify".
I noticed that, as of 1.4, it is recommended to change the systemd unit type from "simple" to "notify". I believe this is to keep containerd and its shims separated during containerd outages.
The problem is that, after the system reboots, containerd takes a long time to come back up, and the delay grows with the number of containers. By default, systemd waits only 90s, so most of the time the unit simply times out, is killed, and is marked as failed. This is really undesirable on a host running many containers, for example a Kubernetes worker. On an example host I set a 15m timeout (TimeoutStartSec=900), and with around 10 containers the timeout is still hit. In the log below, each dead shim takes roughly 100 seconds to clean up, one after another.
The following is logged while the unit is in the "activating" state:
Mar 08 20:01:55 host-0 containerd[717]: time="2021-03-08T20:01:55.060887967Z" level=info msg="cleaning up dead shim"
Mar 08 20:01:55 host-0 containerd[717]: time="2021-03-08T20:01:55.466914487Z" level=warning msg="cleanup warnings time=\"2021-03-08T20:01:55Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=4191\n"
Mar 08 20:03:35 host-0 containerd[717]: time="2021-03-08T20:03:35.481334647Z" level=warning msg="cleaning up after shim disconnected" id=1149c93d0f90807495a933b1ba012b10e44fe10c68f1c76a73633392af9a1c02 namespace=k8s.io
Mar 08 20:03:35 host-0 containerd[717]: time="2021-03-08T20:03:35.481424867Z" level=info msg="cleaning up dead shim"
Mar 08 20:03:35 host-0 containerd[717]: time="2021-03-08T20:03:35.521538862Z" level=warning msg="cleanup warnings time=\"2021-03-08T20:03:35Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=5818\n"
Mar 08 20:05:15 host-0 containerd[717]: time="2021-03-08T20:05:15.530648356Z" level=warning msg="cleaning up after shim disconnected" id=1a91d938ed83ed6cbb66dafaffa7e3441e648908b70d36edc2c41bd98c84ff3e namespace=k8s.io
Mar 08 20:05:15 host-0 containerd[717]: time="2021-03-08T20:05:15.530791314Z" level=info msg="cleaning up dead shim"
Mar 08 20:05:15 host-0 containerd[717]: time="2021-03-08T20:05:15.566240491Z" level=warning msg="cleanup warnings time=\"2021-03-08T20:05:15Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=7763\n"
Mar 08 20:06:55 host-0 containerd[717]: time="2021-03-08T20:06:55.574521945Z" level=warning msg="cleaning up after shim disconnected" id=1db7e5d07b75cc9e59cc3d4a9a1850a158074fe4b980a0e4ccaf317317149113 namespace=k8s.io
Mar 08 20:06:55 host-0 containerd[717]: time="2021-03-08T20:06:55.575039760Z" level=info msg="cleaning up dead shim"
Mar 08 20:06:55 host-0 containerd[717]: time="2021-03-08T20:06:55.612126081Z" level=warning msg="cleanup warnings time=\"2021-03-08T20:06:55Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=9720\n"
Mar 08 20:08:35 host-0 containerd[717]: time="2021-03-08T20:08:35.624132908Z" level=warning msg="cleaning up after shim disconnected" id=38bb32a3e6b2b2878376095443d8e88db9d5d7b1a3b73ca78c935a4207cb3c13 namespace=k8s.io
Mar 08 20:08:35 host-0 containerd[717]: time="2021-03-08T20:08:35.624276841Z" level=info msg="cleaning up dead shim"
Mar 08 20:08:35 host-0 containerd[717]: time="2021-03-08T20:08:35.655040570Z" level=warning msg="cleanup warnings time=\"2021-03-08T20:08:35Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=11634\n"
Mar 08 20:10:15 host-0 containerd[717]: time="2021-03-08T20:10:15.660781378Z" level=warning msg="cleaning up after shim disconnected" id=488602d4b1f896a7b57a1c26b9356ed23d302b74b151701d39b253f00041c948 namespace=k8s.io
Mar 08 20:10:15 host-0 containerd[717]: time="2021-03-08T20:10:15.660916684Z" level=info msg="cleaning up dead shim"
Mar 08 20:10:15 host-0 containerd[717]: time="2021-03-08T20:10:15.698839748Z" level=warning msg="cleanup warnings time=\"2021-03-08T20:10:15Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=13504\n"
Mar 08 20:11:55 host-0 containerd[717]: time="2021-03-08T20:11:55.708836342Z" level=warning msg="cleaning up after shim disconnected" id=4a7aeab4dc4b762990804111cb413f90816533cd0fdc640fb907388001279f80 namespace=k8s.io
Mar 08 20:11:55 host-0 containerd[717]: time="2021-03-08T20:11:55.709178447Z" level=info msg="cleaning up dead shim"
Mar 08 20:11:55 host-0 containerd[717]: time="2021-03-08T20:11:55.738282838Z" level=warning msg="cleanup warnings time=\"2021-03-08T20:11:55Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=15438\n"
Mar 08 20:13:35 host-0 containerd[717]: time="2021-03-08T20:13:35.747393245Z" level=warning msg="cleaning up after shim disconnected" id=546b9f9b4d2bb07b320c5f3d76190e4b3a8f17c31ed8f163837da4a6b180c686 namespace=k8s.io
Mar 08 20:13:35 host-0 containerd[717]: time="2021-03-08T20:13:35.747512919Z" level=info msg="cleaning up dead shim"
Mar 08 20:13:35 host-0 containerd[717]: time="2021-03-08T20:13:35.780239328Z" level=warning msg="cleanup warnings time=\"2021-03-08T20:13:35Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=17372\n"
Mar 08 20:15:15 host-0 containerd[717]: time="2021-03-08T20:15:15.788328005Z" level=warning msg="cleaning up after shim disconnected" id=54a9b9856bec1182e67f4c89e6ca876112d2b5aa5e3ea9c94aeb6ed949731d3b namespace=k8s.io
Mar 08 20:15:15 host-0 containerd[717]: time="2021-03-08T20:15:15.788432145Z" level=info msg="cleaning up dead shim"
Mar 08 20:15:15 host-0 containerd[717]: time="2021-03-08T20:15:15.825718714Z" level=warning msg="cleanup warnings time=\"2021-03-08T20:15:15Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=19226\n"
Unit file:
ExecStart=containerd
Restart=always
TimeoutStopSec=30
TimeoutStartSec=900
Type=notify
KillMode=process
KillSignal=SIGTERM
Delegate=true
Steps to reproduce the issue:
- Set Type=notify in the systemd unit
- Reboot the machine
Describe the results you received:
The unit times out waiting for containerd to report readiness, and the containerd service is marked as failed.
Describe the results you expected:
Containerd to come back up.
What version of containerd are you using:
1.4.4