Sometimes we can see kubelets stuck in a restarting loop and never getting healthy again.
Kubelet restarts and recovers.
root@kube-node01:~# cat /proc/self/mountinfo | grep cgroup
30 21 0:26 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:9 - tmpfs tmpfs ro,mode=755
31 30 0:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw
32 30 0:28 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,xattr,name=systemd
35 30 0:31 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,rdma
36 30 0:32 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,cpu,cpuacct
37 30 0:33 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,freezer
38 30 0:34 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,blkio
39 30 0:35 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,pids
40 30 0:36 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,cpuset
41 30 0:37 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,memory
42 30 0:38 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,net_cls,net_prio
43 30 0:39 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,hugetlb
44 30 0:40 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,perf_event
45 30 0:41 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,devices
945 25 0:159 / /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:606 - tmpfs tmpfs rw,mode=755
979 945 0:28 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,xattr,name=systemd
1471 945 0:31 / /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,rdma
1696 945 0:32 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,cpu,cpuacct
1714 945 0:33 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,freezer
2099 945 0:34 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,blkio
2345 945 0:35 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,pids
2390 945 0:36 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,cpuset
2409 945 0:37 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,memory
2428 945 0:38 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,net_cls,net_prio
2447 945 0:39 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,hugetlb
2466 945 0:40 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,perf_event
2485 945 0:41 /kubepods/pod2eb4976a-5001-4c2b-b5cf-df562549e3d4/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c /run/containerd/io.containerd.runtime.v1.linux/k8s.io/2e75a956e9db372ddf40ef4c32d148100010386140238c6173ad6e4d45fd8e1c/rootfs/sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,devices
What happened:
Sometimes we can see kubelets stuck in a restarting loop and never getting healthy again.
The kubelet at the end fails with
The issue seems to be a leak of mount information from our
csi-nodepluginpod.The issue is that
/proc/self/mountinfocontains cgroup mount information from the pod which should not be the case. Because of that the kubelet tries to use the path of the leaked mount as root cgroup, which is wrong.Because of that the pod/container in
Terminatingdid leak cgroup mount information to the host.Some time later the kubelet had a restart and was not able to recover
What you expected to happen:
Kubelet restarts and recovers.
How to reproduce it (as minimally and precisely as possible):
On a node:
Fix afterwards:
Anything else we need to know?:
I was able to "fix" it by unmounting the leaked cgroups:
Maybe related to Pods dead because of Failed to update QoS cgroup #49835
Could get partially fixed similar to https://github.com/google/cadvisor/pull/1792/files in https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/helpers_linux.go#L198
I did already test that but fixing this resulted in a ready kubelet, not able to reconcile pods:
In this case containerd still fails but I think this needs to be addressed in containerd.
Environment:
kubectl version):cat /etc/os-release):uname -a):Linux kube-node01 5.3.0-46-generic #38-Ubuntu SMP Fri Mar 27 17:37:05 UTC 2020 x86_64 x86_64 x86_64 GNU/LinuxcgroupfsExample content of non-working content in
/proc/self/mountinfo:On healthy nodes there are no entries having
/run/containerd/io.containerd.runtime.v1.linux/in the paths