Description
Some goroutines hang in the openat syscall on /run/containerd/io.containerd.runtime.v2.task/k8s.io/xxx/log.
[root@VM]# ps aux | grep 14915 | grep -v gdb
root 14915 118 1.1 2036340 91152 ? Ssl 1/04 1276:16 /usr/local/bin/containerd
[root@VM]# for pid in `ps -T 14915 | awk '{ print $2 }' | grep -v SPID`; do cat /proc/$pid/syscall; done
202 0x5613ef954a08 0x80 0x0 0x0 0x0 0x0 0x7ffee5b5afd0 0x5613ecd41bb3
35 0x7f0ff7cfedb0 0x0 0x0 0x0 0x0 0x0 0x7f0ff7cfedb0 0x5613ecd4161d
202 0x5613ef971678 0x80 0x0 0x0 0x0 0x0 0x7f0ff5cfada0 0x5613ecd41bb3
257 0xffffffffffffff9c 0xc0001eac40 0x80000 0x0 0x0 0x0 0xc000145c58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc000286780 0x80000 0x0 0x0 0x0 0xc00034bc58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0005d0760 0x80000 0x0 0x0 0x0 0xc0005e5c58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0005d0520 0x80000 0x0 0x0 0x0 0xc0003bac58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0000b5000 0x80000 0x0 0x0 0x0 0xc000143c58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0001eacc0 0x80000 0x0 0x0 0x0 0xc000146c58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0000b53e0 0x80000 0x0 0x0 0x0 0xc00054fc58 0x5613ecdb5c5a
257 => openat
0xffffffffffffff9c => AT_FDCWD
Using gdb, we read the path argument at 0xc0001eac40: it is /proc/14915/fd/12, and /proc/14915/fd/12 points to /run/containerd/io.containerd.runtime.v2.task/k8s.io/3bb94c1b96dd65e1dc3dc5f991a6ed158dbb697d88286cf5e682b0d501b1ca03/log.
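To make the decoding above concrete, here is a minimal Go sketch that parses one line of /proc/<tid>/syscall. The helper name decodeSyscallLine and the small lookup table are illustrative assumptions, not part of containerd; the table holds the x86-64 Linux numbers for the three syscalls that appear in the output.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// syscallNames covers only the syscall numbers seen in the output above
// (x86-64 Linux). It is an illustrative table, not a complete one.
var syscallNames = map[uint64]string{
	35:  "nanosleep",
	202: "futex",
	257: "openat",
}

// decodeSyscallLine (hypothetical helper) parses one line of
// /proc/<tid>/syscall: the syscall number, six argument registers,
// then the stack pointer and program counter.
func decodeSyscallLine(line string) (string, []uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 7 {
		return "", nil, fmt.Errorf("expected at least 7 fields, got %d", len(fields))
	}
	nr, err := strconv.ParseUint(fields[0], 10, 64)
	if err != nil {
		return "", nil, err
	}
	name, ok := syscallNames[nr]
	if !ok {
		name = fmt.Sprintf("syscall_%d", nr)
	}
	args := make([]uint64, 0, 6)
	for _, f := range fields[1:7] {
		// Base 0 lets ParseUint accept the 0x prefix used by the kernel.
		v, perr := strconv.ParseUint(f, 0, 64)
		if perr != nil {
			return "", nil, perr
		}
		args = append(args, v)
	}
	return name, args, nil
}

func main() {
	line := "257 0xffffffffffffff9c 0xc0001eac40 0x80000 0x0 0x0 0x0 0xc000145c58 0x5613ecdb5c5a"
	name, args, err := decodeSyscallLine(line)
	if err != nil {
		panic(err)
	}
	// The first openat argument is a signed dirfd; -100 is AT_FDCWD.
	fmt.Printf("%s dirfd=%d path_ptr=%#x flags=%#x\n", name, int32(args[0]), args[1], args[2])
}
```

For the first openat line this gives a dirfd of -100 (AT_FDCWD) and flags of 0x80000, which on Linux amd64 is O_CLOEXEC alone: the O_NONBLOCK bit (0x800) is not set.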
Steps to reproduce the issue:
Creating and deleting pods quickly may trigger this, but I'm not sure. I hit it through the eviction manager. It can also be reproduced by killing the shim manually.
Output of containerd --version:
containerd github.com/containerd/containerd v1.3.4
Any other relevant information:
Here is goroutine info:
7 syscall.Syscall6+0x4 /usr/local/go/src/syscall/asm_linux_amd64.s:44
syscall.openat+0x142 /usr/local/go/src/syscall/zsyscall_linux_amd64.go:68
syscall.Open+0x80 /usr/local/go/src/syscall/syscall_linux.go:138
os.openFileNolog+0xba /usr/local/go/src/os/file_unix.go:201
os.OpenFile+0x98 /usr/local/go/src/os/file.go:284
github.com/containerd/containerd/vendor/github.com/containerd/fifo.OpenFifo.func2+0x45e /go/src/github.com/containerd/containerd/vendor/github.com/containerd/fifo/fifo.go:110
When containerd creates a shim, it calls openShimLog to create the FIFO /run/containerd/io.containerd.runtime.v2.task/k8s.io/xxx/log and then waits for containerd-shim to open it (openLog) for its logrus output.
func openShimLog(ctx context.Context, bundle *Bundle, _ func(string, time.Duration) (net.Conn, error)) (io.ReadCloser, error) {
	return fifo.OpenFifo(ctx, filepath.Join(bundle.Path, "log"), unix.O_RDONLY|unix.O_CREAT|unix.O_NONBLOCK, 0700)
}
fifo.OpenFifo clears the O_CREAT and O_NONBLOCK flags and then calls os.OpenFile with only unix.O_RDONLY left:
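flag &= ^syscall.O_CREAT
flag &= ^syscall.O_NONBLOCK
file, err = os.OpenFile(fn, flag, 0)
If the shim never calls openLog, or dies before doing so, the containerd threads running this open hang in openat, because a blocking read-only open of a FIFO does not return until a writer opens the other end.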
We have made fixes in both containerd and containerd/fifo.