
Some goroutines hang in syscall openat on /run/containerd/io.containerd.runtime.v2.task/k8s.io/xxx/log #4905

@payall4u

Description

Some goroutines hang in the openat syscall on /run/containerd/io.containerd.runtime.v2.task/k8s.io/xxx/log.

[root@VM]# ps aux | grep 14915 | grep -v gdb
root       14915  118  1.1 2036340 91152 ?       Ssl  1/04 1276:16 /usr/local/bin/containerd

[root@VM]# for pid in `ps -T 14915 | awk '{ print $2 }' | grep -v SPID`; do cat /proc/$pid/syscall; done
202 0x5613ef954a08 0x80 0x0 0x0 0x0 0x0 0x7ffee5b5afd0 0x5613ecd41bb3
35 0x7f0ff7cfedb0 0x0 0x0 0x0 0x0 0x0 0x7f0ff7cfedb0 0x5613ecd4161d
202 0x5613ef971678 0x80 0x0 0x0 0x0 0x0 0x7f0ff5cfada0 0x5613ecd41bb3
257 0xffffffffffffff9c 0xc0001eac40 0x80000 0x0 0x0 0x0 0xc000145c58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc000286780 0x80000 0x0 0x0 0x0 0xc00034bc58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0005d0760 0x80000 0x0 0x0 0x0 0xc0005e5c58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0005d0520 0x80000 0x0 0x0 0x0 0xc0003bac58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0000b5000 0x80000 0x0 0x0 0x0 0xc000143c58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0001eacc0 0x80000 0x0 0x0 0x0 0xc000146c58 0x5613ecdb5c5a
257 0xffffffffffffff9c 0xc0000b53e0 0x80000 0x0 0x0 0x0 0xc00054fc58 0x5613ecdb5c5a

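257 => openat (x86_64; the other numbers are 202 => futex and 35 => nanosleep)
0xffffffffffffff9c => AT_FDCWD (-100)
0x80000 => O_RDONLY|O_CLOEXEC

Each /proc/<tid>/syscall line holds the syscall number, its six argument registers, then the stack pointer and program counter, so the second openat argument (0xc0001eac40 and so on) is the address of the pathname.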
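The decoding above can be automated. This is a minimal Go sketch, assuming Linux on x86_64 (syscall numbers differ on other architectures); syscallNames, threadSyscall and parseSyscallLine are hypothetical names, not containerd code. Instead of the ps loop, it reads /proc/<pid>/task/*/syscall directly and prints only the threads blocked in openat:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// syscallNames covers only the x86_64 numbers that appear in this issue.
var syscallNames = map[int]string{35: "nanosleep", 202: "futex", 257: "openat"}

// threadSyscall is one decoded line of /proc/<tid>/syscall: the syscall
// number, six argument registers, then the stack pointer and program counter.
type threadSyscall struct {
	Nr   int
	Args [6]uint64
	SP   uint64
	PC   uint64
}

// parseSyscallLine decodes a line for a thread blocked in a syscall. Lines
// such as "running" or "-1 <sp> <pc>" have a different shape and are rejected.
func parseSyscallLine(line string) (threadSyscall, error) {
	var ts threadSyscall
	fields := strings.Fields(line)
	if len(fields) != 9 {
		return ts, fmt.Errorf("not a blocked-syscall line: %q", line)
	}
	nr, err := strconv.Atoi(fields[0])
	if err != nil {
		return ts, err
	}
	ts.Nr = nr
	var regs [8]uint64
	for i, f := range fields[1:] {
		v, err := strconv.ParseUint(strings.TrimPrefix(f, "0x"), 16, 64)
		if err != nil {
			return ts, err
		}
		regs[i] = v
	}
	copy(ts.Args[:], regs[:6])
	ts.SP, ts.PC = regs[6], regs[7]
	return ts, nil
}

func main() {
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1] // e.g. the containerd PID, 14915 in this issue
	}
	files, _ := filepath.Glob(filepath.Join("/proc", pid, "task", "*", "syscall"))
	for _, p := range files {
		data, err := os.ReadFile(p) // needs ptrace-level access to the target
		if err != nil {
			continue
		}
		ts, err := parseSyscallLine(string(data))
		if err != nil || syscallNames[ts.Nr] != "openat" {
			continue
		}
		// For openat: Args[0] is dirfd, Args[1] the pathname pointer, Args[2] flags.
		fmt.Printf("%s: openat dirfd=%d pathname=%#x flags=%#x\n",
			p, int64(ts.Args[0]), ts.Args[1], ts.Args[2])
	}
}
```

Run as root with the containerd PID as the only argument; the pathname pointer still has to be dereferenced with gdb.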

Using gdb, we read the pathname at 0xc0001eac40: it is /proc/14915/fd/12, and /proc/14915/fd/12 links to /run/containerd/io.containerd.runtime.v2.task/k8s.io/3bb94c1b96dd65e1dc3dc5f991a6ed158dbb697d88286cf5e682b0d501b1ca03/log.

Steps to reproduce the issue:
Creating and deleting pods quickly may trigger it, but I'm not sure. I hit it through the kubelet eviction manager. It can be reproduced by killing the shim manually.

Output of containerd --version:

containerd github.com/containerd/containerd v1.3.4

Any other relevant information:
Here is the goroutine info:
7 syscall.Syscall6+0x4 /usr/local/go/src/syscall/asm_linux_amd64.s:44
syscall.openat+0x142 /usr/local/go/src/syscall/zsyscall_linux_amd64.go:68
syscall.Open+0x80 /usr/local/go/src/syscall/syscall_linux.go:138
os.openFileNolog+0xba /usr/local/go/src/os/file_unix.go:201
os.OpenFile+0x98 /usr/local/go/src/os/file.go:284
github.com/containerd/containerd/vendor/github.com/containerd/fifo.OpenFifo.func2+0x45e /go/src/github.com/containerd/containerd/vendor/github.com/containerd/fifo/fifo.go:110

When containerd creates a shim, it calls openShimLog to create the FIFO /run/containerd/io.containerd.runtime.v2.task/k8s.io/xxx/log and then waits for containerd-shim to open the other end (openLog), through which the shim's logrus output is read.

func openShimLog(ctx context.Context, bundle *Bundle, _ func(string, time.Duration) (net.Conn, error)) (io.ReadCloser, error) {
	return fifo.OpenFifo(ctx, filepath.Join(bundle.Path, "log"), unix.O_RDONLY|unix.O_CREAT|unix.O_NONBLOCK, 0700)
}

fifo.OpenFifo clears O_CREAT and O_NONBLOCK before the real open, so os.OpenFile is called with unix.O_RDONLY alone:

flag &= ^syscall.O_CREAT
flag &= ^syscall.O_NONBLOCK

file, err = os.OpenFile(fn, flag, 0)

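If the shim never calls openLog, or dies before it does, the containerd thread running this openat blocks forever: a read-only open of a FIFO without O_NONBLOCK does not return until some process opens the FIFO for writing.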

We have made fixes in both containerd and containerd/fifo.
