Description
Very rarely, at some point between creating a container and starting it, containerd-shim crashes with a segmentation fault. Clients that then try to start the container receive a ttrpc: closed: unknown error.
The main obstacle to debugging this problem is that it is very hard to reproduce. Once it appears on a machine, it doesn't reappear for 10-12 hours (or unless some specific conditions occur).
The problem itself looks like a race condition or incorrect error handling somewhere further down the call path.
Steps to reproduce the issue:
- Run containers in a loop.
- Pray for the issue to appear.
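The loop can be automated with something like the sketch below. It is only a sketch: the helper names (isShimCrash, reproduce) are made up for illustration, and it assumes the command being run prints the daemon's error text (ttrpc: closed) when the start fails.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// crashMarker is the client-side error text quoted in this report.
const crashMarker = "ttrpc: closed"

// isShimCrash reports whether the command output contains the symptom.
func isShimCrash(output string) bool {
	return strings.Contains(output, crashMarker)
}

// reproduce calls run up to max times and returns the 1-based attempt
// on which the symptom first appeared, or 0 if it never appeared.
func reproduce(run func() string, max int) int {
	for i := 1; i <= max; i++ {
		if isShimCrash(run()) {
			return i
		}
	}
	return 0
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: repro <command> [args...]")
		return
	}
	run := func() string {
		// The exit error is ignored on purpose: a failed start is what
		// we are looking for, and its message is in the combined output.
		out, _ := exec.Command(os.Args[1], os.Args[2:]...).CombinedOutput()
		return string(out)
	}
	if n := reproduce(run, 100000); n > 0 {
		fmt.Printf("symptom seen on attempt %d\n", n)
	} else {
		fmt.Println("symptom not seen")
	}
}
```

For example: go run repro.go docker run --rm busybox true (the image and command are arbitrary; any short-lived container should do).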
Describe the results you received:
/var/log/syslog with containerd-shim debugging turned on:
containerd[1751]: time="2020-01-11T18:17:27Z" level=debug msg="registering ttrpc server"
containerd[1751]: time="2020-01-11T18:17:27Z" level=debug msg="serving api on unix socket" socket="[inherited from parent]"
containerd[1751]: panic: runtime error: invalid memory address or nil pointer dereference
containerd[1751]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x459367]
containerd[1751]: goroutine 12 [running]:
containerd[1751]: strings.(*Builder).WriteString(...)
containerd[1751]: #011/usr/local/go/src/strings/builder.go:122
containerd[1751]: strings.Join(0xc000073a40, 0x2, 0x2, 0x73bdf4, 0x1, 0x1, 0xc000092400)
containerd[1751]: #011/usr/local/go/src/strings/strings.go:439 +0x145
containerd[1751]: path/filepath.join(0xc000073a40, 0x2, 0x2, 0x0, 0x0)
containerd[1751]: #011/usr/local/go/src/path/filepath/path_unix.go:45 +0xa7
containerd[1751]: path/filepath.Join(...)
containerd[1751]: #011/usr/local/go/src/path/filepath/path.go:210
containerd[1751]: github.com/containerd/containerd/runtime/v1/shim.shouldKillAllOnExit(0x0, 0x74, 0x0, 0x0, 0x0)
containerd[1751]: #011/go/src/github.com/containerd/containerd/runtime/v1/shim/service.go:546 +0x92
containerd[1751]: github.com/containerd/containerd/runtime/v1/shim.(*Service).checkProcesses(0xc000010240, 0xbf7ea22e2d95bbda, 0x435a6125, 0x9d6b60, 0x486c, 0x0)
containerd[1751]: #011/go/src/github.com/containerd/containerd/runtime/v1/shim/service.go:514 +0x56
containerd[1751]: github.com/containerd/containerd/runtime/v1/shim.(*Service).processExits(0xc000010240)
containerd[1751]: #011/go/src/github.com/containerd/containerd/runtime/v1/shim/service.go:498 +0xbe
containerd[1751]: created by github.com/containerd/containerd/runtime/v1/shim.NewService
containerd[1751]: #011/go/src/github.com/containerd/containerd/runtime/v1/shim/service.go:92 +0x4be
containerd[1751]: time="2020-01-11T18:17:28.799258075Z" level=error msg="not found"
containerd[1751]: time="2020-01-11T18:17:28.799251177Z" level=info msg="shim reaped" id=56c0c998e1d7d4b002fa2f1697188657eadc53135ceac2eb438bca6e96a18cb9
containerd[1751]: time="2020-01-11T18:17:28.799309696Z" level=warning msg="cleaning up after killed shim" id=56c0c998e1d7d4b002fa2f1697188657eadc53135ceac2eb438bca6e96a18cb9 namespace=moby
dockerd[1761]: time="2020-01-11T18:17:28.800822192Z" level=error msg="failed to delete task after fail start" container=56c0c998e1d7d4b002fa2f1697188657eadc53135ceac2eb438bca6e96a18cb9 error="not found" module=libcontainerd namespace=moby
containerd[1751]: time="2020-01-11T18:17:28.844152523Z" level=debug msg="event published" ns=moby topic="/containers/delete" type=containerd.events.ContainerDelete
containerd[1751]: time="2020-01-11T18:17:28.924443319Z" level=debug msg="event published" ns=moby topic="/tasks/exit" type=containerd.events.TaskExit
dockerd[1761]: time="2020-01-11T18:17:28.924855540Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exit
containerd[1751]: time="2020-01-11T18:17:28.928954904Z" level=debug msg="event published" ns=moby topic="/tasks/delete" type=containerd.events.TaskDelete
dockerd[1761]: time="2020-01-11T18:17:28.932231844Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/delete
dockerd[1761]: time="2020-01-11T18:17:28.932264274Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
dockerd[1761]: time="2020-01-11T18:17:29.000495852Z" level=error msg="56c0c998e1d7d4b002fa2f1697188657eadc53135ceac2eb438bca6e96a18cb9 cleanup: failed to delete container from containerd: no such container"
dockerd[1761]: time="2020-01-11T18:17:29.000556322Z" level=error msg="Handler for POST /v1.35/containers/56c0c998e1d7d4b002fa2f1697188657eadc53135ceac2eb438bca6e96a18cb9/start returned error: ttrpc: closed: unknown"
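Since the crash is so rare, it may help to scan syslog for it automatically. The sketch below looks for the panic line and then takes the container ID from the next "shim reaped" line, as in the log above. The function name crashedShims is made up, and the matching relies only on the line formats shown in this report; other containerd versions may log differently.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// reapedID matches the "shim reaped" line format seen in this report.
var reapedID = regexp.MustCompile(`msg="shim reaped" id=([0-9a-f]+)`)

// crashedShims returns the IDs from "shim reaped" lines that follow
// a nil pointer panic line in the given log text.
func crashedShims(log string) []string {
	var ids []string
	panicked := false
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "panic: runtime error: invalid memory address") {
			panicked = true
			continue
		}
		if m := reapedID.FindStringSubmatch(line); m != nil && panicked {
			ids = append(ids, m[1])
			panicked = false // wait for the next panic before recording again
		}
	}
	return ids
}

func main() {
	// Two lines copied from the log in this report.
	sample := `containerd[1751]: panic: runtime error: invalid memory address or nil pointer dereference
containerd[1751]: time="2020-01-11T18:17:28.799251177Z" level=info msg="shim reaped" id=56c0c998e1d7d4b002fa2f1697188657eadc53135ceac2eb438bca6e96a18cb9`
	fmt.Println(crashedShims(sample))
}
```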
Describe the results you expected:
Containers should be created and started correctly.
Output of containerd --version:
containerd containerd.io 1.2.10 b34a5c8af56e510852c35414db4c1f4fa6172339
Any other relevant information:
Docker server version:
Server Version: 19.03.5