Buildkitd logs "failed to kill process in container id <cid>: buildkit-runc did not terminate successfully: container not running" for hours #4483

@dnephin

Hello! 👋

Problem

We are seeing the buildkitd logs full of the following lines:

level=debug msg="sending sigkill to process in container 7l2ykurlf4pziiza5pnf97raw"
level=error msg="failed to kill process in container id 7l2ykurlf4pziiza5pnf97raw: buildkit-runc did not terminate successfully: exit status 1: container not running\n"

The container id is always the same. These lines are repeated for many hours, sometimes days. The frequency is about 1000/minute (~15/s). We are seeing the buildkitd healthcheck fail, and it's not clear if the cause is these errors or if the errors are simply making it more difficult to determine the underlying cause.
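To check the rate and confirm that only one container id is involved, the error lines can be counted per id from a saved copy of the buildkitd log. The following is a minimal sketch, not part of buildkit: the helper name `killFailureID` and the regular expression are assumptions based only on the message format quoted above, and the program expects the log on standard input.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// killFailure matches the error message quoted above and captures the
// container id. The pattern is an assumption derived from that one sample.
var killFailure = regexp.MustCompile(`failed to kill process in container id ([^:\s]+):`)

// killFailureID returns the container id if line is a kill-failure error,
// and false otherwise (hypothetical helper, for illustration only).
func killFailureID(line string) (string, bool) {
	m := killFailure.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	counts := make(map[string]int)

	sc := bufio.NewScanner(os.Stdin)
	// Allow long log lines (up to 1 MiB) instead of the 64 KiB default.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		if id, ok := killFailureID(sc.Text()); ok {
			counts[id]++
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading log:", err)
		os.Exit(1)
	}

	// Print one line per container id with the number of failures seen.
	for id, n := range counts {
		fmt.Printf("%s\t%d\n", id, n)
	}
}
```

Piping a fixed time window of the log through this program (for example, one minute of output) gives a per-container count that can be compared against the roughly 1000/minute figure.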

A pod restart clears out the problem for a bit, but it generally happens again within a few hours.

Seems like the logs come from:

bklog.G(ctx).Debugf("sending sigkill to process in container %s", k.id)
defer func() {
	if err != nil {
		bklog.G(ctx).Errorf("failed to kill process in container id %s: %+v", k.id, err)

Before the log spam starts, I don't see anything particularly interesting in the logs:

debug time="2023-12-12T08:38:23Z" level=debug msg="removed snapshot" key=buildkit/10364/nekk455whng4seiauy6yz3w4k-view snapshotter=overlayfs
debug time="2023-12-12T08:38:23Z" level=debug msg="content garbage collected" d=29.017431ms
debug time="2023-12-12T08:38:23Z" level=debug msg="snapshot garbage collected" d=415.90514ms snapshotter=overlayfs
debug time="2023-12-12T08:38:23Z" level=debug msg="gc cleaned up 2099235886 bytes"
debug time="2023-12-12T08:39:20Z" level=debug msg="session finished: <nil>" spanID=e936167880750bc7 traceID=ec6a365de81b3b59b41ae2dc13e6e08c
debug time="2023-12-12T08:39:20Z" level=debug msg="sending sigkill to process in container 7l2ykurlf4pziiza5pnf97raw"

Details

Version: buildkit:v0.12.3-rootless (we also saw this problem with 0.11.0 before upgrade)

We run 4 buildkitd runners with --oci-worker-no-process-sandbox. All of them seem to exhibit this behaviour at one time or another (not necessarily at the same time).

Any ideas about how to fix this? Thank you!
