Description
Hello! 👋
Problem
We are seeing the buildkitd logs full of the following lines:
level=debug msg="sending sigkill to process in container 7l2ykurlf4pziiza5pnf97raw"
level=error msg="failed to kill process in container id 7l2ykurlf4pziiza5pnf97raw: buildkit-runc did not terminate successfully: exit status 1: container not running\n"
The container id is always the same. These lines repeat for many hours, sometimes days, at roughly 1000/minute (~17/s). We are seeing the buildkitd healthcheck fail, and it's not clear whether these errors are the cause or whether they are simply obscuring the underlying one.
A pod restart clears out the problem for a bit, but it generally happens again within a few hours.
The logs appear to come from buildkit/executor/runcexecutor/executor.go, lines 526 to 529 at f84cfe3:

bklog.G(ctx).Debugf("sending sigkill to process in container %s", k.id)
defer func() {
	if err != nil {
		bklog.G(ctx).Errorf("failed to kill process in container id %s: %+v", k.id, err)
Before the log spam starts, I don't see anything particularly interesting in the logs:
debug time="2023-12-12T08:38:23Z" level=debug msg="removed snapshot" key=buildkit/10364/nekk455whng4seiauy6yz3w4k-view snapshotter=overlayfs
debug time="2023-12-12T08:38:23Z" level=debug msg="content garbage collected" d=29.017431ms
debug time="2023-12-12T08:38:23Z" level=debug msg="snapshot garbage collected" d=415.90514ms snapshotter=overlayfs
debug time="2023-12-12T08:38:23Z" level=debug msg="gc cleaned up 2099235886 bytes"
debug time="2023-12-12T08:39:20Z" level=debug msg="session finished: <nil>" spanID=e936167880750bc7 traceID=ec6a365de81b3b59b41ae2dc13e6e08c
debug time="2023-12-12T08:39:20Z" level=debug msg="sending sigkill to process in container 7l2ykurlf4pziiza5pnf97raw"
Details
Version: buildkit:v0.12.3-rootless (we also saw this problem with v0.11.0 before upgrading)
We run 4 buildkitd runners with --oci-worker-no-process-sandbox. All of them seem to manifest this behaviour at one time or another (not necessarily at the same time).
Any ideas about how to fix this? Thank you!