Summary
Every boxlite invocation after boxlite run -d triggers recovery, which misdetects the live shim as dead, deletes its PID file, and marks the box Stopped — without killing the actual shim process. Subsequent boxlite exec then spawns a fresh shim, leaving the original as an orphan that still holds host TCP ports (from image EXPOSE), vsock channels, file handles, and the box's full --memory allocation.
For most workloads this is a silent leak: RAM accumulates, and top eventually shows N orphaned boxlite-shim processes (still running and holding resources, not zombies in the process-state sense). For images with EXPOSE directives (docker:dind, anything with TCP ports), the leaked shim blocks the next box's gvproxy from binding the same host port → EADDRINUSE → gvproxy fails to create the virtual network → the entire box has no outbound network (ARP probes go nowhere). This is what blocks test:integration:dind from passing on a host that has previously run any dind box.
Reproduction
boxlite run -d --name a docker:dind sleep infinity # spawns shim, binds host :2375 :2376
boxlite ls # recovery fires; pid file removed, box marked Stopped
# shim still alive, still binding :2375 :2376
boxlite rm -f a # state record removed; shim STILL alive
ss -tlnp | grep 2375 # → libkrun VM pid=XXXXX, orphan
# Now any dind box fails:
make test:integration:dind # → DNS timeout, ARP INCOMPLETE, build fails
Workaround until fixed: pkill -9 boxlite-shim between runs.
Root cause
Two pieces of code with an implicit contract that doesn't hold:
vmm/controller/spawn.rs:91 — shim is deliberately spawned without CLI args so secret config (sent via stdin pipe) never lands in world-readable /proc/<pid>/cmdline:
// 4. Build isolated command — no CLI args, config sent via stdin pipe
let no_args: &[String] = &[];
let mut cmd = jail.command(self.binary_path, no_args);
util/process.rs:283 — is_same_process_linux validates ownership by checking cmdline contains box_id:
args.iter().any(|arg| arg.contains("boxlite-shim")) && cmdline.contains(box_id)
//                                                     ^^^^^^^^^^^^^^^^^^^^^^^^
//                                                     always false (cmdline has no args)
cmdline.contains(box_id) is always false because spawn never put box_id there → is_same_process always returns false for live shims → recovery at rt_impl.rs:1196 hits the else branch that deletes the pid file and marks the box Stopped. The shim itself is not signalled, so it keeps running, keeps holding its resources, and is no longer tracked by the runtime.
"Box process dead, cleaned up stale PID file" is the visible signature in the logs.
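The mismatch can be reproduced in isolation. The following is a minimal sketch, not the project's code: parse_cmdline and looks_like_shim_for are hypothetical helpers that mimic the quoted check against the NUL-separated bytes Linux exposes in /proc/<pid>/cmdline, and the path and box ID used with them are made up.

```rust
/// Split the raw bytes of /proc/<pid>/cmdline into arguments.
/// Linux separates the arguments with NUL bytes and normally ends with one.
fn parse_cmdline(raw: &[u8]) -> Vec<String> {
    raw.split(|b| *b == 0)
        .filter(|part| !part.is_empty())
        .map(|part| String::from_utf8_lossy(part).into_owned())
        .collect()
}

/// Hypothetical stand-in for the ownership check quoted above:
/// some argument names the shim binary AND some argument contains the box ID.
fn looks_like_shim_for(raw_cmdline: &[u8], box_id: &str) -> bool {
    let args = parse_cmdline(raw_cmdline);
    args.iter().any(|arg| arg.contains("boxlite-shim"))
        && args.iter().any(|arg| arg.contains(box_id))
}
```

With a shim spawned without arguments, the cmdline is just the binary path, so looks_like_shim_for(b"/usr/bin/boxlite-shim\0", "abc123") is false even though the process is alive. Once the box ID is present as an argument, the same check returns true.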
Impact
- --memory-sized RAM leak per run -d + ls/exec: 4 dind boxes with --memory 2048 leak 8 GB even if user thinks they were all removed
- Per-box exec after run -d silently spawns a fresh shim: in-memory state of original box is lost; user thinks they're talking to same box but they're not (acutely bad for dockerd-style stateful PID 1)
- Visible failure for EXPOSE-having images: next box of same image (or with same port mapping) gets EADDRINUSE; gvproxy fails silently; entire VM has no outbound network; symptom looks like "boxlite doesn't support docker:dind" but is actually leak-driven port collision
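The port collision itself is ordinary socket behaviour and can be shown without boxlite or gvproxy. In this small sketch, second_bind_error is a hypothetical helper and a loopback ephemeral port stands in for :2375. The first listener plays the role of the orphaned shim and the second bind plays the next box.

```rust
use std::io::ErrorKind;
use std::net::TcpListener;

/// Bind a loopback port, then try to bind the same address again while the
/// first listener is still alive. Returns the error kind of the second bind.
fn second_bind_error() -> Option<ErrorKind> {
    // Port 0 asks the OS for any free port; this stands in for a fixed EXPOSE port.
    let first = TcpListener::bind("127.0.0.1:0").expect("bind ephemeral port");
    let addr = first.local_addr().expect("query bound address");

    // `first` is still in scope here, so the port is held, like a leaked shim.
    let second = TcpListener::bind(addr);
    second.err().map(|e| e.kind())
}
```

On Linux the second bind is expected to fail with ErrorKind::AddrInUse (EADDRINUSE) for as long as the first listener exists, which is why killing the orphan is enough to unblock the next box.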
Suggested fix
Pass box_id as the sole argv to shim (box_id is a short random identifier, not sensitive — unlike the config which keeps stdin transport):
// vmm/controller/spawn.rs
let args = [self.box_id.to_string()];
let mut cmd = jail.command(self.binary_path, &args);
shim's main doesn't need to read the arg; it's purely there so /proc/<pid>/cmdline carries it for is_same_process to validate. ~3 lines of code.
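As a rough illustration of the resulting process shape, the sketch below uses std::process::Command as a stand-in for the project's jail.command, whose exact behaviour is not shown here. build_shim_command and expected_cmdline are hypothetical names, and the rendered bytes assume the shim does not rewrite its own argv after exec.

```rust
use std::process::{Command, Stdio};

/// Build a shim command whose only argument is the box ID.
/// std::process::Command stands in for the project's jail.command (assumption).
fn build_shim_command(binary_path: &str, box_id: &str) -> Command {
    let mut cmd = Command::new(binary_path);
    cmd.arg(box_id) // non-secret identifier, visible in /proc/<pid>/cmdline
        .stdin(Stdio::piped()); // secret config would still travel over stdin
    cmd
}

/// Render the NUL-separated bytes Linux would be expected to expose in
/// /proc/<pid>/cmdline once this command is spawned.
fn expected_cmdline(cmd: &Command) -> Vec<u8> {
    let mut out = Vec::new();
    for part in std::iter::once(cmd.get_program()).chain(cmd.get_args()) {
        out.extend_from_slice(part.to_string_lossy().as_bytes());
        out.push(0); // each argument is NUL-terminated
    }
    out
}
```

For build_shim_command("/usr/bin/boxlite-shim", "abc123") the rendered cmdline holds the binary path followed by the box ID, which is the form an ownership check on cmdline can match. Nothing from the config appears in argv.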
After this fix:
- Recovery correctly identifies live shims → no false Stopped → no orphan creation on exec
- boxlite stop/rm paths already work (they read pid file → SIGTERM → graceful)
- test:integration:dind passes deterministically (no manual pkill needed between runs)
Verified locally: shim cmdline becomes /path/to/boxlite-shim <box_id>, recovery passes, repeated run -d/exec/rm cycles leave no orphans, make test:integration:dind passes on fresh host state without pkill between runs.