Shim leak on recovery: is_same_process rejects live shim, leaves orphan holding host ports #565

@G4614

Summary

Every boxlite invocation after boxlite run -d triggers recovery, which misdetects the live shim as dead, deletes its PID file, and marks the box Stopped without killing the actual shim process. Subsequent boxlite exec then spawns a fresh shim, leaving the original as an orphan that still holds host TCP ports (from image EXPOSE), vsock channels, file handles, and the box's full --memory allocation.

For most workloads this is a silent leak (RAM accumulates, and top eventually shows N orphaned boxlite-shim processes). For images with EXPOSE directives (docker:dind, anything with TCP ports), the leaked shim blocks the next box's gvproxy from binding the same host port → EADDRINUSE → gvproxy fails to create the virtual network → the entire box has no outbound network (ARP probes go nowhere). This is what blocks test:integration:dind from passing on a host that has previously run any dind box.

Reproduction

boxlite run -d --name a docker:dind sleep infinity  # spawns shim, binds host :2375 :2376
boxlite ls                                          # recovery fires; pid file removed, box marked Stopped
                                                    # shim still alive, still binding :2375 :2376
boxlite rm -f a                                     # state record removed; shim STILL alive
ss -tlnp | grep 2375                                # → libkrun VM pid=XXXXX, orphan

# Now any dind box fails:
make test:integration:dind  # → DNS timeout, ARP INCOMPLETE, build fails

Workaround until fixed: pkill -9 boxlite-shim between runs.

Root cause

Two pieces of code with an implicit contract that doesn't hold:

vmm/controller/spawn.rs:91 — shim is deliberately spawned without CLI args so secret config (sent via stdin pipe) never lands in world-readable /proc/<pid>/cmdline:

// 4. Build isolated command — no CLI args, config sent via stdin pipe
let no_args: &[String] = &[];
let mut cmd = jail.command(self.binary_path, no_args);

util/process.rs:283: is_same_process_linux validates ownership by checking that cmdline contains box_id:

args.iter().any(|arg| arg.contains("boxlite-shim")) && cmdline.contains(box_id)
//                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^
//                                                    always false (cmdline has no args)

cmdline.contains(box_id) is always false because spawn never put box_id there → is_same_process always returns false for live shims → recovery at rt_impl.rs:1196 hits the else branch that deletes the pid file and marks the box Stopped. The shim itself is not signalled, so it keeps running, keeps holding its resources, and is no longer tracked by the runtime.
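The mismatch can be reproduced in isolation. The following is a minimal sketch, not the code in util/process.rs: parse_cmdline and looks_like_shim_for are hypothetical names, the shim path and box id are made-up values, and it assumes the real check joins the NUL-separated contents of /proc/<pid>/cmdline before calling contains.

```rust
/// Split the raw bytes of /proc/<pid>/cmdline into arguments.
/// On Linux the kernel exposes argv as NUL-separated strings.
fn parse_cmdline(raw: &[u8]) -> Vec<String> {
    raw.split(|b| *b == 0)
        .filter(|s| !s.is_empty())
        .map(|s| String::from_utf8_lossy(s).into_owned())
        .collect()
}

/// Hypothetical stand-in for the ownership check quoted above:
/// some argument must name the shim binary AND the command line
/// must mention the box id.
fn looks_like_shim_for(raw_cmdline: &[u8], box_id: &str) -> bool {
    let args = parse_cmdline(raw_cmdline);
    // Assumption: the real code compares against the joined command line.
    let cmdline = args.join(" ");
    args.iter().any(|arg| arg.contains("boxlite-shim")) && cmdline.contains(box_id)
}
```

With a shim spawned without arguments, looks_like_shim_for(b"/usr/bin/boxlite-shim\0", "abc123") is false even though the process is alive. With the box id as an extra argument, looks_like_shim_for(b"/usr/bin/boxlite-shim\0abc123\0", "abc123") is true.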

"Box process dead, cleaned up stale PID file" is the visible signature in the logs.

Impact

  • --memory-sized RAM leak per run -d + ls/exec: 4 dind boxes with --memory 2048 leak 8 GB even if user thinks they were all removed
  • Per-box exec after run -d silently spawns a fresh shim: in-memory state of original box is lost; user thinks they're talking to same box but they're not (acutely bad for dockerd-style stateful PID 1)
  • Visible failure for EXPOSE-having images: next box of same image (or with same port mapping) gets EADDRINUSE; gvproxy fails silently; entire VM has no outbound network; symptom looks like "boxlite doesn't support docker:dind" but is actually leak-driven port collision

Suggested fix

Pass box_id as the sole argument to the shim. box_id is a short random identifier and is not sensitive, unlike the config, which stays on the stdin transport:

// vmm/controller/spawn.rs
let args = [self.box_id.to_string()];
let mut cmd = jail.command(self.binary_path, &args);

shim's main doesn't need to read the arg; it's purely there so /proc/<pid>/cmdline carries it for is_same_process to validate. ~3 lines of code.
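The split between the two channels (identifier in argv, config on stdin) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's spawn path: std::process::Command stands in for jail.command, and build_shim_command and expected_cmdline are hypothetical helpers.

```rust
use std::path::Path;
use std::process::{Command, Stdio};

/// Hypothetical stand-in for `jail.command(...)`: the box id is the only
/// argument, and stdin is piped so the config never appears in argv.
fn build_shim_command(binary_path: &Path, box_id: &str) -> Command {
    let mut cmd = Command::new(binary_path);
    cmd.arg(box_id).stdin(Stdio::piped());
    cmd
}

/// Render the program and its arguments the way /proc/<pid>/cmdline
/// lays them out on Linux: each string followed by a NUL byte.
/// Assumption: argv[0] is the program path, as std's Command sets it.
fn expected_cmdline(cmd: &Command) -> Vec<u8> {
    let mut out = Vec::new();
    for part in std::iter::once(cmd.get_program()).chain(cmd.get_args()) {
        out.extend_from_slice(part.to_string_lossy().as_bytes());
        out.push(0);
    }
    out
}
```

Nothing is spawned here; the command is only built and inspected. For build_shim_command(Path::new("/usr/bin/boxlite-shim"), "abc123"), expected_cmdline yields the bytes of "/usr/bin/boxlite-shim", a NUL, "abc123", and a NUL, which is the shape a cmdline-based ownership check needs in order to find the box id.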

After this fix:

  • Recovery correctly identifies live shims → no false Stopped → no orphan creation on exec
  • boxlite stop/rm paths already work (they read pid file → SIGTERM → graceful)
  • test:integration:dind passes deterministically (no manual pkill needed between runs)

Verified locally: shim cmdline becomes /path/to/boxlite-shim <box_id>, recovery passes, repeated run -d/exec/rm cycles leave no orphans, make test:integration:dind passes on fresh host state without pkill between runs.
