fix(cli): stop box before hard-exit on non-zero run exit code by G4614 · Pull Request #622 · boxlite-ai/boxlite

G4614 · 2026-05-29T04:12:26Z

When the box exits abnormally (non-zero exit code), the shim process should be released through RAII.

This surfaced because of #604.

Test plan

Existing exit-code tests (branch ⑥: non-zero exit)

Each abnormal-exit `boxlite run` integration test was verified two-sided on this branch (run.rs reverted to std::process::exit vs. RAII applied). Side B's leak surfaces as PerTestBoxHome::Drop panicking with `live shim(s): [pid]`, because the std::process::exit shortcut bypasses RuntimeImpl::Drop → shutdown_sync.

test	side A (RAII)	side B (pre-fix `std::process::exit`)
`test_run_exit_code_125`	ok (8.14s)	FAILED — `live shim(s): [320092]`
`test_run_exit_code_custom`	ok (8.14s)	FAILED — `live shim(s)`
`test_run_signal_exit_code_sigterm`	ok (16.42s)	FAILED — `live shim(s): [323543]`
`test_run_signal_exit_code_sigkill`	ok (16.42s)	FAILED — `live shim(s)`
`test_run_signal_exit_code_sigint`	ok (16.42s)	FAILED — `live shim(s)`
`test_run_python_error_handling`	ok (8.06s)	FAILED — `live shim(s): [326872]`
`test_run_exit_code_success` (control)	ok (8.14s)	ok (17.85s) — exit 0 never hits the buggy branch

Focused reproducer for branch ⑥ (added on review)

test_run_rm_non_zero_exit_does_not_leak_shim runs the same scenario (run --rm alpine:latest sh -c 'exit 7') but scans <home>/boxes/*/shim.pid for live PIDs in the test body — so the no-leak assertion is visible at the call site instead of buried in PerTestBoxHome::Drop's panic.
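The scan described above can be sketched roughly as follows. This is not the actual `test_utils::home::live_shim_pids` helper: `live_shim_pids_sketch`, the injected `is_alive` probe, and the `/proc`-based liveness check are all hypothetical names and assumptions chosen for illustration, and the real helper may detect liveness differently.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical sketch: collect the PIDs recorded in `<home>/boxes/*/shim.pid`
/// for which `is_alive` reports a running process.
fn live_shim_pids_sketch(home: &Path, is_alive: impl Fn(u32) -> bool) -> Vec<u32> {
    let mut live = Vec::new();
    let Ok(entries) = fs::read_dir(home.join("boxes")) else {
        return live; // no boxes directory, so nothing can have leaked
    };
    for entry in entries.flatten() {
        let pid_file = entry.path().join("shim.pid");
        let Ok(text) = fs::read_to_string(&pid_file) else {
            continue; // box without a pid file: no shim recorded
        };
        if let Ok(pid) = text.trim().parse::<u32>() {
            if is_alive(pid) {
                live.push(pid);
            }
        }
    }
    live.sort_unstable();
    live
}

/// Assumed Linux-only probe: a process exists while `/proc/<pid>` does.
fn proc_alive(pid: u32) -> bool {
    Path::new("/proc").join(pid.to_string()).exists()
}

fn main() {
    // Placeholder home directory; a real test would use its per-test home.
    let home = std::env::temp_dir().join("boxlite-home-sketch");
    let leaked = live_shim_pids_sketch(&home, proc_alive);
    assert!(leaked.is_empty(), "left live shim PID(s): {leaked:?}");
}
```

Passing the liveness probe as a parameter keeps the directory scan testable without spawning real processes; the call-site assertion is the part that mirrors the test described above.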

step	code state	result
A	RAII fix applied	`test_run_rm_non_zero_exit_does_not_leak_shim ... ok` (5.79s)
B	`run.rs:93` reverted to `std::process::exit(to_shell_exit_code(exit_code))`	FAILED — `non-zero boxlite run --rm left live shim PID(s): [286477]`
C	RAII fix restored	`test_run_rm_non_zero_exit_does_not_leak_shim ... ok` (6.06s)

Other early-return branches in `BoxRunner::run`

Reviewer's request was "execution and litebox dropped automatically in any branch". The remaining CLI-reachable early-returns each get their own no-leak test:

branch	trigger	test	two-sided?
① `validate_flags?`	`--tty` with non-TTY stdin	`test_run_tty_error_in_pipe` (existing)	n/a — fails before any box is created, structurally cannot leak
② `create_box?`	image pull failure	`test_run_image_pull_failure_does_not_leak_shim`	n/a — pull fails before any shim is spawned, structurally cannot leak
③ `litebox.exec?`	invoking `/etc` (a directory)	`test_run_exec_setup_failure_does_not_leak_shim`	yes — see below
④ `detach return Ok(0)`	`-d` flag	`test_run_detach` (existing, manually rm's the box)	n/a — keeping the box alive is the intended behavior
⑤ `streamer.start?`	signal-handler init failure	none — effectively unreachable in any real CLI invocation; covered by `PerTestBoxHome::Drop`'s implicit guard + Rust's stack-unwinding guarantee
⑥ non-zero exit	command exits non-zero	tables above	yes
⑦ `Ok(to_shell_exit_code(0))`	command exits 0	`test_run_exit_code_success` + 30+ commands as side-effect	implicit
panic	internal invariant violated	none — guaranteed by Rust: panic unwinds the stack, Drop runs; tests would exercise the panic, not its cleanup
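The panic row relies on unwinding running destructors. A minimal sketch of that guarantee, assuming the default `panic = "unwind"` strategy (with `panic = "abort"` no destructor would run); `BoxGuard` and `TORN_DOWN` are illustrative stand-ins, not types from the boxlite codebase:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

// Records whether the guard's destructor ran.
static TORN_DOWN: AtomicBool = AtomicBool::new(false);

/// Stand-in for a value whose Drop tears the box down.
struct BoxGuard;

impl Drop for BoxGuard {
    fn drop(&mut self) {
        TORN_DOWN.store(true, Ordering::SeqCst);
    }
}

fn run_and_panic() {
    let _guard = BoxGuard;
    panic!("internal invariant violated");
}

/// Returns true when the panic was observed and the guard was still dropped.
fn teardown_ran_after_panic() -> bool {
    TORN_DOWN.store(false, Ordering::SeqCst);
    let result = panic::catch_unwind(run_and_panic);
    result.is_err() && TORN_DOWN.load(Ordering::SeqCst)
}

fn main() {
    println!("teardown ran after panic: {}", teardown_ran_after_panic());
}
```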

Two-sided verification for branch ③ (the only realistic injection point on a path where a shim is actually alive at the failure moment):

step	code state	result
A	RAII fix applied	`test_run_exec_setup_failure_does_not_leak_shim ... ok` (4.44s)
B	`litebox.exec().await?` replaced with `match { Err => std::process::exit(1) }` to inject the Drop-bypass pattern onto this branch	FAILED — `exec-setup failure left live shim PID(s): [395393]`
C	injection reverted	`test_run_exec_setup_failure_does_not_leak_shim ... ok` (5.22s)

Why this works

Pre-fix, a non-zero command exit took the std::process::exit shortcut that bypasses the box teardown the success path runs on return, leaking the microVM's shim (the source of #604's "orphan shims in /tmp"). Post-fix returns the exit code as Result<i32> and lets main return ExitCode; the runtime drops on every return path and RuntimeImpl::Drop → shutdown_sync SIGTERMs the shim — the same teardown the success path already relied on.
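The difference between the two shapes can be sketched in a few lines. This is a simplified model rather than the real `run.rs`: `RuntimeSketch`, `SHIMS_REAPED`, and the hard-coded PID are hypothetical, and plain `String` errors stand in for whatever error type the CLI uses.

```rust
use std::process::ExitCode;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times the stand-in runtime reaped its shim.
static SHIMS_REAPED: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the runtime whose Drop shuts the shim down.
struct RuntimeSketch {
    shim_pid: u32,
}

impl Drop for RuntimeSketch {
    fn drop(&mut self) {
        eprintln!("reaping shim {}", self.shim_pid);
        SHIMS_REAPED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Post-fix shape: hand the exit code back to the caller.
fn run(command_exit_code: i32) -> Result<i32, String> {
    let _runtime = RuntimeSketch { shim_pid: 4242 };
    // Pre-fix shape (do not do this): calling
    // std::process::exit(command_exit_code) here would end the process
    // without ever running RuntimeSketch::drop.
    Ok(command_exit_code)
} // _runtime is dropped here on every return path

fn main() -> ExitCode {
    // Simulated command exit code comes from the first argument; defaults to 0.
    let simulated = std::env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(0);
    match run(simulated) {
        Ok(code) => ExitCode::from(code as u8),
        Err(err) => {
            eprintln!("error: {err}");
            ExitCode::FAILURE
        }
    }
}
```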

make fmt:check + cargo clippy -- -D warnings clean; pre-push CLI matrix 277 tests run: 277 passed, 0 skipped (~83s).

Summary by CodeRabbit

Bug Fixes
- Exit codes from executed commands are now properly propagated and returned by the CLI instead of being discarded.
- Improved shutdown and cleanup logic to ensure orphaned processes are not left behind after command execution.
Tests
- Added comprehensive test coverage to verify proper process cleanup across various scenarios, including non-zero exits and failure paths.

`boxlite run` propagated a non-zero command exit via std::process::exit, which skips Drop and the box's async auto-stop/auto-remove, leaking the box's shim as a live host process. The success path tears the box down via normal teardown when run() returns, but the non-zero path never reached it.

Explicitly stop the box (kills the shim; removes it when --rm) before std::process::exit.

Fixes the abnormal-exit run integration tests (exit_code_125/custom, signal_exit_code_{sigint,sigkill,sigterm}, python_error_handling) that tripped PerTestBoxHome's live-shim guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DorianZheng · 2026-06-01T07:27:03Z

        let exit_code = streamer.start().await?;
        // Exit with box's exit code
        if exit_code != 0 {
+            // Tear the box down before hard-exiting. std::process::exit skips


Try to design RAII around this so that execution and litebox are dropped automatically in any branch.

changed to RAII mode, also wrote test for each branch to confirm it, thx

DorianZheng · 2026-06-01T07:27:34Z

+            // Drop and the async auto-stop/auto-remove, so a non-zero command
+            // exit would otherwise leak the box's shim as a live process (the
+            // success path stops the box via normal teardown when run returns).
+            drop(execution);


write a test case to cover this issue

test_run_rm_non_zero_exit_does_not_leak_shim added for this, thx

…ess::exit

Address PR boxlite-ai#622 review (DorianZheng): redesign so execution/litebox/the owning BoxliteRuntime drop on every return path instead of relying on a manual stop call before std::process::exit. process::exit bypasses Drop entirely, which is exactly what leaked the box's shim on the non-zero path; the only true RAII fix is to never call it mid-command.

run::execute and exec::execute now return Result<i32> (the shell exit code), main returns ExitCode, and run_cli funnels every command through the same dispatcher. When run_cli returns, BoxliteRuntime drops, and RuntimeImpl::Drop -> shutdown_sync() reaps the shim - the same teardown the success path already relied on.

Adds an explicit reproducer: test_run_rm_non_zero_exit_does_not_leak_shim scans <home>/boxes/*/shim.pid in the test body (not just via PerTestBoxHome::Drop), so the assertion is visible at the call site. Two-side verified: pre-fix simulation fails with "live shim PID(s): [..]", post-fix passes; exposes test_utils::home::live_shim_pids as pub for that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… wrapper

PR boxlite-ai#622 review follow-up: address the aesthetic drift introduced by the RAII fix (per-branch .map(|_| 0) noise and the ExitCode::from(... as u8) cast in main).

- Every command's `pub async fn execute` (and `auth::run`) now returns `anyhow::Result<i32>`. Unit-success commands `Ok(0)` at the end; run/exec pass through the box's mapped shell exit code unchanged. Dispatcher in `run_cli` no longer needs `.map(|_| 0)` adapters.
- `main` is back to `fn main()`. The tokio runtime is dropped explicitly before `process::exit(code)` so the BoxliteRuntime Drop chain (RuntimeImpl::Drop -> shutdown_sync) has already finished by then; the hazard called out in boxlite-ai#622 was `process::exit` *mid-command*, not in `main` after every stack frame has unwound.

Verified: cargo check, 118 CLI unit tests, clippy -D warnings, fmt:check, and the focused leak repro (test_run_rm_non_zero_exit_does_not_leak_shim, ok 8.57s) all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
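The ordering this commit relies on (finish the command, drop the runtime, only then hard-exit) can be sketched without tokio. `RuntimeSketch`, `EVENTS`, and `run_and_teardown` are illustrative names, and the real `main.rs` may be structured differently:

```rust
use std::sync::Mutex;

// Ordered log of what happened, so the teardown order can be inspected.
static EVENTS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

/// Stand-in for the async runtime that owns the Drop chain.
struct RuntimeSketch;

impl Drop for RuntimeSketch {
    fn drop(&mut self) {
        EVENTS.lock().unwrap().push("runtime dropped");
    }
}

/// Stand-in for the command dispatcher; returns the shell exit code.
fn run_cli(_rt: &RuntimeSketch, command_exit_code: i32) -> i32 {
    EVENTS.lock().unwrap().push("command finished");
    command_exit_code
}

/// Everything that must complete before the process may hard-exit.
fn run_and_teardown(command_exit_code: i32) -> i32 {
    let rt = RuntimeSketch;
    let code = run_cli(&rt, command_exit_code);
    drop(rt); // teardown finishes here, before any process::exit
    EVENTS.lock().unwrap().push("ready to exit");
    code
}

fn main() {
    // Simulated command exit code comes from the first argument; defaults to 0.
    let simulated = std::env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(0);
    let code = run_and_teardown(simulated);
    // Safe at this point: nothing with a Drop impl is still alive.
    std::process::exit(code);
}
```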

Walk back the scope-creep portion of 69df038. The shim-leak RAII fix only needs run.rs / exec.rs / main.rs; making the 15 unit-success commands also return Result<i32> was purely dispatcher cosmetics, and it had a type-honesty cost: a command like `boxlite cp` that has no exit-code concept ended up advertising one (always 0).

This commit:

- Restores `anyhow::Result<()>` + `Ok(())` on auth/cp/create/images/info/inspect/list/logs/pull/restart/rm/serve/start/stats/stop.
- Puts the 15 `.map(|_| 0)` adapters back in `run_cli`'s dispatcher, collocated so the asymmetry is visible at one site (`run`/`exec` real; others adapted).
- Keeps main.rs's `fn main() { ... drop(rt); process::exit(code); }` simplification: that part isn't scope creep, it's the RAII fix.

cargo check, 118 CLI unit tests, clippy -D warnings, fmt:check, test_run_rm_non_zero_exit_does_not_leak_shim (ok 8.22s) all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
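A reduced sketch of the dispatcher asymmetry described in this commit: `run` carries a real exit code while unit-success commands are adapted with `.map(|_| 0)`. The `Command` enum, the three command functions, and the `String` error type (standing in for `anyhow::Error`) are assumptions for illustration only:

```rust
/// Illustrative subset of subcommands.
enum Command {
    Run { exit_code: i32 },
    Cp,
    List,
}

// `run` has a genuine exit-code concept: the command's shell exit code.
fn run(exit_code: i32) -> Result<i32, String> {
    Ok(exit_code)
}

// Unit-success commands either work or fail; they advertise no exit code.
fn cp() -> Result<(), String> {
    Ok(())
}

fn list() -> Result<(), String> {
    Err("runtime unavailable".to_string())
}

/// All adapters sit at one site, so the asymmetry is visible here.
fn dispatch(cmd: Command) -> Result<i32, String> {
    match cmd {
        Command::Run { exit_code } => run(exit_code), // real exit code
        Command::Cp => cp().map(|_| 0),               // adapted
        Command::List => list().map(|_| 0),           // adapted
    }
}

/// Errors are printed and mapped to exit code 1.
fn run_cli(cmd: Command) -> i32 {
    match dispatch(cmd) {
        Ok(code) => code,
        Err(err) => {
            eprintln!("Error: {err}");
            1
        }
    }
}

fn main() {
    println!("cp exits with {}", run_cli(Command::Cp));
}
```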

…rn paths

The focused boxlite-ai#622 reproducer (test_run_rm_non_zero_exit_does_not_leak_shim) only exercises the std::process::exit branch that actually leaked. Add two companion tests that pin the same RAII invariant onto the two other CLI-reachable early-return points in BoxRunner::run:

- test_run_image_pull_failure_does_not_leak_shim (branch ②): create_box? fails on a non-existent image; no shim ever spawns, but partial-VM state must drop cleanly.
- test_run_exec_setup_failure_does_not_leak_shim (branch ③): litebox.exec? fails (invoking a directory) after the box is fully running; Drop has to reap the live shim.

Both pass under today's RAII fix (4.4 s each in parallel). These branches are not affected by the original boxlite-ai#622 bug (they use `?` rather than process::exit, so Drop always ran); the value is forward-looking: if anyone introduces a Drop-bypass shortcut on these paths later, the tests fail.

The remaining two early-exits in BoxRunner::run, streamer.start? (signal-handler init, effectively unreachable in normal CLI invocation) and a panic mid-`run()`, are left to Rust's stack-unwinding guarantee plus PerTestBoxHome::Drop's implicit guard, documented inline so the choice is visible. No mock-injection tests for those.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
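Why the `?` branches were never affected can be shown with a small model of branch ③: an early return through `?` still drops every local that is in scope. `LiteBoxSketch`, `exec`, and the directory check are hypothetical simplifications, not the real `litebox.exec` API:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Records whether the stand-in box reaped its shim.
static SHIM_REAPED: AtomicBool = AtomicBool::new(false);

/// Stand-in for a running box whose Drop reaps the shim.
struct LiteBoxSketch;

impl Drop for LiteBoxSketch {
    fn drop(&mut self) {
        SHIM_REAPED.store(true, Ordering::SeqCst);
    }
}

/// Simplified exec setup: invoking a directory fails, as in branch ③.
fn exec(_litebox: &LiteBoxSketch, program: &str) -> Result<i32, String> {
    if program == "/etc" {
        Err(format!("cannot execute {program}: is a directory"))
    } else {
        Ok(0)
    }
}

fn run(program: &str) -> Result<i32, String> {
    let litebox = LiteBoxSketch; // the box is running, the shim is alive
    let code = exec(&litebox, program)?; // early return on Err
    Ok(code)
} // litebox is dropped on both the Ok and the Err path

fn main() {
    match run("/etc") {
        Ok(code) => println!("exit code {code}"),
        Err(err) => eprintln!("Error: {err}"),
    }
}
```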

DorianZheng

LGTM

coderabbitai · 2026-06-08T04:28:15Z

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 2337d6b and 0a9093a.

📒 Files selected for processing (5)

src/cli/src/commands/exec.rs
src/cli/src/commands/run.rs
src/cli/src/main.rs
src/cli/tests/run.rs
src/test-utils/src/home.rs

📝 Walkthrough

The PR refactors CLI commands to return exit codes instead of calling process::exit directly, enabling proper RAII cleanup of runtime and shim resources on all return paths. exec and run commands now return anyhow::Result<i32>, detach mode returns Ok(0), and non-detach paths convert exit codes via to_shell_exit_code. main.rs orchestrates the final process exit after runtime drop. Regression tests verify no shim leaks on non-zero and error returns.

Changes

CLI Exit Code Propagation and RAII Cleanup

Layer / File(s)	Summary
Command signature updates `src/cli/src/commands/exec.rs`, `src/cli/src/commands/run.rs`	`execute` and `BoxRunner::run` signatures changed to return `anyhow::Result<i32>` with updated documentation explaining RAII-safe cleanup instead of direct process exit.
exec command implementation `src/cli/src/commands/exec.rs`	Detach mode returns `Ok(0)`; non-detach path converts exit code via `to_shell_exit_code()` and returns it instead of calling `process::exit()`.
run command implementation `src/cli/src/commands/run.rs`	Detach mode returns `Ok(0)`; non-detach path converts exit code via `to_shell_exit_code()` and returns it, enabling runtime drop for shim cleanup on all paths.
Main CLI orchestration and routing `src/cli/src/main.rs`	`main` captures `i32` exit code from `run_cli`, explicitly drops Tokio runtime, and calls `process::exit()`. `run_cli` maps subcommands to exit codes and returns `i32`; error path prints anyhow chain and returns exit code 1.
Regression tests and shim leak detection `src/test-utils/src/home.rs`, `src/cli/tests/run.rs`	`live_shim_pids` utility made public with expanded documentation. Three new tests verify no shim leaks on non-zero exit, image pull failure, and exec setup failure paths.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main fix: preventing resource leaks by ensuring proper cleanup (stopping the box) before process exit, which is the core objective throughout the PR.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

G4614 marked this pull request as ready for review May 29, 2026 06:20

G4614 force-pushed the fix/cli-run-rm-abnormal-exit branch from 94daf72 to 02df899 Compare June 1, 2026 04:22

DorianZheng reviewed Jun 1, 2026

G4614 marked this pull request as draft June 1, 2026 08:04

G4614 and others added 4 commits June 1, 2026 08:52

G4614 marked this pull request as ready for review June 1, 2026 12:22

DorianZheng approved these changes Jun 8, 2026

Merge branch 'main' into fix/cli-run-rm-abnormal-exit

0a9093a

DorianZheng merged commit 92c7308 into boxlite-ai:main Jun 8, 2026
23 of 24 checks passed

coderabbitai Bot mentioned this pull request Jun 10, 2026

test(e2e): add cli-detach-recovery, exec-attach, volume-readonly cases #710

Merged

DorianZheng merged 6 commits into boxlite-ai:main from G4614:fix/cli-run-rm-abnormal-exit
