fix(go-sdk): fold stream drain into Execution.Wait (os/exec io.Writer case)#563
Conversation
boxlite-ai#563 fixed the wait/exit terminal events racing the stdout/stderr pumps but shipped no regression test — the existing exec unit tests pass with or without the drain barrier, so a revert would go unnoticed in CI. Add two guards that pin the racy mid-state (a stream pump registered but not finished, pending=1) and assert the terminal task is gated on the drain, rather than reproducing the timing race: - exit_event_waits_for_pending_stdout_drain: drives the real exit_pump (empty execution → wait errors, but drain+push runs); asserts Exit is not queued while pending>0, then that stdout precedes Exit after finish_stream. - wait_event_waits_for_pending_stdout_drain: same for the inline wait task via boxlite_execution_wait (sync test — empty_handle owns a Runtime that can't be dropped in an async context; drives async via the handle's block_on). Verified each flips red when its await_streams_drained line (854 / 525) is removed, and green when restored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9007541 to
e647bf3
Compare
boxlite-ai#563 fixed the wait/exit terminal events racing the stdout/stderr pumps but shipped no regression test — the existing exec unit tests pass with or without the drain barrier, so a revert would go unnoticed in CI. Add two guards that pin the racy mid-state (a stream pump registered but not finished, pending=1) and assert the terminal task is gated on the drain, rather than reproducing the timing race: - exit_event_waits_for_pending_stdout_drain: drives the real exit_pump (empty execution → wait errors, but drain+push runs); asserts Exit is not queued while pending>0, then that stdout precedes Exit after finish_stream. - wait_event_waits_for_pending_stdout_drain: same for the inline wait task via boxlite_execution_wait (sync test — empty_handle owns a Runtime that can't be dropped in an async context; drives async via the handle's block_on). Verified each flips red when its await_streams_drained line (854 / 525) is removed, and green when restored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e647bf3 to
63f4adf
Compare
A fast command over the cloud (Go runner) could return exit 0 with empty
stdout. When the WS attach drops before the runner flushes output (a proxy /
high-latency cut), attach_ws_pump probed GET /executions/{id}, saw
"completed", and emitted the exit code WITHOUT re-attaching — dropping the
stdout still sitting in the runner's replay backlog. Local `boxlite serve`
masked it (low latency, the exit frame always arrives in-band).
On a disconnect-then-Terminal probe, re-attach once (immediately, no backoff)
so the runner replays its backlog and we leave via its authoritative exit
frame. Bounded to a single attempt with fallback to the probed exit code, so
a runner that never sends a closing exit frame can't hang the pump.
Independent of boxlite-ai#563 (an sdks/c FFI drain fix): this is the REST client path,
so it reproduced on both main and boxlite-ai#563.
Reproducer: ws_terminal_probe_after_cut_must_not_drop_buffered_stdout drives
the WS pump against a mock that cuts the first attach, reports completed/0,
and serves the backlog only on re-attach. Without the fix it observes empty
stdout with a clean exit; with the fix it drains "hello\n".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
boxlite-ai#563 fixed the wait/exit terminal events racing the stdout/stderr pumps but shipped no regression test — the existing exec unit tests pass with or without the drain barrier, so a revert would go unnoticed in CI. Add two guards that pin the racy mid-state (a stream pump registered but not finished, pending=1) and assert the terminal task is gated on the drain, rather than reproducing the timing race: - exit_event_waits_for_pending_stdout_drain: drives the real exit_pump (empty execution → wait errors, but drain+push runs); asserts Exit is not queued while pending>0, then that stdout precedes Exit after finish_stream. - wait_event_waits_for_pending_stdout_drain: same for the inline wait task via boxlite_execution_wait (sync test — empty_handle owns a Runtime that can't be dropped in an async context; drives async via the handle's block_on). Verified each flips red when its await_streams_drained line (854 / 525) is removed, and green when restored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
63f4adf to
8a22353
Compare
A fast command over the cloud (Go runner) could return exit 0 with empty
stdout. When the WS attach drops before the runner flushes output (a proxy /
high-latency cut), attach_ws_pump probed GET /executions/{id}, saw
"completed", and emitted the exit code WITHOUT re-attaching — dropping the
stdout still sitting in the runner's replay backlog. Local `boxlite serve`
masked it (low latency, the exit frame always arrives in-band).
On a disconnect-then-Terminal probe, re-attach once (immediately, no backoff)
so the runner replays its backlog and we leave via its authoritative exit
frame. Bounded to a single attempt with fallback to the probed exit code, so
a runner that never sends a closing exit frame can't hang the pump.
Independent of boxlite-ai#563 (an sdks/c FFI drain fix): this is the REST client path,
so it reproduced on both main and boxlite-ai#563.
Reproducer: ws_terminal_probe_after_cut_must_not_drop_buffered_stdout drives
the WS pump against a mock that cuts the first attach, reports completed/0,
and serves the backlog only on re-attach. Without the fix it observes empty
stdout with a clean exit; with the fix it drains "hello\n".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts 9 existing make test:integration test files (Rust + Python) into
the e2e suite where they exercise the real production chain
(SDK→API→Runner→VM) instead of the local PyO3/FFI path.
Existing make test:integration uses Boxlite::default() / Boxlite.default(),
which never touches NestJS API or boxlite-runner. Bugs in the proxy
controller, REST wire format, or runner CGO won't break those tests
even though they're broken in production. These ports run the same
behavioural contracts through REST so the regression coverage tracks
the actual production deployment.
9 new files in scripts/test/e2e/cases/:
test_lifecycle.py — create / list / get_info / remove CRUD
(from src/boxlite/tests/lifecycle.rs)
test_exec_options.py — cwd, env, tty exit-code surface
(from src/boxlite/tests/exec_options.rs)
test_execution_shutdown.py — wait resolves after box.stop();
exec on stopped box must NOT 5xx
(from src/boxlite/tests/execution_shutdown.rs)
test_shutdown.py — close() idempotent; second REST runtime
shares the same world view
(from src/boxlite/tests/shutdown.rs)
test_exec_unit_shape.py — round-trip exec result shape + types
(from sdks/python/tests/test_exec.py)
test_exec_timeout.py — timeout kills a sleeping process and a
SIGTERM-ignoring process
(from sdks/python/tests/test_exec_timeout_sigalrm.py)
test_box_management.py — named box, list_info round-trip, env-var
propagation client→guest
(from sdks/python/tests/test_box_management.py)
test_resize_tty.py — resize_tty(rows, cols) on TTY exec; refuse
on non-TTY
(from sdks/python/tests/test_resize_tty.py)
test_errors.py — unknown image surfaces typed error; bad
bearer token → 401/403, not 500
(from sdks/python/tests/test_errors.py)
Plus ubuntu:24.04 added to fixture_setup.py's snapshot registration
(some Rust cases use it).
Verified locally on alpine:3.23 + ubuntu:22.04 + ubuntu:24.04 with a
boxlite-ai#563-rebuilt runner: 24 PASS / 1 FAIL. The single failure
(test_exec_on_stopped_box_is_typed_error) is a real production bug —
the API surfaces "Handle invalidated after stop()" with HTTP 500 +
code=11 instead of mapping it to a typed 4xx. Test stays failing
until that mapping fix lands; flipping to xfail would defeat the
point of the regression coverage.
The previous bootstrap.sh curl'd boxlite-runner-vX.X.X-linux-amd64.tar.gz from GitHub Releases, pinning the runner to whatever was last cut. That meant the e2e suite was permanently testing stale code: - PRs that touch the runner (e.g. boxlite-ai#563 sdks/c/ FFI drain, or the boxlite_exec.go:77 mapping fix this suite catches) would never surface in CI — the release binary still has the old behaviour - test:e2e:two-sided was the only way to actually validate a PR, and it relied on side-channel build steps Now bootstrap.sh: 1. Installs rustup + Go if missing (skips otherwise) 2. cargo build --release -p boxlite-c (produces libboxlite.a) 3. cd apps/runner && CGO_ENABLED=1 go build → /usr/local/bin/boxlite-runner cargo + go both do incremental builds, so re-running bootstrap on a clean working tree finishes in seconds; the cold first build is ~5 min (the Rust release build dominates). The CI workflow no longer needs the separate "Rebuild boxlite-runner binary from this checkout" step — bootstrap covers it. Left a restart step in so a no-op bootstrap still picks up other env-file drift. Side effect of this change: the e2e suite against main HEAD now FAILs 7 more tests (the boxlite-ai#563 stdout race surfaces on every exec smoke case, not just test_p0_6). This is the correct signal — main currently ships the race, and the e2e suite is now honest about it. Merging boxlite-ai#563 flips all 7 (plus the original test_p0_6) green.
…guns Addresses 11 review points on PR boxlite-ai#678: 1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens), persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned). Read back on every re-bootstrap so DB-encrypted data and the existing admin user keep working. 2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0. 3. `yarn install` no longer swallows stderr — silent install failures were the reviewer's "挂了不知道为啥" pain. 4. Runner health check is now a real wait loop (`pgrep` of the binary path + ss listening on :8080) instead of `sleep 3`. API already poll /api/health. 5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin key. Catches encryption-key mismatch, broken migrations, stale admin user / new-token mismatch BEFORE the e2e suite runs (with a helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch). 7. tslib + node-forge are now real `apps/package.json` deps (committed here). Bootstrap no longer yarn-adds them, so it no longer dirties the working tree mid-run. 9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves secrets via the secrets file). Previous behaviour only wrote when missing, so PRs that added env vars never landed on existing hosts. 10. New teardown.sh — basic / --wipe-data / --full modes that revert the bootstrap state cleanly. 11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr instead of being silent. 12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented in-line — Google is a fetchable .well-known endpoint; e2e auth goes through the admin API key path and never touches Google. 13. REPO autodetects from the script's own location (no more $HOME/ws/boxlite default that only worked on one machine). Two reviewer concerns were left as-is with comments: 6. apps/apps self-symlink — the real fix is to rewrite the `apps/api/...` paths in every `project.json` to `api/...`. That's an apps-workspace refactor PR, not e2e infra. 8. API runs via npx ts-node, not webpack bundle — production parity diverges only on TypeORM entity discovery (the cron `databaseName undefined` warning), which doesn't intersect any code path e2e exercises. Webpack bundle would add 5–10 min per PR. Side change to fixture_setup.py: now also writes the `default` profile to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests hit the default profile when running `boxlite auth whoami` etc, so a re-bootstrap with a new admin key needs to update both or CLI tests 401. Verified end-to-end: teardown --wipe-data → fresh bootstrap → fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug — both real production bugs the suite is specifically here to catch).
Adds `scripts/test/e2e/` — a regression suite that exercises the **SDK → API → boxlite-runner → libkrun VM** chain, complementing `make test:integration` which runs only against the local PyO3/FFI path and therefore can't catch bugs that live in the REST proxy, runner CGO, or API error mapping. ## Test plan | observed | pre (without this PR) | post | |---|---|---| | coverage of REST → API → runner → VM chain | none — `make test:integration:*` constructs `Boxlite.default()` (local FFI), never opens an HTTP connection, never reaches the NestJS API or `boxlite-runner` | `make test:e2e` runs 25 pytest cases, every box created via `Boxlite.rest(opts)` against `http://localhost:3000/api` | | #563 (exec-stdout drop) detection | regression invisible to CI; `make test:integration:python` green even on stock 0.9.5 runner | `test_p0_6_exec_stdout_race` deterministically FAILS at ≥90% loss against stock runner, PASS at 0% against a runner rebuilt with #563 | | path-bypass detection | tests could silently fall back to local FFI and still pass | autouse fixture `verify_runner_saw_all_boxes` snapshots `journalctl -u boxlite-runner` per-test and asserts every box id created via `rt.create` appears in the runner journal; sanity-verified by injecting a fake box id which fires the teardown assertion with a clear path-bypass message | | bootstrap surface | not present; e2e wasn't ramped on | `scripts/test/e2e/bootstrap.sh` is idempotent and self-contained — installs postgres, redis, docker registry, Node.js 22 (NodeSource), yarn (corepack), python3-pip, the boxlite-runner release binary, systemd units, and writes `/etc/boxlite-api.env`. No AWS credentials, no S3 bucket, no IAM role required — `S3_ENDPOINT=""` makes `VolumeManager.constructor` early-return at `apps/api/src/sandbox/managers/volume.manager.ts:47` | | local-dev ergonomics | tests scattered across personal repos | `make test:e2e:setup` (bootstrap + fixture data) and `make test:e2e` (pytest); `make test:e2e:two-sided` takes `PR_REF=<branch>` and verifies a candidate fix removes the bug | | CI surface | none | `.github/workflows/e2e-stack.yml` triggers on push/PR touching `sdks/c/src/exec/**`, `src/boxlite/src/rest/**`, `apps/runner/**`, `apps/api/src/**`; needs a self-hosted runner labeled `kvm` because GitHub-hosted runners have no `/dev/kvm` | ### Cases shipped | group | files | source migrated from | |---|---|---| | meta path verification | `test_path_verification.py` (2 cases) | new — proves SDK is REST-mode against `:3000` and the runner journal saw the box id | | #563 regression | `test_p0_6_exec_stdout_race.py` (1 case, 5 box × 2 exec rounds) | new — race reproducer | | `make test:integration:*` ports | 9 files, 22 cases total | `src/boxlite/tests/{lifecycle,exec_options,execution_shutdown,shutdown}.rs` + `sdks/python/tests/{test_exec,test_exec_timeout_sigalrm,test_box_management,test_resize_tty,test_errors}.py` — all rewritten to use `Boxlite.rest()` | ### Local results (this branch checked out clean against current main) | runner build | result | |---|---| | stock 0.9.5 release | 23 PASS / 2 FAIL — `test_p0_6_exec_stdout_race` (race) + `test_exec_on_stopped_box_is_typed_error` (real API/runner mapping bug, see body below) | | rebuilt with #563 cherry-picked | 24 PASS / 1 FAIL — only the mapping bug remains | ### `test_exec_on_stopped_box_is_typed_error` — the lone FAIL is real This case asserts that POST `/exec` on a stopped box returns a 4xx-class status, not 5xx. The current runner code at `apps/runner/pkg/api/controllers/boxlite_exec.go:77` writes `http.StatusInternalServerError` unconditionally when `execManager.Start` fails: ```go ctx.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("exec failed: %s", err)}) ``` Every other handler in the same file (signal / resize / kill / status) routes through `classifyExecError(err)` instead, which already maps `ErrExecClosed` / `ErrExecReaping` to 409. `POST /exec` was missed, and `classifyExecError` itself never gained a case for `boxlite.ErrStopped`. Rust side `src/shared/src/errors.rs:165` already maps `BoxliteError::Stopped` to 409, so the bug is purely the Go runner not propagating that. This test stays failing until that two-line fix lands; flipping it to `xfail` would defeat the regression point. ### Branch state | | value | |---|---| | base | `main` (a438d6b — origin/main HEAD) | | commits | 6 (all `test(e2e):*`, no production code touched) | | files | 13 new + 2 small additions (`make/test.mk`, `make/help.mk`) | | net lines | +1170 |
Existing make test:integration:* uses the local PyO3/FFI path (`Boxlite.default()`) and bypasses both the NestJS API and boxlite-runner. Bugs that surface only on the REST → API → runner chain — e.g. boxlite-ai#563's exec-stdout drop, boxlite-ai#627's attach re-drain — leak into production with the unit + integration suites all green. This suite always uses `Boxlite.rest(opts)` against the local API and proves the path on every run. Layout (scripts/test/e2e/): bootstrap.sh Install PG, Redis, registry, mountpoint-s3, runner binary, systemd units. Idempotent. fixture_setup.py Register alpine:3.23 + ubuntu:22.04 snapshots (waits for active), patch admin org quota, write [profiles.p1] in ~/.boxlite/credentials.toml. Idempotent. cases/conftest.py rt / image / box fixtures — REST-only. cases/test_path_verification.py Meta-test: asserts the runtime is REST-mode against :3000, and one round-trip exec leaves the runner journal with the box id. Catches the case where SDK silently degrades to local FFI or talks to the wrong endpoint. cases/test_p0_6_exec_stdout_race.py First real regression case — runs N boxes × 2 execs and asserts loss rate ≤ 5%. Against stock 0.9.5 runner: ~80–90% loss (FAIL). Against a runner rebuilt with boxlite-ai#563: 0% loss (PASS). lib/path_verification.py Helpers used by the meta-test. two_sided.sh Builds runner from MAIN_REF, runs pytest (expect FAIL=1), builds from PR_REF, runs pytest (expect PASS=0). Only verdict 1+0 is "PR removes the bug + test detects it". Restores original binary on any exit. Make targets (make/test.mk): make test:e2e:setup Bootstrap services + fixture data. make test:e2e Run the suite. make test:e2e:two-sided PR_REF=<branch> required. CI workflow (.github/workflows/e2e-stack.yml): Triggers on push/PR touching sdks/c/src/exec/, src/boxlite/src/rest/, apps/runner/, apps/api/src/. Requires a self-hosted runner labeled `kvm` because GitHub-hosted runners lack /dev/kvm and cannot run libkrun. Verified locally on alpine:3.23, py3.14, against both 0.9.5 stock runner (1 FAIL of 3) and a runner rebuilt from boxlite-ai#563 (3 PASS of 3).
Lifts 9 existing make test:integration test files (Rust + Python) into
the e2e suite where they exercise the real production chain
(SDK→API→Runner→VM) instead of the local PyO3/FFI path.
Existing make test:integration uses Boxlite::default() / Boxlite.default(),
which never touches NestJS API or boxlite-runner. Bugs in the proxy
controller, REST wire format, or runner CGO won't break those tests
even though they're broken in production. These ports run the same
behavioural contracts through REST so the regression coverage tracks
the actual production deployment.
9 new files in scripts/test/e2e/cases/:
test_lifecycle.py — create / list / get_info / remove CRUD
(from src/boxlite/tests/lifecycle.rs)
test_exec_options.py — cwd, env, tty exit-code surface
(from src/boxlite/tests/exec_options.rs)
test_execution_shutdown.py — wait resolves after box.stop();
exec on stopped box must NOT 5xx
(from src/boxlite/tests/execution_shutdown.rs)
test_shutdown.py — close() idempotent; second REST runtime
shares the same world view
(from src/boxlite/tests/shutdown.rs)
test_exec_unit_shape.py — round-trip exec result shape + types
(from sdks/python/tests/test_exec.py)
test_exec_timeout.py — timeout kills a sleeping process and a
SIGTERM-ignoring process
(from sdks/python/tests/test_exec_timeout_sigalrm.py)
test_box_management.py — named box, list_info round-trip, env-var
propagation client→guest
(from sdks/python/tests/test_box_management.py)
test_resize_tty.py — resize_tty(rows, cols) on TTY exec; refuse
on non-TTY
(from sdks/python/tests/test_resize_tty.py)
test_errors.py — unknown image surfaces typed error; bad
bearer token → 401/403, not 500
(from sdks/python/tests/test_errors.py)
Plus ubuntu:24.04 added to fixture_setup.py's snapshot registration
(some Rust cases use it).
Verified locally on alpine:3.23 + ubuntu:22.04 + ubuntu:24.04 with a
boxlite-ai#563-rebuilt runner: 24 PASS / 1 FAIL. The single failure
(test_exec_on_stopped_box_is_typed_error) is a real production bug —
the API surfaces "Handle invalidated after stop()" with HTTP 500 +
code=11 instead of mapping it to a typed 4xx. Test stays failing
until that mapping fix lands; flipping to xfail would defeat the
point of the regression coverage.
The previous bootstrap.sh curl'd boxlite-runner-vX.X.X-linux-amd64.tar.gz from GitHub Releases, pinning the runner to whatever was last cut. That meant the e2e suite was permanently testing stale code: - PRs that touch the runner (e.g. boxlite-ai#563 sdks/c/ FFI drain, or the boxlite_exec.go:77 mapping fix this suite catches) would never surface in CI — the release binary still has the old behaviour - test:e2e:two-sided was the only way to actually validate a PR, and it relied on side-channel build steps Now bootstrap.sh: 1. Installs rustup + Go if missing (skips otherwise) 2. cargo build --release -p boxlite-c (produces libboxlite.a) 3. cd apps/runner && CGO_ENABLED=1 go build → /usr/local/bin/boxlite-runner cargo + go both do incremental builds, so re-running bootstrap on a clean working tree finishes in seconds; the cold first build is ~5 min (the Rust release build dominates). The CI workflow no longer needs the separate "Rebuild boxlite-runner binary from this checkout" step — bootstrap covers it. Left a restart step in so a no-op bootstrap still picks up other env-file drift. Side effect of this change: the e2e suite against main HEAD now FAILs 7 more tests (the boxlite-ai#563 stdout race surfaces on every exec smoke case, not just test_p0_6). This is the correct signal — main currently ships the race, and the e2e suite is now honest about it. Merging boxlite-ai#563 flips all 7 (plus the original test_p0_6) green.
…guns Addresses 11 review points on PR boxlite-ai#678: 1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens), persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned). Read back on every re-bootstrap so DB-encrypted data and the existing admin user keep working. 2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0. 3. `yarn install` no longer swallows stderr — silent install failures were the reviewer's "挂了不知道为啥" pain. 4. Runner health check is now a real wait loop (`pgrep` of the binary path + ss listening on :8080) instead of `sleep 3`. API already poll /api/health. 5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin key. Catches encryption-key mismatch, broken migrations, stale admin user / new-token mismatch BEFORE the e2e suite runs (with a helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch). 7. tslib + node-forge are now real `apps/package.json` deps (committed here). Bootstrap no longer yarn-adds them, so it no longer dirties the working tree mid-run. 9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves secrets via the secrets file). Previous behaviour only wrote when missing, so PRs that added env vars never landed on existing hosts. 10. New teardown.sh — basic / --wipe-data / --full modes that revert the bootstrap state cleanly. 11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr instead of being silent. 12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented in-line — Google is a fetchable .well-known endpoint; e2e auth goes through the admin API key path and never touches Google. 13. REPO autodetects from the script's own location (no more $HOME/ws/boxlite default that only worked on one machine). Two reviewer concerns were left as-is with comments: 6. apps/apps self-symlink — the real fix is to rewrite the `apps/api/...` paths in every `project.json` to `api/...`. That's an apps-workspace refactor PR, not e2e infra. 8. API runs via npx ts-node, not webpack bundle — production parity diverges only on TypeORM entity discovery (the cron `databaseName undefined` warning), which doesn't intersect any code path e2e exercises. Webpack bundle would add 5–10 min per PR. Side change to fixture_setup.py: now also writes the `default` profile to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests hit the default profile when running `boxlite auth whoami` etc, so a re-bootstrap with a new admin key needs to update both or CLI tests 401. Verified end-to-end: teardown --wipe-data → fresh bootstrap → fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug — both real production bugs the suite is specifically here to catch).
Replaces ad-hoc `rt.create + try/finally + rt.remove` patterns in 8 of
the 15 case files with two shared fixtures defined in conftest:
- `box` — single-box tests; supports `@pytest.mark.image(tag)`
override; auto-remove + explicit force-remove in
finally (belt and braces).
- `box_factory` — multi-box tests; tracks every id and force-removes
all on teardown. `setdefault('auto_remove', True)`
so callers can flip to False without a kwarg clash
(needed for execution_shutdown tests that stop the
box mid-flow).
Three files keep their hand-rolled lifecycle because the option surface
required would defeat the fixture (test_create_named_box uses the
runtime-level `name=` arg; test_errors verifies a half-created box does
not leak; test_close_is_idempotent builds its own throwaway runtime).
Net pass/fail unchanged (26/6 vs prior 24/8 — small shifts come from the
underlying boxlite-ai#563 race window, not the refactor): the same two real bugs
(boxlite-ai#563 stdout drop, Stopped→500 mapping) are still caught.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.
Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.
Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.
Deterministic ordering tests added inline in execution.rs cover:
- stdout-only exec: all chunks land before Exit
- mixed stdout/stderr: every chunk lands before Wait
- process exit before pump completion: terminal event waits
- pump completion before process exit: terminal event fires
immediately (no spurious wait)
Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.
Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.
Split out of boxlite-ai#681 — this is **PR D of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
D (this) FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.
Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.
Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.
Deterministic ordering tests added inline in execution.rs cover:
- stdout-only exec: all chunks land before Exit
- mixed stdout/stderr: every chunk lands before Wait
- process exit before pump completion: terminal event waits
- pump completion before process exit: terminal event fires
immediately (no spurious wait)
Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.
Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.
Split out of boxlite-ai#681 — this is **PR D of 4**:
A (boxlite-ai#682) reorganize
B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
D (this) FFI exec drain race fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Head branch was pushed to by a user without write access
8a22353 to
f11e469
Compare
7b672df to
e9c884d
Compare
e9c884d to
e048443
Compare
…ipe-style)
Go SDK's box.Exec returns with stdout chunks still in flight — the
caller reads the buffered Stdout sink before the trailing stream
callbacks have delivered them. Stock 0.9.5: 18-26 / 50 truncated;
post-fix: 50/50 complete.
Split the os/exec Wait responsibility so the bare terminal stays a
reap (StdoutPipe-style) and the buffered-sink wrappers compose the
drain themselves:
Execution.Wait : reap only. No `<-drained` block. Callback ordering
contract still holds — C's exit_pump gates on_exit
on every stream pump's done_tx, so OnStdout/OnStderr
callbacks naturally fire before the
executionStreamState.drained channel closes.
Cmd.Run : reap (via Execution.Wait) + post-reap `<-drained`
select. c.Stdout / c.Stderr are user-owned
io.Writer sinks the caller reads immediately after
Run returns, so completeness is guaranteed at the
wrapper boundary, not inside the SDK terminal.
box.Exec : same — owns bytesCollector sinks; same drain
compose.
Mirrors os/exec.Cmd's StdoutPipe-case Wait semantic at the
Execution.Wait level (caller is responsible for ensuring output sinks
have drained before reading them), while keeping the os/exec.Cmd.Run
"output complete on return" contract at the Cmd.Run / box.Exec
wrapper level.
C FFI unchanged.
Verification (sdks/go/exec_integration_test.go, TestIntegrationExecStdoutRace):
Runs `echo marker-line` 20x via Cmd.Run without the `&& sleep 0.1`
drainPad workaround that TestIntegrationExecEnvWorkingDirTimeout uses.
Counts how many rounds got an empty stdout buffer.
- Side A (Cmd.Run's post-Wait `<-drained` select removed):
8/20 truncated to "", FAIL.
- Side B (this commit applied):
20/20 complete, PASS.
§5 no-regression suite green: TestExecutionStreamState_HandleDeletedOnExit,
TestHandleOwnership_NoDoubleDeleteRaceDuringClose, TestAbandonAsync* (×6),
TestDrainAndDelete* (×3), TestClaimOrFreePayload_* (×2).
E2E (test_p0_6_exec_stdout_race.py) passes both sides — the Python
path goes through the runner's streamBus which buffers stdout in
backlog regardless of close ordering, so it doesn't surface the race.
The Go SDK direct test path does.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e048443 to
6a87d30
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
sdks/go/exec.go (2)
320-322: ⚡ Quick winDocument the stronger return contract on
Box.ExecandCmd.Run.
Execution.Waitnow makes the contract split explicit, so the public wrapper docs should also say that they wait for buffered stdout/stderr delivery before returning. As per coding guidelines, "Write comprehensive docstrings for all public functions and classes".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdks/go/exec.go` around lines 320 - 322, Update the public docstrings for Box.Exec and Cmd.Run to state their stronger return contract: they wait for buffered stdout/stderr delivery (i.e., stream drains) before returning, matching the split made explicit by Execution.Wait; mention that these wrappers compose reaping with stream-drain at the wrapper boundary so callers can rely on outputs being flushed when the call returns.Source: Coding guidelines
96-104: ⚡ Quick winFix the stale
Wait/drain contract comments.Both comments still describe
Execution.Waitas the place that waits on or composes the drain barrier, butWaitnow delegates straight toreap. Leaving this as-is makes the file self-contradictory and points maintainers at the wrong API contract.Also applies to: 327-330
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdks/go/exec.go` around lines 96 - 104, Update the stale comment on the drained chan to reflect the current contract: explain that Execution.Wait no longer composes the drain barrier and instead delegates to reap, and that drained is closed by deliverExit/on_exit to signal the drain goroutine (used by reap) so callers using buffered Stdout/Stderr won't see truncated buffers; also update the analogous comment near the other occurrence (the comment block around drained at the later section) to match this same behavior and mention the relationship between deliverExit, the drain goroutine, and reap rather than Wait.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@sdks/go/exec.go`:
- Around line 201-205: The select that waits on
execution.streamState.drained/execution.closing must also respect the caller
context: update the wait logic in Execution.Wait (and the analogous block at the
other location) to include a case <-ctx.Done() so the wait returns with the
context error instead of blocking forever; ensure the returned error is
ctx.Err() (or wrapped) so callers receive cancellation/deadline information and
verify deliverExit still closes execution.streamState.drained on completion.
---
Nitpick comments:
In `@sdks/go/exec.go`:
- Around line 320-322: Update the public docstrings for Box.Exec and Cmd.Run to
state their stronger return contract: they wait for buffered stdout/stderr
delivery (i.e., stream drains) before returning, matching the split made
explicit by Execution.Wait; mention that these wrappers compose reaping with
stream-drain at the wrapper boundary so callers can rely on outputs being
flushed when the call returns.
- Around line 96-104: Update the stale comment on the drained chan to reflect
the current contract: explain that Execution.Wait no longer composes the drain
barrier and instead delegates to reap, and that drained is closed by
deliverExit/on_exit to signal the drain goroutine (used by reap) so callers
using buffered Stdout/Stderr won't see truncated buffers; also update the
analogous comment near the other occurrence (the comment block around drained at
the later section) to match this same behavior and mention the relationship
between deliverExit, the drain goroutine, and reap rather than Wait.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e908eafb-2046-43e0-840b-f8b17a724264
📒 Files selected for processing (2)
sdks/go/exec.gosdks/go/exec_integration_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- sdks/go/exec_integration_test.go
| select { | ||
| case <-execution.streamState.drained: | ||
| case <-execution.closing: | ||
| return nil, ErrRuntimeClosed | ||
| } |
There was a problem hiding this comment.
Honor ctx while waiting for stream drain.
Once Execution.Wait returns, both wrappers stop listening to ctx.Done(). If the final stdout/stderr delivery blocks in a caller-provided writer or callback, deliverExit never closes drained and Exec/Run can hang past the caller's deadline/cancellation.
💡 Suggested fix
select {
case <-execution.streamState.drained:
+ case <-ctx.Done():
+ return nil, ctx.Err()
case <-execution.closing:
return nil, ErrRuntimeClosed
} select {
case <-execution.streamState.drained:
+ case <-ctx.Done():
+ return ctx.Err()
case <-execution.closing:
return ErrRuntimeClosed
}Also applies to: 571-575
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@sdks/go/exec.go` around lines 201 - 205, The select that waits on
execution.streamState.drained/execution.closing must also respect the caller
context: update the wait logic in Execution.Wait (and the analogous block at the
other location) to include a case <-ctx.Done() so the wait returns with the
context error instead of blocking forever; ensure the returned error is
ctx.Err() (or wrapped) so callers receive cancellation/deadline information and
verify deliverExit still closes execution.streamState.drained on completion.
|
clean, glad it's just go now. small thing: the drain only happens in Cmd.Run and box.Exec, so Execution.Wait on its own still returns before the pumps finish. if someone passes their own Stdout writer and calls Wait directly, they still get a short read: exec, _ := box.StartExecution(ctx, ExecutionOptions{Command: ..., Stdout: &buf})
exec.Wait(ctx)
buf.String() // still truncatedour streams are always pushed into a writer (theres no StdoutPipe to read yourself), so Wait is really the io.Writer case and should block until the bytes are delivered, the same way os/exec's Wait does. could you move the also the title still says fix(ffi) but theres no rust left in here now — fix(go-sdk) fits better. |
Address Dorian's review on 6a87d30: the wrapper-only drain leaves the direct Execution.Wait path broken — a caller passing their own Stdout io.Writer and calling Wait directly still gets a truncated buffer. BoxLite's streams are always pushed into a Writer (no StdoutPipe-style user-read pipe), so every Execution IS the os/exec io.Writer case: Wait is the single terminal and must guarantee output completeness on return. Move the `<-drained` select into Execution.Wait right after the reap. Drop the duplicate selects from Cmd.Run and box.Exec — they already get the strong contract via Wait. Mirrors os/exec.Cmd.Wait's Process.Wait → awaitGoroutines sequence (src/os/exec/exec.go:989-995): one terminal, internally reaps then drains. The post-reap drain is non-cancelable by ctx (parity); only runtime shutdown breaks it, preserving the reap's exit code and overwriting err only if the reap had none. C FFI untouched. Drained signal still lives in executionStreamState.drained (closed by deliverExit when the C on_exit fires, which the C exit_pump gates on every stream pump's done_tx). Two-side verification (TestIntegrationExecStdoutRace via Cmd.Run): - Side A (Execution.Wait stripped to pure reap): 4/20 truncated, FAIL. - Side B (this commit): 20/20 complete, PASS. §5 no-regression suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After running the suite end-to-end: - test_detached_box_survives_cli_exit_and_is_reusable used `boxlite info <id>` for per-box state. The CLI's `info` is system- wide (version / runtime stats) — there's no per-box info command — so the call was always `unexpected argument` exit 2. Replaced step 3 with `boxlite ls`-row parsing + a state-keyword check. Marked xfail(strict=True) because step 3's exec hits the stdout-drop race that boxlite-ai#563 fixes (same as the reattach case below). - test_reattach_after_original_completes already broken by the same stdout race — short `echo X && exit 0` returns out=''. Marked xfail(strict=True) pointing at boxlite-ai#563. - test_readonly_volume_mount_flag_and_write_reject expected a host bind mount to surface inside the guest. The cloud runtime deliberately dropped host bind mounts (boxlite-ai#639 "remove host bind mounts; only managed volumes allowed"), so this case was testing a feature that doesn't exist by design. Flipped to the negative contract — REST callers passing host paths get them silently dropped, no /mnt/<x> ever surfaces. Pin: boxlite-ai#639's product direction. Net: 3 pass + 2 xfail(strict). When boxlite-ai#563 lands, both xfails flip xpass-strict and become regular passes (markers should be dropped at that point). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review: the reap method introduced earlier added a private indirection that was only called from Wait. Inline its body so Wait has the full sequence in one place (boxlite_execution_wait kick, context/closing escape, then the stream-drain barrier) without adding a new method. Behaviour unchanged: ctx-cancel and runtime-closing during the process-wait still skip the drain barrier (same as before); successful exit still blocks on streamState.drained before returning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#710) add 5 e2e cases pinning REST contracts not covered today ## Coverage gaps Three classes of behaviour run through the SDK → API → Runner → VM chain but had no e2e pin on the chain itself; bugs in any one of them would silently regress through `make test:integration:*` (which only exercises local-FFI): ``` 1. CLI detach lifecycle — `boxlite run -d` returns, CLI process exits, does a fresh CLI process still see / exec the box? FFI side src/boxlite/tests/detach.rs, recovery.rs ✓ E2E side scripts/test/e2e/cases/ ✗ 2. Execution.attach / reattach contract — bogus exec_id should be a typed client error; reattach to a completed exec should return a usable handle. Runner side apps/runner/.../boxlite_exec_attach_test.go ✓ SDK <-> API end-to-end ✗ 3. host bind mount via REST — the cloud runtime intentionally dropped host bind mounts (#639); REST callers passing host paths must silently no-op, no /mnt/<x> in the guest. FFI surface src/boxlite/tests/mount_security.rs ✓ REST contract ✗ ``` ## Cases shipped ``` scripts/test/e2e/cases/test_cli_detach_recovery.py (2 cases) scripts/test/e2e/cases/test_exec_attach.py (2 cases) scripts/test/e2e/cases/test_volume_readonly.py (1 case) ``` 5 cases total. Author also dropped 6 cases from this branch's earlier incarnation that were already committed alongside their respective fix PRs (#686 / #688 / #689 / #691 / #692 / #696), so the diff is strictly net-new coverage. ## Test plan — run against current main Stack: local e2e, runner unchanged, no source edits. | Case | Result | Notes | |---|---|---| | `test_detached_box_exec_propagates_exit_code_on_fresh_cli` | ✅ PASS | exit-code passthrough across CLI processes | | `test_detached_box_survives_cli_exit_and_is_reusable` |⚠️ XFAIL (strict) | reaches step 3 (`boxlite exec <id> echo still-alive`) then hits the stdout-drop race that #563 fixes — marker drops when #563 lands | | `test_attach_with_bogus_id_is_typed_error` | ✅ PASS | bogus exec_id → typed `Exception` (not 5xx, not silent) | | `test_reattach_after_original_completes` |⚠️ XFAIL (strict) | same stdout-drop race (#563) on the original exec's `out=='first-output'` assertion | | `test_host_bind_mount_via_rest_is_silently_ignored` | ✅ PASS | the box created with `volumes=[(host_dir, "/mnt/ro", True)]` reports `MOUNT_LINE=<none>` from `/proc/mounts` and the host marker file is untouched — REST silently dropped the host path | The 2 XFAILs are tied to #563 (`fix(go-sdk): fold stream drain into Execution.Wait`). Once #563 merges, both xfails flip xpass-strict, the markers come off, and the suite is 5/5 green. No additional fix work is needed in this PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added end-to-end tests verifying CLI detach survival and box reusability across fresh CLI processes, including exit code propagation tests. * Added end-to-end tests for SDK reattach functionality, validating session state after execution completion. * Added end-to-end test confirming proper host bind mount handling during REST execution. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug
Go SDK's
box.Exec/Cmd.Run/Execution.Waitreturns with stdout chunks still in flight — the caller reads a buffered Stdout sink before the trailing stream callbacks have delivered them.test_p0_6_exec_stdout_race.py).Fix (Go-only, no new API, no C FFI changes)
Mirror
os/exec.Cmd.Wait'sProcess.Wait→awaitGoroutinessequence insideExecution.Wait. BoxLite's streams are always pushed into a user-suppliedWriter/ callback — there is noStdoutPipe-style user-read pipe — so everyExecutionISos/exec's io.Writer case, andWaitis the single terminal that must guarantee output completeness on return.Execution.Waite.reap(ctx)) → block on<-streamState.drained(theawaitGoroutinesanalog) → return. Non-cancelable by ctx during drain (os/exec parity); only runtime shutdown breaks it.exit_pumpgateson_exiton every stream pump'sdone_tx. Go-sidedeliverExitcloses the privateexecutionStreamState.drainedchannel. So by the timeWaitunblocks, everyOnStdout/OnStderrcallback (and everyio.Writer.Write) has fired.Cmd.Run/box.ExecExecution.Wait— no extra drain. The strong contract lives in one place.C FFI untouched.
Two-side verification
sdks/go/exec_integration_test.go::TestIntegrationExecStdoutRacerunsecho marker-line20× viaCmd.Runwithout the&& sleep 0.1drainPad workaround that the existingTestIntegrationExecEnvWorkingDirTimeoutuses.Execution.Waitdrain"", FAIL§5 no-regression suite green:
HandleDeletedOnExit,NoDoubleDeleteRaceDuringClose,AbandonAsync*(×6),DrainAndDelete*(×3),ClaimOrFreePayload_*(×2).E2E (
test_p0_6_exec_stdout_race.py) passes both sides — the Python path goes through the runner'sstreamBus, which buffers stdout in backlog regardless of close ordering, so it doesn't surface the race. The Go SDK direct test path does.Supersedes
#685 — same bug, C FFI
streams_pendingapproach (couples Wait with streams at FFI). Closed in favor of this Go-only fix.🤖 Generated with Claude Code