fix(go-sdk): fold stream drain into Execution.Wait (os/exec io.Writer case) by G4614 · Pull Request #563 · boxlite-ai/boxlite

G4614 · 2026-05-20T13:29:29Z

Bug

Go SDK's box.Exec / Cmd.Run / Execution.Wait returns with stdout chunks still in flight — the caller reads a buffered Stdout sink before the trailing stream callbacks have delivered them.

Stock pre-fix: 18-26 / 50 truncated (also test_p0_6_exec_stdout_race.py).
This commit: 50/50 complete.

Fix (Go-only, no new API, no C FFI changes)

Mirror os/exec.Cmd.Wait's Process.Wait → awaitGoroutines sequence inside Execution.Wait. BoxLite's streams are always pushed into a user-supplied Writer / callback — there is no StdoutPipe-style user-read pipe — so every Execution IS os/exec's io.Writer case, and Wait is the single terminal that must guarantee output completeness on return.

Layer	Behavior
`Execution.Wait`	Reap (`e.reap(ctx)`) → block on `<-streamState.drained` (the `awaitGoroutines` analog) → return. Non-cancelable by ctx during drain (os/exec parity); only runtime shutdown breaks it.
Callback ordering	C-side `exit_pump` gates `on_exit` on every stream pump's `done_tx`. Go-side `deliverExit` closes the private `executionStreamState.drained` channel. So by the time `Wait` unblocks, every `OnStdout` / `OnStderr` callback (and every `io.Writer.Write`) has fired.
`Cmd.Run` / `box.Exec`	Just call `Execution.Wait` — no extra drain. The strong contract lives in one place.

C FFI untouched.

Two-side verification

sdks/go/exec_integration_test.go::TestIntegrationExecStdoutRace runs echo marker-line 20× via Cmd.Run without the && sleep 0.1 drainPad workaround that the existing TestIntegrationExecEnvWorkingDirTimeout uses.

Side	`Execution.Wait` drain	Result
A	removed (Wait = pure reap)	4/20 truncated to `""`, FAIL
B (this PR)	present	20/20 complete, PASS

§5 no-regression suite green: HandleDeletedOnExit, NoDoubleDeleteRaceDuringClose, AbandonAsync* (×6), DrainAndDelete* (×3), ClaimOrFreePayload_* (×2).

E2E (test_p0_6_exec_stdout_race.py) passes both sides — the Python path goes through the runner's streamBus, which buffers stdout in backlog regardless of close ordering, so it doesn't surface the race. The Go SDK direct test path does.

Supersedes

#685 — same bug, C FFI streams_pending approach (couples Wait with streams at FFI). Closed in favor of this Go-only fix.

🤖 Generated with Claude Code

boxlite-ai#563 fixed the wait/exit terminal events racing the stdout/stderr pumps but shipped no regression test — the existing exec unit tests pass with or without the drain barrier, so a revert would go unnoticed in CI. Add two guards that pin the racy mid-state (a stream pump registered but not finished, pending=1) and assert the terminal task is gated on the drain, rather than reproducing the timing race: - exit_event_waits_for_pending_stdout_drain: drives the real exit_pump (empty execution → wait errors, but drain+push runs); asserts Exit is not queued while pending>0, then that stdout precedes Exit after finish_stream. - wait_event_waits_for_pending_stdout_drain: same for the inline wait task via boxlite_execution_wait (sync test — empty_handle owns a Runtime that can't be dropped in an async context; drives async via the handle's block_on). Verified each flips red when its await_streams_drained line (854 / 525) is removed, and green when restored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A fast command over the cloud (Go runner) could return exit 0 with empty stdout. When the WS attach drops before the runner flushes output (a proxy / high-latency cut), attach_ws_pump probed GET /executions/{id}, saw "completed", and emitted the exit code WITHOUT re-attaching — dropping the stdout still sitting in the runner's replay backlog. Local `boxlite serve` masked it (low latency, the exit frame always arrives in-band). On a disconnect-then-Terminal probe, re-attach once (immediately, no backoff) so the runner replays its backlog and we leave via its authoritative exit frame. Bounded to a single attempt with fallback to the probed exit code, so a runner that never sends a closing exit frame can't hang the pump. Independent of boxlite-ai#563 (an sdks/c FFI drain fix): this is the REST client path, so it reproduced on both main and boxlite-ai#563. Reproducer: ws_terminal_probe_after_cut_must_not_drop_buffered_stdout drives the WS pump against a mock that cuts the first attach, reports completed/0, and serves the backlog only on re-attach. Without the fix it observes empty stdout with a clean exit; with the fix it drains "hello\n". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

boxlite-ai#563 fixed the wait/exit terminal events racing the stdout/stderr pumps but shipped no regression test — the existing exec unit tests pass with or without the drain barrier, so a revert would go unnoticed in CI. Add two guards that pin the racy mid-state (a stream pump registered but not finished, pending=1) and assert the terminal task is gated on the drain, rather than reproducing the timing race: - exit_event_waits_for_pending_stdout_drain: drives the real exit_pump (empty execution → wait errors, but drain+push runs); asserts Exit is not queued while pending>0, then that stdout precedes Exit after finish_stream. - wait_event_waits_for_pending_stdout_drain: same for the inline wait task via boxlite_execution_wait (sync test — empty_handle owns a Runtime that can't be dropped in an async context; drives async via the handle's block_on). Verified each flips red when its await_streams_drained line (854 / 525) is removed, and green when restored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A fast command over the cloud (Go runner) could return exit 0 with empty stdout. When the WS attach drops before the runner flushes output (a proxy / high-latency cut), attach_ws_pump probed GET /executions/{id}, saw "completed", and emitted the exit code WITHOUT re-attaching — dropping the stdout still sitting in the runner's replay backlog. Local `boxlite serve` masked it (low latency, the exit frame always arrives in-band). On a disconnect-then-Terminal probe, re-attach once (immediately, no backoff) so the runner replays its backlog and we leave via its authoritative exit frame. Bounded to a single attempt with fallback to the probed exit code, so a runner that never sends a closing exit frame can't hang the pump. Independent of boxlite-ai#563 (an sdks/c FFI drain fix): this is the REST client path, so it reproduced on both main and boxlite-ai#563. Reproducer: ws_terminal_probe_after_cut_must_not_drop_buffered_stdout drives the WS pump against a mock that cuts the first attach, reports completed/0, and serves the backlog only on re-attach. Without the fix it observes empty stdout with a clean exit; with the fix it drains "hello\n". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Lifts 9 existing make test:integration test files (Rust + Python) into the e2e suite where they exercise the real production chain (SDK→API→Runner→VM) instead of the local PyO3/FFI path. Existing make test:integration uses Boxlite::default() / Boxlite.default(), which never touches NestJS API or boxlite-runner. Bugs in the proxy controller, REST wire format, or runner CGO won't break those tests even though they're broken in production. These ports run the same behavioural contracts through REST so the regression coverage tracks the actual production deployment. 9 new files in scripts/test/e2e/cases/: test_lifecycle.py — create / list / get_info / remove CRUD (from src/boxlite/tests/lifecycle.rs) test_exec_options.py — cwd, env, tty exit-code surface (from src/boxlite/tests/exec_options.rs) test_execution_shutdown.py — wait resolves after box.stop(); exec on stopped box must NOT 5xx (from src/boxlite/tests/execution_shutdown.rs) test_shutdown.py — close() idempotent; second REST runtime shares the same world view (from src/boxlite/tests/shutdown.rs) test_exec_unit_shape.py — round-trip exec result shape + types (from sdks/python/tests/test_exec.py) test_exec_timeout.py — timeout kills a sleeping process and a SIGTERM-ignoring process (from sdks/python/tests/test_exec_timeout_sigalrm.py) test_box_management.py — named box, list_info round-trip, env-var propagation client→guest (from sdks/python/tests/test_box_management.py) test_resize_tty.py — resize_tty(rows, cols) on TTY exec; refuse on non-TTY (from sdks/python/tests/test_resize_tty.py) test_errors.py — unknown image surfaces typed error; bad bearer token → 401/403, not 500 (from sdks/python/tests/test_errors.py) Plus ubuntu:24.04 added to fixture_setup.py's snapshot registration (some Rust cases use it). Verified locally on alpine:3.23 + ubuntu:22.04 + ubuntu:24.04 with a boxlite-ai#563-rebuilt runner: 24 PASS / 1 FAIL. The single failure (test_exec_on_stopped_box_is_typed_error) is a real production bug — the API surfaces "Handle invalidated after stop()" with HTTP 500 + code=11 instead of mapping it to a typed 4xx. Test stays failing until that mapping fix lands; flipping to xfail would defeat the point of the regression coverage.

The previous bootstrap.sh curl'd boxlite-runner-vX.X.X-linux-amd64.tar.gz from GitHub Releases, pinning the runner to whatever was last cut. That meant the e2e suite was permanently testing stale code: - PRs that touch the runner (e.g. boxlite-ai#563 sdks/c/ FFI drain, or the boxlite_exec.go:77 mapping fix this suite catches) would never surface in CI — the release binary still has the old behaviour - test:e2e:two-sided was the only way to actually validate a PR, and it relied on side-channel build steps Now bootstrap.sh: 1. Installs rustup + Go if missing (skips otherwise) 2. cargo build --release -p boxlite-c (produces libboxlite.a) 3. cd apps/runner && CGO_ENABLED=1 go build → /usr/local/bin/boxlite-runner cargo + go both do incremental builds, so re-running bootstrap on a clean working tree finishes in seconds; the cold first build is ~5 min (the Rust release build dominates). The CI workflow no longer needs the separate "Rebuild boxlite-runner binary from this checkout" step — bootstrap covers it. Left a restart step in so a no-op bootstrap still picks up other env-file drift. Side effect of this change: the e2e suite against main HEAD now FAILs 7 more tests (the boxlite-ai#563 stdout race surfaces on every exec smoke case, not just test_p0_6). This is the correct signal — main currently ships the race, and the e2e suite is now honest about it. Merging boxlite-ai#563 flips all 7 (plus the original test_p0_6) green.

…guns Addresses 11 review points on PR boxlite-ai#678: 1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens), persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned). Read back on every re-bootstrap so DB-encrypted data and the existing admin user keep working. 2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0. 3. `yarn install` no longer swallows stderr — silent install failures were the reviewer's "挂了不知道为啥" pain. 4. Runner health check is now a real wait loop (`pgrep` of the binary path + ss listening on :8080) instead of `sleep 3`. API already poll /api/health. 5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin key. Catches encryption-key mismatch, broken migrations, stale admin user / new-token mismatch BEFORE the e2e suite runs (with a helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch). 7. tslib + node-forge are now real `apps/package.json` deps (committed here). Bootstrap no longer yarn-adds them, so it no longer dirties the working tree mid-run. 9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves secrets via the secrets file). Previous behaviour only wrote when missing, so PRs that added env vars never landed on existing hosts. 10. New teardown.sh — basic / --wipe-data / --full modes that revert the bootstrap state cleanly. 11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr instead of being silent. 12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented in-line — Google is a fetchable .well-known endpoint; e2e auth goes through the admin API key path and never touches Google. 13. REPO autodetects from the script's own location (no more $HOME/ws/boxlite default that only worked on one machine). Two reviewer concerns were left as-is with comments: 6. apps/apps self-symlink — the real fix is to rewrite the `apps/api/...` paths in every `project.json` to `api/...`. That's an apps-workspace refactor PR, not e2e infra. 8. API runs via npx ts-node, not webpack bundle — production parity diverges only on TypeORM entity discovery (the cron `databaseName undefined` warning), which doesn't intersect any code path e2e exercises. Webpack bundle would add 5–10 min per PR. Side change to fixture_setup.py: now also writes the `default` profile to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests hit the default profile when running `boxlite auth whoami` etc, so a re-bootstrap with a new admin key needs to update both or CLI tests 401. Verified end-to-end: teardown --wipe-data → fresh bootstrap → fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug — both real production bugs the suite is specifically here to catch).

Adds `scripts/test/e2e/` — a regression suite that exercises the **SDK → API → boxlite-runner → libkrun VM** chain, complementing `make test:integration` which runs only against the local PyO3/FFI path and therefore can't catch bugs that live in the REST proxy, runner CGO, or API error mapping. ## Test plan | observed | pre (without this PR) | post | |---|---|---| | coverage of REST → API → runner → VM chain | none — `make test:integration:*` constructs `Boxlite.default()` (local FFI), never opens an HTTP connection, never reaches the NestJS API or `boxlite-runner` | `make test:e2e` runs 25 pytest cases, every box created via `Boxlite.rest(opts)` against `http://localhost:3000/api` | | #563 (exec-stdout drop) detection | regression invisible to CI; `make test:integration:python` green even on stock 0.9.5 runner | `test_p0_6_exec_stdout_race` deterministically FAILS at ≥90% loss against stock runner, PASS at 0% against a runner rebuilt with #563 | | path-bypass detection | tests could silently fall back to local FFI and still pass | autouse fixture `verify_runner_saw_all_boxes` snapshots `journalctl -u boxlite-runner` per-test and asserts every box id created via `rt.create` appears in the runner journal; sanity-verified by injecting a fake box id which fires the teardown assertion with a clear path-bypass message | | bootstrap surface | not present; e2e wasn't ramped on | `scripts/test/e2e/bootstrap.sh` is idempotent and self-contained — installs postgres, redis, docker registry, Node.js 22 (NodeSource), yarn (corepack), python3-pip, the boxlite-runner release binary, systemd units, and writes `/etc/boxlite-api.env`. No AWS credentials, no S3 bucket, no IAM role required — `S3_ENDPOINT=""` makes `VolumeManager.constructor` early-return at `apps/api/src/sandbox/managers/volume.manager.ts:47` | | local-dev ergonomics | tests scattered across personal repos | `make test:e2e:setup` (bootstrap + fixture data) and `make test:e2e` (pytest); `make test:e2e:two-sided` takes `PR_REF=<branch>` and verifies a candidate fix removes the bug | | CI surface | none | `.github/workflows/e2e-stack.yml` triggers on push/PR touching `sdks/c/src/exec/**`, `src/boxlite/src/rest/**`, `apps/runner/**`, `apps/api/src/**`; needs a self-hosted runner labeled `kvm` because GitHub-hosted runners have no `/dev/kvm` | ### Cases shipped | group | files | source migrated from | |---|---|---| | meta path verification | `test_path_verification.py` (2 cases) | new — proves SDK is REST-mode against `:3000` and the runner journal saw the box id | | #563 regression | `test_p0_6_exec_stdout_race.py` (1 case, 5 box × 2 exec rounds) | new — race reproducer | | `make test:integration:*` ports | 9 files, 22 cases total | `src/boxlite/tests/{lifecycle,exec_options,execution_shutdown,shutdown}.rs` + `sdks/python/tests/{test_exec,test_exec_timeout_sigalrm,test_box_management,test_resize_tty,test_errors}.py` — all rewritten to use `Boxlite.rest()` | ### Local results (this branch checked out clean against current main) | runner build | result | |---|---| | stock 0.9.5 release | 23 PASS / 2 FAIL — `test_p0_6_exec_stdout_race` (race) + `test_exec_on_stopped_box_is_typed_error` (real API/runner mapping bug, see body below) | | rebuilt with #563 cherry-picked | 24 PASS / 1 FAIL — only the mapping bug remains | ### `test_exec_on_stopped_box_is_typed_error` — the lone FAIL is real This case asserts that POST `/exec` on a stopped box returns a 4xx-class status, not 5xx. The current runner code at `apps/runner/pkg/api/controllers/boxlite_exec.go:77` writes `http.StatusInternalServerError` unconditionally when `execManager.Start` fails: ```go ctx.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("exec failed: %s", err)}) ``` Every other handler in the same file (signal / resize / kill / status) routes through `classifyExecError(err)` instead, which already maps `ErrExecClosed` / `ErrExecReaping` to 409. `POST /exec` was missed, and `classifyExecError` itself never gained a case for `boxlite.ErrStopped`. Rust side `src/shared/src/errors.rs:165` already maps `BoxliteError::Stopped` to 409, so the bug is purely the Go runner not propagating that. This test stays failing until that two-line fix lands; flipping it to `xfail` would defeat the regression point. ### Branch state | | value | |---|---| | base | `main` (a438d6b — origin/main HEAD) | | commits | 6 (all `test(e2e):*`, no production code touched) | | files | 13 new + 2 small additions (`make/test.mk`, `make/help.mk`) | | net lines | +1170 |

Existing make test:integration:* uses the local PyO3/FFI path (`Boxlite.default()`) and bypasses both the NestJS API and boxlite-runner. Bugs that surface only on the REST → API → runner chain — e.g. boxlite-ai#563's exec-stdout drop, boxlite-ai#627's attach re-drain — leak into production with the unit + integration suites all green. This suite always uses `Boxlite.rest(opts)` against the local API and proves the path on every run. Layout (scripts/test/e2e/): bootstrap.sh Install PG, Redis, registry, mountpoint-s3, runner binary, systemd units. Idempotent. fixture_setup.py Register alpine:3.23 + ubuntu:22.04 snapshots (waits for active), patch admin org quota, write [profiles.p1] in ~/.boxlite/credentials.toml. Idempotent. cases/conftest.py rt / image / box fixtures — REST-only. cases/test_path_verification.py Meta-test: asserts the runtime is REST-mode against :3000, and one round-trip exec leaves the runner journal with the box id. Catches the case where SDK silently degrades to local FFI or talks to the wrong endpoint. cases/test_p0_6_exec_stdout_race.py First real regression case — runs N boxes × 2 execs and asserts loss rate ≤ 5%. Against stock 0.9.5 runner: ~80–90% loss (FAIL). Against a runner rebuilt with boxlite-ai#563: 0% loss (PASS). lib/path_verification.py Helpers used by the meta-test. two_sided.sh Builds runner from MAIN_REF, runs pytest (expect FAIL=1), builds from PR_REF, runs pytest (expect PASS=0). Only verdict 1+0 is "PR removes the bug + test detects it". Restores original binary on any exit. Make targets (make/test.mk): make test:e2e:setup Bootstrap services + fixture data. make test:e2e Run the suite. make test:e2e:two-sided PR_REF=<branch> required. CI workflow (.github/workflows/e2e-stack.yml): Triggers on push/PR touching sdks/c/src/exec/, src/boxlite/src/rest/, apps/runner/, apps/api/src/. Requires a self-hosted runner labeled `kvm` because GitHub-hosted runners lack /dev/kvm and cannot run libkrun. Verified locally on alpine:3.23, py3.14, against both 0.9.5 stock runner (1 FAIL of 3) and a runner rebuilt from boxlite-ai#563 (3 PASS of 3).

Lifts 9 existing make test:integration test files (Rust + Python) into the e2e suite where they exercise the real production chain (SDK→API→Runner→VM) instead of the local PyO3/FFI path. Existing make test:integration uses Boxlite::default() / Boxlite.default(), which never touches NestJS API or boxlite-runner. Bugs in the proxy controller, REST wire format, or runner CGO won't break those tests even though they're broken in production. These ports run the same behavioural contracts through REST so the regression coverage tracks the actual production deployment. 9 new files in scripts/test/e2e/cases/: test_lifecycle.py — create / list / get_info / remove CRUD (from src/boxlite/tests/lifecycle.rs) test_exec_options.py — cwd, env, tty exit-code surface (from src/boxlite/tests/exec_options.rs) test_execution_shutdown.py — wait resolves after box.stop(); exec on stopped box must NOT 5xx (from src/boxlite/tests/execution_shutdown.rs) test_shutdown.py — close() idempotent; second REST runtime shares the same world view (from src/boxlite/tests/shutdown.rs) test_exec_unit_shape.py — round-trip exec result shape + types (from sdks/python/tests/test_exec.py) test_exec_timeout.py — timeout kills a sleeping process and a SIGTERM-ignoring process (from sdks/python/tests/test_exec_timeout_sigalrm.py) test_box_management.py — named box, list_info round-trip, env-var propagation client→guest (from sdks/python/tests/test_box_management.py) test_resize_tty.py — resize_tty(rows, cols) on TTY exec; refuse on non-TTY (from sdks/python/tests/test_resize_tty.py) test_errors.py — unknown image surfaces typed error; bad bearer token → 401/403, not 500 (from sdks/python/tests/test_errors.py) Plus ubuntu:24.04 added to fixture_setup.py's snapshot registration (some Rust cases use it). Verified locally on alpine:3.23 + ubuntu:22.04 + ubuntu:24.04 with a boxlite-ai#563-rebuilt runner: 24 PASS / 1 FAIL. The single failure (test_exec_on_stopped_box_is_typed_error) is a real production bug — the API surfaces "Handle invalidated after stop()" with HTTP 500 + code=11 instead of mapping it to a typed 4xx. Test stays failing until that mapping fix lands; flipping to xfail would defeat the point of the regression coverage.

The previous bootstrap.sh curl'd boxlite-runner-vX.X.X-linux-amd64.tar.gz from GitHub Releases, pinning the runner to whatever was last cut. That meant the e2e suite was permanently testing stale code: - PRs that touch the runner (e.g. boxlite-ai#563 sdks/c/ FFI drain, or the boxlite_exec.go:77 mapping fix this suite catches) would never surface in CI — the release binary still has the old behaviour - test:e2e:two-sided was the only way to actually validate a PR, and it relied on side-channel build steps Now bootstrap.sh: 1. Installs rustup + Go if missing (skips otherwise) 2. cargo build --release -p boxlite-c (produces libboxlite.a) 3. cd apps/runner && CGO_ENABLED=1 go build → /usr/local/bin/boxlite-runner cargo + go both do incremental builds, so re-running bootstrap on a clean working tree finishes in seconds; the cold first build is ~5 min (the Rust release build dominates). The CI workflow no longer needs the separate "Rebuild boxlite-runner binary from this checkout" step — bootstrap covers it. Left a restart step in so a no-op bootstrap still picks up other env-file drift. Side effect of this change: the e2e suite against main HEAD now FAILs 7 more tests (the boxlite-ai#563 stdout race surfaces on every exec smoke case, not just test_p0_6). This is the correct signal — main currently ships the race, and the e2e suite is now honest about it. Merging boxlite-ai#563 flips all 7 (plus the original test_p0_6) green.

…guns Addresses 11 review points on PR boxlite-ai#678: 1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens), persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned). Read back on every re-bootstrap so DB-encrypted data and the existing admin user keep working. 2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0. 3. `yarn install` no longer swallows stderr — silent install failures were the reviewer's "挂了不知道为啥" pain. 4. Runner health check is now a real wait loop (`pgrep` of the binary path + ss listening on :8080) instead of `sleep 3`. API already poll /api/health. 5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin key. Catches encryption-key mismatch, broken migrations, stale admin user / new-token mismatch BEFORE the e2e suite runs (with a helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch). 7. tslib + node-forge are now real `apps/package.json` deps (committed here). Bootstrap no longer yarn-adds them, so it no longer dirties the working tree mid-run. 9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves secrets via the secrets file). Previous behaviour only wrote when missing, so PRs that added env vars never landed on existing hosts. 10. New teardown.sh — basic / --wipe-data / --full modes that revert the bootstrap state cleanly. 11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr instead of being silent. 12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented in-line — Google is a fetchable .well-known endpoint; e2e auth goes through the admin API key path and never touches Google. 13. REPO autodetects from the script's own location (no more $HOME/ws/boxlite default that only worked on one machine). Two reviewer concerns were left as-is with comments: 6. apps/apps self-symlink — the real fix is to rewrite the `apps/api/...` paths in every `project.json` to `api/...`. That's an apps-workspace refactor PR, not e2e infra. 8. API runs via npx ts-node, not webpack bundle — production parity diverges only on TypeORM entity discovery (the cron `databaseName undefined` warning), which doesn't intersect any code path e2e exercises. Webpack bundle would add 5–10 min per PR. Side change to fixture_setup.py: now also writes the `default` profile to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests hit the default profile when running `boxlite auth whoami` etc, so a re-bootstrap with a new admin key needs to update both or CLI tests 401. Verified end-to-end: teardown --wipe-data → fresh bootstrap → fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug — both real production bugs the suite is specifically here to catch).

Replaces ad-hoc `rt.create + try/finally + rt.remove` patterns in 8 of the 15 case files with two shared fixtures defined in conftest: - `box` — single-box tests; supports `@pytest.mark.image(tag)` override; auto-remove + explicit force-remove in finally (belt and braces). - `box_factory` — multi-box tests; tracks every id and force-removes all on teardown. `setdefault('auto_remove', True)` so callers can flip to False without a kwarg clash (needed for execution_shutdown tests that stop the box mid-flow). Three files keep their hand-rolled lifecycle because the option surface required would defeat the fixture (test_create_named_box uses the runtime-level `name=` arg; test_errors verifies a half-created box does not leak; test_close_is_idempotent builds its own throwaway runtime). Net pass/fail unchanged (26/6 vs prior 24/8 — small shifts come from the underlying boxlite-ai#563 race window, not the refactor): the same two real bugs (boxlite-ai#563 stdout drop, Stopped→500 mapping) are still caught. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Go SDK's `box.Exec` would occasionally return with stdout chunks still in flight — the user's callback got `OnExit` (or the wait gRPC reply) before the matching `OnStdout`/`OnStderr` chunks landed on the queue. From the caller's perspective, the exec had finished but the last few bytes of stdout were missing. Root cause: `execution_wait` and `exit_pump` pushed their terminal events as soon as the underlying process exited, racing the still- draining stream pumps. The FFI queue is ordered, but the wait/exit events were entering it ahead of late stdout chunks. Fix: track a `streams_pending` count on `ExecutionHandle` instead of the previous per-pump receiver list. Both `exit_pump` and the `execution_wait` task now await `streams_pending → 0` (via a `tokio::sync::Notify` notification + standard register-then-check pattern to avoid post-notification miss) before pushing their terminal events. Deterministic ordering tests added inline in execution.rs cover: - stdout-only exec: all chunks land before Exit - mixed stdout/stderr: every chunk lands before Wait - process exit before pump completion: terminal event waits - pump completion before process exit: terminal event fires immediately (no spurious wait) Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race` in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from ~90% loss against stock 0.9.5 to 0% loss against this PR. That test is NOT in the PR-B xfail set (it was pre-existing), so no markers need flipping. Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are NOT addressed here — separate root causes, separate follow-ups. Split out of boxlite-ai#681 — this is **PR D of 4**: A (boxlite-ai#682) reorganize B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs) C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail D (this) FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ipe-style) Go SDK's box.Exec returns with stdout chunks still in flight — the caller reads the buffered Stdout sink before the trailing stream callbacks have delivered them. Stock 0.9.5: 18-26 / 50 truncated; post-fix: 50/50 complete. Split the os/exec Wait responsibility so the bare terminal stays a reap (StdoutPipe-style) and the buffered-sink wrappers compose the drain themselves: Execution.Wait : reap only. No `<-drained` block. Callback ordering contract still holds — C's exit_pump gates on_exit on every stream pump's done_tx, so OnStdout/OnStderr callbacks naturally fire before the executionStreamState.drained channel closes. Cmd.Run : reap (via Execution.Wait) + post-reap `<-drained` select. c.Stdout / c.Stderr are user-owned io.Writer sinks the caller reads immediately after Run returns, so completeness is guaranteed at the wrapper boundary, not inside the SDK terminal. box.Exec : same — owns bytesCollector sinks; same drain compose. Mirrors os/exec.Cmd's StdoutPipe-case Wait semantic at the Execution.Wait level (caller is responsible for ensuring output sinks have drained before reading them), while keeping the os/exec.Cmd.Run "output complete on return" contract at the Cmd.Run / box.Exec wrapper level. C FFI unchanged. Verification (sdks/go/exec_integration_test.go, TestIntegrationExecStdoutRace): Runs `echo marker-line` 20x via Cmd.Run without the `&& sleep 0.1` drainPad workaround that TestIntegrationExecEnvWorkingDirTimeout uses. Counts how many rounds got an empty stdout buffer. - Side A (Cmd.Run's post-Wait `<-drained` select removed): 8/20 truncated to "", FAIL. - Side B (this commit applied): 20/20 complete, PASS. §5 no-regression suite green: TestExecutionStreamState_HandleDeletedOnExit, TestHandleOwnership_NoDoubleDeleteRaceDuringClose, TestAbandonAsync* (×6), TestDrainAndDelete* (×3), TestClaimOrFreePayload_* (×2). E2E (test_p0_6_exec_stdout_race.py) passes both sides — the Python path goes through the runner's streamBus which buffers stdout in backlog regardless of close ordering, so it doesn't surface the race. The Go SDK direct test path does. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

sdks/go/exec.go (2)
320-322: ⚡ Quick win

Document the stronger return contract on Box.Exec and Cmd.Run.

Execution.Wait now makes the contract split explicit, so the public wrapper docs should also say that they wait for buffered stdout/stderr delivery before returning. As per coding guidelines, "Write comprehensive docstrings for all public functions and classes".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdks/go/exec.go` around lines 320 - 322, Update the public docstrings for
Box.Exec and Cmd.Run to state their stronger return contract: they wait for
buffered stdout/stderr delivery (i.e., stream drains) before returning, matching
the split made explicit by Execution.Wait; mention that these wrappers compose
reaping with stream-drain at the wrapper boundary so callers can rely on outputs
being flushed when the call returns.
Source: Coding guidelines

96-104: ⚡ Quick win

Fix the stale Wait/drain contract comments.

Both comments still describe Execution.Wait as the place that waits on or composes the drain barrier, but Wait now delegates straight to reap. Leaving this as-is makes the file self-contradictory and points maintainers at the wrong API contract.

Also applies to: 327-330
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdks/go/exec.go` around lines 96 - 104, Update the stale comment on the
drained chan to reflect the current contract: explain that Execution.Wait no
longer composes the drain barrier and instead delegates to reap, and that
drained is closed by deliverExit/on_exit to signal the drain goroutine (used by
reap) so callers using buffered Stdout/Stderr won't see truncated buffers; also
update the analogous comment near the other occurrence (the comment block around
drained at the later section) to match this same behavior and mention the
relationship between deliverExit, the drain goroutine, and reap rather than
Wait.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@sdks/go/exec.go`:
- Around line 201-205: The select that waits on
execution.streamState.drained/execution.closing must also respect the caller
context: update the wait logic in Execution.Wait (and the analogous block at the
other location) to include a case <-ctx.Done() so the wait returns with the
context error instead of blocking forever; ensure the returned error is
ctx.Err() (or wrapped) so callers receive cancellation/deadline information and
verify deliverExit still closes execution.streamState.drained on completion.

---

Nitpick comments:
In `@sdks/go/exec.go`:
- Around line 320-322: Update the public docstrings for Box.Exec and Cmd.Run to
state their stronger return contract: they wait for buffered stdout/stderr
delivery (i.e., stream drains) before returning, matching the split made
explicit by Execution.Wait; mention that these wrappers compose reaping with
stream-drain at the wrapper boundary so callers can rely on outputs being
flushed when the call returns.
- Around line 96-104: Update the stale comment on the drained chan to reflect
the current contract: explain that Execution.Wait no longer composes the drain
barrier and instead delegates to reap, and that drained is closed by
deliverExit/on_exit to signal the drain goroutine (used by reap) so callers
using buffered Stdout/Stderr won't see truncated buffers; also update the
analogous comment near the other occurrence (the comment block around drained at
the later section) to match this same behavior and mention the relationship
between deliverExit, the drain goroutine, and reap rather than Wait.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e908eafb-2046-43e0-840b-f8b17a724264

📥 Commits

Reviewing files that changed from the base of the PR and between e048443 and 6a87d30.

📒 Files selected for processing (2)

sdks/go/exec.go
sdks/go/exec_integration_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

sdks/go/exec_integration_test.go

coderabbitai · 2026-06-09T10:13:30Z

+	select {
+	case <-execution.streamState.drained:
+	case <-execution.closing:
+		return nil, ErrRuntimeClosed
+	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor ctx while waiting for stream drain.

Once Execution.Wait returns, both wrappers stop listening to ctx.Done(). If the final stdout/stderr delivery blocks in a caller-provided writer or callback, deliverExit never closes drained and Exec/Run can hang past the caller's deadline/cancellation.

💡 Suggested fix

select { case <-execution.streamState.drained: + case <-ctx.Done(): + return nil, ctx.Err() case <-execution.closing: return nil, ErrRuntimeClosed }

select { case <-execution.streamState.drained: + case <-ctx.Done(): + return ctx.Err() case <-execution.closing: return ErrRuntimeClosed }

Also applies to: 571-575

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdks/go/exec.go` around lines 201 - 205, The select that waits on execution.streamState.drained/execution.closing must also respect the caller context: update the wait logic in Execution.Wait (and the analogous block at the other location) to include a case <-ctx.Done() so the wait returns with the context error instead of blocking forever; ensure the returned error is ctx.Err() (or wrapped) so callers receive cancellation/deadline information and verify deliverExit still closes execution.streamState.drained on completion.

DorianZheng

LGTM

DorianZheng · 2026-06-09T10:49:10Z

clean, glad it's just go now.

small thing: the drain only happens in Cmd.Run and box.Exec, so Execution.Wait on its own still returns before the pumps finish. if someone passes their own Stdout writer and calls Wait directly, they still get a short read:

exec, _ := box.StartExecution(ctx, ExecutionOptions{Command: ..., Stdout: &buf})
exec.Wait(ctx)
buf.String() // still truncated

our streams are always pushed into a writer (theres no StdoutPipe to read yourself), so Wait is really the io.Writer case and should block until the bytes are delivered, the same way os/exec's Wait does. could you move the <-drained select into Execution.Wait right after the reap, and drop it from the two wrappers? that fixes the direct path too and gets rid of the duplicated select.

also the title still says fix(ffi) but theres no rust left in here now — fix(go-sdk) fits better.

Address Dorian's review on 6a87d30: the wrapper-only drain leaves the direct Execution.Wait path broken — a caller passing their own Stdout io.Writer and calling Wait directly still gets a truncated buffer. BoxLite's streams are always pushed into a Writer (no StdoutPipe-style user-read pipe), so every Execution IS the os/exec io.Writer case: Wait is the single terminal and must guarantee output completeness on return. Move the `<-drained` select into Execution.Wait right after the reap. Drop the duplicate selects from Cmd.Run and box.Exec — they already get the strong contract via Wait. Mirrors os/exec.Cmd.Wait's Process.Wait → awaitGoroutines sequence (src/os/exec/exec.go:989-995): one terminal, internally reaps then drains. The post-reap drain is non-cancelable by ctx (parity); only runtime shutdown breaks it, preserving the reap's exit code and overwriting err only if the reap had none. C FFI untouched. Drained signal still lives in executionStreamState.drained (closed by deliverExit when the C on_exit fires, which the C exit_pump gates on every stream pump's done_tx). Two-side verification (TestIntegrationExecStdoutRace via Cmd.Run): - Side A (Execution.Wait stripped to pure reap): 4/20 truncated, FAIL. - Side B (this commit): 20/20 complete, PASS. §5 no-regression suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After running the suite end-to-end: - test_detached_box_survives_cli_exit_and_is_reusable used `boxlite info <id>` for per-box state. The CLI's `info` is system- wide (version / runtime stats) — there's no per-box info command — so the call was always `unexpected argument` exit 2. Replaced step 3 with `boxlite ls`-row parsing + a state-keyword check. Marked xfail(strict=True) because step 3's exec hits the stdout-drop race that boxlite-ai#563 fixes (same as the reattach case below). - test_reattach_after_original_completes already broken by the same stdout race — short `echo X && exit 0` returns out=''. Marked xfail(strict=True) pointing at boxlite-ai#563. - test_readonly_volume_mount_flag_and_write_reject expected a host bind mount to surface inside the guest. The cloud runtime deliberately dropped host bind mounts (boxlite-ai#639 "remove host bind mounts; only managed volumes allowed"), so this case was testing a feature that doesn't exist by design. Flipped to the negative contract — REST callers passing host paths get them silently dropped, no /mnt/<x> ever surfaces. Pin: boxlite-ai#639's product direction. Net: 3 pass + 2 xfail(strict). When boxlite-ai#563 lands, both xfails flip xpass-strict and become regular passes (markers should be dropped at that point). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per review: the reap method introduced earlier added a private indirection that was only called from Wait. Inline its body so Wait has the full sequence in one place (boxlite_execution_wait kick, context/closing escape, then the stream-drain barrier) without adding a new method. Behaviour unchanged: ctx-cancel and runtime-closing during the process-wait still skip the drain barrier (same as before); successful exit still blocks on streamState.drained before returning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#710) add 5 e2e cases pinning REST contracts not covered today ## Coverage gaps Three classes of behaviour run through the SDK → API → Runner → VM chain but had no e2e pin on the chain itself; bugs in any one of them would silently regress through `make test:integration:*` (which only exercises local-FFI): ``` 1. CLI detach lifecycle — `boxlite run -d` returns, CLI process exits, does a fresh CLI process still see / exec the box? FFI side src/boxlite/tests/detach.rs, recovery.rs ✓ E2E side scripts/test/e2e/cases/ ✗ 2. Execution.attach / reattach contract — bogus exec_id should be a typed client error; reattach to a completed exec should return a usable handle. Runner side apps/runner/.../boxlite_exec_attach_test.go ✓ SDK <-> API end-to-end ✗ 3. host bind mount via REST — the cloud runtime intentionally dropped host bind mounts (#639); REST callers passing host paths must silently no-op, no /mnt/<x> in the guest. FFI surface src/boxlite/tests/mount_security.rs ✓ REST contract ✗ ``` ## Cases shipped ``` scripts/test/e2e/cases/test_cli_detach_recovery.py (2 cases) scripts/test/e2e/cases/test_exec_attach.py (2 cases) scripts/test/e2e/cases/test_volume_readonly.py (1 case) ``` 5 cases total. Author also dropped 6 cases from this branch's earlier incarnation that were already committed alongside their respective fix PRs (#686 / #688 / #689 / #691 / #692 / #696), so the diff is strictly net-new coverage. ## Test plan — run against current main Stack: local e2e, runner unchanged, no source edits. | Case | Result | Notes | |---|---|---| | `test_detached_box_exec_propagates_exit_code_on_fresh_cli` | ✅ PASS | exit-code passthrough across CLI processes | | `test_detached_box_survives_cli_exit_and_is_reusable` | ⚠️ XFAIL (strict) | reaches step 3 (`boxlite exec <id> echo still-alive`) then hits the stdout-drop race that #563 fixes — marker drops when #563 lands | | `test_attach_with_bogus_id_is_typed_error` | ✅ PASS | bogus exec_id → typed `Exception` (not 5xx, not silent) | | `test_reattach_after_original_completes` | ⚠️ XFAIL (strict) | same stdout-drop race (#563) on the original exec's `out=='first-output'` assertion | | `test_host_bind_mount_via_rest_is_silently_ignored` | ✅ PASS | the box created with `volumes=[(host_dir, "/mnt/ro", True)]` reports `MOUNT_LINE=<none>` from `/proc/mounts` and the host marker file is untouched — REST silently dropped the host path | The 2 XFAILs are tied to #563 (`fix(go-sdk): fold stream drain into Execution.Wait`). Once #563 merges, both xfails flip xpass-strict, the markers come off, and the suite is 5/5 green. No additional fix work is needed in this PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code)  ## Summary by CodeRabbit * **Tests** * Added end-to-end tests verifying CLI detach survival and box reusability across fresh CLI processes, including exit code propagation tests. * Added end-to-end tests for SDK reattach functionality, validating session state after execution completion. * Added end-to-end test confirming proper host bind mount handling during REST execution.  --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

G4614 marked this pull request as draft May 28, 2026 11:17

G4614 force-pushed the fix/exec-stdout-race branch from 9007541 to e647bf3 Compare May 28, 2026 11:42

G4614 marked this pull request as ready for review May 28, 2026 11:50

G4614 marked this pull request as draft May 29, 2026 06:53

This was referenced May 29, 2026

fix(rest): re-attach to drain backlog before emitting terminal result G4614/boxlite#3

Closed

fix(rest): re-attach to drain backlog before emitting terminal result #627

Open

G4614 marked this pull request as ready for review May 29, 2026 10:11

G4614 force-pushed the fix/exec-stdout-race branch from e647bf3 to 63f4adf Compare June 1, 2026 03:56

G4614 force-pushed the fix/exec-stdout-race branch from 63f4adf to 8a22353 Compare June 1, 2026 04:22

G4614 mentioned this pull request Jun 8, 2026

test(e2e): SDK→API→Runner→VM regression suite #678

Merged

DorianZheng enabled auto-merge June 8, 2026 14:17

G4614 mentioned this pull request Jun 9, 2026

fix(ffi): drain stdout/stderr pumps before Wait terminal event #685

Closed

auto-merge was automatically disabled June 9, 2026 05:13
Head branch was pushed to by a user without write access

G4614 force-pushed the fix/exec-stdout-race branch from 8a22353 to f11e469 Compare June 9, 2026 05:13

G4614 force-pushed the fix/exec-stdout-race branch 2 times, most recently from 7b672df to e9c884d Compare June 9, 2026 08:39

DorianZheng enabled auto-merge June 9, 2026 08:39

G4614 force-pushed the fix/exec-stdout-race branch from e9c884d to e048443 Compare June 9, 2026 09:46

G4614 force-pushed the fix/exec-stdout-race branch from e048443 to 6a87d30 Compare June 9, 2026 10:08

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

DorianZheng previously approved these changes Jun 9, 2026

View reviewed changes

Merge branch 'main' into fix/exec-stdout-race

8da266b

DorianZheng requested a review from a team as a code owner June 9, 2026 10:44

G4614 changed the title ~~fix(ffi): drain stream pumps before pushing Wait event~~ fix(go-sdk): drain at Cmd.Run / box.Exec, not Execution.Wait (StdoutPipe-style) Jun 9, 2026

DorianZheng closed this Jun 9, 2026

auto-merge was automatically disabled June 9, 2026 11:01
Pull request was closed

DorianZheng reopened this Jun 9, 2026

G4614 dismissed DorianZheng’s stale review via 8982265 June 9, 2026 11:04

G4614 enabled auto-merge June 9, 2026 11:11

G4614 changed the title ~~fix(go-sdk): drain at Cmd.Run / box.Exec, not Execution.Wait (StdoutPipe-style)~~ fix(go-sdk): fold stream drain into Execution.Wait (os/exec io.Writer case) Jun 9, 2026

G4614 requested a review from DorianZheng June 9, 2026 11:27

G4614 mentioned this pull request Jun 9, 2026

fix(ffi): drain stdout/stderr pumps before Wait terminal event #705

Closed

DorianZheng mentioned this pull request Jun 9, 2026

ci(codeql): switch to advanced setup so fork PRs are scanned #708

Merged

G4614 mentioned this pull request Jun 10, 2026

fix(ffi): drain stdout/stderr pumps before Wait terminal event (re-open of #685/#705) #714

Closed

G4614 mentioned this pull request Jun 10, 2026

test(e2e): add cli-detach-recovery, exec-attach, volume-readonly cases #710

Merged

DorianZheng approved these changes Jun 10, 2026

View reviewed changes

DorianZheng disabled auto-merge June 10, 2026 05:01

DorianZheng merged commit 2d6ae82 into boxlite-ai:main Jun 10, 2026
27 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(go-sdk): fold stream drain into Execution.Wait (os/exec io.Writer case)#563

fix(go-sdk): fold stream drain into Execution.Wait (os/exec io.Writer case)#563
DorianZheng merged 4 commits into
boxlite-ai:mainfrom
G4614:fix/exec-stdout-race

G4614 commented May 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Uh oh!

DorianZheng left a comment

Uh oh!

DorianZheng commented Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

G4614 commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug

Fix (Go-only, no new API, no C FFI changes)

Two-side verification

Supersedes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

DorianZheng left a comment

Choose a reason for hiding this comment

Uh oh!

DorianZheng commented Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

G4614 commented May 20, 2026 •

edited

Loading