test(e2e): Round 1 — error mapping, quota, runner concurrency (+ FFI drain fix) by G4614 · Pull Request #681 · boxlite-ai/boxlite

G4614 · 2026-06-09T03:39:05Z

Summary

First batch of converting single-machine SDK tests into e2e (SDK → API → Runner → VM) form. Adds 21 new e2e cases under sdks/python/tests/e2e/ exercising contracts that local-FFI tests can't reach. Cases that pin a known production bug are xfail(strict=True) with a file:line pointer in the reason, so a follow-up fix-PR gets a green test (xpass-strict trips the marker).

Also bundles two adjacent fixes the new tests surfaced:

runner — exec error mapping now classifies SDK Stopped / NotFound / InvalidState instead of leaking 500.
FFI — drains stdout/stderr pumps before pushing the Wait terminal event (race fix), with deterministic ordering tests + rustfmt.

Cases shipped (21 new)

File	Cases	xfail	Topic
`test_error_code_mapping.py`	10	4	`BoxliteError → HTTP` mapping (`src/shared/src/errors.rs:198-280`)
`test_quota_enforcement.py`	5	5	API create-boundary per-sandbox quota
`test_runner_concurrency.py`	6	1	Cross-process state-machine races

xfail (bugs pinned, will xpass-strict once fixed)

cpus=0 / memory_mib=-1 silently accepted — DTO @Min + ValidationPipe transform interaction (apps/api/src/main.ts:66-67)
exec /nonexistent → 500 instead of 422 — Rust spawn_failed misclassified as ErrInternal
cpus=999 silently clamped — no @Max, no quota lookup in createFromSnapshot (apps/api/src/sandbox/services/sandbox.service.ts)
5× quota cases — shared root cause (no per-sandbox quota enforcement at API create boundary)
exec-after-box-removed — 500 leak via runner's spawn-against-soft-deleted-box path (distinct from the Stopped→500 leak the runner fix in this PR closes)

Side fix

sdks/node/tests/e2e/e2e_basic.ts — import path from \"../..\" → \"../../lib/index.ts\". Old path went through sdks/node/package.json's main:dist/index.js which doesn't exist on dev checkouts (no `yarn build` in e2e bootstrap). tsx now loads .ts source directly, so test_node_entry.py passes without a separate Node build step.

Test plan

`make test:e2e` against local stack — 43 passed, 10 xfailed in 37.05s
FFI ordering tests run green (`test_drain_before_terminal*`)
Re-run after CI brings up its own stack
Follow-up PRs flip the 10 xfails to xpass-strict as bugs land

🤖 Generated with Claude Code

execution_wait's spawned task pushed its Wait event into the runtime queue as soon as wait_on_clone returned, racing the stdout/stderr pumps that may still have in-flight chunks to push. The wait gRPC reply can return ahead of the still-flushing attach stream, so the Go SDK's box.Exec observes wait completion and returns with an empty stdout buffer — callbacks fire moments later into a struct nobody is reading anymore. On this host: ~36–52% loss per fast command. Replace the per-stream oneshot Vec used only by exit_pump with a shared streams_pending: Arc<AtomicI32> + streams_done: Arc<Notify> that both exit_pump AND the wait task await before pushing their terminal events. register_stdout/register_stderr bump the counter pre-spawn; finish_stream notify_waiters on the 1→0 transition. execution_free clamps the counter to 0 and notifies after aborting stream pumps so any wait task parked in await_streams_drained doesn't deadlock on the user's Close(). TestSecurityReadonlyVolumeRemountBypass (Go SDK) now passes; the 50×4 race repro drops from 18–26/50 stdout loss to 0/50. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The test arglists for stdout_pump and stderr_pump exceeded max-width after threading streams_pending + streams_done through their signatures. rustfmt wants the block layout; no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

boxlite-ai#563 fixed the wait/exit terminal events racing the stdout/stderr pumps but shipped no regression test — the existing exec unit tests pass with or without the drain barrier, so a revert would go unnoticed in CI. Add two guards that pin the racy mid-state (a stream pump registered but not finished, pending=1) and assert the terminal task is gated on the drain, rather than reproducing the timing race: - exit_event_waits_for_pending_stdout_drain: drives the real exit_pump (empty execution → wait errors, but drain+push runs); asserts Exit is not queued while pending>0, then that stdout precedes Exit after finish_stream. - wait_event_waits_for_pending_stdout_drain: same for the inline wait task via boxlite_execution_wait (sync test — empty_handle owns a Runtime that can't be dropped in an async context; drives async via the handle's block_on). Verified each flips red when its await_streams_drained line (854 / 525) is removed, and green when restored. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR boxlite-ai#678 centralized e2e cases under scripts/test/e2e/cases/. Per maintainer feedback, each SDK's e2e cases should sit next to that SDK's source, so the cases stay with the code they regression-guard: sdks/python/tests/e2e/ — 11 Python-SDK cases + conftest.py sdks/c/tests/e2e/ — C-SDK subprocess driver + e2e_basic.c sdks/go/tests/e2e/ — Go-SDK subprocess driver + e2e_basic.go sdks/node/tests/e2e/ — Node-SDK subprocess driver + e2e_basic.ts src/cli/tests/e2e/ — CLI binary subprocess driver Stack-level infrastructure (bootstrap.sh, fixture_setup.py, teardown.sh, lib/ path_verification.py, pytest.ini, run.sh, two_sided.sh) stays at scripts/test/e2e/ — it's shared across all SDKs. testpaths in pytest.ini doesn't resolve outside its rootdir, so the runner (run.sh / `make test:e2e` / two_sided.sh) now passes each SDK's e2e dir to pytest explicitly. The pytest.ini still owns asyncio_mode / log_cli / markers. Each test's sys.path injection for `path_verification` was updated from `parent.parent / "lib"` to `parents[4] / "scripts/test/e2e/lib"`; the depth is uniform across all five new homes by design. .github/workflows/e2e-stack.yml path filters extended to cover the new test locations so PRs touching only test files still trigger the workflow. Two non-rename changes are bundled and called out here for review honesty: * `sdks/node/tests/e2e/e2e_basic.ts` — the original `from "../../../../../sdks/node"` (correct for the old location, 5 levels up) was rewritten to `from "../.."` (correct for the new sdks/node/tests/ e2e/ depth). Prettier reformat by `make lint:fix` is bundled. * `scripts/test/e2e/README.md` — meta-test description corrected from "tails /var/log/boxlite-api.log AND journalctl -u boxlite-runner" to "checks the runner journal only". The previous README overstated what `test_exec_reaches_runner_journal` actually does (it only asserts on the runner journal hit; the API-log read happens in api_log_seek/api_hits_for_box helpers but no current test calls them). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… mapping PR boxlite-ai#678's e2e suite caught this: POST /v1/{prefix}/boxes/{id}/exec returned HTTP 500 when the underlying Boxlite handle was Stopped (e.g. after runtime.close() or box.stop() raced the exec), instead of the canonical HTTP 409 from src/shared/src/errors.rs:198-280. Two parts: 1. boxlite_exec.go:77 — replace the hardcoded http.StatusInternalServerError in BoxliteExec's Start() error path with classifyExecError(err). Every other handler in this file (signal/resize/kill/status) already does this; POST /exec was the one that got missed. 2. classifyExecError — extend the switch to recognize the SDK's typed errors (sdks/go/errors.go) the Start() path can tunnel through: boxlite.IsNotFound -> 404 boxlite.IsStopped -> 409 boxlite.IsInvalidState -> 409 Local pkg sentinels (ErrExecNotFound, ErrExecClosed etc.) still match first via errors.Is on the original switch. Two-sided verified against sdks/python/tests/e2e/test_execution_shutdown.py ::test_exec_on_stopped_box_is_typed_error: revert the fix and the test FAILS with 'HTTP 500 Internal Server Error ... boxlite: stopped (code=11)' in the response body; restore the fix and it PASSES with 4xx + the typed code. Import alias `sdkboxlite` because the runner-local `github.com/boxlite-ai/runner/pkg/boxlite` already owns the bare `boxlite` identifier in this file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@min

Adds 21 new cases under sdks/python/tests/e2e/ exercising contracts that local-FFI tests can't reach. Cases that pin a known production bug are xfail(strict=True) with a file:line pointer in the reason so a follow-up fix-PR gets a green test (xpass-strict trips the marker). Files: * test_error_code_mapping.py — 10 cases pinning canonical BoxliteError → HTTP code mapping from src/shared/src/errors.rs:198-280. 4 cases xfail: - cpus=0 / memory_mib=-1 silently accepted (DTO @min + ValidationPipe transform=true interaction at apps/api/src/main.ts:66-67) - exec /nonexistent → 500 instead of 422 (Rust spawn_failed misclassifies as ErrInternal not ErrExecution) - cpus=999 silently clamped (no @max + no quota lookup in apps/api/src/sandbox/services/sandbox.service.ts::createFromSnapshot) * test_quota_enforcement.py — 5 cases, all xfail under module-level pytestmark with shared root cause (no per-sandbox quota enforcement at the API create boundary). * test_runner_concurrency.py — 6 cases for cross-process state machine races: two concurrent execs on same box (PASSES — proves no exec_manager map corruption), parallel box creates (PASSES), exec-after-box-removed (XFAIL — 500 leak via runner's spawn-against-soft-deleted-box path, distinct from the Stopped→500 leak the previous commit fixed), box-delete-during-running-exec (PASSES with the racy-400 path tolerated), many sequential execs (PASSES — guards the boxlite-ai#563-class regression on a different call shape than test_p0_6), double-stop is idempotent or typed-409 (PASSES — accepts the runner's "state change in progress" 400 as legitimate race protection). Side fix: * sdks/node/tests/e2e/e2e_basic.ts — change import from "../.." to "../../lib/index.ts". The "../.." form resolved through sdks/node/ package.json's main:dist/index.js, which doesn't exist on dev checkouts (no `yarn build` in the e2e bootstrap). tsx loads the .ts source directly via the new path, so test_node_entry.py now PASSES without requiring a separate Node build step. End-to-end results against the local stack (with the runner fix from the previous commit applied): 43 passed, 10 xfailed in 37.05s Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-09T03:39:13Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4fba0887-7f01-4213-a2e7-6e518e5e02b4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/, sdks/c/tests/, sdks/go/*_test.go). This PR moves the cases to match. Driver scripts stay in scripts/test/e2e/ (cross-SDK infra); test files now live next to the SDK they exercise: scripts/test/e2e/cases/test_c_entry.py → sdks/c/tests/e2e/ scripts/test/e2e/cases/test_go_entry.py → sdks/go/tests/e2e/ scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/ scripts/test/e2e/cases/test_cli_entry.py → src/cli/tests/e2e/ scripts/test/e2e/cases/test_*.py (Python pytest) → sdks/python/tests/e2e/ scripts/test/e2e/cases/conftest.py → sdks/python/tests/e2e/ scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} → sdks/{c,go,node}/tests/e2e/ Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md. Supporting wiring (no test-logic change): - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args (testpaths in pytest.ini can't resolve outside its rootdir) - pytest.ini: comment explaining the testpaths omission - .github/workflows/e2e-stack.yml: paths filter expanded to all new e2e locations - README.md: layout map updated Pure rename + supporting wiring. Zero new test logic, zero source changes. Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2 fixes; this split-out covers reorganize only — the rest ships in follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the existing local-FFI integration tests can't reach. All ship with `xfail(strict=True)` for the 10 known production bugs they pin: test_error_code_mapping.py 10 cases 4 xfails test_quota_enforcement.py 5 cases 5 xfails (module-level) test_runner_concurrency.py 6 cases 1 xfail -- -- 21 10 Each xfail's reason= string carries a `file:line` pointer to the bug location and a short root-cause note, so a follow-up fix-PR can grep for the right reason, drop the marker, and watch the test go green. `strict=True` means an unexpected xpass becomes a CI failure — i.e. the moment any of these bugs is silently fixed we'll notice. xfail coverage map (which PR removes which marker): test_exec_after_box_removed_is_typed_error → PR C (runner exec error mapping: IsNotFound → 404) test_invalid_argument_zero_cpu_returns_400 ) test_invalid_argument_negative_memory_returns_400 ) needs API test_oversized_cpu_returns_400 ) ValidationPipe 5x test_quota_enforcement.py ) + per-sandbox ) quota fix ) (separate ) follow-up) test_execution_invalid_command_returns_422 → needs Rust spawn-failed → ErrExecution mapping fix (separate follow-up; not in C or D) Split out of boxlite-ai#681 — this is **PR B of 4**: A (boxlite-ai#682) reorganize — merged before this rebases cleanly B (this) 21 new cases C runner exec error mapping fix D FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the underlying box was already gone / stopped / in a non-runnable state by the time the runner tried to spawn. Two related observable bugs: exec on a removed box → 500 "spawn_failed: build failed" exec on a stopped box → 500 (same shape) exec on box mid-rebuild → 500 (same shape) Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go` called `execManager.Start(...)` and unconditionally wrapped any non-nil error as 500. The SDK Start path already tunnels typed errors from the Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState) when the box state changed under the request — they just weren't being read. Fix: extend `classifyExecError` to peek for those SDK typed errors and return the canonical mapping from `src/shared/src/errors.rs:198-280`: IsNotFound → 404 Not Found IsStopped → 409 Conflict IsInvalidState → 409 Conflict (other / no SDK match) → 500 (unchanged) Also routes the `Start()` error site through `classifyExecError` instead of hard-coding 500. Flips the xfail on `test_exec_after_box_removed_is_typed_error` (sdks/python/tests/e2e/test_runner_concurrency.py) — that case now xpasses, which under strict=True trips the marker and fails CI; the fix here removes the marker so it lands green. Other related xfails in test_error_code_mapping.py are NOT addressed by this PR (they need DTO validation / quota / Rust spawn-failed reclassification — separate follow-ups). Split out of boxlite-ai#681 — this is **PR C of 4**: A (boxlite-ai#682) reorganize B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs) C (this) runner exec error mapping fix + flip 1 xfail D FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Go SDK's `box.Exec` would occasionally return with stdout chunks still in flight — the user's callback got `OnExit` (or the wait gRPC reply) before the matching `OnStdout`/`OnStderr` chunks landed on the queue. From the caller's perspective, the exec had finished but the last few bytes of stdout were missing. Root cause: `execution_wait` and `exit_pump` pushed their terminal events as soon as the underlying process exited, racing the still- draining stream pumps. The FFI queue is ordered, but the wait/exit events were entering it ahead of late stdout chunks. Fix: track a `streams_pending` count on `ExecutionHandle` instead of the previous per-pump receiver list. Both `exit_pump` and the `execution_wait` task now await `streams_pending → 0` (via a `tokio::sync::Notify` notification + standard register-then-check pattern to avoid post-notification miss) before pushing their terminal events. Deterministic ordering tests added inline in execution.rs cover: - stdout-only exec: all chunks land before Exit - mixed stdout/stderr: every chunk lands before Wait - process exit before pump completion: terminal event waits - pump completion before process exit: terminal event fires immediately (no spurious wait) Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race` in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from ~90% loss against stock 0.9.5 to 0% loss against this PR. That test is NOT in the PR-B xfail set (it was pre-existing), so no markers need flipping. Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are NOT addressed here — separate root causes, separate follow-ups. Split out of boxlite-ai#681 — this is **PR D of 4**: A (boxlite-ai#682) reorganize B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs) C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail D (this) FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/, sdks/c/tests/, sdks/go/*_test.go). This PR moves the cases to match. Driver scripts stay in scripts/test/e2e/ (cross-SDK infra); test files now live next to the SDK they exercise: scripts/test/e2e/cases/test_c_entry.py → sdks/c/tests/e2e/ scripts/test/e2e/cases/test_go_entry.py → sdks/go/tests/e2e/ scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/ scripts/test/e2e/cases/test_cli_entry.py → src/cli/tests/e2e/ scripts/test/e2e/cases/test_*.py (Python pytest) → sdks/python/tests/e2e/ scripts/test/e2e/cases/conftest.py → sdks/python/tests/e2e/ scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} → sdks/{c,go,node}/tests/e2e/ Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md. Supporting wiring (forced by the renames): - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args (testpaths in pytest.ini can't resolve outside its rootdir) - pytest.ini: comment explaining the testpaths omission - .github/workflows/e2e-stack.yml: paths filter expanded to all new e2e locations - README.md: layout map updated Two collisions the rename exposed (both fixed in this PR): 1. SDK unit-test pytest workflow auto-discovers sdks/python/tests/ and would import the new e2e/ subdir, whose conftest needs Python 3.11+ tomllib and a live API stack the unit-test workflow doesn't provide. Fix: norecursedirs = e2e in sdks/python/pytest.ini. The e2e runner passes the dir explicitly on the CLI which bypasses norecursedirs. 2. pytest's default --import-mode=prepend puts each test file's rootpath on sys.path. For sdks/python/tests/e2e/test_*.py the rootpath becomes sdks/python/, which then shadows the installed boxlite wheel with the source-tree stub package (sdks/python/boxlite/__init__.py — missing the compiled .so so Rust-bound symbols like BoxliteRestOptions are absent). Every fixture using boxlite.BoxliteRestOptions failed at setup with AttributeError. Fix: --import-mode=importlib in scripts/test/e2e/pytest.ini (modern mode, no sys.path injection). Tests that previously did `from conftest import drain` couldn't keep doing that under importlib mode (conftest no longer on sys.path) — extracted drain/collect_stream/ stdout_line_count from conftest into scripts/test/e2e/lib/e2e_helpers.py (alongside path_verification which already used the same lib/ + sys.path.insert pattern), and updated the 8 test files' imports. Verified locally on Ubuntu 22.04 + Python 3.11 + maturin-built boxlite wheel — make test:e2e against PR-A: 21 passed, 7 failed (the 7 are real production bugs the suite is designed to catch: PR-C fixes runner exec error mapping, PR-D fixes FFI drain race). Before this PR's collision fix: 0 passed, 32 fixture-setup errors. Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2 fixes; this split-out covers reorganize only — the rest ships in follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the existing local-FFI integration tests can't reach. All ship with `xfail(strict=True)` for the 10 known production bugs they pin: test_error_code_mapping.py 10 cases 4 xfails test_quota_enforcement.py 5 cases 5 xfails (module-level) test_runner_concurrency.py 6 cases 1 xfail -- -- 21 10 Each xfail's reason= string carries a `file:line` pointer to the bug location and a short root-cause note, so a follow-up fix-PR can grep for the right reason, drop the marker, and watch the test go green. `strict=True` means an unexpected xpass becomes a CI failure — i.e. the moment any of these bugs is silently fixed we'll notice. xfail coverage map (which PR removes which marker): test_exec_after_box_removed_is_typed_error → PR C (runner exec error mapping: IsNotFound → 404) test_invalid_argument_zero_cpu_returns_400 ) test_invalid_argument_negative_memory_returns_400 ) needs API test_oversized_cpu_returns_400 ) ValidationPipe 5x test_quota_enforcement.py ) + per-sandbox ) quota fix ) (separate ) follow-up) test_execution_invalid_command_returns_422 → needs Rust spawn-failed → ErrExecution mapping fix (separate follow-up; not in C or D) Split out of boxlite-ai#681 — this is **PR B of 4**: A (boxlite-ai#682) reorganize — merged before this rebases cleanly B (this) 21 new cases C runner exec error mapping fix D FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the underlying box was already gone / stopped / in a non-runnable state by the time the runner tried to spawn. Two related observable bugs: exec on a removed box → 500 "spawn_failed: build failed" exec on a stopped box → 500 (same shape) exec on box mid-rebuild → 500 (same shape) Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go` called `execManager.Start(...)` and unconditionally wrapped any non-nil error as 500. The SDK Start path already tunnels typed errors from the Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState) when the box state changed under the request — they just weren't being read. Fix: extend `classifyExecError` to peek for those SDK typed errors and return the canonical mapping from `src/shared/src/errors.rs:198-280`: IsNotFound → 404 Not Found IsStopped → 409 Conflict IsInvalidState → 409 Conflict (other / no SDK match) → 500 (unchanged) Also routes the `Start()` error site through `classifyExecError` instead of hard-coding 500. Flips the xfail on `test_exec_after_box_removed_is_typed_error` (sdks/python/tests/e2e/test_runner_concurrency.py) — that case now xpasses, which under strict=True trips the marker and fails CI; the fix here removes the marker so it lands green. Other related xfails in test_error_code_mapping.py are NOT addressed by this PR (they need DTO validation / quota / Rust spawn-failed reclassification — separate follow-ups). Split out of boxlite-ai#681 — this is **PR C of 4**: A (boxlite-ai#682) reorganize B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs) C (this) runner exec error mapping fix + flip 1 xfail D FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Go SDK's `box.Exec` would occasionally return with stdout chunks still in flight — the user's callback got `OnExit` (or the wait gRPC reply) before the matching `OnStdout`/`OnStderr` chunks landed on the queue. From the caller's perspective, the exec had finished but the last few bytes of stdout were missing. Root cause: `execution_wait` and `exit_pump` pushed their terminal events as soon as the underlying process exited, racing the still- draining stream pumps. The FFI queue is ordered, but the wait/exit events were entering it ahead of late stdout chunks. Fix: track a `streams_pending` count on `ExecutionHandle` instead of the previous per-pump receiver list. Both `exit_pump` and the `execution_wait` task now await `streams_pending → 0` (via a `tokio::sync::Notify` notification + standard register-then-check pattern to avoid post-notification miss) before pushing their terminal events. Deterministic ordering tests added inline in execution.rs cover: - stdout-only exec: all chunks land before Exit - mixed stdout/stderr: every chunk lands before Wait - process exit before pump completion: terminal event waits - pump completion before process exit: terminal event fires immediately (no spurious wait) Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race` in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from ~90% loss against stock 0.9.5 to 0% loss against this PR. That test is NOT in the PR-B xfail set (it was pre-existing), so no markers need flipping. Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are NOT addressed here — separate root causes, separate follow-ups. Split out of boxlite-ai#681 — this is **PR D of 4**: A (boxlite-ai#682) reorganize B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs) C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail D (this) FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

G4614 · 2026-06-09T05:54:23Z

Closing in favor of the 4-PR split: #682 → #683 → #684 → #685. Same content, cleaner per-layer review surface.

G4614 and others added 6 commits June 8, 2026 14:25

G4614 mentioned this pull request Jun 9, 2026

fix(rest): classify exec spawn failures as 4xx, not 5xx #690

Open

G4614 closed this Jun 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(e2e): Round 1 — error mapping, quota, runner concurrency (+ FFI drain fix)#681

test(e2e): Round 1 — error mapping, quota, runner concurrency (+ FFI drain fix)#681
G4614 wants to merge 6 commits into
boxlite-ai:mainfrom
G4614:test/e2e-reorganize

G4614 commented Jun 9, 2026

Uh oh!

coderabbitai Bot commented Jun 9, 2026

Review skipped

Uh oh!

G4614 commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

G4614 commented Jun 9, 2026

Summary

Cases shipped (21 new)

xfail (bugs pinned, will xpass-strict once fixed)

Side fix

Test plan

Uh oh!

coderabbitai Bot commented Jun 9, 2026

Review skipped

Uh oh!

G4614 commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant