Skip to content

test(e2e): Round 1 — error mapping, quota, runner concurrency (+ FFI drain fix)#681

Closed
G4614 wants to merge 6 commits into
boxlite-ai:mainfrom
G4614:test/e2e-reorganize
Closed

test(e2e): Round 1 — error mapping, quota, runner concurrency (+ FFI drain fix)#681
G4614 wants to merge 6 commits into
boxlite-ai:mainfrom
G4614:test/e2e-reorganize

Conversation

@G4614

@G4614 G4614 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

First batch of converting single-machine SDK tests into e2e (SDK → API → Runner → VM) form. Adds 21 new e2e cases under sdks/python/tests/e2e/ exercising contracts that local-FFI tests can't reach. Cases that pin a known production bug are xfail(strict=True) with a file:line pointer in the reason, so a follow-up fix-PR gets a green test (xpass-strict trips the marker).

Also bundles two adjacent fixes the new tests surfaced:

  • runner — exec error mapping now classifies SDK Stopped / NotFound / InvalidState instead of leaking 500.
  • FFI — drains stdout/stderr pumps before pushing the Wait terminal event (race fix), with deterministic ordering tests + rustfmt.

Cases shipped (21 new)

File Cases xfail Topic
test_error_code_mapping.py 10 4 BoxliteError → HTTP mapping (src/shared/src/errors.rs:198-280)
test_quota_enforcement.py 5 5 API create-boundary per-sandbox quota
test_runner_concurrency.py 6 1 Cross-process state-machine races

xfail (bugs pinned, will xpass-strict once fixed)

  • cpus=0 / memory_mib=-1 silently accepted — DTO @Min + ValidationPipe transform interaction (apps/api/src/main.ts:66-67)
  • exec /nonexistent → 500 instead of 422 — Rust spawn_failed misclassified as ErrInternal
  • cpus=999 silently clamped — no @Max, no quota lookup in createFromSnapshot (apps/api/src/sandbox/services/sandbox.service.ts)
  • 5× quota cases — shared root cause (no per-sandbox quota enforcement at API create boundary)
  • exec-after-box-removed — 500 leak via runner's spawn-against-soft-deleted-box path (distinct from the Stopped→500 leak the runner fix in this PR closes)

Side fix

  • sdks/node/tests/e2e/e2e_basic.ts — import path from \"../..\"\"../../lib/index.ts\". Old path went through sdks/node/package.json's main:dist/index.js which doesn't exist on dev checkouts (no `yarn build` in e2e bootstrap). tsx now loads .ts source directly, so test_node_entry.py passes without a separate Node build step.

Test plan

  • `make test:e2e` against local stack — 43 passed, 10 xfailed in 37.05s
  • FFI ordering tests run green (`test_drain_before_terminal*`)
  • Re-run after CI brings up its own stack
  • Follow-up PRs flip the 10 xfails to xpass-strict as bugs land

🤖 Generated with Claude Code

G4614 and others added 6 commits June 8, 2026 14:25
execution_wait's spawned task pushed its Wait event into the runtime
queue as soon as wait_on_clone returned, racing the stdout/stderr
pumps that may still have in-flight chunks to push. The wait gRPC
reply can return ahead of the still-flushing attach stream, so the
Go SDK's box.Exec observes wait completion and returns with an empty
stdout buffer — callbacks fire moments later into a struct nobody
is reading anymore. On this host: ~36–52% loss per fast command.

Replace the per-stream oneshot Vec used only by exit_pump with a
shared streams_pending: Arc<AtomicI32> + streams_done: Arc<Notify>
that both exit_pump AND the wait task await before pushing their
terminal events. register_stdout/register_stderr bump the counter
pre-spawn; finish_stream notify_waiters on the 1→0 transition.
execution_free clamps the counter to 0 and notifies after aborting
stream pumps so any wait task parked in await_streams_drained
doesn't deadlock on the user's Close().

TestSecurityReadonlyVolumeRemountBypass (Go SDK) now passes; the
50×4 race repro drops from 18–26/50 stdout loss to 0/50.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test arglists for stdout_pump and stderr_pump exceeded
max-width after threading streams_pending + streams_done through
their signatures. rustfmt wants the block layout; no behaviour
change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
boxlite-ai#563 fixed the wait/exit terminal events racing the stdout/stderr pumps but
shipped no regression test — the existing exec unit tests pass with or without
the drain barrier, so a revert would go unnoticed in CI.

Add two guards that pin the racy mid-state (a stream pump registered but not
finished, pending=1) and assert the terminal task is gated on the drain,
rather than reproducing the timing race:

- exit_event_waits_for_pending_stdout_drain: drives the real exit_pump (empty
  execution → wait errors, but drain+push runs); asserts Exit is not queued
  while pending>0, then that stdout precedes Exit after finish_stream.
- wait_event_waits_for_pending_stdout_drain: same for the inline wait task via
  boxlite_execution_wait (sync test — empty_handle owns a Runtime that can't be
  dropped in an async context; drives async via the handle's block_on).

Verified each flips red when its await_streams_drained line (854 / 525) is
removed, and green when restored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR boxlite-ai#678 centralized e2e cases under scripts/test/e2e/cases/. Per maintainer
feedback, each SDK's e2e cases should sit next to that SDK's source, so the
cases stay with the code they regression-guard:

  sdks/python/tests/e2e/   — 11 Python-SDK cases + conftest.py
  sdks/c/tests/e2e/        — C-SDK subprocess driver + e2e_basic.c
  sdks/go/tests/e2e/       — Go-SDK subprocess driver + e2e_basic.go
  sdks/node/tests/e2e/     — Node-SDK subprocess driver + e2e_basic.ts
  src/cli/tests/e2e/       — CLI binary subprocess driver

Stack-level infrastructure (bootstrap.sh, fixture_setup.py, teardown.sh, lib/
path_verification.py, pytest.ini, run.sh, two_sided.sh) stays at
scripts/test/e2e/ — it's shared across all SDKs.

testpaths in pytest.ini doesn't resolve outside its rootdir, so the runner
(run.sh / `make test:e2e` / two_sided.sh) now passes each SDK's e2e dir to
pytest explicitly. The pytest.ini still owns asyncio_mode / log_cli / markers.

Each test's sys.path injection for `path_verification` was updated from
`parent.parent / "lib"` to `parents[4] / "scripts/test/e2e/lib"`; the depth
is uniform across all five new homes by design.

.github/workflows/e2e-stack.yml path filters extended to cover the new test
locations so PRs touching only test files still trigger the workflow.

Two non-rename changes are bundled and called out here for review honesty:

* `sdks/node/tests/e2e/e2e_basic.ts` — the original
  `from "../../../../../sdks/node"` (correct for the old location, 5 levels
  up) was rewritten to `from "../.."` (correct for the new sdks/node/tests/
  e2e/ depth). Prettier reformat by `make lint:fix` is bundled.

* `scripts/test/e2e/README.md` — meta-test description corrected from
  "tails /var/log/boxlite-api.log AND journalctl -u boxlite-runner" to
  "checks the runner journal only". The previous README overstated what
  `test_exec_reaches_runner_journal` actually does (it only asserts on
  the runner journal hit; the API-log read happens in
  api_log_seek/api_hits_for_box helpers but no current test calls them).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… mapping

PR boxlite-ai#678's e2e suite caught this: POST /v1/{prefix}/boxes/{id}/exec returned
HTTP 500 when the underlying Boxlite handle was Stopped (e.g. after
runtime.close() or box.stop() raced the exec), instead of the canonical
HTTP 409 from src/shared/src/errors.rs:198-280.

Two parts:

1. boxlite_exec.go:77 — replace the hardcoded http.StatusInternalServerError
   in BoxliteExec's Start() error path with classifyExecError(err). Every
   other handler in this file (signal/resize/kill/status) already does this;
   POST /exec was the one that got missed.

2. classifyExecError — extend the switch to recognize the SDK's typed
   errors (sdks/go/errors.go) the Start() path can tunnel through:

     boxlite.IsNotFound       -> 404
     boxlite.IsStopped        -> 409
     boxlite.IsInvalidState   -> 409

   Local pkg sentinels (ErrExecNotFound, ErrExecClosed etc.) still match
   first via errors.Is on the original switch.

Two-sided verified against sdks/python/tests/e2e/test_execution_shutdown.py
::test_exec_on_stopped_box_is_typed_error: revert the fix and the test FAILS
with 'HTTP 500 Internal Server Error ... boxlite: stopped (code=11)' in the
response body; restore the fix and it PASSES with 4xx + the typed code.

Import alias `sdkboxlite` because the runner-local
`github.com/boxlite-ai/runner/pkg/boxlite` already owns the bare `boxlite`
identifier in this file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds 21 new cases under sdks/python/tests/e2e/ exercising contracts that
local-FFI tests can't reach. Cases that pin a known production bug are
xfail(strict=True) with a file:line pointer in the reason so a follow-up
fix-PR gets a green test (xpass-strict trips the marker).

Files:

* test_error_code_mapping.py — 10 cases pinning canonical
  BoxliteError → HTTP code mapping from src/shared/src/errors.rs:198-280.
  4 cases xfail:
    - cpus=0 / memory_mib=-1 silently accepted (DTO @min + ValidationPipe
      transform=true interaction at apps/api/src/main.ts:66-67)
    - exec /nonexistent → 500 instead of 422 (Rust spawn_failed misclassifies
      as ErrInternal not ErrExecution)
    - cpus=999 silently clamped (no @max + no quota lookup in
      apps/api/src/sandbox/services/sandbox.service.ts::createFromSnapshot)

* test_quota_enforcement.py — 5 cases, all xfail under module-level
  pytestmark with shared root cause (no per-sandbox quota enforcement at
  the API create boundary).

* test_runner_concurrency.py — 6 cases for cross-process state machine
  races: two concurrent execs on same box (PASSES — proves no exec_manager
  map corruption), parallel box creates (PASSES), exec-after-box-removed
  (XFAIL — 500 leak via runner's spawn-against-soft-deleted-box path,
  distinct from the Stopped→500 leak the previous commit fixed),
  box-delete-during-running-exec (PASSES with the racy-400 path tolerated),
  many sequential execs (PASSES — guards the boxlite-ai#563-class regression
  on a different call shape than test_p0_6), double-stop is idempotent
  or typed-409 (PASSES — accepts the runner's "state change in progress"
  400 as legitimate race protection).

Side fix:

* sdks/node/tests/e2e/e2e_basic.ts — change import from "../.." to
  "../../lib/index.ts". The "../.." form resolved through sdks/node/
  package.json's main:dist/index.js, which doesn't exist on dev checkouts
  (no `yarn build` in the e2e bootstrap). tsx loads the .ts source
  directly via the new path, so test_node_entry.py now PASSES without
  requiring a separate Node build step.

End-to-end results against the local stack (with the runner fix from the
previous commit applied):
  43 passed, 10 xfailed in 37.05s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4fba0887-7f01-4213-a2e7-6e518e5e02b4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a
fresh location that doesn't match the repo's existing per-SDK test
layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/,
sdks/c/tests/, sdks/go/*_test.go).

This PR moves the cases to match. Driver scripts stay in
scripts/test/e2e/ (cross-SDK infra); test files now live next to
the SDK they exercise:

  scripts/test/e2e/cases/test_c_entry.py    → sdks/c/tests/e2e/
  scripts/test/e2e/cases/test_go_entry.py   → sdks/go/tests/e2e/
  scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/
  scripts/test/e2e/cases/test_cli_entry.py  → src/cli/tests/e2e/
  scripts/test/e2e/cases/test_*.py (Python pytest)
                                            → sdks/python/tests/e2e/
  scripts/test/e2e/cases/conftest.py        → sdks/python/tests/e2e/
  scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts}
                                            → sdks/{c,go,node}/tests/e2e/

Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh,
two_sided.sh, fixture_setup.py, pytest.ini, README.md.

Supporting wiring (no test-logic change):
  - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args
    (testpaths in pytest.ini can't resolve outside its rootdir)
  - pytest.ini: comment explaining the testpaths omission
  - .github/workflows/e2e-stack.yml: paths filter expanded to all
    new e2e locations
  - README.md: layout map updated

Pure rename + supporting wiring. Zero new test logic, zero source
changes. Originally part of boxlite-ai#681 which also bundled 21 new test
cases + 2 fixes; this split-out covers reorganize only — the rest
ships in follow-up PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:

  test_error_code_mapping.py  10 cases  4 xfails
  test_quota_enforcement.py    5 cases  5 xfails (module-level)
  test_runner_concurrency.py   6 cases  1 xfail
                              --       --
                              21       10

Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.

xfail coverage map (which PR removes which marker):

  test_exec_after_box_removed_is_typed_error
    → PR C (runner exec error mapping: IsNotFound → 404)

  test_invalid_argument_zero_cpu_returns_400          )
  test_invalid_argument_negative_memory_returns_400   ) needs API
  test_oversized_cpu_returns_400                      ) ValidationPipe
  5x test_quota_enforcement.py                        ) + per-sandbox
                                                       ) quota fix
                                                       ) (separate
                                                       ) follow-up)

  test_execution_invalid_command_returns_422
    → needs Rust spawn-failed → ErrExecution mapping fix
      (separate follow-up; not in C or D)

Split out of boxlite-ai#681 — this is **PR B of 4**:
  A (boxlite-ai#682) reorganize     — merged before this rebases cleanly
  B (this)  21 new cases
  C         runner exec error mapping fix
  D         FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:

  exec on a removed box   → 500 "spawn_failed: build failed"
  exec on a stopped box   → 500 (same shape)
  exec on box mid-rebuild → 500 (same shape)

Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.

Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:

  IsNotFound       → 404 Not Found
  IsStopped        → 409 Conflict
  IsInvalidState   → 409 Conflict
  (other / no SDK match) → 500 (unchanged)

Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.

Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.

Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).

Split out of boxlite-ai#681 — this is **PR C of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (this) runner exec error mapping fix + flip 1 xfail
  D        FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.

Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.

Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.

Deterministic ordering tests added inline in execution.rs cover:

  - stdout-only exec: all chunks land before Exit
  - mixed stdout/stderr: every chunk lands before Wait
  - process exit before pump completion: terminal event waits
  - pump completion before process exit: terminal event fires
    immediately (no spurious wait)

Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.

Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.

Split out of boxlite-ai#681 — this is **PR D of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
  D (this) FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a
fresh location that doesn't match the repo's existing per-SDK test
layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/,
sdks/c/tests/, sdks/go/*_test.go).

This PR moves the cases to match. Driver scripts stay in
scripts/test/e2e/ (cross-SDK infra); test files now live next to
the SDK they exercise:

  scripts/test/e2e/cases/test_c_entry.py    → sdks/c/tests/e2e/
  scripts/test/e2e/cases/test_go_entry.py   → sdks/go/tests/e2e/
  scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/
  scripts/test/e2e/cases/test_cli_entry.py  → src/cli/tests/e2e/
  scripts/test/e2e/cases/test_*.py (Python pytest)
                                            → sdks/python/tests/e2e/
  scripts/test/e2e/cases/conftest.py        → sdks/python/tests/e2e/
  scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts}
                                            → sdks/{c,go,node}/tests/e2e/

Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh,
two_sided.sh, fixture_setup.py, pytest.ini, README.md.

Supporting wiring (forced by the renames):
  - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args
    (testpaths in pytest.ini can't resolve outside its rootdir)
  - pytest.ini: comment explaining the testpaths omission
  - .github/workflows/e2e-stack.yml: paths filter expanded to all
    new e2e locations
  - README.md: layout map updated

Two collisions the rename exposed (both fixed in this PR):

  1. SDK unit-test pytest workflow auto-discovers sdks/python/tests/
     and would import the new e2e/ subdir, whose conftest needs
     Python 3.11+ tomllib and a live API stack the unit-test
     workflow doesn't provide. Fix: norecursedirs = e2e in
     sdks/python/pytest.ini. The e2e runner passes the dir
     explicitly on the CLI which bypasses norecursedirs.

  2. pytest's default --import-mode=prepend puts each test file's
     rootpath on sys.path. For sdks/python/tests/e2e/test_*.py the
     rootpath becomes sdks/python/, which then shadows the installed
     boxlite wheel with the source-tree stub package
     (sdks/python/boxlite/__init__.py — missing the compiled .so so
     Rust-bound symbols like BoxliteRestOptions are absent). Every
     fixture using boxlite.BoxliteRestOptions failed at setup with
     AttributeError. Fix: --import-mode=importlib in
     scripts/test/e2e/pytest.ini (modern mode, no sys.path
     injection). Tests that previously did `from conftest import
     drain` couldn't keep doing that under importlib mode (conftest
     no longer on sys.path) — extracted drain/collect_stream/
     stdout_line_count from conftest into
     scripts/test/e2e/lib/e2e_helpers.py (alongside path_verification
     which already used the same lib/ + sys.path.insert pattern),
     and updated the 8 test files' imports.

Verified locally on Ubuntu 22.04 + Python 3.11 + maturin-built
boxlite wheel — make test:e2e against PR-A: 21 passed, 7 failed
(the 7 are real production bugs the suite is designed to catch:
PR-C fixes runner exec error mapping, PR-D fixes FFI drain race).
Before this PR's collision fix: 0 passed, 32 fixture-setup errors.

Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2
fixes; this split-out covers reorganize only — the rest ships in
follow-up PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:

  test_error_code_mapping.py  10 cases  4 xfails
  test_quota_enforcement.py    5 cases  5 xfails (module-level)
  test_runner_concurrency.py   6 cases  1 xfail
                              --       --
                              21       10

Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.

xfail coverage map (which PR removes which marker):

  test_exec_after_box_removed_is_typed_error
    → PR C (runner exec error mapping: IsNotFound → 404)

  test_invalid_argument_zero_cpu_returns_400          )
  test_invalid_argument_negative_memory_returns_400   ) needs API
  test_oversized_cpu_returns_400                      ) ValidationPipe
  5x test_quota_enforcement.py                        ) + per-sandbox
                                                       ) quota fix
                                                       ) (separate
                                                       ) follow-up)

  test_execution_invalid_command_returns_422
    → needs Rust spawn-failed → ErrExecution mapping fix
      (separate follow-up; not in C or D)

Split out of boxlite-ai#681 — this is **PR B of 4**:
  A (boxlite-ai#682) reorganize     — merged before this rebases cleanly
  B (this)  21 new cases
  C         runner exec error mapping fix
  D         FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:

  exec on a removed box   → 500 "spawn_failed: build failed"
  exec on a stopped box   → 500 (same shape)
  exec on box mid-rebuild → 500 (same shape)

Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.

Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:

  IsNotFound       → 404 Not Found
  IsStopped        → 409 Conflict
  IsInvalidState   → 409 Conflict
  (other / no SDK match) → 500 (unchanged)

Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.

Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.

Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).

Split out of boxlite-ai#681 — this is **PR C of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (this) runner exec error mapping fix + flip 1 xfail
  D        FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.

Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.

Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.

Deterministic ordering tests added inline in execution.rs cover:

  - stdout-only exec: all chunks land before Exit
  - mixed stdout/stderr: every chunk lands before Wait
  - process exit before pump completion: terminal event waits
  - pump completion before process exit: terminal event fires
    immediately (no spurious wait)

Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.

Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.

Split out of boxlite-ai#681 — this is **PR D of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
  D (this) FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614

G4614 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Closing in favor of the 4-PR split: #682#683#684#685. Same content, cleaner per-layer review surface.

@G4614 G4614 closed this Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant