test(e2e): move test cases to per-SDK tests/ folders by G4614 · Pull Request #682 · boxlite-ai/boxlite

G4614 · 2026-06-09T03:58:01Z

move test files to different SDK dirs as previous

What

PR #678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout. This PR moves them so each SDK's e2e tests live next to that SDK's other tests:

from	to
`scripts/test/e2e/cases/test_c_entry.py`	`sdks/c/tests/e2e/`
`scripts/test/e2e/cases/test_go_entry.py`	`sdks/go/tests/e2e/`
`scripts/test/e2e/cases/test_node_entry.py`	`sdks/node/tests/e2e/`
`scripts/test/e2e/cases/test_cli_entry.py`	`src/cli/tests/e2e/`
`scripts/test/e2e/cases/test_*.py` (Python pytest)	`sdks/python/tests/e2e/`
`scripts/test/e2e/cases/conftest.py`	`sdks/python/tests/e2e/`
`scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts}`	`sdks/{c,go,node}/tests/e2e/`

Stays put in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md (cross-SDK orchestration, not per-SDK).

Supporting wiring (forced by the renames)

scripts/test/e2e/run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args. testpaths in pytest.ini can't resolve outside its rootdir.
scripts/test/e2e/pytest.ini: comment explaining the testpaths omission.
.github/workflows/e2e-stack.yml: paths: filter expanded to all new e2e locations (sdks/{c,go,node,python}/tests/e2e/**, src/cli/tests/e2e/**).
scripts/test/e2e/README.md: layout map updated.

Stack

	what	this PR
A	reorganize only (rename + wiring)	← you are here
B	21 new e2e cases (pin known bugs via `xfail(strict=True)`)	depends on A
C	runner exec error mapping fix + flip relevant B xfails	depends on B
D	FFI exec drain race fix + flip relevant B xfails	depends on B

Splitting #681 into 4 because the original bundled rename + new tests + 2 source fixes into one commit — hard to review, hard to revert any single piece without unwinding the rest.

Test plan

Result against local stack with the full A→D stack applied:

make test:e2e — 43 passed, 10 xfailed in 37.05s
FFI ordering tests run green (test_drain_before_terminal*)

Breakdown of the 43p/10xf (covers all four PRs in the stack):

File	Cases	xfail	Topic	Added by
`test_error_code_mapping.py`	10	4	`BoxliteError → HTTP` mapping (`src/shared/src/errors.rs:198-280`)	B (#683)
`test_quota_enforcement.py`	5	5	API create-boundary per-sandbox quota	B (#683)
`test_runner_concurrency.py`	6	1	Cross-process state-machine races	B (#683)
all existing relocated cases	22	0	unchanged behavior, just new paths	A (this PR)

xfail (bugs pinned, will xpass-strict once fixed):

cpus=0 / memory_mib=-1 silently accepted — DTO @Min + ValidationPipe transform interaction (apps/api/src/main.ts:66-67)
exec /nonexistent → 500 instead of 422 — Rust spawn_failed misclassified as ErrInternal
cpus=999 silently clamped — no @Max, no quota lookup in createFromSnapshot (apps/api/src/sandbox/services/sandbox.service.ts)
5× quota cases — shared root cause (no per-sandbox quota enforcement at API create boundary)
exec-after-box-removed — 500 leak via runner's spawn-against-soft-deleted-box path (distinct from the Stopped→500 leak C closes)

Known caveat

The pytest.ini-in-scripts/ + tests-in-sdks/ layout means cd sdks/python/tests/e2e && pytest (without going through run.sh) won't read the pytest.ini (asyncio_mode, markers, etc.). Always invoke via make test:e2e or scripts/test/e2e/run.sh. Cleaner would be to move pytest.ini to repo root so its rootdir matches — left as follow-up.

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Reorganized E2E suite to run per-SDK end-to-end tests (C, Go, Node, Python, CLI) and updated the unified test runner.
- Introduced shared E2E helper utilities used across SDKs and adjusted tests to import them.
CI/Chores
- Expanded CI triggers so E2E workflows run for changes across SDK and test directories.
Documentation
- Updated E2E README with new layout, running instructions, and fixture/setup notes.

coderabbitai · 2026-06-09T03:58:10Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 98d981a4-b4d6-47d7-93cb-d068b6937bb8

📥 Commits

Reviewing files that changed from the base of the PR and between 3f7f1f0 and 23ed4b8.

📒 Files selected for processing (1)

scripts/test/e2e/lib/e2e_helpers.py

📝 Walkthrough

Walkthrough

This PR centralizes shared E2E test utilities, switches the E2E runner to explicitly invoke per-SDK sdks/*/tests/e2e directories, updates pytest import behavior to avoid source-wheel shadowing, rewires SDK/CLI test entry imports to the centralized lib, and expands CI path filters to trigger on SDK e2e/source changes.

Changes

E2E Test Suite Infrastructure Reorganization

Layer / File(s)	Summary
E2E Helper Module Creation and Extraction `scripts/test/e2e/lib/e2e_helpers.py`, `scripts/test/e2e/pytest.ini`, `sdks/python/tests/e2e/conftest.py`	Creates shared utilities module with `collect_stream`, `drain`, and `stdout_line_count`; sets pytest `--import-mode=importlib`; removes local helper implementations from conftest and imports shared helpers from centralized lib.
Test Runner and Orchestration Refactoring `make/test.mk`, `scripts/test/e2e/run.sh`, `scripts/test/e2e/two_sided.sh`, `.github/workflows/e2e-stack.yml`, `scripts/test/e2e/README.md`	Updates `test:e2e` to call `run.sh`; `run.sh` runs `python3 -m pytest` with explicit per-SDK `tests/e2e` paths; two-sided harness uses `run.sh`; expands GitHub Actions path filters to include C/Go/Node/Python/CLI e2e and source paths; README updated for per-SDK layout and fixture setup.
Documentation and SDK Test Configuration `scripts/test/e2e/README.md`, `sdks/python/pytest.ini`	Rewrites suite docs to describe SDK-agnostic end-to-end path, updates fixture/profile instructions, and adds `norecursedirs = e2e` to SDK pytest config to avoid recursing into cross-SDK e2e tests.
C/Go/Node SDK E2E Test Rewiring `sdks/c/tests/e2e/`, `sdks/go/tests/e2e/`, `sdks/node/tests/e2e/*`, `src/cli/tests/e2e/test_cli_entry.py`	SDK and CLI e2e tests now reference local sibling `e2e_basic` sources, compute `REPO` consistently, and import path-verification helpers from centralized `scripts/test/e2e/lib`; Node smoke script uses local SDK build and improves cleanup handling.
Python E2E Test Helper Import Migration `sdks/python/tests/e2e/*`	All Python e2e tests updated to import `drain`, `collect_stream`, and `stdout_line_count` from `e2e_helpers` instead of local conftest; import wiring and minor formatting/whitespace changes applied without behavior changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

boxlite-ai/boxlite#678: Introduces the cross-SDK E2E regression suite infrastructure that this PR reorganizes and refactors.

Poem

🐰 Helpers gathered, paths realigned,

Tests hop home to SDKs designed.
Shared code snug in one small burrow,
Runner calls each suite in a row.
Hooray — the e2e carrots grow!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main objective of the pull request: relocating end-to-end test cases from a centralized scripts/test/e2e/cases/ directory into per-SDK tests/e2e/ folders, making tests colocate with their respective SDK implementations.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the existing local-FFI integration tests can't reach. All ship with `xfail(strict=True)` for the 10 known production bugs they pin: test_error_code_mapping.py 10 cases 4 xfails test_quota_enforcement.py 5 cases 5 xfails (module-level) test_runner_concurrency.py 6 cases 1 xfail -- -- 21 10 Each xfail's reason= string carries a `file:line` pointer to the bug location and a short root-cause note, so a follow-up fix-PR can grep for the right reason, drop the marker, and watch the test go green. `strict=True` means an unexpected xpass becomes a CI failure — i.e. the moment any of these bugs is silently fixed we'll notice. xfail coverage map (which PR removes which marker): test_exec_after_box_removed_is_typed_error → PR C (runner exec error mapping: IsNotFound → 404) test_invalid_argument_zero_cpu_returns_400 ) test_invalid_argument_negative_memory_returns_400 ) needs API test_oversized_cpu_returns_400 ) ValidationPipe 5x test_quota_enforcement.py ) + per-sandbox ) quota fix ) (separate ) follow-up) test_execution_invalid_command_returns_422 → needs Rust spawn-failed → ErrExecution mapping fix (separate follow-up; not in C or D) Split out of boxlite-ai#681 — this is **PR B of 4**: A (boxlite-ai#682) reorganize — merged before this rebases cleanly B (this) 21 new cases C runner exec error mapping fix D FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the underlying box was already gone / stopped / in a non-runnable state by the time the runner tried to spawn. Two related observable bugs: exec on a removed box → 500 "spawn_failed: build failed" exec on a stopped box → 500 (same shape) exec on box mid-rebuild → 500 (same shape) Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go` called `execManager.Start(...)` and unconditionally wrapped any non-nil error as 500. The SDK Start path already tunnels typed errors from the Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState) when the box state changed under the request — they just weren't being read. Fix: extend `classifyExecError` to peek for those SDK typed errors and return the canonical mapping from `src/shared/src/errors.rs:198-280`: IsNotFound → 404 Not Found IsStopped → 409 Conflict IsInvalidState → 409 Conflict (other / no SDK match) → 500 (unchanged) Also routes the `Start()` error site through `classifyExecError` instead of hard-coding 500. Flips the xfail on `test_exec_after_box_removed_is_typed_error` (sdks/python/tests/e2e/test_runner_concurrency.py) — that case now xpasses, which under strict=True trips the marker and fails CI; the fix here removes the marker so it lands green. Other related xfails in test_error_code_mapping.py are NOT addressed by this PR (they need DTO validation / quota / Rust spawn-failed reclassification — separate follow-ups). Split out of boxlite-ai#681 — this is **PR C of 4**: A (boxlite-ai#682) reorganize B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs) C (this) runner exec error mapping fix + flip 1 xfail D FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Go SDK's `box.Exec` would occasionally return with stdout chunks still in flight — the user's callback got `OnExit` (or the wait gRPC reply) before the matching `OnStdout`/`OnStderr` chunks landed on the queue. From the caller's perspective, the exec had finished but the last few bytes of stdout were missing. Root cause: `execution_wait` and `exit_pump` pushed their terminal events as soon as the underlying process exited, racing the still- draining stream pumps. The FFI queue is ordered, but the wait/exit events were entering it ahead of late stdout chunks. Fix: track a `streams_pending` count on `ExecutionHandle` instead of the previous per-pump receiver list. Both `exit_pump` and the `execution_wait` task now await `streams_pending → 0` (via a `tokio::sync::Notify` notification + standard register-then-check pattern to avoid post-notification miss) before pushing their terminal events. Deterministic ordering tests added inline in execution.rs cover: - stdout-only exec: all chunks land before Exit - mixed stdout/stderr: every chunk lands before Wait - process exit before pump completion: terminal event waits - pump completion before process exit: terminal event fires immediately (no spurious wait) Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race` in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from ~90% loss against stock 0.9.5 to 0% loss against this PR. That test is NOT in the PR-B xfail set (it was pre-existing), so no markers need flipping. Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are NOT addressed here — separate root causes, separate follow-ups. Split out of boxlite-ai#681 — this is **PR D of 4**: A (boxlite-ai#682) reorganize B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs) C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail D (this) FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/, sdks/c/tests/, sdks/go/*_test.go). This PR moves the cases to match. Driver scripts stay in scripts/test/e2e/ (cross-SDK infra); test files now live next to the SDK they exercise: scripts/test/e2e/cases/test_c_entry.py → sdks/c/tests/e2e/ scripts/test/e2e/cases/test_go_entry.py → sdks/go/tests/e2e/ scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/ scripts/test/e2e/cases/test_cli_entry.py → src/cli/tests/e2e/ scripts/test/e2e/cases/test_*.py (Python pytest) → sdks/python/tests/e2e/ scripts/test/e2e/cases/conftest.py → sdks/python/tests/e2e/ scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} → sdks/{c,go,node}/tests/e2e/ Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md. Supporting wiring (forced by the renames): - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args (testpaths in pytest.ini can't resolve outside its rootdir) - pytest.ini: comment explaining the testpaths omission - .github/workflows/e2e-stack.yml: paths filter expanded to all new e2e locations - README.md: layout map updated Two collisions the rename exposed (both fixed in this PR): 1. SDK unit-test pytest workflow auto-discovers sdks/python/tests/ and would import the new e2e/ subdir, whose conftest needs Python 3.11+ tomllib and a live API stack the unit-test workflow doesn't provide. Fix: norecursedirs = e2e in sdks/python/pytest.ini. The e2e runner passes the dir explicitly on the CLI which bypasses norecursedirs. 2. pytest's default --import-mode=prepend puts each test file's rootpath on sys.path. For sdks/python/tests/e2e/test_*.py the rootpath becomes sdks/python/, which then shadows the installed boxlite wheel with the source-tree stub package (sdks/python/boxlite/__init__.py — missing the compiled .so so Rust-bound symbols like BoxliteRestOptions are absent). Every fixture using boxlite.BoxliteRestOptions failed at setup with AttributeError. Fix: --import-mode=importlib in scripts/test/e2e/pytest.ini (modern mode, no sys.path injection). Tests that previously did `from conftest import drain` couldn't keep doing that under importlib mode (conftest no longer on sys.path) — extracted drain/collect_stream/ stdout_line_count from conftest into scripts/test/e2e/lib/e2e_helpers.py (alongside path_verification which already used the same lib/ + sys.path.insert pattern), and updated the 8 test files' imports. Verified locally on Ubuntu 22.04 + Python 3.11 + maturin-built boxlite wheel — make test:e2e against PR-A: 21 passed, 7 failed (the 7 are real production bugs the suite is designed to catch: PR-C fixes runner exec error mapping, PR-D fixes FFI drain race). Before this PR's collision fix: 0 passed, 32 fixture-setup errors. Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2 fixes; this split-out covers reorganize only — the rest ships in follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the existing local-FFI integration tests can't reach. All ship with `xfail(strict=True)` for the 10 known production bugs they pin: test_error_code_mapping.py 10 cases 4 xfails test_quota_enforcement.py 5 cases 5 xfails (module-level) test_runner_concurrency.py 6 cases 1 xfail -- -- 21 10 Each xfail's reason= string carries a `file:line` pointer to the bug location and a short root-cause note, so a follow-up fix-PR can grep for the right reason, drop the marker, and watch the test go green. `strict=True` means an unexpected xpass becomes a CI failure — i.e. the moment any of these bugs is silently fixed we'll notice. xfail coverage map (which PR removes which marker): test_exec_after_box_removed_is_typed_error → PR C (runner exec error mapping: IsNotFound → 404) test_invalid_argument_zero_cpu_returns_400 ) test_invalid_argument_negative_memory_returns_400 ) needs API test_oversized_cpu_returns_400 ) ValidationPipe 5x test_quota_enforcement.py ) + per-sandbox ) quota fix ) (separate ) follow-up) test_execution_invalid_command_returns_422 → needs Rust spawn-failed → ErrExecution mapping fix (separate follow-up; not in C or D) Split out of boxlite-ai#681 — this is **PR B of 4**: A (boxlite-ai#682) reorganize — merged before this rebases cleanly B (this) 21 new cases C runner exec error mapping fix D FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the underlying box was already gone / stopped / in a non-runnable state by the time the runner tried to spawn. Two related observable bugs: exec on a removed box → 500 "spawn_failed: build failed" exec on a stopped box → 500 (same shape) exec on box mid-rebuild → 500 (same shape) Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go` called `execManager.Start(...)` and unconditionally wrapped any non-nil error as 500. The SDK Start path already tunnels typed errors from the Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState) when the box state changed under the request — they just weren't being read. Fix: extend `classifyExecError` to peek for those SDK typed errors and return the canonical mapping from `src/shared/src/errors.rs:198-280`: IsNotFound → 404 Not Found IsStopped → 409 Conflict IsInvalidState → 409 Conflict (other / no SDK match) → 500 (unchanged) Also routes the `Start()` error site through `classifyExecError` instead of hard-coding 500. Flips the xfail on `test_exec_after_box_removed_is_typed_error` (sdks/python/tests/e2e/test_runner_concurrency.py) — that case now xpasses, which under strict=True trips the marker and fails CI; the fix here removes the marker so it lands green. Other related xfails in test_error_code_mapping.py are NOT addressed by this PR (they need DTO validation / quota / Rust spawn-failed reclassification — separate follow-ups). Split out of boxlite-ai#681 — this is **PR C of 4**: A (boxlite-ai#682) reorganize B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs) C (this) runner exec error mapping fix + flip 1 xfail D FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Go SDK's `box.Exec` would occasionally return with stdout chunks still in flight — the user's callback got `OnExit` (or the wait gRPC reply) before the matching `OnStdout`/`OnStderr` chunks landed on the queue. From the caller's perspective, the exec had finished but the last few bytes of stdout were missing. Root cause: `execution_wait` and `exit_pump` pushed their terminal events as soon as the underlying process exited, racing the still- draining stream pumps. The FFI queue is ordered, but the wait/exit events were entering it ahead of late stdout chunks. Fix: track a `streams_pending` count on `ExecutionHandle` instead of the previous per-pump receiver list. Both `exit_pump` and the `execution_wait` task now await `streams_pending → 0` (via a `tokio::sync::Notify` notification + standard register-then-check pattern to avoid post-notification miss) before pushing their terminal events. Deterministic ordering tests added inline in execution.rs cover: - stdout-only exec: all chunks land before Exit - mixed stdout/stderr: every chunk lands before Wait - process exit before pump completion: terminal event waits - pump completion before process exit: terminal event fires immediately (no spurious wait) Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race` in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from ~90% loss against stock 0.9.5 to 0% loss against this PR. That test is NOT in the PR-B xfail set (it was pre-existing), so no markers need flipping. Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are NOT addressed here — separate root causes, separate follow-ups. Split out of boxlite-ai#681 — this is **PR D of 4**: A (boxlite-ai#682) reorganize B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs) C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail D (this) FFI exec drain race fix Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

scripts/test/e2e/lib/e2e_helpers.py (1)

20-39: ⚡ Quick win

Consider adding parameter type annotations for better IDE support and type safety.

The helper functions are well-implemented and correct, but adding explicit type annotations for the stream and ex parameters would improve the developer experience when importing and using these utilities. This is especially valuable for a shared module that will be imported across multiple test files.

📝 Suggested type annotations

+from typing import AsyncIterator, Any
+
-async def collect_stream(stream) -> str:
+async def collect_stream(stream: AsyncIterator[bytes] | AsyncIterator[str] | None) -> str:
+    """Collect and decode an async stream into a single string."""
     if stream is None:
         return ""
     chunks: list[str] = []
     async for ch in stream:
         chunks.append(
             ch.decode("utf-8", "replace") if isinstance(ch, bytes) else str(ch)
         )
     return "".join(chunks)


-async def drain(ex) -> tuple[str, str]:
+async def drain(ex: Any) -> tuple[str, str]:
     """Drain stdout + stderr concurrently — required for REST exec."""
     out_t = asyncio.create_task(collect_stream(ex.stdout()))
     err_t = asyncio.create_task(collect_stream(ex.stderr()))
     return await asyncio.gather(out_t, err_t)


 def stdout_line_count(s: str) -> int:
+    """Count non-empty lines in a string."""
     return len([ln for ln in s.splitlines() if ln])

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test/e2e/lib/e2e_helpers.py` around lines 20 - 39, Add explicit
parameter type annotations to improve IDE/type-checker support: annotate
collect_stream(stream) with an appropriate async-iterable type (e.g., stream:
Optional[AsyncIterable[Union[bytes, str]]]) and annotate drain(ex) with the
expected execution object type (e.g., ex: Any or a small Protocol with
stdout()/stderr() -> AsyncIterable[Union[bytes, str]]), and update the drain
signature to return Tuple[str, str] (or import Tuple) if not already recognized;
also add any necessary typing imports (Optional, AsyncIterable, Union, Any,
Tuple) at the top. Ensure you keep the existing return types and semantics of
collect_stream and drain while only adding these parameter type hints.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scripts/test/e2e/lib/e2e_helpers.py`:
- Around line 20-39: Add explicit parameter type annotations to improve
IDE/type-checker support: annotate collect_stream(stream) with an appropriate
async-iterable type (e.g., stream: Optional[AsyncIterable[Union[bytes, str]]])
and annotate drain(ex) with the expected execution object type (e.g., ex: Any or
a small Protocol with stdout()/stderr() -> AsyncIterable[Union[bytes, str]]),
and update the drain signature to return Tuple[str, str] (or import Tuple) if
not already recognized; also add any necessary typing imports (Optional,
AsyncIterable, Union, Any, Tuple) at the top. Ensure you keep the existing
return types and semantics of collect_stream and drain while only adding these
parameter type hints.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ffa6a850-42c0-4fe7-a972-75621dd42e1d

📥 Commits

Reviewing files that changed from the base of the PR and between e4bbd0e and b5e6f30.

📒 Files selected for processing (27)

.github/workflows/e2e-stack.yml
make/test.mk
scripts/test/e2e/README.md
scripts/test/e2e/lib/e2e_helpers.py
scripts/test/e2e/pytest.ini
scripts/test/e2e/run.sh
scripts/test/e2e/two_sided.sh
sdks/c/tests/e2e/e2e_basic.c
sdks/c/tests/e2e/test_c_entry.py
sdks/go/tests/e2e/e2e_basic.go
sdks/go/tests/e2e/test_go_entry.py
sdks/node/tests/e2e/e2e_basic.ts
sdks/node/tests/e2e/test_node_entry.py
sdks/python/pytest.ini
sdks/python/tests/e2e/conftest.py
sdks/python/tests/e2e/test_box_management.py
sdks/python/tests/e2e/test_errors.py
sdks/python/tests/e2e/test_exec_options.py
sdks/python/tests/e2e/test_exec_timeout.py
sdks/python/tests/e2e/test_exec_unit_shape.py
sdks/python/tests/e2e/test_execution_shutdown.py
sdks/python/tests/e2e/test_lifecycle.py
sdks/python/tests/e2e/test_p0_6_exec_stdout_race.py
sdks/python/tests/e2e/test_path_verification.py
sdks/python/tests/e2e/test_resize_tty.py
sdks/python/tests/e2e/test_shutdown.py
src/cli/tests/e2e/test_cli_entry.py

Add a file-level header comment to each of the two pytest.ini files so readers can tell at a glance which suite each one drives and where the sibling config lives, without having to follow run.sh / Makefile targets. - scripts/test/e2e/pytest.ini — cross-SDK e2e (all SDKs go through Python pytest entries; native code lives in subprocess'd SDK binaries). - sdks/python/pytest.ini — Python SDK only (unit + single-machine integration via local PyO3/FFI). Existing inline comments near `testpaths` / `norecursedirs` are kept as-is — they explain the mechanical reason for those specific settings, while the new header explains the file's overall scope. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…xlite-ai#682) Adds Python type hints to the two helper functions that were missing parameter annotations, and docstrings where they were missing too. Per CodeRabbit nitpick on PR boxlite-ai#682 — quality-of-life only, no behavior change. `from __future__ import annotations` already in the file means the `|` union syntax works on Python 3.10 too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

G4614 · 2026-06-09T10:30:25Z

Closing — abandoning the per-SDK relocation approach. New e2e cases will stay under scripts/test/e2e/cases/ matching main's layout. Stacked PRs #683/#684/#685 will be rebased to drop b5e6f30; #686-#696 are already independent from this commit.

21 new e2e cases under scripts/test/e2e/cases/ that pin contracts the existing local-FFI integration tests can't reach. Cases shipped with xfail(strict=True) markers for the known production bugs they pin — a follow-up fix-PR flipping a bug closed will trip the strict marker into a CI failure, surfacing the fact that the xfail can now be dropped. Files added: scripts/test/e2e/cases/test_error_code_mapping.py scripts/test/e2e/cases/test_quota_enforcement.py scripts/test/e2e/cases/test_runner_concurrency.py Replaces an earlier split-out attempt that stacked on a reorganize PR (boxlite-ai#682). That structural reorg is now abandoned; new e2e cases land under scripts/test/e2e/cases/ matching boxlite-ai#678's existing layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

POST /v1/{prefix}/boxes/{id}/exec was leaking HTTP 500 when the underlying box was already gone / stopped / in a non-runnable state by the time the runner tried to spawn: exec on a removed box -> 500 "spawn_failed: build failed" exec on a stopped box -> 500 (same shape) exec on box mid-rebuild -> 500 (same shape) Root cause: BoxliteExec in boxlite_exec.go wraps the SDK error in a generic 500 without consulting the SDK's typed error helpers (IsNotFound, IsStopped, IsInvalidState). Other handlers in this file (signal/resize/kill/status) already classify these; POST /exec was the one that got missed. Fix: extend classifyExecError to recognise the SDK's typed errors and map them to 404 (NotFound), 409 (Stopped, InvalidState) per the canonical mapping at src/shared/src/errors.rs. Pin: test_exec_on_stopped_box_is_typed_error in boxlite-ai#678's e2e suite goes from 500 -> 4xx after this change. Re-validation: stack on top of boxlite-ai#683 (which adds test_runner_concurrency.py) for the exec-after-soft-deleted-box variant. Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg (now abandoned). Branch rebuilt against current main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Go SDK's box.Exec would occasionally return with stdout chunks still in flight — the user's callback got OnExit (or the wait gRPC reply) before the matching OnStdout/OnStderr chunks landed on the queue. From the caller's perspective, the exec had finished but its stdout was silently truncated. Root cause: execution_wait spawned an independent terminal task that pushed the Wait event as soon as wait_on_clone returned — with no drain barrier — so the wait reply could land in the event queue ahead of still-flushing stream pumps. Fix: make exit_pump the sole owner of terminal-event dispatch. Both execution_wait and register_exit fan into it; exit_pump awaits all stream pump receivers before pushing the terminal event. Queue order becomes Stdout/Stderr* -> Exit -> Wait*, all from the same task. Pin: test_p0_6_exec_stdout_race in boxlite-ai#678's e2e suite goes from ~70% stdout-loss to 0%. Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg (now abandoned). Branch rebuilt against current main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… pin known bugs) (#683) real e2e tests add-up ## What 21 new e2e cases under \`sdks/python/tests/e2e/\` pin contracts the existing local-FFI integration tests can't reach: | File | Cases | xfails | Topic | |---|---|---|---| | \`test_error_code_mapping.py\` | 10 | 4 | \`BoxliteError → HTTP\` mapping (\`src/shared/src/errors.rs:198-280\`) | | \`test_quota_enforcement.py\` | 5 | 5 | API create-boundary per-sandbox quota | | \`test_runner_concurrency.py\` | 6 | 1 | Cross-process state-machine races | All 10 xfails are \`strict=True\` — an unexpected xpass fails CI, so the moment any bug is silently fixed we notice. Each xfail's \`reason=\` carries a \`file:line\` pointer and short root-cause. ## xfail → fix PR map | xfail | Will flip in | |---|---| | \`test_exec_after_box_removed_is_typed_error\` | PR C (runner classifies \`IsNotFound → 404\`) | | \`test_invalid_argument_zero_cpu_returns_400\` | needs API \`ValidationPipe\` fix (separate follow-up, not in C/D) | | \`test_invalid_argument_negative_memory_returns_400\` | same | | \`test_oversized_cpu_returns_400\` | same + per-sandbox quota (separate follow-up) | | 5× \`test_quota_enforcement.py\` (module-level) | per-sandbox quota fix (separate follow-up) | | \`test_execution_invalid_command_returns_422\` | needs Rust \`spawn_failed → ErrExecution\` mapping (separate follow-up) | ## Stack | | what | this PR | |---|---|---| | A (#682) | reorganize only (rename + wiring) | ← prereq | | **B (this)** | **21 new e2e cases** | ← you are here | | C | runner exec error mapping fix + flip 1 xfail | depends on B | | D | FFI exec drain race fix (no PR-B xfails to flip) | depends on B | 🤖 Generated with [Claude Code](https://claude.com/claude-code)  ## Summary by CodeRabbit * **Tests** * Added end-to-end coverage validating API HTTP status ↔ error-code mappings, auth boundary behavior, unknown-resource and unregistered-image semantics, and state-transition error handling. * Added quota enforcement tests covering per-sandbox/org limits, boundary cases (including zero CPUs), and checks that quota rejections do not silently create resources. * Added concurrency/regression tests for parallel execs and creates, delete-during-exec behavior, repeated execs, and idempotent/typed stop semantics.  --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This was referenced Jun 9, 2026

test(e2e): add error-mapping, quota, runner-concurrency cases (xfails pin known bugs) #683

Merged

fix(runner): classify SDK errors at exec boundary (404/409 not 500) #684

Open

fix(ffi): drain stdout/stderr pumps before Wait terminal event #685

Closed

G4614 force-pushed the test/e2e-reorganize-only branch from c7cc31a to 24b3efa Compare June 9, 2026 04:06

G4614 force-pushed the test/e2e-reorganize-only branch from 24b3efa to b5e6f30 Compare June 9, 2026 04:47

G4614 mentioned this pull request Jun 9, 2026

test(e2e): Round 1 — error mapping, quota, runner concurrency (+ FFI drain fix) #681

Closed

4 tasks

G4614 marked this pull request as ready for review June 9, 2026 05:54

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

G4614 and others added 2 commits June 9, 2026 06:26

G4614 marked this pull request as draft June 9, 2026 06:59

G4614 closed this Jun 9, 2026

G4614 mentioned this pull request Jun 9, 2026

fix(ffi): drain stdout/stderr pumps before Wait terminal event #705

Closed

This was referenced Jun 9, 2026

test(e2e): add cli-detach-recovery, exec-attach, volume-readonly cases #710

Merged

fix(ffi): drain stdout/stderr pumps before Wait terminal event (re-open of #685/#705) #714

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(e2e): move test cases to per-SDK tests/ folders#682

test(e2e): move test cases to per-SDK tests/ folders#682
G4614 wants to merge 3 commits into
boxlite-ai:mainfrom
G4614:test/e2e-reorganize-only

G4614 commented Jun 9, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

G4614 commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

G4614 commented Jun 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Supporting wiring (forced by the renames)

Stack

Test plan

Known caveat

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

G4614 commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

G4614 commented Jun 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading