Skip to content

test(e2e): move test cases to per-SDK tests/ folders#682

Closed
G4614 wants to merge 3 commits into
boxlite-ai:mainfrom
G4614:test/e2e-reorganize-only
Closed

test(e2e): move test cases to per-SDK tests/ folders#682
G4614 wants to merge 3 commits into
boxlite-ai:mainfrom
G4614:test/e2e-reorganize-only

Conversation

@G4614

@G4614 G4614 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

move test files to different SDK dirs as previous

What

PR #678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout. This PR moves them so each SDK's e2e tests live next to that SDK's other tests:

from to
scripts/test/e2e/cases/test_c_entry.py sdks/c/tests/e2e/
scripts/test/e2e/cases/test_go_entry.py sdks/go/tests/e2e/
scripts/test/e2e/cases/test_node_entry.py sdks/node/tests/e2e/
scripts/test/e2e/cases/test_cli_entry.py src/cli/tests/e2e/
scripts/test/e2e/cases/test_*.py (Python pytest) sdks/python/tests/e2e/
scripts/test/e2e/cases/conftest.py sdks/python/tests/e2e/
scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} sdks/{c,go,node}/tests/e2e/

Stays put in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md (cross-SDK orchestration, not per-SDK).

Supporting wiring (forced by the renames)

  • scripts/test/e2e/run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args. testpaths in pytest.ini can't resolve outside its rootdir.
  • scripts/test/e2e/pytest.ini: comment explaining the testpaths omission.
  • .github/workflows/e2e-stack.yml: paths: filter expanded to all new e2e locations (sdks/{c,go,node,python}/tests/e2e/**, src/cli/tests/e2e/**).
  • scripts/test/e2e/README.md: layout map updated.

Stack

what this PR
A reorganize only (rename + wiring) ← you are here
B 21 new e2e cases (pin known bugs via xfail(strict=True)) depends on A
C runner exec error mapping fix + flip relevant B xfails depends on B
D FFI exec drain race fix + flip relevant B xfails depends on B

Splitting #681 into 4 because the original bundled rename + new tests + 2 source fixes into one commit — hard to review, hard to revert any single piece without unwinding the rest.

Test plan

Result against local stack with the full A→D stack applied:

  • make test:e2e43 passed, 10 xfailed in 37.05s
  • FFI ordering tests run green (test_drain_before_terminal*)

Breakdown of the 43p/10xf (covers all four PRs in the stack):

File Cases xfail Topic Added by
test_error_code_mapping.py 10 4 BoxliteError → HTTP mapping (src/shared/src/errors.rs:198-280) B (#683)
test_quota_enforcement.py 5 5 API create-boundary per-sandbox quota B (#683)
test_runner_concurrency.py 6 1 Cross-process state-machine races B (#683)
all existing relocated cases 22 0 unchanged behavior, just new paths A (this PR)

xfail (bugs pinned, will xpass-strict once fixed):

  • cpus=0 / memory_mib=-1 silently accepted — DTO @Min + ValidationPipe transform interaction (apps/api/src/main.ts:66-67)
  • exec /nonexistent → 500 instead of 422 — Rust spawn_failed misclassified as ErrInternal
  • cpus=999 silently clamped — no @Max, no quota lookup in createFromSnapshot (apps/api/src/sandbox/services/sandbox.service.ts)
  • 5× quota cases — shared root cause (no per-sandbox quota enforcement at API create boundary)
  • exec-after-box-removed — 500 leak via runner's spawn-against-soft-deleted-box path (distinct from the Stopped→500 leak C closes)

Known caveat

The pytest.ini-in-scripts/ + tests-in-sdks/ layout means cd sdks/python/tests/e2e && pytest (without going through run.sh) won't read the pytest.ini (asyncio_mode, markers, etc.). Always invoke via make test:e2e or scripts/test/e2e/run.sh. Cleaner would be to move pytest.ini to repo root so its rootdir matches — left as follow-up.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Reorganized E2E suite to run per-SDK end-to-end tests (C, Go, Node, Python, CLI) and updated the unified test runner.
    • Introduced shared E2E helper utilities used across SDKs and adjusted tests to import them.
  • CI/Chores
    • Expanded CI triggers so E2E workflows run for changes across SDK and test directories.
  • Documentation
    • Updated E2E README with new layout, running instructions, and fixture/setup notes.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 98d981a4-b4d6-47d7-93cb-d068b6937bb8

📥 Commits

Reviewing files that changed from the base of the PR and between 3f7f1f0 and 23ed4b8.

📒 Files selected for processing (1)
  • scripts/test/e2e/lib/e2e_helpers.py

📝 Walkthrough

Walkthrough

This PR centralizes shared E2E test utilities, switches the E2E runner to explicitly invoke per-SDK sdks/*/tests/e2e directories, updates pytest import behavior to avoid source-wheel shadowing, rewires SDK/CLI test entry imports to the centralized lib, and expands CI path filters to trigger on SDK e2e/source changes.

Changes

E2E Test Suite Infrastructure Reorganization

Layer / File(s) Summary
E2E Helper Module Creation and Extraction
scripts/test/e2e/lib/e2e_helpers.py, scripts/test/e2e/pytest.ini, sdks/python/tests/e2e/conftest.py
Creates shared utilities module with collect_stream, drain, and stdout_line_count; sets pytest --import-mode=importlib; removes local helper implementations from conftest and imports shared helpers from centralized lib.
Test Runner and Orchestration Refactoring
make/test.mk, scripts/test/e2e/run.sh, scripts/test/e2e/two_sided.sh, .github/workflows/e2e-stack.yml, scripts/test/e2e/README.md
Updates test:e2e to call run.sh; run.sh runs python3 -m pytest with explicit per-SDK tests/e2e paths; two-sided harness uses run.sh; expands GitHub Actions path filters to include C/Go/Node/Python/CLI e2e and source paths; README updated for per-SDK layout and fixture setup.
Documentation and SDK Test Configuration
scripts/test/e2e/README.md, sdks/python/pytest.ini
Rewrites suite docs to describe SDK-agnostic end-to-end path, updates fixture/profile instructions, and adds norecursedirs = e2e to SDK pytest config to avoid recursing into cross-SDK e2e tests.
C/Go/Node SDK E2E Test Rewiring
sdks/c/tests/e2e/*, sdks/go/tests/e2e/*, sdks/node/tests/e2e/*, src/cli/tests/e2e/test_cli_entry.py
SDK and CLI e2e tests now reference local sibling e2e_basic sources, compute REPO consistently, and import path-verification helpers from centralized scripts/test/e2e/lib; Node smoke script uses local SDK build and improves cleanup handling.
Python E2E Test Helper Import Migration
sdks/python/tests/e2e/*
All Python e2e tests updated to import drain, collect_stream, and stdout_line_count from e2e_helpers instead of local conftest; import wiring and minor formatting/whitespace changes applied without behavior changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • boxlite-ai/boxlite#678: Introduces the cross-SDK E2E regression suite infrastructure that this PR reorganizes and refactors.

Poem

🐰 Helpers gathered, paths realigned,

Tests hop home to SDKs designed.
Shared code snug in one small burrow,
Runner calls each suite in a row.
Hooray — the e2e carrots grow!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main objective of the pull request: relocating end-to-end test cases from a centralized scripts/test/e2e/cases/ directory into per-SDK tests/e2e/ folders, making tests colocate with their respective SDK implementations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@G4614 G4614 force-pushed the test/e2e-reorganize-only branch from c7cc31a to 24b3efa Compare June 9, 2026 04:06
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:

  test_error_code_mapping.py  10 cases  4 xfails
  test_quota_enforcement.py    5 cases  5 xfails (module-level)
  test_runner_concurrency.py   6 cases  1 xfail
                              --       --
                              21       10

Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.

xfail coverage map (which PR removes which marker):

  test_exec_after_box_removed_is_typed_error
    → PR C (runner exec error mapping: IsNotFound → 404)

  test_invalid_argument_zero_cpu_returns_400          )
  test_invalid_argument_negative_memory_returns_400   ) needs API
  test_oversized_cpu_returns_400                      ) ValidationPipe
  5x test_quota_enforcement.py                        ) + per-sandbox
                                                       ) quota fix
                                                       ) (separate
                                                       ) follow-up)

  test_execution_invalid_command_returns_422
    → needs Rust spawn-failed → ErrExecution mapping fix
      (separate follow-up; not in C or D)

Split out of boxlite-ai#681 — this is **PR B of 4**:
  A (boxlite-ai#682) reorganize     — merged before this rebases cleanly
  B (this)  21 new cases
  C         runner exec error mapping fix
  D         FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:

  exec on a removed box   → 500 "spawn_failed: build failed"
  exec on a stopped box   → 500 (same shape)
  exec on box mid-rebuild → 500 (same shape)

Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.

Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:

  IsNotFound       → 404 Not Found
  IsStopped        → 409 Conflict
  IsInvalidState   → 409 Conflict
  (other / no SDK match) → 500 (unchanged)

Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.

Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.

Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).

Split out of boxlite-ai#681 — this is **PR C of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (this) runner exec error mapping fix + flip 1 xfail
  D        FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.

Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.

Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.

Deterministic ordering tests added inline in execution.rs cover:

  - stdout-only exec: all chunks land before Exit
  - mixed stdout/stderr: every chunk lands before Wait
  - process exit before pump completion: terminal event waits
  - pump completion before process exit: terminal event fires
    immediately (no spurious wait)

Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.

Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.

Split out of boxlite-ai#681 — this is **PR D of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
  D (this) FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a
fresh location that doesn't match the repo's existing per-SDK test
layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/,
sdks/c/tests/, sdks/go/*_test.go).

This PR moves the cases to match. Driver scripts stay in
scripts/test/e2e/ (cross-SDK infra); test files now live next to
the SDK they exercise:

  scripts/test/e2e/cases/test_c_entry.py    → sdks/c/tests/e2e/
  scripts/test/e2e/cases/test_go_entry.py   → sdks/go/tests/e2e/
  scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/
  scripts/test/e2e/cases/test_cli_entry.py  → src/cli/tests/e2e/
  scripts/test/e2e/cases/test_*.py (Python pytest)
                                            → sdks/python/tests/e2e/
  scripts/test/e2e/cases/conftest.py        → sdks/python/tests/e2e/
  scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts}
                                            → sdks/{c,go,node}/tests/e2e/

Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh,
two_sided.sh, fixture_setup.py, pytest.ini, README.md.

Supporting wiring (forced by the renames):
  - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args
    (testpaths in pytest.ini can't resolve outside its rootdir)
  - pytest.ini: comment explaining the testpaths omission
  - .github/workflows/e2e-stack.yml: paths filter expanded to all
    new e2e locations
  - README.md: layout map updated

Two collisions the rename exposed (both fixed in this PR):

  1. SDK unit-test pytest workflow auto-discovers sdks/python/tests/
     and would import the new e2e/ subdir, whose conftest needs
     Python 3.11+ tomllib and a live API stack the unit-test
     workflow doesn't provide. Fix: norecursedirs = e2e in
     sdks/python/pytest.ini. The e2e runner passes the dir
     explicitly on the CLI which bypasses norecursedirs.

  2. pytest's default --import-mode=prepend puts each test file's
     rootpath on sys.path. For sdks/python/tests/e2e/test_*.py the
     rootpath becomes sdks/python/, which then shadows the installed
     boxlite wheel with the source-tree stub package
     (sdks/python/boxlite/__init__.py — missing the compiled .so so
     Rust-bound symbols like BoxliteRestOptions are absent). Every
     fixture using boxlite.BoxliteRestOptions failed at setup with
     AttributeError. Fix: --import-mode=importlib in
     scripts/test/e2e/pytest.ini (modern mode, no sys.path
     injection). Tests that previously did `from conftest import
     drain` couldn't keep doing that under importlib mode (conftest
     no longer on sys.path) — extracted drain/collect_stream/
     stdout_line_count from conftest into
     scripts/test/e2e/lib/e2e_helpers.py (alongside path_verification
     which already used the same lib/ + sys.path.insert pattern),
     and updated the 8 test files' imports.

Verified locally on Ubuntu 22.04 + Python 3.11 + maturin-built
boxlite wheel — make test:e2e against PR-A: 21 passed, 7 failed
(the 7 are real production bugs the suite is designed to catch:
PR-C fixes runner exec error mapping, PR-D fixes FFI drain race).
Before this PR's collision fix: 0 passed, 32 fixture-setup errors.

Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2
fixes; this split-out covers reorganize only — the rest ships in
follow-up PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 force-pushed the test/e2e-reorganize-only branch from 24b3efa to b5e6f30 Compare June 9, 2026 04:47
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
21 new e2e cases under sdks/python/tests/e2e/ that pin contracts the
existing local-FFI integration tests can't reach. All ship with
`xfail(strict=True)` for the 10 known production bugs they pin:

  test_error_code_mapping.py  10 cases  4 xfails
  test_quota_enforcement.py    5 cases  5 xfails (module-level)
  test_runner_concurrency.py   6 cases  1 xfail
                              --       --
                              21       10

Each xfail's reason= string carries a `file:line` pointer to the bug
location and a short root-cause note, so a follow-up fix-PR can grep
for the right reason, drop the marker, and watch the test go green.
`strict=True` means an unexpected xpass becomes a CI failure — i.e.
the moment any of these bugs is silently fixed we'll notice.

xfail coverage map (which PR removes which marker):

  test_exec_after_box_removed_is_typed_error
    → PR C (runner exec error mapping: IsNotFound → 404)

  test_invalid_argument_zero_cpu_returns_400          )
  test_invalid_argument_negative_memory_returns_400   ) needs API
  test_oversized_cpu_returns_400                      ) ValidationPipe
  5x test_quota_enforcement.py                        ) + per-sandbox
                                                       ) quota fix
                                                       ) (separate
                                                       ) follow-up)

  test_execution_invalid_command_returns_422
    → needs Rust spawn-failed → ErrExecution mapping fix
      (separate follow-up; not in C or D)

Split out of boxlite-ai#681 — this is **PR B of 4**:
  A (boxlite-ai#682) reorganize     — merged before this rebases cleanly
  B (this)  21 new cases
  C         runner exec error mapping fix
  D         FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
`POST /v1/{prefix}/boxes/{id}/exec` was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn. Two related observable bugs:

  exec on a removed box   → 500 "spawn_failed: build failed"
  exec on a stopped box   → 500 (same shape)
  exec on box mid-rebuild → 500 (same shape)

Root cause: `BoxliteExec` in `apps/runner/pkg/api/controllers/boxlite_exec.go`
called `execManager.Start(...)` and unconditionally wrapped any non-nil
error as 500. The SDK Start path already tunnels typed errors from the
Rust core (sdks/go/errors.go: IsNotFound / IsStopped / IsInvalidState)
when the box state changed under the request — they just weren't being
read.

Fix: extend `classifyExecError` to peek for those SDK typed errors and
return the canonical mapping from `src/shared/src/errors.rs:198-280`:

  IsNotFound       → 404 Not Found
  IsStopped        → 409 Conflict
  IsInvalidState   → 409 Conflict
  (other / no SDK match) → 500 (unchanged)

Also routes the `Start()` error site through `classifyExecError`
instead of hard-coding 500.

Flips the xfail on `test_exec_after_box_removed_is_typed_error`
(sdks/python/tests/e2e/test_runner_concurrency.py) — that case now
xpasses, which under strict=True trips the marker and fails CI; the
fix here removes the marker so it lands green.

Other related xfails in test_error_code_mapping.py are NOT addressed
by this PR (they need DTO validation / quota / Rust spawn-failed
reclassification — separate follow-ups).

Split out of boxlite-ai#681 — this is **PR C of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (this) runner exec error mapping fix + flip 1 xfail
  D        FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
The Go SDK's `box.Exec` would occasionally return with stdout chunks
still in flight — the user's callback got `OnExit` (or the wait gRPC
reply) before the matching `OnStdout`/`OnStderr` chunks landed on
the queue. From the caller's perspective, the exec had finished but
the last few bytes of stdout were missing.

Root cause: `execution_wait` and `exit_pump` pushed their terminal
events as soon as the underlying process exited, racing the still-
draining stream pumps. The FFI queue is ordered, but the wait/exit
events were entering it ahead of late stdout chunks.

Fix: track a `streams_pending` count on `ExecutionHandle` instead of
the previous per-pump receiver list. Both `exit_pump` and the
`execution_wait` task now await `streams_pending → 0` (via a
`tokio::sync::Notify` notification + standard register-then-check
pattern to avoid post-notification miss) before pushing their
terminal events.

Deterministic ordering tests added inline in execution.rs cover:

  - stdout-only exec: all chunks land before Exit
  - mixed stdout/stderr: every chunk lands before Wait
  - process exit before pump completion: terminal event waits
  - pump completion before process exit: terminal event fires
    immediately (no spurious wait)

Side effect on the existing e2e suite: `test_p0_6_exec_stdout_race`
in sdks/python/tests/e2e/ (the regression test for boxlite-ai#563) flips from
~90% loss against stock 0.9.5 to 0% loss against this PR. That test
is NOT in the PR-B xfail set (it was pre-existing), so no markers
need flipping.

Other PR-B xfails (DTO validation, quota, spawn-failed mapping) are
NOT addressed here — separate root causes, separate follow-ups.

Split out of boxlite-ai#681 — this is **PR D of 4**:
  A (boxlite-ai#682) reorganize
  B (boxlite-ai#683) 21 new e2e cases (xfails pin bugs)
  C (boxlite-ai#684) runner exec error mapping fix + flip 1 xfail
  D (this) FFI exec drain race fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 marked this pull request as ready for review June 9, 2026 05:54

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
scripts/test/e2e/lib/e2e_helpers.py (1)

20-39: ⚡ Quick win

Consider adding parameter type annotations for better IDE support and type safety.

The helper functions are well-implemented and correct, but adding explicit type annotations for the stream and ex parameters would improve the developer experience when importing and using these utilities. This is especially valuable for a shared module that will be imported across multiple test files.

📝 Suggested type annotations
+from typing import AsyncIterator, Any
+
-async def collect_stream(stream) -> str:
+async def collect_stream(stream: AsyncIterator[bytes] | AsyncIterator[str] | None) -> str:
+    """Collect and decode an async stream into a single string."""
     if stream is None:
         return ""
     chunks: list[str] = []
     async for ch in stream:
         chunks.append(
             ch.decode("utf-8", "replace") if isinstance(ch, bytes) else str(ch)
         )
     return "".join(chunks)


-async def drain(ex) -> tuple[str, str]:
+async def drain(ex: Any) -> tuple[str, str]:
     """Drain stdout + stderr concurrently — required for REST exec."""
     out_t = asyncio.create_task(collect_stream(ex.stdout()))
     err_t = asyncio.create_task(collect_stream(ex.stderr()))
     return await asyncio.gather(out_t, err_t)


 def stdout_line_count(s: str) -> int:
+    """Count non-empty lines in a string."""
     return len([ln for ln in s.splitlines() if ln])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/test/e2e/lib/e2e_helpers.py` around lines 20 - 39, Add explicit
parameter type annotations to improve IDE/type-checker support: annotate
collect_stream(stream) with an appropriate async-iterable type (e.g., stream:
Optional[AsyncIterable[Union[bytes, str]]]) and annotate drain(ex) with the
expected execution object type (e.g., ex: Any or a small Protocol with
stdout()/stderr() -> AsyncIterable[Union[bytes, str]]), and update the drain
signature to return Tuple[str, str] (or import Tuple) if not already recognized;
also add any necessary typing imports (Optional, AsyncIterable, Union, Any,
Tuple) at the top. Ensure you keep the existing return types and semantics of
collect_stream and drain while only adding these parameter type hints.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scripts/test/e2e/lib/e2e_helpers.py`:
- Around line 20-39: Add explicit parameter type annotations to improve
IDE/type-checker support: annotate collect_stream(stream) with an appropriate
async-iterable type (e.g., stream: Optional[AsyncIterable[Union[bytes, str]]])
and annotate drain(ex) with the expected execution object type (e.g., ex: Any or
a small Protocol with stdout()/stderr() -> AsyncIterable[Union[bytes, str]]),
and update the drain signature to return Tuple[str, str] (or import Tuple) if
not already recognized; also add any necessary typing imports (Optional,
AsyncIterable, Union, Any, Tuple) at the top. Ensure you keep the existing
return types and semantics of collect_stream and drain while only adding these
parameter type hints.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ffa6a850-42c0-4fe7-a972-75621dd42e1d

📥 Commits

Reviewing files that changed from the base of the PR and between e4bbd0e and b5e6f30.

📒 Files selected for processing (27)
  • .github/workflows/e2e-stack.yml
  • make/test.mk
  • scripts/test/e2e/README.md
  • scripts/test/e2e/lib/e2e_helpers.py
  • scripts/test/e2e/pytest.ini
  • scripts/test/e2e/run.sh
  • scripts/test/e2e/two_sided.sh
  • sdks/c/tests/e2e/e2e_basic.c
  • sdks/c/tests/e2e/test_c_entry.py
  • sdks/go/tests/e2e/e2e_basic.go
  • sdks/go/tests/e2e/test_go_entry.py
  • sdks/node/tests/e2e/e2e_basic.ts
  • sdks/node/tests/e2e/test_node_entry.py
  • sdks/python/pytest.ini
  • sdks/python/tests/e2e/conftest.py
  • sdks/python/tests/e2e/test_box_management.py
  • sdks/python/tests/e2e/test_errors.py
  • sdks/python/tests/e2e/test_exec_options.py
  • sdks/python/tests/e2e/test_exec_timeout.py
  • sdks/python/tests/e2e/test_exec_unit_shape.py
  • sdks/python/tests/e2e/test_execution_shutdown.py
  • sdks/python/tests/e2e/test_lifecycle.py
  • sdks/python/tests/e2e/test_p0_6_exec_stdout_race.py
  • sdks/python/tests/e2e/test_path_verification.py
  • sdks/python/tests/e2e/test_resize_tty.py
  • sdks/python/tests/e2e/test_shutdown.py
  • src/cli/tests/e2e/test_cli_entry.py

G4614 and others added 2 commits June 9, 2026 06:26
Add a file-level header comment to each of the two pytest.ini files so
readers can tell at a glance which suite each one drives and where the
sibling config lives, without having to follow run.sh / Makefile targets.

- scripts/test/e2e/pytest.ini  — cross-SDK e2e (all SDKs go through Python
  pytest entries; native code lives in subprocess'd SDK binaries).
- sdks/python/pytest.ini       — Python SDK only (unit + single-machine
  integration via local PyO3/FFI).

Existing inline comments near `testpaths` / `norecursedirs` are kept as-is
— they explain the mechanical reason for those specific settings, while
the new header explains the file's overall scope.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…xlite-ai#682)

Adds Python type hints to the two helper functions that were missing
parameter annotations, and docstrings where they were missing too.

Per CodeRabbit nitpick on PR boxlite-ai#682 — quality-of-life only, no behavior
change. `from __future__ import annotations` already in the file means
the `|` union syntax works on Python 3.10 too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 marked this pull request as draft June 9, 2026 06:59
@G4614

G4614 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Closing — abandoning the per-SDK relocation approach. New e2e cases will stay under scripts/test/e2e/cases/ matching main's layout. Stacked PRs #683/#684/#685 will be rebased to drop b5e6f30; #686-#696 are already independent from this commit.

@G4614 G4614 closed this Jun 9, 2026
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
21 new e2e cases under scripts/test/e2e/cases/ that pin contracts the
existing local-FFI integration tests can't reach. Cases shipped with
xfail(strict=True) markers for the known production bugs they pin —
a follow-up fix-PR flipping a bug closed will trip the strict marker
into a CI failure, surfacing the fact that the xfail can now be dropped.

Files added:
  scripts/test/e2e/cases/test_error_code_mapping.py
  scripts/test/e2e/cases/test_quota_enforcement.py
  scripts/test/e2e/cases/test_runner_concurrency.py

Replaces an earlier split-out attempt that stacked on a reorganize PR
(boxlite-ai#682). That structural reorg is now abandoned; new e2e cases land
under scripts/test/e2e/cases/ matching boxlite-ai#678's existing layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
POST /v1/{prefix}/boxes/{id}/exec was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn:

  exec on a removed box   -> 500 "spawn_failed: build failed"
  exec on a stopped box   -> 500 (same shape)
  exec on box mid-rebuild -> 500 (same shape)

Root cause: BoxliteExec in boxlite_exec.go wraps the SDK error in a
generic 500 without consulting the SDK's typed error helpers
(IsNotFound, IsStopped, IsInvalidState). Other handlers in this file
(signal/resize/kill/status) already classify these; POST /exec was
the one that got missed.

Fix: extend classifyExecError to recognise the SDK's typed errors and
map them to 404 (NotFound), 409 (Stopped, InvalidState) per the
canonical mapping at src/shared/src/errors.rs.

Pin: test_exec_on_stopped_box_is_typed_error in boxlite-ai#678's e2e suite goes
from 500 -> 4xx after this change. Re-validation: stack on top of
boxlite-ai#683 (which adds test_runner_concurrency.py) for the
exec-after-soft-deleted-box variant.

Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg
(now abandoned). Branch rebuilt against current main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
The Go SDK's box.Exec would occasionally return with stdout chunks
still in flight — the user's callback got OnExit (or the wait gRPC
reply) before the matching OnStdout/OnStderr chunks landed on the
queue. From the caller's perspective, the exec had finished but its
stdout was silently truncated.

Root cause: execution_wait spawned an independent terminal task that
pushed the Wait event as soon as wait_on_clone returned — with no
drain barrier — so the wait reply could land in the event queue
ahead of still-flushing stream pumps.

Fix: make exit_pump the sole owner of terminal-event dispatch. Both
execution_wait and register_exit fan into it; exit_pump awaits all
stream pump receivers before pushing the terminal event. Queue order
becomes Stdout/Stderr* -> Exit -> Wait*, all from the same task.

Pin: test_p0_6_exec_stdout_race in boxlite-ai#678's e2e suite goes from ~70%
stdout-loss to 0%.

Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg
(now abandoned). Branch rebuilt against current main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DorianZheng pushed a commit that referenced this pull request Jun 9, 2026
… pin known bugs) (#683)

real e2e tests add-up

## What

21 new e2e cases under \`sdks/python/tests/e2e/\` pin contracts the
existing local-FFI integration tests can't reach:

| File | Cases | xfails | Topic |
|---|---|---|---|
| \`test_error_code_mapping.py\` | 10 | 4 | \`BoxliteError → HTTP\`
mapping (\`src/shared/src/errors.rs:198-280\`) |
| \`test_quota_enforcement.py\` | 5 | 5 | API create-boundary
per-sandbox quota |
| \`test_runner_concurrency.py\` | 6 | 1 | Cross-process state-machine
races |

All 10 xfails are \`strict=True\` — an unexpected xpass fails CI, so the
moment any bug is silently fixed we notice. Each xfail's \`reason=\`
carries a \`file:line\` pointer and short root-cause.

## xfail → fix PR map

| xfail | Will flip in |
|---|---|
| \`test_exec_after_box_removed_is_typed_error\` | PR C (runner
classifies \`IsNotFound → 404\`) |
| \`test_invalid_argument_zero_cpu_returns_400\` | needs API
\`ValidationPipe\` fix (separate follow-up, not in C/D) |
| \`test_invalid_argument_negative_memory_returns_400\` | same |
| \`test_oversized_cpu_returns_400\` | same + per-sandbox quota
(separate follow-up) |
| 5× \`test_quota_enforcement.py\` (module-level) | per-sandbox quota
fix (separate follow-up) |
| \`test_execution_invalid_command_returns_422\` | needs Rust
\`spawn_failed → ErrExecution\` mapping (separate follow-up) |

## Stack

| | what | this PR |
|---|---|---|
| A (#682) | reorganize only (rename + wiring) | ← prereq |
| **B (this)** | **21 new e2e cases** | ← you are here |
| C | runner exec error mapping fix + flip 1 xfail | depends on B |
| D | FFI exec drain race fix (no PR-B xfails to flip) | depends on B |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Added end-to-end coverage validating API HTTP status ↔ error-code
mappings, auth boundary behavior, unknown-resource and
unregistered-image semantics, and state-transition error handling.
* Added quota enforcement tests covering per-sandbox/org limits,
boundary cases (including zero CPUs), and checks that quota rejections
do not silently create resources.
* Added concurrency/regression tests for parallel execs and creates,
delete-during-exec behavior, repeated execs, and idempotent/typed stop
semantics.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant