test(e2e): SDK→API→Runner→VM regression suite#678
Conversation
Existing make test:integration:* uses the local PyO3/FFI path (`Boxlite.default()`) and bypasses both the NestJS API and boxlite-runner. Bugs that surface only on the REST → API → runner chain — e.g. boxlite-ai#563's exec-stdout drop, boxlite-ai#627's attach re-drain — leak into production with the unit + integration suites all green. This suite always uses `Boxlite.rest(opts)` against the local API and proves the path on every run. Layout (scripts/test/e2e/): bootstrap.sh Install PG, Redis, registry, mountpoint-s3, runner binary, systemd units. Idempotent. fixture_setup.py Register alpine:3.23 + ubuntu:22.04 snapshots (waits for active), patch admin org quota, write [profiles.p1] in ~/.boxlite/credentials.toml. Idempotent. cases/conftest.py rt / image / box fixtures — REST-only. cases/test_path_verification.py Meta-test: asserts the runtime is REST-mode against :3000, and one round-trip exec leaves the runner journal with the box id. Catches the case where SDK silently degrades to local FFI or talks to the wrong endpoint. cases/test_p0_6_exec_stdout_race.py First real regression case — runs N boxes × 2 execs and asserts loss rate ≤ 5%. Against stock 0.9.5 runner: ~80–90% loss (FAIL). Against a runner rebuilt with boxlite-ai#563: 0% loss (PASS). lib/path_verification.py Helpers used by the meta-test. two_sided.sh Builds runner from MAIN_REF, runs pytest (expect FAIL=1), builds from PR_REF, runs pytest (expect PASS=0). Only verdict 1+0 is "PR removes the bug + test detects it". Restores original binary on any exit. Make targets (make/test.mk): make test:e2e:setup Bootstrap services + fixture data. make test:e2e Run the suite. make test:e2e:two-sided PR_REF=<branch> required. CI workflow (.github/workflows/e2e-stack.yml): Triggers on push/PR touching sdks/c/src/exec/, src/boxlite/src/rest/, apps/runner/, apps/api/src/. Requires a self-hosted runner labeled `kvm` because GitHub-hosted runners lack /dev/kvm and cannot run libkrun. Verified locally on alpine:3.23, py3.14, against both 0.9.5 stock runner (1 FAIL of 3) and a runner rebuilt from boxlite-ai#563 (3 PASS of 3).
Lifts 9 existing make test:integration test files (Rust + Python) into
the e2e suite where they exercise the real production chain
(SDK→API→Runner→VM) instead of the local PyO3/FFI path.
Existing make test:integration uses Boxlite::default() / Boxlite.default(),
which never touches NestJS API or boxlite-runner. Bugs in the proxy
controller, REST wire format, or runner CGO won't break those tests
even though they're broken in production. These ports run the same
behavioural contracts through REST so the regression coverage tracks
the actual production deployment.
9 new files in scripts/test/e2e/cases/:
test_lifecycle.py — create / list / get_info / remove CRUD
(from src/boxlite/tests/lifecycle.rs)
test_exec_options.py — cwd, env, tty exit-code surface
(from src/boxlite/tests/exec_options.rs)
test_execution_shutdown.py — wait resolves after box.stop();
exec on stopped box must NOT 5xx
(from src/boxlite/tests/execution_shutdown.rs)
test_shutdown.py — close() idempotent; second REST runtime
shares the same world view
(from src/boxlite/tests/shutdown.rs)
test_exec_unit_shape.py — round-trip exec result shape + types
(from sdks/python/tests/test_exec.py)
test_exec_timeout.py — timeout kills a sleeping process and a
SIGTERM-ignoring process
(from sdks/python/tests/test_exec_timeout_sigalrm.py)
test_box_management.py — named box, list_info round-trip, env-var
propagation client→guest
(from sdks/python/tests/test_box_management.py)
test_resize_tty.py — resize_tty(rows, cols) on TTY exec; refuse
on non-TTY
(from sdks/python/tests/test_resize_tty.py)
test_errors.py — unknown image surfaces typed error; bad
bearer token → 401/403, not 500
(from sdks/python/tests/test_errors.py)
Plus ubuntu:24.04 added to fixture_setup.py's snapshot registration
(some Rust cases use it).
Verified locally on alpine:3.23 + ubuntu:22.04 + ubuntu:24.04 with a
boxlite-ai#563-rebuilt runner: 24 PASS / 1 FAIL. The single failure
(test_exec_on_stopped_box_is_typed_error) is a real production bug —
the API surfaces "Handle invalidated after stop()" with HTTP 500 +
code=11 instead of mapping it to a typed 4xx. Test stays failing
until that mapping fix lands; flipping to xfail would defeat the
point of the regression coverage.
Wraps the session-scoped `rt` fixture in a `_TrackingRuntime` shim that records the box id of every successful `rt.create()` call. An autouse fixture snapshots the runner journal timestamp before each test and, after the test body, asserts every recorded box id appears in the runner journal at least once. If a test silently degrades to local FFI, talks to a wrong endpoint, or the API stops forwarding to the runner, the autouse assertion fires during teardown with the offending box id(s). Tests that don't create any boxes (e.g. `test_invalid_token_returns_401_not_500`, `test_sdk_runtime_is_rest_against_local_api`) are unaffected — the bucket stays empty, the check is a no-op. Sanity-verified: deliberately appending a fake box id to `rt._created` without an actual create call fails the teardown with the expected message, confirming the bypass detection has real teeth. Side effect: the meta-test `test_exec_reaches_runner_journal` is now redundant with the autouse fixture, but kept for clarity — it documents the contract explicitly rather than relying on shared fixture machinery.
The e2e suite exercises box lifecycle (create / exec / lifecycle); none
of those paths touch S3 / volumes. The API's VolumeManager constructor
short-circuits when S3_ENDPOINT is empty (see
apps/api/src/sandbox/managers/volume.manager.ts:47), and the
ObjectStorageService is only constructed on demand by the volume API
which e2e doesn't exercise. So leaving S3_ENDPOINT="" cleanly disables
the AWS surface.
Before this commit, bootstrap.sh required:
- AWS creds (aws login or static keys)
- boxlite-dev-storagebucket-bzhbzvme to exist in account 064212132677
- Two uncommitted patches to volume.manager.ts / object-storage.service.ts
so the S3Client constructor wouldn't `getOrThrow('s3.accessKey')` on empty
values
After this commit:
- bootstrap.sh runs on any nested-KVM Ubuntu host
- No AWS creds needed
- No code patches needed — the early-return path is followed instead
- 24/25 e2e tests pass identically (the lone failure is the
test_exec_on_stopped_box_is_typed_error mapping bug, unchanged)
Verified locally on alpine:3.23, ubuntu:22.04, ubuntu:24.04 with main
HEAD checked out and no working-tree modifications.
…hosts Previous bootstrap.sh assumed node 22 + yarn + npx + pip were already installed on the host. Fresh Ubuntu 24/26 EC2s don't have any of those preinstalled — `yarn install` blew up immediately. Add: - python3-pip + ca-certificates + curl via apt - Node.js 22 via NodeSource if missing or older than 20 - yarn via corepack enable run.sh also split the pip install chain so a missing single package doesn't shadow the others. cargo + go intentionally NOT installed — they're only needed by test:e2e:two-sided (PR comparison flow), not the default make test:e2e. The two-sided README still documents them as prereqs. After this commit, a fresh nested-KVM Ubuntu 26 EC2 with git + sudo can run `make test:e2e:setup && make test:e2e` and reach the 24-pass / 1-fail baseline without any other preinstalls or configuration.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (32)
📝 WalkthroughWalkthroughThis PR adds a complete end-to-end testing infrastructure for Boxlite. It includes a GitHub Actions workflow that targets KVM-capable self-hosted runners, shell scripts to bootstrap/teardown a local API and runner stack, Python fixtures and a pytest-based test harness, REST-mode validation tests, SDK entry-point smoke tests for multiple languages, and utilities for verifying the request path through the system. ChangesEnd-to-end test infrastructure and SDK validation suite
🎯 4 (Complex) | ⏱️ ~45 minutes
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Infer (1.2.0)scripts/test/e2e/sdks/c/e2e_basic.cscripts/test/e2e/sdks/c/e2e_basic.c:14:10: fatal error: 'boxlite.h' file not found ... [truncated 714 characters] ... x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/lib/clang/18/include" Comment |
Adds 4 cases under cases/test_cli_entry.py that exercise the same
SDK→API→Runner→VM chain through the `boxlite` CLI binary instead of
the Python SDK. The CLI shares the Rust `boxlite::rest` client with
the Python SDK at the bottom layer, but adds CLI-only surface
(argparse, exit-code propagation, table output, auth login) that
nothing else in the e2e suite exercises.
Cases:
test_cli_whoami_against_local_api
`boxlite auth whoami` returns the expected admin profile and
URL — proves the CLI is talking to the same local API the
Python SDK uses.
test_cli_ls_returns_table
`boxlite ls` succeeds and emits table-shaped output even with
zero boxes.
test_cli_run_exec_chain
Full lifecycle via CLI: `boxlite run -d <image>`,
`boxlite exec <id> -- echo`, `boxlite rm -f <id>`. Asserts
stdout capture, list visibility, AND the runner journal
contains the new box id. The journal check is the CLI-specific
path-bypass guard (the autouse fixture only watches
`Boxlite.rest._created`; CLI subprocesses are invisible to it).
test_cli_exec_exit_code_propagates
Non-zero exit inside the box round-trips back through the CLI's
own exit code. CLI-specific because argv-to-exit-code mapping
is not on the SDK path.
Locally: 4/4 pass; total suite is now 28 / 29 (the single failure is
still the runner Stopped → 500 mapping bug — unchanged).
Adds 3 new entry-point cases (1 each for Go, Node, C SDK), each routing through SDK → API → Runner → libkrun VM via the local API. Combined with the existing Python SDK + CLI entries, this covers all 5 binding layers above the shared Rust REST client: - Python SDK (PyO3 + asyncio bridge) - CLI binary (subprocess + table output) - Node SDK (napi-rs + JS Promise marshalling) ← new - Go SDK (CGO + context + slice conversion) ← new - C SDK (raw C ABI + callback drain pump) ← new Each binding layer has its own marshalling / serialization / async-glue code path, so a regression in any one of them (e.g. a napi enum that forgets to add a variant, a Go BoxOption that drops a field, a missing pthread join) is invisible to the others. Notes per SDK: Go: builds against sdks/go in the workspace; runs against target/release/libboxlite.so via LD_LIBRARY_PATH. Full create → exec → remove path, asserts captured stdout + exit code + runner journal hit. Node: imports from the LOCAL sdks/node build (the repo root has a stale @boxlite-ai/boxlite 0.9.5 install with field-name glitches — BoxliteRestOptions stores `prefix` but JsBoxlite.rest reads `pathPrefix`; importing from `../../../../../sdks/node` skips the cache). Smoke is create + remove only — exec stream draining is covered by Python / Go / CLI. C: pthread-synchronized create_box callback + dedicated boxlite_runtime_drain pump thread (callbacks don't fire without it — same model as sdks/go's ensureDrainRunning). Smoke is also create + remove only; exec would add 80+ lines of stream-pump glue that doesn't change what the REST chain proves. All three drivers read connection settings from env (BOXLITE_E2E_URL / API_KEY / PREFIX / IMAGE) so pytest fixtures can swap them per-test without rebuilding the native binary. Path-bypass guarantee: each new test snapshots journalctl -u boxlite-runner before the run and asserts the runner journal contains the box id afterwards. The autouse Python guard cannot see CLI / native subprocesses (it only watches the Python `rt._created` bucket), so each new entry test does the verification explicitly. Local results: 31 PASS / 1 FAIL — the lone failure is unchanged (test_exec_on_stopped_box_is_typed_error, Go runner Stopped → 500 mapping bug at apps/runner/pkg/api/controllers/boxlite_exec.go:77).
The previous bootstrap.sh curl'd boxlite-runner-vX.X.X-linux-amd64.tar.gz from GitHub Releases, pinning the runner to whatever was last cut. That meant the e2e suite was permanently testing stale code: - PRs that touch the runner (e.g. boxlite-ai#563 sdks/c/ FFI drain, or the boxlite_exec.go:77 mapping fix this suite catches) would never surface in CI — the release binary still has the old behaviour - test:e2e:two-sided was the only way to actually validate a PR, and it relied on side-channel build steps Now bootstrap.sh: 1. Installs rustup + Go if missing (skips otherwise) 2. cargo build --release -p boxlite-c (produces libboxlite.a) 3. cd apps/runner && CGO_ENABLED=1 go build → /usr/local/bin/boxlite-runner cargo + go both do incremental builds, so re-running bootstrap on a clean working tree finishes in seconds; the cold first build is ~5 min (the Rust release build dominates). The CI workflow no longer needs the separate "Rebuild boxlite-runner binary from this checkout" step — bootstrap covers it. Left a restart step in so a no-op bootstrap still picks up other env-file drift. Side effect of this change: the e2e suite against main HEAD now FAILs 7 more tests (the boxlite-ai#563 stdout race surfaces on every exec smoke case, not just test_p0_6). This is the correct signal — main currently ships the race, and the e2e suite is now honest about it. Merging boxlite-ai#563 flips all 7 (plus the original test_p0_6) green.
…guns Addresses 11 review points on PR boxlite-ai#678: 1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens), persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned). Read back on every re-bootstrap so DB-encrypted data and the existing admin user keep working. 2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0. 3. `yarn install` no longer swallows stderr — silent install failures were the reviewer's "挂了不知道为啥" pain. 4. Runner health check is now a real wait loop (`pgrep` of the binary path + ss listening on :8080) instead of `sleep 3`. API already poll /api/health. 5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin key. Catches encryption-key mismatch, broken migrations, stale admin user / new-token mismatch BEFORE the e2e suite runs (with a helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch). 7. tslib + node-forge are now real `apps/package.json` deps (committed here). Bootstrap no longer yarn-adds them, so it no longer dirties the working tree mid-run. 9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves secrets via the secrets file). Previous behaviour only wrote when missing, so PRs that added env vars never landed on existing hosts. 10. New teardown.sh — basic / --wipe-data / --full modes that revert the bootstrap state cleanly. 11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr instead of being silent. 12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented in-line — Google is a fetchable .well-known endpoint; e2e auth goes through the admin API key path and never touches Google. 13. REPO autodetects from the script's own location (no more $HOME/ws/boxlite default that only worked on one machine). Two reviewer concerns were left as-is with comments: 6. apps/apps self-symlink — the real fix is to rewrite the `apps/api/...` paths in every `project.json` to `api/...`. That's an apps-workspace refactor PR, not e2e infra. 8. API runs via npx ts-node, not webpack bundle — production parity diverges only on TypeORM entity discovery (the cron `databaseName undefined` warning), which doesn't intersect any code path e2e exercises. Webpack bundle would add 5–10 min per PR. Side change to fixture_setup.py: now also writes the `default` profile to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests hit the default profile when running `boxlite auth whoami` etc, so a re-bootstrap with a new admin key needs to update both or CLI tests 401. Verified end-to-end: teardown --wipe-data → fresh bootstrap → fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug — both real production bugs the suite is specifically here to catch).
…port (#679) ## Problem After enabling the merge queue, `Test (conclusion)` and `Lint (conclusion)` were made **required** status checks. But `test.yml` / `lint.yml` still had a `paths:` filter on the **`pull_request`** trigger, so a PR not touching those paths never runs the workflow — and the required conclusion check stays **Pending**, blocking the PR from merging. Observed live on #678 (changed only `scripts/test/e2e/**` + `make/**` → `test.yml` never ran → `Test (conclusion)` missing). Confirmed by [GitHub docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/troubleshooting-required-status-checks): *"a workflow skipped due to path filtering... checks... remain 'Pending'... blocked from merging."* ## Fix Remove the `paths:` filter from the `pull_request` trigger on both workflows so they run on **every** PR (matches rust-analyzer/bevy, which have no top-level paths on their gating workflows). Nothing else changes: - Per-suite filtering still happens in the `changes` job (heavy jobs skip when unaffected). - The conclusion job reports `success`-or-`skipped`, so it always reports green for unrelated PRs. - `push:` paths are kept as-is (irrelevant to PR gating). Cost: ~3 small jobs (config/changes/conclusion) per workflow per PR. ## Note PRs opened before #677 merged still lack the conclusion jobs entirely and need a rebase onto `main` to pick them up — separate from this fix. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Updated CI/CD workflows to ensure status checks remain responsive on all pull requests targeting main, preventing them from getting stuck in a pending state. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…ustup defer
Three reviewer fixes around build-tooling install:
🔴 make setup:build before cargo build
Fresh Ubuntu doesn't ship meson / ninja / build-essential, but
`cargo build -p boxlite-c` shells out to them via the
bubblewrap-sys / e2fsprogs-sys / libkrun-sys crates. Without the
canonical `make setup:build` (routes to scripts/setup/setup-ubuntu.sh),
bootstrap died on `meson: command not found` on every fresh host.
Now invoked before the C SDK build. Also swap the raw cargo call
for `make dist:c` so the staged sdks/c/dist/ output is in place
for any future tooling that expects it.
🟡 Go version from apps/runner/go.mod
Was hardcoded GO_VER=1.23.4. The runner module requires 1.25.4;
the old pin lagged the workspace and `make setup:build` correctly
rejected it. Now reads:
GO_VER=$(awk '/^go [0-9]/ {print $2; exit}' apps/runner/go.mod)
Subtle gotcha: Go 1.21+ auto-downloads a matching toolchain to
~/go/pkg/mod/golang.org/toolchain@... when a go.mod requires a
newer version, and `go version` then reports the auto-toolchain
version, not the system Go install. `make setup:build` and most
other subprocesses see the system Go. So we read the truthful
system version with:
cd /tmp && GOTOOLCHAIN=local go version
before deciding whether to install. Without this, bootstrap would
see "1.25.4" (auto-toolchain) and skip the install, leaving
setup:build still failing on the system 1.23.4.
🟡 rustup defers to rust-toolchain.toml
Was `sh -s -- -y --default-toolchain stable`. The repo's
`rust-toolchain.toml` already says channel = "stable", so the
rustup install pre-downloads stable, then the first cargo call
re-resolves the toolchain anyway (one wasted round-trip). Now:
sh -s -- -y --no-modify-path --profile minimal --default-toolchain none
rustup installs without a default; the first `cargo build` reads
the toolchain file and pulls whatever it asks for. Smaller
install, less duplication, automatically tracks the toml if it
ever pins a specific version.
…guns Addresses 11 review points on PR boxlite-ai#678: 1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens), persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned). Read back on every re-bootstrap so DB-encrypted data and the existing admin user keep working. 2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0. 3. `yarn install` no longer swallows stderr — silent install failures were the reviewer's "挂了不知道为啥" pain. 4. Runner health check is now a real wait loop (`pgrep` of the binary path + ss listening on :8080) instead of `sleep 3`. API already poll /api/health. 5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin key. Catches encryption-key mismatch, broken migrations, stale admin user / new-token mismatch BEFORE the e2e suite runs (with a helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch). 7. tslib + node-forge are now real `apps/package.json` deps (committed here). Bootstrap no longer yarn-adds them, so it no longer dirties the working tree mid-run. 9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves secrets via the secrets file). Previous behaviour only wrote when missing, so PRs that added env vars never landed on existing hosts. 10. New teardown.sh — basic / --wipe-data / --full modes that revert the bootstrap state cleanly. 11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr instead of being silent. 12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented in-line — Google is a fetchable .well-known endpoint; e2e auth goes through the admin API key path and never touches Google. 13. REPO autodetects from the script's own location (no more $HOME/ws/boxlite default that only worked on one machine). Two reviewer concerns were left as-is with comments: 6. apps/apps self-symlink — the real fix is to rewrite the `apps/api/...` paths in every `project.json` to `api/...`. That's an apps-workspace refactor PR, not e2e infra. 8. API runs via npx ts-node, not webpack bundle — production parity diverges only on TypeORM entity discovery (the cron `databaseName undefined` warning), which doesn't intersect any code path e2e exercises. Webpack bundle would add 5–10 min per PR. Side change to fixture_setup.py: now also writes the `default` profile to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests hit the default profile when running `boxlite auth whoami` etc, so a re-bootstrap with a new admin key needs to update both or CLI tests 401. Verified end-to-end: teardown --wipe-data → fresh bootstrap → fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug — both real production bugs the suite is specifically here to catch).
PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs that touch specific paths. GitHub branch protection requires a check to ALWAYS report a result — a path-filtered workflow that doesn't fire leaves the required check pending forever, so the PR can't merge. Reshape so e2e can be a required check: 1. Drop `paths:` from the trigger — workflow always starts on every PR to main. Inside, a cheap `changes` job on a GitHub-hosted runner does the path-filter detection via dorny/paths-filter@v3. 2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min) stays gated on `changes.outputs.relevant == true`. PRs that don't touch relevant paths skip the kvm runner entirely — no contention cost. 3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses the conditional outcome into one stable check name `E2E gate (required)`. Reports success if e2e passed OR no relevant paths changed; fail if e2e ran and failed. Branch protection should be configured (separately, via repo Settings → Branches → main) to require the `E2E gate (required)` status check. That's a repo-settings action a maintainer needs to do once after this merges. Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes` + ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed. Cost on relevant PRs: same as before (full e2e on kvm runner). Stacks on boxlite-ai#678 — this commit only makes sense once that PR's e2e-stack.yml is in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/, sdks/c/tests/, sdks/go/*_test.go). This PR moves the cases to match. Driver scripts stay in scripts/test/e2e/ (cross-SDK infra); test files now live next to the SDK they exercise: scripts/test/e2e/cases/test_c_entry.py → sdks/c/tests/e2e/ scripts/test/e2e/cases/test_go_entry.py → sdks/go/tests/e2e/ scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/ scripts/test/e2e/cases/test_cli_entry.py → src/cli/tests/e2e/ scripts/test/e2e/cases/test_*.py (Python pytest) → sdks/python/tests/e2e/ scripts/test/e2e/cases/conftest.py → sdks/python/tests/e2e/ scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} → sdks/{c,go,node}/tests/e2e/ Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md. Supporting wiring (no test-logic change): - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args (testpaths in pytest.ini can't resolve outside its rootdir) - pytest.ini: comment explaining the testpaths omission - .github/workflows/e2e-stack.yml: paths filter expanded to all new e2e locations - README.md: layout map updated Pure rename + supporting wiring. Zero new test logic, zero source changes. Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2 fixes; this split-out covers reorganize only — the rest ships in follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/, sdks/c/tests/, sdks/go/*_test.go). This PR moves the cases to match. Driver scripts stay in scripts/test/e2e/ (cross-SDK infra); test files now live next to the SDK they exercise: scripts/test/e2e/cases/test_c_entry.py → sdks/c/tests/e2e/ scripts/test/e2e/cases/test_go_entry.py → sdks/go/tests/e2e/ scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/ scripts/test/e2e/cases/test_cli_entry.py → src/cli/tests/e2e/ scripts/test/e2e/cases/test_*.py (Python pytest) → sdks/python/tests/e2e/ scripts/test/e2e/cases/conftest.py → sdks/python/tests/e2e/ scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} → sdks/{c,go,node}/tests/e2e/ Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md. Supporting wiring (forced by the renames): - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args (testpaths in pytest.ini can't resolve outside its rootdir) - pytest.ini: comment explaining the testpaths omission - .github/workflows/e2e-stack.yml: paths filter expanded to all new e2e locations - README.md: layout map updated Two collisions the rename exposed (both fixed in this PR): 1. SDK unit-test pytest workflow auto-discovers sdks/python/tests/ and would import the new e2e/ subdir, whose conftest needs Python 3.11+ tomllib and a live API stack the unit-test workflow doesn't provide. Fix: norecursedirs = e2e in sdks/python/pytest.ini. The e2e runner passes the dir explicitly on the CLI which bypasses norecursedirs. 2. pytest's default --import-mode=prepend puts each test file's rootpath on sys.path. For sdks/python/tests/e2e/test_*.py the rootpath becomes sdks/python/, which then shadows the installed boxlite wheel with the source-tree stub package (sdks/python/boxlite/__init__.py — missing the compiled .so so Rust-bound symbols like BoxliteRestOptions are absent). Every fixture using boxlite.BoxliteRestOptions failed at setup with AttributeError. Fix: --import-mode=importlib in scripts/test/e2e/pytest.ini (modern mode, no sys.path injection). Tests that previously did `from conftest import drain` couldn't keep doing that under importlib mode (conftest no longer on sys.path) — extracted drain/collect_stream/ stdout_line_count from conftest into scripts/test/e2e/lib/e2e_helpers.py (alongside path_verification which already used the same lib/ + sys.path.insert pattern), and updated the 8 test files' imports. Verified locally on Ubuntu 22.04 + Python 3.11 + maturin-built boxlite wheel — make test:e2e against PR-A: 21 passed, 7 failed (the 7 are real production bugs the suite is designed to catch: PR-C fixes runner exec error mapping, PR-D fixes FFI drain race). Before this PR's collision fix: 0 passed, 32 fixture-setup errors. Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2 fixes; this split-out covers reorganize only — the rest ships in follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reframes boxlite-ai#563 around the SDK contract: Go SDK (sdks/go/exec.go:266-275) documents that on_exit is always registered AFTER on_stdout/on_stderr, specifically so the Exit dispatch can safely free shared state (the cgo.Handle) without racing stream callbacks. With that contract, register_exit sees all stream pumps already registered, and the existing `stream_done_rx: Vec<oneshot::Receiver<()>>` snapshot-and-await pattern correctly orders Stdout/Stderr* before Exit. The bug wasn't in that path. It was that `execution_wait` ran as an INDEPENDENT terminal task pushing Wait — with no stream drain — so the wait gRPC reply could land in the queue ahead of still-flushing chunks (`box.Exec` in the Go SDK returned with empty stdout when the user observed Wait before the Stream callbacks). This change makes exit_pump the SOLE owner of terminal-event dispatch: ExecutionHandle: + pending_waits: Arc<Mutex<Vec<(CExecutionWaitFn, usize)>>> + exit_pump_spawned: Arc<AtomicBool> - (no watch / Notify / AtomicI32 wakeup primitive) spawn_exit_pump_if_needed: First of {register_exit, execution_wait} to call wins the compare_exchange and spawns exit_pump; subsequent callers just register their cb into exit_dispatch / pending_waits. execution_wait (no longer spawns its own task): Hold pending_waits lock while reading exit_dispatched: - already true → push Wait directly (Exit already drained) - false → append to pending_waits, ensure exit_pump spawned exit_pump (single task per execution): 1. wait_on_clone(process) 2. drain stream_done_rx (snapshot via mem::take) 3. set process_completed (only on Ok wait — preserves existing contract that an Err wait doesn't prove process exit) 4. claim exit_dispatched, push Exit if exit_dispatch registered 5. take pending_waits, push Wait per registration (each Wait gets its own wait_on_clone — same shared execution, already exited, returns immediately) Queue order: Stdout/Stderr* → Exit → Wait* — all from the SAME task, so the queue's push order IS the dispatch order. No inter-task synchronization, no watch::Sender, no Notify, no register-then-check footgun. Late `boxlite_execution_wait` races the `exit_dispatched=true` flag: under pending_waits lock, either the late caller is part of the take()'d snapshot OR observes the flag and spawns its own direct-push task. Never lost. Test plan --------- - `make test:unit:ffi FILTER=exec` → 20/20 boxlite-c exec tests pass - Two-side e2e verified (Ubuntu 22.04 + Python 3.11 + maturin-built wheel against `make test:e2e:setup` infra from boxlite-ai#678): Side A — revert sdks/c/src/exec/execution.rs to upstream/main (oneshot vec for Exit path, execution_wait as separate task with no drain): test_p0_6_exec_stdout_race: FAIL 70% loss rate test_563...complete_stdout_sequence: FAIL truncated stdout Side B — apply this PR: test_p0_6_exec_stdout_race: PASS test_563...complete_stdout_sequence: PASS Adds scripts/test/e2e/cases/test_563_exec_drain_barrier.py as a deterministic single-shot regression (seq 1 200), complementary to test_p0_6_exec_stdout_race's statistical (ROUNDS×2 execs) coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reframes boxlite-ai#563 around the SDK contract: Go SDK (sdks/go/exec.go:266-275) documents that on_exit is always registered AFTER on_stdout/on_stderr, specifically so the Exit dispatch can safely free shared state (the cgo.Handle) without racing stream callbacks. With that contract, register_exit sees all stream pumps already registered, and the existing `stream_done_rx: Vec<oneshot::Receiver<()>>` snapshot-and-await pattern correctly orders Stdout/Stderr* before Exit. The bug wasn't in that path. It was that `execution_wait` ran as an INDEPENDENT terminal task pushing Wait — with no stream drain — so the wait gRPC reply could land in the queue ahead of still-flushing chunks (`box.Exec` in the Go SDK returned with empty stdout when the user observed Wait before the Stream callbacks). This change makes exit_pump the SOLE owner of terminal-event dispatch: ExecutionHandle: + pending_waits: Arc<Mutex<Vec<(CExecutionWaitFn, usize)>>> + exit_pump_spawned: Arc<AtomicBool> - (no watch / Notify / AtomicI32 wakeup primitive) spawn_exit_pump_if_needed: First of {register_exit, execution_wait} to call wins the compare_exchange and spawns exit_pump; subsequent callers just register their cb into exit_dispatch / pending_waits. execution_wait (no longer spawns its own task): Hold pending_waits lock while reading exit_dispatched: - already true → push Wait directly (Exit already drained) - false → append to pending_waits, ensure exit_pump spawned exit_pump (single task per execution): 1. wait_on_clone(process) 2. drain stream_done_rx (snapshot via mem::take) 3. set process_completed (only on Ok wait — preserves existing contract that an Err wait doesn't prove process exit) 4. claim exit_dispatched, push Exit if exit_dispatch registered 5. take pending_waits, push Wait per registration (each Wait gets its own wait_on_clone — same shared execution, already exited, returns immediately) Queue order: Stdout/Stderr* → Exit → Wait* — all from the SAME task, so the queue's push order IS the dispatch order. No inter-task synchronization, no watch::Sender, no Notify, no register-then-check footgun. Late `boxlite_execution_wait` races the `exit_dispatched=true` flag: under pending_waits lock, either the late caller is part of the take()'d snapshot OR observes the flag and spawns its own direct-push task. Never lost. Test plan --------- - `make test:unit:ffi FILTER=exec` → 20/20 boxlite-c exec tests pass - Two-side e2e verified (Ubuntu 22.04 + Python 3.11 + maturin-built wheel against `make test:e2e:setup` infra from boxlite-ai#678): Side A — revert sdks/c/src/exec/execution.rs to upstream/main (oneshot vec for Exit path, execution_wait as separate task with no drain): test_p0_6_exec_stdout_race: FAIL 70% loss rate test_563...complete_stdout_sequence: FAIL truncated stdout Side B — apply this PR: test_p0_6_exec_stdout_race: PASS test_563...complete_stdout_sequence: PASS Adds scripts/test/e2e/cases/test_563_exec_drain_barrier.py as a deterministic single-shot regression (seq 1 200), complementary to test_p0_6_exec_stdout_race's statistical (ROUNDS×2 execs) coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Go SDK's box.Exec returns with stdout chunks still in flight — the
user's stream callbacks fire after the synchronous wait already
returned, so box.Exec hands the caller a result with truncated stdout.
Pre-fix, two terminal-event paths existed, but only ONE drained streams:
exit_pump → drains stream_done_rx, then push Exit ✅
execution_wait → wait_on_clone + push Wait, no drain ❌
The Exit path was already correctly ordered behind every Stdout/Stderr
event. The Wait path raced the still-flushing stream chunks: Wait lands
in the FFI queue ahead of late Stdout/Stderr, the Python SDK closes the
stdout iterator on Wait, the user's `drain(ex)` returns truncated.
Fix: consolidate. exit_pump becomes the sole owner of terminal-event
dispatch. execution_wait stops spawning its own task and instead
appends (cb, user_data) to a pending_waits vec that exit_pump iterates
AFTER pushing Exit:
[register_exit / execution_wait — first caller spawns exit_pump]
register_exit: set exit_dispatch slot
execution_wait: append to pending_waits (or push direct if
exit_dispatched=true)
[exit_pump — single task per execution]
1. wait_on_clone(process)
2. drain stream_done_rx (existing oneshot vec)
3. claim exit_dispatched, push Exit if exit_dispatch registered
4. take pending_waits, push Wait per registration
Queue order: Stdout/Stderr* → Exit → Wait* — all from the SAME task,
so push order IS dispatch order. No inter-task synchronization, no
watch / Notify / AtomicI32 waker primitive.
The on_exit registration contract (Go SDK `sdks/go/exec.go:266-275`:
on_exit always registered AFTER on_stdout/on_stderr) means
register_exit sees every stream pump's done_rx already in the vec.
We didn't have to invent the Exit ordering — it already worked. Just
needed to route Wait through the same path.
Late `boxlite_execution_wait` race: caller holds pending_waits lock
while reading exit_dispatched. Either it lands in exit_pump's
mem::take snapshot, OR observes the flag and spawns a direct-push
task — never lost.
What changed beyond the Wait loop itself
----------------------------------------
- pending_waits, exit_pump_spawned: new fields
- spawn_exit_pump_if_needed helper: single spawn whether register_exit
or execution_wait arrives first
- stream_done_rx, exit_dispatch Arc-wrapped: exit_pump now lives
outside register_exit's scope, needs Arc clones of state
- Both constructors initialise the 2 new fields
- register_exit simplified to set the dispatch slot + call helper
Test plan
---------
- `make test:unit:ffi FILTER=exec` → 20/20 boxlite-c exec tests pass
- Two-side verified (Ubuntu 22.04 + Python 3.11 + maturin-built wheel)
against the existing boxlite-ai#678 regression suite case
`scripts/test/e2e/cases/test_p0_6_exec_stdout_race.py` (which boxlite-ai#678
already cites boxlite-ai#563 in its docstring):
Side A — revert sdks/c/src/exec/execution.rs to upstream/main:
test_p0_6_exec_stdout_race: FAIL 70% loss rate
Side B — apply this PR:
test_p0_6_exec_stdout_race: PASS
- Full `make test:e2e` against this PR's runner: 27 passed / 2 failed
/ 4 skipped. The 2 fails are unrelated (test_exec_on_stopped_box —
fixed by PR boxlite-ai#684 runner error mapping; test_node_sdk_create_exec_remove
— local Node SDK build setup gap, not FFI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21 new e2e cases under scripts/test/e2e/cases/ that pin contracts the existing local-FFI integration tests can't reach. Cases shipped with xfail(strict=True) markers for the known production bugs they pin — a follow-up fix-PR flipping a bug closed will trip the strict marker into a CI failure, surfacing the fact that the xfail can now be dropped. Files added: scripts/test/e2e/cases/test_error_code_mapping.py scripts/test/e2e/cases/test_quota_enforcement.py scripts/test/e2e/cases/test_runner_concurrency.py Replaces an earlier split-out attempt that stacked on a reorganize PR (boxlite-ai#682). That structural reorg is now abandoned; new e2e cases land under scripts/test/e2e/cases/ matching boxlite-ai#678's existing layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST /v1/{prefix}/boxes/{id}/exec was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn:
exec on a removed box -> 500 "spawn_failed: build failed"
exec on a stopped box -> 500 (same shape)
exec on box mid-rebuild -> 500 (same shape)
Root cause: BoxliteExec in boxlite_exec.go wraps the SDK error in a
generic 500 without consulting the SDK's typed error helpers
(IsNotFound, IsStopped, IsInvalidState). Other handlers in this file
(signal/resize/kill/status) already classify these; POST /exec was
the one that got missed.
Fix: extend classifyExecError to recognise the SDK's typed errors and
map them to 404 (NotFound), 409 (Stopped, InvalidState) per the
canonical mapping at src/shared/src/errors.rs.
Pin: test_exec_on_stopped_box_is_typed_error in boxlite-ai#678's e2e suite goes
from 500 -> 4xx after this change. Re-validation: stack on top of
boxlite-ai#683 (which adds test_runner_concurrency.py) for the
exec-after-soft-deleted-box variant.
Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg
(now abandoned). Branch rebuilt against current main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Go SDK's box.Exec would occasionally return with stdout chunks still in flight — the user's callback got OnExit (or the wait gRPC reply) before the matching OnStdout/OnStderr chunks landed on the queue. From the caller's perspective, the exec had finished but its stdout was silently truncated. Root cause: execution_wait spawned an independent terminal task that pushed the Wait event as soon as wait_on_clone returned — with no drain barrier — so the wait reply could land in the event queue ahead of still-flushing stream pumps. Fix: make exit_pump the sole owner of terminal-event dispatch. Both execution_wait and register_exit fan into it; exit_pump awaits all stream pump receivers before pushing the terminal event. Queue order becomes Stdout/Stderr* -> Exit -> Wait*, all from the same task. Pin: test_p0_6_exec_stdout_race in boxlite-ai#678's e2e suite goes from ~70% stdout-loss to 0%. Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg (now abandoned). Branch rebuilt against current main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs that touch specific paths. GitHub branch protection requires a check to ALWAYS report a result — a path-filtered workflow that doesn't fire leaves the required check pending forever, so the PR can't merge. Reshape so e2e can be a required check: 1. Drop `paths:` from the trigger — workflow always starts on every PR to main. Inside, a cheap `changes` job on a GitHub-hosted runner does the path-filter detection via dorny/paths-filter@v3. 2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min) stays gated on `changes.outputs.relevant == true`. PRs that don't touch relevant paths skip the kvm runner entirely — no contention cost. 3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses the conditional outcome into one stable check name `E2E gate (required)`. Reports success if e2e passed OR no relevant paths changed; fail if e2e ran and failed. Branch protection should be configured (separately, via repo Settings → Branches → main) to require the `E2E gate (required)` status check. That's a repo-settings action a maintainer needs to do once after this merges. Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes` + ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed. Cost on relevant PRs: same as before (full e2e on kvm runner). Stacks on boxlite-ai#678 — this commit only makes sense once that PR's e2e-stack.yml is in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs that touch specific paths. GitHub branch protection requires a check to ALWAYS report a result — a path-filtered workflow that doesn't fire leaves the required check pending forever, so the PR can't merge. Reshape so e2e can be a required check: 1. Drop `paths:` from the trigger — workflow always starts on every PR to main. Inside, a cheap `changes` job on a GitHub-hosted runner does the path-filter detection via dorny/paths-filter@v3. 2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min) stays gated on `changes.outputs.relevant == true`. PRs that don't touch relevant paths skip the kvm runner entirely — no contention cost. 3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses the conditional outcome into one stable check name `E2E gate (required)`. Reports success if e2e passed OR no relevant paths changed; fail if e2e ran and failed. Branch protection should be configured (separately, via repo Settings → Branches → main) to require the `E2E gate (required)` status check. That's a repo-settings action a maintainer needs to do once after this merges. Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes` + ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed. Cost on relevant PRs: same as before (full e2e on kvm runner). Stacks on boxlite-ai#678 — this commit only makes sense once that PR's e2e-stack.yml is in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs that touch specific paths. GitHub branch protection requires a check to ALWAYS report a result — a path-filtered workflow that doesn't fire leaves the required check pending forever, so the PR can't merge. Reshape so e2e can be a required check: 1. Drop `paths:` from the trigger — workflow always starts on every PR to main. Inside, a cheap `changes` job on a GitHub-hosted runner does the path-filter detection via dorny/paths-filter@v3. 2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min) stays gated on `changes.outputs.relevant == true`. PRs that don't touch relevant paths skip the kvm runner entirely — no contention cost. 3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses the conditional outcome into one stable check name `E2E gate (required)`. Reports success if e2e passed OR no relevant paths changed; fail if e2e ran and failed. Branch protection should be configured (separately, via repo Settings → Branches → main) to require the `E2E gate (required)` status check. That's a repo-settings action a maintainer needs to do once after this merges. Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes` + ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed. Cost on relevant PRs: same as before (full e2e on kvm runner). Stacks on boxlite-ai#678 — this commit only makes sense once that PR's e2e-stack.yml is in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #678 wires up an e2e suite but its workflow only triggers on PRs that touch specific paths. GitHub branch protection requires a check to ALWAYS report a result — a path-filtered workflow that doesn't fire leaves the required check pending forever, so the PR can't merge. Reshape so e2e can be a required check: 1. Drop `paths:` from the trigger — workflow always starts on every PR to main. Inside, a cheap `changes` job on a GitHub-hosted runner does the path-filter detection via dorny/paths-filter@v3. 2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min) stays gated on `changes.outputs.relevant == true`. PRs that don't touch relevant paths skip the kvm runner entirely — no contention cost. 3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses the conditional outcome into one stable check name `E2E gate (required)`. Reports success if e2e passed OR no relevant paths changed; fail if e2e ran and failed. Branch protection should be configured (separately, via repo Settings → Branches → main) to require the `E2E gate (required)` status check. That's a repo-settings action a maintainer needs to do once after this merges. Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes` + ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed. Cost on relevant PRs: same as before (full e2e on kvm runner). Stacks on #678 — this commit only makes sense once that PR's e2e-stack.yml is in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #678 wires up an e2e suite but its workflow only triggers on PRs that touch specific paths. GitHub branch protection requires a check to ALWAYS report a result — a path-filtered workflow that doesn't fire leaves the required check pending forever, so the PR can't merge. Reshape so e2e can be a required check: 1. Drop `paths:` from the trigger — workflow always starts on every PR to main. Inside, a cheap `changes` job on a GitHub-hosted runner does the path-filter detection via dorny/paths-filter@v3. 2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min) stays gated on `changes.outputs.relevant == true`. PRs that don't touch relevant paths skip the kvm runner entirely — no contention cost. 3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses the conditional outcome into one stable check name `E2E gate (required)`. Reports success if e2e passed OR no relevant paths changed; fail if e2e ran and failed. Branch protection should be configured (separately, via repo Settings → Branches → main) to require the `E2E gate (required)` status check. That's a repo-settings action a maintainer needs to do once after this merges. Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes` + ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed. Cost on relevant PRs: same as before (full e2e on kvm runner). Stacks on #678 — this commit only makes sense once that PR's e2e-stack.yml is in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds
scripts/test/e2e/— a regression suite that exercises the SDK → API → boxlite-runner → libkrun VM chain, complementingmake test:integrationwhich runs only against the local PyO3/FFI path and therefore can't catch bugs that live in the REST proxy, runner CGO, or API error mapping.Test plan
make test:integration:*constructsBoxlite.default()(local FFI), never opens an HTTP connection, never reaches the NestJS API orboxlite-runnermake test:e2eruns 25 pytest cases, every box created viaBoxlite.rest(opts)againsthttp://localhost:3000/apimake test:integration:pythongreen even on stock 0.9.5 runnertest_p0_6_exec_stdout_racedeterministically FAILS at ≥90% loss against stock runner, PASS at 0% against a runner rebuilt with #563verify_runner_saw_all_boxessnapshotsjournalctl -u boxlite-runnerper-test and asserts every box id created viart.createappears in the runner journal; sanity-verified by injecting a fake box id which fires the teardown assertion with a clear path-bypass messagescripts/test/e2e/bootstrap.shis idempotent and self-contained — installs postgres, redis, docker registry, Node.js 22 (NodeSource), yarn (corepack), python3-pip, the boxlite-runner release binary, systemd units, and writes/etc/boxlite-api.env. No AWS credentials, no S3 bucket, no IAM role required —S3_ENDPOINT=""makesVolumeManager.constructorearly-return atapps/api/src/sandbox/managers/volume.manager.ts:47make test:e2e:setup(bootstrap + fixture data) andmake test:e2e(pytest);make test:e2e:two-sidedtakesPR_REF=<branch>and verifies a candidate fix removes the bug.github/workflows/e2e-stack.ymltriggers on push/PR touchingsdks/c/src/exec/**,src/boxlite/src/rest/**,apps/runner/**,apps/api/src/**; needs a self-hosted runner labeledkvmbecause GitHub-hosted runners have no/dev/kvmCases shipped
test_path_verification.py(2 cases):3000and the runner journal saw the box idtest_p0_6_exec_stdout_race.py(1 case, 5 box × 2 exec rounds)make test:integration:*portssrc/boxlite/tests/{lifecycle,exec_options,execution_shutdown,shutdown}.rs+sdks/python/tests/{test_exec,test_exec_timeout_sigalrm,test_box_management,test_resize_tty,test_errors}.py— all rewritten to useBoxlite.rest()Local results (this branch checked out clean against current main)
test_p0_6_exec_stdout_race(race) +test_exec_on_stopped_box_is_typed_error(real API/runner mapping bug, see body below)test_exec_on_stopped_box_is_typed_error— the lone FAIL is realThis case asserts that POST
/execon a stopped box returns a 4xx-class status, not 5xx. The current runner code atapps/runner/pkg/api/controllers/boxlite_exec.go:77writeshttp.StatusInternalServerErrorunconditionally whenexecManager.Startfails:Every other handler in the same file (signal / resize / kill / status) routes through
classifyExecError(err)instead, which already mapsErrExecClosed/ErrExecReapingto 409.POST /execwas missed, andclassifyExecErroritself never gained a case forboxlite.ErrStopped. Rust sidesrc/shared/src/errors.rs:165already mapsBoxliteError::Stoppedto 409, so the bug is purely the Go runner not propagating that.This test stays failing until that two-line fix lands; flipping it to
xfailwould defeat the regression point.Branch state
main(a438d6b — origin/main HEAD)test(e2e):*, no production code touched)make/test.mk,make/help.mk)Summary by CodeRabbit
Release Notes