test(e2e): SDK→API→Runner→VM regression suite by G4614 · Pull Request #678 · boxlite-ai/boxlite

G4614 · 2026-06-08T12:29:17Z

Adds scripts/test/e2e/ — a regression suite that exercises the SDK → API → boxlite-runner → libkrun VM chain, complementing make test:integration which runs only against the local PyO3/FFI path and therefore can't catch bugs that live in the REST proxy, runner CGO, or API error mapping.

Test plan

observed	pre (without this PR)	post
coverage of REST → API → runner → VM chain	none — `make test:integration:*` constructs `Boxlite.default()` (local FFI), never opens an HTTP connection, never reaches the NestJS API or `boxlite-runner`	`make test:e2e` runs 25 pytest cases, every box created via `Boxlite.rest(opts)` against `http://localhost:3000/api`
#563 (exec-stdout drop) detection	regression invisible to CI; `make test:integration:python` green even on stock 0.9.5 runner	`test_p0_6_exec_stdout_race` deterministically FAILS at ≥90% loss against stock runner, PASS at 0% against a runner rebuilt with #563
path-bypass detection	tests could silently fall back to local FFI and still pass	autouse fixture `verify_runner_saw_all_boxes` snapshots `journalctl -u boxlite-runner` per-test and asserts every box id created via `rt.create` appears in the runner journal; sanity-verified by injecting a fake box id which fires the teardown assertion with a clear path-bypass message
bootstrap surface	not present; e2e wasn't ramped on	`scripts/test/e2e/bootstrap.sh` is idempotent and self-contained — installs postgres, redis, docker registry, Node.js 22 (NodeSource), yarn (corepack), python3-pip, the boxlite-runner release binary, systemd units, and writes `/etc/boxlite-api.env`. No AWS credentials, no S3 bucket, no IAM role required — `S3_ENDPOINT=""` makes `VolumeManager.constructor` early-return at `apps/api/src/sandbox/managers/volume.manager.ts:47`
local-dev ergonomics	tests scattered across personal repos	`make test:e2e:setup` (bootstrap + fixture data) and `make test:e2e` (pytest); `make test:e2e:two-sided` takes `PR_REF=<branch>` and verifies a candidate fix removes the bug
CI surface	none	`.github/workflows/e2e-stack.yml` triggers on push/PR touching `sdks/c/src/exec/`, `src/boxlite/src/rest/`, `apps/runner/`, `apps/api/src/`; needs a self-hosted runner labeled `kvm` because GitHub-hosted runners have no `/dev/kvm`

Cases shipped

group	files	source migrated from
meta path verification	`test_path_verification.py` (2 cases)	new — proves SDK is REST-mode against `:3000` and the runner journal saw the box id
#563 regression	`test_p0_6_exec_stdout_race.py` (1 case, 5 box × 2 exec rounds)	new — race reproducer
`make test:integration:*` ports	9 files, 22 cases total	`src/boxlite/tests/{lifecycle,exec_options,execution_shutdown,shutdown}.rs` + `sdks/python/tests/{test_exec,test_exec_timeout_sigalrm,test_box_management,test_resize_tty,test_errors}.py` — all rewritten to use `Boxlite.rest()`

Local results (this branch checked out clean against current main)

runner build	result
stock 0.9.5 release	23 PASS / 2 FAIL — `test_p0_6_exec_stdout_race` (race) + `test_exec_on_stopped_box_is_typed_error` (real API/runner mapping bug, see body below)
rebuilt with #563 cherry-picked	24 PASS / 1 FAIL — only the mapping bug remains

`test_exec_on_stopped_box_is_typed_error` — the lone FAIL is real

This case asserts that POST /exec on a stopped box returns a 4xx-class status, not 5xx. The current runner code at apps/runner/pkg/api/controllers/boxlite_exec.go:77 writes http.StatusInternalServerError unconditionally when execManager.Start fails:

ctx.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("exec failed: %s", err)})

Every other handler in the same file (signal / resize / kill / status) routes through classifyExecError(err) instead, which already maps ErrExecClosed / ErrExecReaping to 409. POST /exec was missed, and classifyExecError itself never gained a case for boxlite.ErrStopped. Rust side src/shared/src/errors.rs:165 already maps BoxliteError::Stopped to 409, so the bug is purely the Go runner not propagating that.

This test stays failing until that two-line fix lands; flipping it to xfail would defeat the regression point.

Branch state

	value
base	`main` (`a438d6b` — origin/main HEAD)
commits	6 (all `test(e2e):*`, no production code touched)
files	13 new + 2 small additions (`make/test.mk`, `make/help.mk`)
net lines	+1170

Summary by CodeRabbit

Release Notes

New Features
- Added comprehensive end-to-end testing infrastructure supporting multiple SDKs (C, Go, Node, Python) across the full request chain (API → runner → VM).
- Introduced automated GitHub Actions E2E testing workflow on main branch changes.
- Added two-sided regression testing capability to validate fixes against main branch issues.

Existing make test:integration:* uses the local PyO3/FFI path (`Boxlite.default()`) and bypasses both the NestJS API and boxlite-runner. Bugs that surface only on the REST → API → runner chain — e.g. boxlite-ai#563's exec-stdout drop, boxlite-ai#627's attach re-drain — leak into production with the unit + integration suites all green. This suite always uses `Boxlite.rest(opts)` against the local API and proves the path on every run. Layout (scripts/test/e2e/): bootstrap.sh Install PG, Redis, registry, mountpoint-s3, runner binary, systemd units. Idempotent. fixture_setup.py Register alpine:3.23 + ubuntu:22.04 snapshots (waits for active), patch admin org quota, write [profiles.p1] in ~/.boxlite/credentials.toml. Idempotent. cases/conftest.py rt / image / box fixtures — REST-only. cases/test_path_verification.py Meta-test: asserts the runtime is REST-mode against :3000, and one round-trip exec leaves the runner journal with the box id. Catches the case where SDK silently degrades to local FFI or talks to the wrong endpoint. cases/test_p0_6_exec_stdout_race.py First real regression case — runs N boxes × 2 execs and asserts loss rate ≤ 5%. Against stock 0.9.5 runner: ~80–90% loss (FAIL). Against a runner rebuilt with boxlite-ai#563: 0% loss (PASS). lib/path_verification.py Helpers used by the meta-test. two_sided.sh Builds runner from MAIN_REF, runs pytest (expect FAIL=1), builds from PR_REF, runs pytest (expect PASS=0). Only verdict 1+0 is "PR removes the bug + test detects it". Restores original binary on any exit. Make targets (make/test.mk): make test:e2e:setup Bootstrap services + fixture data. make test:e2e Run the suite. make test:e2e:two-sided PR_REF=<branch> required. CI workflow (.github/workflows/e2e-stack.yml): Triggers on push/PR touching sdks/c/src/exec/, src/boxlite/src/rest/, apps/runner/, apps/api/src/. Requires a self-hosted runner labeled `kvm` because GitHub-hosted runners lack /dev/kvm and cannot run libkrun. Verified locally on alpine:3.23, py3.14, against both 0.9.5 stock runner (1 FAIL of 3) and a runner rebuilt from boxlite-ai#563 (3 PASS of 3).

Lifts 9 existing make test:integration test files (Rust + Python) into the e2e suite where they exercise the real production chain (SDK→API→Runner→VM) instead of the local PyO3/FFI path. Existing make test:integration uses Boxlite::default() / Boxlite.default(), which never touches NestJS API or boxlite-runner. Bugs in the proxy controller, REST wire format, or runner CGO won't break those tests even though they're broken in production. These ports run the same behavioural contracts through REST so the regression coverage tracks the actual production deployment. 9 new files in scripts/test/e2e/cases/: test_lifecycle.py — create / list / get_info / remove CRUD (from src/boxlite/tests/lifecycle.rs) test_exec_options.py — cwd, env, tty exit-code surface (from src/boxlite/tests/exec_options.rs) test_execution_shutdown.py — wait resolves after box.stop(); exec on stopped box must NOT 5xx (from src/boxlite/tests/execution_shutdown.rs) test_shutdown.py — close() idempotent; second REST runtime shares the same world view (from src/boxlite/tests/shutdown.rs) test_exec_unit_shape.py — round-trip exec result shape + types (from sdks/python/tests/test_exec.py) test_exec_timeout.py — timeout kills a sleeping process and a SIGTERM-ignoring process (from sdks/python/tests/test_exec_timeout_sigalrm.py) test_box_management.py — named box, list_info round-trip, env-var propagation client→guest (from sdks/python/tests/test_box_management.py) test_resize_tty.py — resize_tty(rows, cols) on TTY exec; refuse on non-TTY (from sdks/python/tests/test_resize_tty.py) test_errors.py — unknown image surfaces typed error; bad bearer token → 401/403, not 500 (from sdks/python/tests/test_errors.py) Plus ubuntu:24.04 added to fixture_setup.py's snapshot registration (some Rust cases use it). Verified locally on alpine:3.23 + ubuntu:22.04 + ubuntu:24.04 with a boxlite-ai#563-rebuilt runner: 24 PASS / 1 FAIL. The single failure (test_exec_on_stopped_box_is_typed_error) is a real production bug — the API surfaces "Handle invalidated after stop()" with HTTP 500 + code=11 instead of mapping it to a typed 4xx. Test stays failing until that mapping fix lands; flipping to xfail would defeat the point of the regression coverage.

Wraps the session-scoped `rt` fixture in a `_TrackingRuntime` shim that records the box id of every successful `rt.create()` call. An autouse fixture snapshots the runner journal timestamp before each test and, after the test body, asserts every recorded box id appears in the runner journal at least once. If a test silently degrades to local FFI, talks to a wrong endpoint, or the API stops forwarding to the runner, the autouse assertion fires during teardown with the offending box id(s). Tests that don't create any boxes (e.g. `test_invalid_token_returns_401_not_500`, `test_sdk_runtime_is_rest_against_local_api`) are unaffected — the bucket stays empty, the check is a no-op. Sanity-verified: deliberately appending a fake box id to `rt._created` without an actual create call fails the teardown with the expected message, confirming the bypass detection has real teeth. Side effect: the meta-test `test_exec_reaches_runner_journal` is now redundant with the autouse fixture, but kept for clarity — it documents the contract explicitly rather than relying on shared fixture machinery.

The e2e suite exercises box lifecycle (create / exec / lifecycle); none of those paths touch S3 / volumes. The API's VolumeManager constructor short-circuits when S3_ENDPOINT is empty (see apps/api/src/sandbox/managers/volume.manager.ts:47), and the ObjectStorageService is only constructed on demand by the volume API which e2e doesn't exercise. So leaving S3_ENDPOINT="" cleanly disables the AWS surface. Before this commit, bootstrap.sh required: - AWS creds (aws login or static keys) - boxlite-dev-storagebucket-bzhbzvme to exist in account 064212132677 - Two uncommitted patches to volume.manager.ts / object-storage.service.ts so the S3Client constructor wouldn't `getOrThrow('s3.accessKey')` on empty values After this commit: - bootstrap.sh runs on any nested-KVM Ubuntu host - No AWS creds needed - No code patches needed — the early-return path is followed instead - 24/25 e2e tests pass identically (the lone failure is the test_exec_on_stopped_box_is_typed_error mapping bug, unchanged) Verified locally on alpine:3.23, ubuntu:22.04, ubuntu:24.04 with main HEAD checked out and no working-tree modifications.

…hosts Previous bootstrap.sh assumed node 22 + yarn + npx + pip were already installed on the host. Fresh Ubuntu 24/26 EC2s don't have any of those preinstalled — `yarn install` blew up immediately. Add: - python3-pip + ca-certificates + curl via apt - Node.js 22 via NodeSource if missing or older than 20 - yarn via corepack enable run.sh also split the pip install chain so a missing single package doesn't shadow the others. cargo + go intentionally NOT installed — they're only needed by test:e2e:two-sided (PR comparison flow), not the default make test:e2e. The two-sided README still documents them as prereqs. After this commit, a fresh nested-KVM Ubuntu 26 EC2 with git + sudo can run `make test:e2e:setup && make test:e2e` and reach the 24-pass / 1-fail baseline without any other preinstalls or configuration.

coderabbitai · 2026-06-08T12:29:27Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b3710f0c-79d3-48d6-b857-9ded5e50eddb

📥 Commits

Reviewing files that changed from the base of the PR and between 5e08966 and 9d98c0a.

📒 Files selected for processing (32)

.github/workflows/e2e-stack.yml
apps/package.json
make/help.mk
make/test.mk
scripts/test/e2e/.gitignore
scripts/test/e2e/README.md
scripts/test/e2e/bootstrap.sh
scripts/test/e2e/cases/conftest.py
scripts/test/e2e/cases/test_box_management.py
scripts/test/e2e/cases/test_c_entry.py
scripts/test/e2e/cases/test_cli_entry.py
scripts/test/e2e/cases/test_errors.py
scripts/test/e2e/cases/test_exec_options.py
scripts/test/e2e/cases/test_exec_timeout.py
scripts/test/e2e/cases/test_exec_unit_shape.py
scripts/test/e2e/cases/test_execution_shutdown.py
scripts/test/e2e/cases/test_go_entry.py
scripts/test/e2e/cases/test_lifecycle.py
scripts/test/e2e/cases/test_node_entry.py
scripts/test/e2e/cases/test_p0_6_exec_stdout_race.py
scripts/test/e2e/cases/test_path_verification.py
scripts/test/e2e/cases/test_resize_tty.py
scripts/test/e2e/cases/test_shutdown.py
scripts/test/e2e/fixture_setup.py
scripts/test/e2e/lib/path_verification.py
scripts/test/e2e/pytest.ini
scripts/test/e2e/run.sh
scripts/test/e2e/sdks/c/e2e_basic.c
scripts/test/e2e/sdks/go/e2e_basic.go
scripts/test/e2e/sdks/node/e2e_basic.ts
scripts/test/e2e/teardown.sh
scripts/test/e2e/two_sided.sh

📝 Walkthrough

Walkthrough

This PR adds a complete end-to-end testing infrastructure for Boxlite. It includes a GitHub Actions workflow that targets KVM-capable self-hosted runners, shell scripts to bootstrap/teardown a local API and runner stack, Python fixtures and a pytest-based test harness, REST-mode validation tests, SDK entry-point smoke tests for multiple languages, and utilities for verifying the request path through the system.

Changes

End-to-end test infrastructure and SDK validation suite

Layer / File(s)	Summary
Build configuration and workflow integration `make/help.mk`, `make/test.mk`, `.github/workflows/e2e-stack.yml`, `apps/package.json`	Make targets for E2E bootstrap/test/two-sided, GitHub Actions workflow with KVM runner, maturin SDK rebuild, and node-forge dependency added.
Stack provisioning and lifecycle scripts `scripts/test/e2e/bootstrap.sh`, `scripts/test/e2e/teardown.sh`, `scripts/test/e2e/two_sided.sh`	Bootstrap provisions OS/database/Docker dependencies, builds toolchains, compiles runner, creates/starts systemd services. Teardown reverses provisioning with optional data wipe. Two-sided validates regressions via runner rebuild from two refs.
Fixture setup and credential provisioning `scripts/test/e2e/fixture_setup.py`	Patches admin quotas, registers snapshots, discovers API path prefix, and writes credentials to `~/.boxlite/credentials.toml` for test consumption.
Test harness, fixtures, and path verification helpers `scripts/test/e2e/pytest.ini`, `scripts/test/e2e/cases/conftest.py`, `scripts/test/e2e/lib/path_verification.py`, `scripts/test/e2e/.gitignore`	Pytest config with asyncio and e2e markers, fixtures for REST runtime (`rt`), per-test box creation, stream draining, and autouse box-tracking/journal verification. Path helpers inspect API logs and runner systemd journal for request passage evidence.
REST API and lifecycle tests `scripts/test/e2e/cases/test_box_management.py`, `scripts/test/e2e/cases/test_lifecycle.py`, `scripts/test/e2e/cases/test_shutdown.py`, `scripts/test/e2e/cases/test_path_verification.py`	Tests cover box creation/naming, list/get/remove operations, runtime multi-instance sharing, and validate the SDK runtime points to the local API and that exec reaches the runner journal.
Execution features and timeout tests `scripts/test/e2e/cases/test_exec_options.py`, `scripts/test/e2e/cases/test_exec_timeout.py`, `scripts/test/e2e/cases/test_exec_unit_shape.py`, `scripts/test/e2e/cases/test_execution_shutdown.py`, `scripts/test/e2e/cases/test_resize_tty.py`	Tests validate working directory, environment variables, TTY mode, timeout enforcement with signal escalation, output stream collection, exec result shape, and shutdown semantics.
Error handling and regression tests `scripts/test/e2e/cases/test_errors.py`, `scripts/test/e2e/cases/test_p0_6_exec_stdout_race.py`	Tests assert error typing (non-5xx for client errors, explanatory messages), invalid token rejection, and P0-6 stdout-drop race with configurable loss-rate tolerance.
SDK entry-point smoke tests `scripts/test/e2e/cases/test_c_entry.py`, `scripts/test/e2e/cases/test_go_entry.py`, `scripts/test/e2e/cases/test_node_entry.py`, `scripts/test/e2e/sdks/c/e2e_basic.c`, `scripts/test/e2e/sdks/go/e2e_basic.go`, `scripts/test/e2e/sdks/node/e2e_basic.ts`	Language-specific smoke drivers: C compiles against libboxlite with create/remove round-trip, Go reads env config and validates box/exec with journal verification, Node runs via tsx with OK marker confirmation.
Test runner and documentation `scripts/test/e2e/run.sh`, `scripts/test/e2e/README.md`	Runner script conditionally bootstraps, runs fixture setup, auto-installs test dependencies, and executes pytest. README documents suite purpose, prerequisites, running/teardown commands, and case addition workflow.

🎯 4 (Complex) | ⏱️ ~45 minutes

A rabbit's ode to the E2E trail,
Where bootstrap meets fixture, and failures tell tales,
🐰 runs boxes through API and runner—
Path verified, journals singing, each test a winner!

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)

scripts/test/e2e/sdks/c/e2e_basic.c

scripts/test/e2e/sdks/c/e2e_basic.c:14:10: fatal error: 'boxlite.h' file not found
14 | #include "boxlite.h"
| ^~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/9d98c0a81ae6d1e0e0b62b398c9977ca3fc852de-976c439587b05d25/tmp/clang_command_.tmp.1a38f6.txt
++Contents of '/tmp/coderabbit-infer/9d98c0a81ae6d1e0e0b62b398c9977ca3fc852de-976c439587b05d25/tmp/clang_command_.tmp.1a38f6.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1" "-triple"
"x86_64-unknown-linux-gnu" "-emit-obj" "-mrelax-all" "-disable-free"

... [truncated 714 characters] ...

x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/lib/clang/18/include"
"-internal-isystem" "/usr/local/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fskip-odr-check-in-gmf"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/976c439587b05d25/file.o" "-x" "c"
"scripts/test/e2e/sdks/c/e2e_basic.c" "-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Adds 4 cases under cases/test_cli_entry.py that exercise the same SDK→API→Runner→VM chain through the `boxlite` CLI binary instead of the Python SDK. The CLI shares the Rust `boxlite::rest` client with the Python SDK at the bottom layer, but adds CLI-only surface (argparse, exit-code propagation, table output, auth login) that nothing else in the e2e suite exercises. Cases: test_cli_whoami_against_local_api `boxlite auth whoami` returns the expected admin profile and URL — proves the CLI is talking to the same local API the Python SDK uses. test_cli_ls_returns_table `boxlite ls` succeeds and emits table-shaped output even with zero boxes. test_cli_run_exec_chain Full lifecycle via CLI: `boxlite run -d <image>`, `boxlite exec <id> -- echo`, `boxlite rm -f <id>`. Asserts stdout capture, list visibility, AND the runner journal contains the new box id. The journal check is the CLI-specific path-bypass guard (the autouse fixture only watches `Boxlite.rest._created`; CLI subprocesses are invisible to it). test_cli_exec_exit_code_propagates Non-zero exit inside the box round-trips back through the CLI's own exit code. CLI-specific because argv-to-exit-code mapping is not on the SDK path. Locally: 4/4 pass; total suite is now 28 / 29 (the single failure is still the runner Stopped → 500 mapping bug — unchanged).

Adds 3 new entry-point cases (1 each for Go, Node, C SDK), each routing through SDK → API → Runner → libkrun VM via the local API. Combined with the existing Python SDK + CLI entries, this covers all 5 binding layers above the shared Rust REST client: - Python SDK (PyO3 + asyncio bridge) - CLI binary (subprocess + table output) - Node SDK (napi-rs + JS Promise marshalling) ← new - Go SDK (CGO + context + slice conversion) ← new - C SDK (raw C ABI + callback drain pump) ← new Each binding layer has its own marshalling / serialization / async-glue code path, so a regression in any one of them (e.g. a napi enum that forgets to add a variant, a Go BoxOption that drops a field, a missing pthread join) is invisible to the others. Notes per SDK: Go: builds against sdks/go in the workspace; runs against target/release/libboxlite.so via LD_LIBRARY_PATH. Full create → exec → remove path, asserts captured stdout + exit code + runner journal hit. Node: imports from the LOCAL sdks/node build (the repo root has a stale @boxlite-ai/boxlite 0.9.5 install with field-name glitches — BoxliteRestOptions stores `prefix` but JsBoxlite.rest reads `pathPrefix`; importing from `../../../../../sdks/node` skips the cache). Smoke is create + remove only — exec stream draining is covered by Python / Go / CLI. C: pthread-synchronized create_box callback + dedicated boxlite_runtime_drain pump thread (callbacks don't fire without it — same model as sdks/go's ensureDrainRunning). Smoke is also create + remove only; exec would add 80+ lines of stream-pump glue that doesn't change what the REST chain proves. All three drivers read connection settings from env (BOXLITE_E2E_URL / API_KEY / PREFIX / IMAGE) so pytest fixtures can swap them per-test without rebuilding the native binary. Path-bypass guarantee: each new test snapshots journalctl -u boxlite-runner before the run and asserts the runner journal contains the box id afterwards. The autouse Python guard cannot see CLI / native subprocesses (it only watches the Python `rt._created` bucket), so each new entry test does the verification explicitly. Local results: 31 PASS / 1 FAIL — the lone failure is unchanged (test_exec_on_stopped_box_is_typed_error, Go runner Stopped → 500 mapping bug at apps/runner/pkg/api/controllers/boxlite_exec.go:77).

The previous bootstrap.sh curl'd boxlite-runner-vX.X.X-linux-amd64.tar.gz from GitHub Releases, pinning the runner to whatever was last cut. That meant the e2e suite was permanently testing stale code: - PRs that touch the runner (e.g. boxlite-ai#563 sdks/c/ FFI drain, or the boxlite_exec.go:77 mapping fix this suite catches) would never surface in CI — the release binary still has the old behaviour - test:e2e:two-sided was the only way to actually validate a PR, and it relied on side-channel build steps Now bootstrap.sh: 1. Installs rustup + Go if missing (skips otherwise) 2. cargo build --release -p boxlite-c (produces libboxlite.a) 3. cd apps/runner && CGO_ENABLED=1 go build → /usr/local/bin/boxlite-runner cargo + go both do incremental builds, so re-running bootstrap on a clean working tree finishes in seconds; the cold first build is ~5 min (the Rust release build dominates). The CI workflow no longer needs the separate "Rebuild boxlite-runner binary from this checkout" step — bootstrap covers it. Left a restart step in so a no-op bootstrap still picks up other env-file drift. Side effect of this change: the e2e suite against main HEAD now FAILs 7 more tests (the boxlite-ai#563 stdout race surfaces on every exec smoke case, not just test_p0_6). This is the correct signal — main currently ships the race, and the e2e suite is now honest about it. Merging boxlite-ai#563 flips all 7 (plus the original test_p0_6) green.

…guns Addresses 11 review points on PR boxlite-ai#678: 1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens), persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned). Read back on every re-bootstrap so DB-encrypted data and the existing admin user keep working. 2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0. 3. `yarn install` no longer swallows stderr — silent install failures were the reviewer's "挂了不知道为啥" pain. 4. Runner health check is now a real wait loop (`pgrep` of the binary path + ss listening on :8080) instead of `sleep 3`. API already poll /api/health. 5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin key. Catches encryption-key mismatch, broken migrations, stale admin user / new-token mismatch BEFORE the e2e suite runs (with a helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch). 7. tslib + node-forge are now real `apps/package.json` deps (committed here). Bootstrap no longer yarn-adds them, so it no longer dirties the working tree mid-run. 9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves secrets via the secrets file). Previous behaviour only wrote when missing, so PRs that added env vars never landed on existing hosts. 10. New teardown.sh — basic / --wipe-data / --full modes that revert the bootstrap state cleanly. 11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr instead of being silent. 12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented in-line — Google is a fetchable .well-known endpoint; e2e auth goes through the admin API key path and never touches Google. 13. REPO autodetects from the script's own location (no more $HOME/ws/boxlite default that only worked on one machine). Two reviewer concerns were left as-is with comments: 6. apps/apps self-symlink — the real fix is to rewrite the `apps/api/...` paths in every `project.json` to `api/...`. That's an apps-workspace refactor PR, not e2e infra. 8. API runs via npx ts-node, not webpack bundle — production parity diverges only on TypeORM entity discovery (the cron `databaseName undefined` warning), which doesn't intersect any code path e2e exercises. Webpack bundle would add 5–10 min per PR. Side change to fixture_setup.py: now also writes the `default` profile to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests hit the default profile when running `boxlite auth whoami` etc, so a re-bootstrap with a new admin key needs to update both or CLI tests 401. Verified end-to-end: teardown --wipe-data → fresh bootstrap → fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug — both real production bugs the suite is specifically here to catch).

…port (#679) ## Problem After enabling the merge queue, `Test (conclusion)` and `Lint (conclusion)` were made **required** status checks. But `test.yml` / `lint.yml` still had a `paths:` filter on the **`pull_request`** trigger, so a PR not touching those paths never runs the workflow — and the required conclusion check stays **Pending**, blocking the PR from merging. Observed live on #678 (changed only `scripts/test/e2e/**` + `make/**` → `test.yml` never ran → `Test (conclusion)` missing). Confirmed by [GitHub docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/troubleshooting-required-status-checks): *"a workflow skipped due to path filtering... checks... remain 'Pending'... blocked from merging."* ## Fix Remove the `paths:` filter from the `pull_request` trigger on both workflows so they run on **every** PR (matches rust-analyzer/bevy, which have no top-level paths on their gating workflows). Nothing else changes: - Per-suite filtering still happens in the `changes` job (heavy jobs skip when unaffected). - The conclusion job reports `success`-or-`skipped`, so it always reports green for unrelated PRs. - `push:` paths are kept as-is (irrelevant to PR gating). Cost: ~3 small jobs (config/changes/conclusion) per workflow per PR. ## Note PRs opened before #677 merged still lack the conclusion jobs entirely and need a rebase onto `main` to pick them up — separate from this fix.  ## Summary by CodeRabbit * **Chores** * Updated CI/CD workflows to ensure status checks remain responsive on all pull requests targeting main, preventing them from getting stuck in a pending state.

…ustup defer Three reviewer fixes around build-tooling install: 🔴 make setup:build before cargo build Fresh Ubuntu doesn't ship meson / ninja / build-essential, but `cargo build -p boxlite-c` shells out to them via the bubblewrap-sys / e2fsprogs-sys / libkrun-sys crates. Without the canonical `make setup:build` (routes to scripts/setup/setup-ubuntu.sh), bootstrap died on `meson: command not found` on every fresh host. Now invoked before the C SDK build. Also swap the raw cargo call for `make dist:c` so the staged sdks/c/dist/ output is in place for any future tooling that expects it. 🟡 Go version from apps/runner/go.mod Was hardcoded GO_VER=1.23.4. The runner module requires 1.25.4; the old pin lagged the workspace and `make setup:build` correctly rejected it. Now reads: GO_VER=$(awk '/^go [0-9]/ {print $2; exit}' apps/runner/go.mod) Subtle gotcha: Go 1.21+ auto-downloads a matching toolchain to ~/go/pkg/mod/golang.org/toolchain@... when a go.mod requires a newer version, and `go version` then reports the auto-toolchain version, not the system Go install. `make setup:build` and most other subprocesses see the system Go. So we read the truthful system version with: cd /tmp && GOTOOLCHAIN=local go version before deciding whether to install. Without this, bootstrap would see "1.25.4" (auto-toolchain) and skip the install, leaving setup:build still failing on the system 1.23.4. 🟡 rustup defers to rust-toolchain.toml Was `sh -s -- -y --default-toolchain stable`. The repo's `rust-toolchain.toml` already says channel = "stable", so the rustup install pre-downloads stable, then the first cargo call re-resolves the toolchain anyway (one wasted round-trip). Now: sh -s -- -y --no-modify-path --profile minimal --default-toolchain none rustup installs without a default; the first `cargo build` reads the toolchain file and pulls whatever it asks for. Smaller install, less duplication, automatically tracks the toml if it ever pins a specific version.

…guns Addresses 11 review points on PR boxlite-ai#678: 1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens), persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned). Read back on every re-bootstrap so DB-encrypted data and the existing admin user keep working. 2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0. 3. `yarn install` no longer swallows stderr — silent install failures were the reviewer's "挂了不知道为啥" pain. 4. Runner health check is now a real wait loop (`pgrep` of the binary path + ss listening on :8080) instead of `sleep 3`. API already poll /api/health. 5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin key. Catches encryption-key mismatch, broken migrations, stale admin user / new-token mismatch BEFORE the e2e suite runs (with a helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch). 7. tslib + node-forge are now real `apps/package.json` deps (committed here). Bootstrap no longer yarn-adds them, so it no longer dirties the working tree mid-run. 9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves secrets via the secrets file). Previous behaviour only wrote when missing, so PRs that added env vars never landed on existing hosts. 10. New teardown.sh — basic / --wipe-data / --full modes that revert the bootstrap state cleanly. 11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr instead of being silent. 12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented in-line — Google is a fetchable .well-known endpoint; e2e auth goes through the admin API key path and never touches Google. 13. REPO autodetects from the script's own location (no more $HOME/ws/boxlite default that only worked on one machine). Two reviewer concerns were left as-is with comments: 6. apps/apps self-symlink — the real fix is to rewrite the `apps/api/...` paths in every `project.json` to `api/...`. That's an apps-workspace refactor PR, not e2e infra. 8. API runs via npx ts-node, not webpack bundle — production parity diverges only on TypeORM entity discovery (the cron `databaseName undefined` warning), which doesn't intersect any code path e2e exercises. Webpack bundle would add 5–10 min per PR. Side change to fixture_setup.py: now also writes the `default` profile to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests hit the default profile when running `boxlite auth whoami` etc, so a re-bootstrap with a new admin key needs to update both or CLI tests 401. Verified end-to-end: teardown --wipe-data → fresh bootstrap → fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug — both real production bugs the suite is specifically here to catch).

PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs that touch specific paths. GitHub branch protection requires a check to ALWAYS report a result — a path-filtered workflow that doesn't fire leaves the required check pending forever, so the PR can't merge. Reshape so e2e can be a required check: 1. Drop `paths:` from the trigger — workflow always starts on every PR to main. Inside, a cheap `changes` job on a GitHub-hosted runner does the path-filter detection via dorny/paths-filter@v3. 2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min) stays gated on `changes.outputs.relevant == true`. PRs that don't touch relevant paths skip the kvm runner entirely — no contention cost. 3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses the conditional outcome into one stable check name `E2E gate (required)`. Reports success if e2e passed OR no relevant paths changed; fail if e2e ran and failed. Branch protection should be configured (separately, via repo Settings → Branches → main) to require the `E2E gate (required)` status check. That's a repo-settings action a maintainer needs to do once after this merges. Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes` + ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed. Cost on relevant PRs: same as before (full e2e on kvm runner). Stacks on boxlite-ai#678 — this commit only makes sense once that PR's e2e-stack.yml is in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/, sdks/c/tests/, sdks/go/*_test.go). This PR moves the cases to match. Driver scripts stay in scripts/test/e2e/ (cross-SDK infra); test files now live next to the SDK they exercise: scripts/test/e2e/cases/test_c_entry.py → sdks/c/tests/e2e/ scripts/test/e2e/cases/test_go_entry.py → sdks/go/tests/e2e/ scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/ scripts/test/e2e/cases/test_cli_entry.py → src/cli/tests/e2e/ scripts/test/e2e/cases/test_*.py (Python pytest) → sdks/python/tests/e2e/ scripts/test/e2e/cases/conftest.py → sdks/python/tests/e2e/ scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} → sdks/{c,go,node}/tests/e2e/ Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md. Supporting wiring (no test-logic change): - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args (testpaths in pytest.ini can't resolve outside its rootdir) - pytest.ini: comment explaining the testpaths omission - .github/workflows/e2e-stack.yml: paths filter expanded to all new e2e locations - README.md: layout map updated Pure rename + supporting wiring. Zero new test logic, zero source changes. Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2 fixes; this split-out covers reorganize only — the rest ships in follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a fresh location that doesn't match the repo's existing per-SDK test layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/, sdks/c/tests/, sdks/go/*_test.go). This PR moves the cases to match. Driver scripts stay in scripts/test/e2e/ (cross-SDK infra); test files now live next to the SDK they exercise: scripts/test/e2e/cases/test_c_entry.py → sdks/c/tests/e2e/ scripts/test/e2e/cases/test_go_entry.py → sdks/go/tests/e2e/ scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/ scripts/test/e2e/cases/test_cli_entry.py → src/cli/tests/e2e/ scripts/test/e2e/cases/test_*.py (Python pytest) → sdks/python/tests/e2e/ scripts/test/e2e/cases/conftest.py → sdks/python/tests/e2e/ scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts} → sdks/{c,go,node}/tests/e2e/ Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh, two_sided.sh, fixture_setup.py, pytest.ini, README.md. Supporting wiring (forced by the renames): - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args (testpaths in pytest.ini can't resolve outside its rootdir) - pytest.ini: comment explaining the testpaths omission - .github/workflows/e2e-stack.yml: paths filter expanded to all new e2e locations - README.md: layout map updated Two collisions the rename exposed (both fixed in this PR): 1. SDK unit-test pytest workflow auto-discovers sdks/python/tests/ and would import the new e2e/ subdir, whose conftest needs Python 3.11+ tomllib and a live API stack the unit-test workflow doesn't provide. Fix: norecursedirs = e2e in sdks/python/pytest.ini. The e2e runner passes the dir explicitly on the CLI which bypasses norecursedirs. 2. pytest's default --import-mode=prepend puts each test file's rootpath on sys.path. For sdks/python/tests/e2e/test_*.py the rootpath becomes sdks/python/, which then shadows the installed boxlite wheel with the source-tree stub package (sdks/python/boxlite/__init__.py — missing the compiled .so so Rust-bound symbols like BoxliteRestOptions are absent). Every fixture using boxlite.BoxliteRestOptions failed at setup with AttributeError. Fix: --import-mode=importlib in scripts/test/e2e/pytest.ini (modern mode, no sys.path injection). Tests that previously did `from conftest import drain` couldn't keep doing that under importlib mode (conftest no longer on sys.path) — extracted drain/collect_stream/ stdout_line_count from conftest into scripts/test/e2e/lib/e2e_helpers.py (alongside path_verification which already used the same lib/ + sys.path.insert pattern), and updated the 8 test files' imports. Verified locally on Ubuntu 22.04 + Python 3.11 + maturin-built boxlite wheel — make test:e2e against PR-A: 21 passed, 7 failed (the 7 are real production bugs the suite is designed to catch: PR-C fixes runner exec error mapping, PR-D fixes FFI drain race). Before this PR's collision fix: 0 passed, 32 fixture-setup errors. Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2 fixes; this split-out covers reorganize only — the rest ships in follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reframes boxlite-ai#563 around the SDK contract: Go SDK (sdks/go/exec.go:266-275) documents that on_exit is always registered AFTER on_stdout/on_stderr, specifically so the Exit dispatch can safely free shared state (the cgo.Handle) without racing stream callbacks. With that contract, register_exit sees all stream pumps already registered, and the existing `stream_done_rx: Vec<oneshot::Receiver<()>>` snapshot-and-await pattern correctly orders Stdout/Stderr* before Exit. The bug wasn't in that path. It was that `execution_wait` ran as an INDEPENDENT terminal task pushing Wait — with no stream drain — so the wait gRPC reply could land in the queue ahead of still-flushing chunks (`box.Exec` in the Go SDK returned with empty stdout when the user observed Wait before the Stream callbacks). This change makes exit_pump the SOLE owner of terminal-event dispatch: ExecutionHandle: + pending_waits: Arc<Mutex<Vec<(CExecutionWaitFn, usize)>>> + exit_pump_spawned: Arc<AtomicBool> - (no watch / Notify / AtomicI32 wakeup primitive) spawn_exit_pump_if_needed: First of {register_exit, execution_wait} to call wins the compare_exchange and spawns exit_pump; subsequent callers just register their cb into exit_dispatch / pending_waits. execution_wait (no longer spawns its own task): Hold pending_waits lock while reading exit_dispatched: - already true → push Wait directly (Exit already drained) - false → append to pending_waits, ensure exit_pump spawned exit_pump (single task per execution): 1. wait_on_clone(process) 2. drain stream_done_rx (snapshot via mem::take) 3. set process_completed (only on Ok wait — preserves existing contract that an Err wait doesn't prove process exit) 4. claim exit_dispatched, push Exit if exit_dispatch registered 5. take pending_waits, push Wait per registration (each Wait gets its own wait_on_clone — same shared execution, already exited, returns immediately) Queue order: Stdout/Stderr* → Exit → Wait* — all from the SAME task, so the queue's push order IS the dispatch order. No inter-task synchronization, no watch::Sender, no Notify, no register-then-check footgun. Late `boxlite_execution_wait` races the `exit_dispatched=true` flag: under pending_waits lock, either the late caller is part of the take()'d snapshot OR observes the flag and spawns its own direct-push task. Never lost. Test plan --------- - `make test:unit:ffi FILTER=exec` → 20/20 boxlite-c exec tests pass - Two-side e2e verified (Ubuntu 22.04 + Python 3.11 + maturin-built wheel against `make test:e2e:setup` infra from boxlite-ai#678): Side A — revert sdks/c/src/exec/execution.rs to upstream/main (oneshot vec for Exit path, execution_wait as separate task with no drain): test_p0_6_exec_stdout_race: FAIL 70% loss rate test_563...complete_stdout_sequence: FAIL truncated stdout Side B — apply this PR: test_p0_6_exec_stdout_race: PASS test_563...complete_stdout_sequence: PASS Adds scripts/test/e2e/cases/test_563_exec_drain_barrier.py as a deterministic single-shot regression (seq 1 200), complementary to test_p0_6_exec_stdout_race's statistical (ROUNDS×2 execs) coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Go SDK's box.Exec returns with stdout chunks still in flight — the user's stream callbacks fire after the synchronous wait already returned, so box.Exec hands the caller a result with truncated stdout. Pre-fix, two terminal-event paths existed, but only ONE drained streams: exit_pump → drains stream_done_rx, then push Exit ✅ execution_wait → wait_on_clone + push Wait, no drain ❌ The Exit path was already correctly ordered behind every Stdout/Stderr event. The Wait path raced the still-flushing stream chunks: Wait lands in the FFI queue ahead of late Stdout/Stderr, the Python SDK closes the stdout iterator on Wait, the user's `drain(ex)` returns truncated. Fix: consolidate. exit_pump becomes the sole owner of terminal-event dispatch. execution_wait stops spawning its own task and instead appends (cb, user_data) to a pending_waits vec that exit_pump iterates AFTER pushing Exit: [register_exit / execution_wait — first caller spawns exit_pump] register_exit: set exit_dispatch slot execution_wait: append to pending_waits (or push direct if exit_dispatched=true) [exit_pump — single task per execution] 1. wait_on_clone(process) 2. drain stream_done_rx (existing oneshot vec) 3. claim exit_dispatched, push Exit if exit_dispatch registered 4. take pending_waits, push Wait per registration Queue order: Stdout/Stderr* → Exit → Wait* — all from the SAME task, so push order IS dispatch order. No inter-task synchronization, no watch / Notify / AtomicI32 waker primitive. The on_exit registration contract (Go SDK `sdks/go/exec.go:266-275`: on_exit always registered AFTER on_stdout/on_stderr) means register_exit sees every stream pump's done_rx already in the vec. We didn't have to invent the Exit ordering — it already worked. Just needed to route Wait through the same path. Late `boxlite_execution_wait` race: caller holds pending_waits lock while reading exit_dispatched. Either it lands in exit_pump's mem::take snapshot, OR observes the flag and spawns a direct-push task — never lost. What changed beyond the Wait loop itself ---------------------------------------- - pending_waits, exit_pump_spawned: new fields - spawn_exit_pump_if_needed helper: single spawn whether register_exit or execution_wait arrives first - stream_done_rx, exit_dispatch Arc-wrapped: exit_pump now lives outside register_exit's scope, needs Arc clones of state - Both constructors initialise the 2 new fields - register_exit simplified to set the dispatch slot + call helper Test plan --------- - `make test:unit:ffi FILTER=exec` → 20/20 boxlite-c exec tests pass - Two-side verified (Ubuntu 22.04 + Python 3.11 + maturin-built wheel) against the existing boxlite-ai#678 regression suite case `scripts/test/e2e/cases/test_p0_6_exec_stdout_race.py` (which boxlite-ai#678 already cites boxlite-ai#563 in its docstring): Side A — revert sdks/c/src/exec/execution.rs to upstream/main: test_p0_6_exec_stdout_race: FAIL 70% loss rate Side B — apply this PR: test_p0_6_exec_stdout_race: PASS - Full `make test:e2e` against this PR's runner: 27 passed / 2 failed / 4 skipped. The 2 fails are unrelated (test_exec_on_stopped_box — fixed by PR boxlite-ai#684 runner error mapping; test_node_sdk_create_exec_remove — local Node SDK build setup gap, not FFI). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

21 new e2e cases under scripts/test/e2e/cases/ that pin contracts the existing local-FFI integration tests can't reach. Cases shipped with xfail(strict=True) markers for the known production bugs they pin — a follow-up fix-PR flipping a bug closed will trip the strict marker into a CI failure, surfacing the fact that the xfail can now be dropped. Files added: scripts/test/e2e/cases/test_error_code_mapping.py scripts/test/e2e/cases/test_quota_enforcement.py scripts/test/e2e/cases/test_runner_concurrency.py Replaces an earlier split-out attempt that stacked on a reorganize PR (boxlite-ai#682). That structural reorg is now abandoned; new e2e cases land under scripts/test/e2e/cases/ matching boxlite-ai#678's existing layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

POST /v1/{prefix}/boxes/{id}/exec was leaking HTTP 500 when the underlying box was already gone / stopped / in a non-runnable state by the time the runner tried to spawn: exec on a removed box -> 500 "spawn_failed: build failed" exec on a stopped box -> 500 (same shape) exec on box mid-rebuild -> 500 (same shape) Root cause: BoxliteExec in boxlite_exec.go wraps the SDK error in a generic 500 without consulting the SDK's typed error helpers (IsNotFound, IsStopped, IsInvalidState). Other handlers in this file (signal/resize/kill/status) already classify these; POST /exec was the one that got missed. Fix: extend classifyExecError to recognise the SDK's typed errors and map them to 404 (NotFound), 409 (Stopped, InvalidState) per the canonical mapping at src/shared/src/errors.rs. Pin: test_exec_on_stopped_box_is_typed_error in boxlite-ai#678's e2e suite goes from 500 -> 4xx after this change. Re-validation: stack on top of boxlite-ai#683 (which adds test_runner_concurrency.py) for the exec-after-soft-deleted-box variant. Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg (now abandoned). Branch rebuilt against current main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Go SDK's box.Exec would occasionally return with stdout chunks still in flight — the user's callback got OnExit (or the wait gRPC reply) before the matching OnStdout/OnStderr chunks landed on the queue. From the caller's perspective, the exec had finished but its stdout was silently truncated. Root cause: execution_wait spawned an independent terminal task that pushed the Wait event as soon as wait_on_clone returned — with no drain barrier — so the wait reply could land in the event queue ahead of still-flushing stream pumps. Fix: make exit_pump the sole owner of terminal-event dispatch. Both execution_wait and register_exit fan into it; exit_pump awaits all stream pump receivers before pushing the terminal event. Queue order becomes Stdout/Stderr* -> Exit -> Wait*, all from the same task. Pin: test_p0_6_exec_stdout_race in boxlite-ai#678's e2e suite goes from ~70% stdout-loss to 0%. Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg (now abandoned). Branch rebuilt against current main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs that touch specific paths. GitHub branch protection requires a check to ALWAYS report a result — a path-filtered workflow that doesn't fire leaves the required check pending forever, so the PR can't merge. Reshape so e2e can be a required check: 1. Drop `paths:` from the trigger — workflow always starts on every PR to main. Inside, a cheap `changes` job on a GitHub-hosted runner does the path-filter detection via dorny/paths-filter@v3. 2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min) stays gated on `changes.outputs.relevant == true`. PRs that don't touch relevant paths skip the kvm runner entirely — no contention cost. 3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses the conditional outcome into one stable check name `E2E gate (required)`. Reports success if e2e passed OR no relevant paths changed; fail if e2e ran and failed. Branch protection should be configured (separately, via repo Settings → Branches → main) to require the `E2E gate (required)` status check. That's a repo-settings action a maintainer needs to do once after this merges. Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes` + ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed. Cost on relevant PRs: same as before (full e2e on kvm runner). Stacks on boxlite-ai#678 — this commit only makes sense once that PR's e2e-stack.yml is in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #678 wires up an e2e suite but its workflow only triggers on PRs that touch specific paths. GitHub branch protection requires a check to ALWAYS report a result — a path-filtered workflow that doesn't fire leaves the required check pending forever, so the PR can't merge. Reshape so e2e can be a required check: 1. Drop `paths:` from the trigger — workflow always starts on every PR to main. Inside, a cheap `changes` job on a GitHub-hosted runner does the path-filter detection via dorny/paths-filter@v3. 2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min) stays gated on `changes.outputs.relevant == true`. PRs that don't touch relevant paths skip the kvm runner entirely — no contention cost. 3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses the conditional outcome into one stable check name `E2E gate (required)`. Reports success if e2e passed OR no relevant paths changed; fail if e2e ran and failed. Branch protection should be configured (separately, via repo Settings → Branches → main) to require the `E2E gate (required)` status check. That's a repo-settings action a maintainer needs to do once after this merges. Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes` + ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed. Cost on relevant PRs: same as before (full e2e on kvm runner). Stacks on #678 — this commit only makes sense once that PR's e2e-stack.yml is in main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

G4614 added 6 commits June 8, 2026 11:32

test(e2e): list new e2e targets in make help

6eca490

DorianZheng mentioned this pull request Jun 8, 2026

ci: run test/lint on every PR so required conclusion checks always report #679

Merged

G4614 added 3 commits June 8, 2026 13:12

DorianZheng marked this pull request as ready for review June 8, 2026 14:12

DorianZheng merged commit e4bbd0e into boxlite-ai:main Jun 8, 2026
19 of 21 checks passed

G4614 mentioned this pull request Jun 8, 2026

ci(e2e): make e2e suite a required merge gate (stacks on #678) #680

Closed

G4614 mentioned this pull request Jun 9, 2026

test(e2e): move test cases to per-SDK tests/ folders #682

Closed

This was referenced Jun 9, 2026

fix(ffi): drain stdout/stderr pumps before Wait terminal event #705

Closed

fix(ffi): drain stdout/stderr pumps before Wait terminal event (re-open of #685/#705) #714

Closed

coderabbitai Bot mentioned this pull request Jun 11, 2026

feat(box): curated 3-image box boot + #715 build-debt fixes #731

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(e2e): SDK→API→Runner→VM regression suite#678

test(e2e): SDK→API→Runner→VM regression suite#678
DorianZheng merged 11 commits into
boxlite-ai:mainfrom
G4614:feat/e2e-test-suite

G4614 commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

G4614 commented Jun 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Cases shipped

Local results (this branch checked out clean against current main)

test_exec_on_stopped_box_is_typed_error — the lone FAIL is real

Branch state

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

G4614 commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

`test_exec_on_stopped_box_is_typed_error` — the lone FAIL is real

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading