Skip to content

test(e2e): SDK→API→Runner→VM regression suite#678

Merged
DorianZheng merged 11 commits into
boxlite-ai:mainfrom
G4614:feat/e2e-test-suite
Jun 8, 2026
Merged

test(e2e): SDK→API→Runner→VM regression suite#678
DorianZheng merged 11 commits into
boxlite-ai:mainfrom
G4614:feat/e2e-test-suite

Conversation

@G4614

@G4614 G4614 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Adds scripts/test/e2e/ — a regression suite that exercises the SDK → API → boxlite-runner → libkrun VM chain, complementing make test:integration which runs only against the local PyO3/FFI path and therefore can't catch bugs that live in the REST proxy, runner CGO, or API error mapping.

Test plan

observed pre (without this PR) post
coverage of REST → API → runner → VM chain none — make test:integration:* constructs Boxlite.default() (local FFI), never opens an HTTP connection, never reaches the NestJS API or boxlite-runner make test:e2e runs 25 pytest cases, every box created via Boxlite.rest(opts) against http://localhost:3000/api
#563 (exec-stdout drop) detection regression invisible to CI; make test:integration:python green even on stock 0.9.5 runner test_p0_6_exec_stdout_race deterministically FAILS at ≥90% loss against stock runner, PASS at 0% against a runner rebuilt with #563
path-bypass detection tests could silently fall back to local FFI and still pass autouse fixture verify_runner_saw_all_boxes snapshots journalctl -u boxlite-runner per-test and asserts every box id created via rt.create appears in the runner journal; sanity-verified by injecting a fake box id which fires the teardown assertion with a clear path-bypass message
bootstrap surface not present; e2e wasn't ramped on scripts/test/e2e/bootstrap.sh is idempotent and self-contained — installs postgres, redis, docker registry, Node.js 22 (NodeSource), yarn (corepack), python3-pip, the boxlite-runner release binary, systemd units, and writes /etc/boxlite-api.env. No AWS credentials, no S3 bucket, no IAM role required — S3_ENDPOINT="" makes VolumeManager.constructor early-return at apps/api/src/sandbox/managers/volume.manager.ts:47
local-dev ergonomics tests scattered across personal repos make test:e2e:setup (bootstrap + fixture data) and make test:e2e (pytest); make test:e2e:two-sided takes PR_REF=<branch> and verifies a candidate fix removes the bug
CI surface none .github/workflows/e2e-stack.yml triggers on push/PR touching sdks/c/src/exec/**, src/boxlite/src/rest/**, apps/runner/**, apps/api/src/**; needs a self-hosted runner labeled kvm because GitHub-hosted runners have no /dev/kvm

Cases shipped

group files source migrated from
meta path verification test_path_verification.py (2 cases) new — proves SDK is REST-mode against :3000 and the runner journal saw the box id
#563 regression test_p0_6_exec_stdout_race.py (1 case, 5 box × 2 exec rounds) new — race reproducer
make test:integration:* ports 9 files, 22 cases total src/boxlite/tests/{lifecycle,exec_options,execution_shutdown,shutdown}.rs + sdks/python/tests/{test_exec,test_exec_timeout_sigalrm,test_box_management,test_resize_tty,test_errors}.py — all rewritten to use Boxlite.rest()

Local results (this branch checked out clean against current main)

runner build result
stock 0.9.5 release 23 PASS / 2 FAIL — test_p0_6_exec_stdout_race (race) + test_exec_on_stopped_box_is_typed_error (real API/runner mapping bug, see body below)
rebuilt with #563 cherry-picked 24 PASS / 1 FAIL — only the mapping bug remains

test_exec_on_stopped_box_is_typed_error — the lone FAIL is real

This case asserts that POST /exec on a stopped box returns a 4xx-class status, not 5xx. The current runner code at apps/runner/pkg/api/controllers/boxlite_exec.go:77 writes http.StatusInternalServerError unconditionally when execManager.Start fails:

ctx.JSON(http.StatusInternalServerError, gin.H{"error": fmt.Sprintf("exec failed: %s", err)})

Every other handler in the same file (signal / resize / kill / status) routes through classifyExecError(err) instead, which already maps ErrExecClosed / ErrExecReaping to 409. POST /exec was missed, and classifyExecError itself never gained a case for boxlite.ErrStopped. Rust side src/shared/src/errors.rs:165 already maps BoxliteError::Stopped to 409, so the bug is purely the Go runner not propagating that.

This test stays failing until that two-line fix lands; flipping it to xfail would defeat the regression point.

Branch state

value
base main (a438d6b — origin/main HEAD)
commits 6 (all test(e2e):*, no production code touched)
files 13 new + 2 small additions (make/test.mk, make/help.mk)
net lines +1170

Summary by CodeRabbit

Release Notes

  • New Features
    • Added comprehensive end-to-end testing infrastructure supporting multiple SDKs (C, Go, Node, Python) across the full request chain (API → runner → VM).
    • Introduced automated GitHub Actions E2E testing workflow on main branch changes.
    • Added two-sided regression testing capability to validate fixes against main branch issues.

G4614 added 6 commits June 8, 2026 11:32
Existing make test:integration:* uses the local PyO3/FFI path
(`Boxlite.default()`) and bypasses both the NestJS API and
boxlite-runner. Bugs that surface only on the REST → API → runner chain
— e.g. boxlite-ai#563's exec-stdout drop, boxlite-ai#627's attach re-drain — leak into
production with the unit + integration suites all green.

This suite always uses `Boxlite.rest(opts)` against the local API
and proves the path on every run.

Layout (scripts/test/e2e/):
  bootstrap.sh             Install PG, Redis, registry, mountpoint-s3,
                           runner binary, systemd units. Idempotent.
  fixture_setup.py         Register alpine:3.23 + ubuntu:22.04
                           snapshots (waits for active), patch admin
                           org quota, write [profiles.p1] in
                           ~/.boxlite/credentials.toml. Idempotent.
  cases/conftest.py        rt / image / box fixtures — REST-only.
  cases/test_path_verification.py
                           Meta-test: asserts the runtime is REST-mode
                           against :3000, and one round-trip exec
                           leaves the runner journal with the box id.
                           Catches the case where SDK silently
                           degrades to local FFI or talks to the wrong
                           endpoint.
  cases/test_p0_6_exec_stdout_race.py
                           First real regression case — runs N boxes
                           × 2 execs and asserts loss rate ≤ 5%.
                           Against stock 0.9.5 runner: ~80–90% loss
                           (FAIL). Against a runner rebuilt with boxlite-ai#563:
                           0% loss (PASS).
  lib/path_verification.py Helpers used by the meta-test.
  two_sided.sh             Builds runner from MAIN_REF, runs pytest
                           (expect FAIL=1), builds from PR_REF, runs
                           pytest (expect PASS=0). Only verdict 1+0 is
                           "PR removes the bug + test detects it".
                           Restores original binary on any exit.

Make targets (make/test.mk):
  make test:e2e:setup       Bootstrap services + fixture data.
  make test:e2e             Run the suite.
  make test:e2e:two-sided   PR_REF=<branch> required.

CI workflow (.github/workflows/e2e-stack.yml):
  Triggers on push/PR touching sdks/c/src/exec/, src/boxlite/src/rest/,
  apps/runner/, apps/api/src/. Requires a self-hosted runner labeled
  `kvm` because GitHub-hosted runners lack /dev/kvm and cannot run
  libkrun.

Verified locally on alpine:3.23, py3.14, against both 0.9.5 stock
runner (1 FAIL of 3) and a runner rebuilt from
boxlite-ai#563 (3 PASS of 3).
Lifts 9 existing make test:integration test files (Rust + Python) into
the e2e suite where they exercise the real production chain
(SDK→API→Runner→VM) instead of the local PyO3/FFI path.

Existing make test:integration uses Boxlite::default() / Boxlite.default(),
which never touches NestJS API or boxlite-runner. Bugs in the proxy
controller, REST wire format, or runner CGO won't break those tests
even though they're broken in production. These ports run the same
behavioural contracts through REST so the regression coverage tracks
the actual production deployment.

9 new files in scripts/test/e2e/cases/:

  test_lifecycle.py         — create / list / get_info / remove CRUD
                              (from src/boxlite/tests/lifecycle.rs)
  test_exec_options.py      — cwd, env, tty exit-code surface
                              (from src/boxlite/tests/exec_options.rs)
  test_execution_shutdown.py — wait resolves after box.stop();
                              exec on stopped box must NOT 5xx
                              (from src/boxlite/tests/execution_shutdown.rs)
  test_shutdown.py          — close() idempotent; second REST runtime
                              shares the same world view
                              (from src/boxlite/tests/shutdown.rs)
  test_exec_unit_shape.py   — round-trip exec result shape + types
                              (from sdks/python/tests/test_exec.py)
  test_exec_timeout.py      — timeout kills a sleeping process and a
                              SIGTERM-ignoring process
                              (from sdks/python/tests/test_exec_timeout_sigalrm.py)
  test_box_management.py    — named box, list_info round-trip, env-var
                              propagation client→guest
                              (from sdks/python/tests/test_box_management.py)
  test_resize_tty.py        — resize_tty(rows, cols) on TTY exec; refuse
                              on non-TTY
                              (from sdks/python/tests/test_resize_tty.py)
  test_errors.py            — unknown image surfaces typed error; bad
                              bearer token → 401/403, not 500
                              (from sdks/python/tests/test_errors.py)

Plus ubuntu:24.04 added to fixture_setup.py's snapshot registration
(some Rust cases use it).

Verified locally on alpine:3.23 + ubuntu:22.04 + ubuntu:24.04 with a
boxlite-ai#563-rebuilt runner: 24 PASS / 1 FAIL. The single failure
(test_exec_on_stopped_box_is_typed_error) is a real production bug —
the API surfaces "Handle invalidated after stop()" with HTTP 500 +
code=11 instead of mapping it to a typed 4xx. Test stays failing
until that mapping fix lands; flipping to xfail would defeat the
point of the regression coverage.
Wraps the session-scoped `rt` fixture in a `_TrackingRuntime` shim that
records the box id of every successful `rt.create()` call. An autouse
fixture snapshots the runner journal timestamp before each test and,
after the test body, asserts every recorded box id appears in the
runner journal at least once.

If a test silently degrades to local FFI, talks to a wrong endpoint, or
the API stops forwarding to the runner, the autouse assertion fires
during teardown with the offending box id(s). Tests that don't create
any boxes (e.g. `test_invalid_token_returns_401_not_500`,
`test_sdk_runtime_is_rest_against_local_api`) are unaffected — the
bucket stays empty, the check is a no-op.

Sanity-verified: deliberately appending a fake box id to `rt._created`
without an actual create call fails the teardown with the expected
message, confirming the bypass detection has real teeth.

Side effect: the meta-test `test_exec_reaches_runner_journal` is now
redundant with the autouse fixture, but kept for clarity — it
documents the contract explicitly rather than relying on shared
fixture machinery.
The e2e suite exercises box lifecycle (create / exec / lifecycle); none
of those paths touch S3 / volumes. The API's VolumeManager constructor
short-circuits when S3_ENDPOINT is empty (see
apps/api/src/sandbox/managers/volume.manager.ts:47), and the
ObjectStorageService is only constructed on demand by the volume API
which e2e doesn't exercise. So leaving S3_ENDPOINT="" cleanly disables
the AWS surface.

Before this commit, bootstrap.sh required:
  - AWS creds (aws login or static keys)
  - boxlite-dev-storagebucket-bzhbzvme to exist in account 064212132677
  - Two uncommitted patches to volume.manager.ts / object-storage.service.ts
    so the S3Client constructor wouldn't `getOrThrow('s3.accessKey')` on empty
    values

After this commit:
  - bootstrap.sh runs on any nested-KVM Ubuntu host
  - No AWS creds needed
  - No code patches needed — the early-return path is followed instead
  - 24/25 e2e tests pass identically (the lone failure is the
    test_exec_on_stopped_box_is_typed_error mapping bug, unchanged)

Verified locally on alpine:3.23, ubuntu:22.04, ubuntu:24.04 with main
HEAD checked out and no working-tree modifications.
…hosts

Previous bootstrap.sh assumed node 22 + yarn + npx + pip were already
installed on the host. Fresh Ubuntu 24/26 EC2s don't have any of those
preinstalled — `yarn install` blew up immediately. Add:

  - python3-pip + ca-certificates + curl via apt
  - Node.js 22 via NodeSource if missing or older than 20
  - yarn via corepack enable

run.sh also split the pip install chain so a missing single package
doesn't shadow the others.

cargo + go intentionally NOT installed — they're only needed by
test:e2e:two-sided (PR comparison flow), not the default
make test:e2e. The two-sided README still documents them as prereqs.

After this commit, a fresh nested-KVM Ubuntu 26 EC2 with git + sudo
can run `make test:e2e:setup && make test:e2e` and reach the 24-pass /
1-fail baseline without any other preinstalls or configuration.
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b3710f0c-79d3-48d6-b857-9ded5e50eddb

📥 Commits

Reviewing files that changed from the base of the PR and between 5e08966 and 9d98c0a.

📒 Files selected for processing (32)
  • .github/workflows/e2e-stack.yml
  • apps/package.json
  • make/help.mk
  • make/test.mk
  • scripts/test/e2e/.gitignore
  • scripts/test/e2e/README.md
  • scripts/test/e2e/bootstrap.sh
  • scripts/test/e2e/cases/conftest.py
  • scripts/test/e2e/cases/test_box_management.py
  • scripts/test/e2e/cases/test_c_entry.py
  • scripts/test/e2e/cases/test_cli_entry.py
  • scripts/test/e2e/cases/test_errors.py
  • scripts/test/e2e/cases/test_exec_options.py
  • scripts/test/e2e/cases/test_exec_timeout.py
  • scripts/test/e2e/cases/test_exec_unit_shape.py
  • scripts/test/e2e/cases/test_execution_shutdown.py
  • scripts/test/e2e/cases/test_go_entry.py
  • scripts/test/e2e/cases/test_lifecycle.py
  • scripts/test/e2e/cases/test_node_entry.py
  • scripts/test/e2e/cases/test_p0_6_exec_stdout_race.py
  • scripts/test/e2e/cases/test_path_verification.py
  • scripts/test/e2e/cases/test_resize_tty.py
  • scripts/test/e2e/cases/test_shutdown.py
  • scripts/test/e2e/fixture_setup.py
  • scripts/test/e2e/lib/path_verification.py
  • scripts/test/e2e/pytest.ini
  • scripts/test/e2e/run.sh
  • scripts/test/e2e/sdks/c/e2e_basic.c
  • scripts/test/e2e/sdks/go/e2e_basic.go
  • scripts/test/e2e/sdks/node/e2e_basic.ts
  • scripts/test/e2e/teardown.sh
  • scripts/test/e2e/two_sided.sh

📝 Walkthrough

Walkthrough

This PR adds a complete end-to-end testing infrastructure for Boxlite. It includes a GitHub Actions workflow that targets KVM-capable self-hosted runners, shell scripts to bootstrap/teardown a local API and runner stack, Python fixtures and a pytest-based test harness, REST-mode validation tests, SDK entry-point smoke tests for multiple languages, and utilities for verifying the request path through the system.

Changes

End-to-end test infrastructure and SDK validation suite

Layer / File(s) Summary
Build configuration and workflow integration
make/help.mk, make/test.mk, .github/workflows/e2e-stack.yml, apps/package.json
Make targets for E2E bootstrap/test/two-sided, GitHub Actions workflow with KVM runner, maturin SDK rebuild, and node-forge dependency added.
Stack provisioning and lifecycle scripts
scripts/test/e2e/bootstrap.sh, scripts/test/e2e/teardown.sh, scripts/test/e2e/two_sided.sh
Bootstrap provisions OS/database/Docker dependencies, builds toolchains, compiles runner, creates/starts systemd services. Teardown reverses provisioning with optional data wipe. Two-sided validates regressions via runner rebuild from two refs.
Fixture setup and credential provisioning
scripts/test/e2e/fixture_setup.py
Patches admin quotas, registers snapshots, discovers API path prefix, and writes credentials to ~/.boxlite/credentials.toml for test consumption.
Test harness, fixtures, and path verification helpers
scripts/test/e2e/pytest.ini, scripts/test/e2e/cases/conftest.py, scripts/test/e2e/lib/path_verification.py, scripts/test/e2e/.gitignore
Pytest config with asyncio and e2e markers, fixtures for REST runtime (rt), per-test box creation, stream draining, and autouse box-tracking/journal verification. Path helpers inspect API logs and runner systemd journal for request passage evidence.
REST API and lifecycle tests
scripts/test/e2e/cases/test_box_management.py, scripts/test/e2e/cases/test_lifecycle.py, scripts/test/e2e/cases/test_shutdown.py, scripts/test/e2e/cases/test_path_verification.py
Tests cover box creation/naming, list/get/remove operations, runtime multi-instance sharing, and validate the SDK runtime points to the local API and that exec reaches the runner journal.
Execution features and timeout tests
scripts/test/e2e/cases/test_exec_options.py, scripts/test/e2e/cases/test_exec_timeout.py, scripts/test/e2e/cases/test_exec_unit_shape.py, scripts/test/e2e/cases/test_execution_shutdown.py, scripts/test/e2e/cases/test_resize_tty.py
Tests validate working directory, environment variables, TTY mode, timeout enforcement with signal escalation, output stream collection, exec result shape, and shutdown semantics.
Error handling and regression tests
scripts/test/e2e/cases/test_errors.py, scripts/test/e2e/cases/test_p0_6_exec_stdout_race.py
Tests assert error typing (non-5xx for client errors, explanatory messages), invalid token rejection, and P0-6 stdout-drop race with configurable loss-rate tolerance.
SDK entry-point smoke tests
scripts/test/e2e/cases/test_c_entry.py, scripts/test/e2e/cases/test_go_entry.py, scripts/test/e2e/cases/test_node_entry.py, scripts/test/e2e/sdks/c/e2e_basic.c, scripts/test/e2e/sdks/go/e2e_basic.go, scripts/test/e2e/sdks/node/e2e_basic.ts
Language-specific smoke drivers: C compiles against libboxlite with create/remove round-trip, Go reads env config and validates box/exec with journal verification, Node runs via tsx with OK marker confirmation.
Test runner and documentation
scripts/test/e2e/run.sh, scripts/test/e2e/README.md
Runner script conditionally bootstraps, runs fixture setup, auto-installs test dependencies, and executes pytest. README documents suite purpose, prerequisites, running/teardown commands, and case addition workflow.

🎯 4 (Complex) | ⏱️ ~45 minutes

A rabbit's ode to the E2E trail,
Where bootstrap meets fixture, and failures tell tales,
🐰 runs boxes through API and runner—
Path verified, journals singing, each test a winner!

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)
scripts/test/e2e/sdks/c/e2e_basic.c

scripts/test/e2e/sdks/c/e2e_basic.c:14:10: fatal error: 'boxlite.h' file not found
14 | #include "boxlite.h"
| ^~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/9d98c0a81ae6d1e0e0b62b398c9977ca3fc852de-976c439587b05d25/tmp/clang_command_.tmp.1a38f6.txt
++Contents of '/tmp/coderabbit-infer/9d98c0a81ae6d1e0e0b62b398c9977ca3fc852de-976c439587b05d25/tmp/clang_command_.tmp.1a38f6.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1" "-triple"
"x86_64-unknown-linux-gnu" "-emit-obj" "-mrelax-all" "-disable-free"

... [truncated 714 characters] ...

x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/lib/clang/18/include"
"-internal-isystem" "/usr/local/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fskip-odr-check-in-gmf"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/976c439587b05d25/file.o" "-x" "c"
"scripts/test/e2e/sdks/c/e2e_basic.c" "-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"


Comment @coderabbitai help to get the list of available commands and usage tips.

Adds 4 cases under cases/test_cli_entry.py that exercise the same
SDK→API→Runner→VM chain through the `boxlite` CLI binary instead of
the Python SDK. The CLI shares the Rust `boxlite::rest` client with
the Python SDK at the bottom layer, but adds CLI-only surface
(argparse, exit-code propagation, table output, auth login) that
nothing else in the e2e suite exercises.

Cases:
  test_cli_whoami_against_local_api
      `boxlite auth whoami` returns the expected admin profile and
      URL — proves the CLI is talking to the same local API the
      Python SDK uses.
  test_cli_ls_returns_table
      `boxlite ls` succeeds and emits table-shaped output even with
      zero boxes.
  test_cli_run_exec_chain
      Full lifecycle via CLI: `boxlite run -d <image>`,
      `boxlite exec <id> -- echo`, `boxlite rm -f <id>`. Asserts
      stdout capture, list visibility, AND the runner journal
      contains the new box id. The journal check is the CLI-specific
      path-bypass guard (the autouse fixture only watches
      `Boxlite.rest._created`; CLI subprocesses are invisible to it).
  test_cli_exec_exit_code_propagates
      Non-zero exit inside the box round-trips back through the CLI's
      own exit code. CLI-specific because argv-to-exit-code mapping
      is not on the SDK path.

Locally: 4/4 pass; total suite is now 28 / 29 (the single failure is
still the runner Stopped → 500 mapping bug — unchanged).
G4614 added 3 commits June 8, 2026 13:12
Adds 3 new entry-point cases (1 each for Go, Node, C SDK), each routing
through SDK → API → Runner → libkrun VM via the local API. Combined
with the existing Python SDK + CLI entries, this covers all 5 binding
layers above the shared Rust REST client:

  - Python SDK (PyO3 + asyncio bridge)
  - CLI binary (subprocess + table output)
  - Node SDK (napi-rs + JS Promise marshalling)  ← new
  - Go SDK (CGO + context + slice conversion)    ← new
  - C SDK (raw C ABI + callback drain pump)      ← new

Each binding layer has its own marshalling / serialization / async-glue
code path, so a regression in any one of them (e.g. a napi enum that
forgets to add a variant, a Go BoxOption that drops a field, a missing
pthread join) is invisible to the others.

Notes per SDK:

  Go: builds against sdks/go in the workspace; runs against
  target/release/libboxlite.so via LD_LIBRARY_PATH. Full
  create → exec → remove path, asserts captured stdout + exit code +
  runner journal hit.

  Node: imports from the LOCAL sdks/node build (the repo root has a
  stale @boxlite-ai/boxlite 0.9.5 install with field-name glitches —
  BoxliteRestOptions stores `prefix` but JsBoxlite.rest reads
  `pathPrefix`; importing from `../../../../../sdks/node` skips the
  cache). Smoke is create + remove only — exec stream draining is
  covered by Python / Go / CLI.

  C: pthread-synchronized create_box callback + dedicated
  boxlite_runtime_drain pump thread (callbacks don't fire without
  it — same model as sdks/go's ensureDrainRunning). Smoke is also
  create + remove only; exec would add 80+ lines of stream-pump glue
  that doesn't change what the REST chain proves.

All three drivers read connection settings from env (BOXLITE_E2E_URL /
API_KEY / PREFIX / IMAGE) so pytest fixtures can swap them per-test
without rebuilding the native binary.

Path-bypass guarantee: each new test snapshots
journalctl -u boxlite-runner before the run and asserts the
runner journal contains the box id afterwards. The autouse Python
guard cannot see CLI / native subprocesses (it only watches the
Python `rt._created` bucket), so each new entry test does the
verification explicitly.

Local results: 31 PASS / 1 FAIL — the lone failure is unchanged
(test_exec_on_stopped_box_is_typed_error, Go runner Stopped → 500
mapping bug at apps/runner/pkg/api/controllers/boxlite_exec.go:77).
The previous bootstrap.sh curl'd boxlite-runner-vX.X.X-linux-amd64.tar.gz
from GitHub Releases, pinning the runner to whatever was last cut. That
meant the e2e suite was permanently testing stale code:

  - PRs that touch the runner (e.g. boxlite-ai#563 sdks/c/ FFI drain, or the
    boxlite_exec.go:77 mapping fix this suite catches) would never
    surface in CI — the release binary still has the old behaviour
  - test:e2e:two-sided was the only way to actually validate a PR,
    and it relied on side-channel build steps

Now bootstrap.sh:
  1. Installs rustup + Go if missing (skips otherwise)
  2. cargo build --release -p boxlite-c (produces libboxlite.a)
  3. cd apps/runner && CGO_ENABLED=1 go build → /usr/local/bin/boxlite-runner

cargo + go both do incremental builds, so re-running bootstrap on a
clean working tree finishes in seconds; the cold first build is
~5 min (the Rust release build dominates).

The CI workflow no longer needs the separate "Rebuild boxlite-runner
binary from this checkout" step — bootstrap covers it. Left a restart
step in so a no-op bootstrap still picks up other env-file drift.

Side effect of this change: the e2e suite against main HEAD now FAILs
7 more tests (the boxlite-ai#563 stdout race surfaces on every exec smoke
case, not just test_p0_6). This is the correct signal — main currently
ships the race, and the e2e suite is now honest about it. Merging boxlite-ai#563
flips all 7 (plus the original test_p0_6) green.
…guns

Addresses 11 review points on PR boxlite-ai#678:

1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens),
   persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned).
   Read back on every re-bootstrap so DB-encrypted data and the
   existing admin user keep working.
2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0.
3. `yarn install` no longer swallows stderr — silent install failures
   were the reviewer's "挂了不知道为啥" pain.
4. Runner health check is now a real wait loop (`pgrep` of the binary
   path + ss listening on :8080) instead of `sleep 3`. API already
   poll /api/health.
5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin
   key. Catches encryption-key mismatch, broken migrations, stale
   admin user / new-token mismatch BEFORE the e2e suite runs (with a
   helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch).
7. tslib + node-forge are now real `apps/package.json` deps (committed
   here). Bootstrap no longer yarn-adds them, so it no longer dirties
   the working tree mid-run.
9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves
   secrets via the secrets file). Previous behaviour only wrote when
   missing, so PRs that added env vars never landed on existing hosts.
10. New teardown.sh — basic / --wipe-data / --full modes that revert
    the bootstrap state cleanly.
11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr
    instead of being silent.
12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented
    in-line — Google is a fetchable .well-known endpoint; e2e auth
    goes through the admin API key path and never touches Google.
13. REPO autodetects from the script's own location (no more
    $HOME/ws/boxlite default that only worked on one machine).

Two reviewer concerns were left as-is with comments:

  6. apps/apps self-symlink — the real fix is to rewrite the
     `apps/api/...` paths in every `project.json` to `api/...`. That's
     an apps-workspace refactor PR, not e2e infra.
  8. API runs via npx ts-node, not webpack bundle — production parity
     diverges only on TypeORM entity discovery (the cron
     `databaseName undefined` warning), which doesn't intersect any
     code path e2e exercises. Webpack bundle would add 5–10 min per PR.

Side change to fixture_setup.py: now also writes the `default` profile
to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests
hit the default profile when running `boxlite auth whoami` etc, so a
re-bootstrap with a new admin key needs to update both or CLI tests
401.

Verified end-to-end: teardown --wipe-data → fresh bootstrap →
fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails
are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug —
both real production bugs the suite is specifically here to catch).
DorianZheng added a commit that referenced this pull request Jun 8, 2026
…port (#679)

## Problem

After enabling the merge queue, `Test (conclusion)` and `Lint
(conclusion)` were made **required** status checks. But `test.yml` /
`lint.yml` still had a `paths:` filter on the **`pull_request`**
trigger, so a PR not touching those paths never runs the workflow — and
the required conclusion check stays **Pending**, blocking the PR from
merging.

Observed live on #678 (changed only `scripts/test/e2e/**` + `make/**` →
`test.yml` never ran → `Test (conclusion)` missing). Confirmed by
[GitHub
docs](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/troubleshooting-required-status-checks):
*"a workflow skipped due to path filtering... checks... remain
'Pending'... blocked from merging."*

## Fix

Remove the `paths:` filter from the `pull_request` trigger on both
workflows so they run on **every** PR (matches rust-analyzer/bevy, which
have no top-level paths on their gating workflows). Nothing else
changes:
- Per-suite filtering still happens in the `changes` job (heavy jobs
skip when unaffected).
- The conclusion job reports `success`-or-`skipped`, so it always
reports green for unrelated PRs.
- `push:` paths are kept as-is (irrelevant to PR gating).

Cost: ~3 small jobs (config/changes/conclusion) per workflow per PR.

## Note

PRs opened before #677 merged still lack the conclusion jobs entirely
and need a rebase onto `main` to pick them up — separate from this fix.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated CI/CD workflows to ensure status checks remain responsive on
all pull requests targeting main, preventing them from getting stuck in
a pending state.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…ustup defer

Three reviewer fixes around build-tooling install:

🔴 make setup:build before cargo build
    Fresh Ubuntu doesn't ship meson / ninja / build-essential, but
    `cargo build -p boxlite-c` shells out to them via the
    bubblewrap-sys / e2fsprogs-sys / libkrun-sys crates. Without the
    canonical `make setup:build` (routes to scripts/setup/setup-ubuntu.sh),
    bootstrap died on `meson: command not found` on every fresh host.
    Now invoked before the C SDK build. Also swap the raw cargo call
    for `make dist:c` so the staged sdks/c/dist/ output is in place
    for any future tooling that expects it.

🟡 Go version from apps/runner/go.mod
    Was hardcoded GO_VER=1.23.4. The runner module requires 1.25.4;
    the old pin lagged the workspace and `make setup:build` correctly
    rejected it. Now reads:

        GO_VER=$(awk '/^go [0-9]/ {print $2; exit}' apps/runner/go.mod)

    Subtle gotcha: Go 1.21+ auto-downloads a matching toolchain to
    ~/go/pkg/mod/golang.org/toolchain@... when a go.mod requires a
    newer version, and `go version` then reports the auto-toolchain
    version, not the system Go install. `make setup:build` and most
    other subprocesses see the system Go. So we read the truthful
    system version with:

        cd /tmp && GOTOOLCHAIN=local go version

    before deciding whether to install. Without this, bootstrap would
    see "1.25.4" (auto-toolchain) and skip the install, leaving
    setup:build still failing on the system 1.23.4.

🟡 rustup defers to rust-toolchain.toml
    Was `sh -s -- -y --default-toolchain stable`. The repo's
    `rust-toolchain.toml` already says channel = "stable", so the
    rustup install pre-downloads stable, then the first cargo call
    re-resolves the toolchain anyway (one wasted round-trip). Now:

        sh -s -- -y --no-modify-path --profile minimal --default-toolchain none

    rustup installs without a default; the first `cargo build` reads
    the toolchain file and pulls whatever it asks for. Smaller
    install, less duplication, automatically tracks the toml if it
    ever pins a specific version.
@DorianZheng DorianZheng marked this pull request as ready for review June 8, 2026 14:12
@DorianZheng DorianZheng merged commit e4bbd0e into boxlite-ai:main Jun 8, 2026
19 of 21 checks passed
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
…guns

Addresses 11 review points on PR boxlite-ai#678:

1. Random ADMIN_API_KEY (+ proxy / ssh-gateway / runner tokens),
   persisted in /etc/boxlite-secrets.env (mode 600, $USER-owned).
   Read back on every re-bootstrap so DB-encrypted data and the
   existing admin user keep working.
2. Docker registry now bound to 127.0.0.1:5000, not 0.0.0.0.
3. `yarn install` no longer swallows stderr — silent install failures
   were the reviewer's "挂了不知道为啥" pain.
4. Runner health check is now a real wait loop (`pgrep` of the binary
   path + ss listening on :8080) instead of `sleep 3`. API already
   poll /api/health.
5. End-of-bootstrap smoke: GET /v1/me with the freshly-minted admin
   key. Catches encryption-key mismatch, broken migrations, stale
   admin user / new-token mismatch BEFORE the e2e suite runs (with a
   helpful HINT to run teardown.sh --wipe-data if it's a key-mismatch).
7. tslib + node-forge are now real `apps/package.json` deps (committed
   here). Bootstrap no longer yarn-adds them, so it no longer dirties
   the working tree mid-run.
9. /etc/boxlite-api.env is regenerated EVERY bootstrap (preserves
   secrets via the secrets file). Previous behaviour only wrote when
   missing, so PRs that added env vars never landed on existing hosts.
10. New teardown.sh — basic / --wipe-data / --full modes that revert
    the bootstrap state cleanly.
11. HOST_IP fallback to 127.0.0.1 now prints a WARNING to stderr
    instead of being silent.
12. OIDC_ISSUER_BASE_URL=https://accounts.google.com is now commented
    in-line — Google is a fetchable .well-known endpoint; e2e auth
    goes through the admin API key path and never touches Google.
13. REPO autodetects from the script's own location (no more
    $HOME/ws/boxlite default that only worked on one machine).

Two reviewer concerns were left as-is with comments:

  6. apps/apps self-symlink — the real fix is to rewrite the
     `apps/api/...` paths in every `project.json` to `api/...`. That's
     an apps-workspace refactor PR, not e2e infra.
  8. API runs via npx ts-node, not webpack bundle — production parity
     diverges only on TypeORM entity discovery (the cron
     `databaseName undefined` warning), which doesn't intersect any
     code path e2e exercises. Webpack bundle would add 5–10 min per PR.

Side change to fixture_setup.py: now also writes the `default` profile
to ~/.boxlite/credentials.toml (not just `p1`). The CLI entry tests
hit the default profile when running `boxlite auth whoami` etc, so a
re-bootstrap with a new admin key needs to update both or CLI tests
401.

Verified end-to-end: teardown --wipe-data → fresh bootstrap →
fixture_setup → 32 pytest cases run, 24 pass / 8 fail. The 8 fails
are unchanged (7× boxlite-ai#563 stdout race + 1× Stopped→500 mapping bug —
both real production bugs the suite is specifically here to catch).
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs
that touch specific paths. GitHub branch protection requires a check
to ALWAYS report a result — a path-filtered workflow that doesn't
fire leaves the required check pending forever, so the PR can't merge.

Reshape so e2e can be a required check:

1. Drop `paths:` from the trigger — workflow always starts on every
   PR to main. Inside, a cheap `changes` job on a GitHub-hosted
   runner does the path-filter detection via dorny/paths-filter@v3.

2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min)
   stays gated on `changes.outputs.relevant == true`. PRs that don't
   touch relevant paths skip the kvm runner entirely — no contention
   cost.

3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses
   the conditional outcome into one stable check name
   `E2E gate (required)`. Reports success if e2e passed OR no
   relevant paths changed; fail if e2e ran and failed.

Branch protection should be configured (separately, via repo
Settings → Branches → main) to require the `E2E gate (required)`
status check. That's a repo-settings action a maintainer needs to
do once after this merges.

Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes`
+ ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed.

Cost on relevant PRs: same as before (full e2e on kvm runner).

Stacks on boxlite-ai#678 — this commit only makes sense once that PR's
e2e-stack.yml is in main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a
fresh location that doesn't match the repo's existing per-SDK test
layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/,
sdks/c/tests/, sdks/go/*_test.go).

This PR moves the cases to match. Driver scripts stay in
scripts/test/e2e/ (cross-SDK infra); test files now live next to
the SDK they exercise:

  scripts/test/e2e/cases/test_c_entry.py    → sdks/c/tests/e2e/
  scripts/test/e2e/cases/test_go_entry.py   → sdks/go/tests/e2e/
  scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/
  scripts/test/e2e/cases/test_cli_entry.py  → src/cli/tests/e2e/
  scripts/test/e2e/cases/test_*.py (Python pytest)
                                            → sdks/python/tests/e2e/
  scripts/test/e2e/cases/conftest.py        → sdks/python/tests/e2e/
  scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts}
                                            → sdks/{c,go,node}/tests/e2e/

Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh,
two_sided.sh, fixture_setup.py, pytest.ini, README.md.

Supporting wiring (no test-logic change):
  - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args
    (testpaths in pytest.ini can't resolve outside its rootdir)
  - pytest.ini: comment explaining the testpaths omission
  - .github/workflows/e2e-stack.yml: paths filter expanded to all
    new e2e locations
  - README.md: layout map updated

Pure rename + supporting wiring. Zero new test logic, zero source
changes. Originally part of boxlite-ai#681 which also bundled 21 new test
cases + 2 fixes; this split-out covers reorganize only — the rest
ships in follow-up PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
PR boxlite-ai#678 shipped e2e pytest cases under scripts/test/e2e/cases/, a
fresh location that doesn't match the repo's existing per-SDK test
layout (src/cli/tests/*.rs, sdks/python/tests/, sdks/node/tests/,
sdks/c/tests/, sdks/go/*_test.go).

This PR moves the cases to match. Driver scripts stay in
scripts/test/e2e/ (cross-SDK infra); test files now live next to
the SDK they exercise:

  scripts/test/e2e/cases/test_c_entry.py    → sdks/c/tests/e2e/
  scripts/test/e2e/cases/test_go_entry.py   → sdks/go/tests/e2e/
  scripts/test/e2e/cases/test_node_entry.py → sdks/node/tests/e2e/
  scripts/test/e2e/cases/test_cli_entry.py  → src/cli/tests/e2e/
  scripts/test/e2e/cases/test_*.py (Python pytest)
                                            → sdks/python/tests/e2e/
  scripts/test/e2e/cases/conftest.py        → sdks/python/tests/e2e/
  scripts/test/e2e/sdks/{c,go,node}/e2e_basic.{c,go,ts}
                                            → sdks/{c,go,node}/tests/e2e/

Stays in scripts/test/e2e/: bootstrap.sh, teardown.sh, run.sh,
two_sided.sh, fixture_setup.py, pytest.ini, README.md.

Supporting wiring (forced by the renames):
  - run.sh: pytest invoked with the 5 SDK e2e dirs as explicit args
    (testpaths in pytest.ini can't resolve outside its rootdir)
  - pytest.ini: comment explaining the testpaths omission
  - .github/workflows/e2e-stack.yml: paths filter expanded to all
    new e2e locations
  - README.md: layout map updated

Two collisions the rename exposed (both fixed in this PR):

  1. SDK unit-test pytest workflow auto-discovers sdks/python/tests/
     and would import the new e2e/ subdir, whose conftest needs
     Python 3.11+ tomllib and a live API stack the unit-test
     workflow doesn't provide. Fix: norecursedirs = e2e in
     sdks/python/pytest.ini. The e2e runner passes the dir
     explicitly on the CLI which bypasses norecursedirs.

  2. pytest's default --import-mode=prepend puts each test file's
     rootpath on sys.path. For sdks/python/tests/e2e/test_*.py the
     rootpath becomes sdks/python/, which then shadows the installed
     boxlite wheel with the source-tree stub package
     (sdks/python/boxlite/__init__.py — missing the compiled .so so
     Rust-bound symbols like BoxliteRestOptions are absent). Every
     fixture using boxlite.BoxliteRestOptions failed at setup with
     AttributeError. Fix: --import-mode=importlib in
     scripts/test/e2e/pytest.ini (modern mode, no sys.path
     injection). Tests that previously did `from conftest import
     drain` couldn't keep doing that under importlib mode (conftest
     no longer on sys.path) — extracted drain/collect_stream/
     stdout_line_count from conftest into
     scripts/test/e2e/lib/e2e_helpers.py (alongside path_verification
     which already used the same lib/ + sys.path.insert pattern),
     and updated the 8 test files' imports.

Verified locally on Ubuntu 22.04 + Python 3.11 + maturin-built
boxlite wheel — make test:e2e against PR-A: 21 passed, 7 failed
(the 7 are real production bugs the suite is designed to catch:
PR-C fixes runner exec error mapping, PR-D fixes FFI drain race).
Before this PR's collision fix: 0 passed, 32 fixture-setup errors.

Originally part of boxlite-ai#681 which also bundled 21 new test cases + 2
fixes; this split-out covers reorganize only — the rest ships in
follow-up PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
Reframes boxlite-ai#563 around the SDK contract: Go SDK (sdks/go/exec.go:266-275)
documents that on_exit is always registered AFTER on_stdout/on_stderr,
specifically so the Exit dispatch can safely free shared state (the
cgo.Handle) without racing stream callbacks. With that contract,
register_exit sees all stream pumps already registered, and the existing
`stream_done_rx: Vec<oneshot::Receiver<()>>` snapshot-and-await pattern
correctly orders Stdout/Stderr* before Exit.

The bug wasn't in that path. It was that `execution_wait` ran as an
INDEPENDENT terminal task pushing Wait — with no stream drain — so the
wait gRPC reply could land in the queue ahead of still-flushing chunks
(`box.Exec` in the Go SDK returned with empty stdout when the user
observed Wait before the Stream callbacks).

This change makes exit_pump the SOLE owner of terminal-event dispatch:

  ExecutionHandle:
    + pending_waits: Arc<Mutex<Vec<(CExecutionWaitFn, usize)>>>
    + exit_pump_spawned: Arc<AtomicBool>
    - (no watch / Notify / AtomicI32 wakeup primitive)

  spawn_exit_pump_if_needed:
    First of {register_exit, execution_wait} to call wins the
    compare_exchange and spawns exit_pump; subsequent callers just
    register their cb into exit_dispatch / pending_waits.

  execution_wait (no longer spawns its own task):
    Hold pending_waits lock while reading exit_dispatched:
      - already true → push Wait directly (Exit already drained)
      - false → append to pending_waits, ensure exit_pump spawned

  exit_pump (single task per execution):
    1. wait_on_clone(process)
    2. drain stream_done_rx (snapshot via mem::take)
    3. set process_completed (only on Ok wait — preserves existing
       contract that an Err wait doesn't prove process exit)
    4. claim exit_dispatched, push Exit if exit_dispatch registered
    5. take pending_waits, push Wait per registration
       (each Wait gets its own wait_on_clone — same shared execution,
        already exited, returns immediately)

Queue order: Stdout/Stderr* → Exit → Wait* — all from the SAME task,
so the queue's push order IS the dispatch order. No inter-task
synchronization, no watch::Sender, no Notify, no
register-then-check footgun.

Late `boxlite_execution_wait` races the `exit_dispatched=true` flag:
under pending_waits lock, either the late caller is part of the
take()'d snapshot OR observes the flag and spawns its own
direct-push task. Never lost.

Test plan
---------
- `make test:unit:ffi FILTER=exec` → 20/20 boxlite-c exec tests pass
- Two-side e2e verified (Ubuntu 22.04 + Python 3.11 + maturin-built
  wheel against `make test:e2e:setup` infra from boxlite-ai#678):

  Side A — revert sdks/c/src/exec/execution.rs to upstream/main
           (oneshot vec for Exit path, execution_wait as separate
           task with no drain):
    test_p0_6_exec_stdout_race:           FAIL 70% loss rate
    test_563...complete_stdout_sequence:  FAIL truncated stdout

  Side B — apply this PR:
    test_p0_6_exec_stdout_race:           PASS
    test_563...complete_stdout_sequence:  PASS

Adds scripts/test/e2e/cases/test_563_exec_drain_barrier.py as a
deterministic single-shot regression (seq 1 200), complementary to
test_p0_6_exec_stdout_race's statistical (ROUNDS×2 execs) coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
Reframes boxlite-ai#563 around the SDK contract: Go SDK (sdks/go/exec.go:266-275)
documents that on_exit is always registered AFTER on_stdout/on_stderr,
specifically so the Exit dispatch can safely free shared state (the
cgo.Handle) without racing stream callbacks. With that contract,
register_exit sees all stream pumps already registered, and the existing
`stream_done_rx: Vec<oneshot::Receiver<()>>` snapshot-and-await pattern
correctly orders Stdout/Stderr* before Exit.

The bug wasn't in that path. It was that `execution_wait` ran as an
INDEPENDENT terminal task pushing Wait — with no stream drain — so the
wait gRPC reply could land in the queue ahead of still-flushing chunks
(`box.Exec` in the Go SDK returned with empty stdout when the user
observed Wait before the Stream callbacks).

This change makes exit_pump the SOLE owner of terminal-event dispatch:

  ExecutionHandle:
    + pending_waits: Arc<Mutex<Vec<(CExecutionWaitFn, usize)>>>
    + exit_pump_spawned: Arc<AtomicBool>
    - (no watch / Notify / AtomicI32 wakeup primitive)

  spawn_exit_pump_if_needed:
    First of {register_exit, execution_wait} to call wins the
    compare_exchange and spawns exit_pump; subsequent callers just
    register their cb into exit_dispatch / pending_waits.

  execution_wait (no longer spawns its own task):
    Hold pending_waits lock while reading exit_dispatched:
      - already true → push Wait directly (Exit already drained)
      - false → append to pending_waits, ensure exit_pump spawned

  exit_pump (single task per execution):
    1. wait_on_clone(process)
    2. drain stream_done_rx (snapshot via mem::take)
    3. set process_completed (only on Ok wait — preserves existing
       contract that an Err wait doesn't prove process exit)
    4. claim exit_dispatched, push Exit if exit_dispatch registered
    5. take pending_waits, push Wait per registration
       (each Wait gets its own wait_on_clone — same shared execution,
        already exited, returns immediately)

Queue order: Stdout/Stderr* → Exit → Wait* — all from the SAME task,
so the queue's push order IS the dispatch order. No inter-task
synchronization, no watch::Sender, no Notify, no
register-then-check footgun.

Late `boxlite_execution_wait` races the `exit_dispatched=true` flag:
under pending_waits lock, either the late caller is part of the
take()'d snapshot OR observes the flag and spawns its own
direct-push task. Never lost.

Test plan
---------
- `make test:unit:ffi FILTER=exec` → 20/20 boxlite-c exec tests pass
- Two-side e2e verified (Ubuntu 22.04 + Python 3.11 + maturin-built
  wheel against `make test:e2e:setup` infra from boxlite-ai#678):

  Side A — revert sdks/c/src/exec/execution.rs to upstream/main
           (oneshot vec for Exit path, execution_wait as separate
           task with no drain):
    test_p0_6_exec_stdout_race:           FAIL 70% loss rate
    test_563...complete_stdout_sequence:  FAIL truncated stdout

  Side B — apply this PR:
    test_p0_6_exec_stdout_race:           PASS
    test_563...complete_stdout_sequence:  PASS

Adds scripts/test/e2e/cases/test_563_exec_drain_barrier.py as a
deterministic single-shot regression (seq 1 200), complementary to
test_p0_6_exec_stdout_race's statistical (ROUNDS×2 execs) coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
Go SDK's box.Exec returns with stdout chunks still in flight — the
user's stream callbacks fire after the synchronous wait already
returned, so box.Exec hands the caller a result with truncated stdout.

Pre-fix, two terminal-event paths existed, but only ONE drained streams:

  exit_pump          → drains stream_done_rx, then push Exit          ✅
  execution_wait     → wait_on_clone + push Wait, no drain            ❌

The Exit path was already correctly ordered behind every Stdout/Stderr
event. The Wait path raced the still-flushing stream chunks: Wait lands
in the FFI queue ahead of late Stdout/Stderr, the Python SDK closes the
stdout iterator on Wait, the user's `drain(ex)` returns truncated.

Fix: consolidate. exit_pump becomes the sole owner of terminal-event
dispatch. execution_wait stops spawning its own task and instead
appends (cb, user_data) to a pending_waits vec that exit_pump iterates
AFTER pushing Exit:

  [register_exit / execution_wait — first caller spawns exit_pump]
    register_exit:    set exit_dispatch slot
    execution_wait:   append to pending_waits  (or push direct if
                                                exit_dispatched=true)

  [exit_pump — single task per execution]
    1. wait_on_clone(process)
    2. drain stream_done_rx (existing oneshot vec)
    3. claim exit_dispatched, push Exit if exit_dispatch registered
    4. take pending_waits, push Wait per registration

Queue order: Stdout/Stderr* → Exit → Wait* — all from the SAME task,
so push order IS dispatch order. No inter-task synchronization, no
watch / Notify / AtomicI32 waker primitive.

The on_exit registration contract (Go SDK `sdks/go/exec.go:266-275`:
on_exit always registered AFTER on_stdout/on_stderr) means
register_exit sees every stream pump's done_rx already in the vec.
We didn't have to invent the Exit ordering — it already worked. Just
needed to route Wait through the same path.

Late `boxlite_execution_wait` race: caller holds pending_waits lock
while reading exit_dispatched. Either it lands in exit_pump's
mem::take snapshot, OR observes the flag and spawns a direct-push
task — never lost.

What changed beyond the Wait loop itself
----------------------------------------
- pending_waits, exit_pump_spawned: new fields
- spawn_exit_pump_if_needed helper: single spawn whether register_exit
  or execution_wait arrives first
- stream_done_rx, exit_dispatch Arc-wrapped: exit_pump now lives
  outside register_exit's scope, needs Arc clones of state
- Both constructors initialise the 2 new fields
- register_exit simplified to set the dispatch slot + call helper

Test plan
---------
- `make test:unit:ffi FILTER=exec` → 20/20 boxlite-c exec tests pass
- Two-side verified (Ubuntu 22.04 + Python 3.11 + maturin-built wheel)
  against the existing boxlite-ai#678 regression suite case
  `scripts/test/e2e/cases/test_p0_6_exec_stdout_race.py` (which boxlite-ai#678
  already cites boxlite-ai#563 in its docstring):

  Side A — revert sdks/c/src/exec/execution.rs to upstream/main:
    test_p0_6_exec_stdout_race: FAIL 70% loss rate

  Side B — apply this PR:
    test_p0_6_exec_stdout_race: PASS

- Full `make test:e2e` against this PR's runner: 27 passed / 2 failed
  / 4 skipped. The 2 fails are unrelated (test_exec_on_stopped_box —
  fixed by PR boxlite-ai#684 runner error mapping; test_node_sdk_create_exec_remove
  — local Node SDK build setup gap, not FFI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
21 new e2e cases under scripts/test/e2e/cases/ that pin contracts the
existing local-FFI integration tests can't reach. Cases shipped with
xfail(strict=True) markers for the known production bugs they pin —
a follow-up fix-PR flipping a bug closed will trip the strict marker
into a CI failure, surfacing the fact that the xfail can now be dropped.

Files added:
  scripts/test/e2e/cases/test_error_code_mapping.py
  scripts/test/e2e/cases/test_quota_enforcement.py
  scripts/test/e2e/cases/test_runner_concurrency.py

Replaces an earlier split-out attempt that stacked on a reorganize PR
(boxlite-ai#682). That structural reorg is now abandoned; new e2e cases land
under scripts/test/e2e/cases/ matching boxlite-ai#678's existing layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
POST /v1/{prefix}/boxes/{id}/exec was leaking HTTP 500 when the
underlying box was already gone / stopped / in a non-runnable state
by the time the runner tried to spawn:

  exec on a removed box   -> 500 "spawn_failed: build failed"
  exec on a stopped box   -> 500 (same shape)
  exec on box mid-rebuild -> 500 (same shape)

Root cause: BoxliteExec in boxlite_exec.go wraps the SDK error in a
generic 500 without consulting the SDK's typed error helpers
(IsNotFound, IsStopped, IsInvalidState). Other handlers in this file
(signal/resize/kill/status) already classify these; POST /exec was
the one that got missed.

Fix: extend classifyExecError to recognise the SDK's typed errors and
map them to 404 (NotFound), 409 (Stopped, InvalidState) per the
canonical mapping at src/shared/src/errors.rs.

Pin: test_exec_on_stopped_box_is_typed_error in boxlite-ai#678's e2e suite goes
from 500 -> 4xx after this change. Re-validation: stack on top of
boxlite-ai#683 (which adds test_runner_concurrency.py) for the
exec-after-soft-deleted-box variant.

Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg
(now abandoned). Branch rebuilt against current main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
The Go SDK's box.Exec would occasionally return with stdout chunks
still in flight — the user's callback got OnExit (or the wait gRPC
reply) before the matching OnStdout/OnStderr chunks landed on the
queue. From the caller's perspective, the exec had finished but its
stdout was silently truncated.

Root cause: execution_wait spawned an independent terminal task that
pushed the Wait event as soon as wait_on_clone returned — with no
drain barrier — so the wait reply could land in the event queue
ahead of still-flushing stream pumps.

Fix: make exit_pump the sole owner of terminal-event dispatch. Both
execution_wait and register_exit fan into it; exit_pump awaits all
stream pump receivers before pushing the terminal event. Queue order
becomes Stdout/Stderr* -> Exit -> Wait*, all from the same task.

Pin: test_p0_6_exec_stdout_race in boxlite-ai#678's e2e suite goes from ~70%
stdout-loss to 0%.

Replaces an earlier split-out attempt that stacked on boxlite-ai#682's reorg
(now abandoned). Branch rebuilt against current main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 10, 2026
PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs
that touch specific paths. GitHub branch protection requires a check
to ALWAYS report a result — a path-filtered workflow that doesn't
fire leaves the required check pending forever, so the PR can't merge.

Reshape so e2e can be a required check:

1. Drop `paths:` from the trigger — workflow always starts on every
   PR to main. Inside, a cheap `changes` job on a GitHub-hosted
   runner does the path-filter detection via dorny/paths-filter@v3.

2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min)
   stays gated on `changes.outputs.relevant == true`. PRs that don't
   touch relevant paths skip the kvm runner entirely — no contention
   cost.

3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses
   the conditional outcome into one stable check name
   `E2E gate (required)`. Reports success if e2e passed OR no
   relevant paths changed; fail if e2e ran and failed.

Branch protection should be configured (separately, via repo
Settings → Branches → main) to require the `E2E gate (required)`
status check. That's a repo-settings action a maintainer needs to
do once after this merges.

Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes`
+ ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed.

Cost on relevant PRs: same as before (full e2e on kvm runner).

Stacks on boxlite-ai#678 — this commit only makes sense once that PR's
e2e-stack.yml is in main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 10, 2026
PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs
that touch specific paths. GitHub branch protection requires a check
to ALWAYS report a result — a path-filtered workflow that doesn't
fire leaves the required check pending forever, so the PR can't merge.

Reshape so e2e can be a required check:

1. Drop `paths:` from the trigger — workflow always starts on every
   PR to main. Inside, a cheap `changes` job on a GitHub-hosted
   runner does the path-filter detection via dorny/paths-filter@v3.

2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min)
   stays gated on `changes.outputs.relevant == true`. PRs that don't
   touch relevant paths skip the kvm runner entirely — no contention
   cost.

3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses
   the conditional outcome into one stable check name
   `E2E gate (required)`. Reports success if e2e passed OR no
   relevant paths changed; fail if e2e ran and failed.

Branch protection should be configured (separately, via repo
Settings → Branches → main) to require the `E2E gate (required)`
status check. That's a repo-settings action a maintainer needs to
do once after this merges.

Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes`
+ ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed.

Cost on relevant PRs: same as before (full e2e on kvm runner).

Stacks on boxlite-ai#678 — this commit only makes sense once that PR's
e2e-stack.yml is in main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 10, 2026
PR boxlite-ai#678 wires up an e2e suite but its workflow only triggers on PRs
that touch specific paths. GitHub branch protection requires a check
to ALWAYS report a result — a path-filtered workflow that doesn't
fire leaves the required check pending forever, so the PR can't merge.

Reshape so e2e can be a required check:

1. Drop `paths:` from the trigger — workflow always starts on every
   PR to main. Inside, a cheap `changes` job on a GitHub-hosted
   runner does the path-filter detection via dorny/paths-filter@v3.

2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min)
   stays gated on `changes.outputs.relevant == true`. PRs that don't
   touch relevant paths skip the kvm runner entirely — no contention
   cost.

3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses
   the conditional outcome into one stable check name
   `E2E gate (required)`. Reports success if e2e passed OR no
   relevant paths changed; fail if e2e ran and failed.

Branch protection should be configured (separately, via repo
Settings → Branches → main) to require the `E2E gate (required)`
status check. That's a repo-settings action a maintainer needs to
do once after this merges.

Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes`
+ ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed.

Cost on relevant PRs: same as before (full e2e on kvm runner).

Stacks on boxlite-ai#678 — this commit only makes sense once that PR's
e2e-stack.yml is in main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit that referenced this pull request Jun 10, 2026
PR #678 wires up an e2e suite but its workflow only triggers on PRs
that touch specific paths. GitHub branch protection requires a check
to ALWAYS report a result — a path-filtered workflow that doesn't
fire leaves the required check pending forever, so the PR can't merge.

Reshape so e2e can be a required check:

1. Drop `paths:` from the trigger — workflow always starts on every
   PR to main. Inside, a cheap `changes` job on a GitHub-hosted
   runner does the path-filter detection via dorny/paths-filter@v3.

2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min)
   stays gated on `changes.outputs.relevant == true`. PRs that don't
   touch relevant paths skip the kvm runner entirely — no contention
   cost.

3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses
   the conditional outcome into one stable check name
   `E2E gate (required)`. Reports success if e2e passed OR no
   relevant paths changed; fail if e2e ran and failed.

Branch protection should be configured (separately, via repo
Settings → Branches → main) to require the `E2E gate (required)`
status check. That's a repo-settings action a maintainer needs to
do once after this merges.

Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes`
+ ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed.

Cost on relevant PRs: same as before (full e2e on kvm runner).

Stacks on #678 — this commit only makes sense once that PR's
e2e-stack.yml is in main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
G4614 added a commit that referenced this pull request Jun 12, 2026
PR #678 wires up an e2e suite but its workflow only triggers on PRs
that touch specific paths. GitHub branch protection requires a check
to ALWAYS report a result — a path-filtered workflow that doesn't
fire leaves the required check pending forever, so the PR can't merge.

Reshape so e2e can be a required check:

1. Drop `paths:` from the trigger — workflow always starts on every
   PR to main. Inside, a cheap `changes` job on a GitHub-hosted
   runner does the path-filter detection via dorny/paths-filter@v3.

2. The expensive `e2e` job (self-hosted kvm runner, ~10-30 min)
   stays gated on `changes.outputs.relevant == true`. PRs that don't
   touch relevant paths skip the kvm runner entirely — no contention
   cost.

3. New `e2e-gate` job ALWAYS runs (`if: always()`) and collapses
   the conditional outcome into one stable check name
   `E2E gate (required)`. Reports success if e2e passed OR no
   relevant paths changed; fail if e2e ran and failed.

Branch protection should be configured (separately, via repo
Settings → Branches → main) to require the `E2E gate (required)`
status check. That's a repo-settings action a maintainer needs to
do once after this merges.

Cost on irrelevant PRs: one ~10s GitHub-hosted job for `changes`
+ ~5s for `e2e-gate`. Negligible. No self-hosted runner consumed.

Cost on relevant PRs: same as before (full e2e on kvm runner).

Stacks on #678 — this commit only makes sense once that PR's
e2e-stack.yml is in main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants