Bump version to 0.2.2 for all packages by DorianZheng · Pull Request #4 · boxlite-ai/boxlite

DorianZheng · 2025-12-10T10:33:57Z

No description provided.

Signed-off-by: dorianzheng <xingzhengde72@gmail.com>

…on guide Add resize_tty(rows, cols) binding to PyExecution (Rust) and SyncExecution (Python sync wrapper), enabling terminal resizing for TTY-enabled executions without the stty stdin workaround. Also add a comprehensive AI agent integration guide covering configuration, concurrency, timeouts, security, and file transfer patterns. Closes #218 items #4 and #6. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(examples): add interactive Claude Code box example Provide a persistent interactive terminal example for installing Claude Code and document it in the examples README. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(examples): address review feedback on interactive Claude example Rename file to remove "ubuntu" misnomer, remove unused BoxOptions, consolidate imports, and improve documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(sdk): expose resize_tty in Python SDK and add AI agent integration guide Add resize_tty(rows, cols) binding to PyExecution (Rust) and SyncExecution (Python sync wrapper), enabling terminal resizing for TTY-enabled executions without the stty stdin workaround. Also add a comprehensive AI agent integration guide covering configuration, concurrency, timeouts, security, and file transfer patterns. Closes #218 items #4 and #6. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: dorianzheng <8065637+DorianZheng@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

CRITICAL: - Secret::env_key() returns Result instead of panicking on bad names - Fix singleConnListener goroutine leak: replace http.Server + listener with direct HTTP/1.1 serving via ReadRequest loop (no Accept() block) SERIOUS: - Remove dead ca_cert_pem field from GvproxyInstance (Rule #4: Only What's Used) - Cert cache: check TTL on hits, evict expired certs, cap at 10000 entries - Document hardcoded Libkrun engine type MODERATE: - CA install tracks success count, logs error if zero installed - Document WebSocket frame substitution limitation - Fix pipe write comment (producer-consumer, not buffer size)

* feat(net): add secret substitution via TLS MITM proxy (#412) AI agent sandboxes need API keys (OpenAI, Anthropic, etc.) but exposing raw secrets inside the guest VM is a security risk — malicious code can exfiltrate them. This adds a TLS MITM proxy that substitutes placeholder tokens with real secret values at the network boundary. The guest only sees placeholders like `<BOXLITE_SECRET:openai>`. When an HTTPS request targets a secret host, the gvproxy bridge intercepts it, terminates TLS with an ephemeral CA cert, substitutes placeholders in headers and body via a streaming replacer, then forwards to the real upstream server. The real secret value never enters the guest VM. Key components: - BoxCA: ephemeral ECDSA P-256 CA with per-host cert cache (sync.Map) - Streaming replacer: sliding window algorithm, constant memory, handles placeholders spanning chunk boundaries - Reverse proxy: httputil.ReverseProxy for HTTP/1.1 + HTTP/2 + keep-alive - WebSocket support: detect upgrade, substitute headers, bidirectional splice - Secret host matcher: O(1) exact + wildcard hostname lookup - CA trust injection: base64 PEM via env var, guest agent installs at boot - Python SDK: Secret class with value redaction in repr/Debug Test coverage: 110 Go tests (with -race), 11 Rust tests, 38 Python tests. * fix(python): skip secret tests when SDK lacks Secret class CI uses cached wheels from the warm-caches workflow. Until the cache is rebuilt with the new Secret class, the test module skips gracefully instead of failing with AttributeError. * fix(net): fix MITM upstream transport and make /etc/hosts writable - Fix upstream TLS: set ServerName for SNI, use ForceAttemptHTTP2, dial destAddr directly (same approach as standardForward) - Make /etc/hosts writable in containers (remove ro mount flag) - Add debug logging to mitmAndForward for troubleshooting - Add test_secret_server.py for local E2E testing - Update interactive shell example with OpenAI API test instructions Verified end-to-end: guest sends placeholder in header → MITM substitutes real key → OpenAI returns 401 "Incorrect API key: sk-test-*2345" confirming the secret was substituted. HTTP/1.1, HTTP/2, and keep-alive all work correctly. * fix(guest): inject MITM CA cert into container rootfs The guest agent's initramfs is too small to write the CA cert file. Instead, read BOXLITE_CA_PEM env var directly during Container.Init and append the decoded PEM to the container's CA bundle on the QCOW2 disk. This makes HTTPS clients inside the container trust the MITM proxy without --no-check-certificate. * refactor(net): clean up MITM code after review - Fix host cert lifetime: 1h → 24h (match CA lifetime, prevents cert expiration for long-running boxes) - Extract matchesWildcard() helper to eliminate duplication in SecretHostMatcher.Matches() and SecretsForHost() - Remove all debug log.Printf leftovers from mitm_proxy.go; use logrus consistently for all MITM logging - Add 30s dial timeout on upstream connections (prevent hanging) - Remove unused conn field from singleConnListener (only addr needed) - Inline needsInspect variable in TCPWithFilter - Remove dead SSL trust var injection from lifecycle.rs (already handled by container_rootfs.rs on host side) - Downgrade noisy per-request logs from Info to Warn/Debug * revert(examples): remove secret test artifacts from interactive examples These files were accidentally modified/added as part of the MITM development and don't belong in the example directory. * refactor(net): move MITM CA generation from Go to Rust - Generate ephemeral ECDSA P-256 CA in Rust using `rcgen` crate - Pass CA cert+key to Go via GvproxyConfig JSON (no FFI round-trip) - Remove `gvproxy_get_ca_cert` FFI export (Go→Rust) - Add `NewBoxCAFromPEM()` Go constructor to parse Rust-generated PEM - Fix MITM bypass: port 443 with secrets now always inspects SNI before checking IP allowlist (was short-circuiting to standardForward) - Fix stale Go builds: watch entire gvproxy-bridge/ directory in build.rs - Clean up Python tests: 38→14 (remove Rust duplicates, merge integration tests, add E2E MITM verification + non-secret passthrough test) * refactor(net): move CA to crate::net::ca, pass via NetworkBackendConfig - Move ca.rs from net/gvproxy/ to net/ (CA is not gvproxy-specific) - Add ca_cert_pem/ca_key_pem fields to NetworkBackendConfig - Generate CA in vmm_spawn.rs where NetworkBackendConfig is built - GvproxyInstance::new() receives CA from caller instead of generating * fix(security): address code review findings for MITM PR CRITICAL: - Remove InsecureSkipVerify on upstream transport (use system cert pool) - Move shim config from CLI arg to temp file (0600) to avoid /proc/cmdline exposure of CA keys and secrets - WebSocket upstream now uses TLS (was plain TCP) SERIOUS: - Remove duplicate secret env injection from vmm_spawn.rs (single source of truth in container_rootfs.rs) - Replace log.Printf with logrus in websocket handler MODERATE: - CA generation failure now clears secrets (disables MITM) instead of silently continuing with broken substitution MINOR: - Rename `box` to `sandbox` in Python tests (box is a builtin) - Replace Node.js TODO with clear limitation comment - Struct alignment in main.go matches existing style * fix(security): second review round — pipe transport, revert /etc/hosts CRITICAL fixes: - Config passed via stdin pipe (not CLI arg or temp file) — eliminates /proc/cmdline exposure and disk persistence of CA keys + secrets - Shim reads config from stdin until EOF, parent closes write end SERIOUS fixes: - Streaming replacer: loop reading from src instead of returning (0, nil) when buffer has insufficient data (was violating io.Reader contract) - Set req.Host = hostname in Director (HTTP/1.1 Host header must match) - WebSocket relay: close connections after both directions done instead of CloseWrite on tls.Conn (sends close_notify, not TCP half-close) - Revert /etc/hosts from rw back to ro (writable /etc/hosts allows DNS hijacking — unrelated security regression) MODERATE: - Mark Go NewBoxCA() as test-only (production uses NewBoxCAFromPEM) - Actually rename box to sandbox in Python tests (previous sed failed) * refactor(proto): pass CA cert via gRPC CACert instead of env var Add CACert proto message and ca_certs field to ContainerInitRequest. The container's CA cert now flows explicitly via gRPC instead of the BOXLITE_CA_PEM env var. Two-layer CA injection: - Guest agent: still uses BOXLITE_CA_PEM env var (needs CA at boot, before gRPC is up) - Container: now receives CA via Container.Init gRPC ca_certs field (clean, scoped, no env var inheritance or cleanup needed) * refactor(guest): replace env var CA injection with CaInstaller - Rewrite ca_trust.rs as CaInstaller struct (source-agnostic, accepts PEM bytes) - Container.Init uses CaInstaller::with_bundle() for CA injection - Remove BOXLITE_CA_PEM env var injection from shim (no longer needed) - Remove install_ca_from_env() (guest agent doesn't make HTTPS calls) - Remove SSL_TRUST_VARS (CA is in default bundle path, no env vars needed) CA flow is now: Rust generates PEM → gRPC CACert field → CaInstaller writes to container's /etc/ssl/certs/ca-certificates.crt. Zero env vars. * refactor: extract named helpers from inline logic blocks Rust: - Secret::env_key() / env_pair() — placeholder env var format lives on Secret - GvproxyInstance::from_config() — creates instance + endpoint from NetworkBackendConfig (replaces 50-line inline block in shim) Go: - resolveUpstreamTLS() — deduplicates TLS config resolution from mitmAndForward and handleWebSocketUpgrade - linkLocalSubnet package-level init — parsed once, not per-packet * fix: remove NewBoxCA() from production, fix stale Go library detection - Delete NewBoxCA() from mitm.go — production uses NewBoxCAFromPEM only - Add newTestCA(t) test helper (generates CA in Go for tests) - Fix build.rs: watch each .go file individually (directory-level cargo:rerun-if-changed only detects file additions, not content changes) - Remove #[cfg(feature = "gvproxy")] from CA generation — rcgen is pure Rust, no Go dependency. The Python SDK doesn't enable gvproxy feature, so CA generation was silently skipped. * fix: address third review round (10 issues) 1. GvproxySecretConfig: custom Debug that redacts value (was derive(Debug)) 2. forked_tcp init(): panic on parse error instead of swallowing 3. Streaming replacer: boundary detection uses actual placeholder first bytes, not hardcoded '<' (supports custom placeholders) 4. WebSocket relay: close both connections when one direction finishes (prevents goroutine leak on hanging upstream) 5. Remove dead _engine_type param from ShimSpawner::new() 6. Secret::env_key(): validate name is alphanumeric/underscore/hyphen 8. GvproxyInstance::new() now pub(crate) — use from_config() instead * chore: remove stale CA trust comment from guest main.rs * fix: address review round 4 (8 issues) CRITICAL: - Secret::env_key() returns Result instead of panicking on bad names - Fix singleConnListener goroutine leak: replace http.Server + listener with direct HTTP/1.1 serving via ReadRequest loop (no Accept() block) SERIOUS: - Remove dead ca_cert_pem field from GvproxyInstance (Rule #4: Only What's Used) - Cert cache: check TTL on hits, evict expired certs, cap at 10000 entries - Document hardcoded Libkrun engine type MODERATE: - CA install tracks success count, logs error if zero installed - Document WebSocket frame substitution limitation - Fix pipe write comment (producer-consumer, not buffer size) * fix: address review round 5 (10 issues) CRITICAL: - Revert to http.Server for H1 with proper srv.Close() after connection ends (responseWriter was a broken reimplementation of HTTP) - Secret.value: add #[serde(skip_serializing)] SERIOUS: - Cert cache: evict expired first, then halve (was nuclear + race) - Secret::env_key() sanitizes invalid chars instead of panicking - safeBoundary returns min index across all prefix bytes - InstanceSpec.engine: #[serde(default)] + Default for VmmKind - Remove dead ca_cert_pem from GvproxyInstance TESTS: - test_secret_env_key_valid_names + sanitizes_invalid_names - test_secret_serde_value_skipped * feat: persist MITM CA to box dir for restart support The CA cert+key are now stored as files in ~/.boxlite/boxes/{id}/ca/: - cert.pem (0644) — public, injected into guest trust store - key.pem (0600) — private, passed to gvproxy On first start: generate + write. On restart: load from files. This ensures the same CA is used across restarts — the container rootfs already has the CA cert from the first start. The ca/ directory is NOT under shared/ (virtio-fs mount point), so the guest VM cannot read the private key. Also: - Remove #[serde(skip_serializing)] from Secret.value — it was silently breaking DB persistence (BoxConfig roundtrip lost values) - Fix test_secret_serde_json_fields (was broken by skip_serializing) - Add test_load_or_generate_persists_and_reloads

OwnedFfiPtr<T>::drop reconstructed `Box::from_raw(ptr)` and let `T`'s default Drop run. For Rust types with their own Drop (CBoxHandle is the only such variant in our usage) this is sufficient. For C-ABI repr(C) FFI structs (CImagePullResult, CImageInfoList, CBoxInfo, CBoxInfoList), Drop runs no destructor for raw pointer fields, so `CString::into_raw`'d allocations and `Vec::from_raw_parts` items leaked when the queue closed mid-flight (which OwnedFfiPtr exists to handle). Fix: extend OwnedFfiPtr<T> with an optional type-aware destructor hook. Producers who own a Rust-Drop-sufficient payload still call `OwnedFfiPtr::new(boxed)` (Box::drop runs). Producers who own an FFI struct with nested allocations call `OwnedFfiPtr::new_with(boxed, free_xyz_ptr)`, where `free_xyz_ptr` is the matching `boxlite_free_*` body (made `pub(crate)`). Drop dispatches based on the stored function pointer. Closure-parametric (over a single OwnedFfiPtr<T>) was chosen over 4 type-specific Owned* wrappers (Option A from REFLECT) because: - Single type, single API surface — minimum diff to existing code. - Producer site is explicit about which destructor applies via the constructor choice. - `unsafe fn(*mut T)` is a Copy zero-sized fn pointer — no runtime overhead beyond the existing AtomicPtr. Producer sites updated to `new_with`: - images.rs: image_pull -> free_image_pull_result, image_list -> free_image_info_list. - info.rs: get_info -> free_box_info_ptr, box_list -> free_box_info_list. CBoxHandle producers (box_handle.rs) keep `new` — Rust struct, Box::drop suffices. Reproducer (commit b358542a) flips red -> green: all 4 leaky variants reclaim the expected number of inner CStrings. Adjacent contracts also tested: - take() must disarm the type-aware destructor (otherwise the C consumer would double-free) -- new test `owned_ffi_ptr_with_free_take_disarms_drop`. - CBoxHandle path unchanged (existing `owned_ffi_ptr_reclaims_allocation_on_drop` still passes). Test isolation: a `pub(crate) static FREE_STR_LOCK: Mutex<()>` serializes the leak-reproducer tests so parallel cargo runs don't interleave their `FREE_STR_CALLS` snapshots. Full C SDK lib: 42 passed, 0 failed, 1 ignored. REFLECT note: variant enumeration (the cheap check) — listing all 6 payload types and writing down each one's destructor obligations — would have caught this in 10 minutes. Recorded in the round-3 reflection.

The 8 rounds of Codex review on this branch landed with commit-message narrative ("round-N", "Codex finding #M", "F1") leaked into source-file comments. Per CLAUDE.md, comments must explain why, not how a fix evolved — and references to specific commits rot as the codebase changes. The history is preserved in git. This commit cleans the production-source side: - sdks/c/src/lib.rs : drop "Codex round-3 finding #4" pointer - sdks/c/src/event_queue.rs : drop round-1/2/3/4 references in OwnedFfiPtr docs and test prologues - sdks/c/src/exec/execution.rs : drop round-3/4/6/7/8 references in field docs, execution_free, and the dead "round-7 reproducer" doc block whose test was deleted by round-8 - sdks/go/runtime.go : drop "(added by round-3 #3)" from activeHandles doc - sdks/go/bridge_callback.go : collapse 5x copy-pasted "(round-4 finding A)" comment block to a single "see claimHandleForDispatch" line (the helper's doc already explains it) No code changes — comments only. All boxlite-c (44) and boxlite (672) tests pass; cargo clippy --tests clean for both packages; go vet and go build clean for sdks/go.

* feat(c-ffi): post-and-drain async callback C API (phase 2 rust) Replace ~17 sync block_on C functions with async + callback variants backed by a per-runtime event queue. All async work spawns a Tokio task that pushes a typed RuntimeEvent to the queue; users dispatch callbacks on their own thread via boxlite_runtime_drain. Callbacks structurally cannot fire on a Tokio worker, eliminating the worker-park bug class that surfaced with the original boxlite_execute API. Highlights: - sdks/c/src/event_queue.rs (new): EventQueue (Mutex+Condvar over VecDeque, capacity 4096), RuntimeEvent enum, push_event with cooperative yield on full queue, plus all CBox*/CExecution*/ CRuntime* callback typedefs as extern "C" fn aliases. - sdks/c/src/runtime.rs: RuntimeHandle gains Arc<EventQueue>; add boxlite_runtime_drain (pop + dispatch with the queue lock released before user code) and convert boxlite_runtime_shutdown to async + cb. - sdks/c/src/box_handle.rs, copy.rs, images.rs, info.rs, metrics.rs: every block_on becomes a Tokio spawn that pushes the typed RuntimeEvent. Pre-spawn validation errors still return inline. - sdks/c/src/exec/execution.rs: replace boxlite_execute with boxlite_box_exec (sync handle creation) plus _on_stdout/_on_stderr/ _on_exit registration that lazily spawns per-stream pumps. _wait, _kill, _resize_tty become async + cb. _write_stdin remains sync. - sdks/c/include/boxlite.h: regenerated by cbindgen with all callback typedefs, async signatures, and the drain function. - sdks/c/tests/*.c: remove the C-level integration tests that exercised the removed sync API. Coverage of those code paths now lives in the Go SDK + runner integration suite. test_simple_api.c is preserved (uses the still-sync boxlite_simple_* surface). * test(c-ffi): callbacks fire on drain thread only (phase 2 regression guard) Adds three focused unit tests in sdks/c/src/event_queue.rs that guard the post-and-drain redesign's cardinal architectural invariant — callbacks NEVER fire on Tokio worker threads. These directly catch the bug class that motivated the redesign (May 2026 outage): [A] callbacks_fire_on_drain_thread_only — pushes 100 Stdout events from a 4-worker Tokio runtime, drains on the test thread, asserts the single recorded callback thread id == drain thread id and is disjoint from the captured worker-id set. [B] drain_blocks_user_thread_only — single-worker Tokio runtime, 5 sleep-100ms events drained on a separate std thread; asserts a Tokio canary task keeps ticking while drain is blocked, proving the worker is free. [C] pump_yields_when_queue_full — single-worker Tokio runtime, capacity=4 queue, 100 events posted concurrently with a canary; asserts the canary makes progress, proving the producer cooperatively yields rather than monopolizing the worker. Visibility tweak: extracted push_event_with_capacity as pub(crate) from push_event so [C] can exercise the cooperative-yield path with a small queue without touching the production QUEUE_CAPACITY default. Stability: each new test passed 5/5 back-to-back runs. Full lib suite is 24 passing (21 existing + 3 new); clippy clean; fmt clean. * feat(go-sdk): drain goroutine + cgo callbacks for async C API (phase 2 go) Adapt the Go SDK to the post-and-drain callback C API. Each Runtime starts a single drain goroutine that calls boxlite_runtime_drain in a loop; callbacks therefore fire on a Go-owned thread, so cgo callbacks can write to pipe writers, channels, and atomics without hijacking a Tokio worker. Per-method changes: - All async ops (Create/Get/Remove/Start/Stop/CopyInto/CopyOut/Metrics/ ListInfo/GetInfo/Shutdown/ImagePull/ImageList/ExecutionWait/Kill/ ResizeTTY) post a cgo.Handle pointing at a buffered result channel and block on select(channel, ctx.Done()). - StartExecution registers stdout/stderr/exit callbacks via the C streaming API; the runner-facing ExecutionOptions{Stdout,Stderr, OnStdout,OnStderr,TTY} shape is unchanged. - Stdin writes call boxlite_execution_write_stdin synchronously. bridge.{c,h} expose typed cb<X>() accessor functions that hand back the Go-exported callback symbols cast to the cbindgen-generated typedefs, so every cgo file in the package can pass C.cbXxx() directly into the API. The streaming cgo.Handle is intentionally leaked: stream events and the exit event are pushed concurrently by independent Rust pumps with no strict ordering guarantee, so deleting the handle from the exit callback could race a sibling stream callback's value lookup. The leak is bounded to one map entry plus a small struct per execution. * fix(shim): emit @loader_path/$ORIGIN rpath from shim's own build.rs The boxlite-shim split out of the boxlite crate in 3193f50 but the rpath link-args were left behind in src/boxlite/build.rs, where they no longer apply (boxlite is rlib-only post-refactor — cargo:rustc-link-arg is scoped to the emitting crate's binaries). Without LC_RPATH, the shim binary's libkrun cannot dlopen libkrunfw.<X>.dylib collected next to it in the runtime directory. Move the link-args to a new src/shim/build.rs and delete the dead emission in boxlite's build.rs. Verified: shim now has LC_RPATH @loader_path; make test:integration:c passes (was failing with "Couldn't find or load libkrunfw.5.dylib"). * fix(shim): move --allow-multiple-definition link-arg from boxlite to shim CI on Linux x86_64 fails with: invalid instruction `cargo:rustc-link-arg-bins` from build script of `boxlite v0.9.3`. The package boxlite v0.9.3 does not have a bin target. Same root cause as the rpath fix in this PR: boxlite is rlib-only post-#494, so cargo:rustc-link-arg-bins is rejected. Move the linker flag to src/shim/build.rs (where the libkrun staticlib std-symbol conflict is actually resolved at link time), keep cargo:rustc-link-arg- tests on boxlite for its integration test binaries. * chore(skills): add adversarial-iteration skill Captures the implement -> review -> reflect -> reproducer-first -> fix loop used during the round-1/round-2 Codex review iterations on this branch. Encodes the four cheap pre-implementation checks (timeline draw, upstream contract read, full lifecycle trace, variant enumeration) and the recurring failure shapes seen in this codebase, so future iterations can REFLECT on root cause instead of patching symptoms. * fix(c-ffi): round-1 + round-2 review follow-ups Bundles two adversarial-review rounds on the post-and-drain async callback C/Go SDK (PR #495). Squash-merging because the diffs interleave across event_queue.rs, execution.rs, runtime.rs, and the producer sites, with round-2 build-depending on round-1 (e.g. dispatch_handle_event<T> signature requires OwnedFfiPtr<T>). Round-1 findings: - NULL callback at FFI boundary invoked UB before any Rust code ran. Fix: typedefs are now Option<extern "C" fn(...)>; an unwrap_cb_or_return! macro at every public entrypoint rejects NULL with InvalidArgument. - Go SDK leaked cgo.Handle (and live VMs, for Create) when caller's ctx cancelled before the C task completed. Fix: abandonAsync / abandonAsyncErr / drainAndDelete helpers detach cleanup onto a goroutine so callers return ctx.Err() promptly AND the handle is reclaimed when the result eventually arrives. Round-2 findings: - Execution::wait held the inner mutex across recv().await, blocking kill/signal/resize_tty when wait was parked. Fix: split lock into inner (interface ops) and wait_state (OnceCell + rx); wait now &self. Knock-on: cargo clippy --fix removed obsolete `mut` at 58 callsites across test-utils, node SDK, and boxlite integration tests that no longer needed mutable bindings. - push_event_with_capacity dropped events on close, leaking Box::into_raw'd payloads for 6 lifecycle variants. Fix: OwnedFfiPtr<T> smart-pointer reclaims the heap on Drop, take() disarms on dispatch. - Per-execution cgo.Handle leaked because stream/exit callbacks were not strictly ordered. Fix: stream pumps signal completion via oneshot; exit_pump awaits all stream-dones before pushing Exit; execution_free synthesises an EXIT_CODE_FORCE_CLOSED Exit on abort. Go now safely deletes the handle in the exit callback. Tests: - src/boxlite/src/litebox/exec.rs: wait_does_not_block_kill (round-2 #1) - sdks/c/src/event_queue.rs: owned_ffi_ptr_* + closed-queue tests (round-2 #2) - sdks/c/tests/test_null_callback.c: NULL callback rejection (round-1) - sdks/go/abandon_async_test.go: round-1 abandonAsync helpers + TestExecutionStreamState_HandleDeletedOnExit (round-2 #3) * fix(c-ffi): claim-once Exit dispatch (round-3 #1) execution_free races exit_pump on Exit dispatch. execution_free reads `completed = false`, then proceeds through block_on(kill+wait) and a synth-Exit push; concurrently, exit_pump sets `completed = true` and pushes its own Exit. Both Exits land in the queue. The Go SDK's exit callback deletes the shared cgo.Handle on the first Exit; the second tries to Value() a freed handle and panics the drain goroutine. Fix: split the conflated `completed: Mutex<bool>` into two distinct AtomicBool fields, each with a single responsibility: - process_completed: "process exited naturally" -- gates the kill+wait branch in execution_free. - exit_dispatched: "Exit was published to the queue" -- both execution_free and exit_pump compare_exchange(false, true, AcqRel, Acquire); only the claimer pushes Exit. The two-AtomicBool design replaces a Mutex-based check-then-act guard that had a TOCTOU window arbitrarily wide (block_on(kill+wait) sat inside it). compare_exchange is the idiomatic claim-once primitive; conflating process state and dispatch state is what created the original bug. Reproducer (commit 09e73ac2) flips red -> green: test ::race_produces_at_most_one_exit_event ... ok Helper-level invariant verified: two concurrent dispatches through `dispatch_exit_test_helper` produce exactly 1 Exit event regardless of interleaving. Adjacent contracts to verify (out of scope for this commit; covered by integration tests in subsequent loop iterations): - Force-close after natural completion: exit_pump claims, execution_free sees claim taken, skips push. - Force-close before natural completion: execution_free claims, exit_pump (still alive) sees claim taken, skips push. - Process-already-completed: execution_free skips kill+wait on process_completed=true; still attempts the dispatch claim (in case exit_pump hadn't pushed yet). * fix(c-ffi): forward stream chunks byte-exact (round-3 #2) stdout_pump and stderr_pump appended `b'\n'` to every chunk before pushing the Stdout/Stderr event with a comment claiming to "restore the line terminator that the upstream stream stripped". The upstream does not strip terminators. Per `boxlite::portal::interfaces::exec::route_output`: String::from_utf8_lossy(&chunk.data).to_string() — raw byte chunks forwarded as-is, no line splitting, no newline stripping. The variable name `line` was a fiction; chunks are arbitrary byte boundaries. Concrete corruption observed: - `printf hello` (no trailing newline) -> consumer saw `hello\n`. - Multi-chunk output -> extra `\n` injected at chunk boundaries. Fix: delete `bytes.push(b'\n')` and the misleading comment in both pumps. Rename local `line` -> `chunk` to match upstream terminology and prevent future readers from making the same incorrect assumption I did. Reproducer (commit 04ca0459) flips red -> green: test ::stdout_pump_forwards_chunks_byte_exact ... ok test ::stderr_pump_forwards_chunks_byte_exact ... ok The reproducer covers chunks with no trailing newline, multi-chunk sequences, already-newlined chunks, and control-byte-bearing chunks. Full C SDK lib suite: 37 passed, 0 failed. REFLECT note: Upstream contract read (a 30-second `grep` on `stdout_tx.send` in `src/boxlite/src/portal/`) would have caught this. Recorded in the round-3 reflection. * fix(go-sdk): broadcast Close to in-flight async callers (round-3 #3) Runtime.Close stopped the drain goroutine BEFORE freeing the C runtime handle. Any concurrent async caller (Create, Get, Shutdown, Pull, ImageList, Info, Metrics, Start, Stop, Copy, Wait, Kill, ResizeTTY, ListInfo, GetInfo) blocked on `select { case res := <-ch: case <-ctx.Done(): }` would never receive a result after stopDrain killed the only goroutine that pumps events from C to Go. With a non-cancellable ctx, the caller blocked forever and its abandonAsync cleanup goroutine (which shares the same ch) also stranded — leaking cgo.Handles unboundedly across long-lived processes. Fix: a `closing chan struct{}` on Runtime, closed by Close BEFORE stopDrain runs. Every async caller's select now has three cases: ch, ctx.Done(), and closing. When Close fires `closing`, all in-flight callers wake up and return ErrRuntimeClosed (*Error{Code: ErrInvalidState}). The detached abandonAsync / abandonAsyncErr / drainAndDelete goroutines also select on closing so their cgo.Handle Delete runs immediately rather than waiting forever for a Tokio task whose result event will be dropped by the now-closed C queue. Why closing-channel broadcast (not context.Context, not sync.Cond): closed channels are the idiomatic Go signal-many primitive; readers familiar with `<-ctx.Done()` recognise the pattern instantly. Adds one field + one extra select case per call site. No new types, no new dependency. Files touched (15 select sites + 3 helper signatures + Runtime struct + Close ordering): runtime.go: Runtime{closing,closingOnce}, NewRuntime init, Close closes closing first, Shutdown/Create/Get/ removeBox selects, abandonAsync helpers refactored. errors.go: ErrRuntimeClosed sentinel. exec.go: Execution{closing}, StartExecution wires it, Wait/Kill/ResizeTTY selects. images.go: Pull/List selects. info.go: ListInfo/GetInfo selects. metrics.go: Runtime.Metrics/Box.Metrics selects. box_handle.go: Start/Stop selects. copy.go: CopyInto/CopyOut selects. Reproducer (commit 3642e2c1) flips red -> green: TestRuntimeCloseDoesNotStrandPendingPull ... ok Pull unblocked 9.389333ms after Close; err = boxlite: runtime closed (code=4) Adjacent-contract tests (also passing): TestAbandonAsync_RespondsToCloseSignal TestAbandonAsyncErr_RespondsToCloseSignal TestDrainAndDelete_RespondsToCloseSignal plus the existing round-1/round-2 helper tests retrofitted to the new (ch, h, closing[, cleanup]) signatures. REFLECT note: full lifecycle trace (the cheap check) drawn during REFLECT showed the gap between stopDrain and free was where in-flight callers got stranded; the fix puts a wakeup mechanism in front of that gap. * fix(c-ffi): type-aware OwnedFfiPtr destructor (round-3 #4) OwnedFfiPtr<T>::drop reconstructed `Box::from_raw(ptr)` and let `T`'s default Drop run. For Rust types with their own Drop (CBoxHandle is the only such variant in our usage) this is sufficient. For C-ABI repr(C) FFI structs (CImagePullResult, CImageInfoList, CBoxInfo, CBoxInfoList), Drop runs no destructor for raw pointer fields, so `CString::into_raw`'d allocations and `Vec::from_raw_parts` items leaked when the queue closed mid-flight (which OwnedFfiPtr exists to handle). Fix: extend OwnedFfiPtr<T> with an optional type-aware destructor hook. Producers who own a Rust-Drop-sufficient payload still call `OwnedFfiPtr::new(boxed)` (Box::drop runs). Producers who own an FFI struct with nested allocations call `OwnedFfiPtr::new_with(boxed, free_xyz_ptr)`, where `free_xyz_ptr` is the matching `boxlite_free_*` body (made `pub(crate)`). Drop dispatches based on the stored function pointer. Closure-parametric (over a single OwnedFfiPtr<T>) was chosen over 4 type-specific Owned* wrappers (Option A from REFLECT) because: - Single type, single API surface — minimum diff to existing code. - Producer site is explicit about which destructor applies via the constructor choice. - `unsafe fn(*mut T)` is a Copy zero-sized fn pointer — no runtime overhead beyond the existing AtomicPtr. Producer sites updated to `new_with`: - images.rs: image_pull -> free_image_pull_result, image_list -> free_image_info_list. - info.rs: get_info -> free_box_info_ptr, box_list -> free_box_info_list. CBoxHandle producers (box_handle.rs) keep `new` — Rust struct, Box::drop suffices. Reproducer (commit b358542a) flips red -> green: all 4 leaky variants reclaim the expected number of inner CStrings. Adjacent contracts also tested: - take() must disarm the type-aware destructor (otherwise the C consumer would double-free) -- new test `owned_ffi_ptr_with_free_take_disarms_drop`. - CBoxHandle path unchanged (existing `owned_ffi_ptr_reclaims_allocation_on_drop` still passes). Test isolation: a `pub(crate) static FREE_STR_LOCK: Mutex<()>` serializes the leak-reproducer tests so parallel cargo runs don't interleave their `FREE_STR_CALLS` snapshots. Full C SDK lib: 42 passed, 0 failed, 1 ignored. REFLECT note: variant enumeration (the cheap check) — listing all 6 payload types and writing down each one's destructor obligations — would have caught this in 10 minutes. Recorded in the round-3 reflection. * fix(c-ffi): wait for exit_pump dispatch instead of abort (round-4 B) Codex round-4 finding: exit_pump's claim-then-push pattern (round-3 #1) flips `exit_dispatched=true` BEFORE `push_event.await` returns. Under queue backpressure the push yields cooperatively. During that yield window, `execution_free` would: 1. Read `exit_dispatched == true` → skip its synth Exit push. 2. Call `pumps.abort()` → tear down exit_pump mid-yield. Net: claim taken, no Exit pushed. Go SDK exit callback never fires, cgo.Handle leaks, "exactly one Exit per execution" invariant violated. Fix: separate exit_pump from the abortable stream-pump set. Track it in a dedicated `exit_pump_handle: Mutex<Option<JoinHandle<()>>>`. `execution_free` now branches: - We win the dispatch claim: push synth Exit, then abort exit_pump (which will see the claim taken and no-op when its push resumes). - exit_pump won the claim: wait (bounded 5s) for the task to finish so its push completes, instead of aborting it. Stream pumps continue to be aborted as before. Why wait-for-completion (option a from the test commit) over a tri-state machine (option b): smaller diff, follows the "boring code" principle — "wait for the task that took the lock" is more obvious than "tri-state with timeout fallback". The 5s timeout bounds worst-case teardown if the drainer is genuinely stuck. Reproducer (commit a9b8bd08) updated to mirror the new flow: - capacity-1 queue + filler event. - spawn exit_pump-shaped task (claim, push, yield on backpressure). - on observation of the claim, spawn a drain task that pops the filler after 20ms. - wait (bounded) for the pump to finish, instead of aborting. - assert marker Exit reached the queue. test ::aborted_exit_pump_after_claim_must_still_dispatch_exit ... ok Polarity: the test was reverse-engineered to fail under the old abort-then-drain shape (committed at a9b8bd08); after this fix the production wait-for-completion shape passes. Full C SDK lib: 43 passed, 0 failed, 1 ignored. REFLECT note (round-4 unifying shape): "Helper-level test passed the primitive's contract; system invariant under stress (saturated queue + shutdown abort race) failed." The round-3 #1 helper test verified two concurrent claims collapse to one push, but it never saturated the queue or exercised abort-mid-push. Recorded for the skill's Common Shapes list. Round-4 finding A (cgo.Handle race during Close) — separate next iteration. * fix(go-sdk): single-path cgo.Handle ownership (round-4 finding A) Codex round-4 finding A: Runtime.Close fires r.closing BEFORE stopDrain returns. During the up-to-100ms drain blocking call between close(r.closing) and stopDrain completing, the drain goroutine may still dispatch a queued C event whose Go callback Value+Delete's the same cgo.Handle that abandonAsync's closing branch (round-3 #3 fix) is also Deleting. cgo.Handle.Delete panics on double-Delete; Value() panics if Delete already ran. Either path winning the race produces a panic, turning a clean shutdown into a process crash. Fix: single-path ownership via a global registry of active handles. - `var activeHandles sync.Map` (key: uintptr(handle)). - `registerHandleForDispatch(cgo.NewHandle(...))` at every async-op call site (16 sites across runtime.go, exec.go, images.go, info.go, metrics.go, box_handle.go, copy.go). - `claimHandleForDispatch(h) bool` does atomic LoadAndDelete; first caller wins (returns true), subsequent callers no-op (return false). Exactly one caller proceeds to Value/Delete. - `deleteHandleForDispatch(h)` is the claim-then-Delete shorthand used by synchronous-failure paths and abandonAsync helpers. Every Value/Delete site now claims first: - bridge_callback.go (12 dispatch callbacks): claim before `defer h.Delete()`. If claim fails, return without touching the handle — abandonAsync's closing branch already owns it. - abandonAsync / abandonAsyncErr / drainAndDelete: claim in BOTH the ch and closing branches. (This also fixes a latent double-Delete from round-1 where the dispatch's `defer h.Delete()` raced abandonAsync's `case res := <-ch:` h.Delete() — same primitive eliminates both.) - Synchronous-failure paths (`if code != C.Ok { ... }`): use `deleteHandleForDispatch` for symmetry; here there's no race but consistency keeps the pattern uniform. Why a global sync.Map vs per-Runtime registry: dispatch callbacks in bridge_callback.go only have `userData *unsafe.Pointer` — no Runtime reference. Threading Runtime through every callback would be invasive. The handle's uintptr is unique within the process so a global map is the minimum-friction shape. Why LoadAndDelete vs sync.Once-per-handle: LoadAndDelete is one atomic op on an existing data structure. sync.Once-per-handle would need a per-handle Once value (extra struct allocation per call). Reproducer (commit 7749b68b) flips red -> green: test TestHandleOwnership_NoDoubleDeleteRaceDuringClose ... PASS (500/500 panics on the unfixed code; 0/500 after fix.) Adjacent contracts (all passing): - TestAbandonAsync_RunsCleanupOnSuccess - TestAbandonAsync_SkipsCleanupOnError - TestAbandonAsyncErr_DeletesHandle - TestDrainAndDelete_DeletesHandle - TestAbandonAsync_RespondsToCloseSignal - TestAbandonAsyncErr_RespondsToCloseSignal - TestDrainAndDelete_RespondsToCloseSignal - TestExecutionStreamState_HandleDeletedOnExit (round-2 #3) - TestRuntimeCloseDoesNotStrandPendingPull (round-3 #3) Streaming exec handle (`streamHandle` in exec.go, owned by dispatchExit in bridge_callback.go) is intentionally NOT registered: it has only one Delete path (the per-execution exit dispatch chain — round-2 #3 ordering invariant), no race to coordinate. REFLECT note: the unifying shape across rounds 3-4 is "two paths both taking ownership of the same resource without coordination." Round-3 #1 (Exit dispatch) addressed it with compare_exchange on exit_dispatched at the C-FFI level. Round-4 finding A applied the same shape to cgo.Handle ownership at the Go SDK level. * fix(c-ffi+go-sdk): teardown ordering + payload reclamation (round-6) Codex round-6 surfaced two siblings of prior teardown findings. Both are addressed in this single commit because they share the "forced-tear-down path didn't preserve invariants the natural-path established" shape. Finding 1 (sdks/c/src/exec/execution.rs:566-600): execution_free's force-close path published the synthetic Exit BEFORE aborting stdout/stderr pumps. Under backpressure or scheduling, a stream pump could enqueue Stdout/Stderr AFTER the synth Exit. The Go exit callback Deletes the per-execution cgo.Handle; a later stream callback would Value() a deleted handle and panic. This violated the round-3 #1 "Exit is strictly last" invariant on the force-close path (the natural-completion path via exit_pump correctly awaits each stream pump's done_tx before pushing Exit; force-close skipped that step). Fix: abort stream pumps FIRST in execution_free, then run the exit-dispatch claim race + synth Exit push. Aborting drops a pump's future before the next push_event re-check, so no Stdout/Stderr lands after the Exit. exit_pump (handled separately via the dedicated exit_pump_handle from round-4 #B) is unaffected. Finding 2 (sdks/go/runtime.go:387-405 and bridge_callback.go): when abandonAsync's closing branch claimed the cgo.Handle (round-4 #A primitive), a queued success callback that fired AFTER the claim returned silently without freeing its received C-side payload. Rust had already transferred ownership via OwnedFfiPtr::take() before invoking the callback, so neither side reclaimed the heap. For Create this leaks a live VM. For Get/ImagePull/ImageList/Info/InfoList it leaks the OwnedFfiPtr<T> allocation including its nested CString::into_raw'd inner pointers. Fix: introduce `claimOrFreePayload[P](h, payload, free)` — the dispatch-callback gate that claims, and on claim-failure invokes `free(payload)` before returning. Each affected callback in bridge_callback.go (5 sites: Create, Get, ImagePull, ImageList, Info, InfoList) is updated to use it with the matching `boxlite_free_*` C function. Error-only callbacks (Start/Stop/ Remove/Copy/Shutdown/Wait/Kill/Resize) keep using bare claimHandleForDispatch — no payload to reclaim. Reproducer (commit cc483053) flips red -> green: test TestClaimOrFreePayload_FreesPayloadWhenClaimAlreadyTaken ... PASS (got 0 invocations on the unfixed stub; got 1 after fix.) Adjacent-contract tests (also passing): TestClaimOrFreePayload_DoesNotFreeWhenClaimSucceeds — claim-success path does NOT call free (caller owns the payload normally). Plus all round-3 + round-4 reproducer tests + helper tests. Note on finding 1: I did not write a separate runtime test because the bug is fundamentally timing-dependent (race between stream pump's push_event yield and execution_free's abort). Code-review verification is the regression guard; the round-3 #1 invariant "Exit is strictly last" remains the explicit contract. Round-6 unifying shape (recorded for future REFLECT): "Forced- tear-down path didn't preserve invariants the natural-path established." Round-3 #1 (Exit ordering) addressed the natural path; round-4 #B added force-close ordering for exit_pump itself; round-6 finding 1 extends that to stream pumps; round-6 finding 2 extends payload reclamation to claim-failure paths. The recurring sibling pattern across rounds 3-6 suggests the close-path design warrants a single architectural pass once the immediate fixes land. * fix(c-ffi): only mark process_completed on Ok wait (round-8 F1) Codex round-8 F1: round-7's unconditional `process_completed=true` treated a failed wait as proof the process exited. For untrusted/sandboxed code, a wait_on_clone error (e.g. transport loss to the guest agent) may mean the process is still running. execution_free uses process_completed to skip kill+wait cleanup; under round-7 a transient error caused the handle to be freed without termination, leaving a live process running. Fix: guard the store with `result.is_ok()`. Errored waits leave process_completed false; execution_free correctly retries kill+wait. Ok waits set the flag and skip the redundant cleanup (the round-7 invariant — "natural exit shouldn't get EXIT_CODE_FORCE_CLOSED layered on top" — is preserved). Reproducer (commit 63b883a6) flips red -> green: test ::execution_wait_must_not_mark_process_completed_on_error ... ok Plan-mode discipline (per skill, no user gate): Approaches considered: (A) Only on Ok. RECOMMENDED, picked. Smallest change; matches the security invariant "wait observed != process exited". (B) Split into wait_observed + process_exited. Bigger refactor; deferred to architectural pass for round-9 B (F2 + F3). Codex's round-8 F1 recommendation explicitly named (A); picking it directly. Adjacent contracts (verified): - Empty-handle wait (Err) leaves process_completed=false. - Successful wait still sets process_completed=true (round-7 invariant; covered by boxlite/src/litebox/exec.rs::tests where a stub backend can produce Ok results). - exit_pump path unchanged. - Full C SDK lib: 44 passed. Round-8 F2 (Box ops schedule then check closing) and F3 (Create cleanup skipped on close) remain — these are siblings of the recurring teardown shape and are addressed in round-9 B as the 3-phase shutdown architectural change. * chore(c-ffi+go-sdk): drop iteration-round narrative from source comments The 8 rounds of Codex review on this branch landed with commit-message narrative ("round-N", "Codex finding #M", "F1") leaked into source-file comments. Per CLAUDE.md, comments must explain why, not how a fix evolved — and references to specific commits rot as the codebase changes. The history is preserved in git. This commit cleans the production-source side: - sdks/c/src/lib.rs : drop "Codex round-3 finding #4" pointer - sdks/c/src/event_queue.rs : drop round-1/2/3/4 references in OwnedFfiPtr docs and test prologues - sdks/c/src/exec/execution.rs : drop round-3/4/6/7/8 references in field docs, execution_free, and the dead "round-7 reproducer" doc block whose test was deleted by round-8 - sdks/go/runtime.go : drop "(added by round-3 #3)" from activeHandles doc - sdks/go/bridge_callback.go : collapse 5x copy-pasted "(round-4 finding A)" comment block to a single "see claimHandleForDispatch" line (the helper's doc already explains it) No code changes — comments only. All boxlite-c (44) and boxlite (672) tests pass; cargo clippy --tests clean for both packages; go vet and go build clean for sdks/go. * chore(tests): describe invariants instead of review rounds Section headers and assertion messages in test files referenced "Codex round-N finding #M" rather than the invariant under test. Future debuggers reading a failure see the bug, not the review iteration that found it. - sdks/c/src/tests.rs : "Codex finding #3" header → "NULL-callback rejection" - sdks/go/abandon_async_test.go : drop round-2/3 references in headers and bodies - sdks/go/handle_payload_test.go : drop round-4/6 references; describe the leak path - sdks/go/handle_race_test.go : drop round-3/4 references; describe the race - sdks/go/runtime_close_test.go : drop "(round-3 #3)" from a t.Fatalf message a future debugger would see in CI logs - src/boxlite/src/litebox/exec.rs : "Codex round-2 finding #1" header → "wait must not block kill" * chore(skills): add COMPACT step to adversarial-iteration loop Document a per-round compaction step (squash test+fix, audit round-N narrative out of source comments, tree-identity check) so the next adversarial review reads canonical code rather than a debugging notebook. Also tighten the trigger/non-trigger guidance.

#1 (security, blocking): fix root-home asymmetry in boxlite-disable-ssh. The script hard-coded HOME_DIR="/home/$UNIX_USER", so disable for root targeted /home/root/.ssh/.ssh_enabled (non-existent) instead of /root/.ssh/.ssh_enabled, leaving the marker on disk and allowing boxlite-ensure-ssh to silently re-enable sshd on the next VM restart — bypassing API-level revoke on the default code path. boxlite-ai#2: update sshUserOrDefault() doc and test to reflect "root" default. The comment claimed "boxlite" and TestSSHUserDefaultsToBoxlite asserted the old value; both now match the implementation. boxlite-ai#3: delete per-box mutex in cleanupSSHOnDestroy to prevent unbounded growth of boxSSHMu across the runner process lifetime. boxlite-ai#4: redact SSH bearer tokens in gateway log lines with redactToken() (keeps 8-char prefix for correlation, replaces remainder with "…"). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Bump version to 0.2.2 for all packages

7420d23

Signed-off-by: dorianzheng <xingzhengde72@gmail.com>

DorianZheng merged commit 943e440 into main Dec 10, 2025
2 checks passed

DorianZheng deleted the release/v0.2.2 branch December 10, 2025 10:35

yingjunwu mentioned this pull request Feb 9, 2026

feat(sdk): expose resize_tty in Python SDK and add AI agent guide #220

Merged

5 tasks

DorianZheng mentioned this pull request Feb 13, 2026

feat(api): add OpenAPI 3.1.1 spec for cloud sandbox REST API #254

Merged

7 tasks

lilongen mentioned this pull request Apr 21, 2026

feat(litebox): add pause/resume API for zero-CPU VM freezing #413

Open

6 tasks

lilongen mentioned this pull request May 19, 2026

feat(ssh): real-SSH access for sandboxes via in-VM sshd + gvproxy port forwarding #541

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bump version to 0.2.2 for all packages#4

Bump version to 0.2.2 for all packages#4
DorianZheng merged 1 commit into
mainfrom
release/v0.2.2

DorianZheng commented Dec 10, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DorianZheng commented Dec 10, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant