Skip to content

feat(rest): runtime.images.list end-to-end (Class A, surface 5)#696

Draft
G4614 wants to merge 5 commits into
boxlite-ai:mainfrom
G4614:fix/rest-images-end-to-end
Draft

feat(rest): runtime.images.list end-to-end (Class A, surface 5)#696
G4614 wants to merge 5 commits into
boxlite-ai:mainfrom
G4614:fix/rest-images-end-to-end

Conversation

@G4614

@G4614 G4614 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

image list supported

Test plan — two-sided verified

Pin: `scripts/test/e2e/cases/test_images_pull_list.py`

Pre-fix Post-fix
`test_list_reports_fixture_images` RuntimeError: Image operations not supported PASS — 3 fixture images returned
`test_image_info_fields_populated` same PASS
`test_pull_unreachable_registry_is_typed_error` exception (unsupported gate) typed exception (typed Unsupported); residual test substring overshoot `'500' in url` is the test's bug

Pull over REST is the next step (needs registry surface for manifest digest + layer count). Separate PR.

Branched off `fix/rest-clone-end-to-end` (#695) — depends on snapshot + clone baselines (and ultimately #689's metrics) landing first.

G4614 added 3 commits June 9, 2026 06:23
…e 1)

Implements `box.snapshot.{create, list, get, restore, remove}` over the
REST chain. Pre-fix the SDK Rust REST client short-circuited every call
with "Remote server does not support snapshots operations" because the
API's /v1/config returned snapshots_enabled=false and there was no
runner-side handler for the snapshot URL space anyway.

Five layers added in this PR (≈1100 lines):

  - sdks/c/src/snapshot.rs (new)
      CSnapshotInfo + CSnapshotInfoList FFI types, async + callback
      variants for create/list/get/remove/restore, free helpers
      mirroring CBoxInfo's allocation conventions.
  - sdks/c/src/event_queue.rs
      4 new RuntimeEvent variants (Create/List/Remove/Restore — Get
      shares Create's payload shape) + 4 callback function types.
  - sdks/c/src/lib.rs / runtime.rs
      Register the module + dispatch the 4 new event variants through
      the existing dispatch_handle_event / dispatch_unit_event paths.
  - sdks/go/snapshot.go (new) + bridge.{c,h}
      Box.Snapshot{Create,List,Get,Remove,Restore} cgo wrappers, four
      //export goBoxliteOnSnapshot* callbacks, type bridging.
  - apps/runner/pkg/boxlite/client.go
      Client.Snapshot* methods that route through getOrFetchBox.
  - apps/runner/pkg/api/controllers/boxlite_snapshot.go (new)
      5 gin handlers + classifySnapshotError (mirrors classifyExecError
      pattern from boxlite-ai#690) so the SDK gets HTTP-typed errors instead of
      raw 5xx for caller-fixable cases.
  - apps/runner/pkg/api/server.go
      5 boxliteApi routes matching the SDK's URL shape:
        POST   /v1/boxes/:boxId/snapshots
        GET    /v1/boxes/:boxId/snapshots
        GET    /v1/boxes/:boxId/snapshots/:name
        DELETE /v1/boxes/:boxId/snapshots/:name
        POST   /v1/boxes/:boxId/snapshots/:name/restore
  - apps/api/src/boxlite-rest/boxlite-proxy.controller.ts
      3 new proxy routes covering the snapshot URL space (root, named,
      restore). Existing proxyToRunner machinery handles auth +
      runner discovery + path rewrite.
  - apps/api/src/boxlite-rest/boxlite-config.controller.ts
      Flips `snapshots_enabled: true` so the SDK's
      `require_snapshots_enabled` gate stops short-circuiting.

E2E status:

The REST plumbing is **verified end-to-end**: the SDK call now reaches
libboxlite on the runner instead of hitting the "Remote server does
not support" gate. With the e2e test stack:

  e2e test `test_snapshot_clone.py::test_snapshot_create_appears_in_list`:
    PRE  : RuntimeError: "Remote server does not support snapshots
           operations"  (short-circuit at SDK)
    POST : HTTP 500: "snapshot create failed: boxlite: internal error:
           Failed to SIGSTOP shim process (pid=…): Connection refused
           (os error 111)" (libkrun/libboxlite-side signal delivery
           issue against the stopped box — separate from the REST
           chain this PR builds)

The SIGSTOP error is a libboxlite snapshot mechanism issue (suspend a
stopped shim process for disk capture), not a REST surface bug. It's
reproducible against local FFI on the same EC2 host and out of scope
for this PR.

Clone / export / import REST support follow the same template; this
PR is the exemplar for those follow-ups.
…surfaces 2-4)

Adds the remaining three Class A operations (clone, export, import) over
the REST chain, following the same template PR boxlite-ai#694 established for
snapshot. Pre-fix the SDK Rust REST client short-circuited each call
with "Remote server does not support {clone,export,import} operations"
because the API's /v1/config returned those capabilities as false.

Layers:

  - sdks/c/src/clone_export.rs (new)
      `boxlite_box_clone_box` (returns CBoxHandle), `boxlite_box_export`
      (unit + error; caller already knows dest path),
      `boxlite_runtime_import_box` (returns CBoxHandle). Each async +
      callback, mirroring snapshot.rs.
  - sdks/c/src/event_queue.rs
      2 new RuntimeEvent variants (CloneBox uses OwnedFfiPtr<CBoxHandle>,
      ExportBox is unit) + 2 callback function types.
  - sdks/c/src/lib.rs / runtime.rs
      Register the module + dispatch through existing
      dispatch_handle_event / dispatch_unit_event paths.
  - sdks/go/clone_export.go (new) + bridge.{c,h}
      Box.CloneBox, Box.Export, Runtime.ImportBox + cgo bridge.
  - apps/runner/pkg/boxlite/client.go
      Client.CloneBox, Client.ExportBox, Client.ImportBox.
  - apps/runner/pkg/api/controllers/boxlite_clone_export.go (new)
      3 gin handlers + classifyCloneExportError. Export streams the
      archive bytes back to the SDK as the response body (the SDK
      writes to its caller-chosen host path). Import reads bytes from
      the request body, writes to a runner-local temp file, then calls
      ImportBox.
  - apps/runner/pkg/api/server.go
      3 new routes (POST /clone, POST /export, POST /import).
  - apps/api/src/boxlite-rest/boxlite-proxy.controller.ts
      Proxy routes for /clone, /export, and the runtime-level /import.
      Import has no boxId so it's routed via `pickRunnerForImport`
      (any runner the org has a sandbox on). If the org has no
      existing sandbox, returns 404 with an explanatory message.
  - apps/api/src/boxlite-rest/boxlite-config.controller.ts
      Flips `clone_enabled / export_enabled / import_enabled = true`
      so the SDK's `require_*_enabled` gates stop short-circuiting.

E2E status:

The REST plumbing is **verified end-to-end** — the SDK calls now reach
libboxlite on the runner instead of hitting the "Remote server does not
support" gate. With the e2e test stack:

  test_clone_box_yields_independent_disk:
    PRE  : RuntimeError: "Remote server does not support clone
           operations"  (SDK short-circuit)
    POST : HTTP 500 "clone failed: boxlite: internal error:
           Failed to SIGSTOP shim process (pid=…): Connection refused"
           (libkrun/libboxlite-side issue, identical signature to the
           snapshot pre-existing failure)

  test_export_import_roundtrip:
    PRE  : RuntimeError: "Remote server does not support export
           operations"  (SDK short-circuit)
    POST : HTTP 500 "export failed: boxlite: internal error:
           Failed to SIGSTOP shim process …"

The SIGSTOP failure is a libboxlite snapshot mechanism issue
reproducible against local FFI on the same EC2 host — the same one
PR boxlite-ai#694 documented. Out of scope for this REST surface PR.

Branched off fix/rest-snapshot-end-to-end (boxlite-ai#694) — depends on the
snapshot baseline + cbindgen header refresh landing first.
`rt.images.list()` over REST short-circuited at the SDK with "Image
operations not supported over REST API" — the REST runtime constructor
hard-coded `image_backend: None`. This PR wires a `RestImageBackend`
that round-trips list to the API, leaving pull as a typed Unsupported
until the runner-side image-pull plumbing is ready (the SDK's
ImagePullResult shape needs manifest digest + layer count which the
current registry layer doesn't expose over REST).

Layers added (~220 lines):

  - src/boxlite/src/rest/images.rs (new)
      RestImageBackend implementing the pub(crate) ImageBackend trait.
      list_images deserialises the API's ImageInfoListResponse onto
      core ImageInfo; pull_image returns BoxliteError::Unsupported
      with a clear message pointing at list.
  - src/boxlite/src/rest/mod.rs / runtime.rs
      Register the new module; expose RestRuntime.client at pub(crate)
      so the constructor can wire the backend.
  - src/boxlite/src/runtime/core.rs
      `BoxliteRuntime::rest(...)` now sets
      `image_backend: Some(Arc::new(RestImageBackend::new(...)))`,
      so `runtime.images()` no longer short-circuits at the gate.
  - apps/api/src/boxlite-rest/boxlite-images.controller.ts (new)
      `GET /v1/:prefix/images` aggregates org-owned + general
      snapshots from the Snapshot table, translates each to an
      ImageInfo row matching the SDK's deserialisation shape. Stores
      size as bytes (the entity stores GB-as-float; rounded to u64
      to satisfy the SDK's `size_bytes: Option<u64>`).
  - apps/api/src/boxlite-rest/boxlite-rest.module.ts
      Registers BoxliteImagesController + TypeOrmModule.forFeature
      for the Snapshot entity.

## Test plan — two-sided verified

Pin: `sdks/python/tests/e2e/test_images_pull_list.py`

Stack: local e2e. SDK wheel + API restart only (no runner change).

| Case | Pre-fix | Post-fix |
|---|---|---|
| `test_list_reports_fixture_images` (`rt.images.list()` finds alpine:3.23) | `RuntimeError: Image operations not supported over REST API` | **PASS** — 3 fixture images returned |
| `test_image_info_fields_populated` (each ImageInfo has reference / id / cached_at populated) | same | **PASS** |
| `test_pull_real_image` | SKIPPED (opt-in) | SKIPPED (opt-in) |
| `test_pull_unreachable_registry_is_typed_error` | exception (the unsupported gate) | typed exception (the new typed Unsupported); residual test-assertion false positive on `'500' in url` is the test's substring overshoot — out of scope |

Pull over REST is the natural next step: it needs the registry layer
to surface the manifest digest + layer count back through the API so
the SDK can return a fully-populated ImagePullResult instead of a
stub-or-error. That's a separate PR.
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 12ff6344-4094-491d-8881-2d11aaf105fc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@G4614 G4614 force-pushed the fix/rest-images-end-to-end branch from 9485c2a to 88f5a30 Compare June 9, 2026 07:03
The fix in the preceding commit closes the gap this test exercises.
Mirrors the layout of `scripts/test/e2e/cases/` on main.
@G4614 G4614 force-pushed the fix/rest-images-end-to-end branch from 88f5a30 to a18e24f Compare June 9, 2026 07:27
G4614 added a commit to G4614/boxlite that referenced this pull request Jun 9, 2026
test_exec_user.py → boxlite-ai#686, test_network_allow_net.py → boxlite-ai#687,
test_files_io.py → boxlite-ai#688, test_box_metrics.py → boxlite-ai#689,
test_snapshot_clone.py → boxlite-ai#694, test_images_pull_list.py → boxlite-ai#696.

Remaining 3 cases (test_exec_attach.py, test_volume_readonly.py,
test_cli_detach_recovery.py) stay here because they pin REST-path gaps
that don't have a matching fix PR in this session — they document the
contract for future work to land against.
Pre-merge fmt sweep against the same files CI's
`fmt:check:rust` and `fmt:check:go` flagged on the previous
push: a couple of `format!` line wraps, a `.map(...)` chain
reflow on the new REST images backend, and gofmt-style
reflow in `sdks/go/snapshot.go`.
DorianZheng pushed a commit that referenced this pull request Jun 10, 2026
#710)

add 5 e2e cases pinning REST contracts not covered today

## Coverage gaps

Three classes of behaviour run through the SDK → API → Runner → VM
chain but had no e2e pin on the chain itself; bugs in any one of them
would silently regress through `make test:integration:*` (which only
exercises local-FFI):

```
1. CLI detach lifecycle — `boxlite run -d` returns, CLI process exits,
   does a fresh CLI process still see / exec the box?
   FFI side  src/boxlite/tests/detach.rs, recovery.rs  ✓
   E2E side  scripts/test/e2e/cases/                    ✗
2. Execution.attach / reattach contract — bogus exec_id should be a
   typed client error; reattach to a completed exec should return a
   usable handle.
   Runner side  apps/runner/.../boxlite_exec_attach_test.go  ✓
   SDK <-> API  end-to-end                                    ✗
3. host bind mount via REST — the cloud runtime intentionally dropped
   host bind mounts (#639); REST callers passing host paths must
   silently no-op, no /mnt/<x> in the guest.
   FFI surface  src/boxlite/tests/mount_security.rs  ✓
   REST contract                                      ✗
```

## Cases shipped

```
scripts/test/e2e/cases/test_cli_detach_recovery.py   (2 cases)
scripts/test/e2e/cases/test_exec_attach.py           (2 cases)
scripts/test/e2e/cases/test_volume_readonly.py       (1 case)
```

5 cases total. Author also dropped 6 cases from this branch's earlier
incarnation that were already committed alongside their respective
fix PRs (#686 / #688 / #689 / #691 / #692 / #696), so the diff is
strictly net-new coverage.

## Test plan — run against current main

Stack: local e2e, runner unchanged, no source edits.

| Case | Result | Notes |
|---|---|---|
| `test_detached_box_exec_propagates_exit_code_on_fresh_cli` | ✅ PASS |
exit-code passthrough across CLI processes |
| `test_detached_box_survives_cli_exit_and_is_reusable` | ⚠️ XFAIL
(strict) | reaches step 3 (`boxlite exec <id> echo still-alive`) then
hits the stdout-drop race that #563 fixes — marker drops when #563 lands
|
| `test_attach_with_bogus_id_is_typed_error` | ✅ PASS | bogus exec_id →
typed `Exception` (not 5xx, not silent) |
| `test_reattach_after_original_completes` | ⚠️ XFAIL (strict) | same
stdout-drop race (#563) on the original exec's `out=='first-output'`
assertion |
| `test_host_bind_mount_via_rest_is_silently_ignored` | ✅ PASS | the box
created with `volumes=[(host_dir, "/mnt/ro", True)]` reports
`MOUNT_LINE=<none>` from `/proc/mounts` and the host marker file is
untouched — REST silently dropped the host path |

The 2 XFAILs are tied to #563 (`fix(go-sdk): fold stream drain into
Execution.Wait`). Once #563 merges, both xfails flip xpass-strict, the
markers come off, and the suite is 5/5 green. No additional fix work is
needed in this PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Added end-to-end tests verifying CLI detach survival and box
reusability across fresh CLI processes, including exit code propagation
tests.
* Added end-to-end tests for SDK reattach functionality, validating
session state after execution completion.
* Added end-to-end test confirming proper host bind mount handling
during REST execution.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant