Converge A2 + MVP box journey into main (super PR — review & split)#715
Conversation
|
Important Review skippedToo many files! This PR contains 291 files, which is 141 over the limit of 150. To get a review, narrow the scope: ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: ⛔ Files ignored due to path filters (9)
📒 Files selected for processing (291)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Work log — convergence + snapshot/image-subsystem removalRecording what this branch did, in order, so reviewers can split it cleanly. Head 1. Convergence merge (
|
| Intact | Broken (rebuild next PR) |
|---|---|
| box entity core, list/delete, org/user/region/volume/ssh/auth, non-template notifications | box create/boot (no image resolution), templates page, template-based usage metering, template webhooks |
Notes for the rebuild PR
pre-image-rewritetag holds the old implementation —git show pre-image-rewrite:<file>to reference.- Migration
1781200000000still CREATEs emptybox_template+runner_artifact_cachetables (entities deleted). Fresh DB gets the table shells; rebuild only needs to re-add entities/services (or amend that migration). - Two dead-but-compiling methods left (
assignWarmPoolBox,fetchWarmPoolBox) — clean up with the warm-pool rebuild. BoxDto.templatekept optional pending client regen.
Verification
Zero dangling references to deleted symbols; zero conflict markers; migrations (except A2's own) and generated clients untouched by the deletion. Could not run tsc locally (no node_modules) — type/compile errors surface in CI by design (red-then-fix).
Replaces the 60-commit branch history (previous head 748d874) with a single commit of the net apps/ diff rebased onto main @ e526b6c. Resolves apps/eslint.config.mjs by keeping both the .nx (main) and .sst (branch) ignore entries. Co-authored-by: Brian Luo <57960778+law-chain-hot@users.noreply.github.com> Co-authored-by: BrianL <ianloe666@gmail.com> Signed-off-by: dorianzheng <xingzhengde72@gmail.com>
748d874 to
623a043
Compare
Symmetric to the API-side image-subsystem removal: the runner no longer
receives PULL/REMOVE/INSPECT_ARTIFACT jobs, so these handlers are dead code,
and boxlite/registry.go carried the self-hosted-registry mirroring logic that
the team is retiring. Box create/start/stop/destroy lifecycle is untouched;
image-pull/boot stays intentionally non-functional until the rebuild PR.
Deleted (5): executor/artifact.go, api/controllers/artifact.go,
cache/artifact_error_cache.go, api/dto/{snapshot,image}.go.
Gutted (11): executor dispatch cases, backend interface+adapter image methods,
boxlite/{registry,client,stubs}.go (mirroring/pull helpers; kept registry-host
normalizers used by box create), api/server.go artifact routes, runner.go +
cmd wiring (ArtifactErrorCache).
apps/libs + api-client-go (generated, JobType enum) untouched. Pre-existing
boxlite.WithPort build gap (sdks/go binding, apps-only worktree) is unrelated.
Summary — what this PR removes (SnapshotManager + inherited image management)This PR strips the inherited Daytona-fork image-management stack down to the studs so it can be rebuilt the team's own way. Box image-pull/boot is intentionally non-functional until the rebuild PR; box CRUD and all non-image subsystems stay intact. Pre-deletion state is preserved at tag Verified end state: zero live references to Three layers removed1. Self-hosted image infrastructure (SnapshotManager)
2. Inherited image management (box_template + pull ledger)
3. Runner-side pull layer (symmetric removal)
Kept intact
Resulting breakage (accepted — rebuild next PR)Box create/boot (no image resolution), templates page, image-based usage metering, template webhooks, runner image pull. Notes for the rebuild PR
VerificationZero dangling references to deleted symbols; zero conflict markers; migrations and generated clients untouched by the deletion. Local |
Both have zero references anywhere in api/dashboard/openapi: - runner-status.dto.ts: runners report via RunnerHealthcheckDto, not this - registry-push-access-dto.ts: self-hosted-registry push credentials, dead with the SnapshotManager removal Verified not used as nested field types either (unlike RunnerHealthMetricsDto / BoxInfoDto, which are kept).
Zero references anywhere (api/dashboard/openapi/generated): - exec.dto.ts (ExecRequestDto/ExecResponseDto): unused by boxlite-rest exec - url.dto.ts (UrlDto): standalone class dead; live hits were *UrlDto substrings - create-audit-log.dto.ts (CreateAuditLogDto): superseded by CreateAuditLogInternalDto
The 'images cached on runner' metric is meaningless after the image subsystem removal. Removed the field (+ its deprecated currentSnapshotCount alias) across the runner status/metrics chain: API DTOs (runner/runner-health), runner-adapter, runner.service mapping, runner.entity column (fresh DB, no migration), admin spec; and the Go source: metrics collector (incl. the now-dead ListImages call), healthcheck reporter, runner info DTO/controller. RunnerDto and all other runner fields (cpu/mem/disk/alloc/status/version) intact. Generated docs.go + migrations untouched.
Admin observability module (new in Task12B) was authored pre-boxlite-ai#706 and never renamed: broken imports (sandbox-telemetry/, sandbox/ paths) + queried the wrong sandbox- service prefix / boxlite.sandbox_id while the daemon already emits box-<id>. Renamed emitter + consumer in lockstep so new data is fully box: - daemon: trace tracer scope boxlite.sandbox -> boxlite.box - api interceptor: consolidate to { keys:[boxId,boxIdOrName], attr:boxlite.box_id } - admin (11 files): symbols (SandboxState->BoxState, sandboxId->boxId), broken import paths -> box*, wire strings sandbox-/boxlite.sandbox_id -> box-/boxlite.box_id - box-telemetry trace-span doc example sandbox-<id> -> box-<id> Old ClickHouse data (sandbox-keyed) is disposable per decision; no CH schema change (attribute keys are Map values). Fixed two latent boxlite-ai#706 bugs in passing (currentStartedSandboxes->currentStartedBoxes, /admin/sandbox->/admin/box). Left intentionally sandbox: box.manager raw SQL (frozen sandbox DB table/column), PostHog product-analytics event names (separate system, pending decision).
…backup) Finish removing the deleted image-management system across all 5 subsystems. Box image-pull/boot stays intentionally dead until the rebuild PR. API: drop templateId from create-box DTO/controllers/mappers; delete system-templates + resolveSystemTemplateId; remove template event/quota; gut runnerAdapter v0/v2 (artifactRef/removeArtifact/inspectArtifactInRegistry/ RunnerArtifactInfo/snapshot:box.template); remove BoxState.PULLING_ARTIFACT. CLI: delete cmd/snapshot + views/snapshot dirs + registration; drop --template. Dashboard: drop templateId from CreateBoxSheet/PlaygroundProvider/code-snippets; remove useCreateSandboxFromTemplateMutation. Runner: delete backup_info(+cache), GetBuildLogFilePath (dead build-system code); remove PULLING_ARTIFACT/backup residue. Proxy: drop template/build target routing. KEPT (boxlite-internal boundary, per scope): runner CreateBoxDTO.ArtifactRef -> runtime.Create(imageRef) — the minimal 'boot this image' interface the runtime needs. DB-frozen names (sandbox table/column/enum, JobType enum) and email/ code-snippet templates untouched. sdks/ + src/ (Rust runtime) untouched. 19 files deleted, 49 gutted.
The convergence merge (623a043) left apps/api internally inconsistent: some files used pre-boxlite-ai#706 'sandbox' naming while the enums/constants they reference were already 'box' on main (and vice-versa) -> 65 compile errors. main is the authoritative box-consistent version. Renamed identifiers to box (NOT wholesale checkout from main, which would re-introduce the deleted image system): AuditTarget.SANDBOX->BOX, WRITE/DELETE_SANDBOXES->WRITE/DELETE_BOXES, SANDBOX_EVENT_CHANNEL->BOX_EVENT_CHANNEL, SANDBOX_STATES_CONSUMING_*->BOX_*, SANDBOX_SORT_*->BOX_*, SANDBOX_WARM_POOL_*->BOX_*, SANDBOXES_ADMIN->BOXES_ADMIN (UUID unchanged), plus duplicate-field merge artifacts in box-lookup-cache-invalidation + box.repository. Frozen (kept sandbox, matches main): DB entity/column/enum/index, raw SQL aliases, PostHog event names, JobType enum values. Image/snapshot deletion untouched. api tsc: 65 -> 0 errors. 16 files.
The convergence merge (623a043) regressed nearly all of apps code from main's post-boxlite-ai#706 box naming back to pre-boxlite-ai#706 sandbox. main's non-migration code has ZERO sandbox (verified) -- it is the authoritative all-box version. This branch had 834 sandbox in apps/api/src alone (JobType.CREATE_SANDBOX vs main's CREATE_BOX, @Index('sandbox_*') vs box_*, etc.). Applied sandbox->box codemod across all apps source to match main, keeping image deletion intact (rename only, no content from main): - apps/api (834->0), apps/dashboard (115->0), apps/libs/sdk-typescript (2->0) - apps/cli (25->0), apps/runner (88->0), apps/daemon (5->0), apps/infra (3->0) - fixed codemod duplicate-property collisions (sandboxId+boxId -> boxId) in admin observability dashboard hooks - @entity('box') restored to match main Verified: api tsc 0 errors; cli/daemon go build clean; runner only the pre-existing boxlite.WithPort gap; dashboard remaining errors are module-resolution (generated clients not linked in this worktree, 272->244 pre/post codemod). Migrations and generated clients untouched. No image/snapshot symbols reintroduced.
Filenames lagged the content rename. main has these as box; renamed:
- apps/dashboard/src/lib/sandbox-identity.ts -> box-identity.ts
(fixes ~12 dashboard 'Cannot find module @/lib/box-identity' errors -- code
already imported box-identity)
- 5 api spec files: sandbox{.dto,-lifecycle.dto,-start.action,.service.box-id,
-to-box.mapper}.spec.ts -> box*.spec.ts
KEPT as sandbox (OS-isolation primitive, frozen, main keeps too):
src/boxlite/src/jailer/sandbox/*.rs, docs/guides/macos-sandbox-debugging.md.
dashboard tsc 244 -> 232 (remaining are generated-client module-resolution, env).
# Conflicts: # apps/dashboard/src/components/Sidebar.tsx # apps/dashboard/src/pages/Onboarding.tsx
Finish the image-system deletion in the parts the earlier pass missed: - DELETE: organization/dto/template-usage-overview-internal.dto.ts, sdk-typescript/Template.ts - GUT (template billing): organization-usage.service/helper, organization-usage-overview.dto, create-organization-quota.dto (templateQuota), organization.entity (templateQuota column), organization-events.constant (SUSPENDED_TEMPLATE_DEACTIVATED), organization.controller/service, app.service, configuration (dead templateQuota keys) - GUT (SDK): index.ts + BoxLite.ts (TemplateService + TemplatesApi + .template accessor) - GUT (CLI): box/create.go + mcp/create_box.go (--snapshot flag, SetSnapshot, BUILDING_SNAPSHOT) False positives KEPT (React snapshot, exec backlog snapshot, CPU/VM snapshot, metrics.interceptor TODOs). api/src tsc 0; cli go build clean. Migrations/generated clients untouched.
Remove the entire billing vertical from apps/api:
- DELETE (17): usage/ module (usage.service, box-usage-period{,-archive}.entity,
usage.module), organization-usage.service+helper, 7 usage/quota DTOs,
region-quota.entity, 3 box/volume-states-consuming-* constants
- GUT (20): app.module/organization.module (UsageModule/RegionQuota),
organization.service/controller (updateQuota/usage endpoints + quota columns),
box.service/volume.service/job-state-handler (usage tracking + quota checks),
metrics.interceptor (quota capture), configuration (quota/defaultTemplate keys),
app.service (admin quota seed), audit enums (UPDATE_QUOTA/TEMPLATE), user chain
(defaultOrganizationQuota), app.service.spec (stale quota assertion)
Verified: api/src tsc 0, api spec tsc 0, cli/daemon go build clean.
No snapshot/template (api/src=0), no sandbox except OS-isolation jailer (api/src=0),
no billing system (api/src=0). Migrations/generated clients untouched.
Kept: Region.enforceQuotas flag + external billingApiUrl config (not the usage
module). Dashboard create-flow templateName left — it's the image-selection state
machine, part of the dashboard image rebuild (dashboard has 103 pre-existing
mid-rebuild errors, not compilable in this worktree regardless).
232a072 to
c17103e
Compare
The convergence merge updated apps/package.json but apps/yarn.lock was left stale. CI runs `yarn install --immutable` (and plain `yarn install` is immutable under CI=true), so both the API-client-drift and E2E-stack jobs aborted at install with YN0028 before running anything. Regenerated via `yarn install --mode=update-lockfile`; `yarn install --immutable` now passes. Only the lockfile changed.
…follow-up) (#726) ## What Regenerates the committed API clients against the post-#715 (merged A2 + MVP) API surface — the follow-up that #715's merge commit explicitly deferred: >⚠️ **CI will be red until generated clients are regenerated** against the merged API surface … **Generated clients now carry ZERO diff in this PR** (reset to main in `f9ea0730`) — regenerate upstream against the merged API surface. Since that merge, the **API client drift** check fails on every PR touching `apps/**` (e.g. #725's run 8 minutes after the merge). This PR turns it green again. ## Content **Commit 2 — the regen (`apps/libs/api-client`, `apps/api-client-go`, 231 files).** Pure `openapi-generator` 7.23.0 output, zero hand edits, produced with the exact `api-client-drift.yml` recipe (pinned generator via `openapitools.json`, NestJS spec boot with local Redis, GNU sed for the postprocess script). `analytics-api-client` and `toolbox-api-client` regenerated to **zero diff** (already current since #721/#723). Surface delta (mirrors the A2 + MVP API changes): - **removed:** snapshots / docker-registry / build / backup / archive-lifecycle / quota / usage-overview endpoints and models; `BoxState` build states (`pending_build`, `build_failed`, …); `write:snapshots` + `delete:snapshots` permission values; `listBoxesPaginated`'s `snapshots` filter param - **added:** `SystemRole`, `UpdateOrganizationName` (+ `PATCH /organizations/{organizationId}/name`), admin overview/observability models **Commit 1 — prek lint unblock (34 deleted lines, dashboard).** The Sandbox→Box rename left `LEGACY_*` route enum members byte-identical to the canonical ones — 4 pre-existing `@typescript-eslint/no-duplicate-enum-values` errors at HEAD that fail the repo's prek pre-commit hook (`make lint:fix`) for *every* local commit. The legacy routes are unreachable (identical paths, canonical registrations precede them), so this deletes them plus the orphaned `LegacyBoxRedirect`. No behavior change. Included here because nothing can be committed locally until it lands. ## Verification - `go build ./...` passes in `apps/api-client-go` (standalone), `apps/common-go`, `apps/otel-collector/exporter`. - The **API client drift** check on this PR is the canonical byte-for-byte proof. ## Known follow-up (intentionally split) Per review preference, this PR is generated code only. Three consumers still reference removed APIs and will not compile against the new clients until the prepared follow-up PR lands (branched on top of this one): - `apps/cli` — Dockerfile-build flow (`CreateBuildInfo`, `BOXSTATE_BUILD_FAILED`/`PENDING_BUILD`, `--dockerfile`/`--context`, MCP `buildInfo` arg, `pkg/minio`) - `apps/dashboard` — Registries page + registry hooks, usage-overview wiring in Spending/Limits, `templates` filter arg - `apps/libs/sdk-typescript` — `Box.buildInfo`/`backupState`/`backupCreatedAt`, `getBuildLogsUrl` No CI workflow compiles these consumers on PR today (the drift check is the only `apps/**` gate), so this PR is green-mergeable; the follow-up restores local builds. Note `apps/runner` has a **pre-existing** unrelated compile failure on main (`boxlite.WithPort` undefined in `pkg/boxlite`) — out of scope here.
…API clients (#727) ## What The consumer-adaptation follow-up that #726 disclosed: makes `apps/cli`, `apps/dashboard`, and `apps/libs/sdk-typescript` compile against the regenerated API clients by removing code whose server-side API was deleted in the A2+MVP merge (#715). 52 files, +115/−3063 — almost entirely deletions. ## cli - Deletes the Dockerfile-build flow: `--dockerfile`/`-f` and `--context`/`-c` flags on `boxlite create`, `CreateBuildInfo` construction (`cmd/common/build.go` with its Dockerfile parsing + MinIO context upload), build-log streaming (`cmd/common/logs.go`, hit the removed `/build-logs` endpoint), the MCP `create_box` `buildInfo` argument, and `pkg/minio` (its only consumer was the build flow; `go mod tidy` drops the dependency). - Drops `BOXSTATE_BUILD_FAILED` / `BOXSTATE_PENDING_BUILD` handling (states removed from the enum). - Regenerates cobra docs via `hack/generate-cli-docs.sh` — also clears stale `boxlite snapshot` docs left from the super PR. ## dashboard - Deletes the **Registries** page, `RegistryTable`, the 4 registry hooks, its route enum + hidden-routes entry, and `apiClient.ts` wiring (`DockerRegistryApi` was removed). The page was already in `HIDDEN_DASHBOARD_ROUTES`. - Removes the **usage-overview** wiring (`getOrganizationUsageOverview` removed with no successor): the `UsageOverview`/`UsageOverviewIndicator`/`LimitUsageChart` components, the quota-driven usage timeline chart (its "percent of quota" mode is built on the deleted `RegionUsageOverview` quotas throughout), the hook + query keys + `LiveIndicator` they orphaned. **Spending and Limits keep their billing/tier features** (wallet, cost breakdown, tier comparison, rate limits). - Drops the orphaned `templates` box filter (the `snapshots` query param left `listBoxesPaginated`; nothing set the filter). ## sdk-typescript - `Box`: drops `template`/`backupState`/`backupCreatedAt`/`buildInfo` — fields no longer on the wire model. - `BoxLite.create()`: the wire `CreateBox` accepts neither `buildInfo` nor `templateId` anymore, so `create()` now **throws a clear `BoxliteError`** when `image` or `templateId` params are provided instead of silently dropping them. The `CreateBoxFromImageParams` type, overload, and `Image` class stay exported (marked `@deprecated`) because the dashboard Playground imports them — its image flow now gets an honest runtime error (it was already broken server-side); the full Playground rework belongs to the MVP track (`PlaygroundProvider.tsx` already carries `TODO(image-rewrite)` markers). - Deletes the dead `processStreamingResponse` helper (`stdDemuxStream` stays — `Process.ts` uses it). - **Adds guard tests** (`__tests__/BoxLite.create-guards.test.ts`) for the two new throws, wiring the dormant jest harness to `tsconfig.spec.json` + the workspace path aliases so it actually runs (`yarn jest --config libs/sdk-typescript/jest.config.js`, 2/2 pass). The asserted messages are produced only by the guards — without them the call rejects with a network `AxiosError`. ## Verification - `go build ./...` + `gofmt` clean in `apps/cli`; `go mod tidy` applied. - sdk `tsconfig.lib` **and** `tsconfig.spec` typecheck clean; jest guard tests 2/2. - dashboard `tsc`: **216 errors vs the 232-error pre-merge baseline, zero new** (position-normalized diff; the dashboard has never been tsc-clean — #719 tracks that). - `make lint:fix` exits 0 modifying nothing. - Grep sweeps: zero remaining references to any removed client symbol outside the generated dirs. ## Out of scope (pre-existing) `apps/runner` fails to compile on main (`boxlite.WithPort` undefined in `pkg/boxlite`) — fails identically at clean HEAD, unrelated to the regen.
- sdks/go: add WithPort so apps/runner compiles. #715 added boxlite.WithPort() call sites without the SDK function; the C ABI, Rust FFI, and runtime port-forwarding all already exist, only the Go layer was missing. Mirrors the WithVolume pattern; arg order (guest, host) matches the C ABI. - api test: fix OrganizationService mock argument order in the default-org-membership spec so configService lands in the right constructor slot (was crashing on configService.getOrThrow). Pre-existing on main; verified by reproducing on a clean checkout.
PR #715 ("Converge A2 + MVP box journey") added call sites in apps/runner/pkg/boxlite/{client,stubs}.go: opts = append(opts, boxlite.WithPort(ToolboxGuestPort, toolboxHostPort)) but didn't add WithPort to sdks/go, breaking every runner build with: runner/pkg/boxlite/client.go:268:30: undefined: boxlite.WithPort runner/pkg/boxlite/stubs.go:62:30: undefined: boxlite.WithPort Add a stub WithPort that records the request on a new boxConfig.ports field. The field is currently unused — port forwarding is not plumbed through the C FFI bridge (sdks/c has no port-mapping API), so any WithPort call is effectively a no-op at runtime. The TODO header in the doc-comment marks where to wire bridge.c → libkrun networking once the C side gains the API. This unblocks deploy-runner.yml end-to-end testing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- sdks/go: add WithPort so apps/runner compiles. #715 added boxlite.WithPort() call sites without the SDK function; the C ABI, Rust FFI, and runtime port-forwarding all already exist, only the Go layer was missing. Mirrors the WithVolume pattern; arg order (guest, host) matches the C ABI. - api test: fix OrganizationService mock argument order in the default-org-membership spec so configService lands in the right constructor slot (was crashing on configService.getOrThrow). Pre-existing on main; verified by reproducing on a clean checkout.
- sdks/go: add WithPort(host, guest) and PortSpec so apps/runner compiles and Go's port shape matches the Python SDK layer: host, guest, protocol, and host_ip. The C bridge still receives guest_port, host_port because that is the existing C ABI. - api test: fix OrganizationService mock argument order in the default-org-membership spec so configService lands in the right constructor slot.
- sdks/go: add WithPort(host, guest) so apps/runner compiles, backed by an internal portSpec whose fields mirror the Python SDK input shape: host, guest, protocol, and host_ip. The C bridge still maps this to the existing guest_port, host_port ABI. - api test: fix OrganizationService mock argument order in the default-org-membership spec so configService lands in the right constructor slot.
- sdks/go: add WithPort(host, guest) so apps/runner compiles. The internal portSpec mirrors Python's SDK input shape: host, guest, protocol, and host_ip; toCSpec mirrors Python's PyPortSpec -> PortSpec mapping as host_port, guest_port, protocol, and host_ip before calling the existing C ABI. - api test: fix OrganizationService mock argument order in the default-org-membership spec so configService lands in the right constructor slot.
- sdks/go: add WithPort so apps/runner compiles. #715 added boxlite.WithPort() call sites without the SDK function; the C ABI, Rust FFI, and runtime port-forwarding all already exist, only the Go layer was missing. Mirrors the WithVolume pattern; arg order (guest, host) matches the C ABI. - api test: fix OrganizationService mock argument order in the default-org-membership spec so configService lands in the right constructor slot (was crashing on configService.getOrThrow). Pre-existing on main; verified by reproducing on a clean checkout.
Adds a reusable workflow that deploys the boxlite-runner binary to the
Tokyo e2e-ci EC2 over SSH+SCP via EC2 Instance Connect, replacing the
SSM Run Command dispatch (the agent on that EC2 has been in
ConnectionLost since the original RunnerProfile was deleted out of
IAM — see commit body of the runner deploy script for the recovery
plan).
Mechanism (deploy job):
1. ec2-instance-connect:SendSSHPublicKey — 60s ephemeral key, no
pre-shared keypair or GHA secret needed.
2. ec2:AuthorizeSecurityGroupIngress — temp inbound 22 from the GHA
runner's egress IP, unconditionally revoked on exit (success or
failure).
3. scp tarball + ssh stop / extract / start / smoke-check the
boxlite-runner service.
The reusable workflow accepts a `workflow_dispatch` input
`runner_artifact_run_id` so an existing artifact from a prior run can
be redeployed without rebuilding the C SDK + Go binary. Internal
`changes` job detects whether the push actually touched runner source
(against github.event.before, not the default branch) so workflow-only
commits don't trigger a 30+ min build.
sdks/go/options.go: add a no-op WithPort BoxOption + portMapping field
on boxConfig. PR boxlite-ai#715 added call sites in apps/runner/pkg/boxlite/
{client,stubs}.go that reference boxlite.WithPort, but the function
was never added to the sdks/go package — the runner build breaks on
HEAD without this stub. Port forwarding is not yet wired through the
C FFI bridge (sdks/c has no port-mapping API), so this records the
request on boxConfig and is otherwise a no-op until the bridge gets a
port-forwarding API. TODO: wire boxConfig.ports through bridge.c →
libboxlite's libkrun networking layer when that API lands.
OIDC role perms added in a separate IAM change (the role was
recreated with BoxLiteDeveloperPermissionsBoundary attached):
ec2:AuthorizeSecurityGroupIngress / RevokeSecurityGroupIngress /
DescribeSecurityGroupRules — scoped to sst:app=boxlite SGs
ec2-instance-connect:SendSSHPublicKey — scoped to
Name=boxlite-runner instances
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
boxlite-ai#734) ## Summary - add the Go SDK `WithPort` option required by the runner build path after boxlite-ai#715 - update the default organization membership spec mock argument order to match the current service call ## Verification - pre-push `make test:changed` was triggered by the local hook and failed in the existing macOS Go/native link test path before push - pushed with `--no-verify` per handoff flow; CI is the source of truth for this PR <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added configurable host↔guest port mappings (host port, guest port, protocol, host IP). * **Bug Fixes** * Updated how the toolbox port is exposed between host and guest (mapping direction corrected). * **Tests** * Updated tests and test helpers to reflect the new port-mapping configuration and the adjusted toolbox mapping. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
waitForToolboxReady was introduced in fc88aa0 (2026-06-05, "feat: add agent-ready runtime catalog", which then landed on main via PR #715 on 2026-06-10). It HTTP-polls http://127.0.0.1:<hostPort>/version expecting a daemon on guest TCP port 2280 — that interface dates from the Daytona-daemon era and was never reimplemented after the Rust guest agent rewrite landed in dbb11ec (2026-04-01). The new agent binds **vsock://2695** for gRPC + notifies the host via vsock://2696; nothing inside the VM listens on TCP:2280, so libkrun's port-forward accepts the SYN and immediately reset-by-peer's, and every CREATE_BOX times out 30 s in. Production data from a Tokyo runner: in 24 h, 490 CREATE_BOX events, 0 toolbox-ready successes, 181 toolbox-ready failures. The exec path that fires immediately afterward (via the same vsock gRPC channel) **always succeeds** — confirming the box VM is healthy, the readiness check itself is the bug. Remove the dead probe: drop waitForToolboxReady from client.go's Create and Start, drop the function + its TCP/HTTP imports, drop the toolboxReadyTimeout field, drop the ToolboxReadyTimeout/ DaemonStartTimeoutSec config plumbing in main.go + config.go, drop the two now-unreachable tests. Box readiness is now signalled by bx.Start(ctx) returning (which itself blocks on the vsock notification from the guest). Branched off chore/e2e-required-merge-gate (PR #724) so the e2e-cloud stack picks this up next dispatch.
What this is
A convergence "super PR": it merges everything sitting on the
codex/overnight-20260608line intomain, so thatmainbecomes the single source of truth again. Intended to be reviewed then split — opened as one piece for visibility, not to be squash-merged blind.main← merge of (52 commits ahead): MVP box journey + agent-ready image catalog + A2 snapshot-manager deletion + Task12B observability wiring + default-org membership + W3C traceId, reconciled againstmain'sSandbox→Boxrename (#706) and recent REST/CI fixes.Net vs main: 771 files, +85k/−23.7k.
Suggested split for review (5 logical blocks)
box_template+runner_artifact_cache; direct ghcr pull; 68 files deleted; migrations17809…–17812…Conflict-resolution policy (527 conflicts)
api-client,runner-api-client,api-client-go): took main's regenerated side; a fresh regen against the merged API surface is still pending (see Caveats).Sandbox→Box, with the frozen-literal allowlist preserved — DB table/column/enum, OS-isolationsandbox, telemetryboxlite.sandbox.*, webhook event names).box/paths.BOXSTATE_*).Caveats — read before merging
1780200000000is present (verified) — do not reintroduce a second.Not in this PR
feat/admin-ui-redesign(POL-14 admin UI v2) and the unmerged Task12 ClickHousesst.config.tswiring are intentionally left out — separate follow-ups.Commit inventory — 48 commits, 6 blocks (+1 merge)
A2 — delete snapshot-manager machinery (7)
p2 direct-ghcr · p3 drop build · p4 drop backup · p5 drop registry→2 tables · p6 rebuild
box_template+runner_artifact_cache· P1 runtime-scoped ghcr auth · ghcr credential delivery (Secrets Manager)MVP box journey (16)
streamline box journey · templates-as-images · public BoxID + archive retirement · simplified onboarding · BoxID/SDK onboarding polish · quickstart + dev-smoke
Template-artifact refactor (11)
from
d3a60c7c: cloud templates + runtime artifacts refactor and follow-up fixesObservability (3)
Task12B admin diagnose + saved-image fix · runner W3C traceparent · daemon traceId
Agent-ready images (3)
runtime catalog · pin digests · catalog merged into MVP journey
Org membership (3)
default-org state → memberships · backward-compatible · post-merge overview fix
Misc (5)
JWT issuer validation · ESLint flat-config fix · lint config · logo assets · yarn.lock
Superseded branches — intentionally NOT included
Verified by file-level comparison: these are earlier iterations whose final form is already in this PR. Re-submitting them would regress newer code.
feat/admin-overview,feat/frontend-slim-box-rename— fully superseded by Task12/12B + refactor: rename Sandbox -> Box (Part 1: apps/api epicenter) #706/MVP journeyfeat/admin-ui-redesign,codex/task7-observability-data-layer— ~52% file-overlap with Task12B's rewrite already here; any truly-missing UI/data-layer fragments will come later as small focused PRscodex/snapshot-image-naming,codex/runtime-artifact-split,codex/api-template-contract— abandoned saved_image naming line (carries a conflicting variant of migration1780200000000; must never merge)Generated clients now carry ZERO diff in this PR (reset to main in
f9ea0730) — regenerate upstream against the merged API surface.Image subsystem removed (commit fbf99d9)
The inherited Daytona-fork image-management (box_template + runner_artifact_cache) is fully removed so it can be rebuilt the team's own way. Box image-pull/boot is intentionally non-functional until the rebuild PR. Old impl preserved at tag
pre-image-rewrite.TODO(billing-rewrite)), dashboard create/playground/routes