feat(infra-local): BoxLite-based local dev stack (L1 + L2 + M5 native runner) by lilongen · Pull Request #595 · boxlite-ai/boxlite

lilongen · 2026-05-26T11:29:08Z

Summary

Adds apps/infra-local/ — a fully self-hosted BoxLite cloud-MVP
control plane that runs on a single Apple Silicon Mac. One command
(make stack-up) brings up the entire stack and lets a developer create
real microVM sandboxes through the dashboard — no AWS, no Docker daemon
for the application boxes; everything runs natively on M5.

Ships as milestone/infra-local/v0.1.0. Cold-start → working sandbox +
browser terminal (root@boxlite:~#) in ~80 s.

Dogfood principle: the control-plane services run as BoxLite boxes
(not docker-compose), so any BoxLite weakness is felt by the team
immediately.

Architecture

Layer	What	Where
L1 — infra-local	10 BoxLite microVM boxes: PostgreSQL, Redis, MinIO, OCI registry, Dex (OIDC), Caddy, Jaeger, pgAdmin, registry-ui, OpenTelemetry collector (+ one-shot minio-init)	libkrun microVMs on macOS Hypervisor.framework, orchestrated by the `boxlite_local` Python package
L2 — control plane	4 native processes: NestJS API (`:3001`), Go Runner (`:3003`), Go Proxy (`:4000`), Vite Dashboard (`:3000`)	Native macOS arm64, driven by `make stack-*`
L3 — user sandboxes	N user-created Ubuntu/Alpine sandboxes	libkrun microVMs spawned by the L2 Runner in `~/.boxlite-runner/`

The Runner runs natively on M5 (HVF + libkrun) — single runner host.
Multi-host autoscaler testing (the old Lima route) is intentionally out
of scope here and parked in a separate worktree.

What works end-to-end (verified)

Cold start: make stack-nuke && make stack-up boots all services + auto-seeds + waits for the default snapshot — ~80 s.
OIDC login via Dex (admin@boxlite.dev / password, plus a normal test01@boxlite.dev); API auto-creates user + Personal org + owner row on first login.
Sandbox lifecycle: create → start → stop → destroy via dashboard or POST /api/sandbox.
Live terminal: dashboard Terminal → Connect → real interactive shell inside the microVM (browser → Caddy → Proxy → Runner → microVM).
Snapshot pull: API auto-creates the ubuntu:22.04 default snapshot; runner pulls the arm64 layer from the local registry.
Region + API-key management, audit logs, etc.

Verified twice via full cold-start + browser-driven E2E (login → snapshot Active → create sandbox → terminal root@boxlite:~#).

Operate-by-make

make stack-up is the single, self-healing entry point — works from
a fresh checkout, after a reboot, or after make stack-down:

auto-runs make install if the orchestrator package isn't importable
auto-builds the native runner/proxy binaries if missing (e.g. /tmp cleared on reboot)
load-schema is idempotent — skips cleanly when the PG data volume survived a reboot

Other targets: stack-status, stack-logs COMPONENT=…, stack-restart COMPONENTS=…, stack-rebuild-l1-box BOX=…, tiered cleanup
stack-reset / stack-reset-hard / stack-nuke. See
docs/apps/infra-local-usage.md.

Notable implementation notes

Dev runner-score override (stack-up.sh): the Go runner reports
host-wide CPU/RAM/disk to the API. On a dev Mac sharing RAM with
IDE/Chrome/Docker, that drags availabilityScore below the prod
threshold and the API rejects sandbox-create with "No available
runners". stack-up.sh exports RUNNER_AVAILABILITY_SCORE_THRESHOLD=5
/ RUNNER_MEMORY_PENALTY_THRESHOLD=95 / RUNNER_DISK_PENALTY_THRESHOLD=95
before launching the API. Documented in CONNECTIONS.md; structural
fix (runner reports only its own usage) tracked as follow-up.
Supporting app changes to make the milestone work: PostHog
feature-flag bootstrap (api + dashboard), runner runtime.GOARCH
image pull (fixes ENOEXEC on arm64), jwt.strategy user-create
fix, Caddy host-regexp → proxy for sandbox port-preview URLs.

Known limitations (intentional, not bugs)

PostHog, Billing (Stripe), Svix webhooks, Snapshot Manager (Dockerfile
builds), SSH Gateway, ClickHouse, OpenSearch, SMTP, dns-shim + TLS are
mocked / deferred — see the
milestone doc
"Known limitations" table. Production-parity gap (HVF vs KVM libkrun
backend) is acknowledged and revisited when the autoscaler lands.

Docs

apps/infra-local/README.md — quick start + make targets + architecture
apps/infra-local/CONNECTIONS.md — endpoint / credential / env-var reference
docs/apps/infra-local-usage.md — day-to-day workflow + tiered cleanup decision tree
docs/apps/infra-local-status.md — real vs mock vs missing inventory
docs/apps/milestones/2026-05-25-milestone-infra-local-v0.1.0.md — milestone summary

Test plan

make stack-up from fresh checkout (auto install + build + up)
make stack-nuke && make stack-up cold start → ~80 s to working stack
Reboot → make stack-up (load-schema idempotent skip, binaries rebuilt)
Dashboard login via Dex → snapshot ubuntu:22.04 Active
Create sandbox via UI → state Started → terminal root@boxlite:~#
All committed docs English-only (CLAUDE.md); no Lima refs in scripts/code

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Full local dev orchestrator and CLI for running the multi-service local stack, with service registry, healthchecks and startup tooling.
- Local no-network dashboard mock and local feature-flag bootstrap for easier offline development.
- New helper scripts and Make targets for building, running, resetting and seeding the stack.
- Expanded .gitignore for local artifacts.
Bug Fixes
- Fixed sandbox query ordering issues.
- New users now get the platform default region set at creation.
Documentation
- Comprehensive local-dev usage guide and connections reference added.
Tests
- Large suite of unit and integration tests covering orchestrator, services, and end-to-end local stack.
Chores
- Updated dev tooling configs and package/tooling versions; added local packaging metadata.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR introduces a BoxLite-dogfooded apps/infra-local/ stack (Python orchestrator + shell wrappers) and documentation for a fully self-hosted local control plane, alongside several fixes to unblock local Apple Silicon development (runner image arch selection, local PostHog flag bootstrapping, etc.).

Changes:

Add apps/infra-local/ Python orchestrator (boxlite_local) with unit/integration tests, plus L2 wrapper scripts (stack-*) and schema loader tooling.
Add extensive infra-local milestone and usage/status docs plus connection references.
Adjust local-dev behavior in runner (platform selection) and dashboard/API (PostHog bootstrap flags + MSW local config).

Reviewed changes

Copilot reviewed 50 out of 54 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tsconfig.base.json	Adds a top-level tsconfig reference (currently appears as a path placeholder).
libs	Adds a top-level libs reference (currently appears as a path placeholder).
docs/apps/milestones/2026-05-25-milestone-infra-local-v0.1.0.md	Milestone snapshot + architecture + operations overview.
docs/apps/infra-vs-local-infra.md	Rationale/comparison doc updated with “what shipped” notes.
docs/apps/infra-local-usage.md	Day-to-day usage guide and troubleshooting.
docs/apps/infra-local-status.md	Current state inventory of local stack.
apps/runner/pkg/boxlite/registry.go	Switch snapshot pull platform arch from hardcoded amd64 to host arch.
apps/package.json	Adds forge deps/types, swc-loader, bumps @swc/core, adds resolutions.
apps/infra-local/tests/unit/test_topo.py	Unit tests for topo sorting.
apps/infra-local/tests/unit/test_orchestrator.py	Unit tests for orchestrator helper behavior.
apps/infra-local/tests/unit/test_doctor_lsof.py	Unit tests for lsof parsing and ownership rules.
apps/infra-local/tests/unit/test_config.py	Unit tests for InfraConfig defaults and env overrides.
apps/infra-local/tests/integration/test_multi_service.py	Integration smoke test for multi-service bring-up.
apps/infra-local/tests/integration/test_e2e_full.py	Full E2E protocol-level integration suite.
apps/infra-local/sql/REFRESH.md	Procedure to refresh prod schema baseline.
apps/infra-local/scripts/stack-up.sh	Starts L1 + L2 components and seeds init data.
apps/infra-local/scripts/stack-status.sh	One-screen L1/L2 status output.
apps/infra-local/scripts/stack-restart.sh	Restarts L2 components (rebuilds runner).
apps/infra-local/scripts/stack-reset.sh	Soft/hard/nuke reset workflows.
apps/infra-local/scripts/stack-logs.sh	Tail component logs.
apps/infra-local/scripts/stack-down.sh	Stops L2 components and optionally L1.
apps/infra-local/scripts/stack-build.sh	Builds native runner/proxy binaries and installs node deps.
apps/infra-local/scripts/seed-init-data.sh	Waits for API seeding + default snapshot readiness.
apps/infra-local/scripts/apply-schema.sh	Loads schema-baseline.sql into local postgres.
apps/infra-local/scripts/_stack-common.sh	Shared helpers/paths/ports for stack scripts.
apps/infra-local/pyproject.toml	Defines `boxlite_local` Python package + test extras.
apps/infra-local/goal.md	Original infra-local goal note (currently non-English).
apps/infra-local/configs/minio/init.sh	MinIO init helper script.
apps/infra-local/boxlite_local/types.py	Dataclasses/types for orchestrator and doctor.
apps/infra-local/boxlite_local/services.py	Declarative ServiceSpec registry for L1 services.
apps/infra-local/boxlite_local/orchestrator.py	Topo sort, up/down/ps, health checks, SDK workarounds.
apps/infra-local/boxlite_local/execwrap.py	Helper to collect exec stdout/stderr and exit code.
apps/infra-local/boxlite_local/doctor.py	Preflight checks (SDK, runtime, port conflicts).
apps/infra-local/boxlite_local/config.py	InfraConfig + env loading and derived properties.
apps/infra-local/boxlite_local/cli.py	argparse CLI for `python -m boxlite_local`.
apps/infra-local/boxlite_local/main.py	Module entrypoint.
apps/infra-local/boxlite_local/init.py	Package metadata/version.
apps/infra-local/README.md	Primary infra-local documentation and workflows.
apps/infra-local/Makefile	Make targets wrapping Python CLI + L2 stack scripts.
apps/infra-local/CONNECTIONS.md	Endpoints/credentials/env var reference.
apps/go.work	Expands go.work workspace list (adds proxy, reorders entries).
apps/dashboard/src/mocks/handlers.ts	Adds local-dev MSW fixtures + local `/config` response.
apps/dashboard/src/components/PostHogProviderWrapper.tsx	Adds local-dev bootstrap feature flags fallback.
apps/api/src/common/providers/openfeature-posthog.provider.ts	Adds bootstrapFlags fallback when PostHog not configured.
apps/api/src/auth/jwt.strategy.ts	Passes default region id when auto-creating personal org.
apps/api/src/app.module.ts	Wires bootstrapFlags into OpenFeature PostHog provider config.
apps/api/project.json	Disables generatePackageJson for API build.
CLAUDE.md	Adds “Committed documentation MUST be in English” rule.
.gitignore	Ignores infra-local logs/env/symlinks and local artifacts.

Comments suppressed due to low confidence (4)

tsconfig.base.json:1

This file appears to contain only a path string, which is not valid tsconfig JSON and will break TypeScript tooling that expects a JSON object. If the intent is to reference apps/tsconfig.base.json, commit this as a symlink (so Git records it as a link) or replace the file contents with a real tsconfig that uses extends to point at the apps config.
libs:1
This looks like it was intended to be a directory alias/symlink, but as committed it’s a regular file containing a path string. That will break any build tooling that expects libs/ to be a directory. Commit a real symlink (recorded by Git) or remove this file and update references to point at the correct directory.
apps/infra-local/scripts/stack-up.sh:1
On macOS, BSD xargs does not support -r, so this command won’t reliably kill the prior listener; it can leave the port occupied and cause the subsequent start to fail with EADDRINUSE. Remove -r (since the port is known to be listening here) or replace this with a macOS-safe loop over PIDs.
apps/runner/pkg/boxlite/registry.go:1
The variable name linuxAmd64Platform is now misleading because it no longer forces amd64. Rename it to something like linuxPlatform / defaultPullPlatform to avoid confusion when reading call sites.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+      maxAutoArchiveInterval: 43200,
+      maintananceMode: false,
+      environment: 'local',
      billingApiUrl: BILLING_API_URL,
-    })
+    } as Partial<BoxliteConfiguration>)


+    // defaults above. All network behavior disabled: no /decide call (we
+    // pre-load flags via bootstrap), no event capture, no autocapture.
+    return (
+      <PostHogProvider
+        apiKey="phc_local_dev_no_op"
+        options={{
+          // api_host is required by posthog-js but never used because
+          // capturing is opted out and flags are bootstrapped below.
+          api_host: 'https://localhost.invalid',
+          bootstrap: { featureFlags: LOCAL_DEV_FEATURE_FLAG_DEFAULTS },
+          advanced_disable_feature_flags: true, // skip /decide network call


DorianZheng · 2026-05-27T02:43:59Z

+
+exporters:
+  debug:
+    verbosity: basic


Otel collector is wired but doesn't feed Jaeger — local Jaeger UI will be empty.

Prod's OtelCollector (apps/infra/sst.config.ts:438) is BoxLite's custom build with the ClickHouse exporter. Here the local collector uses upstream otel/opentelemetry-collector:latest with exporters: debug only — traces land on stdout, never reach SPEC_JAEGER.

Two clean fixes, pick one:

(a) Wire it through: add otlp exporter pointing at jaeger:4317 and add jaeger to the pipelines.traces.exporters list. Local Jaeger UI becomes useful.

(b) Drop SPEC_OTEL entirely: Jaeger 1.67 accepts OTLP natively (COLLECTOR_OTLP_ENABLED=true is already set on SPEC_JAEGER) — have api/runner OTLP straight at jaeger and remove the collector hop.

Right now we ship both services but they don't talk, which is worse than shipping neither. Worth fixing in this PR or immediate follow-up.

Fixed in 01850227 — went with option (a), keeping the collector in the path for prod parity.

_otel_config now adds an otlp/jaeger exporter and puts it on the traces pipeline alongside debug (metrics/logs stay debug-only — Jaeger doesn't ingest them).

Jaeger's OTLP gRPC receiver is now host-mapped (26687 → box :4317); the collector reaches it via host-as-hub (host.boxlite.internal:26687). COLLECTOR_OTLP_ENABLED=true was already set.

SPEC_OTEL now depends_on=["jaeger"].

Verified end-to-end on a live stack: POST an OTLP/HTTP trace to the collector (:24318) → the Jaeger query API returns the service + trace within ~2s, so the Jaeger UI is no longer empty. Docs updated (README §Known-limitations #2, CONNECTIONS §6/§8). The only remaining gap vs prod is the custom boxlite_exporter build (ClickHouse / api push-back), still noted as a limitation.

Copilot

Pull request overview

Copilot reviewed 50 out of 54 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (4)

apps/runner/pkg/boxlite/registry.go:1

linuxAmd64Platform no longer represents amd64 specifically (it’s now host-arch dependent). Rename to something like linuxHostPlatform/defaultLinuxPlatform to avoid misleading future readers and call sites.
apps/package.json:1
Using a caret range in resolutions can reduce determinism (you may get different transitive trees over time). Prefer pinning @types/express to an exact version here (and ensure the lockfile resolves consistently) to avoid unexpected type changes across installs.
tsconfig.base.json:1
This file content is not valid JSON unless it’s being committed as a git symlink target. Ensure tsconfig.base.json is actually a symlink (mode 120000) to apps/tsconfig.base.json; otherwise TypeScript tooling will fail to parse it.
docs/apps/infra-vs-local-infra.md:1
These port numbers conflict with the rest of the infra-local docs/scripts in this PR (API is started on :3001; dashboard on :3000). Update the table to reflect the actual local ports so readers don’t follow incorrect endpoints.

+      defaultSnapshot: 'ubuntu:22.04',
+      dashboardUrl: 'http://localhost:3000',
+      maxAutoArchiveInterval: 43200,
+      maintananceMode: false,


+  http.get(`${API_URL}/*`, async ({ request }) => {
+    console.log('[MSW catch-all GET]', request.url)
+    if (request.url.includes('paginated')) {
+      return HttpResponse.json({ items: [], totalItems: 0, totalPages: 0 })
+    }
+    return HttpResponse.json([])
+  }),
+
+  // Catch-all POST/PUT/PATCH/DELETE — return {} so mutations don't error.
+  http.post(`${API_URL}/*`, async ({ request }) => {
+    console.log('[MSW catch-all POST]', request.url)
+    return HttpResponse.json({})
+  }),
+  http.put(`${API_URL}/*`, async ({ request }) => {
+    console.log('[MSW catch-all PUT]', request.url)
+    return HttpResponse.json({})
+  }),
+  http.patch(`${API_URL}/*`, async ({ request }) => {
+    console.log('[MSW catch-all PATCH]', request.url)
+    return HttpResponse.json({})
+  }),
+  http.delete(`${API_URL}/*`, async ({ request }) => {
+    console.log('[MSW catch-all DELETE]', request.url)
+    return HttpResponse.json({})
+  }),


+      <PostHogProvider
+        apiKey="phc_local_dev_no_op"
+        options={{
+          // api_host is required by posthog-js but never used because
+          // capturing is opted out and flags are bootstrapped below.
+          api_host: 'https://localhost.invalid',
+          bootstrap: { featureFlags: LOCAL_DEV_FEATURE_FLAG_DEFAULTS },
+          advanced_disable_feature_flags: true, // skip /decide network call
+          autocapture: false,
+          capture_pageview: false,
+          capture_pageleave: false,
+          opt_out_capturing_by_default: true,
+          disable_session_recording: true,
+          loaded: (posthog) => posthog.opt_out_capturing(),
+        }}


Copilot

Pull request overview

Copilot reviewed 50 out of 54 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (5)

tsconfig.base.json:1

This file content is not valid JSON. If the intent is to make the repo-root tsconfig.base.json a symlink to apps/tsconfig.base.json, ensure it is committed as a git symlink (mode 120000). If it lands as a regular file, TypeScript tooling reading tsconfig.base.json will fail to parse it.
libs:1
Like tsconfig.base.json, this looks like it’s meant to be a symlink. If it is committed as a normal file containing the text apps/libs, any tooling expecting libs/ to be a directory (or a workspace path) will break. Ensure this is a git symlink (mode 120000) rather than a regular file.
apps/infra-local/scripts/stack-up.sh:1
On macOS/BSD, xargs does not support -r (GNU-only), so this will error and can break set -e flows. Replace the pipeline with a macOS-compatible pattern (e.g., capture PIDs and conditionally kill, or use an xargs invocation that does not rely on -r). Apply the same fix to the other xargs -r occurrences in this script.
apps/runner/pkg/boxlite/registry.go:1
The variable name linuxAmd64Platform is now inaccurate since it can be arm64 (or others). Rename it to reflect the new behavior (e.g., linuxDefaultPlatform, linuxHostArchPlatform, etc.) to avoid confusing future readers.
apps/package.json:1
As of Aug 2025, @swc/core versions were not in the 1.15.x range; ^1.15.33 may be a non-existent release and would break installs. Please confirm the intended @swc/core version exists in npm (and align it with the repo’s SWC toolchain expectations).

+      defaultSnapshot: 'ubuntu:22.04',
+      dashboardUrl: 'http://localhost:3000',
+      maxAutoArchiveInterval: 43200,
+      maintananceMode: false,


+    const KEY = '__msw_api_keys__'
+    const load = (): ApiKey[] => {
+      try { return JSON.parse(sessionStorage.getItem(KEY) ?? '[]') } catch { return [] }
+    }
+    const save = (s: ApiKey[]) => sessionStorage.setItem(KEY, JSON.stringify(s))


+      <PostHogProvider
+        apiKey="phc_local_dev_no_op"
+        options={{
+          // api_host is required by posthog-js but never used because
+          // capturing is opted out and flags are bootstrapped below.
+          api_host: 'https://localhost.invalid',
+          bootstrap: { featureFlags: LOCAL_DEV_FEATURE_FLAG_DEFAULTS },
+          advanced_disable_feature_flags: true, // skip /decide network call
+          autocapture: false,
+          capture_pageview: false,
+          capture_pageleave: false,
+          opt_out_capturing_by_default: true,
+          disable_session_recording: true,
+          loaded: (posthog) => posthog.opt_out_capturing(),
+        }}
+      >


Copilot

Pull request overview

Copilot reviewed 50 out of 54 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (4)

apps/infra-local/scripts/stack-up.sh:1

xargs -r is not supported by macOS/BSD xargs, so this command will fail on the target platform. Replace with a portable pattern (e.g., capture PIDs into a variable and only call kill when non-empty, or use xargs kill -9 guarded by if/grep -q .). Apply the same fix to the runner/proxy/dashboard blocks that use xargs -r.
apps/runner/pkg/boxlite/registry.go:1
The variable name linuxAmd64Platform is now misleading because it can resolve to non-amd64 architectures. Rename it to something accurate (e.g., linuxHostPlatform or defaultLinuxPlatform) to avoid confusion and incorrect future usage.
apps/package.json:1
Using a caret range in resolutions can reduce determinism (the resolved version may vary over time and across tooling). Consider pinning @types/express to an exact version to keep installs reproducible and avoid surprise type changes.
tsconfig.base.json:1
This appears to be a Git symlink (stored as a file containing the target path). On platforms/environments that don’t support symlinks (or where Git symlinks are disabled, common on Windows), this will become a plain text file and break TypeScript tooling expecting valid JSON at tsconfig.base.json. If Windows support is needed, consider a real JSON file that extends apps/tsconfig.base.json instead of a symlink.

+      defaultSnapshot: 'ubuntu:22.04',
+      dashboardUrl: 'http://localhost:3000',
+      maxAutoArchiveInterval: 43200,
+      maintananceMode: false,


+  defaultRegionId: 'local',
+  defaultRegion: 'local',
+  role: 'OWNER',
+}


+        userId: _USER_ID,
+        email: 'admin@boxlite.dev',
+        name: 'Local Admin',
+        role: 'owner',


+    return HttpResponse.json({
+      id: _USER_ID,
+      email: 'admin@boxlite.dev',
+      name: 'Local Admin',
+      role: 'OWNER',
+      personalOrganizationId: _ORG_ID,
+    })


+      <PostHogProvider
+        apiKey="phc_local_dev_no_op"
+        options={{
+          // api_host is required by posthog-js but never used because
+          // capturing is opted out and flags are bootstrapped below.
+          api_host: 'https://localhost.invalid',
+          bootstrap: { featureFlags: LOCAL_DEV_FEATURE_FLAG_DEFAULTS },
+          advanced_disable_feature_flags: true, // skip /decide network call
+          autocapture: false,
+          capture_pageview: false,
+          capture_pageleave: false,
+          opt_out_capturing_by_default: true,
+          disable_session_recording: true,
+          loaded: (posthog) => posthog.opt_out_capturing(),
+        }}
+      >


Copilot

Pull request overview

Copilot reviewed 49 out of 53 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

apps/runner/pkg/boxlite/registry.go:1

The variable name linuxAmd64Platform is now misleading because Architecture is derived from runtime.GOARCH (often arm64 on Apple Silicon). Rename it to something architecture-agnostic (e.g., linuxDefaultPlatform / linuxHostPlatform) to avoid confusion and prevent future misuse.

+      defaultSnapshot: 'ubuntu:22.04',
+      dashboardUrl: 'http://localhost:3000',
+      maxAutoArchiveInterval: 43200,
+      maintananceMode: false,
+      environment: 'local',
      billingApiUrl: BILLING_API_URL,


Copilot

Pull request overview

Copilot reviewed 49 out of 53 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (4)

tsconfig.base.json:1

tsconfig.base.json must be valid JSON for TypeScript/Nx tooling; the current content is a plain path string and will cause parsing failures. If the intent is to share apps/tsconfig.base.json, either commit a real symlink (so the repo root tsconfig.base.json resolves to the apps file) or replace this with a minimal JSON file that extends the apps base config.
libs:1
A root-level libs entry is expected to be a directory (or a symlink to a directory) in most workspaces; committing it as a plain file containing a path will break module resolution and filesystem expectations. If this is intended to be a symlink to apps/libs, commit it as an actual symlink (or adjust workspace configuration so code references apps/libs directly).
apps/runner/pkg/boxlite/registry.go:1
The identifier linuxAmd64Platform is now misleading because it no longer hardcodes amd64 and can evaluate to arm64, etc. Rename it to something architecture-agnostic (e.g., linuxHostPlatform / defaultLinuxPlatform) to avoid incorrect assumptions elsewhere.
apps/package.json:1
The specified node-forge version ^1.4.0 may not exist on npm (as of common published versions, node-forge is typically 1.3.x). If this version is not actually published, installs will fail; please verify the published version and pin to an existing release (or update to the correct package/version if a different fork is intended).

+      maxAutoArchiveInterval: 43200,
+      maintananceMode: false,
+      environment: 'local',


+          bootstrap: { featureFlags: LOCAL_DEV_FEATURE_FLAG_DEFAULTS },
+          advanced_disable_feature_flags: true, // skip /decide network call


Copilot

Pull request overview

Copilot reviewed 49 out of 53 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (6)

apps/runner/pkg/boxlite/registry.go:1

The variable name linuxAmd64Platform is now misleading since it can be arm64 (or other). Rename to something architecture-neutral (e.g. linuxHostPlatform / defaultLinuxPlatform) so callers don’t infer it is always amd64.
tsconfig.base.json:1
This appears to be a committed symlink (Git stores symlinks as a file whose contents is the target path). Symlinks can be problematic on Windows and in some packaging/CI environments; consider replacing this with a real tsconfig.base.json that uses extends to reference apps/tsconfig.base.json, or document/enforce symlink support in contributing/CI.
libs:1
Similar to tsconfig.base.json, this looks like a committed symlink to apps/libs. If cross-platform support matters, consider using an actual directory or a tooling-level path mapping instead of a symlink, or ensure CI and team dev environments can reliably handle symlinks.
apps/package.json:1
Based on versions available up to my knowledge cutoff (Aug 2025), node-forge@^1.4.0 and @swc/core@^1.15.33 look potentially non-existent or at least unusually high jumps. Please verify these versions exist on npm (and that the lockfile/tooling supports them); if not, pin to a published version or adjust the range to the intended release line.
apps/package.json:1
Based on versions available up to my knowledge cutoff (Aug 2025), node-forge@^1.4.0 and @swc/core@^1.15.33 look potentially non-existent or at least unusually high jumps. Please verify these versions exist on npm (and that the lockfile/tooling supports them); if not, pin to a published version or adjust the range to the intended release line.
apps/package.json:1
Using a caret range in resolutions can undermine determinism and make installs vary across time. Prefer pinning an exact version in resolutions (and relying on the lockfile) unless there’s a specific reason to allow upgrades here.

+    ts: TopologicalSorter[str] = TopologicalSorter()
+    for name, spec in services.items():
+        ts.add(name, *spec.depends_on)
+    ts.prepare()
+    layers: list[list[str]] = []
+    while ts.is_active():
+        layer = sorted(ts.get_ready())
+        if not layer:
+            break
+        layers.append(layer)
+        for name in layer:
+            ts.done(name)
+    return layers


+      defaultSnapshot: 'ubuntu:22.04',
+      dashboardUrl: 'http://localhost:3000',
+      maxAutoArchiveInterval: 43200,
+      maintananceMode: false,
+      environment: 'local',
      billingApiUrl: BILLING_API_URL,


DorianZheng · 2026-06-04T07:25:34Z

        "main": "apps/api/src/main.ts",
        "tsConfig": "apps/api/tsconfig.app.json",
-        "generatePackageJson": true,
+        "generatePackageJson": false,


Reverted in 3424fb25 — back to generatePackageJson: true. The local stack runs the API via nx serve api (not nx build), so this prod-build option was never needed locally; disabling it only risked the deploy artifact. Resolving.

Copilot

Pull request overview

Copilot reviewed 49 out of 53 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (4)

tsconfig.base.json:1

tsconfig.base.json is not valid JSON as committed (it contains only a path). If this is intended to be a symlink, ensure it’s committed as a symlink in git (mode 120000); otherwise TypeScript tooling will fail when reading the root tsconfig. If symlinks are not desired/portable, replace this with an actual JSON tsconfig that extends ./apps/tsconfig.base.json.
libs:1
This file content looks like a symlink target (and as a regular file it’s not a valid directory). If the intent is a repository-root libs symlink to apps/libs, make sure it’s committed as a symlink; otherwise consumers expecting a libs/ directory at repo root will break.
apps/runner/pkg/boxlite/registry.go:1
The variable name linuxAmd64Platform is now misleading because it no longer hardcodes amd64 (it uses runtime.GOARCH). Rename it to reflect the new semantics (e.g., linuxHostPlatform, linuxDefaultPlatform, or linuxPlatformForHostArch) to avoid confusing future readers and callers.
apps/package.json:1
As of my knowledge cutoff (Aug 2025), node-forge’s latest published version is 1.3.1 and ^1.4.0 is likely not resolvable from npm. Please verify the intended version exists; otherwise installs will fail in CI. If the goal is simply to add node-forge, consider pinning to ^1.3.1 (or whatever the current published latest is) and updating @types/node-forge accordingly.

+      <PostHogProvider
+        apiKey="phc_local_dev_no_op"
+        options={{
+          // api_host is required by posthog-js but never used because
+          // capturing is opted out and flags are bootstrapped below.
+          api_host: 'https://localhost.invalid',
+          bootstrap: { featureFlags: LOCAL_DEV_FEATURE_FLAG_DEFAULTS },
+          advanced_disable_feature_flags: true, // skip /decide network call
+          autocapture: false,
+          capture_pageview: false,
+          capture_pageleave: false,
+          opt_out_capturing_by_default: true,
+          disable_session_recording: true,
+          loaded: (posthog) => posthog.opt_out_capturing(),
+        }}


+      proxyToolboxUrl: 'http://localhost:28080',
+      defaultSnapshot: 'ubuntu:22.04',
+      dashboardUrl: 'http://localhost:3000',
+      maxAutoArchiveInterval: 43200,
+      maintananceMode: false,
+      environment: 'local',
      billingApiUrl: BILLING_API_URL,
-    })
+    } as Partial<BoxliteConfiguration>)


+    // Bootstrap-aware (mirror evaluateFlag's behavior for object resolution).
    if (!this.isConfigured || !this.client) {
+      // Object flags are rare in our usage; not added to bootstrapFlags map
+      // (typed as boolean | string | number). Fall through to default.
      logger.debug(`PostHog not configured, returning default value for flag ${flagKey}`)
      return {
        value: defaultValue,


DorianZheng · 2026-06-04T07:24:04Z

+        // organization.defaultRegionId is set silently fails — leaving
+        // the new user with no Personal org and the dashboard with
+        // "Cannot read properties of undefined (reading 'id')".
+        personalOrganizationDefaultRegionId: this.configService.getOrThrow('defaultRegion.id'),


why change this in this PR?

Fair — out of scope for infra-local. Reverted in 3424fb25. It's a real fix (OIDC-login auto-created users get a Personal org with defaultRegionId=undefined, mirroring what app.service.ts already does for the seed path), but it belongs in its own PR rather than bundled here. I'll open a standalone PR for it. Resolving.

Standalone PR opened: #668 (with a reproducer test, two-sided verified).

DorianZheng · 2026-06-04T07:26:49Z

@@ -0,0 +1,217 @@
+# Milestone: `milestone/infra-local/v0.1.0`


do not file development documents not related to the repo

Agreed. Removed the development-journal docs in 3424fb25 — milestone snapshot, infra-local-status.md, and own-dog-food-local-infra-solution.md are gone (history lives in git), and the README/usage links that pointed at them are fixed. Kept only the operational README.md / CONNECTIONS.md / infra-local-usage.md. Resolving.

DorianZheng · 2026-06-04T07:27:38Z


 - [docs/development/rust-style.md](./docs/development/rust-style.md)

+## Documentation Language


do not add this

Removed in 3424fb25 — CLAUDE.md is reverted to main, so this PR no longer touches it. Resolving.

cla-assistant · 2026-06-06T08:39:57Z

All committers have signed the CLA.

coderabbitai · 2026-06-07T06:53:02Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a BoxLite-based local infra stack, local feature-flag bootstrap fallbacks for API/dashboard, Python orchestrator and shell wrappers for L1/L2, comprehensive integration and unit tests, workspace/build adjustments, and operational docs for running the local development environment.

Changes

Feature-flag bootstrap for local development

Layer / File(s)	Summary
API feature-flag bootstrap contract and evaluation `apps/api/src/app.module.ts`, `apps/api/src/common/providers/openfeature-posthog.provider.ts`	OpenFeature PostHog provider now supports optional `bootstrapFlags` that supply default flag values when PostHog is not configured locally, returned with `STATIC` resolution reason.
Dashboard local bootstrap and mock API `apps/dashboard/src/components/PostHogProviderWrapper.tsx`, `apps/dashboard/src/mocks/handlers.ts`	PostHog provider mounts in stub no-network mode for local unconfigured cases, bootstrapping flags from defaults. MSW handlers now provide local CRUD for org/user/API-key endpoints and catch-all fallbacks for unimplemented mutations.

BoxLite local infrastructure orchestration

Layer / File(s)	Summary
Workspace and build plumbing `.gitignore`, `apps/api/tsconfig.json`, `apps/api/webpack.config.js`, `apps/package.json`, `apps/runner/pkg/boxlite/registry.go`, `libs`, `tsconfig.base.json`	Updates root wiring (symlinks, build config extends path), dev-only Webpack plugin filtering, vitest version bump, and dynamic host-architecture platform detection for registry pulls.
Python types, configuration, and utilities `apps/infra-local/boxlite_local/types.py`, `apps/infra-local/boxlite_local/config.py`, `apps/infra-local/pyproject.toml`, `apps/infra-local/boxlite_local/__init__.py`, `apps/infra-local/boxlite_local/execwrap.py`	Defines health check and service spec data structures, `InfraConfig` dataclass loading from `BOXLITE_*` env vars, module entry point, and async execution wrapper for box command collection.
CLI and preflight validation `apps/infra-local/boxlite_local/cli.py`, `apps/infra-local/boxlite_local/__main__.py`, `apps/infra-local/boxlite_local/doctor.py`	Implements argparse-based CLI dispatcher (doctor/up/down/ps subcommands), preflight checks for SDK imports, runtime reachability, and port availability.
Service definitions and orchestration `apps/infra-local/boxlite_local/services.py`, `apps/infra-local/boxlite_local/orchestrator.py`	Declares services with health checks, dependencies, and config generation (Postgres, Redis, MinIO, registry, Dex, Jaeger, pgAdmin, OTEL, Caddy). Implements async orchestration: topological startup, health probes with retries, one-shot lifecycle, and status listing.
Makefile and shell orchestration `apps/infra-local/Makefile`, `apps/infra-local/scripts/_stack-common.sh`, `apps/infra-local/scripts/stack-build.sh`, `apps/infra-local/scripts/stack-down.sh`, `apps/infra-local/scripts/stack-logs.sh`, `apps/infra-local/scripts/stack-restart.sh`, `apps/infra-local/scripts/stack-status.sh`	Makefile wraps Python CLI and delegates to shell scripts. Shared helper script provides component constants, logging, PID/log file management, health checks. Individual scripts handle build, component lifecycle with port conflict resolution, status reporting, and log tailing.
Stack reset and full startup `apps/infra-local/scripts/stack-reset.sh`, `apps/infra-local/scripts/stack-up.sh`	Three-tier reset modes (soft: truncate tables; hard: drop/recreate schema; nuke: full wipe). Stack-up orchestrates L1 detection/bootstrap, L2 process startup with health checks, symlink setup, and seed-data verification.
Initialization and seeding `apps/infra-local/configs/minio/init.sh`, `apps/infra-local/scripts/seed-init-data.sh`	MinIO init script creates default bucket. Seed script verifies admin user, personal org, default region, and snapshot `active` state with configurable timeouts.
Comprehensive test coverage `apps/infra-local/tests/integration/test_e2e_full.py`, `apps/infra-local/tests/integration/test_multi_service.py`, `apps/infra-local/tests/unit/test_*.py`	E2E tests validate multi-service behavior: SQL/KV/S3 roundtrips, registry catalog, Dex JWKS, Jaeger API, OTLP receiver, Caddy routes, stability, and resource budget. Unit tests cover config parsing, doctor checks, orchestrator probes, and topology sorting.
Operational documentation `apps/infra-local/README.md`, `apps/infra-local/CONNECTIONS.md`, `docs/apps/infra-local-usage.md`, `CLAUDE.md`	README describes L1/L2 architecture and quick-start. CONNECTIONS.md documents service endpoints and env overrides. Usage guide covers prerequisites, setup, day-to-day loops, reset tiers, testing, and troubleshooting. CLAUDE.md adds communication guidance.
Supporting fixes `apps/api/src/sandbox/managers/sandbox.manager.ts`, `apps/api/src/auth/jwt.strategy.ts`	Normalizes SQL orderBy fields (remove quotes). Sets `personalOrganizationDefaultRegionId` on new user creation to prevent cascade failures.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

boxlite-ai/boxlite#671: Both PRs change apps/runner/pkg/boxlite/registry.go to derive the registry platform from runtime.GOARCH (this PR does the same change).
boxlite-ai/boxlite#668: Overlaps the JwtStrategy.validate change that anchors personal organization default region to defaultRegion.id.

Suggested reviewers

law-chain-hot

Poem

🐰 I hopped through code to make dev life bright,
Bootstrapped flags when PostHog slipped from sight,
Boxes and scripts hum in a tidy stack,
Run make stack-up and watch the world come back!

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

coderabbitai

Actionable comments posted: 14

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (11)

docs/apps/infra-local-usage.md-230-233 (1)
230-233: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block in Section 6.1.

Line 230 opens a fenced block without a language, which triggers markdownlint rule MD040.
💡 Proposed fix
-```
+```text
 http://localhost:3000  → pick admin / user to log in (dex static users)
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @docs/apps/infra-local-usage.md around lines 230 - 233, The fenced code block
in Section 6.1 is missing a language identifier which triggers markdownlint
MD040; update the backticks that start the block around the line containing
"http://localhost:3000 → pick admin / user to log in (dex static users)" to
include a language hint such as text (e.g., change totext) so the fenced
block is properly annotated.
</details>



_Source: Linters/SAST tools_

</blockquote></details>
<details>
<summary>docs/apps/infra-local-usage.md-16-31 (1)</summary><blockquote>

`16-31`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Use one component-selector variable consistently in examples.**

Line 16 uses `COMPONENT=...`, while Line 31 documents `COMPONENTS=` as the selector and later examples also mix forms. This is copy/paste-prone and can break command usage.

<details>
<summary>💡 Proposed fix</summary>

```diff
-make stack-logs COMPONENT=api
+make stack-logs COMPONENTS=api
@@
-make stack-logs COMPONENT=runner                 # single component
-make stack-logs COMPONENT=all                    # everything
+make stack-logs COMPONENTS=runner                # single component
+make stack-logs COMPONENTS=all                   # everything
```
</details>

  


Also applies to: 215-217

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/apps/infra-local-usage.md` around lines 16 - 31, The docs use two
different component selector variables (`COMPONENT=` vs `COMPONENTS=`) which is
inconsistent and error-prone; pick one name and update all examples to use it
consistently (e.g., replace `COMPONENT=api` with `COMPONENTS=api` and ensure
`make stack-restart`/`make stack-logs`/`make stack-down` examples and the
explanatory sentence all reference the same `COMPONENTS=` variable, including
the other occurrences noted around lines 215-217); also ensure the parenthetical
"(empty = all)" matches the chosen variable name.
```

</details>



</blockquote></details>
<details>
<summary>docs/apps/infra-local-usage.md-127-132 (1)</summary><blockquote>

`127-132`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Avoid direct `go build` in this workflow section; use make wrapper commands.**

This section bypasses the documented stack wrappers and can drift from repo-standard flags/paths. Prefer `make stack-restart COMPONENTS=runner` (or documented wrapper equivalents) for consistency with the rest of this guide.

Based on learnings: "Use `make` targets for build, test, lint, format, setup, and distribution. Do not run `cargo`, `npm`, `python`, `go`, or `cbindgen` directly when a make target exists."

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/apps/infra-local-usage.md` around lines 127 - 132, Replace the direct
`go build` workflow snippet that kills and rebuilds the runner with the
repo-standard make wrapper; update the block that currently shows `pkill -9 -f
boxlite-runner` and `cd apps/runner && go build -o /tmp/boxlite-runner
./cmd/runner && cd -` to use the documented make target (e.g., `make
stack-restart COMPONENTS=runner`) so builds and restarts go through the
repository's standardized wrappers and flags; ensure the surrounding text
references the status doc cheatsheet as before.
```

</details>



_Source: Learnings_

</blockquote></details>
<details>
<summary>apps/infra-local/boxlite_local/config.py-10-15 (1)</summary><blockquote>

`10-15`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Validate port env vars with bounds (1-65535) at load time.**

Current parsing accepts numeric but invalid ports (`0`, negatives, `>65535`), so failures are deferred to runtime networking paths instead of failing fast at config boundary.

<details>
<summary>Proposed fix</summary>

```diff
-def _parse_int_env(name: str, default: str) -> int:
+def _parse_int_env(
+    name: str,
+    default: str,
+    *,
+    min_value: int | None = None,
+    max_value: int | None = None,
+) -> int:
     raw = os.environ.get(name, default)
     try:
-        return int(raw)
+        value = int(raw)
     except ValueError as e:
         raise ValueError(f"{name} must be an integer, got: {raw!r}") from e
+    if min_value is not None and value < min_value:
+        raise ValueError(f"{name} must be >= {min_value}, got: {value}")
+    if max_value is not None and value > max_value:
+        raise ValueError(f"{name} must be <= {max_value}, got: {value}")
+    return value
```

```diff
-            pg_host_port=_parse_int_env("BOXLITE_PG_HOST_PORT", "25432"),
+            pg_host_port=_parse_int_env("BOXLITE_PG_HOST_PORT", "25432", min_value=1, max_value=65535),
...
-            redis_host_port=_parse_int_env("BOXLITE_REDIS_HOST_PORT", "26379"),
+            redis_host_port=_parse_int_env("BOXLITE_REDIS_HOST_PORT", "26379", min_value=1, max_value=65535),
```

(Apply the same bounds to all `*_PORT` fields.)
</details>

Based on learnings: “Validate inputs at the boundary. Untrusted inputs should be checked where they enter” and “Use explicit error handling: fail fast on missing config/invalid inputs.”  
   


Also applies to: 80-99

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/boxlite_local/config.py` around lines 10 - 15, The
_parse_int_env function currently converts env values to int but doesn't
validate port ranges, allowing invalid ports (0, negative, >65535); update
_parse_int_env to check that the parsed integer is within 1..65535 and raise a
ValueError with a clear message like "{name} must be a valid port between 1 and
65535, got: {raw!r}" when out of range, and ensure every configuration entry
that uses _parse_int_env for any *_PORT setting (all occurrences of parsing port
envs) uses this tightened validation so invalid ports fail fast at load time.
```

</details>



_Source: Learnings_

</blockquote></details>
<details>
<summary>apps/infra-local/CONNECTIONS.md-40-42 (1)</summary><blockquote>

`40-42`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Add language identifiers to unlabeled fenced code blocks (MD040).**

These blocks should use explicit fence languages (e.g., `text`) to satisfy markdownlint.





Also applies to: 45-47, 81-83, 86-88, 138-140, 144-146, 189-191

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @apps/infra-local/CONNECTIONS.md around lines 40 - 42, Several fenced code
blocks in CONNECTIONS.md are unlabeled which triggers markdownlint MD040; update
each unlabeled triple-backtick fence (including the shown postgres URI block and
the other occurrences at the commented ranges) to include a language identifier
such as "text" (e.g., change totext) so the code blocks are explicitly
labeled; search for similar unlabeled fences in CONNECTIONS.md and add the
language identifier consistently.
</details>



_Source: Linters/SAST tools_

</blockquote></details>
<details>
<summary>apps/infra-local/README.md-226-230 (1)</summary><blockquote>

`226-230`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Add fence languages to satisfy MD040 markdownlint warnings.**

The fenced blocks here should specify a language (e.g., `text`) to keep docs lint-clean.




<details>
<summary>Suggested patch</summary>

```diff
-     ```
+     ```text
      127.0.0.1 pgadmin.boxlite.test
      127.0.0.1 jaeger.boxlite.test
      # ... etc
      ```
@@
-```
+```text
 apps/infra-local/
 ├── Makefile                          # convenience wrappers (L1 + stack-* L2)
 ...
 └── tests/
 ...
-```
+```
```
</details>


Also applies to: 276-325

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @apps/infra-local/README.md around lines 226 - 230, The Markdown fenced code
blocks that show the hosts entries (the block containing "127.0.0.1
pgadmin.boxlite.test" / "127.0.0.1 jaeger.boxlite.test") and the directory tree
block starting with "apps/infra-local/" (and the other similar fenced blocks in
the same README section) must specify a fence language to satisfy MD040; update
each opening fence from totext so the hosts block and the
directory-tree/other blocks include the language identifier while leaving the
block contents unchanged.
</details>



_Source: Linters/SAST tools_

</blockquote></details>
<details>
<summary>apps/infra-local/CONNECTIONS.md-116-121 (1)</summary><blockquote>

`116-121`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Fix the `aws` example command syntax.**

`AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` must prefix the command, not trail after `s3 ls`; the current snippet is not a valid invocation.




<details>
<summary>Suggested patch</summary>

```diff
-aws --endpoint-url http://127.0.0.1:29000 \
-    --region us-east-1 \
-    s3 ls \
-  AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin
+AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin \
+aws --endpoint-url http://127.0.0.1:29000 \
+    --region us-east-1 \
+    s3 ls
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/CONNECTIONS.md` around lines 116 - 121, The aws CLI example
is invalid because the environment variables are placed after the command; put
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before the `aws --endpoint-url ...
s3 ls` invocation so they prefix the command (e.g. export or inline before `aws
--endpoint-url ... s3 ls`), ensuring the credentials are applied to the `aws`
call rather than trailing after `s3 ls`.
```

</details>



</blockquote></details>
<details>
<summary>apps/infra-local/scripts/build-all-in-one-sql.py-1027-1029 (1)</summary><blockquote>

`1027-1029`: _⚠️ Potential issue_ | _🟡 Minor_

**Narrow `except BaseException` to `except Exception` in `apps/infra-local/scripts/build-all-in-one-sql.py`**

This loop intentionally collects per-migration failures and continues, but the file has no other handling for `KeyboardInterrupt`/`SystemExit`, so catching `BaseException` can swallow normal abort semantics.

<details>
<summary>Suggested patch</summary>

```diff
-        except BaseException as exc:  # noqa: BLE001
+        except Exception as exc:
             failures.append((path, exc))
             continue
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/scripts/build-all-in-one-sql.py` around lines 1027 - 1029,
Replace the overly broad except BaseException in the migration loop with except
Exception so system-exiting signals aren't swallowed; specifically update the
except clause that currently does "except BaseException as exc:" (the block that
appends to the failures list using failures.append((path, exc)) and continues)
to "except Exception as exc:" to preserve KeyboardInterrupt/SystemExit behavior
while still collecting migration errors.
```

</details>



_Source: Learnings_

</blockquote></details>
<details>
<summary>apps/infra-local/scripts/seed-init-data.sh-125-129 (1)</summary><blockquote>

`125-129`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Flag parsing is order-dependent and can ignore `--no-bounce`.**

`--no-bounce` is only honored when it is the first argument. Example: `seed-init-data.sh --no-wait --no-bounce` still bounces API.





<details>
<summary>Proposed fix</summary>

```diff
-if [ "${1:-}" = "--no-bounce" ]; then
-  log "skipping API bounce (--no-bounce); assuming API just woke + is running its boot cycle"
-else
-  bounce_api_if_running
-fi
+NO_BOUNCE=false
+NO_WAIT=false
+for arg in "$@"; do
+  case "$arg" in
+    --no-bounce) NO_BOUNCE=true ;;
+    --no-wait) NO_WAIT=true ;;
+    *) die "unknown argument: $arg" ;;
+  esac
+done
+
+if [ "$NO_BOUNCE" = "true" ]; then
+  log "skipping API bounce (--no-bounce); assuming API just woke + is running its boot cycle"
+else
+  bounce_api_if_running
+fi
@@
-if [ "${1:-}" = "--no-wait" ] || [ "${2:-}" = "--no-wait" ]; then
+if [ "$NO_WAIT" = "true" ]; then
```
</details>


Also applies to: 144-144

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/scripts/seed-init-data.sh` around lines 125 - 129, The
script currently checks only the first positional parameter for "--no-bounce",
so flags like "--no-wait --no-bounce" are ignored; update the check in
seed-init-data.sh to scan all args (e.g., loop over "$@" or use a case pattern)
and set a boolean if any argument equals "--no-bounce", then use that boolean to
decide whether to call bounce_api_if_running; apply the same change to the other
occurrence referenced (around the second check) so both places reliably honor
--no-bounce regardless of argument order.
```

</details>



</blockquote></details>
<details>
<summary>apps/infra-local/scripts/_stack-common.sh-55-69 (1)</summary><blockquote>

`55-69`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Validate PID ownership before declaring a component “running”.**

Line 63 treats any live PID from the pidfile as valid. If the PID was recycled, `stack-up` can incorrectly skip startup for a component that is actually down. Please verify the PID belongs to the expected process (or pair liveness with the expected listening port) before returning it.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/scripts/_stack-common.sh` around lines 55 - 69,
component_pid currently trusts any live PID from pid_file as the component
owner; update it to validate ownership before returning the PID by (1) reading
the pid from pid_file (existing pid="$(cat "$pf")"), (2) checking
/proc/$pid/cmdline or /proc/$pid/exe (or lsof/ss on the expected listening port)
to confirm the process command or listening socket matches the expected
component identity, and only then echo the pid; if validation fails, remove the
stale pidfile (rm -f "$pf") and return empty. Use the same pid_file and
component_pid symbols so stack-up will skip only when a confirmed matching
process is running.
```

</details>



</blockquote></details>
<details>
<summary>apps/infra-local/scripts/stack-logs.sh-20-23 (1)</summary><blockquote>

`20-23`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Handle `all` mode when no log files exist.**

On Line 22, `tail -F "${LOGS_DIR}"/*.log` fails when there are no `*.log` files yet. Add a guard so `stack-logs.sh all` returns a clear message instead of an immediate error.

<details>
<summary>Proposed fix</summary>

```diff
 if [ "$1" = "all" ]; then
   # `tail -f` multiple files at once; each line gets ==> filename <== headers.
-  exec tail -F "${LOGS_DIR}"/*.log
+  compgen -G "${LOGS_DIR}/*.log" > /dev/null || die "no log files yet in ${LOGS_DIR}"
+  exec tail -F "${LOGS_DIR}"/*.log
 fi
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/scripts/stack-logs.sh` around lines 20 - 23, The "all"
branch in stack-logs.sh currently runs tail -F "${LOGS_DIR}"/*.log which fails
if no log files exist; update the if-block that handles "$1" = "all" to first
check whether any files match the pattern (e.g. test for the presence of
"${LOGS_DIR}"/*.log or use a glob check) and if none are found print a clear
message like "No log files found in $LOGS_DIR" and exit non‑error (or exit 0)
instead of invoking tail; keep using LOGS_DIR and the existing tail -F
"${LOGS_DIR}"/*.log when matches are present.
```

</details>



</blockquote></details>

</blockquote></details>

<details>
<summary>🧹 Nitpick comments (4)</summary><blockquote>

<details>
<summary>apps/package.json (1)</summary><blockquote>

`116-116`: _⚡ Quick win_

**Move `@types/node-forge` to `devDependencies`.**

This is a type-only package and should not ship in runtime dependencies.

<details>
<summary>♻️ Proposed fix</summary>

```diff
-    "`@types/node-forge`": "^1.3.14",
     "algoliasearch": "^5.30.0",
```

```diff
   "devDependencies": {
+    "`@types/node-forge`": "^1.3.14",
     "`@babel/preset-env`": "^7.22.0",
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/package.json` at line 116, The dependency "`@types/node-forge`" is a
type-only package and should be moved from dependencies to devDependencies in
package.json; open the file, remove the "`@types/node-forge`" entry from the
top-level "dependencies" object and add the same entry (with the same version
"^1.3.14") under "devDependencies", then run npm/yarn install (or update
lockfile) to ensure the lockfile reflects the change and that no runtime code
imports this types-only package.
```

</details>



</blockquote></details>
<details>
<summary>apps/infra-local/tests/unit/test_doctor_lsof.py (1)</summary><blockquote>

`6-47`: _⚡ Quick win_

**Add focused unit tests for `check_port_free` subprocess branches.**

Please add monkeypatched tests for: `lsof` missing, non-zero + stderr, and timeout/error branch. These are the high-risk paths that gate `stack-up`.

   

Based on learnings: Add or update tests when behavior changes around branching, retries, security checks, or boundaries.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/tests/unit/test_doctor_lsof.py` around lines 6 - 47, Add
unit tests that monkeypatch subprocess.run for the check_port_free code paths in
boxlite_local.doctor: simulate (1) FileNotFoundError to test the "lsof missing"
branch, (2) a CompletedProcess with returncode != 0 and non-empty stderr to test
the non-zero+stderr branch, and (3) a subprocess.TimeoutExpired (or other
exception) to exercise the timeout/error branch; place these tests alongside the
existing _parse_lsof_F tests (e.g., in
apps/infra-local/tests/unit/test_doctor_lsof.py), use monkeypatch to replace
subprocess.run or the function that wraps it, and assert the expected return
values or error handling behavior of check_port_free (refer to the
check_port_free function and any helper that invokes subprocess.run).
```

</details>



_Source: Learnings_

</blockquote></details>
<details>
<summary>apps/infra-local/README.md (1)</summary><blockquote>

`348-349`: _⚡ Quick win_

**Use the Make target for status instead of direct Python invocation.**

Replace `python -m boxlite_local ps` with `make ps` to stay consistent with the project workflow contract.

Based on learnings: "Use make targets for build, test, lint, format, setup, and distribution. Do not run ... python directly when a make target exists."





<details>
<summary>Suggested patch</summary>

```diff
-**Debug a stuck service:** `make stack-status` first → identify the red
-component → use the lightest possible cleanup. `python -m boxlite_local ps`
+**Debug a stuck service:** `make stack-status` first → identify the red
+component → use the lightest possible cleanup. `make ps`
 shows L1 box state; `boxlite logs boxlite-local-<name>` shows guest
 logs; `make stack-logs COMPONENT=<name>` tails L2 logs from
 `apps/infra-local/.logs/`.
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/README.md` around lines 348 - 349, Replace the direct Python
invocation "python -m boxlite_local ps" with the Make target "make ps" wherever
it appears in documentation or examples (e.g., the README snippet referencing L1
box state); update the command examples and any surrounding text so they
instruct developers to run "make ps" for status, preserving existing notes about
"boxlite logs boxlite-local-<name>" and ensuring formatting/inline code
backticks match the rest of the doc.
```

</details>



_Source: Learnings_

</blockquote></details>
<details>
<summary>apps/infra-local/Makefile (1)</summary><blockquote>

`89-91`: _⚡ Quick win_

**Use `$(PY)` in `stack-rebuild-l1-box` for interpreter consistency.**

Line 91 hardcodes `python`, bypassing the Makefile’s configured interpreter variable (`PY`).





<details>
<summary>Proposed fix</summary>

```diff
-	python -m boxlite_local up $(BOX)
+	$(PY) -m boxlite_local up $(BOX)
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/infra-local/Makefile` around lines 89 - 91, The rule for
stack-rebuild-l1-box hardcodes the interpreter by calling "python -m
boxlite_local up $(BOX)"; change it to use the Makefile's interpreter variable
by replacing that invocation with "$(PY) -m boxlite_local up $(BOX)" so the
target (stack-rebuild-l1-box) respects the configured PY value and interpreter
consistency across the Makefile.
```

</details>



</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @apps/api/src/sandbox/managers/sandbox.manager.ts:

Line 259: The ORDER BY clauses are using an unquoted mixed-case identifier
which PostgreSQL folds to lowercase; update the .orderBy calls that reference
activity.lastActivityAt (e.g., the occurrence in sandbox_last_activity and the
other .orderBy at the later query) to use a quoted identifier matching the WHERE
clauses — change the argument from 'activity.lastActivityAt' to
'activity."lastActivityAt"' in both places so it matches
activity."lastActivityAt" used elsewhere.

In @apps/infra-local/boxlite_local/cli.py:

Around line 30-35: The CLI currently lets p_up and p_down accept any service
names and silently ignore typos; update the argument parsing to validate the
"services" values against the canonical SERVICES list at parse-time and raise a
parser error for any unknown names. Change the add_argument call for "services"
(both p_up and p_down) to use a custom type or argparse Action that checks each
value is in SERVICES (or use choices=SERVICES) so argparse fails fast; ensure
the same validation logic is applied to the other parser(s) referenced around
lines 75-79 so all entry points reject invalid service names immediately.

In @apps/infra-local/boxlite_local/doctor.py:

Around line 121-126: The lsof call in check_port_free() can hang; add a
timeout (e.g. timeout=5) to the subprocess.run(...) invocation and catch
subprocess.TimeoutExpired around that call (the call in check_port_free used by
doctor() and orchestrator.up()). On TimeoutExpired, log or surface a clear error
via the same logger used by doctor() / orchestrator (or return False) so the
preflight fails fast instead of blocking stack-up; ensure the handler preserves
existing behavior for other non-zero results.

In @apps/infra-local/boxlite_local/execwrap.py:

Around line 34-39: exec_collect currently awaits execution.wait() with no
bound; add a timeout parameter to exec_collect (e.g., timeout: Optional[float] =
None), pass it through to the SDK call that creates the execution (box.exec(...,
timeout=timeout)) and also guard the final wait with
asyncio.wait_for(execution.wait(), timeout) so the underlying process is killed
on timeout (catch asyncio.TimeoutError, call execution.kill() or
execution.terminate() and await execution.wait() to ensure cleanup, then
re-raise or return a clear timeout result). Update callers such as
_wait_one_shot_exit to pass an appropriate timeout (or None) when invoking
exec_collect and keep _wait_healthy_exec behavior unchanged if it already wraps
exec_collect in asyncio.wait_for.

In @apps/infra-local/boxlite_local/orchestrator.py:

Around line 228-231: The try/except around runtime.get currently swallows all
exceptions; change it to only treat a "box not found" case as idempotent and let
other errors surface: catch the specific not-found exception type thrown by your
runtime client (or inspect the exception for a not-found indicator) when calling
runtime.get and return False only in that branch, but for any other exception
re-raise (or log and raise) so transport/runtime failures aren’t masked; update
the try/except that wraps runtime.get (in the down/remove flow) accordingly and
ensure unexpected exceptions are not silently suppressed.

In @apps/infra-local/boxlite_local/services.py:

Line 62: Replace floating Docker tags in services.py so images are pinned to
immutable versions/digests: update "minio/minio:latest", "minio/mc:latest",
"joxit/docker-registry-ui:main", and "otel/opentelemetry-collector:latest" to
specific semver tags or sha256 digests (preferred). Locate these image strings
in apps/infra-local/boxlite_local/services.py and change them to explicit
versions (or image@sha256:...) and run your local stack-up to verify startup;
ensure any README or docs referencing these images are updated accordingly.

In @apps/infra-local/scripts/seed-init-data.sh:

Line 147: The script currently treats wait_for_default_snapshot failures as
non-fatal by calling warn and continuing to print ok "init data ready"; change
this so snapshot readiness failures cause the script to fail fast: replace the
warn fallback for wait_for_default_snapshot with a non-zero exit (e.g., call die
or exit 1) or otherwise return a failing status before the ok "init data ready"
message so downstream automation receives a non-zero exit; locate the call to
wait_for_default_snapshot and the later ok "init data ready" message and ensure
the failure path prevents reaching that ok and returns an error.

In @apps/infra-local/scripts/stack-up.sh:

Around line 74-76: The current use of ln -sf to create "${APPS_DIR}/.env" and
"${APPS_DIR}/apps" will clobber existing non-symlink files; update the script to
first test each target: if the path does not exist create the symlink with ln
-s, if it exists and is a symlink leave it alone, and if it exists but is not a
symlink print a clear error and exit non-zero so we don't overwrite developer
files; apply this logic for the ".env" and "apps" targets (references: APPS_DIR,
"${APPS_DIR}/.env", "${APPS_DIR}/apps", and the ln -s invocation).

Around line 85-88: The current port cleanup blindly runs lsof/... | xargs kill
-9 which can terminate unrelated processes; update the port cleanup in the
blocks that call port_listening and warn to: prefer finding stack-owned PID(s)
first (check a corresponding pidfile if one exists, or filter lsof/ps output by
a known command pattern for this stack using pgrep -f or lsof -c), send SIGTERM
to those PIDs, wait a short period (e.g. poll for port closure up to a few
seconds), and only use SIGKILL (kill -9) as a fallback if the port is still
held; replace direct lsof ... | xargs -r kill -9 with a safe function that
accepts the port (used where port_listening is called) and implements
pidfile/command validation, TERM then KILL fallback, and preserves the existing
logging via warn.

In @apps/infra-local/sql/merged-schema.auto-gen.sql:

Line 10: The merged schema enables "uuid-ossp" but never enables "pgcrypto",
so calls to gen_random_uuid() (referenced in the diff as gen_random_uuid()) will
fail on a fresh DB; update the schema generation step to emit CREATE EXTENSION
IF NOT EXISTS "pgcrypto"; ensure this CREATE EXTENSION is placed before any use
of gen_random_uuid() in the generated SQL (keep the existing CREATE EXTENSION IF
NOT EXISTS "uuid-ossp" as well) and modify the generator/template that produces
merged-schema.auto-gen.sql so future regenerations include the pgcrypto
extension.

In @apps/infra-local/tests/integration/test_e2e_full.py:

Around line 83-90: The fixture currently calls asyncio.run(setup()) outside
the try, so if setup raises (including pytest.skip) the finally cleanup
(asyncio.run(teardown()), mp.undo(), shutil.rmtree(tmp)) never runs; change the
fixture to call setup() inside a try block and track whether setup completed
(e.g., setup_done flag set to True after asyncio.run(setup()) returns) then in
the finally always perform mp.undo() and shutil.rmtree(tmp, ignore_errors=True)
and only call asyncio.run(teardown()) if setup_done (or ensure teardown is safe
to call unconditionally); ensure pytest.skip exceptions are allowed to propagate
but cleanup still runs.

In @apps/package.json:

Around line 343-345: The package.json currently forces a mismatched typing by
keeping "express" at ^5.2.0 while "resolutions" pins "@types/express" to
^4.17.25; update the resolutions (or remove it) so "@types/express" aligns with
Express 5 (e.g., change the resolution to @types/express@^5.x or delete the
override), then run install and TypeScript build to ensure no type conflicts in
NestJS 11; locate the mismatch in the "resolutions" block and adjust or remove
the "@types/express" entry accordingly.

Minor comments:
In @apps/infra-local/boxlite_local/config.py:

Around line 10-15: The _parse_int_env function currently converts env values
to int but doesn't validate port ranges, allowing invalid ports (0, negative,

65535); update _parse_int_env to check that the parsed integer is within
1..65535 and raise a ValueError with a clear message like "{name} must be a
valid port between 1 and 65535, got: {raw!r}" when out of range, and ensure
every configuration entry that uses _parse_int_env for any *_PORT setting (all
occurrences of parsing port envs) uses this tightened validation so invalid
ports fail fast at load time.

In @apps/infra-local/CONNECTIONS.md:

Around line 40-42: Several fenced code blocks in CONNECTIONS.md are unlabeled
which triggers markdownlint MD040; update each unlabeled triple-backtick fence
(including the shown postgres URI block and the other occurrences at the
commented ranges) to include a language identifier such as "text" (e.g., change
totext) so the code blocks are explicitly labeled; search for similar
unlabeled fences in CONNECTIONS.md and add the language identifier consistently.

Around line 116-121: The aws CLI example is invalid because the environment
variables are placed after the command; put AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY before the aws --endpoint-url ... s3 ls invocation so
they prefix the command (e.g. export or inline before aws --endpoint-url ... s3 ls), ensuring the credentials are applied to the aws call rather than
trailing after s3 ls.

In @apps/infra-local/README.md:

Around line 226-230: The Markdown fenced code blocks that show the hosts
entries (the block containing "127.0.0.1 pgadmin.boxlite.test" / "127.0.0.1
jaeger.boxlite.test") and the directory tree block starting with
"apps/infra-local/" (and the other similar fenced blocks in the same README
section) must specify a fence language to satisfy MD040; update each opening
fence from totext so the hosts block and the directory-tree/other blocks
include the language identifier while leaving the block contents unchanged.

In @apps/infra-local/scripts/_stack-common.sh:

Around line 55-69: component_pid currently trusts any live PID from pid_file
as the component owner; update it to validate ownership before returning the PID
by (1) reading the pid from pid_file (existing pid="$(cat "$pf")"), (2) checking
/proc/$pid/cmdline or /proc/$pid/exe (or lsof/ss on the expected listening port)
to confirm the process command or listening socket matches the expected
component identity, and only then echo the pid; if validation fails, remove the
stale pidfile (rm -f "$pf") and return empty. Use the same pid_file and
component_pid symbols so stack-up will skip only when a confirmed matching
process is running.

In @apps/infra-local/scripts/build-all-in-one-sql.py:

Around line 1027-1029: Replace the overly broad except BaseException in the
migration loop with except Exception so system-exiting signals aren't swallowed;
specifically update the except clause that currently does "except BaseException
as exc:" (the block that appends to the failures list using
failures.append((path, exc)) and continues) to "except Exception as exc:" to
preserve KeyboardInterrupt/SystemExit behavior while still collecting migration
errors.

In @apps/infra-local/scripts/seed-init-data.sh:

Around line 125-129: The script currently checks only the first positional
parameter for "--no-bounce", so flags like "--no-wait --no-bounce" are ignored;
update the check in seed-init-data.sh to scan all args (e.g., loop over "$@" or
use a case pattern) and set a boolean if any argument equals "--no-bounce", then
use that boolean to decide whether to call bounce_api_if_running; apply the same
change to the other occurrence referenced (around the second check) so both
places reliably honor --no-bounce regardless of argument order.

In @apps/infra-local/scripts/stack-logs.sh:

Around line 20-23: The "all" branch in stack-logs.sh currently runs tail -F
"${LOGS_DIR}"/.log which fails if no log files exist; update the if-block that
handles "$1" = "all" to first check whether any files match the pattern (e.g.
test for the presence of "${LOGS_DIR}"/.log or use a glob check) and if none
are found print a clear message like "No log files found in $LOGS_DIR" and exit
non‑error (or exit 0) instead of invoking tail; keep using LOGS_DIR and the
existing tail -F "${LOGS_DIR}"/*.log when matches are present.

In @docs/apps/infra-local-usage.md:

Around line 230-233: The fenced code block in Section 6.1 is missing a
language identifier which triggers markdownlint MD040; update the backticks that
start the block around the line containing "http://localhost:3000 → pick admin
/ user to log in (dex static users)" to include a language hint such as text
(e.g., change totext) so the fenced block is properly annotated.

Around line 16-31: The docs use two different component selector variables
(COMPONENT= vs COMPONENTS=) which is inconsistent and error-prone; pick one
name and update all examples to use it consistently (e.g., replace
COMPONENT=api with COMPONENTS=api and ensure make stack-restart/make stack-logs/make stack-down examples and the explanatory sentence all
reference the same COMPONENTS= variable, including the other occurrences noted
around lines 215-217); also ensure the parenthetical "(empty = all)" matches the
chosen variable name.

Around line 127-132: Replace the direct go build workflow snippet that kills
and rebuilds the runner with the repo-standard make wrapper; update the block
that currently shows pkill -9 -f boxlite-runner and cd apps/runner && go build -o /tmp/boxlite-runner ./cmd/runner && cd - to use the documented make
target (e.g., make stack-restart COMPONENTS=runner) so builds and restarts go
through the repository's standardized wrappers and flags; ensure the surrounding
text references the status doc cheatsheet as before.

Nitpick comments:
In @apps/infra-local/Makefile:

Around line 89-91: The rule for stack-rebuild-l1-box hardcodes the interpreter
by calling "python -m boxlite_local up $(BOX)"; change it to use the Makefile's
interpreter variable by replacing that invocation with "$(PY) -m boxlite_local
up $(BOX)" so the target (stack-rebuild-l1-box) respects the configured PY value
and interpreter consistency across the Makefile.

In @apps/infra-local/README.md:

Around line 348-349: Replace the direct Python invocation "python -m
boxlite_local ps" with the Make target "make ps" wherever it appears in
documentation or examples (e.g., the README snippet referencing L1 box state);
update the command examples and any surrounding text so they instruct developers
to run "make ps" for status, preserving existing notes about "boxlite logs
boxlite-local-" and ensuring formatting/inline code backticks match the
rest of the doc.

In @apps/infra-local/tests/unit/test_doctor_lsof.py:

Around line 6-47: Add unit tests that monkeypatch subprocess.run for the
check_port_free code paths in boxlite_local.doctor: simulate (1)
FileNotFoundError to test the "lsof missing" branch, (2) a CompletedProcess with
returncode != 0 and non-empty stderr to test the non-zero+stderr branch, and (3)
a subprocess.TimeoutExpired (or other exception) to exercise the timeout/error
branch; place these tests alongside the existing _parse_lsof_F tests (e.g., in
apps/infra-local/tests/unit/test_doctor_lsof.py), use monkeypatch to replace
subprocess.run or the function that wraps it, and assert the expected return
values or error handling behavior of check_port_free (refer to the
check_port_free function and any helper that invokes subprocess.run).

In @apps/package.json:

Line 116: The dependency "@types/node-forge" is a type-only package and should
be moved from dependencies to devDependencies in package.json; open the file,
remove the "@types/node-forge" entry from the top-level "dependencies" object
and add the same entry (with the same version "^1.3.14") under
"devDependencies", then run npm/yarn install (or update lockfile) to ensure the
lockfile reflects the change and that no runtime code imports this types-only
package.
</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ]  Push a commit to this branch (recommended)
- [ ]  Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro Plus

**Run ID**: `92dacb5e-5744-4885-be73-16a14db0f4fc`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 6048e91e6c98afca85a5ca176cd2b1ffb174241d and 3424fb25eddedb742c6e9c0f07d50280d286896e.

</details>

<details>
<summary>📒 Files selected for processing (49)</summary>

* `.gitignore`
* `CLAUDE.md`
* `apps/api/src/app.module.ts`
* `apps/api/src/common/providers/openfeature-posthog.provider.ts`
* `apps/api/src/sandbox/managers/sandbox.manager.ts`
* `apps/api/tsconfig.json`
* `apps/dashboard/src/components/PostHogProviderWrapper.tsx`
* `apps/dashboard/src/mocks/handlers.ts`
* `apps/infra-local/CONNECTIONS.md`
* `apps/infra-local/Makefile`
* `apps/infra-local/README.md`
* `apps/infra-local/boxlite_local/__init__.py`
* `apps/infra-local/boxlite_local/__main__.py`
* `apps/infra-local/boxlite_local/cli.py`
* `apps/infra-local/boxlite_local/config.py`
* `apps/infra-local/boxlite_local/doctor.py`
* `apps/infra-local/boxlite_local/execwrap.py`
* `apps/infra-local/boxlite_local/orchestrator.py`
* `apps/infra-local/boxlite_local/services.py`
* `apps/infra-local/boxlite_local/types.py`
* `apps/infra-local/configs/minio/init.sh`
* `apps/infra-local/goal.md`
* `apps/infra-local/pyproject.toml`
* `apps/infra-local/scripts/_stack-common.sh`
* `apps/infra-local/scripts/apply-schema.sh`
* `apps/infra-local/scripts/build-all-in-one-sql.py`
* `apps/infra-local/scripts/seed-init-data.sh`
* `apps/infra-local/scripts/stack-build.sh`
* `apps/infra-local/scripts/stack-down.sh`
* `apps/infra-local/scripts/stack-logs.sh`
* `apps/infra-local/scripts/stack-reset.sh`
* `apps/infra-local/scripts/stack-restart.sh`
* `apps/infra-local/scripts/stack-status.sh`
* `apps/infra-local/scripts/stack-up.sh`
* `apps/infra-local/sql/merged-schema.auto-gen.sql`
* `apps/infra-local/tests/__init__.py`
* `apps/infra-local/tests/integration/__init__.py`
* `apps/infra-local/tests/integration/test_e2e_full.py`
* `apps/infra-local/tests/integration/test_multi_service.py`
* `apps/infra-local/tests/unit/__init__.py`
* `apps/infra-local/tests/unit/test_config.py`
* `apps/infra-local/tests/unit/test_doctor_lsof.py`
* `apps/infra-local/tests/unit/test_orchestrator.py`
* `apps/infra-local/tests/unit/test_topo.py`
* `apps/package.json`
* `apps/runner/pkg/boxlite/registry.go`
* `docs/apps/infra-local-usage.md`
* `libs`
* `tsconfig.base.json`

</details>

</details>

coderabbitai · 2026-06-07T07:14:00Z

+    p_up.add_argument("services", nargs="*", help="Subset of services (default: all)")
+    p_up.add_argument("--skip-doctor", action="store_true", help="Bypass preflight checks")
+
+    p_down = sub.add_parser("down", help="Stop + remove services.")
+    p_down.add_argument("services", nargs="*", help="Subset of services (default: all)")
+    p_down.add_argument("--wipe", action="store_true", help="Also remove the data dir")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject unknown service names at parse-time instead of silently succeeding.

up/down currently accept arbitrary names; typos can return success while doing nothing. Validate against SERVICES in argparse so bad input fails immediately.

Proposed fix

- p_up.add_argument("services", nargs="*", help="Subset of services (default: all)") + p_up.add_argument( + "services", + nargs="*", + choices=sorted(SERVICES.keys()), + help="Subset of services (default: all)", + ) ... - p_down.add_argument("services", nargs="*", help="Subset of services (default: all)") + p_down.add_argument( + "services", + nargs="*", + choices=sorted(SERVICES.keys()), + help="Subset of services (default: all)", + )

Based on learnings: “Validate inputs at the boundary” and “Use explicit error handling: fail fast on invalid inputs.”

Also applies to: 75-79

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra-local/boxlite_local/cli.py` around lines 30 - 35, The CLI currently lets p_up and p_down accept any service names and silently ignore typos; update the argument parsing to validate the "services" values against the canonical SERVICES list at parse-time and raise a parser error for any unknown names. Change the add_argument call for "services" (both p_up and p_down) to use a custom type or argparse Action that checks each value is in SERVICES (or use choices=SERVICES) so argparse fails fast; ensure the same validation logic is applied to the other parser(s) referenced around lines 75-79 so all entry points reject invalid service names immediately.

Source: Learnings

coderabbitai · 2026-06-07T07:14:00Z

+    proc = subprocess.run(
+        ["lsof", "-nP", f"-iTCP:{port}", "-sTCP:LISTEN", "-F", "pcLn"],
+        capture_output=True,
+        text=True,
+        check=False,
+    )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Inspect the target file around the reported lines FILE="apps/infra-local/boxlite_local/doctor.py" echo "=== Context: $FILE (lines 90-170) ===" nl -ba "$FILE" | sed -n '90,170p' # 2) Find the definition of check_port_free (and surrounding logic) echo echo "=== Grep: check_port_free in $FILE ===" rg -n "def check_port_free|check_port_free\(" "$FILE" || true # 3) Inspect doctor() and any higher-level timeouts/cancellation/strict handling echo echo "=== Grep: doctor( and strict usage in $FILE ===" rg -n "def doctor\(|strict\s*=\s*True|doctor\(" "$FILE" || true # 4) Search repository for where doctor(strict=True) is called (to confirm blocking path) echo echo "=== Repo search: doctor(strict=True) ===" rg -n "doctor\(\s*.*strict\s*=\s*True|doctor\(\s*strict\s*=\s*True|strict\s*=\s*True.*doctor\(" . || true # 5) Search for stack-up / orchestrator call path around doctor invocation echo echo "=== Repo search: stack-up and doctor usage ===" rg -n "stack-up|stack_up|stackup|orchestrator|startup.*doctor|doctor.*stack" apps/infra-local/boxlite_local . || true

Repository: boxlite-ai/boxlite

Length of output: 196

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="apps/infra-local/boxlite_local/doctor.py" echo "=== Context: $FILE (lines 90-170) ===" awk 'NR>=90 && NR<=170 {printf "%6d %s\n", NR, $0}' "$FILE" || true echo echo "=== Grep: check_port_free in $FILE ===" rg -n "def check_port_free|check_port_free\(" "$FILE" || true echo echo "=== Grep: doctor() and strict usage in $FILE ===" rg -n "def doctor\(|strict\s*=\s*True|doctor\(" "$FILE" || true echo echo "=== Repo search: doctor(strict=True) ===" rg -n "doctor\(\s*.*strict\s*=\s*True|strict\s*=\s*True.*doctor\(" . || true echo echo "=== Repo search: stack-up / orchestrator call path around doctor ===" rg -n "stack-up|stack_up|stackup|orchestrator|startup.*doctor|doctor.*stack" apps/infra-local/boxlite_local . || true

Repository: boxlite-ai/boxlite

Length of output: 12340

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE_DOCTOR="apps/infra-local/boxlite_local/doctor.py" FILE_ORCH="apps/infra-local/boxlite_local/orchestrator.py" echo "=== doctor.py: context 160-260 ===" awk 'NR>=150 && NR<=260 {printf "%6d %s\n", NR, $0}' "$FILE_DOCTOR" || true echo echo "=== doctor.py: subprocess.run occurrences ===" rg -n "subprocess\.run$" "$FILE_DOCTOR" || true echo echo "=== orchestrator.py: context around doctor(strict=True) ===" awk 'NR>=330 && NR<=430 {printf "%6d %s\n", NR, $0}' "$FILE_ORCH" || true echo echo "=== orchestrator.py: subprocess usage (to see if there is generic timeouts) ===" rg -n "subprocess\.(run|check_output|Popen)\(" "$FILE_ORCH" || true

Repository: boxlite-ai/boxlite

Length of output: 6041

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== Repo search: wait_for / timeout around up()/doctor() ===" rg -n "asyncio\.wait_for|wait_for\(|timeout[_a-zA-Z]*=.*doctor|doctor\(.*$.*timeout|signal\.alarm|setitimer" apps/infra-local/boxlite_local . || true echo echo "=== Repo search: CLI entry points calling orchestrator.up ===" rg -n "orchestrator\.up|await up\(|up\(config" apps/infra-local/boxlite_local/cli.py apps/infra-local/boxlite_local . || true echo echo "=== Repo search: any timeout in scripts/stack-up.sh ===" rg -n "timeout|kill|alarm" apps/infra-local/scripts/stack-up.sh apps/infra-local/scripts/*.sh apps/infra-local/README.md || true

Repository: boxlite-ai/boxlite

Length of output: 7777

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE_DOCTOR="apps/infra-local/boxlite_local/doctor.py" echo "=== doctor.py: imports/context (1-120) ===" awk 'NR>=1 && NR<=130 {printf "%6d %s\n", NR, $0}' "$FILE_DOCTOR"

Repository: boxlite-ai/boxlite

Length of output: 5332

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="apps/infra-local/boxlite_local/types.py" awk 'NR>=1 && NR<=140 {printf "%6d %s\n", NR, $0}' "$FILE"

Repository: boxlite-ai/boxlite

Length of output: 2764

Add a timeout to the lsof preflight port check to prevent hangs.

check_port_free() runs subprocess.run(["lsof", ...]) without a timeout, and doctor(..., strict=True) is awaited from orchestrator.up(), so a stalled lsof blocks the preflight and stops stack-up from progressing.

🛠️ Proposed fix

- proc = subprocess.run( - ["lsof", "-nP", f"-iTCP:{port}", "-sTCP:LISTEN", "-F", "pcLn"], - capture_output=True, - text=True, - check=False, - ) + try: + proc = subprocess.run( + ["lsof", "-nP", f"-iTCP:{port}", "-sTCP:LISTEN", "-F", "pcLn"], + capture_output=True, + text=True, + check=False, + timeout=5, + ) + except subprocess.TimeoutExpired: + return DoctorCheck( + name=name, + severity=Severity.FAIL, + msg=f"lsof timed out while checking port {port}", + hint="Retry after closing heavy file/network scans; cannot verify port conflict while lsof is hung.", + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

proc = subprocess.run(

["lsof", "-nP", f"-iTCP:{port}", "-sTCP:LISTEN", "-F", "pcLn"],

capture_output=True,

text=True,

check=False,

)

try:

proc = subprocess.run(

["lsof", "-nP", f"-iTCP:{port}", "-sTCP:LISTEN", "-F", "pcLn"],

capture_output=True,

text=True,

check=False,

timeout=5,

)

except subprocess.TimeoutExpired:

return DoctorCheck(

name=name,

severity=Severity.FAIL,

msg=f"lsof timed out while checking port {port}",

hint="Retry after closing heavy file/network scans; cannot verify port conflict while lsof is hung.",

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra-local/boxlite_local/doctor.py` around lines 121 - 126, The lsof call in check_port_free() can hang; add a timeout (e.g. timeout=5) to the subprocess.run(...) invocation and catch subprocess.TimeoutExpired around that call (the call in check_port_free used by doctor() and orchestrator.up()). On TimeoutExpired, log or surface a clear error via the same logger used by doctor() / orchestrator (or return False) so the preflight fails fast instead of blocking stack-up; ensure the handler preserves existing behavior for other non-zero results.

Source: Learnings

coderabbitai · 2026-06-07T07:14:00Z

+    await asyncio.gather(
+        drain(execution.stdout(), out_parts),
+        drain(execution.stderr(), err_parts),
+    )
+    result = await execution.wait()
+    return result.exit_code, "".join(out_parts), "".join(err_parts)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify call sites and whether timeout wrappers already exist. rg -n "exec_collect\(" apps/infra-local -C2 rg -n "asyncio\.wait_for|asyncio\.timeout" apps/infra-local -C2

Repository: boxlite-ai/boxlite

Length of output: 6330

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect exec_collect implementation closely sed -n '1,220p' apps/infra-local/boxlite_local/execwrap.py | nl -ba | sed -n '1,220p' # Look for timeout/cancellation handling inside execwrap.py rg -n "timeout_s|wait_for|CancelledError|cancel|terminate|kill|finally" apps/infra-local/boxlite_local/execwrap.py # Check for any exec_collect signature/kwargs containing timeout at call sites rg -n "exec_collect\([^)]*timeout|timeout_s\s*=" apps/infra-local -S

Repository: boxlite-ai/boxlite

Length of output: 105

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="apps/infra-local/boxlite_local/execwrap.py" # Show exec_collect region (avoid huge output) echo "=== execwrap.py (first 220 lines with numbers) ===" cat -n "$FILE" | sed -n '1,220p' echo "=== execwrap.py (lines around exec_collect) ===" # Find line number of `def exec_collect` LINE=$(rg -n "async def exec_collect" "$FILE" | head -n 1 | cut -d: -f1) START=$((LINE-30)) END=$((LINE+140)) cat -n "$FILE" | sed -n "${START},${END}p" echo "=== Search for timeout/cancellation handling in execwrap.py ===" rg -n "timeout_s|wait_for|CancelledError|cancel|terminate|kill" "$FILE"

Repository: boxlite-ai/boxlite

Length of output: 3637

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="apps/infra-local/boxlite_local/execwrap.py" # Show exec_collect region using safe line bounds LINE=$(rg -n "async def exec_collect" "$FILE" | head -n 1 | cut -d: -f1) START=$((LINE-10)) if [ "$START" -lt 1 ]; then START=1; fi END=$((LINE+120)) echo "=== execwrap.py: exec_collect vicinity ($START..$END) ===" cat -n "$FILE" | sed -n "${START},${END}p" echo "=== Search for timeout/cancellation handling in execwrap.py ===" rg -n "timeout_s|wait_for|CancelledError|cancel|terminate|kill" "$FILE" || true echo "=== Search for any cleanup/finally usage around exec_collect ===" rg -n "finally|except" "$FILE" || true

Repository: boxlite-ai/boxlite

Length of output: 1788

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find where Execution and box.exec are implemented rg -n "class Execution|async def wait$|def wait\(|stdout\($|stderr|async def stdout|async def stderr|async def exec\(" apps/infra-local -S # Narrow on Execution.wait implementation rg -n "async def wait\(" apps/infra-local -S # Narrow on box.exec / "def exec" implementation rg -n "async def exec\(|def exec\(" apps/infra-local -S

Repository: boxlite-ai/boxlite

Length of output: 865

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find all exec_collect call sites in the repo rg -n "exec_collect\(" . -S # Find where Execution is implemented (outside tests) rg -n "class Execution\b|async def wait\(|def wait\(|async def stdout\(|async def stderr\(|def stdout\(|def stderr\(" . -S

Repository: boxlite-ai/boxlite

Length of output: 2362

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find all exec_collect call sites in the repo rg -n "exec_collect\(" . -S # Find where Execution is implemented (outside tests) rg -n "class Execution\b|async def wait\(|def wait\(|async def stdout\(|async def stderr\(|def stdout\(|def stderr\(" . -S

Repository: boxlite-ai/boxlite

Length of output: 2362

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect SDK execution/wait implementations sed -n '150,320p' sdks/python/boxlite/orchestration/box_runtime.py | cat -n echo "----" sed -n '150,320p' sdks/python/boxlite/sync_api/_execution.py | cat -n echo "----" sed -n '120,240p' sdks/python/boxlite/interactivebox.py | cat -n echo "---- Searches for cancellation/termination in SDK files ----" rg -n "CancelledError|cancelled|terminate|kill|shutdown" sdks/python/boxlite -S

Repository: boxlite-ai/boxlite

Length of output: 28638

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect SDK execution/wait implementations echo "=== sdks/python/boxlite/orchestration/box_runtime.py (approx. 150-320) ===" sed -n '150,320p' sdks/python/boxlite/orchestration/box_runtime.py | cat -n echo "=== sdks/python/boxlite/sync_api/_execution.py (approx. 150-320) ===" sed -n '150,320p' sdks/python/boxlite/sync_api/_execution.py | cat -n echo "=== sdks/python/boxlite/interactivebox.py (approx. 120-240) ===" sed -n '120,240p' sdks/python/boxlite/interactivebox.py | cat -n echo "=== Searches for cancellation/termination in SDK files ===" rg -n "CancelledError|cancelled|terminate|kill|shutdown" sdks/python/boxlite -S

Repository: boxlite-ai/boxlite

Length of output: 28834

🏁 Script executed:

#!/bin/bash set -euo pipefail # Locate the concrete async Execution implementation (not just wrappers/tests) rg -n "class Execution\b" sdks/python/boxlite -S rg -n "async def wait\(" sdks/python/boxlite -S rg -n "def kill\(" sdks/python/boxlite -S # Inspect orchestration/box_runtime.py around the CancelledError locations we saw earlier FILE="sdks/python/boxlite/orchestration/box_runtime.py" echo "=== orchestration/box_runtime.py: context around CancelledError ===" rg -n "CancelledError" "$FILE" -n || true # Print a couple windows around the first few matches (safe bounds) for ln in $(rg -n "CancelledError" "$FILE" | cut -d: -f1 | head -n 6); do START=$((ln-40)); [ "$START" -lt 1 ] && START=1 END=$((ln+40)) echo "---- $FILE ($START..$END) around CancelledError at line $ln ----" sed -n "${START},${END}p" "$FILE" | cat -n done

Repository: boxlite-ai/boxlite

Length of output: 44

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE="sdks/python/boxlite/orchestration/box_runtime.py" echo "=== Top of $FILE ===" sed -n '1,120p' "$FILE" | cat -n echo "=== Lines containing CancelledError in $FILE ===" rg -n "CancelledError" "$FILE" || true echo "=== Lines containing _execution.kill() in $FILE ===" rg -n "_execution\.kill\(|execution\.kill\(" "$FILE" || true echo "=== Lines containing def/async def wait in sdks/python/boxlite ===" rg -n "async def wait$|def wait\(" sdks/python/boxlite -S echo "=== Grep for kill/terminate/subprocess cleanup related to Execution.wait ===" rg -n "kill\(|terminate\(|SIGTERM|SIGKILL|subprocess" sdks/python/boxlite -S

Repository: boxlite-ai/boxlite

Length of output: 6631

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect orchestrator timeout handling ORCH="apps/infra-local/boxlite_local/orchestrator.py" echo "=== orchestrator.py around asyncio.wait_for(exec_collect) ===" rg -n "wait_for\\(\\s*exec_collect\\(|exec_collect\\(box, cmd" "$ORCH" # Print relevant ranges manually (safe) sed -n '230,310p' "$ORCH" | cat -n echo "----" sed -n '320,390p' "$ORCH" | cat -n # Inspect SDK for the concrete Execution implementation (beyond sync wrappers) echo "=== Search sdks/python/boxlite for Execution implementation details ===" rg -n "async def wait\\(" sdks/python/boxlite -S rg -n "def kill\\(" sdks/python/boxlite -S rg -n "class .*Execution" sdks/python/boxlite -S rg -n "execution\\.kill|_execution\\.kill|\\.kill\\(\$" sdks/python/boxlite -S | head -n 50 # Identify likely file containing async wait()/kill: print those files if found for f in $(rg -l "async def wait\$|def kill\\(" sdks/python/boxlite -S); do echo "---- file: $f ----" head -n 5 "$f" done

Repository: boxlite-ai/boxlite

Length of output: 8193

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect orchestrator timeout handling ORCH="apps/infra-local/boxlite_local/orchestrator.py" echo "=== orchestrator.py around asyncio.wait_for(exec_collect) ===" rg -n "wait_for\\(\\s*exec_collect\\(|exec_collect\\(box, cmd" "$ORCH" # Print relevant ranges manually (safe) sed -n '230,310p' "$ORCH" | cat -n echo "----" sed -n '320,390p' "$ORCH" | cat -n # Inspect SDK for the concrete Execution implementation (beyond sync wrappers) echo "=== Search sdks/python/boxlite for Execution implementation details ===" rg -n "async def wait\\(" sdks/python/boxlite -S rg -n "def kill\\(" sdks/python/boxlite -S rg -n "class .*Execution" sdks/python/boxlite -S rg -n "execution\\.kill|_execution\\.kill|\\.kill\\(\$" sdks/python/boxlite -S | head -n 50 # Identify likely file containing async wait()/kill: print those files if found for f in $(rg -l "async def wait\\(|def kill\\(" sdks/python/boxlite -S); do echo "---- file: $f ----" head -n 5 "$f" done

Repository: boxlite-ai/boxlite

Length of output: 8193

🏁 Script executed:

#!/bin/bash set -euo pipefail # Search for usage of kill on execution objects across repo rg -n "\.kill\(" apps/infra-local/boxlite_local sdks/python/boxlite -S # Find where exec returns an Execution and see its interface (wait/stdout/stderr/kill) rg -n "return .*Execution|-> .*Execution|class .*Execution" sdks/python/boxlite -S # Look for ".stdout(" / ".stderr(" / ".wait()" / ".kill()" patterns in the SDK exec implementation rg -n "exec\\(|def exec\\(|async def exec\\(" sdks/python/boxlite -S | head -n 50

Repository: boxlite-ai/boxlite

Length of output: 5169

🏁 Script executed:

#!/bin/bash set -euo pipefail # Search for usage of kill on execution objects across repo rg -n "\.kill\(" apps/infra-local/boxlite_local sdks/python/boxlite -S # Find where exec returns an Execution and see its interface (wait/stdout/stderr/kill) rg -n "return .*Execution|-> .*Execution|class .*Execution" sdks/python/boxlite -S # Look for where async/sync exec is defined to locate the native Execution object rg -n "async def exec\\(|def exec\\(" sdks/python/boxlite -S | head -n 50

Repository: boxlite-ai/boxlite

Length of output: 652

🏁 Script executed:

#!/bin/bash set -euo pipefail # Search for usage of kill on execution objects across repo rg -n "\.kill\(" apps/infra-local/boxlite_local sdks/python/boxlite -S # Find where exec returns an Execution and see its interface (wait/stdout/stderr/kill) rg -n "return .*Execution|-> .*Execution|class .*Execution" sdks/python/boxlite -S # Locate where exec is defined (to find the native Execution object interface) rg -n "async def exec\\(|def exec\\(" sdks/python/boxlite -S | head -n 50

Repository: boxlite-ai/boxlite

Length of output: 652

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect async exec implementation in simplebox.py (to see Execution interface/type) FILE="sdks/python/boxlite/simplebox.py" echo "=== simplebox.py: async def exec vicinity ===" rg -n "async def exec\\(" "$FILE" LINE=$(rg -n "async def exec\\(" "$FILE" | head -n 1 | cut -d: -f1) START=$((LINE-40)); [ "$START" -lt 1 ] && START=1 END=$((LINE+140)) sed -n "${START},${END}p" "$FILE" | cat -n # Inspect sync_api/_execution.py for wait/kill interface FILE="sdks/python/boxlite/sync_api/_execution.py" echo "=== sync_api/_execution.py: kill + wait interface vicinity ===" rg -n "def kill\\(|def wait\\(|async def wait\\(" "$FILE" sed -n '200,320p' "$FILE" | cat -n # Find non-sync Execution class definitions (broader regex) rg -n "class .*Execution" sdks/python/boxlite -S # Search for ".kill()" usage outside orchestration/box_runtime.py to infer native execution handling rg -n "\.kill\(" sdks/python/boxlite -S

Repository: boxlite-ai/boxlite

Length of output: 10186

🏁 Script executed:

#!/bin/bash set -euo pipefail ORCH="apps/infra-local/boxlite_local/orchestrator.py" echo "=== orchestrator.py: show imports/top around runtime/get_runtime/exec_collect usage ===" sed -n '1,120p' "$ORCH" | cat -n echo "=== orchestrator.py: find where runtime.get(name) is used and what box type it is ===" rg -n "runtime\.get\(|create_box\(|ManagedBox|SimpleBox|box\.exec\(" "$ORCH" -S echo "=== orchestrator.py: show around one-shot bootstrap usage of runtime.get and exec_collect(box, 'true') ===" # locate the function containing exec_collect(box, "true" LINE=$(rg -n "exec_collect\\(box, \"true\"" "$ORCH" | head -n1 | cut -d: -f1) START=$((LINE-60)); [ "$START" -lt 1 ] && START=1 END=$((LINE+60)) sed -n "${START},${END}p" "$ORCH" | cat -n echo "=== execwrap.py: search for kill/terminate usage around exec_collect ===" FILE="apps/infra-local/boxlite_local/execwrap.py" rg -n "kill|terminate|finally|CancelledError" "$FILE" -S

Repository: boxlite-ai/boxlite

Length of output: 10688

🏁 Script executed:

#!/bin/bash set -euo pipefail ORCH="apps/infra-local/boxlite_local/orchestrator.py" echo "=== orchestrator.py: show imports/top around runtime/get_runtime/exec_collect usage ===" sed -n '1,120p' "$ORCH" | cat -n echo "=== orchestrator.py: find where runtime.get(name) is used and what box type it is ===" rg -n "runtime\.get\(|create_box\(|ManagedBox|SimpleBox|box\.exec\(" "$ORCH" -S echo "=== orchestrator.py: show around one-shot bootstrap usage of runtime.get and exec_collect(box, 'true') ===" # locate the function containing exec_collect(box, "true" LINE=$(rg -n "exec_collect\\(box, \"true\"" "$ORCH" | head -n1 | cut -d: -f1) START=$((LINE-60)); [ "$START" -lt 1 ] && START=1 END=$((LINE+60)) sed -n "${START},${END}p" "$ORCH" | cat -n echo "=== execwrap.py: search for kill/terminate usage around exec_collect ===" FILE="apps/infra-local/boxlite_local/execwrap.py" rg -n "kill|terminate|finally|CancelledError" "$FILE" -S || true

Repository: boxlite-ai/boxlite

Length of output: 10688

🏁 Script executed:

#!/bin/bash set -euo pipefail ORCH="apps/infra-local/boxlite_local/orchestrator.py" echo "=== orchestrator.py: show imports/top ===" sed -n '1,120p' "$ORCH" | cat -n echo "=== orchestrator.py: find where runtime.get(...) is used and what box type it is ===" rg -n "runtime\.get\(|create_box\(|ManagedBox|SimpleBox|box\.exec\(" "$ORCH" -S || true echo "=== orchestrator.py: show around one-shot bootstrap probe using exec_collect(box, \"true\", ...) ===" LINE=$(rg -n "exec_collect\\(box, \"true\"" "$ORCH" | head -n1 | cut -d: -f1) START=$((LINE-60)); [ "$START" -lt 1 ] && START=1 END=$((LINE+60)) sed -n "${START},${END}p" "$ORCH" | cat -n echo "=== execwrap.py: search for kill/terminate/cancellation handling ===" FILE="apps/infra-local/boxlite_local/execwrap.py" rg -n "kill|terminate|finally|CancelledError" "$FILE" -S || true

Repository: boxlite-ai/boxlite

Length of output: 10629

Add an execution timeout to exec_collect (and ensure it stops the underlying process) to prevent orchestration hangs.

apps/infra-local/boxlite_local/execwrap.py’s exec_collect drains stdout/stderr and then does result = await execution.wait() with no time bound. Even though _wait_healthy_exec(...) wraps exec_collect in asyncio.wait_for(..., timeout=hc.timeout_s), _wait_one_shot_exit(...) awaits exec_collect(box, "true", []) directly (no per-exec timeout), so a stuck command can block the orchestration loop beyond the loop’s timeout_s.

exec_collect should accept a timeout and propagate it to box.exec(..., timeout=...) (the SDK supports a timeout arg) or otherwise guarantee the underlying execution is killed on timeout.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra-local/boxlite_local/execwrap.py` around lines 34 - 39, exec_collect currently awaits execution.wait() with no bound; add a timeout parameter to exec_collect (e.g., timeout: Optional[float] = None), pass it through to the SDK call that creates the execution (box.exec(..., timeout=timeout)) and also guard the final wait with asyncio.wait_for(execution.wait(), timeout) so the underlying process is killed on timeout (catch asyncio.TimeoutError, call execution.kill() or execution.terminate() and await execution.wait() to ensure cleanup, then re-raise or return a clear timeout result). Update callers such as _wait_one_shot_exit to pass an appropriate timeout (or None) when invoking exec_collect and keep _wait_healthy_exec behavior unchanged if it already wraps exec_collect in asyncio.wait_for.

Source: Learnings

coderabbitai · 2026-06-07T07:14:00Z

+    for name, spec in services.items():
+        ts.add(name, *spec.depends_on)
+    ts.prepare()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate depends_on entries before graph construction.

Line 32 accepts dependency names that are not keys in services. Those unknown nodes can appear in topo output, then Line 386/Line 401 will crash on services[n] with KeyError.

Proposed fix

def topo_sort(services: dict[str, ServiceSpec]) -> list[list[str]]: @@ - ts: TopologicalSorter[str] = TopologicalSorter() + unknown_deps = sorted( + { + dep + for spec in services.values() + for dep in spec.depends_on + if dep not in services + } + ) + if unknown_deps: + raise ValueError(f"Unknown service dependency(ies): {', '.join(unknown_deps)}") + + ts: TopologicalSorter[str] = TopologicalSorter()

coderabbitai · 2026-06-07T07:14:00Z

+[ -L "${APPS_DIR}/.env" ] || ln -sf api/.env "${APPS_DIR}/.env"
+# apps/apps → . (webpack.config.js path resolution quirk)
+[ -L "${APPS_DIR}/apps" ] || ln -sf . "${APPS_DIR}/apps"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid clobbering a real apps/.env with forced symlink creation.

Line 74 uses ln -sf, which will overwrite an existing non-symlink file. That can destroy local developer config unexpectedly. Guard for existing non-symlink paths and fail with a clear message instead of force-replacing.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra-local/scripts/stack-up.sh` around lines 74 - 76, The current use of ln -sf to create "${APPS_DIR}/.env" and "${APPS_DIR}/apps" will clobber existing non-symlink files; update the script to first test each target: if the path does not exist create the symlink with ln -s, if it exists and is a symlink leave it alone, and if it exists but is not a symlink print a clear error and exit non-zero so we don't overwrite developer files; apply this logic for the ".env" and "apps" targets (references: APPS_DIR, "${APPS_DIR}/.env", "${APPS_DIR}/apps", and the ln -s invocation).

coderabbitai · 2026-06-07T07:14:00Z

+  if port_listening "${PORT_API}"; then
+    warn "port ${PORT_API} already in use — killing prior listener"
+    lsof -ti :${PORT_API} -sTCP:LISTEN | xargs -r kill -9 2>/dev/null || true
+    sleep 1


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t blindly kill -9 any process listening on these ports.

These blocks terminate whatever is bound to the port, even if it’s unrelated to this stack. This can kill other local services and lose work. Limit termination to stack-owned PIDs (from pidfiles / command validation), and use TERM with a short wait before KILL fallback.

Also applies to: 126-129, 155-158, 185-188

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 87-87: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra-local/scripts/stack-up.sh` around lines 85 - 88, The current port cleanup blindly runs lsof/... | xargs kill -9 which can terminate unrelated processes; update the port cleanup in the blocks that call port_listening and warn to: prefer finding stack-owned PID(s) first (check a corresponding pidfile if one exists, or filter lsof/ps output by a known command pattern for this stack using pgrep -f or lsof -c), send SIGTERM to those PIDs, wait a short period (e.g. poll for port closure up to a few seconds), and only use SIGKILL (kill -9) as a fallback if the port is still held; replace direct lsof ... | xargs -r kill -9 with a safe function that accepts the port (used where port_listening is called) and implements pidfile/command validation, TERM then KILL fallback, and preserves the existing logging via warn.

coderabbitai · 2026-06-07T07:14:01Z

+    asyncio.run(setup())
+    try:
+        yield cfg
+    finally:
+        asyncio.run(teardown())
+        mp.undo()
+        shutil.rmtree(tmp, ignore_errors=True)
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guarantee fixture cleanup when setup skips or errors.

Line 83 runs setup() before entering the current try/finally. If setup raises pytest.skip (Line 63) or another exception, Lines 87-90 never run, so the env patch/temp dir can leak into subsequent tests.

💡 Proposed fix

- asyncio.run(setup()) - try: - yield cfg - finally: - asyncio.run(teardown()) - mp.undo() - shutil.rmtree(tmp, ignore_errors=True) + try: + asyncio.run(setup()) + yield cfg + finally: + try: + asyncio.run(teardown()) + finally: + mp.undo() + shutil.rmtree(tmp, ignore_errors=True)

Based on learnings: "Close and release files, sockets, clients, browser handles, and subprocesses explicitly."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/infra-local/tests/integration/test_e2e_full.py` around lines 83 - 90, The fixture currently calls asyncio.run(setup()) outside the try, so if setup raises (including pytest.skip) the finally cleanup (asyncio.run(teardown()), mp.undo(), shutil.rmtree(tmp)) never runs; change the fixture to call setup() inside a try block and track whether setup completed (e.g., setup_done flag set to True after asyncio.run(setup()) returns) then in the finally always perform mp.undo() and shutil.rmtree(tmp, ignore_errors=True) and only call asyncio.run(teardown()) if setup_done (or ensure teardown is safe to call unconditionally); ensure pytest.skip exceptions are allowed to propagate but cleanup still runs.

Source: Learnings

Copilot

Pull request overview

Copilot reviewed 39 out of 43 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

tsconfig.base.json:1

This appears to be a committed symlink (or symlink-like pointer) rather than a valid JSON tsconfig. Git symlinks can break on Windows checkouts and some tooling that expects tsconfig.base.json to be parseable JSON. Consider replacing this with a real root tsconfig.base.json (or a small JSON file that extends the apps/tsconfig.base.json) to avoid cross-platform and toolchain issues; apply the same approach to the root-level libs pointer as well.
apps/infra-local/tests/integration/test_e2e_full.py:1
asyncio.run(setup()) happens before the try/finally. If setup() triggers pytest.skip(...) or raises/asserts, the fixture exits before entering the finally, so mp.undo() and shutil.rmtree(tmp) won’t run (leaking env vars and temp dirs). Move the setup() call inside the try so cleanup runs even when setup skips/fails.

+      <PostHogProvider
+        apiKey="phc_local_dev_no_op"
+        options={{
+          // api_host is required by posthog-js but never used because
+          // capturing is opted out and flags are bootstrapped below.
+          api_host: 'https://localhost.invalid',
+          bootstrap: { featureFlags: LOCAL_DEV_FEATURE_FLAG_DEFAULTS },
+          advanced_disable_feature_flags: true, // skip /decide network call
+          autocapture: false,
+          capture_pageview: false,
+          capture_pageleave: false,
+          opt_out_capturing_by_default: true,
+          disable_session_recording: true,
+          loaded: (posthog) => posthog.opt_out_capturing(),
+        }}


+if [ "$1" = "all" ]; then
+  # `tail -f` multiple files at once; each line gets ==> filename <== headers.
+  exec tail -F "${LOGS_DIR}"/*.log
+fi


+_MINIO_INIT_SCRIPT = """\
+set -eu
+for i in 1 2 3 4 5; do
+  if mc alias set boxlite "$MINIO_URL" "$MINIO_USER" "$MINIO_PASSWORD" 2>/dev/null; then break; fi
+  echo "init: minio not ready yet (attempt $i)"
+  sleep 2
+done
+mc alias set boxlite "$MINIO_URL" "$MINIO_USER" "$MINIO_PASSWORD"
+mc mb --ignore-existing boxlite/boxlite
+echo "init: ok - boxlite bucket ready"
+"""


lilongen · 2026-06-09T03:42:41Z

This push does three things:

Scope split — moved changes not directly related to infra-local out into their own separate PRs (already removed from this PR's diff after the rebase).
DB schema setup — switched from pre-merging all migrations into a single all-in-one SQL baseline to running the migration scripts directly, matching apps/api's real migration path.
Pinned typeorm — pinned typeorm to 0.3.28 to avoid the orderBy + limit regression in 0.3.29.

… closed) The local-dev PostHog bootstrap (forcing the 6 UI/route-gating flags to true when PostHog isn't configured) could fail OPEN in any prod/staging deploy that ships without POSTHOG_API_KEY: sst.config.ts injects the key only conditionally, and an absent key made the API expose gated routes (POST /regions, GET /runners) and the dashboard render gated surfaces (create-sandbox, infrastructure, etc.) that should stay hidden. Gate the bootstrap behind a hard production check at all three spots: - app.module.ts: skip passing bootstrapFlags when NODE_ENV==='production' - openfeature-posthog.provider.ts: defense-in-depth — refuse bootstrapFlags in production so the flag-resolution boundary itself can't fail open regardless of how it's wired; unconfigured prod falls through to each call-site's defaultValue (false) - PostHogProviderWrapper.tsx: in a prod build (import.meta.env.PROD) with no PostHog config, render children without a provider so flags resolve falsy infra-local is unaffected: NODE_ENV=development / vite dev (PROD=false) still get the bootstrap. Verified the running stack — dashboard still renders Create Sandbox after the gate; the three files type-check clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 40 out of 44 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (4)

apps/infra-local/scripts/stack-up.sh:1

xargs -r is GNU-specific and isn’t supported by macOS/BSD xargs, which will print an error (and can break the UX of these scripts). Replace the pipeline with a bash-level empty-check (capture PIDs from lsof -ti, then kill only if non-empty), or use a portable xargs pattern that won’t error on empty input.
apps/infra-local/scripts/stack-up.sh:1
xargs -r is GNU-specific and isn’t supported by macOS/BSD xargs, which will print an error (and can break the UX of these scripts). Replace the pipeline with a bash-level empty-check (capture PIDs from lsof -ti, then kill only if non-empty), or use a portable xargs pattern that won’t error on empty input.
apps/infra-local/scripts/stack-up.sh:1
xargs -r is GNU-specific and isn’t supported by macOS/BSD xargs, which will print an error (and can break the UX of these scripts). Replace the pipeline with a bash-level empty-check (capture PIDs from lsof -ti, then kill only if non-empty), or use a portable xargs pattern that won’t error on empty input.
apps/infra-local/scripts/stack-up.sh:1
xargs -r is GNU-specific and isn’t supported by macOS/BSD xargs, which will print an error (and can break the UX of these scripts). Replace the pipeline with a bash-level empty-check (capture PIDs from lsof -ti, then kill only if non-empty), or use a portable xargs pattern that won’t error on empty input.

+if [ "$1" = "all" ]; then
+  # `tail -f` multiple files at once; each line gets ==> filename <== headers.
+  exec tail -F "${LOGS_DIR}"/*.log
+fi


+        }}
+      >
+        {children}
+      </PostHogProvider>


+stack-reset-hard:      ## like stack-reset, but also re-applies prod schema baseline
+	./scripts/stack-reset.sh --hard


Fresh clones have no apps/api/.env (gitignored — prod holds real secrets there), so stack-up.sh's unconditional `apps/.env -> api/.env` symlink dangled and `. ./.env` failed, leaving the L2 API unable to start. Ship configs/api.env (all deterministic local-stack placeholders, no real secrets) and seed apps/api/.env from it on first stack-up when missing, without clobbering a dev's edits.

…y-required Verified end-to-end that infra-local deploys without it: cold stack-up → 10/10 L1 boxes healthy → API healthy → seed completes (snapshot active; seed-init-data.sh seeds via psql, not the flag-gated routes) → POST /api/sandbox → microVM started. The control plane never consults the feature flags — only flag-gated dashboard UI and admin routes do. Dropped: - openfeature-posthog.provider.ts bootstrap support (reverted to main) and its spec, which only tested the bootstrap gate - the bootstrapFlags injection block in app.module.ts - the dashboard PostHogProviderWrapper local-dev defaults (reverted to main) Kept (deliberately not reverted): - the DB pool statement_timeout/query_timeout wiring in app.module.ts + configuration.ts — consumed by stack-up.sh's DB_*_TIMEOUT_MS exports (microVM↔pg wedge fix) - apps/package.json typeorm 0.3.28 pin (quoted-orderBy 0.3.29 trigger still live in sandbox.manager.ts) and @types/node-forge Behavior change in local dev (documented in infra-local-usage.md): without a configured PostHog, flag-gated surfaces fail closed — the dashboard "Create Sandbox" button and admin routes like POST /api/regions stay hidden. API sandbox creation is unaffected.

Copilot

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Reverts configuration.ts + app.module.ts to their main (merge-base e4bbd0e) versions and removes the now-stranded DB_QUERY_TIMEOUT_MS / DB_STATEMENT_TIMEOUT_MS exports from stack-up.sh. apps/api now differs from main only by the webpack/tsconfig build fixes — verified earlier today by a full cold deploy on exactly this app-file state (10/10 L1 healthy → API healthy → seed → sandbox started). Accepted trade-off (dev-only, documented in services.py): without the client-side query_timeout, a query wedged on the microVM↔pg transport can hold its API pool slot until `make stack-restart COMPONENTS=api`. The server-side caps stay in the pg box config (statement_timeout, idle_in_transaction_session_timeout, tcp_keepalives), which reap the orphaned backend. Root cause is the transport silently wedging a stream — to be filed against src/; the env knobs can return as a standalone main PR if prod wants them as general hygiene.

Copilot

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

…ags instead Reverts apps/api/webpack.config.js to its main (merge-base e4bbd0e) version. The local-dev plugin filtering (GeneratePackageJsonPlugin + ForkTsCheckerWebpackPlugin by constructor-name string match) is replaced by the executors' own documented options, passed at the single local call site in stack-up.sh: nx serve api --buildTargetOptions.generatePackageJson=false \ --buildTargetOptions.skipTypeChecking=true (@nx/js:node serve forwards buildTargetOptions to the @nx/webpack:webpack build target; the JSON-string form fails Nx's schema validation, so the dotted form is used.) CI/prod builds keep both plugins — this is invocation-scoped, not config-scoped. CONNECTIONS.md and infra-local-usage.md updated so no doc instructs a bare `nx serve api`, which would crash in this workspace (yarn4 + gitignored lockfile leaves npm deps out of the Nx project graph). Verified: nx build api green with flags on the reverted config; API relaunched via the exact new invocation → /api/health 200; sandbox created on it → state `started` in ~5s. apps/api now differs from main only by the 2-line tsconfig.json extends fix.

Moves docs/apps/infra-local-usage.md into apps/infra-local/README.md as a "Usage guide" section so the component's docs live with the component (alongside CONNECTIONS.md). Content moved verbatim; headings demoted one level (code fences untouched), the one relative link rewritten for the new location, and the README's three links to the old path turned into internal anchors. No remaining references to the old path.

Copilot

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Signed-off-by: dorianzheng <8065637+DorianZheng@users.noreply.github.com>

Main's new api-client-drift workflow runs `yarn install --immutable` against the newly-committed apps/yarn.lock; the PR's merge-ref failed YN0028 because the branch's typeorm 0.3.28 pin + @types/node-forge weren't in that lock, and main's tar ^7.5.16 bump conflicted textually with the adjacent pin line. Manual content (everything else is main's side verbatim): - apps/package.json: union resolution — main's tar ^7.5.16 + the branch's typeorm 0.3.28 pin (0.3.29 quoted-orderBy regression guard) - apps/yarn.lock: regenerated from main's lock with the merged manifest (+24/-14: typeorm 0.3.28 + @types/node-forge entries, stale descriptors pruned) Verified: `yarn install --immutable` passes on the merged tree (the exact failed CI check); `nx build api --skip-nx-cache` green.

…l dependsOn serve's defaultConfiguration has pointed at api:build:development since the workspace import without that configuration existing. Add it (generatePackageJson=false — GeneratePackageJsonPlugin crashes in this workspace; skipTypeChecking=true — pre-existing type-only OTLP lockfile skew), and remove serve's dependsOn:["build"]: @nx/js:node builds its buildTarget itself, and the dependsOn forced a second bare-config build that hit the GeneratePackageJsonPlugin crash whenever the Nx graph was fresh (the main merge invalidated the stale cache that had been masking it; configurations don't propagate to plain-string dependsOn tasks, so the stack-up.sh flags could not reach it). Production-safe: apps/api/Dockerfile runs `nx build api --configuration=production` and never invokes serve; no path values touched. Verified: plain `nx serve api` on the merged tree reaches /api/health 200 (crashed before); POST /api/box creates a box.

…t/cloud-mvp

Copilot

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Copilot AI review requested due to automatic review settings May 26, 2026 11:29

Copilot AI reviewed May 26, 2026

View reviewed changes

DorianZheng reviewed May 27, 2026

View reviewed changes

Comment thread apps/infra-local/sql/REFRESH.md Outdated

DorianZheng reviewed May 27, 2026

View reviewed changes

Comment thread docs/apps/infra-vs-local-infra.md Outdated

DorianZheng reviewed May 27, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings May 27, 2026 04:00

Copilot AI reviewed May 27, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings May 27, 2026 08:22

Copilot AI reviewed May 27, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings May 28, 2026 10:01

Copilot AI reviewed May 28, 2026

View reviewed changes

lilongen changed the title ~~docs(infra-local): align with v0.1.0 + dev runner-score override~~ feat(infra-local): BoxLite-based local dev stack (L1 + L2 + M5 native runner) May 29, 2026

lilongen force-pushed the feat/cloud-mvp branch from 57c5604 to 6071a90 Compare May 29, 2026 09:21

Copilot AI review requested due to automatic review settings May 29, 2026 09:44

Copilot AI reviewed May 29, 2026

View reviewed changes

Copilot Bot review requested due to automatic review settings May 29, 2026 10:05

Copilot AI reviewed May 29, 2026

View reviewed changes

Copilot Bot review requested due to automatic review settings May 29, 2026 10:12

Copilot AI reviewed May 29, 2026

View reviewed changes

Copilot Bot review requested due to automatic review settings May 29, 2026 11:39

Copilot AI reviewed May 29, 2026

View reviewed changes

DorianZheng reviewed Jun 4, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 7, 2026

View reviewed changes

lilongen mentioned this pull request Jun 7, 2026

fix(api): anchor OIDC-created user's Personal org to the default region #668

Merged

Copilot Bot review requested due to automatic review settings June 8, 2026 04:56

lilongen force-pushed the feat/cloud-mvp branch from 9358cdb to 503efc3 Compare June 9, 2026 03:28

Copilot AI reviewed Jun 9, 2026

View reviewed changes

Copilot Bot review requested due to automatic review settings June 9, 2026 04:29

Copilot AI reviewed Jun 9, 2026

View reviewed changes

lile and others added 3 commits June 9, 2026 16:53

style(dashboard): format sheet.tsx (prettier)

50c3aea

Copilot Bot review requested due to automatic review settings June 10, 2026 05:26

DorianZheng requested a review from a team as a code owner June 10, 2026 05:26

Copilot AI reviewed Jun 10, 2026

Copilot Bot review requested due to automatic review settings June 10, 2026 05:52

Copilot AI reviewed Jun 10, 2026

DorianZheng added 2 commits June 10, 2026 14:40

Copilot Bot review requested due to automatic review settings June 10, 2026 07:56

Copilot AI reviewed Jun 10, 2026

Merge branch 'main' into feat/cloud-mvp

c4c6901

Signed-off-by: dorianzheng <8065637+DorianZheng@users.noreply.github.com>

DorianZheng previously approved these changes Jun 10, 2026

View reviewed changes

DorianZheng added 3 commits June 10, 2026 16:15

Merge branch 'feat/cloud-mvp' of github.com:lilongen/boxlite into fea…

ffe5d4e

…t/cloud-mvp

Copilot Bot review requested due to automatic review settings June 10, 2026 08:48

DorianZheng dismissed their stale review via ffe5d4e June 10, 2026 08:48

Copilot AI reviewed Jun 10, 2026

DorianZheng merged commit 9a438c1 into boxlite-ai:main Jun 10, 2026

DorianZheng mentioned this pull request Jun 10, 2026

chore(apps,e2e): sweep Sandbox→Box rename leftovers — regen toolbox client, fix dead env vars + stale refs #723

Merged

		bootstrap: { featureFlags: LOCAL_DEV_FEATURE_FLAG_DEFAULTS },
		advanced_disable_feature_flags: true, // skip /decide network call

		@@ -0,0 +1,217 @@
		# Milestone: `milestone/infra-local/v0.1.0`


		- [docs/development/rust-style.md](./docs/development/rust-style.md)

		## Documentation Language

		stack-reset-hard: ## like stack-reset, but also re-applies prod schema baseline
		./scripts/stack-reset.sh --hard

Conversation

lilongen commented May 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture

What works end-to-end (verified)

Operate-by-make

Notable implementation notes

Known limitations (intentional, not bugs)

Docs

Test plan

Summary by CodeRabbit

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lilongen commented May 26, 2026 •

edited by coderabbitai Bot

Loading

cla-assistant Bot commented Jun 6, 2026 •

edited

Loading

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading