feat: box lifecycle persistence and name support #38

Merged
DorianZheng merged 9 commits into main from feature/box-lifecycle-persistence on Dec 26, 2025

@DorianZheng

Summary

  • Add persistent box state management with database storage
  • Add optional user-defined names for boxes (unique constraint)
  • Support get/remove by ID or name
  • Add auto_remove option for automatic cleanup when box stops
  • Refactor BoxliteRuntime to thin facade pattern (implementation moved to RuntimeInnerImpl)

Changes

Box Lifecycle Persistence

  • Store box config and state in SQLite database
  • Recover boxes on runtime startup with PID validation
  • Handle system reboot detection
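The persistence and recovery steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the PR's Rust code: the table layout, the status strings, and the helper names (`open_db`, `save_box`, `pid_alive`, `recover`) are all assumptions, and reboot detection is left out.

```python
import json
import os
import sqlite3

# Hypothetical schema; the PR's real tables live in db/schema.rs and are not shown here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS boxes (
    id     TEXT PRIMARY KEY,
    config TEXT NOT NULL,   -- JSON-encoded box config
    status TEXT NOT NULL,
    pid    INTEGER
)
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_box(conn, box_id, config, status, pid):
    conn.execute(
        "INSERT OR REPLACE INTO boxes (id, config, status, pid) VALUES (?, ?, ?, ?)",
        (box_id, json.dumps(config), status, pid),
    )
    conn.commit()


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def recover(conn, is_alive=pid_alive):
    """Mark boxes recorded as running whose PID is gone as stopped; return all statuses."""
    rows = conn.execute("SELECT id, pid FROM boxes WHERE status = 'running'").fetchall()
    for box_id, pid in rows:
        if not is_alive(pid):
            conn.execute(
                "UPDATE boxes SET status = 'stopped', pid = NULL WHERE id = ?",
                (box_id,),
            )
    conn.commit()
    return dict(conn.execute("SELECT id, status FROM boxes").fetchall())
```

A PID check alone cannot tell a surviving box process from an unrelated process that reused the same PID after a reboot, which is presumably why the PR pairs it with reboot detection.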

Box Naming

  • Optional name parameter on create()
  • Name uniqueness validation
  • Lookup by ID or name in get(), get_info(), exists(), remove()
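As an illustration of the naming rules above, here is a small in-memory sketch. `BoxRegistry` and its methods are hypothetical stand-ins rather than the BoxLite API, and `ValueError` takes the place of whatever error the runtime actually returns for a duplicate name. The ID-first, name-second lookup order follows the "name lookup as fallback" note in the commit message.

```python
import uuid


class BoxRegistry:
    """In-memory stand-in for the box store: IDs are keys, names are optional and unique."""

    def __init__(self):
        self._by_id = {}    # box id -> name (or None)
        self._by_name = {}  # name -> box id

    def create(self, name=None):
        if name is not None and name in self._by_name:
            raise ValueError(f"box name already in use: {name!r}")
        box_id = uuid.uuid4().hex
        self._by_id[box_id] = name
        if name is not None:
            self._by_name[name] = box_id
        return box_id

    def resolve(self, id_or_name):
        """Try an exact ID match first, then fall back to a name lookup."""
        if id_or_name in self._by_id:
            return id_or_name
        return self._by_name.get(id_or_name)

    def exists(self, id_or_name):
        return self.resolve(id_or_name) is not None

    def remove(self, id_or_name):
        box_id = self.resolve(id_or_name)
        if box_id is None:
            return False
        name = self._by_id.pop(box_id)
        if name is not None:
            del self._by_name[name]  # free the name for reuse
        return True
```

Removing a box also releases its name, so a later `create(name="web")` succeeds once the earlier "web" box is gone.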

Auto-remove

  • auto_remove option in BoxOptions (default: false)
  • Automatically remove box when stopped

Architecture Refactor

  • Move all implementation from BoxliteRuntime to RuntimeInnerImpl
  • BoxliteRuntime is now a thin facade with one-liner delegations
  • Enables LiteBox to call runtime methods directly via RuntimeInner
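The facade arrangement described above might look like this in outline. The sketch is in Python for brevity and every class name is hypothetical; it only shows the shape (a public type whose methods are one-line delegations, and a box handle that holds the inner runtime directly), not the contents of rt_impl.rs.

```python
class RuntimeInnerSketch:
    """Holds all state and logic (the role the PR gives RuntimeInnerImpl)."""

    def __init__(self):
        self._boxes = {}
        self._next = 0

    def create(self, name=None):
        self._next += 1
        box_id = f"box-{self._next}"
        self._boxes[box_id] = name
        return LiteBoxSketch(box_id, self)

    def exists(self, box_id):
        return box_id in self._boxes

    def remove(self, box_id):
        if box_id not in self._boxes:
            return False
        del self._boxes[box_id]
        return True


class LiteBoxSketch:
    """Box handle that keeps a reference to the inner runtime, not the facade."""

    def __init__(self, box_id, inner):
        self.id = box_id
        self._inner = inner

    def stop(self, auto_remove=False):
        # The handle calls the inner runtime directly, here to clean up on stop.
        if auto_remove:
            self._inner.remove(self.id)


class RuntimeFacadeSketch:
    """Thin public facade: every method is a one-line delegation."""

    def __init__(self):
        self._inner = RuntimeInnerSketch()

    def create(self, name=None):
        return self._inner.create(name)

    def exists(self, box_id):
        return self._inner.exists(box_id)

    def remove(self, box_id):
        return self._inner.remove(box_id)
```

Because the handle talks to the inner object, behaviour such as removal on stop does not need to route back through the public facade.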

Python SDK

  • Add name parameter to create(), SimpleBox, InteractiveBox
  • Add name property to Box handle
  • Update get(), remove() to accept ID or name

Test plan

  • Create box with name, verify uniqueness constraint
  • Get box by name and by ID
  • Stop box with auto_remove, verify cleanup
  • Restart runtime, verify box recovery

Signed-off-by: dorianzheng <xingzhengde72@gmail.com>
Major refactoring to support persistent box state with stop/restart capabilities.

## Core Changes

### Database Persistence
- Add SQLite-based box state storage (db/boxes.rs, db/schema.rs)
- Persist BoxConfig and BoxState across runtime restarts

### Box Lifecycle Management
- Add BoxManager with state machine (Starting → Running → Stopped → Crashed)
- Implement stop(), restart, and reattach operations
- Add auto_remove option (like Docker --rm) for automatic cleanup

### Container Init Process
- Add stdio pipes to keep container init alive (guest/container/stdio.rs)
- Replace sleep infinity workaround with pipe-based blocking
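The idea behind pipe-based blocking can be shown with an ordinary host process. This is only an analogy in Python, assuming a child that blocks on reading its stdin; the PR's actual mechanism lives in guest/container/stdio.rs and is not reproduced here, and the helper names are made up.

```python
import subprocess
import sys

# Child that blocks on a read of stdin until the write end is closed (EOF).
CHILD_CODE = "import sys; sys.stdin.buffer.read()"


def spawn_blocked_child():
    """Start a child process that stays alive for as long as we hold its stdin pipe."""
    return subprocess.Popen([sys.executable, "-c", CHILD_CODE], stdin=subprocess.PIPE)


def release_child(proc, timeout=10):
    """Close the pipe's write end; the child sees EOF and exits."""
    proc.stdin.close()
    return proc.wait(timeout=timeout)
```

Compared with running `sleep infinity`, the child's lifetime is tied to the pipe: it exits on its own once the holder closes the write end or goes away.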

### Architecture
- Refactor init stages to tasks-based pipeline
- Move controller to vmm/controller module
- Add BoxBuilder for lazy initialization
- Consolidate runtime types and state management

### SDK Updates
- Expose auto_remove in Python BoxOptions
- Update C SDK headers
- Add lifecycle example scripts
- Add auto_remove field to BoxOptions with manual Default impl
- Python SDK uses Option<bool> to inherit Rust default when None
- All Python box classes (SimpleBox, CodeBox, BrowserBox, ComputerBox,
  InteractiveBox) default to auto_remove=True (ephemeral)
- Low-level Rust BoxOptions defaults to false (persistent for debugging)
- Add optional user-defined name for boxes (unique, stored in BoxConfig)
- Support get/remove by ID or name (name lookup as fallback)
- Add name validation to prevent duplicates on create
- Add InvalidArgument error variant for validation errors
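The two-level auto_remove default described in the list above (an optional value that inherits the low-level default when unset, with high-level box classes opting into ephemeral behaviour) can be sketched like this. The names `LOW_LEVEL_DEFAULT_AUTO_REMOVE`, `resolved_auto_remove`, and `SimpleBoxSketch` are illustrative assumptions, not SDK identifiers.

```python
from dataclasses import dataclass
from typing import Optional

# Stands in for the Rust BoxOptions default (false, i.e. persistent).
LOW_LEVEL_DEFAULT_AUTO_REMOVE = False


@dataclass
class BoxOptions:
    # None means "not specified": inherit the low-level default.
    auto_remove: Optional[bool] = None

    def resolved_auto_remove(self) -> bool:
        if self.auto_remove is None:
            return LOW_LEVEL_DEFAULT_AUTO_REMOVE
        return self.auto_remove


class SimpleBoxSketch:
    """Hypothetical high-level wrapper that defaults to ephemeral boxes."""

    def __init__(self, auto_remove: bool = True):
        self.options = BoxOptions(auto_remove=auto_remove)
```

Keeping the low-level field optional means the default lives in one place, while each wrapper class states its own preference explicitly.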

Refactor BoxliteRuntime to thin facade pattern:
- Move all implementation to RuntimeInnerImpl (rt_impl.rs)
- BoxliteRuntime delegates to self.inner for all operations
- Enables LiteBox to call runtime methods directly via RuntimeInner

Python SDK updates:
- Add name parameter to create(), SimpleBox, InteractiveBox
- Add name property to Box handle
- Update get/remove to accept id_or_name

Signed-off-by: dorianzheng <xingzhengde72@gmail.com>
- Add missing 'name: None' field to BoxConfig in test files
- Change &PathBuf to &Path in vmm_spawn.rs
- Implement FromStr trait for BoxStatus instead of inherent from_str method
- Allow clippy::module_inception for pipeline module
- Update lifecycle tests to use new create() signature with name parameter
@DorianZheng DorianZheng merged commit 84f522a into main Dec 26, 2025
4 checks passed
@DorianZheng DorianZheng deleted the feature/box-lifecycle-persistence branch December 26, 2025 11:46
G4614 added a commit that referenced this pull request Jun 11, 2026
…rt PRIMARY TD

Two fixes from run #38's symptoms:

1. Dockerfile.source ts-node startup failed with:
     TS5083: Cannot read file '/boxlite/apps/tsconfig.base.json'
   Root cause: codex PR #730 (Normalize monorepo Docker build paths)
   moved the apps Nx workspace root from repo-root to apps/. After
   that, apps/api/tsconfig.json's `extends: ../tsconfig.base.json`
   resolves to apps/tsconfig.base.json (not the repo-root one).
   Dockerfile.source still placed it at /boxlite/ with WORKDIR /boxlite,
   so the extends never resolved.

   Fix: mirror the codex production Dockerfile layout — WORKDIR
   /boxlite/apps, place tsconfig.base.json there, COPY apps/api/ to
   api/ and apps/libs/ to libs/. ENTRYPOINT paths drop the apps/
   prefix (api/tsconfig.app.json, api/src/main.ts).

2. Deploy-Api step false-positive:
   ECS DeploymentCircuitBreaker auto-rolled back the broken Api:20
   task definition, leaving the service stable on the previous
   Api:16. `aws ecs wait services-stable` returned success, ALB
   targets stayed healthy (old tasks were never deregistered), and
   /api/health responded 2xx — all answering from the OLD image.
   The workflow believed the deploy succeeded; pytest then ran
   against the old API, masking the broken new image.

   Fix: after `wait services-stable`, query the PRIMARY deployment's
   taskDefinition and assert it equals NEW_TD_ARN. If different,
   ECS rolled back — fail the step and dump stop reasons from the
   most recent stopped tasks for diagnosis.
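The check in fix 2 can be sketched as a pure function over the JSON returned by `aws ecs describe-services`. The function names are hypothetical and the workflow's real step is not shown in this commit message; the sketch assumes only the documented response shape, where each service lists its deployments with a `status` and a `taskDefinition`.

```python
def primary_task_definition(describe_services_response):
    """Return the taskDefinition ARN of the PRIMARY deployment of the first service."""
    service = describe_services_response["services"][0]
    for deployment in service.get("deployments", []):
        if deployment.get("status") == "PRIMARY":
            return deployment["taskDefinition"]
    raise LookupError("no PRIMARY deployment found")


def assert_deployed(describe_services_response, new_td_arn):
    """Fail if the service settled on a task definition other than the one just deployed."""
    actual = primary_task_definition(describe_services_response)
    if actual != new_td_arn:
        raise RuntimeError(
            f"ECS rolled back: PRIMARY is {actual}, expected {new_td_arn}"
        )
```

With a rollback, the service is stable and healthy but its PRIMARY deployment points at the previous task definition, so the comparison fails where `wait services-stable` alone would pass.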

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>