feat(security): remove host bind mounts; only managed volumes allowed#639
feat(security): remove host bind mounts; only managed volumes allowed#639G4614 wants to merge 2 commits into
Conversation
…icy walls Replaces the entire previous boxlite-ai#618 (admission guard + recovery budget + auto-GC + per-command statvfs check, ~1500 LOC) with a structural fallocate-based reserve (~200 LOC). The kernel now enforces the floor at every write(2); boxlite owns only the reserve file's lifecycle and the operator's recovery affordance. What lands: - boxlite::util::reserve — `ensure_reserve(home)` preallocates 64 MiB into `$BOXLITE_HOME/.reserve` via fallocate(mode=0). Idempotent + self-healing: top-up if size dropped, recreate if the file was removed. fallback to a 64 MiB zero-write when fallocate returns EOPNOTSUPP (tmpfs / some FUSE backends). - RuntimeImpl::new calls ensure_reserve right after layout.prepare(). From that moment on the host filesystem's f_bavail is 64 MiB lower for every writer — boxlite, the operator's other tools, anything else. No per-command statvfs poll, no policy table to maintain. - `boxlite reserve-release` CLI command — emergency: unlink the reserve so the operator can run gc / rm on a full host. unlink(2) is metadata-only on ext4/xfs/btrfs, so it works at 0 free. The next runtime construction will lay the reserve back down automatically. - CLI dispatcher catches ENOSPC chains and prints a one-line hint pointing at `boxlite reserve-release`. Substring + raw_os_error match so it works whether the error came from std::io::Error, BoxliteError::Storage(String), or a wrapped reqwest body upload. What goes away from the original boxlite-ai#618 scope (deliberately): - DiskSpaceTask init task + classify() three-tier thresholds - enforce_recovery_budget calls in 6 CLI commands + 6 REST handlers - periodic_recovery_budget_monitor for serve - boxlite gc + collect_garbage + sweep_orphan_disk_images + auto-GC self-heal (these were `prevention via reactive recovery`; with the structural reserve the equivalent recovery is one operator-driven `boxlite reserve-release` + manual cleanup, which is more predictable than auto-GC and surfaces ENOSPC cleanly to scripts) boxlite-ai#639 (GC scope expansion) and boxlite-ai#640 (RuntimeBackend wiring + boxlite df) will need to be restructured as standalone follow-ups since their base in this branch is now gone — addressed in a separate change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ensure_reserve` was unconditional: every RuntimeImpl construction
re-acquired the 64 MiB reserve via fallocate. That breaks the
documented recovery flow:
host: 0 free, .reserve = 64 MiB
$ boxlite reserve-release → host: 64 MiB free, reserve gone
$ boxlite rm -f bigbox
RuntimeImpl::new
ensure_reserve → fallocate 64 MiB succeeds
host: 0 free again, reserve back
SQLite WAL grow → ENOSPC
The operator's 64 MiB lifeboat is consumed by the reserve top-up
before the recovery command can spend a byte of it.
Fix: hysteretic top-up. If the reserve is missing AND host_free is
below `2 × RESERVE_BYTES` (128 MiB), log a warn and defer. The
recovery now has a full reserve's worth of headroom; the reserve
self-heals on a later runtime construction after free recovers.
Implementation: extract `ensure_reserve_with_free(home, free)` so
the deferral rule can be exercised against hand-crafted free-byte
counts without filling a real filesystem. Promote the test-only
`statvfs_bavail_bytes` to module-scope `host_free_bytes` so we
don't duplicate the libc dance.
Also drop `boxlite gc` from the ENOSPC hint in main.rs — that
command isn't in this PR's scope (deferred to boxlite-ai#639). Sending the
operator to a non-existent command at the worst possible moment
is worse than no hint.
Two-side verified: with the deferral check reverted, the new
`defers_topup_when_host_free_below_threshold` test fails on its
"reserve file must not exist" assertion — proving the test
exercises the new branch and that without hysteresis the race
is silent (no error, just wrong outcome).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f721f8e to
51bcb34
Compare
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughRequire user volumes to live under home-scoped managed roots ( ChangesManaged Volume Security Gate
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Volumes are now structural: only boxlite-managed directories under
<BOXLITE_HOME>/volumes/{anonymous,named}/ are mountable into a box.
The CLI rejects the legacy `-v /host:/guest` shape outright with a
migration message, and the runtime enforces the same constraint as
a structural defence against SDK callers that bypass the CLI parser.
Threat model: the old `-v` shape made every host path the shim's uid
could r/w reachable from inside the box (virtio-fs collapsed onto
host kernel DAC at the shim's uid). `boxlite serve` daemons or any
multi-tenant deployment leaked the entire host file system the
moment a caller named a path. There is no general-purpose deny-list
that covers every sensitive path on every distro; the structurally
safe model is "only sources boxlite controls."
CLI surface (breaking):
-v /data anonymous, ephemeral per-box
-v myvol:/data named, persists across boxes
-v myvol:/data:ro read-only named
-v /data:ro read-only anonymous
-v /host:/guest REJECTED with a migration message
Runtime gate (resolve_user_volumes):
- takes &Path for runtime home
- canonicalises each host_path
- rejects anything not under <home>/volumes/{anonymous,named}/
- symlinks escaping the managed roots are caught (check runs after
canonicalize)
## Tests (two-side verification)
CLI parser unit (cli::tests):
- parse_volume_spec_rejects_host_bind_mount — side B verified.
- parse_volume_spec_anonymous / _anonymous_ro / _named / _named_ro
— side A.
- volume_flags_apply_to_anonymous_materializes_under_home_volumes
— anon dirs under <home>/volumes/anonymous/<ulid>.
- volume_flags_apply_to_named_is_stable_across_invocations
— named dirs reused across applications.
- volume_flags_apply_to_rejects_host_bind — apply_to surfaces
the parser rejection.
Runtime gate unit (litebox::init::types::tests):
- resolve_volume_rejects_non_managed_host_path — side B verified
by deleting the starts_with gate body in resolve_user_volumes;
test flipped red. Restored → green.
- resolve_volume_symlink_escape_rejected — also flipped red on the
same revert.
- resolve_volume_anonymous_under_home_succeeds /
_named_under_home_succeeds — side A.
- resolve_volume_empty_slice_ok — empty input still spawns cleanly.
CLI integration (src/cli/tests/host_volume_disabled.rs):
- host_bind_mount_rejected — `boxlite run -v /etc:/host_etc` exits
non-zero with the migration message.
- anonymous_volume_accepted — `-v /data` does NOT surface the
rejection.
- named_volume_accepted — `-v myvol:/data` does NOT surface the
rejection.
Existing integration tests (mount_security, security_enforcement)
used raw /tmp paths; both rewritten to materialise their volume
dirs under <home>/volumes/named/<name> directly so the runtime gate
admits them. Behaviour under test is unchanged.
Replaces the earlier opt-in approach (--allow-host-volumes
default-off flag on the experiment/volume-mount-disable branch) —
that approach kept the lever reachable behind a flag; this one
removes the lever entirely.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
51bcb34 to
7f2b7b3
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cli/src/cli.rs (1)
751-759:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAvoid creating volume directories before the full list validates.
apply_tocreates directories and mutatesopts.volumesas it iterates. If a later spec is invalid, the command returns an error after leaving behind orphan volume dirs and partially-applied state. Parse the full list first, then materialize/push only after validation succeeds.Suggested fix
) -> anyhow::Result<()> { let base = volume_home_base(home); - for s in self.volume.iter() { - let spec = parse_volume_spec(s)?; + let specs = self + .volume + .iter() + .map(|s| parse_volume_spec(s)) + .collect::<anyhow::Result<Vec<_>>>()?; + + for spec in specs { let host_path = materialize_volume(&base, &spec.source)?; opts.volumes.push(VolumeSpec { host_path, guest_path: spec.guest_path, read_only: spec.read_only,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli/src/cli.rs` around lines 751 - 759, The loop in apply_to currently parses and materializes each volume spec and mutates opts.volumes as it iterates, leaving created dirs and partial state if a later spec fails; change the logic to first parse and validate all specs by mapping self.volume.iter() through parse_volume_spec (collecting Result<Vec<...>>) and returning early on parse errors, then, after successful validation, call volume_home_base(home) once and materialize_volume for each validated spec and push VolumeSpec entries into opts.volumes; ensure you reference parse_volume_spec, materialize_volume, volume_home_base, opts.volumes, and the VolumeSpec struct when implementing the two-phase flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/boxlite/src/litebox/init/types.rs`:
- Around line 96-110: The current gate allows any descendant under
allowed_anon/allowed_named; tighten it so only per-volume directories (one path
component under the managed root) are accepted. Replace the starts_with checks
on resolved_path with a check that attempts strip_prefix(allowed_named) and
strip_prefix(allowed_anon) and ensures the remaining relative path has exactly
one non-empty component (e.g. count components == 1) so resolved_path
corresponds to allowed_root/<volume_name> and not the root itself or deeper
nested paths; update the error branch to use the same symbols (resolved_path,
allowed_anon, allowed_named, vol) when rejecting.
In `@src/cli/src/cli.rs`:
- Around line 633-642: The branch that handles two-part volume specs accepting
"guest:ro"/"guest:rw" currently accepts relative guest paths (e.g., "data:ro")
and should reject them; in the block that returns
(ParsedVolumeSource::Anonymous, guest, second.eq_ignore_ascii_case("ro")), add
the same absolute-path check used in the one-part case (ensure guest starts with
'/' and bail with an error if not) so relative paths are rejected, and add/keep
a regression test asserting "data:ro" is rejected; look for the variables parts,
guest and the ParsedVolumeSource::Anonymous return to locate the exact spot to
update.
---
Outside diff comments:
In `@src/cli/src/cli.rs`:
- Around line 751-759: The loop in apply_to currently parses and materializes
each volume spec and mutates opts.volumes as it iterates, leaving created dirs
and partial state if a later spec fails; change the logic to first parse and
validate all specs by mapping self.volume.iter() through parse_volume_spec
(collecting Result<Vec<...>>) and returning early on parse errors, then, after
successful validation, call volume_home_base(home) once and materialize_volume
for each validated spec and push VolumeSpec entries into opts.volumes; ensure
you reference parse_volume_spec, materialize_volume, volume_home_base,
opts.volumes, and the VolumeSpec struct when implementing the two-phase flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 993f004c-2047-4bc9-84ff-313cfb6b3407
📒 Files selected for processing (6)
src/boxlite/src/litebox/init/tasks/vmm_spawn.rssrc/boxlite/src/litebox/init/types.rssrc/boxlite/tests/mount_security.rssrc/boxlite/tests/security_enforcement.rssrc/cli/src/cli.rssrc/cli/tests/host_volume_disabled.rs
Two genuine quick wins from the bot review:
(1) `litebox/init/types.rs:99` — aggregate-root + deep-descendant
escape. The old `starts_with(allowed_anon | allowed_named)` gate
accepted not just per-volume directories but ALSO the aggregate
roots themselves (`<home>/volumes/named`) and arbitrary deeper
paths (`<home>/volumes/named/myvol/etc`). SDK callers could
mount the aggregate root and see every named volume on the host,
or mount a sub-tree of someone else's volume.
Tightened to `resolved_path.parent() == Some(allowed_{anon,named})`
so the path must be exactly one volume directory deep under a
managed root.
(2) `cli/cli.rs:642` — relative anonymous mount with options
(`-v data:ro`) slipped through because the 1-part case's
absolute-path check wasn't replicated in the 2-part
anonymous-with-options branch. Added the same `if
!guest.starts_with('/') { bail }` guard.
## Tests (two-side verification)
- `resolve_volume_aggregate_root_rejected` — both
`<home>/volumes/named` and `<home>/volumes/anonymous` themselves
are rejected; reverting the per-volume-dir check (returning to
`starts_with`) flips this red.
- `resolve_volume_deep_descendant_rejected` —
`<home>/volumes/named/myvol/etc/` is rejected for the same
reason.
- Existing `resolve_volume_anonymous_under_home_succeeds` /
`_named_under_home_succeeds` confirm the per-volume happy path
still works.
- `parse_volume_spec_anonymous_relative_with_options_invalid` —
`data:ro` and `data:rw` both rejected with "must be absolute"
error; reverting the new bail block flips this red.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After running the suite end-to-end: - test_detached_box_survives_cli_exit_and_is_reusable used `boxlite info <id>` for per-box state. The CLI's `info` is system- wide (version / runtime stats) — there's no per-box info command — so the call was always `unexpected argument` exit 2. Replaced step 3 with `boxlite ls`-row parsing + a state-keyword check. Marked xfail(strict=True) because step 3's exec hits the stdout-drop race that boxlite-ai#563 fixes (same as the reattach case below). - test_reattach_after_original_completes already broken by the same stdout race — short `echo X && exit 0` returns out=''. Marked xfail(strict=True) pointing at boxlite-ai#563. - test_readonly_volume_mount_flag_and_write_reject expected a host bind mount to surface inside the guest. The cloud runtime deliberately dropped host bind mounts (boxlite-ai#639 "remove host bind mounts; only managed volumes allowed"), so this case was testing a feature that doesn't exist by design. Flipped to the negative contract — REST callers passing host paths get them silently dropped, no /mnt/<x> ever surfaces. Pin: boxlite-ai#639's product direction. Net: 3 pass + 2 xfail(strict). When boxlite-ai#563 lands, both xfails flip xpass-strict and become regular passes (markers should be dropped at that point). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#710) add 5 e2e cases pinning REST contracts not covered today ## Coverage gaps Three classes of behaviour run through the SDK → API → Runner → VM chain but had no e2e pin on the chain itself; bugs in any one of them would silently regress through `make test:integration:*` (which only exercises local-FFI): ``` 1. CLI detach lifecycle — `boxlite run -d` returns, CLI process exits, does a fresh CLI process still see / exec the box? FFI side src/boxlite/tests/detach.rs, recovery.rs ✓ E2E side scripts/test/e2e/cases/ ✗ 2. Execution.attach / reattach contract — bogus exec_id should be a typed client error; reattach to a completed exec should return a usable handle. Runner side apps/runner/.../boxlite_exec_attach_test.go ✓ SDK <-> API end-to-end ✗ 3. host bind mount via REST — the cloud runtime intentionally dropped host bind mounts (#639); REST callers passing host paths must silently no-op, no /mnt/<x> in the guest. FFI surface src/boxlite/tests/mount_security.rs ✓ REST contract ✗ ``` ## Cases shipped ``` scripts/test/e2e/cases/test_cli_detach_recovery.py (2 cases) scripts/test/e2e/cases/test_exec_attach.py (2 cases) scripts/test/e2e/cases/test_volume_readonly.py (1 case) ``` 5 cases total. Author also dropped 6 cases from this branch's earlier incarnation that were already committed alongside their respective fix PRs (#686 / #688 / #689 / #691 / #692 / #696), so the diff is strictly net-new coverage. ## Test plan — run against current main Stack: local e2e, runner unchanged, no source edits. | Case | Result | Notes | |---|---|---| | `test_detached_box_exec_propagates_exit_code_on_fresh_cli` | ✅ PASS | exit-code passthrough across CLI processes | | `test_detached_box_survives_cli_exit_and_is_reusable` |⚠️ XFAIL (strict) | reaches step 3 (`boxlite exec <id> echo still-alive`) then hits the stdout-drop race that #563 fixes — marker drops when #563 lands | | `test_attach_with_bogus_id_is_typed_error` | ✅ PASS | bogus exec_id → typed `Exception` (not 5xx, not silent) | | `test_reattach_after_original_completes` |⚠️ XFAIL (strict) | same stdout-drop race (#563) on the original exec's `out=='first-output'` assertion | | `test_host_bind_mount_via_rest_is_silently_ignored` | ✅ PASS | the box created with `volumes=[(host_dir, "/mnt/ro", True)]` reports `MOUNT_LINE=<none>` from `/proc/mounts` and the host marker file is untouched — REST silently dropped the host path | The 2 XFAILs are tied to #563 (`fix(go-sdk): fold stream drain into Execution.Wait`). Once #563 merges, both xfails flip xpass-strict, the markers come off, and the suite is 5/5 green. No additional fix work is needed in this PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added end-to-end tests verifying CLI detach survival and box reusability across fresh CLI processes, including exit code propagation tests. * Added end-to-end tests for SDK reattach functionality, validating session state after execution completion. * Added end-to-end test confirming proper host bind mount handling during REST execution. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Forbidding the box to mount host path as it likes by replacing
-v /host:/guestwith managed volumes<BOXLITE_HOME>/volumes/{anonymous,named}/);Test plan
Two-sided (reverted vs applied) via
litebox::init::types::tests::resolve_volume_*+cli::tests::*volume_spec*+cli/tests/host_volume_disabled.rs, the reverted side toggling whether the runtime gate body or the CLI bind-mount reject branch is in place.resolve_volume_rejects_non_managed_host_path— non-managed host_path returnsConfigwith "not a boxlite-managed volume"; side B (gate body removed) admits the path and the test sees aResolvedVolume.resolve_volume_symlink_escape_rejected— symlink under<home>/volumes/named/pointing to/tmp/...is rejected because thestarts_withcheck runs aftercanonicalize.parse_volume_spec_rejects_host_bind_mount+ integrationhost_bind_mount_rejected—boxlite run -v /etc:/host_etcexits non-zero with the migration message before any runtime touch.volume_flags_apply_to_named_is_stable_across_invocations— pins the "two boxes mount the same name → same dir" contract that the persistence story depends on.boxlite run -v /etc:/host_etc/etc/passwdhost bind mounts (`-v /host:/guest`) are not supported…BoxOptions.volumes[0].host_path = "/etc"/etcinto the boxVolume host path '/etc' is not a boxlite-managed volume. … Managed roots: '<home>/volumes/anonymous' and '<home>/volumes/named'.<home>/volumes/named/x → /tmp/tmpsilentlystarts_with)boxlite run -v /datacwd/data<home>/volumes/anonymous/<ulid>materialised 0700, mounted at/databoxlite run -v myvol:/data<home>/volumes/named/myvolmaterialised 0700, mounted at/data, persists across boxesmount_security/security_enforcementintegration testsTempDir::new_in("/tmp")<home>/volumes/named/<name>so they satisfy the runtime gate; behaviour under test unchangedBefore: any host path the shim's uid could r/w was reachable from inside the box (virtio-fs + host kernel DAC, no boxlite-level filter); after: only directories boxlite created under
<home>/volumes/{anonymous,named}/are reachable, regardless of who callsBoxOptions.volumes. The earlier--allow-host-volumesopt-in approach kept the lever behind a flag; this commit removes the lever entirely.Summary by CodeRabbit
Breaking Changes
-v /host:/guest) is no longer supported; migrate to anonymous (-v /path) or named (-v name:/path) volumes. Parser now rejects unsupported forms with migration guidance.Improvements