feat(box): --cap option, default all open#597
Conversation
1eb17ad to
45bc06d
Compare
Repeatable `--cap-add CAP` flag adds individual Linux capabilities to the container. "ALL" grants every cap. CAP_SYS_ADMIN triggers cgroup2 rw mount automatically. Wired end-to-end: CLI → BoxOptions.added_caps → proto → guest OCI spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cap-add's ALL expansion was a no-op: build_capabilities double-prefixed
cap names (format!("CAP_{name}") over capability_names(), which already
returns "CAP_*"), so every Capability::from_str("CAP_CAP_*") errored and
the container silently kept only the 14-cap default set — missing
SYS_ADMIN/NET_ADMIN that privileged workloads need. Replace the broken
loop with all_capabilities() enumerating all 41 OCI capabilities.
Privileged containers (cap-add ALL/SYS_ADMIN) also need a writable
/proc/sys: dockerd writes net.ipv4.ip_forward to bring up its bridge.
LinuxBuilder otherwise defaults readonlyPaths/maskedPaths to the OCI
lists, which remount /proc/sys read-only — override them with empty sets
when privileged (matching `docker --privileged`). Non-privileged boxes
keep the protective defaults.
Together these let `boxlite run --kernel net --cap-add ALL docker:dind`
start dockerd end-to-end: Daemon initialized, API listening on
/var/run/docker.sock, `docker version` client+server, overlayfs storage,
cgroup v2. Adds guest unit tests for both (all_capabilities superset,
ALL effective set includes SYS_ADMIN/NET_ADMIN, privileged spec clears
readonly/masked paths while non-privileged keeps /proc/sys read-only).
Also rustfmt the kernel-net build.rs constants/Fetcher calls, committed
unformatted in the kernel-selection commit, so the branch passes fmt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`build_capabilities` consumed every per-name `--cap-add` arg through
`format!("CAP_{name}")` (or "kept as-is if already CAP_-prefixed"),
yielding a "CAP_*" string handed to `oci_spec::runtime::Capability::
from_str`. But oci-spec 0.6 accepts the *bare* form ("SYS_ADMIN"), not
"CAP_SYS_ADMIN" — every call errored, the loop fell into the `_ => warn!`
arm, and the cap was silently dropped. End result: `--cap-add SYS_ADMIN`
on the CLI got all the way to OCI spec building, then quietly collapsed
to the 14-cap default. `--cap-add ALL` happened to work because
8f63e3b had already replaced that arm with `all_capabilities()`; the
per-cap arm kept the broken normalisation.
Fix: strip the `CAP_` prefix if present and pass the bare form. Both
`--cap-add SYS_ADMIN` and `--cap-add CAP_SYS_ADMIN` now map to
`Capability::SysAdmin`.
Also fixes a follow-on miss in the Node SDK: a2a501d added
`BoxOptions.added_caps` but left the field-by-field initialiser in
`sdks/node/src/options.rs` untouched, so the workspace failed to lint
once `added_caps` was no longer covered by a Default fallback. Add
`added_caps: vec![]` to the JS→Rust conversion (Node clients can't
target this field today; exposing it is a separate decision).
The companion integration tests in src/cli/tests/cap_add.rs were what
caught the per-cap bug — the pre-existing unit tests
(`cap_add_propagates_to_options`) only asserted the value carried in
`BoxOptions.added_caps`, which the bug never touched. See the next
commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three integration tests on alpine, asserting on `CapEff:` from /proc/self/status inside the container — the kernel-visible effective capability mask, i.e. the receiving end of the chain boxlite-ai#597 plumbs: CLI --cap-add → BoxOptions.added_caps → proto added_caps → guest build_capabilities → OCI Spec process.capabilities → libcontainer → /proc/self/status::CapEff Existing boxlite-ai#597 tests stopped at the host-side `BoxOptions.added_caps` value; nothing checked whether anything actually reached the container. This commit exercises the whole chain at the kernel boundary. - default_box_cap_eff_matches_docker_baseline: a vanilla `run alpine` must yield CapEff == 0xa80425fb (the Docker 14-cap default). Catches a regression that silently widens/narrows `default_capabilities()`. - cap_add_sys_admin_sets_bit_21_in_cap_eff: `--cap-add SYS_ADMIN` must produce default | (1<<21). Asserting the exact resulting mask (not just bit-21 set) also catches an accidental cap-set *replacement* (default bits drop while only SYS_ADMIN remains) and host-side double-add / alias drift (extra bits appear). --- This is the test that caught the per-cap normalisation bug fixed in the previous commit. Two-side verified on the rebuilt+ re-embedded guest: pre-fix, CapEff = 0xa80425fb (no SYS_ADMIN); post-fix, CapEff = 0xa82425fb (bit 21 set). - cap_add_all_grants_full_cap_eff: `--cap-add ALL` must set >=38 of the 41 OCI cap bits. The slack (vs "exactly 41") accommodates kernel versions where the OCI spec hasn't caught up with a new cap, without giving a false-positive on a regression that drops 3+ caps. Pins the 8f63e3b fix to the `ALL` arm. Reading CapEff via `awk '/^CapEff:/ {print $2}' /proc/self/status` needs nothing more than busybox + procfs — alpine:latest ships both. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…TY exec inheritance + parse-time validation)
Inverts the cap baseline and rewires the surface accordingly:
- default ALL caps on (was: docker 14)
- --cap NAME=0 drops one
- --cap NAME=1 explicit grant (no-op vs default but allowed for audit)
- --cap ALL=0 drops every cap (zero-cap sandbox)
- --cap ALL=1 default, no-op
- CAP_ prefix optional on every name; case-insensitive
The reason for the inversion: in boxlite the trust boundary is the VM
(libkrun), not the container. A docker-style restricted-default added
friction without adding meaningful isolation against the host, because
escaping the container only gets you the guest kernel — which libkrun
contains regardless of what caps the container has.
The previous `--cap-add` surface tried to thread an "additive on top
of 14" model through the same plumbing, which had three layered bugs
the symmetric model makes structurally hard to reintroduce:
(a) The ALL-expansion arm double-prefixed (CAP_CAP_*), silently
no-op'ing `--cap-add ALL`. Fixed in 8f63e3b but the bug pattern
survived.
(b) The per-cap arm added CAP_ before calling oci-spec 0.6's
Capability::from_str — which only accepts the bare form
("SYS_ADMIN"), not the prefixed one. Every per-cap add silently
dropped into a warn-and-skip; `--cap-add SYS_ADMIN` never
reached the container. Fixed in 463064a (this branch).
(c) Both exec paths (zygote.rs TTY + non-TTY) hardcoded their own
cap source instead of inheriting the container's. TTY exec
passed `build_capabilities(&[])`; non-TTY passed
`capability_names()` (the docker 14 list verbatim). So even
a correctly-added SYS_ADMIN on the container's init was
silently dropped on every `boxlite exec` — the operator saw
the cap on the init process, then lost it on every subsequent
shell into the box.
All three are fixed structurally by the new model:
- There is exactly one cap source: `resolve_cap_set(cap_overrides)`,
starting from `all_capabilities()` and applying entries in order.
No additive vs. replace ambiguity; no "first match" vs "ALL only".
- Per-cap normalisation is one line: `strip_prefix("CAP_")`. Both
`--cap SYS_ADMIN=0` and `--cap CAP_SYS_ADMIN=0` produce the same
override before `Capability::from_str` is called.
- Unknown cap names error at CLI parse (oci-spec's from_str is the
authoritative dictionary, pulled in as a build-only dep on
boxlite-cli). A typo like `SYS-ADMIN=0` fails `boxlite run`
before the box is created — no silent guest-side warn.
- BuildSpec carries the container's `cap_overrides` through the
zygote IPC. Both TTY (build_tty_exec_process) and non-TTY
(`with_capabilities(cap_override_libcontainer_names(...))`) exec
paths replay the same overrides as init.
Plumbing surface:
schema/proto BoxOptions { cap_overrides: Vec<CapOverride> }
and `repeated CapOverride cap_overrides = 6` in
ContainerInitRequest. CapOverride is a (name,
enabled) pair with serde and prost derives.
host bridge portal/interfaces/container.rs maps host-side
CapOverride → proto.
guest-local mirror container/capabilities.rs defines a serde-
deriving CapOverride that the zygote IPC carries
in BuildSpec. From<boxlite_shared::CapOverride>
bridges proto → local at the gRPC boundary in
service/container.rs.
Container stashes cap_overrides at start; ContainerCommand
reads them via self.cap_overrides on cmd(); each
BuildSpec replay copies them in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six integration tests on alpine, each asserting on `CapEff:` from
/proc/self/status inside the container — the kernel-visible effective
capability mask, i.e. the receiving end of the chain the previous
commit plumbs:
CLI --cap NAME=0|1 → BoxOptions.cap_overrides → proto cap_overrides
→ guest build_capabilities → OCI Spec process.capabilities
→ libcontainer → /proc/self/status::CapEff
Asserting at the kernel end catches a regression anywhere in that
chain, including ones that pass the parse-layer unit tests.
- default_box_cap_eff_is_essentially_all:
empty cap_overrides yields >=38 of 41 OCI cap bits and
explicitly includes CAP_SYS_ADMIN (bit 21). The diagnostic
message names the most likely regression source (a sneak
reappearance of `default_capabilities()` in the cap baseline).
- cap_drop_sys_admin_clears_bit_21_only:
`--cap SYS_ADMIN=0` produces `baseline & !(1<<21)`. The exact
equality catches over-broad drops (more than one bit removed)
AND silent grants (bit 21 still set) in the same pass; either
direction of regression flips the test red.
- cap_all_zero_clears_every_bit:
`--cap ALL=0` empties CapEff entirely — the zero-cap sandbox
use case (e.g. CTF payloads that shouldn't even bind a low
port).
- cap_all_zero_then_sys_admin_one_leaves_just_bit_21:
Pins the "later overrides earlier" semantics. Without this,
"minimum permission + one specific cap" isn't expressible.
- exec_inherits_container_cap_drops:
Pins that `boxlite exec` reads the *container's* cap_overrides
rather than silently falling back to a hardcoded default. This
is what catches the inverted exec leak under the default-ALL
model — pre-fix, `--cap SYS_ADMIN=0` on the container left
SYS_ADMIN granted on every exec'd process. Two-side verified:
reverting BuildSpec.cap_overrides to `vec![]` re-produces
CapEff = 0x000001ffffffffff (full ALL, including bit 21) on
the exec'd process, which the test correctly catches.
- cap_unknown_name_rejected_at_cli_parse:
A typo like `SYS-ADMIN=0` (dash) must fail at `boxlite run`
parse, not silently no-op in the guest log. The CLI parse
layer is the first opportunity to reject — failing here also
means the box never starts, which is the right behaviour
(better to refuse than to produce an unintendedly-permissive
container).
Reading CapEff via `awk '/^CapEff:/ {print $2}' /proc/self/status`
needs nothing more than busybox + procfs — alpine:latest ships both.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-name later-overrides-earlier
Two unit tests on spec.rs alongside the existing five
`build_capabilities_*` cases:
- tty_exec_process_honours_cap_overrides:
`boxlite exec -t` goes through `build_tty_exec_process` instead
of the non-TTY `with_capabilities(...)` builder. The kernel-level
integration test `cap::exec_inherits_container_cap_drops` only
covers the non-TTY path (assert_cmd has no real PTY parent), so
the TTY hop was previously only pinned by inspection. This test
calls `build_tty_exec_process(..., &[SYS_ADMIN=0])` and asserts
the resulting OCI Process spec's effective set both drops
SysAdmin AND preserves NetAdmin (= not collapsed back to a
14-cap docker baseline). Two-side verified: hardcoding
`build_capabilities(&[])` re-introduces the pre-fix state and
reproduces the 41-cap-with-SysAdmin failure verbatim.
- build_capabilities_same_name_later_overrides_earlier:
Pins the "later overrides earlier" ordering rule for entries
hitting the same cap name. `SYS_ADMIN=0` then `SYS_ADMIN=1`
must end up granted; the reverse must end up dropped. The
existing `all_zero_then_sys_admin_one` test pinned the rule
for `ALL=0 → NAME=1` but not for repeated same-name entries,
which a future caller (REST config layering, env-var append)
will produce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rver + serde) boxlite-ai#597's `--cap NAME=0|1` flag worked end-to-end through the local CLI path, but the REST path silently dropped every override: - boxlite/src/rest/types.rs::CreateBoxRequest (the *client-side* wire form sent by `RestRuntime::create`) had no `cap_overrides` field, so `from_options(&BoxOptions)` discarded `BoxOptions.cap_overrides` before serialization. - cli/src/commands/serve/types.rs::CreateBoxRequest (the *server-side* form deserialized by `POST /v1/boxes`) also had no `cap_overrides` field, AND used `#[serde(deny_unknown_fields)]` — so even if a hand-rolled client had POSTed the right JSON, the server would have rejected it with a 4xx. - serve::build_box_options used `..Default::default()` to fill in everything beyond the explicit fields, so even with the previous two fixes the server would have stopped at an empty list. Net effect: any client routing through REST — including the Python, Node, Go SDKs once they expose `--cap` — saw the new flag in the CLI but got the default-ALL baseline regardless of what they asked for. Fix: - Add `cap_overrides: Option<Vec<CreateBoxCapOverride>>` to both structs. The wire type is a thin name/enabled pair owned by the rest module so the field can keep `skip_serializing_if = "Option:: is_none"` for backward compatibility (existing clients that never set `--cap` don't change their POST body shape). - Conversions: • `BoxOptions → CreateBoxRequest` (rest::types) populates the field from `options.cap_overrides`, None when empty. • `CreateBoxRequest → BoxOptions` (serve::build_box_options) unwraps Option to `vec![]` when absent (= default-ALL) and maps each entry into `boxlite::CapOverride`. - Bidirectional `From` impls on `CreateBoxCapOverride` ↔ `runtime::options::CapOverride` so the converter sites stay one-liners. Pinned by four new unit tests: - test_create_box_request_carries_cap_overrides_on_the_wire - test_create_box_request_omits_cap_overrides_when_empty - build_box_options_threads_cap_overrides_from_rest_body - build_box_options_no_cap_overrides_defaults_to_empty The first two pin the client side (BoxOptions → JSON), the last two the server side (JSON → BoxOptions), and the empty-case tests pin the backward-compat contract for pre-existing clients. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CapEff-bitmask asserts cover the kernel's *view* of the cap set, but
the load-bearing question for an operator is whether the kernel
actually *enforces* the cap. This test threads `mount(2)` through
both ends of the contract:
- Default-ALL box: `mount -t tmpfs tmpfs /tmp/mnt` must succeed
(with SYS_ADMIN granted, the cap check passes and the mount
landing site is unconstrained for an empty dir under /tmp).
- `--cap SYS_ADMIN=0` box: the same call must fail with EPERM.
BusyBox `mount` translates EPERM to "permission denied (are
you root?)"; either the textual cue or `Operation not permitted`
counts. The exit-code AND text conjunction catches a regression
that produces the right exit through an unrelated failure
(e.g. a broken mount binary).
This closes the gap between "the cap is in CapEff" (existing tests)
and "the cap actually gates a SYS_ADMIN-bound syscall." A regression
that, say, populates the bounding set but not the effective set
would pass the CapEff tests and silently fail enforcement; this test
keeps it red.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two integration tests on alpine, each verifying CapEff is byte-for-byte
preserved across a lifecycle transition:
- cap_overrides_persist_across_stop_start:
`boxlite stop` + `boxlite start` must reuse the original
cap_overrides from the persisted BoxConfig. Without this, a
regression at the persistence layer (e.g. cap_overrides dropped
from BoxConfig's on-disk serialization) would silently restore
the default-ALL baseline on every restart — exactly the
"restart privileged-dropped box → privileged again" surprise
operators don't expect.
- cap_overrides_persist_across_restart_verb:
`boxlite restart` is functionally `stop()` + `start()` today
(see commands/restart.rs), so this is the same persistence
contract verb-side. Pinning it as a separate test guards
against a future refactor that gives restart its own
possibly cap-losing path (e.g. an in-process restart that
bypasses the persistent BoxConfig).
Each test first anchors that the drop took on the fresh box, then
asserts the byte-equality of CapEff after the lifecycle transition.
The pre-anchor matters because a regression that fails to drop
in the first place (different bug, same observable on the "after"
side) would otherwise also pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… hatch boxlite-ai#613 makes the per-box health check task default-on so the new zombie-shim reaper (Issue boxlite-ai#523) has a watcher to piggy-back on. The schema field `BoxOptions.advanced.health_check: Option<HealthCheckOptions>` already lets Rust library users opt out via `None`, but CLI users (`boxlite run` / `create`) and REST/SDK clients (Python, Node, Go all route through `POST /v1/boxes`) had no way to express that choice — exactly the gap boxlite-ai#597 hit for `cap_overrides`. Surface plumbed end-to-end: - `ManagementFlags { no_health_check: bool }` — `--no-health-check` on `boxlite run` / `create`. When set, `apply_to` clears `opts.advanced.health_check = None`. - REST client-side `CreateBoxRequest` gets `health_check_disabled: Option<bool>`; `from_options` flips it to `Some(true)` only when the operator already disabled health check in the input options, so the wire form stays byte-identical for every other call. - Server-side `CreateBoxRequest` (in `cli/src/commands/serve/types.rs`, `#[serde(deny_unknown_fields)]`) gains the same field. The `build_box_options` mapper builds a manual `AdvancedBoxOptions` instead of falling through `..Default::default()`, then forces `health_check = None` iff the wire says so. Three-state semantics for `health_check_disabled`: - `Some(true)` → explicit disable; box runs with no watcher, no zombie reaping (operator takes responsibility — documented inline). - `Some(false)` → "use the server default" (= currently `Some(...)`). Same effect as omitting the field, but documents the intent on the wire. - absent → server default. Pre-boxlite-ai#613 clients keep the legacy POST body shape and get the new default-on behaviour without code changes. Seven unit tests cover the surface symmetrically: - CLI: `management_flags_no_health_check_clears_advanced_field` / `management_flags_no_health_check_unset_keeps_default_on` — flag-on clears, flag-off preserves; pre-asserts the baseline so a regression that broke the default-on schema would also fail. - REST client: `test_create_box_request_carries_health_check_disabled_on_the_wire` / `test_create_box_request_omits_health_check_disabled_when_default` — `None` in BoxOptions → `Some(true)` on the wire; `Some(default)` → field absent (backward-compat with pre-boxlite-ai#613 servers). - REST server: `build_box_options_health_check_disabled_true_clears_health_check` / `build_box_options_no_health_check_field_keeps_default_on` / `build_box_options_health_check_disabled_false_keeps_default_on` — three-state JSON → BoxOptions mapping, including the explicit "Some(false) means default" sentinel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Reverted — branch force-pushed to origin/main ( |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Single line of production change at src/guest/src/container/spec.rs:285:
-let mut caps = all_capabilities();
+let mut caps = std::collections::HashSet::new();
The trust boundary becomes the container; operator opts into each cap
via --cap NAME=1 or --cap ALL=1. The --cap mechanism, REST wire, and
test infrastructure from prior commits stay; only the starting set
changes.
Tests updated:
- build_capabilities_default_baseline_is_all_caps → is_empty
Asserts CapEff empty + SysAdmin/NetAdmin/SysPtrace/SysModule NOT
present (sanity vs future re-flip).
- build_capabilities_drop_removes_only_named_cap → drop_after_opt_in_*
Original was vacuous under default-empty. Rewritten: opt in
SYS_ADMIN+NET_ADMIN, drop SYS_ADMIN, assert NET_ADMIN survives.
- new build_capabilities_opt_in_via_cap_name_eq_1: minimal additive
case.
- new build_capabilities_all_eq_1_yields_full_set: --cap ALL=1
escape hatch pin.
cargo check -p boxlite-guest --target x86_64-unknown-linux-musl passed
(20s). Test linking needs libseccomp.so on the host; not available
here, runs on CI.
Two-side toggle proof (production line):
let mut caps = all_capabilities(); // revert the flip
→ default_baseline_is_empty reds with
'default baseline must be empty (CapEff = 0); got 41 caps'
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
--cap NAME=0|1default all 1Test plan
0xa80425fb(docker 14)0x1ffffffffff(≥38 of 41 OCI caps, incl. SYS_ADMIN)--cap SYS_ADMIN=0CAP_prefix thatCapability::from_strrejected)--cap ALL=0--cap-add)--cap ALL=0 --cap SYS_ADMIN=1cap_overridesSYS-ADMIN=0)warn-and-skip, box still startsmount tmpfsw/ default capsmount tmpfsw/--cap SYS_ADMIN=0permission denied (are you root?))POST /v1/boxescarryingcap_overridesdeny_unknown_fields4xx's the bodyBoxOptions.cap_overrides28 tests across CLI parse, runtime, guest unit, REST wire, and alpine e2e — two-side verified where the contract is non-obvious (TTY exec inheritance, mount enforcement, cap_overrides round-trip).
DinD/privileged plumbing (
/proc/syswritable, cgroup mounts) is intentionally out of scope — stacked draft PR.