feat-stress(disk_guard): structural reserve via fallocate, replacing the policy walls by G4614 · Pull Request #618 · boxlite-ai/boxlite

G4614 · 2026-05-28T12:13:26Z

use fallocate to reserve some space to ensure the box alive and the ability to remove something

Test plan

The one test that decides whether #618 works

release_unblocks_writes_on_a_full_host (GATED integration, src/cli/tests/reserve_release.rs)

Three properties have to all be true for the structural reserve to mean anything:

The 64 MiB is physically allocated, not sparse — other writers can't take it.
When the host fs is truly at f_bavail = 0, the reserve still owns its blocks.
unlink(2) at zero free still works (metadata-only), and the freed bytes immediately become writable again.

This test exercises all three end-to-end on a real bounded mount:

1. bootstrap reserve on $BOXLITE_RESERVE_TEST_HOME
2. progressive-chunk fill (4 MiB → 64 KiB → 4 KiB → 1 byte)
   until even a 1-byte write returns ENOSPC                 ← proves prop 1+2: reserve owns its blocks
3. probe-write 1 byte → expect ENOSPC                       ← confirms host is genuinely full
4. boxlite reserve-release                                  ← proves prop 3: unlink works at 0 free
5. same probe-write → expect success                        ← proves bytes really return to fs

Gated behind BOXLITE_RESERVE_TEST_HOME=/path/to/bounded-mount. CI can't reach this without a privileged tmpfs setup; the GATED env-var pattern lets reviewers run it on a dev box with two lines:

sudo mount -t tmpfs -o size=256M tmpfs /tmp/m
BOXLITE_RESERVE_TEST_HOME=/tmp/m \
  make test:integration:cli FILTER=release_unblocks_writes_on_a_full_host

Two-side verified in the commit message (efb556a): the original single-pass version of this test mistakenly passed because the 4 MiB chunk fill left a few-MiB gap, and the 1-byte probe slipped through. The multi-pass cascade was the fix and is documented inline.

Supporting coverage (21 tests, brief)

The end-to-end above is the load-bearing test. The remaining 21 tests guard properties that have to hold for the e2e to even be reachable, plus orthogonal correctness concerns:

concern	tests	proves
reserve is physically allocated, not sparse	`ensure_reserve_allocates_real_blocks`	statvfs `f_bavail` drops by ~64 MiB
release returns bytes to fs free pool	`release_returns_bytes_to_host_available`	statvfs `f_bavail` rises by ~64 MiB after unlink
bootstrap / idempotency / self-heal	4 unit + 4 integration	every `boxlite` cmd auto-creates the reserve; double-call no-ops; external `rm`/partial file/4-thread race all top up to exactly 64 MiB
recovery race fix (my `865ce86`)	3 unit	hysteresis: `host_free < 2 × RESERVE_BYTES` defers top-up so `boxlite reserve-release` actually leaves headroom for the next `rm`
box-cycle invariance (my `0a06569`)	2 integration	release with a running box leaves the box `Running`; 5 release/restore cycles likewise
ENOSPC hint plumbing	4 unit (CLI matcher)	kernel ENOSPC errno 28 / "No space left" substring across anyhow chain → operator hint; ENOENT / 401 don't false-positive

All seven tests marked under "race fix" / "self-heal" / "box-cycle" are two-side verified — for each, a specific regression was injected and the test was observed to fail on the assertion that names the property. Details in the individual commit messages (865ce86, 3078b3d, 0a06569).

How to run

make test:unit:rust FILTER=reserve::                                          # 11 unit tests, no VM
make test:integration:cli FILTER='reserve_release|box_survives|looks_like_enospc'  # 11 integration / CLI-binary tests, needs VM + alpine pull
# Plus the GATED e2e above

Known gaps (deliberate, with rationale)

Real f_bavail = 0 for every CLI command — only the load-bearing e2e reaches this state. Filling shared /tmp to verify every other command would destabilize CI; the four ENOSPC matcher tests cover the surface-area indirection that turns a kernel ENOSPC into the operator hint.
fallocate itself returning ENOSPC at attempt-time — the hysteresis guard prevents reaching fallocate whenever host_free < 2 × RESERVE_BYTES; the only entry into that error path is statvfs misreporting, which is production-unreachable.
Kernel atomicity of fallocate mid-crash — Linux contract, not boxlite's responsibility. Whatever meta.len() the kernel leaves behind, the partial-reserve self-heal test covers recovery on next runtime build.
Cross-FS semantics (XFS / btrfs / ZFS) — CI runs on ext4; XFS / btrfs share the POSIX fallocate(mode=0) contract; ZFS is not a supported deployment target (documented in the module header).

…icy walls Replaces the entire previous boxlite-ai#618 (admission guard + recovery budget + auto-GC + per-command statvfs check, ~1500 LOC) with a structural fallocate-based reserve (~200 LOC). The kernel now enforces the floor at every write(2); boxlite owns only the reserve file's lifecycle and the operator's recovery affordance. What lands: - boxlite::util::reserve — `ensure_reserve(home)` preallocates 64 MiB into `$BOXLITE_HOME/.reserve` via fallocate(mode=0). Idempotent + self-healing: top-up if size dropped, recreate if the file was removed. fallback to a 64 MiB zero-write when fallocate returns EOPNOTSUPP (tmpfs / some FUSE backends). - RuntimeImpl::new calls ensure_reserve right after layout.prepare(). From that moment on the host filesystem's f_bavail is 64 MiB lower for every writer — boxlite, the operator's other tools, anything else. No per-command statvfs poll, no policy table to maintain. - `boxlite reserve-release` CLI command — emergency: unlink the reserve so the operator can run gc / rm on a full host. unlink(2) is metadata-only on ext4/xfs/btrfs, so it works at 0 free. The next runtime construction will lay the reserve back down automatically. - CLI dispatcher catches ENOSPC chains and prints a one-line hint pointing at `boxlite reserve-release`. Substring + raw_os_error match so it works whether the error came from std::io::Error, BoxliteError::Storage(String), or a wrapped reqwest body upload. What goes away from the original boxlite-ai#618 scope (deliberately): - DiskSpaceTask init task + classify() three-tier thresholds - enforce_recovery_budget calls in 6 CLI commands + 6 REST handlers - periodic_recovery_budget_monitor for serve - boxlite gc + collect_garbage + sweep_orphan_disk_images + auto-GC self-heal (these were `prevention via reactive recovery`; with the structural reserve the equivalent recovery is one operator-driven `boxlite reserve-release` + manual cleanup, which is more predictable than auto-GC and surfaces ENOSPC cleanly to scripts) boxlite-ai#639 (GC scope expansion) and boxlite-ai#640 (RuntimeBackend wiring + boxlite df) will need to be restructured as standalone follow-ups since their base in this branch is now gone — addressed in a separate change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`ensure_reserve` was unconditional: every RuntimeImpl construction re-acquired the 64 MiB reserve via fallocate. That breaks the documented recovery flow: host: 0 free, .reserve = 64 MiB $ boxlite reserve-release → host: 64 MiB free, reserve gone $ boxlite rm -f bigbox RuntimeImpl::new ensure_reserve → fallocate 64 MiB succeeds host: 0 free again, reserve back SQLite WAL grow → ENOSPC The operator's 64 MiB lifeboat is consumed by the reserve top-up before the recovery command can spend a byte of it. Fix: hysteretic top-up. If the reserve is missing AND host_free is below `2 × RESERVE_BYTES` (128 MiB), log a warn and defer. The recovery now has a full reserve's worth of headroom; the reserve self-heals on a later runtime construction after free recovers. Implementation: extract `ensure_reserve_with_free(home, free)` so the deferral rule can be exercised against hand-crafted free-byte counts without filling a real filesystem. Promote the test-only `statvfs_bavail_bytes` to module-scope `host_free_bytes` so we don't duplicate the libc dance. Also drop `boxlite gc` from the ENOSPC hint in main.rs — that command isn't in this PR's scope (deferred to boxlite-ai#639). Sending the operator to a non-existent command at the worst possible moment is worse than no hint. Two-side verified: with the deferral check reverted, the new `defers_topup_when_host_free_below_threshold` test fails on its "reserve file must not exist" assertion — proving the test exercises the new branch and that without hysteresis the race is silent (no error, just wrong outcome). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…anism The four existing reserve tests pinned the mechanism (file created / removed / recreated / idempotent). They didn't verify the user-visible outcome the reserve exists for: "after I run reserve-release on a full host, can I actually write again?" Add two tests covering that gap. - release_returns_bytes_to_host_available (lib unit) — pins that statvfs.f_bavail really climbs by ~RESERVE_BYTES after release, not just that the file disappears. - release_unblocks_writes_on_a_full_host (CLI integration, gated) — drives the whole story end-to-end on a real bounded fs: bootstrap reserve, fill the host with garbage until every chunk size from 4 MiB down to 1 byte returns ENOSPC, verify a small probe write fails with ENOSPC, run `boxlite reserve-release`, verify the *same* probe write now succeeds. Gated on BOXLITE_RESERVE_TEST_HOME pointing at a dedicated small mount. The progressive-chunk fill (4 MiB → 64 KiB → 4 KiB → 1 byte) is deliberate — without the small-chunk passes, the 4 MiB write returns ENOSPC while there are still 3.99 MiB free, and the 1-byte probe slips through and falsifies step 3. Pre-fix verified on a 256 MiB loop ext4: the single-pass version of this test mistakenly passed because the probe found 1 byte to land in; the multi-pass version now stably reproduces the "every write ENOSPCs" precondition. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI's macOS clippy job failed on src/boxlite/src/util/reserve.rs:135 — `libc::fallocate` is Linux-only, doesn't exist on macOS / *BSD. Wrap the fast-path fallocate call in `#[cfg(target_os = "linux")]` so non-Linux builds skip straight to the zero-write fallback, which works cross-platform. Semantics are identical (the reserve still consumes 64 MiB of real disk); only the cost shifts from one syscall to ~16 4-MiB sequential writes. boxlite's runtime is Linux-only anyway — the macOS build target exists only for CLI / SDK build-validation. Also gates the now-Linux-only `AsRawFd` import to keep the unused- import warning quiet on macOS clippy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Empirical finding from manual host-fill testing (2026-06-01): the `ensure_reserve` / `release_reserve` lifecycle touches only the `.reserve` inode under `$BOXLITE_HOME` and never any `boxes/<id>/` content, so running boxes are unaffected by recovery operations. That property is critical for the documented recovery flow — without it, operators hitting ENOSPC would have to choose between recovering disk and losing their boxes. Add two integration tests that pin this invariant without needing sudo or actual host-fs filling: - `reserve_release_does_not_disturb_running_box`: start an idle `sleep 3600` box, run `reserve-release`, assert the box is still `Running` and the next runtime construction auto-restores the reserve without disturbing the box again. - `box_survives_multiple_reserve_cycles`: stress the lifecycle with five release/restore cycles to catch stateful regressions that might not surface on the first iteration. What this does NOT verify (documented in module header): the `f_bavail=0` scenarios that need a privileged tmpfs mount — shim-process survival, CLI ENOSPC handling, metadata-only unlink at zero free. Those were empirically confirmed in manual testing and the runbook for re-running them is to be added separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds three reserve unit tests that close gaps in the existing matrix: - `zero_write_to_produces_exact_size_filled_with_zeros`: the fallocate-fallback path (used on tmpfs / NFSv3 / some FUSE where `fallocate(mode=0)` returns EOPNOTSUPP) was previously untested because we can't easily get a tempdir to return EOPNOTSUPP. Factor the zero-write loop out into a `zero_write_to(file, total)` helper and drive it directly with a fresh file. The helper is size-agnostic so 4 MiB + 7 bytes exercises the same full-buffer + tail-partial logic as 64 MiB. - `ensure_reserve_self_heals_partial_reserve_file`: pin the `meta.len() >= RESERVE_BYTES` self-heal branch. Pre-create the reserve at RESERVE_BYTES / 2 (simulating a crash mid-fallocate or an external truncate), call `ensure_reserve`, assert the file is topped up to the full RESERVE_BYTES. Without this, a loosened check (e.g. `> 0` instead of `>= RESERVE_BYTES`) would silently let undersized reserves stand. - `concurrent_ensure_reserve_is_safe`: spawn 4 threads all calling `ensure_reserve` on the same home dir, assert no thread errors and the final file is exactly RESERVE_BYTES (not torn, not doubled). Pins the architectural race-safety expectation against a future refactor that might introduce non-atomic operations (e.g. switching `truncate(false)` to `truncate(true)`). Two-side verified manually (logged here since the corresponding production change is only the small zero_write_to extraction, not a fix): - zero_write_to: replaced `remaining = total_bytes` with `remaining = total_bytes / 2`. Test failed: "zero_write_to must produce a file of exactly `total_bytes` long left: 2097155, right: 4194311" — proves the size assertion is real, not tautological. - self_heals_partial: replaced `meta.len() >= RESERVE_BYTES` with just `if meta exists` (no size check). Test failed: "ensure_reserve must top up a partial reserve to RESERVE_BYTES; saw 33554432, left: 33554432, right: 67108864" — proves the test exercises the self-heal branch. - concurrent: replaced `RESERVE_BYTES` in the fallocate call with `RESERVE_BYTES / 2`. Test failed: "after concurrent ensure_reserve, file must be exactly RESERVE_BYTES; got 33554432, left: 33554432, right: 67108864" — confirms the test catches at least the size-based regression class. The truncate(false) → truncate(true) race regression did *not* trip the test (fallocate-of-equal-size is essentially atomic on Linux), which is honest evidence that the test mostly pins regression against size mistakes, not race mistakes per se. Module-scope `host_free_bytes` is unchanged; gamnaansong's existing 4 tests for `looks_like_host_enospc` in main.rs already cover the ENOSPC hint plumbing — no gap there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cla-assistant · 2026-06-06T08:40:21Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ G4614
❌ Ubuntu

Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

G4614 mentioned this pull request May 29, 2026

test-stress(disk-guest): box rootfs is a bounded, isolated disk that survives a fill #623

Merged

G4614 changed the title ~~feat(disk): host disk-space admission guard + boxlite gc for orphaned image disks~~ feat-stress(disk): host disk-space admission guard + boxlite gc for orphaned image disks May 29, 2026

G4614 force-pushed the feat/cache-gc branch 2 times, most recently from f61692f to 64681a6 Compare June 1, 2026 04:20

G4614 changed the title ~~feat-stress(disk): host disk-space admission guard + boxlite gc for orphaned image disks~~ [high] feat-stress(disk): host disk-space admission guard + boxlite gc for orphaned image disks Jun 1, 2026

G4614 force-pushed the feat/cache-gc branch from 18ce4a5 to f75e08c Compare June 1, 2026 08:22

G4614 mentioned this pull request Jun 1, 2026

feat(volumes): per-volume size cap (size=N[K|M|G]) #636

Draft

G4614 force-pushed the feat/cache-gc branch from e0797f8 to f75e08c Compare June 1, 2026 10:10

This was referenced Jun 1, 2026

feat(security): remove host bind mounts; only managed volumes allowed #639

Open

test(net): hermetic + e2e coverage for #645 allowlist DNS resilience #640

Draft

G4614 force-pushed the feat/cache-gc branch from f75e08c to 246a8e1 Compare June 1, 2026 11:39

G4614 changed the title ~~[high] feat-stress(disk): host disk-space admission guard + boxlite gc for orphaned image disks~~ feat(disk_guard): structural reserve via fallocate, replacing the policy walls Jun 1, 2026

G4614 changed the title ~~feat(disk_guard): structural reserve via fallocate, replacing the policy walls~~ feat-stress(disk_guard): structural reserve via fallocate, replacing the policy walls Jun 1, 2026

Ubuntu and others added 5 commits June 1, 2026 13:22

G4614 marked this pull request as ready for review June 2, 2026 04:59

G4614 mentioned this pull request Jun 2, 2026

feat(disk_guard,gc,df): fallocate reserve + cache reclaim + boxlite df #647

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat-stress(disk_guard): structural reserve via fallocate, replacing the policy walls#618

feat-stress(disk_guard): structural reserve via fallocate, replacing the policy walls#618
G4614 wants to merge 6 commits into
boxlite-ai:mainfrom
G4614:feat/cache-gc

G4614 commented May 28, 2026 •

edited

Loading

Uh oh!

cla-assistant Bot commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

G4614 commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

The one test that decides whether #618 works

Supporting coverage (21 tests, brief)

How to run

Known gaps (deliberate, with rationale)

Uh oh!

cla-assistant Bot commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

G4614 commented May 28, 2026 •

edited

Loading