
fix(snapshot): harden snapshot subsystem with chain-walk safety and crash recovery #390

Merged
DorianZheng merged 1 commit into main from fix/snapshot-subsystem-hardening on Mar 16, 2026


@DorianZheng (Member)

Summary

  • Reverse deletion order in `remove()`: delete the DB record before the filesystem files to prevent phantom snapshots on crash (orphaned files are harmless and can be garbage-collected)
  • Validate backing path in crash recovery: when both the container disk and the snapshot disk exist during recovery, validate that the COW child's backing path actually points to the expected snapshot; preserve both files on mismatch for manual inspection
  • Log warning on chain walk errors: `read_backing_chain()` now emits `tracing::warn` instead of silently swallowing errors
  • Add `is_backing_dependency()` helper: walks full qcow2 backing chains to detect transitive dependencies; used by `remove()` to check container disks, other snapshots, and clone bases
  • Add 67 integration tests covering snapshot create/restore/remove, clone operations, and deep backing chain scenarios
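
To make the crash argument in the first bullet concrete, here is a minimal Rust sketch. It is not the boxlite implementation: `SnapshotStore`, its `HashMap`-backed `db` field, and `gc_orphans()` are hypothetical stand-ins for the real metadata store and GC. A crash between the two steps of `remove()` leaves only a file that no record points to, which a later GC pass can delete.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical in-memory stand-in for the snapshot metadata DB:
/// snapshot name -> path of its qcow2 disk file.
pub struct SnapshotStore {
    pub db: HashMap<String, PathBuf>,
}

impl SnapshotStore {
    /// Remove a snapshot, DB record first and disk file second.
    /// A crash between the two steps leaves only an unreferenced file,
    /// never a DB record pointing at a missing disk.
    pub fn remove(&mut self, name: &str) -> io::Result<()> {
        // Step 1: drop the record; the snapshot is no longer visible to users.
        let path = self.db.remove(name).ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, format!("no snapshot named {}", name))
        })?;
        // Step 2: delete the file. If this fails, the file is an orphan for GC.
        match fs::remove_file(&path) {
            Err(e) if e.kind() != io::ErrorKind::NotFound => Err(e),
            _ => Ok(()), // deleted, or already gone
        }
    }

    /// Hypothetical GC pass: delete files in `dir` that no DB record references.
    /// Returns how many orphaned files were removed.
    pub fn gc_orphans(&self, dir: &Path) -> io::Result<usize> {
        let mut removed = 0;
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_file() && !self.db.values().any(|p| *p == path) {
                fs::remove_file(&path)?;
                removed += 1;
            }
        }
        Ok(removed)
    }
}
```

With the previous order (file first, record second), a crash between the steps would leave a record whose disk is gone, which is the phantom-snapshot case this change removes.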

Test plan

  • cargo test -p boxlite --lib — 31 unit tests pass
  • cargo clippy -p boxlite --lib -- -D warnings — clean
  • cargo clippy -p boxlite --tests -- -D warnings — clean
  • Integration test suite (67 tests) — all pass


Fix three issues found during systems-level review of the snapshot/clone subsystem:

1. **Reverse deletion order in snapshot remove** (critical): Delete DB record
   before filesystem. The previous order (FS first, DB second) left orphaned DB
   records on crash, i.e. phantom snapshots visible to users. The new order
   leaves orphaned files (harmless, cleaned by GC) on crash instead.

2. **Validate backing path in crash recovery** (medium): When both container
   disk and snapshot disk exist during recovery (Case 2), validate that the
   COW child's backing path actually points to the expected snapshot before
   declaring success. Preserves both files on mismatch for manual inspection.

3. **Log warning on chain walk errors** (low): `read_backing_chain()` now
   emits `tracing::warn` instead of silently swallowing errors, aiding
   diagnosis of partial chain results.
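
Item 2 depends on reading the backing-file name out of the COW child's qcow2 header. The sketch below assumes the standard qcow2 header layout (magic `QFI\xfb`, big-endian `backing_file_offset` at byte 8 and `backing_file_size` at byte 16) and assumes relative backing names resolve against the child's directory. `read_backing_path`, `validate_case2`, and `Case2Outcome` are hypothetical names, not the boxlite API. On a mismatch it reports what it found and deletes nothing, mirroring the "preserve both files" behavior.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::{Path, PathBuf};

/// First four bytes of every qcow2 image: "QFI" followed by 0xfb.
const QCOW2_MAGIC: [u8; 4] = [b'Q', b'F', b'I', 0xfb];

/// Read the backing-file name from a qcow2 header, or None if there is none.
/// Layout (big-endian): magic 0..4, backing_file_offset (u64) 8..16,
/// backing_file_size (u32) 16..20.
pub fn read_backing_path(disk: &Path) -> io::Result<Option<String>> {
    let mut f = File::open(disk)?;
    let mut hdr = [0u8; 20];
    f.read_exact(&mut hdr)?;
    if hdr[..4] != QCOW2_MAGIC[..] {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a qcow2 image"));
    }
    let mut off = [0u8; 8];
    off.copy_from_slice(&hdr[8..16]);
    let mut len = [0u8; 4];
    len.copy_from_slice(&hdr[16..20]);
    let (offset, size) = (u64::from_be_bytes(off), u32::from_be_bytes(len));
    if offset == 0 || size == 0 {
        return Ok(None); // offset 0 means "no backing file"
    }
    if size > 1023 {
        // The qcow2 spec caps backing file names at 1023 bytes.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "backing name too long"));
    }
    let mut name = vec![0u8; size as usize];
    f.seek(SeekFrom::Start(offset))?;
    f.read_exact(&mut name)?;
    String::from_utf8(name)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Outcome of the "both disks exist" recovery case in this sketch.
#[derive(Debug, PartialEq)]
pub enum Case2Outcome {
    /// The container disk is a COW child of the expected snapshot.
    Recovered,
    /// Backing path points elsewhere (or nowhere); both files are left untouched.
    PreservedForInspection { found: Option<String> },
}

/// Validate that `container_disk` really backs onto `snapshot_disk` before
/// declaring recovery successful. Nothing is deleted on mismatch.
pub fn validate_case2(container_disk: &Path, snapshot_disk: &Path) -> io::Result<Case2Outcome> {
    let found = read_backing_path(container_disk)?;
    let resolved = found.as_ref().map(|name| {
        let p = PathBuf::from(name);
        // Assumption: relative backing names resolve against the child's directory.
        if p.is_relative() {
            container_disk.parent().unwrap_or_else(|| Path::new("")).join(p)
        } else {
            p
        }
    });
    if resolved.as_ref().map(|p| p.as_path()) == Some(snapshot_disk) {
        Ok(Case2Outcome::Recovered)
    } else {
        Ok(Case2Outcome::PreservedForInspection { found })
    }
}
```

A production check would likely also canonicalize both paths before comparing; the sketch compares them as given.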
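
For item 3, a sketch of a chain walk that logs instead of swallowing errors. The image reader is passed in as a closure (an assumption made to keep the example free of a qcow2 parser), `eprintln!` stands in for `tracing::warn!`, and the signature is illustrative rather than the real `read_backing_chain()`. The `seen` set is an extra guard assumed here, so a corrupted chain that loops back on itself ends with a warning instead of spinning forever.

```rust
use std::collections::HashSet;
use std::io;
use std::path::{Path, PathBuf};

/// Walk the backing chain of `disk`, returning each backing image in order
/// (immediate parent first, base image last). `read_backing` returns an
/// image's immediate parent, or None at the base. On a read error or a cycle,
/// log a warning and return the partial chain instead of failing silently.
pub fn read_backing_chain<F>(disk: &Path, mut read_backing: F) -> Vec<PathBuf>
where
    F: FnMut(&Path) -> io::Result<Option<PathBuf>>,
{
    let mut chain = Vec::new();
    let mut seen = HashSet::new();
    let mut current = disk.to_path_buf();
    seen.insert(current.clone());
    loop {
        match read_backing(&current) {
            Ok(Some(parent)) => {
                if !seen.insert(parent.clone()) {
                    // eprintln! stands in for tracing::warn! to keep the sketch crate-free.
                    eprintln!("warn: backing chain cycle at {}", parent.display());
                    break;
                }
                chain.push(parent.clone());
                current = parent;
            }
            Ok(None) => break, // reached the base image
            Err(e) => {
                eprintln!("warn: backing chain walk stopped at {}: {}", current.display(), e);
                break;
            }
        }
    }
    chain
}
```

Callers still receive the partial chain, but the warning now records where and why the walk stopped.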

Also adds `is_backing_dependency()` helper that walks full qcow2 backing
chains to detect transitive dependencies, used by `remove()` to check
container disks, other snapshots, and clone bases before deletion.
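
A possible shape for `is_backing_dependency()`, again assuming an injected parent-reader closure as a stand-in for real qcow2 header parsing; the signature is a guess, not the boxlite API. The hypothetical `find_dependent` helper shows how `remove()` might apply it across container disks, other snapshots, and clone bases. One design choice here is an assumption the PR does not state: read errors propagate, so an unreadable chain blocks deletion instead of counting as "no dependency".

```rust
use std::collections::HashSet;
use std::io;
use std::path::{Path, PathBuf};

/// Returns Ok(true) if `target` appears anywhere in `candidate`'s backing
/// chain, i.e. `candidate` depends on it directly or transitively.
/// `read_backing` returns an image's immediate parent (None at the base).
/// Read errors are propagated rather than treated as "no dependency".
pub fn is_backing_dependency<F>(target: &Path, candidate: &Path, read_backing: &mut F) -> io::Result<bool>
where
    F: FnMut(&Path) -> io::Result<Option<PathBuf>>,
{
    let mut seen = HashSet::new();
    let mut current = candidate.to_path_buf();
    // `insert` returns false on a revisit, which stops the walk on a cycle.
    while seen.insert(current.clone()) {
        match read_backing(&current)? {
            Some(parent) => {
                if parent.as_path() == target {
                    return Ok(true);
                }
                current = parent;
            }
            None => return Ok(false), // reached the base image
        }
    }
    Ok(false)
}

/// Before removing `snapshot`, check every other known disk (container disks,
/// other snapshots, clone bases). Returns the first dependent disk, if any.
pub fn find_dependent<F>(snapshot: &Path, disks: &[PathBuf], mut read_backing: F) -> io::Result<Option<PathBuf>>
where
    F: FnMut(&Path) -> io::Result<Option<PathBuf>>,
{
    for disk in disks {
        if disk.as_path() != snapshot && is_backing_dependency(snapshot, disk, &mut read_backing)? {
            return Ok(Some(disk.clone()));
        }
    }
    Ok(None)
}
```

Walking the full chain matters because a container disk two levels above a snapshot (container → snap2 → snap1) still needs snap1, even though its immediate parent is snap2.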

Includes 67 integration tests covering snapshot create/restore/remove,
clone operations, and deep backing chain scenarios.

DorianZheng merged commit 9ecfe0c into main on Mar 16, 2026 (18 checks passed)
DorianZheng deleted the fix/snapshot-subsystem-hardening branch on March 16, 2026 at 15:34