Skip to content

feat(vmm): add HypervisorProbe for post-failure VM diagnostics#430

Merged
DorianZheng merged 2 commits into
mainfrom
feat/hypervisor-probe-diagnostics
Apr 4, 2026
Merged

feat(vmm): add HypervisorProbe for post-failure VM diagnostics#430
DorianZheng merged 2 commits into
mainfrom
feat/hypervisor-probe-diagnostics

Conversation

@DorianZheng

Copy link
Copy Markdown
Member

Summary

  • Add HypervisorProbe trait for platform-abstracted hypervisor validation with startup_check() and diagnose_create_failure() methods
  • On macOS, when krun_start_enter() fails, HvfProbe calls hv_vm_create() directly to get the exact HVF error code that libkrun discards (collapses all to EINVAL)
  • Maps HV_NO_RESOURCESResourceExhausted with actionable message (address space limit), HV_DENIED → entitlement error, HV_BUSY → post-creation failure
  • Zero-cost on happy path — probe only runs after a failure

Context

macOS Hypervisor.framework has a hard limit of 128 VM address spaces (kern.hv.max_address_spaces). When exhausted, hv_vm_create() returns HV_NO_RESOURCES (0xfae94005), but libkrun discards the code at hvf/src/lib.rs:260 and returns generic -EINVAL (-22). Users see a misleading "rootfs" error.

Changes

File Change
src/shared/src/errors.rs Add ResourceExhausted(String) variant
src/boxlite/src/system_check.rs HypervisorProbe trait, HvfProbe (macOS), KvmProbe (Linux), HVF FFI
src/boxlite/src/vmm/krun/engine.rs Wire probe into enter() failure path
src/boxlite/src/vmm/krun/mod.rs Improve EINVAL message in check_status()
src/ffi/src/error.rs Add ResourceExhausted = 20 to FFI error codes
sdks/c/include/boxlite.h Auto-regenerated C header

Test plan

  • cargo clippy -p boxlite -p boxlite-shared --tests -- -D warnings (clean)
  • cargo test -p boxlite --lib (614 passed, 0 failed)
  • cargo test -p boxlite-shared (1 passed)
  • HVF constant verification test (hvf_ffi_constants)
  • HVF startup check test (hvf_probe_startup_check)
  • System check test (system_check_runs)
  • Pre-push hook: 220 Rust integration tests passed, C unit tests passed

macOS Hypervisor.framework returns HV_NO_RESOURCES (0xfae94005) when
the 128 VM address space limit is exhausted, but libkrun discards the
specific error code and collapses all failures to EINVAL (-22).

Add a HypervisorProbe trait that provides platform-abstracted post-failure
diagnostics. On macOS, when krun_start_enter() fails, the HvfProbe calls
hv_vm_create() directly to reproduce and identify the exact HVF error:
- HV_NO_RESOURCES → ResourceExhausted with actionable message
- HV_SUCCESS → not HVF-related, return original error
- HV_BUSY → VM created but failed post-creation
- HV_DENIED → missing entitlement

This is zero-cost on the happy path — the probe only runs after failure.

Changes:
- Add ResourceExhausted error variant to BoxliteError
- Add HypervisorProbe trait with HvfProbe (macOS) and KvmProbe (Linux)
- Wire probe into KrunVmmInstance::enter() failure path
- Improve EINVAL message in check_status() to list both causes
- Add ResourceExhausted to FFI error codes
@DorianZheng DorianZheng merged commit 0539856 into main Apr 4, 2026
20 checks passed
@DorianZheng DorianZheng deleted the feat/hypervisor-probe-diagnostics branch April 4, 2026 11:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant