feat(vmm): add KVM health check to detect broken nested virtualization #417

Merged
DorianZheng merged 1 commit into main from fix/kvm-health-check-nested-virt on Apr 1, 2026
@DorianZheng

Member

Summary

  • Add runtime KVM health check that detects broken nested virtualization before VM creation
  • Update /dev/kvm permission error to suggest newgrp kvm instead of logout/login
  • Document EC2 c8i + Amazon Linux 2023 known issue in FAQ

Root Cause

Amazon Linux 2023 (kernel 6.1) on EC2 c8i instances (Intel Granite Rapids) has broken nested KVM. A minimal test executing a single HLT instruction returns KVM_EXIT_UNKNOWN (exit reason 0) instead of KVM_EXIT_HLT (5). The vCPU cannot execute any guest code.

Ubuntu 24.04 (kernel 6.17) on the same hardware works correctly.

The existing check only verified /dev/kvm exists — this PR adds a functional check that catches this class of issues with a clear error message pointing to the FAQ.
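
As a rough illustration of the functional check described above, the sketch below classifies the exit reason reported after running a guest whose only instruction is HLT. The two constants match the values quoted in this description (`KVM_EXIT_UNKNOWN` = 0, `KVM_EXIT_HLT` = 5); the `SmokeResult` type, the function names, and the message wording are hypothetical and are not taken from the PR's code.

```rust
/// Exit reasons as defined in the Linux KVM UAPI header (<linux/kvm.h>).
const KVM_EXIT_UNKNOWN: u32 = 0;
const KVM_EXIT_HLT: u32 = 5;

/// Outcome of the HLT smoke test (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum SmokeResult {
    Healthy,
    Broken { exit_reason: u32 },
}

/// Interpret the exit reason reported after KVM_RUN on a guest that
/// contains a single HLT instruction.
fn classify_exit(exit_reason: u32) -> SmokeResult {
    if exit_reason == KVM_EXIT_HLT {
        SmokeResult::Healthy
    } else {
        SmokeResult::Broken { exit_reason }
    }
}

/// Turn the result into a message for the user (wording is illustrative).
fn describe(result: &SmokeResult) -> String {
    match result {
        SmokeResult::Healthy => "KVM health check passed".to_string(),
        SmokeResult::Broken { exit_reason } if *exit_reason == KVM_EXIT_UNKNOWN => {
            "KVM health check failed: KVM_EXIT_UNKNOWN (0), the vCPU could not \
             execute guest code; see the FAQ"
                .to_string()
        }
        SmokeResult::Broken { exit_reason } => {
            format!("KVM health check failed: unexpected exit reason {}", exit_reason)
        }
    }
}
```

Only an exit reason of 5 counts as healthy here; treating every other value as a failure keeps the check strict, at the cost of flagging any setup mistake in the test itself as a host problem.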

Test plan

  • Verified on EC2 c8i Amazon Linux 2023: minimal KVM test returns KVM_EXIT_UNKNOWN
  • Verified on EC2 c8i Ubuntu 24.04: BoxLite works correctly
  • Verified on GCP (Emerald Rapids): BoxLite works correctly
  • CI green on all platforms

@DorianZheng DorianZheng force-pushed the fix/kvm-health-check-nested-virt branch from 425a19e to d075d9a Compare April 1, 2026 06:54
Replace the two separate functions (check_virtualization_support +
check_kvm_health) with a single SystemCheck::run() that validates
all host requirements and returns a struct owning validated resources.

- SystemCheck::run() opens /dev/kvm, runs a HLT smoke test to catch
  broken nested KVM, and holds the fd for the process lifetime
- macOS: checks Apple Silicon + Hypervisor.framework
- Use Display formatting in default_runtime panic for readable errors
- Update /dev/kvm permission message: suggest newgrp kvm
- Document EC2 c8i + Amazon Linux 2023 known issue in FAQ
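
A minimal sketch of the "single check that returns a struct owning validated resources" idea from the commit message above, assuming a simplified shape. The real `SystemCheck::run()` signature is not shown on this page, so `run_at`, `SystemCheckError`, and the message text are hypothetical; the HLT smoke test step is left as a comment.

```rust
use std::fmt;
use std::fs::{File, OpenOptions};
use std::io::ErrorKind;
use std::path::Path;

/// Reasons the host check can fail (hypothetical error type).
#[derive(Debug)]
enum SystemCheckError {
    KvmMissing,
    KvmPermissionDenied,
    Other(std::io::Error),
}

/// Display gives a readable message when the error ends up in a panic or log.
impl fmt::Display for SystemCheckError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::KvmMissing => write!(f, "/dev/kvm not found"),
            Self::KvmPermissionDenied => write!(
                f,
                "permission denied opening /dev/kvm; after joining the kvm group, run `newgrp kvm`"
            ),
            Self::Other(e) => write!(f, "failed to open the KVM device: {}", e),
        }
    }
}

/// Owns the validated device handle, so holding a `SystemCheck` is proof
/// that the check ran and the descriptor stays open for as long as it lives.
#[derive(Debug)]
struct SystemCheck {
    kvm: File,
}

impl SystemCheck {
    /// Open the KVM device at `path` (a parameter only so the sketch is testable).
    fn run_at(path: &Path) -> Result<Self, SystemCheckError> {
        let kvm = OpenOptions::new()
            .read(true)
            .write(true)
            .open(path)
            .map_err(|e| match e.kind() {
                ErrorKind::NotFound => SystemCheckError::KvmMissing,
                ErrorKind::PermissionDenied => SystemCheckError::KvmPermissionDenied,
                _ => SystemCheckError::Other(e),
            })?;
        // A real check would run the HLT smoke test on `kvm` at this point.
        Ok(Self { kvm })
    }
}
```

A caller would use something like `SystemCheck::run_at(Path::new("/dev/kvm"))` once at startup and keep the returned value alive for the process lifetime.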
@DorianZheng DorianZheng force-pushed the fix/kvm-health-check-nested-virt branch from d075d9a to 8b56e4e Compare April 1, 2026 07:40
@DorianZheng DorianZheng merged commit 8b84a20 into main Apr 1, 2026
20 checks passed
@DorianZheng DorianZheng deleted the fix/kvm-health-check-nested-virt branch April 1, 2026 07:44
DorianZheng added a commit that referenced this pull request Apr 2, 2026
The smoke test from #417 failed on EC2 c8i instances because it didn't
initialize vCPU registers before KVM_RUN. Without setting CS base=0 and
RIP=0, the CPU starts at the x86 reset vector (0xFFFFFFF0) which is
unmapped, causing KVM_EXIT_UNKNOWN on nested KVM.

This was misdiagnosed as "broken nested virtualization on Amazon Linux
2023 / EC2 c8i". In fact, both Amazon Linux 2023 (kernel 6.1) and
Ubuntu 24.04 (kernel 6.17) work correctly on c8i when the vCPU is
properly initialized.

The fix:
- Move smoke test to C (kvm_smoke.c) — Rust's libc::ioctl() variadic
  FFI has ABI issues with KVM ioctls on some platforms
- Properly init vCPU state: CS base=0, selector=0, RIP=0, RFLAGS=0x2
- Verified on EC2 c8i with both AL2023 and Ubuntu 24.04

References:
- LWN "Using the KVM API": https://lwn.net/Articles/658511/
- dpw/kvm-hello-world: https://github.com/dpw/kvm-hello-world

Fixes the false positive from #417.
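
The address arithmetic behind this diagnosis can be sketched with a small model. It is not the code from kvm_smoke.c: `VcpuInit` and `GuestMemory` are hypothetical types, and the memory layout (one 4 KiB page at guest-physical address 0 with HLT at offset 0) is an assumption made for illustration. The reset values (CS base 0xFFFF0000, IP 0xFFF0) are the architectural x86 reset state, and 0xF4 is the HLT opcode.

```rust
/// The parts of vCPU state that decide where the first instruction is fetched.
#[derive(Debug, Clone, Copy)]
struct VcpuInit {
    cs_base: u64,
    cs_selector: u16,
    rip: u64,
    rflags: u64,
}

impl VcpuInit {
    /// Architectural x86 reset state, which a freshly created vCPU keeps
    /// unless the registers are set before KVM_RUN.
    fn reset_default() -> Self {
        Self { cs_base: 0xFFFF_0000, cs_selector: 0xF000, rip: 0xFFF0, rflags: 0x2 }
    }

    /// State listed in the commit message: CS base=0, selector=0, RIP=0, RFLAGS=0x2.
    fn smoke_test() -> Self {
        Self { cs_base: 0, cs_selector: 0, rip: 0, rflags: 0x2 }
    }

    /// Linear address of the first instruction fetch: CS base + RIP.
    fn first_fetch(&self) -> u64 {
        self.cs_base + self.rip
    }
}

const HLT_OPCODE: u8 = 0xF4;

/// Guest memory model: a single region starting at `base` (assumed layout).
struct GuestMemory {
    base: u64,
    bytes: Vec<u8>,
}

impl GuestMemory {
    /// One 4 KiB page at guest-physical 0 with HLT as its first byte.
    fn with_hlt_at_zero() -> Self {
        let mut bytes = vec![0u8; 4096];
        bytes[0] = HLT_OPCODE;
        Self { base: 0, bytes }
    }

    /// Read a byte, or `None` if the address is outside the mapped region.
    fn read(&self, addr: u64) -> Option<u8> {
        let offset = addr.checked_sub(self.base)?;
        if offset < self.bytes.len() as u64 {
            Some(self.bytes[offset as usize])
        } else {
            None
        }
    }
}
```

In this model the reset state fetches from 0xFFFFFFF0, which lies outside the mapped page, while the initialized state fetches from address 0 and finds the HLT byte.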