feat(vmm): add KVM health check to detect broken nested virtualization #417
Merged
Branch updated: 425a19e to d075d9a
Replace the two separate functions (check_virtualization_support + check_kvm_health) with a single SystemCheck::run() that validates all host requirements and returns a struct owning validated resources.

- SystemCheck::run() opens /dev/kvm, runs a HLT smoke test to catch broken nested KVM, and holds the fd for the process lifetime
- macOS: checks Apple Silicon + Hypervisor.framework
- Use Display formatting in default_runtime panic for readable errors
- Update /dev/kvm permission message: suggest newgrp kvm
- Document EC2 c8i + Amazon Linux 2023 known issue in FAQ
Branch updated: d075d9a to 8b56e4e
DorianZheng added a commit that referenced this pull request (Apr 2, 2026)
The smoke test from #417 failed on EC2 c8i instances because it didn't initialize vCPU registers before KVM_RUN. Without setting CS base=0 and RIP=0, the CPU starts at the x86 reset vector (0xFFFFFFF0), which is unmapped, causing KVM_EXIT_UNKNOWN on nested KVM.

This was misdiagnosed as "broken nested virtualization on Amazon Linux 2023 / EC2 c8i". In fact, both Amazon Linux 2023 (kernel 6.1) and Ubuntu 24.04 (kernel 6.17) work correctly on c8i when the vCPU is properly initialized.

The fix:
- Move smoke test to C (kvm_smoke.c); Rust's libc::ioctl() variadic FFI has ABI issues with KVM ioctls on some platforms
- Properly init vCPU state: CS base=0, selector=0, RIP=0, RFLAGS=0x2
- Verified on EC2 c8i with both AL2023 and Ubuntu 24.04

References:
- LWN "Using the KVM API": https://lwn.net/Articles/658511/
- dpw/kvm-hello-world: https://github.com/dpw/kvm-hello-world

Fixes the false positive from #417.
Summary
- Update the `/dev/kvm` permission error to suggest `newgrp kvm` instead of logout/login

Root Cause
Amazon Linux 2023 (kernel 6.1) on EC2 c8i instances (Intel Granite Rapids) has broken nested KVM. A minimal test executing a single HLT instruction returns `KVM_EXIT_UNKNOWN` (exit reason 0) instead of `KVM_EXIT_HLT` (5). The vCPU cannot execute any guest code.

Ubuntu 24.04 (kernel 6.17) on the same hardware works correctly.
The existing check only verified that `/dev/kvm` exists; this PR adds a functional check that catches this class of issues with a clear error message pointing to the FAQ.

Test plan