feat: add Intel client GPU reader (Arc/Iris/Xe) for Windows and Linux #245
Adds the first half of the Intel client GPU implementation described in #244: a sysfs-based reader for both discrete Intel Arc (A-series / B-series "Battlemage") and integrated graphics (Iris Xe, Xe-LPG, Arc iGPU on Core Ultra / Meteor Lake).

New modules:
- `src/device/readers/intel_gpu_linux.rs`: `IntelGpuReader` implementation. Walks `/sys/class/drm/card*` for vendor `0x8086` cards driven by `i915` or `xe`, with a fallback to `lspci -n` for containers without `/sys` access. Filters out Habana / Gaudi (vendor `0x1da3`) and Intel-vendor non-GPU devices by requiring the graphics class and driver. Exposes a `has_intel_client_gpu()` detector.
- `src/device/readers/intel_gpu_linux/tests.rs`: unit tests covering discovery, the Habana exclusion, variant classification (discrete vs integrated), and the `lspci -n` line filter.
- `src/device/readers/intel_gpu_sysfs.rs`: low-level sysfs I/O helpers (`read_memory_bytes` / `read_frequency_mhz` / `read_temperature_celsius` / `read_power_watts`). Hosts the `MemoryVariant` enum so the reader can stay under the 500-line budget.
- `src/device/readers/intel_gpu_names.rs`: PCI device ID → friendly marketing-name lookup for Arc A-series, Battlemage, Tiger / Alder / Raptor / Meteor / Ice / Rocket / Arrow / Lunar Lake. Unknown IDs fall back to `Intel Graphics (device 0xXXXX)`.

v1 scope limitations are documented in the module docstring and surfaced in the `GpuInfo.detail` map:
- Utilization always reports `0.0` with a `"Utilization"` note pointing at `intel_gpu_top` (engine-busy delta tracking is follow-up work).
- Integrated GPUs report `total_memory = 0` with a `"Memory"` note (no fabrication from system RAM).
- Per-process accounting is empty (follow-up).

Implementation is stateless and feature/platform-gated so musl builds still work and non-Intel hosts compile cleanly.

Refs #244
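For readers unfamiliar with the `lspci -n` fallback mentioned above, the line filter can be sketched roughly as follows. This is an illustration only, not the code in the PR: the function name `is_intel_gpu_lspci_line` is hypothetical, and it assumes the usual numeric output shape `<slot> <class>: <vendor>:<device> (rev xx)`, with PCI base class `03` (display controllers) standing in for "the graphics class". Driver information is not available on this path, so the sketch checks class and vendor only.

```rust
/// Hypothetical helper (not taken from the PR): returns true when one line of
/// `lspci -n` output describes an Intel (vendor 8086) display-class device.
fn is_intel_gpu_lspci_line(line: &str) -> bool {
    // Assumed line shape: "<slot> <class>: <vendor>:<device> [(rev xx)]"
    let mut fields = line.split_whitespace();
    let _slot = fields.next();
    let class = match fields.next() {
        Some(c) => c.trim_end_matches(':'),
        None => return false,
    };
    let ids = match fields.next() {
        Some(i) => i,
        None => return false,
    };
    let vendor = ids.split(':').next().unwrap_or("");

    // Base class 0x03 covers VGA (0300), 3D (0302) and other display (0380)
    // controllers. Habana / Gaudi (vendor 1da3) fails the vendor check, and
    // Intel audio or management devices fail the class check.
    class.len() == 4 && class.starts_with("03") && vendor.eq_ignore_ascii_case("8086")
}
```

A line such as `00:02.0 0300: 8086:46a6 (rev 0c)` passes the filter, while an Intel audio function (class `0403`) or a Habana accelerator (vendor `1da3`) does not.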
Adds the Windows-side companion to the Linux Intel client GPU reader (#244). Mirrors `src/device/readers/amd_windows.rs` line-for-line: same thread-local WMI connection cell, same `Win32_VideoController` query, same defensive `GpuInfo` template that pins NVIDIA-only fields to `None`. The only differences are the name filter and a discrete-vs-integrated discriminator. `is_intel_gpu_name()` is factored out as a free function so it can be unit-tested without WMI: it requires both an "intel" substring AND a graphics-family token (`arc`, `iris`, `uhd graphics`, `hd graphics`, `xe graphics`, or `intel graphics`), so devices like "Intel Display Audio", "Intel(R) Management Engine Interface", and "Intel(R) Smart Sound" are excluded even though they contain "Intel" in their controller name. `classify_intel_variant()` uses an Arc model-number heuristic — single letter (A/B/C/D) followed by 3+ digits — to separate discrete cards (A770, B580, …) from the Meteor Lake iGPU which ships as "Intel(R) Arc(TM) Graphics" with no number. v1 metric coverage matches the AMD-on-Windows reader: only `total_memory` from `Win32_VideoController.AdapterRAM` (subject to the same 32-bit / 4GB WMI limitation, surfaced with the same warning text). Utilization, temperature, fine-grained power, and per-process accounting all return zero and the `detail["Note"]` field points the operator at Level Zero / `xpu-smi` for the deferred follow-up. Refs #244
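The name filter and the discrete-vs-integrated heuristic described above could look roughly like this. It is a minimal sketch under stated assumptions rather than the PR's implementation: the token list is copied from the description, but the `Variant` enum, the case-insensitive substring matching, and the word-splitting rule are choices made here for illustration.

```rust
/// Graphics-family tokens listed in the commit message.
const FAMILY_TOKENS: &[&str] = &[
    "arc",
    "iris",
    "uhd graphics",
    "hd graphics",
    "xe graphics",
    "intel graphics",
];

/// Hypothetical result type; the real reader may use a different enum.
#[derive(Debug, PartialEq, Eq)]
enum Variant {
    Discrete,
    Integrated,
}

/// True when the controller name contains "intel" AND a graphics-family token.
fn is_intel_gpu_name(name: &str) -> bool {
    let lower = name.to_ascii_lowercase();
    lower.contains("intel") && FAMILY_TOKENS.iter().any(|t| lower.contains(t))
}

/// Arc model-number heuristic: a word made of one letter A-D followed by
/// three or more digits (A770, B580, ...) marks a discrete card.
fn classify_intel_variant(name: &str) -> Variant {
    let has_model_number = name
        .split(|c: char| !c.is_ascii_alphanumeric())
        .any(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) if matches!(first.to_ascii_uppercase(), 'A'..='D') => {
                    let rest = chars.as_str();
                    rest.len() >= 3 && rest.chars().all(|c| c.is_ascii_digit())
                }
                _ => false,
            }
        });
    if has_model_number {
        Variant::Discrete
    } else {
        Variant::Integrated
    }
}
```

Under these assumptions, "Intel(R) Arc(TM) A770 Graphics" classifies as discrete, while the Meteor Lake name "Intel(R) Arc(TM) Graphics" has no model-number word and falls through to integrated. The word "Arc" itself does not trigger the heuristic because "rc" is not a run of digits.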
Wires the platform detection and reader-factory plumbing for the new Intel client GPU readers (#244) into the rest of the crate.

Changes:
- `src/device/platform_detection.rs` adds `has_intel_gpu()`, a `OnceLock`-cached detector that delegates to the Linux sysfs walker on Linux and the WMI query on Windows, falling back to `false` on every other platform. The `PlatformSnapshot` introspection struct gains an `intel_gpu` field so `all-smi doctor` and other reusers see the same detection result the factory uses, and the existing `snapshot_is_default_friendly` test asserts the new field defaults to `false`.
- `src/device/reader_factory.rs::get_gpu_readers` registers the new readers under the appropriate `os_type` arm, gated on `has_intel_gpu()`. The Linux arm intentionally avoids the `not(target_env = "musl")` constraint that gates AMD: the Intel sysfs reader has no glibc dependency and works on both glibc and musl targets.
- `src/doctor/checks/platform.rs::check_hardware` reports "Intel GPU" alongside the existing "Intel Gaudi" entry so the doctor's accelerator inventory line distinguishes the two product families.

Refs #244
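The `OnceLock`-cached detector pattern described above can be sketched as follows. This is an assumption-laden illustration, not the crate's code: `probe_platform`, `linux_probe`, `windows_probe`, and the `PROBE_CALLS` counter are hypothetical names, the Linux probe checks the vendor file only (the real reader also checks the driver), and the Windows probe is a placeholder for the WMI query.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts probe invocations so the caching behaviour is observable.
/// Hypothetical; present only for illustration.
static PROBE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Simplified Linux probe: any DRM card whose PCI vendor is 0x8086.
fn linux_probe() -> bool {
    let Ok(entries) = std::fs::read_dir("/sys/class/drm") else {
        return false;
    };
    entries.flatten().any(|entry| {
        std::fs::read_to_string(entry.path().join("device/vendor"))
            .map(|v| v.trim().eq_ignore_ascii_case("0x8086"))
            .unwrap_or(false)
    })
}

/// Placeholder: the real reader issues a WMI query here.
fn windows_probe() -> bool {
    false
}

fn probe_platform() -> bool {
    PROBE_CALLS.fetch_add(1, Ordering::SeqCst);
    if cfg!(target_os = "linux") {
        linux_probe()
    } else if cfg!(target_os = "windows") {
        windows_probe()
    } else {
        false
    }
}

/// Runs the probe at most once per process and caches the answer.
fn has_intel_gpu() -> bool {
    static CACHE: OnceLock<bool> = OnceLock::new();
    *CACHE.get_or_init(probe_platform)
}
```

Caching the result means the reader factory and `all-smi doctor` observe the same answer without repeating the sysfs walk or WMI query. The trade-off is that a GPU hot-plugged after the first call is not seen until the process restarts.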
Adds the mock template needed so `all-smi-mock-server --platform intel` produces a realistic Intel client GPU Prometheus payload (#244).

Modelled on `src/mock/templates/amd_gpu.rs` with these differences:
- Default device name is **Intel Arc B580 12GB** (Battlemage, ~190W TDP).
- Memory table covers Intel client SKUs (4–16 GB) and yields `0` total memory for integrated names so the mock matches the production reader semantics: discrete Arc cards report dedicated VRAM, integrated GPUs surface `0` with the same caveats the live reader emits.
- Replaces the AMD `all_smi_amd_rocm_version` info metric with `all_smi_intel_driver_version`. There's no ROCm-equivalent runtime version string on the Intel side; the Windows-style Intel Graphics Driver version (`32.0.101.6299`) is the closest analogue, and `all_smi_gpu_info` carries it as a label.
- Random-draw caps are sized for client GPUs: power between 80 and 225 W for discrete Arc (capped at 250 W for the auto-generated AMD-style generator path), 2–15 W for integrated; temperature 40–78 °C; frequency 1100–2400 MHz.

Wiring:
- `MockPlatform::IntelGpu` added to the trait-level enum.
- `PlatformType::Intel` (already existed) now routes to the new `IntelGpuMockGenerator` through `template_engine::build_response_template`, `render_response`, `create_generator`, and `platform_type_to_mock_platform`.
- `mock::server` picks `DEFAULT_INTEL_GPU_NAME` when `--platform intel` is selected and `--gpu-name` is unset.
- `mock::generator::generate_gpus` caps Intel platform power at 250 W so the AMD-style generator path (used by `MockNode`) cannot synthesise datacenter-scale wattage on a consumer card.

The orchestrator note suggested `Intel Arc B580 12GB` (Battlemage) as the default rather than A770 16GB so the mock reflects the current Intel client generation. A770 users can override with `--gpu-name`.

Refs #244
…ream consumers

Teach the Intel client GPU reader to classify the marketing name into a stable architecture family (Alchemist / Battlemage / Xe-LPG / Xe-LPG+ / Iris Xe / older integrated) and surface it in `GpuInfo.detail` as `Architecture` and `SYCL Capable` entries. This mirrors the `INTEL_GPU_PATTERNS` table that lablup/backend.ai-go currently maintains in `src-tauri/src/engine/gpu.rs`, so every downstream consumer of all-smi can rely on a single source of truth for "is this Intel GPU SYCL/oneAPI capable" without re-implementing the same name-pattern table.

What changed:
- `src/device/readers/intel_gpu_names.rs`: add the `IntelArchitecture` enum, the `classify_intel_architecture()` matcher, and the `label()` / `is_sycl_capable()` / `sycl_capable_label()` helpers. Pattern order is documented as load-bearing: older integrated checked first, then Battlemage (which contains the substring `arc`), then Alchemist (`arc` + `a3`/`a5`/`a7`), then Lunar Lake (handles both the `lunarlake` / `lunar lake` family names and the Arc 140V / 130V iGPU naming), then generic Xe-LPG (Meteor Lake `Intel Arc Graphics` iGPU), then Iris Xe.
- `src/device/readers/intel_gpu_linux.rs`: call the classifier when building the per-card static info and inject `Architecture` + `SYCL Capable` entries into the detail map.
- `src/device/readers/intel_gpu_windows.rs`: same wiring on the Windows / WMI side; extend `FAMILY_TOKENS` so marketing names like `Intel Battlemage Graphics`, `Intel LunarLake Graphics`, and `Intel Xe-LPG Graphics` (which omit the legacy `arc` / `iris` tokens) are not dropped by the name filter before classification can run.
- `src/device/readers/mod.rs`: make `intel_gpu_names` available on Windows as well as Linux. The module is pure-Rust string matching with no platform-specific deps; the existing `dead_code` allow attributes on the Linux-only PCI-ID lookup functions keep the Windows build warning-free.
- Tests: every backend.ai-go fixture is exercised (A770/A750/A580/A380/A310 → Alchemist, B-series + explicit Battlemage → Battlemage, Arc 140V / 130V + `LunarLake` → XeLpgPlus, `Intel Arc Graphics` (Meteor Lake iGPU) → XeLpg, `Iris Xe` → IrisXe, `HD/UHD Graphics` → OlderIntegrated and not SYCL-capable, unknown → Unknown). Adds regression tests for the trickiest disambiguation case (`Intel Arc Graphics` vs `Intel Arc A770 Graphics` vs `Intel Arc 140V Graphics`).
- Windows reader tests moved to a sibling `intel_gpu_windows/tests.rs` file mirroring the existing `intel_gpu_linux/tests.rs` split so the main reader file stays under the 500-line budget after extending the fixture coverage.

No new external dependencies. Pure-Rust string matching only.
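To make the ordering argument above concrete, here is a rough sketch of an order-sensitive name classifier. It is not the code from `intel_gpu_names.rs`: the `b5` / `b7` Battlemage tokens, the `xe-lpg` token, and the rule that every classified family except `OlderIntegrated` counts as SYCL-capable are assumptions made for this example. Only the Alchemist and Lunar Lake tokens and the check order come from the description.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IntelArchitecture {
    OlderIntegrated,
    Battlemage,
    Alchemist,
    XeLpgPlus,
    XeLpg,
    IrisXe,
    Unknown,
}

fn contains_any(haystack: &str, needles: &[&str]) -> bool {
    needles.iter().any(|n| haystack.contains(n))
}

/// Checks run in the order described above; the generic `arc` rule must come
/// after the model-specific ones or it would swallow A770 / B580 / 140V names.
fn classify_intel_architecture(name: &str) -> IntelArchitecture {
    let n = name.to_ascii_lowercase();
    let is_arc = n.contains("arc");
    if n.contains("hd graphics") {
        // Also matches "uhd graphics".
        IntelArchitecture::OlderIntegrated
    } else if n.contains("battlemage") || (is_arc && contains_any(&n, &["b5", "b7"])) {
        // "b5" / "b7" are assumed tokens for B-series cards.
        IntelArchitecture::Battlemage
    } else if is_arc && contains_any(&n, &["a3", "a5", "a7"]) {
        IntelArchitecture::Alchemist
    } else if contains_any(&n, &["lunarlake", "lunar lake", "140v", "130v"]) {
        IntelArchitecture::XeLpgPlus
    } else if is_arc || n.contains("xe-lpg") {
        IntelArchitecture::XeLpg
    } else if n.contains("iris xe") {
        IntelArchitecture::IrisXe
    } else {
        IntelArchitecture::Unknown
    }
}

impl IntelArchitecture {
    /// Assumption: only older HD/UHD parts and unclassified names are excluded.
    fn is_sycl_capable(self) -> bool {
        !matches!(self, Self::OlderIntegrated | Self::Unknown)
    }

    /// Three-state label so "not capable" and "could not classify" differ.
    fn sycl_capable_label(self) -> &'static str {
        match self {
            Self::Unknown => "Unknown",
            other if other.is_sycl_capable() => "Yes",
            _ => "No",
        }
    }
}
```

With this ordering, `Intel Arc Graphics`, `Intel Arc A770 Graphics`, and `Intel Arc 140V Graphics` land in three different families even though all three contain `arc`, which is the disambiguation case the regression tests target.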
Implementation Review Summary
Intent
PR #245 must enumerate Intel client GPUs (Arc A/B-series discrete + Iris Xe / Xe-LPG / Arc iGPU integrated) on both Linux (sysfs / i915 / xe) and Windows (WMI), with
Findings Addressed
None. No CRITICAL/HIGH/MEDIUM findings were discovered that warranted delegation.
Remaining Items (LOW only)
Verification
Cross-platform gating audit (the highest-risk integration surface)
PR is implementation-complete and ready to advance to the security/performance review stage.
… GPU
Apply cargo fmt formatting to intel_gpu_linux.rs, intel_gpu_linux/tests.rs, intel_gpu_names.rs, and intel_gpu_windows/tests.rs (import grouping, long line breaking, matches! arm layout).
Add test::generate_leaves_no_unreplaced_placeholders to src/mock/templates/intel_gpu.rs: calls generate() with device_count=2 and asserts no {{ markers remain, guarding against render_intel_response mis-indexing bugs that would emit non-numeric values to Prometheus scrapers.
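The idea behind such a placeholder check can be sketched as below. This is not the mock server's renderer: `render`, `has_unreplaced_placeholder`, the `{{NAME}}`-style placeholder syntax details, and the metric name used in the usage note are hypothetical stand-ins chosen to show why scanning for leftover `{{` markers catches mis-indexed substitutions.

```rust
use std::collections::HashMap;

/// Hypothetical renderer: replaces each `{{KEY}}` with its value.
fn render(template: &str, values: &HashMap<&str, String>) -> String {
    let mut out = template.to_string();
    for (key, value) in values {
        // `{{{{` and `}}}}` are escaped braces, so this builds "{{KEY}}".
        out = out.replace(&format!("{{{{{key}}}}}"), value);
    }
    out
}

/// A rendered payload should contain no `{{` markers. A single `{` is fine
/// because Prometheus label sets use single braces.
fn has_unreplaced_placeholder(rendered: &str) -> bool {
    rendered.contains("{{")
}
```

For example, rendering a hypothetical template line `example_power_watts{gpu="{{NAME}}"} {{POWER_0}}` with both keys supplied leaves no markers, while omitting `POWER_0` leaves `{{POWER_0}}` in the output, which a scraper would reject as a non-numeric sample value.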
Update docs to enumerate Intel Arc / Iris Xe / Xe client GPU support:
- README.md: intro line, Linux platform list (with sysfs details), mock platform list, metrics list
- docs/ARCHITECTURE.md: executive summary platform list, new reader subsection before Intel Gaudi
- docs/man/all-smi.1: DESCRIPTION line and new SUPPORTED PLATFORMS entry
All narrow-scope tests green (15/15 intel_gpu_names, 14/14 intel_gpu_linux, 10/10 intel_gpu_sysfs, 8/8 mock server intel_gpu, 2/2 platform_detection).
PR Finalization Complete
Tests
Added one test to
Architecture classifier gap analysis: all 8 pattern families from
All narrow-scope tests pass: 15/15
Documentation
Updated existing docs to list Intel Arc/Iris Xe/Xe client GPU support:
No new doc files created.
Lint / Format
Applied
…249)

* feat(intel-gpu): compute Linux utilization from engine-busy counters

The Intel client GPU reader merged in #245 reported `utilization = 0.0` for every Arc/Iris/Xe card with a placeholder `detail["Utilization"] = "Requires intel_gpu_top..."`. Replace that with a real engine-busy percentage computed from the kernel's per-engine monotonic busy counters in sysfs.

What changed:
- New `device::readers::intel_gpu_engine` module owning the delta tracker, lock handling, and discovery walks. Splits cleanly into `intel_gpu_engine.rs` (state + refresh + reader-facing helpers) and `intel_gpu_engine/discovery.rs` (i915 flat/nested and xe flat/nested/multi-GT sysfs probing + class-name normalisation).
- `IntelGpuCard` gains an `engine_state: Mutex<EngineState>` field initialised at discovery time. Per-card mutex shape mirrors `AmdGpuDevice.vram_usage` including the poisoning-recovery flow (warn, replace with fresh `EngineState`, continue serving).
- `IntelGpuReader::get_gpu_info` now drives one engine refresh per card, clamps to `[0, 100]`, and folds the result into the produced `GpuInfo`. The primary utilization is `max(render, compute)`; per-class breakdown lands in `detail["Engine: <class>"]`. The first call per card is a seeding refresh that returns 0.0 and stamps the baseline; subsequent calls compute deltas.
- Kernels without engine counters surface a new explanatory `detail["Utilization"]` string (`"Engine counters unavailable (kernel does not expose engine busy)"`) instead of the old `intel_gpu_top` placeholder.

Tests added:
- 19 engine-module tests covering class-name normalisation, all four discovery layouts (i915 flat/nested, xe flat/nested/multi-GT), seeding semantics, delta computation, the `[0, 100]` clamp, counter-reset safety, multi-engine aggregation (render vs copy vs compute, multiple instances of the same class), graceful read-failure handling, and zero-wall-delta short-circuit.
  Wall clock injected via `EngineState::with_clock(now_fn)` so no real sleeps are involved.
- 2 reader-level tests confirming the seeding-call detail entry and the post-seeding `Engine: render` detail key + cleared `Utilization` note.
- Pre-existing reader test tightened to assert the new no-counter message rather than just `contains_key("Utilization")`.

v1 scope limitations (documented in module header and intentionally deferred):
- Sysfs only; the PMU `perf_event_open(2)` fallback used by `intel_gpu_top` on locked-down kernels is not shipped.
- First refresh per card returns `0.0` (seeding); real values appear from the second refresh onward.
- Primary `utilization` is `max(render, compute)`, not an aggregate across all engines.
- Per-process engine-time deltas (`/proc/<pid>/fdinfo`-driven) remain deferred; `get_process_info` still returns an empty Vec.

Verification (narrow scopes per the watchdog guard):
- `cargo check --lib --tests` clean
- `cargo clippy --lib --tests -- -D warnings` clean
- `cargo test --lib device::readers::intel_gpu_linux` -> 16/16 pass
- `cargo test --lib device::readers::intel_gpu_engine` -> 19/19 pass
- `cargo test --lib device::readers::intel_gpu_sysfs` -> 10/10 pass

All touched files stay under the 500-line cap (engine discovery split into `intel_gpu_engine/discovery.rs` and tests into `intel_gpu_engine/tests.rs`). No public API or `Cargo.toml` changes.

Closes #246

* chore(intel-gpu): add mutex-poisoning test, normalize case variants, and doc updates

Add refresh_with_lock_recovers_from_poisoned_mutex test to cover the recovery path in refresh_with_lock: spawn a thread that panics while holding the lock, confirm the mutex is poisoned, then verify refresh_with_lock returns a valid readout without panicking and that the recovered EngineState is correctly reset. Note that std::sync::Mutex does not clear the poison flag via into_inner(), so the post-recovery lock is still technically poisoned and is acquired with unwrap_or_else.
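The poisoning-recovery flow described in this commit message (warn, replace with fresh state, continue serving) can be illustrated with a small standalone sketch. The `EngineState` struct here is a simplified stand-in with a single hypothetical `samples` field, and `poison_by_panicking` exists only to reproduce the failure; neither is the PR's actual definition.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Simplified stand-in for the real per-card state.
#[derive(Debug, Default, PartialEq)]
struct EngineState {
    samples: u32,
}

/// Locks the state; on poisoning, warns, resets to a fresh state, and carries on.
fn refresh_with_lock(state: &Mutex<EngineState>) -> u32 {
    let mut guard = match state.lock() {
        Ok(guard) => guard,
        Err(poisoned) => {
            eprintln!("engine state mutex poisoned; resetting to a fresh state");
            // into_inner() hands back the guard but leaves the poison flag set.
            let mut guard = poisoned.into_inner();
            *guard = EngineState::default();
            guard
        }
    };
    guard.samples += 1;
    guard.samples
}

/// Poisons the mutex by panicking in another thread while holding the lock.
fn poison_by_panicking(state: &Arc<Mutex<EngineState>>) {
    let shared = Arc::clone(state);
    let _ = thread::spawn(move || {
        let mut guard = shared.lock().unwrap();
        guard.samples = 99;
        panic!("simulated failure while holding the lock");
    })
    .join();
}
```

Because the flag is not cleared, every later call in this sketch goes through the recovery branch again and resets the state. `Mutex::clear_poison` (stable since Rust 1.77) is one way to avoid that; whether the actual reader does so is not stated in the commit message.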
Add COMPUTE and VIDEO_DECODE upper-case assertions to normalize_engine_class_handles_known_tokens, covering the all-caps xe path variants that the normalize_engine_class function handles via to_ascii_lowercase.

Update ARCHITECTURE.md Intel GPU section to describe the engine-counter module and its v1 constraints (sysfs-only, seeding call, max(render,compute) primary, PMU deferred). Update README.md Linux feature list to mention engine-busy utilization and the seeding semantics. Update manpage Intel GPU entry to include engine-busy utilization.

* fix(intel-gpu): narrow re-export to silence bin-target clippy

The pre-existing 'pub use discovery::normalize_engine_class' is only consumed by the unit-test module via the 'use super::*' glob. When clippy runs in CI without --lib filtering (the default 'cargo clippy -- -D warnings'), the binary build sees the import as unused. Narrow the re-export to discover_engine_counters only and have tests import normalize_engine_class directly from the discovery submodule, matching the existing pattern for split_class_instance.
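The engine-busy computation described in the feat(intel-gpu) commit above (seeding call, delta over wall time, `[0, 100]` clamp, counter-reset safety, zero-wall-delta short-circuit) can be sketched as follows. This is a simplified illustration, not the `intel_gpu_engine` module: it assumes the busy counter is a cumulative nanosecond value, tracks a single engine, and takes the timestamp as a parameter instead of the `with_clock(now_fn)` injection the PR uses.

```rust
use std::time::Instant;

/// Simplified single-engine tracker (the real module tracks several classes).
struct EngineState {
    /// Previous (timestamp, cumulative busy nanoseconds) sample, if any.
    last: Option<(Instant, u64)>,
}

impl EngineState {
    fn new() -> Self {
        Self { last: None }
    }

    /// Returns the busy percentage since the previous sample.
    /// `busy_ns` is assumed to be a monotonic nanosecond counter.
    fn refresh(&mut self, busy_ns: u64, now: Instant) -> f64 {
        let previous = self.last.replace((now, busy_ns));
        let Some((prev_time, prev_busy)) = previous else {
            return 0.0; // Seeding call: only stamps the baseline.
        };
        let wall_ns = now.saturating_duration_since(prev_time).as_nanos();
        if wall_ns == 0 || busy_ns < prev_busy {
            // No elapsed time, or the counter went backwards (reset): report
            // 0.0 rather than a division by zero or a negative delta.
            return 0.0;
        }
        let pct = (busy_ns - prev_busy) as f64 / wall_ns as f64 * 100.0;
        pct.clamp(0.0, 100.0)
    }
}

/// Primary utilization as described: the larger of render and compute.
fn primary_utilization(render_pct: f64, compute_pct: f64) -> f64 {
    render_pct.max(compute_pct)
}
```

Passing the timestamp in keeps the arithmetic testable without sleeping: a test can fabricate `t0 + 100ms` and a counter advance of 50 ms and expect 50 %.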
Summary
Adds the missing Intel client GPU reader (issue #244), covering both discrete Intel Arc (A-series / B-series "Battlemage") and integrated graphics (Iris Xe, Xe-LPG, Arc iGPU on Core Ultra / Meteor Lake), on Linux (sysfs / i915 / xe) and Windows (WMI). On Intel client hosts, `get_gpu_info()` now returns a populated `GpuInfo` with name, memory, frequency, temperature, and power instead of an empty vector, fixing SYCL / oneAPI accelerator selection downstream.
What changed
New device readers
- `src/device/readers/intel_gpu_linux.rs` (+ `intel_gpu_linux/tests.rs`): sysfs walker over `/sys/class/drm/card*` for vendor `0x8086` cards driven by `i915` or `xe`. Distinguishes discrete vs integrated via the presence of `mem_info_vram_total` / `tile0/vram0/total_bytes` and rejects Habana / Gaudi (vendor `0x1da3`) plus Intel-vendor non-GPU devices.
- `src/device/readers/intel_gpu_sysfs.rs`: low-level sysfs I/O helpers split out so the main reader stays under the 500-line budget. Hosts memory / frequency / temperature / power readers with their own unit tests.
- `src/device/readers/intel_gpu_names.rs`: PCI device-ID → marketing-name table covering Arc A-series, Battlemage, Tiger / Alder / Raptor / Meteor / Ice / Rocket / Arrow / Lunar Lake, with a generic `Intel Graphics (device 0xXXXX)` fallback.
- `src/device/readers/intel_gpu_windows.rs` (+ `intel_gpu_windows/tests.rs`): WMI reader mirroring `amd_windows.rs`. The `is_intel_gpu_name()` filter and `classify_intel_variant()` discriminator are free functions so they can be unit-tested without WMI; "Intel Display Audio" / "Intel(R) Management Engine Interface" / "Intel(R) Smart Sound" are correctly excluded.
Wiring
- `src/device/platform_detection.rs`: `has_intel_gpu()` detector + `PlatformSnapshot.intel_gpu` field. Linux uses `/sys/class/drm` with an `lspci -n` fallback; Windows uses WMI.
- `src/device/reader_factory.rs`: registers `IntelGpuReader` on Linux and `IntelWindowsGpuReader` on Windows, gated on the detector.
- `src/device/readers/mod.rs`: module declarations.
- `src/doctor/checks/platform.rs::check_hardware`: surfaces "Intel GPU" in `all-smi doctor` alongside "Intel Gaudi".
Mock
- `src/mock/templates/intel_gpu.rs`: Intel mock generator modelled on `amd_gpu.rs`. Default device is Intel Arc B580 12GB (Battlemage) so the mock reflects the current Intel client generation; A770 16GB and other SKUs work via `--gpu-name`. Adds `all_smi_intel_driver_version` as the Intel analogue of `all_smi_amd_rocm_version`.
- `src/mock/template_engine.rs`, `src/mock/server.rs`, `src/mock/templates/mod.rs`, `src/mock/constants.rs`, `src/mock/generator.rs`, `src/traits/mock_generator.rs`: `PlatformType::Intel` (already existed) is now routed end-to-end; `MockPlatform::IntelGpu` added to the trait-level enum; power cap for Intel platform is 250 W (vs the AMD-style generator's 700 W default).
Architecture & SYCL classification
Per maintainer guidance, the Intel reader now classifies the detected GPU's architecture (Alchemist / Battlemage / Xe-LPG / Xe-LPG+ / Iris Xe / older integrated) and surfaces it in `GpuInfo.detail` as:
- `detail["Architecture"]`, e.g. `"Alchemist (Xe-HPG, A-series)"`
- `detail["SYCL Capable"]`: `"Yes"` / `"No"` / `"Unknown"`

This mirrors the `INTEL_GPU_PATTERNS` table in lablup/backend.ai-go's `src-tauri/src/engine/gpu.rs` so downstream consumers (Backend.AI's accelerator-selection layer, llama.cpp SYCL backend picker, etc.) can rely on all-smi as their single source of truth without re-implementing the same name-pattern table.

The classifier is exposed publicly via `all_smi::device::readers::intel_gpu_names::{IntelArchitecture, classify_intel_architecture}` on both Linux and Windows. The `IntelArchitecture` enum carries three helpers: `is_sycl_capable()` (bool, matches backend.ai-go's `check_intel_sycl_support`), `label()` (`"Alchemist (Xe-HPG, A-series)"` etc., used in the detail map), and `sycl_capable_label()` which distinguishes `"Unknown"` from `"No"` so consumers can tell "we know this GPU is not SYCL-capable" from "we couldn't classify this GPU at all".

The matcher uses pure-Rust string analysis with a load-bearing pattern order: older HD/UHD integrated first (so they don't accidentally match later Xe rules), then Battlemage (since `Intel Arc B580` contains `arc`), then Alchemist (`arc` + `a3`/`a5`/`a7`), then Lunar Lake (handles both explicit `lunarlake` / `lunar lake` names and the Arc 140V/130V iGPU), then generic Xe-LPG (Meteor Lake's `Intel Arc Graphics` iGPU with no model number), then Iris Xe. The trickiest disambiguation, `Intel Arc Graphics` (Meteor Lake iGPU → `XeLpg`) vs `Intel Arc A770 Graphics` (discrete Alchemist → `Alchemist`) vs `Intel Arc 140V Graphics` (Lunar Lake iGPU → `XeLpgPlus`), is covered by dedicated tests.
v1 scope limitations (documented in code + below)
- `0.0` with `detail["Utilization"] = "Requires intel_gpu_top (perf engine counters)"`. Real engine-busy% requires reading `engine/*/busy` perf counters and tracking deltas across polling intervals; that's a follow-up so we don't fabricate a value.
- `/proc/<pid>/fdinfo/*` DRM client parsing differs between `i915` and `xe` and also needs delta tracking; deferred.
- `libze_intel_gpu`) for compute-capable metrics is the documented next step.
- `AdapterRAM` (with the same 4 GB / 32-bit caveat warning), driver version, video processor, status, DAC type. Utilization / temperature / fine-grained power need Level Zero or `xpu-smi` on Windows too.
- `Cargo.toml`.
Test plan
- `cargo check --lib --tests`
- `cargo clippy --lib --tests -- -D warnings`
- `cargo test --lib device::readers::intel_gpu_linux` (14 passed, includes the Architecture / SYCL assertions for both discrete A770 and Meteor Lake iGPU)
- `cargo test --lib device::readers::intel_gpu_sysfs` (10 passed)
- `cargo test --lib device::readers::intel_gpu_names` (15 passed; every backend.ai-go fixture exercised: A-series → Alchemist, B-series + explicit Battlemage → Battlemage, Arc 140V/130V + LunarLake → XeLpgPlus, Intel Arc Graphics → XeLpg, Iris Xe → IrisXe, HD/UHD → OlderIntegrated/not-SYCL, unknown → Unknown)
- `cargo test --bin all-smi-mock-server --features mock intel_gpu` (7 passed)
- `cargo test --lib device::platform_detection::introspection` (2 passed, including the updated `snapshot_is_default_friendly` covering the new `intel_gpu` field)
- `cargo test --lib device::readers` (114 passed) and `cargo test --bin all-smi-mock-server --features mock` (62 passed)

Closes #244