feat: add Intel client GPU reader (Arc/Iris/Xe) for Windows and Linux #245

Merged: inureyes merged 6 commits into main from feat/issue-244-intel-client-gpu-reader on May 26, 2026

Conversation

@inureyes inureyes commented May 26, 2026

Member

Summary

Adds the missing Intel client GPU reader (issue #244) — both discrete Intel Arc (A-series / B-series "Battlemage") and integrated graphics (Iris Xe, Xe-LPG, Arc iGPU on Core Ultra / Meteor Lake) — on Linux (sysfs / i915 / xe) and Windows (WMI). On Intel client hosts, get_gpu_info() now returns a populated GpuInfo with name, memory, frequency, temperature, and power instead of an empty vector, fixing SYCL / oneAPI accelerator selection downstream.

What changed

New device readers

  • src/device/readers/intel_gpu_linux.rs (+ intel_gpu_linux/tests.rs) — sysfs walker over /sys/class/drm/card* for vendor 0x8086 cards driven by i915 or xe. Distinguishes discrete vs integrated via the presence of mem_info_vram_total / tile0/vram0/total_bytes and rejects Habana / Gaudi (vendor 0x1da3) plus Intel-vendor non-GPU devices.
  • src/device/readers/intel_gpu_sysfs.rs — low-level sysfs I/O helpers split out so the main reader stays under the 500-line budget. Hosts memory / frequency / temperature / power readers with their own unit tests.
  • src/device/readers/intel_gpu_names.rs — PCI device-ID → marketing-name table covering Arc A-series, Battlemage, Tiger / Alder / Raptor / Meteor / Ice / Rocket / Arrow / Lunar Lake, with a generic Intel Graphics (device 0xXXXX) fallback.
  • src/device/readers/intel_gpu_windows.rs (+ intel_gpu_windows/tests.rs) — WMI reader mirroring amd_windows.rs. The is_intel_gpu_name() filter and classify_intel_variant() discriminator are free functions so they can be unit-tested without WMI; "Intel Display Audio" / "Intel(R) Management Engine Interface" / "Intel(R) Smart Sound" are correctly excluded.
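
The Linux discovery rules listed above (vendor check, driver check, discrete-vs-integrated split) can be sketched as follows. This is a minimal illustration, not the reader's actual code: `probe_card` and `Variant` are hypothetical names, and the sketch assumes the VRAM files sit under each card's `device/` directory.

```rust
use std::fs;
use std::path::Path;

/// Intel's PCI vendor ID as it appears in a sysfs `vendor` file.
const INTEL_VENDOR: &str = "0x8086";

#[derive(Debug, PartialEq, Eq)]
enum Variant {
    Discrete,
    Integrated,
}

/// Inspects one `cardN` directory and returns its variant, or `None` when
/// the card is not an Intel GPU bound to the `i915` or `xe` driver.
/// (Hypothetical helper; assumes the sysfs layout described in the lead-in.)
fn probe_card(card_dir: &Path) -> Option<Variant> {
    let device = card_dir.join("device");

    // Reject other vendors, for example Habana / Gaudi (0x1da3).
    let vendor = fs::read_to_string(device.join("vendor")).ok()?;
    if vendor.trim() != INTEL_VENDOR {
        return None;
    }

    // `device/driver` is a symlink whose last path component names the driver.
    let link = fs::read_link(device.join("driver")).ok()?;
    let driver = link.file_name()?.to_str()?;
    if driver != "i915" && driver != "xe" {
        return None;
    }

    // Dedicated-VRAM files are only present on discrete cards.
    let has_vram = device.join("mem_info_vram_total").exists()
        || device.join("tile0/vram0/total_bytes").exists();
    Some(if has_vram {
        Variant::Discrete
    } else {
        Variant::Integrated
    })
}
```

Taking the card directory as a parameter, rather than hard-coding `/sys/class/drm`, lets a unit test point the probe at a temporary directory tree.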

Wiring

  • src/device/platform_detection.rs: has_intel_gpu() detector + PlatformSnapshot.intel_gpu field. Linux uses /sys/class/drm with an lspci -n fallback; Windows uses WMI.
  • src/device/reader_factory.rs — registers IntelGpuReader on Linux and IntelWindowsGpuReader on Windows, gated on the detector.
  • src/device/readers/mod.rs — module declarations.
  • src/doctor/checks/platform.rs::check_hardware — surfaces "Intel GPU" in all-smi doctor alongside "Intel Gaudi".

Mock

  • src/mock/templates/intel_gpu.rs — Intel mock generator modelled on amd_gpu.rs. Default device is Intel Arc B580 12GB (Battlemage) so the mock reflects the current Intel client generation; A770 16GB and other SKUs work via --gpu-name. Adds all_smi_intel_driver_version as the Intel analogue of all_smi_amd_rocm_version.
  • src/mock/template_engine.rs, src/mock/server.rs, src/mock/templates/mod.rs, src/mock/constants.rs, src/mock/generator.rs, src/traits/mock_generator.rs: PlatformType::Intel (already existed) is now routed end-to-end; MockPlatform::IntelGpu is added to the trait-level enum; the power cap for the Intel platform is 250 W (vs the AMD-style generator's 700 W default).

Architecture & SYCL classification

Per maintainer guidance, the Intel reader now classifies the detected GPU's architecture (Alchemist / Battlemage / Xe-LPG / Xe-LPG+ / Iris Xe / older integrated) and surfaces it in GpuInfo.detail as:

  • detail["Architecture"]: e.g. "Alchemist (Xe-HPG, A-series)"
  • detail["SYCL Capable"]: "Yes" / "No" / "Unknown"

This mirrors the INTEL_GPU_PATTERNS table in lablup/backend.ai-go's src-tauri/src/engine/gpu.rs so downstream consumers (Backend.AI's accelerator-selection layer, llama.cpp SYCL backend picker, etc.) can rely on all-smi as their single source of truth without re-implementing the same name-pattern table.

The classifier is exposed publicly via all_smi::device::readers::intel_gpu_names::{IntelArchitecture, classify_intel_architecture} on both Linux and Windows. The IntelArchitecture enum carries three helpers: is_sycl_capable() (bool, matches backend.ai-go's check_intel_sycl_support), label() ("Alchemist (Xe-HPG, A-series)" etc., used in the detail map), and sycl_capable_label() which distinguishes "Unknown" from "No" so consumers can tell "we know this GPU is not SYCL-capable" from "we couldn't classify this GPU at all".
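
A rough sketch of the three helpers follows, to make the "Unknown" vs "No" distinction concrete. The variant names come from the test fixtures in this PR. Only the Alchemist label is quoted from the description; the other label strings are placeholders. Treating every family except OlderIntegrated and Unknown as SYCL-capable is likewise an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IntelArchitecture {
    Alchemist,
    Battlemage,
    XeLpg,
    XeLpgPlus,
    IrisXe,
    OlderIntegrated,
    Unknown,
}

impl IntelArchitecture {
    /// Plain boolean answer; `Unknown` collapses to `false` here.
    /// (Assumption: only older integrated parts are known non-capable.)
    pub fn is_sycl_capable(self) -> bool {
        !matches!(self, Self::OlderIntegrated | Self::Unknown)
    }

    /// Human-readable label for the detail map.
    /// Only the Alchemist string is taken from the PR text.
    pub fn label(self) -> &'static str {
        match self {
            Self::Alchemist => "Alchemist (Xe-HPG, A-series)",
            Self::Battlemage => "Battlemage",
            Self::XeLpg => "Xe-LPG",
            Self::XeLpgPlus => "Xe-LPG+",
            Self::IrisXe => "Iris Xe",
            Self::OlderIntegrated => "Older integrated",
            Self::Unknown => "Unknown",
        }
    }

    /// Tri-state answer that keeps "could not classify" apart from "no".
    pub fn sycl_capable_label(self) -> &'static str {
        match self {
            Self::Unknown => "Unknown",
            other if other.is_sycl_capable() => "Yes",
            _ => "No",
        }
    }
}
```

A consumer that only needs a go/no-go decision can call `is_sycl_capable()`, while a UI that wants to show "not classified" can branch on `sycl_capable_label()`.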

The matcher uses pure-Rust string analysis with a load-bearing pattern order: older HD/UHD integrated first (so they don't accidentally match later Xe rules), then Battlemage (since Intel Arc B580 contains arc), then Alchemist (arc + a3/a5/a7), then Lunar Lake (handles both explicit lunarlake / lunar lake names and the Arc 140V/130V iGPU), then generic Xe-LPG (Meteor Lake's Intel Arc Graphics iGPU with no model number), then Iris Xe. The trickiest disambiguation — Intel Arc Graphics (Meteor Lake iGPU → XeLpg) vs Intel Arc A770 Graphics (discrete Alchemist → Alchemist) vs Intel Arc 140V Graphics (Lunar Lake iGPU → XeLpgPlus) — is covered by dedicated tests.
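
The ordering described above can be illustrated with a simplified matcher. This is a sketch under assumptions, not the shipped table: the `b5` / `b7` tokens for B-series cards and the `xe-lpg` token are guesses, and the real pattern list is likely broader. In this reduced version the order matters mainly because the generic `arc` + `graphics` rule would otherwise swallow the model-specific names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IntelArchitecture {
    Alchemist,
    Battlemage,
    XeLpg,
    XeLpgPlus,
    IrisXe,
    OlderIntegrated,
    Unknown,
}

/// Classifies a marketing name by case-insensitive substring rules.
/// Rules are tried top to bottom and the first match wins.
pub fn classify_intel_architecture(name: &str) -> IntelArchitecture {
    use IntelArchitecture::*;
    let n = name.to_ascii_lowercase();
    let has_arc = n.contains("arc");
    let any = |tokens: &[&str]| tokens.iter().any(|t| n.contains(t));

    // 1. Older HD / UHD integrated ("uhd graphics" contains "hd graphics").
    if n.contains("hd graphics") {
        return OlderIntegrated;
    }
    // 2. Battlemage: explicit family name or an Arc B-series model (assumed tokens).
    if n.contains("battlemage") || (has_arc && any(&["b5", "b7"])) {
        return Battlemage;
    }
    // 3. Alchemist: Arc plus an A3 / A5 / A7 model prefix.
    if has_arc && any(&["a3", "a5", "a7"]) {
        return Alchemist;
    }
    // 4. Lunar Lake: family name or the Arc 140V / 130V iGPU.
    if any(&["lunarlake", "lunar lake"]) || (has_arc && any(&["140v", "130v"])) {
        return XeLpgPlus;
    }
    // 5. Generic Xe-LPG: Arc iGPU with no model number. Must come after 2-4.
    if n.contains("xe-lpg") || (has_arc && n.contains("graphics")) {
        return XeLpg;
    }
    // 6. Iris Xe.
    if n.contains("iris") && n.contains("xe") {
        return IrisXe;
    }
    Unknown
}
```

Moving rule 5 above rules 3 and 4 would classify `Intel Arc A770 Graphics` and `Intel Arc 140V Graphics` as Xe-LPG, which is the regression the dedicated tests guard against.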

v1 scope limitations (documented in code + below)

  • Utilization is always reported as 0.0 with detail["Utilization"] = "Requires intel_gpu_top (perf engine counters)". Real engine-busy% requires reading engine/*/busy perf counters and tracking deltas across polling intervals — that's a follow-up so we don't fabricate a value.
  • Per-process GPU memory accounting is empty. /proc/<pid>/fdinfo/* DRM client parsing differs between i915 and xe and also needs delta tracking; deferred.
  • Level Zero integration is deferred. The Linux reader is sysfs-only and the Windows reader is WMI-only — no new external library dependencies. Adding Level Zero (libze_intel_gpu) for compute-capable metrics is the documented next step.
  • Windows metrics are limited to what WMI surfaces, matching the AMD-on-Windows reader: only AdapterRAM (with the same 4 GB / 32-bit caveat warning), driver version, video processor, status, DAC type. Utilization / temperature / fine-grained power need Level Zero or xpu-smi on Windows too.
  • No new external library dependencies were added to Cargo.toml.

Test plan

  • cargo check --lib --tests
  • cargo clippy --lib --tests -- -D warnings
  • cargo test --lib device::readers::intel_gpu_linux (14 passed, includes the Architecture / SYCL assertions for both discrete A770 and Meteor Lake iGPU)
  • cargo test --lib device::readers::intel_gpu_sysfs (10 passed)
  • cargo test --lib device::readers::intel_gpu_names (15 passed — every backend.ai-go fixture exercised: A-series → Alchemist, B-series + explicit Battlemage → Battlemage, Arc 140V/130V + LunarLake → XeLpgPlus, Intel Arc Graphics → XeLpg, Iris Xe → IrisXe, HD/UHD → OlderIntegrated/not-SYCL, unknown → Unknown)
  • cargo test --bin all-smi-mock-server --features mock intel_gpu (7 passed)
  • cargo test --lib device::platform_detection::introspection (2 passed, including the updated snapshot_is_default_friendly covering the new intel_gpu field)
  • Regression: cargo test --lib device::readers (114 passed) and cargo test --bin all-smi-mock-server --features mock (62 passed)

Closes #244

inureyes added 4 commits May 27, 2026 02:33
Adds the first half of the Intel client GPU implementation described in #244 — a sysfs-based reader for both discrete Intel Arc (A-series / B-series "Battlemage") and integrated graphics (Iris Xe, Xe-LPG, Arc iGPU on Core Ultra / Meteor Lake).

New modules:

- `src/device/readers/intel_gpu_linux.rs` — `IntelGpuReader` implementation. Walks `/sys/class/drm/card*` for vendor `0x8086` cards driven by `i915` or `xe`, with a fallback to `lspci -n` for containers without `/sys` access. Filters out Habana / Gaudi (vendor `0x1da3`) and Intel-vendor non-GPU devices by requiring the graphics class and driver. Exposes a `has_intel_client_gpu()` detector.
- `src/device/readers/intel_gpu_linux/tests.rs` — unit tests covering discovery, the Habana exclusion, variant classification (discrete vs integrated), and the `lspci -n` line filter.
- `src/device/readers/intel_gpu_sysfs.rs` — low-level sysfs I/O helpers (read_memory_bytes / read_frequency_mhz / read_temperature_celsius / read_power_watts). Hosts the `MemoryVariant` enum so the reader can stay under the 500-line budget.
- `src/device/readers/intel_gpu_names.rs` — PCI device ID → friendly marketing-name lookup for Arc A-series, Battlemage, Tiger / Alder / Raptor / Meteor / Ice / Rocket / Arrow / Lunar Lake. Unknown IDs fall back to `Intel Graphics (device 0xXXXX)`.
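
As an illustration of the `lspci -n` line filter mentioned above, here is a minimal sketch. `intel_gpu_device_id` is a hypothetical name, and the sketch assumes the usual `<slot> <class>: <vendor>:<device> (rev ..)` row layout; the shipped filter may accept a different set of classes.

```rust
/// Returns the PCI device ID when `line` (one row of `lspci -n` output)
/// describes an Intel display-class device, otherwise `None`.
/// (Hypothetical helper for illustration.)
fn intel_gpu_device_id(line: &str) -> Option<u16> {
    let mut fields = line.split_whitespace();
    let _slot = fields.next()?; // e.g. "00:02.0"
    let class = fields.next()?.trim_end_matches(':'); // e.g. "0300"
    let ids = fields.next()?; // e.g. "8086:46a6"

    // PCI base class 0x03 covers display controllers (VGA, 3D, other).
    if !class.starts_with("03") {
        return None;
    }

    let (vendor, device) = ids.split_once(':')?;
    if !vendor.eq_ignore_ascii_case("8086") {
        return None;
    }
    u16::from_str_radix(device, 16).ok()
}
```

Requiring the display class is what keeps Intel audio, management-engine, and similar functions out of the result even though they share the `8086` vendor ID.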

v1 scope limitations are documented in the module docstring and surfaced in the `GpuInfo.detail` map: utilization always reports `0.0` with a `"Utilization"` note pointing at `intel_gpu_top` (engine-busy delta tracking is follow-up work), integrated GPUs report `total_memory = 0` with a `"Memory"` note (no fabrication from system RAM), and per-process accounting is empty (follow-up).

Implementation is stateless and feature/platform-gated so musl builds still work and non-Intel hosts compile cleanly.

Refs #244
Adds the Windows-side companion to the Linux Intel client GPU reader (#244). Mirrors `src/device/readers/amd_windows.rs` line-for-line: same thread-local WMI connection cell, same `Win32_VideoController` query, same defensive `GpuInfo` template that pins NVIDIA-only fields to `None`. The only differences are the name filter and a discrete-vs-integrated discriminator.

`is_intel_gpu_name()` is factored out as a free function so it can be unit-tested without WMI: it requires both an "intel" substring AND a graphics-family token (`arc`, `iris`, `uhd graphics`, `hd graphics`, `xe graphics`, or `intel graphics`), so devices like "Intel Display Audio", "Intel(R) Management Engine Interface", and "Intel(R) Smart Sound" are excluded even though they contain "Intel" in their controller name.
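
A minimal sketch of that two-part filter, using the token list quoted above (before the later `FAMILY_TOKENS` extension). The case-insensitive substring matching is an assumption about how the comparison is done:

```rust
/// Graphics-family tokens quoted in the commit message.
const FAMILY_TOKENS: &[&str] = &[
    "arc",
    "iris",
    "uhd graphics",
    "hd graphics",
    "xe graphics",
    "intel graphics",
];

/// True only when the name mentions Intel AND a graphics-family token.
/// (Sketch; assumes case-insensitive substring matching.)
fn is_intel_gpu_name(name: &str) -> bool {
    let n = name.to_ascii_lowercase();
    n.contains("intel") && FAMILY_TOKENS.iter().any(|token| n.contains(token))
}
```

Because the function takes a plain `&str`, it can be exercised with literal controller names and needs no WMI connection.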

`classify_intel_variant()` uses an Arc model-number heuristic — single letter (A/B/C/D) followed by 3+ digits — to separate discrete cards (A770, B580, …) from the Meteor Lake iGPU which ships as "Intel(R) Arc(TM) Graphics" with no number.
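
The model-number heuristic could look roughly like this. `IntelVariant` and `has_arc_model_number` are illustrative names, and splitting the name on non-alphanumeric characters is an assumption about tokenisation:

```rust
#[derive(Debug, PartialEq, Eq)]
enum IntelVariant {
    Discrete,
    Integrated,
}

/// True when some token is one letter in A..=D followed by at least three digits.
/// (Hypothetical helper; tokenisation is an assumption.)
fn has_arc_model_number(name: &str) -> bool {
    name.split(|c: char| !c.is_ascii_alphanumeric()).any(|token| {
        let mut chars = token.chars();
        let letter_ok = matches!(chars.next(), Some('A'..='D' | 'a'..='d'));
        // Count the digits that directly follow the leading letter.
        let digits = chars
            .as_str()
            .bytes()
            .take_while(|b| b.is_ascii_digit())
            .count();
        letter_ok && digits >= 3
    })
}

/// Arc cards with a model number are discrete; everything else is integrated.
fn classify_intel_variant(name: &str) -> IntelVariant {
    if name.to_ascii_lowercase().contains("arc") && has_arc_model_number(name) {
        IntelVariant::Discrete
    } else {
        IntelVariant::Integrated
    }
}
```

Note that `140V` does not match, since it starts with a digit rather than a letter, so the Lunar Lake iGPU stays on the integrated side.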

v1 metric coverage matches the AMD-on-Windows reader: only `total_memory` from `Win32_VideoController.AdapterRAM` (subject to the same 32-bit / 4GB WMI limitation, surfaced with the same warning text). Utilization, temperature, fine-grained power, and per-process accounting all return zero and the `detail["Note"]` field points the operator at Level Zero / `xpu-smi` for the deferred follow-up.

Refs #244
Wires the platform detection and reader-factory plumbing for the new Intel client GPU readers (#244) into the rest of the crate.

Changes:

- `src/device/platform_detection.rs` adds `has_intel_gpu()` — a `OnceLock`-cached detector that delegates to the Linux sysfs walker on Linux and the WMI query on Windows, falling back to `false` on every other platform. The `PlatformSnapshot` introspection struct gains an `intel_gpu` field so `all-smi doctor` and other reusers see the same detection result the factory uses, and the existing `snapshot_is_default_friendly` test asserts the new field defaults to `false`.
- `src/device/reader_factory.rs::get_gpu_readers` registers the new readers under the appropriate `os_type` arm, gated on `has_intel_gpu()`. The Linux arm intentionally avoids the `not(target_env = "musl")` constraint that gates AMD: the Intel sysfs reader has no glibc dependency and works on both glibc and musl targets.
- `src/doctor/checks/platform.rs::check_hardware` reports "Intel GPU" alongside the existing "Intel Gaudi" entry so the doctor's accelerator inventory line distinguishes the two product families.

Refs #244
Adds the mock template needed so `all-smi-mock-server --platform intel` produces a realistic Intel client GPU Prometheus payload (#244). Modelled on `src/mock/templates/amd_gpu.rs` with these differences:

- Default device name is **Intel Arc B580 12GB** (Battlemage, ~190W TDP).
- Memory table covers Intel client SKUs (4–16 GB) and yields `0` total memory for integrated names so the mock matches the production reader semantics — discrete Arc cards report dedicated VRAM, integrated GPUs surface `0` with the same caveats the live reader emits.
- Replaces the AMD `all_smi_amd_rocm_version` info metric with `all_smi_intel_driver_version`. There's no ROCm-equivalent runtime version string on the Intel side; the Windows-style Intel Graphics Driver version (`32.0.101.6299`) is the closest analogue, and `all_smi_gpu_info` carries it as a label.
- Random-draw caps are sized for client GPUs: power between 80 and 225 W for discrete Arc (capped at 250 W for the auto-generated AMD-style generator path), 2–15 W for integrated; temperature 40–78 °C; frequency 1100–2400 MHz.

Wiring:

- `MockPlatform::IntelGpu` added to the trait-level enum.
- `PlatformType::Intel` (already existed) now routes to the new `IntelGpuMockGenerator` through `template_engine::build_response_template`, `render_response`, `create_generator`, and `platform_type_to_mock_platform`.
- `mock::server` picks `DEFAULT_INTEL_GPU_NAME` when `--platform intel` is selected and `--gpu-name` is unset.
- `mock::generator::generate_gpus` caps Intel platform power at 250 W so the AMD-style generator path (used by `MockNode`) cannot synthesise datacenter-scale wattage on a consumer card.

The orchestrator note suggested `Intel Arc B580 12GB` (Battlemage) as the default rather than A770 16GB so the mock reflects the current Intel client generation. A770 users can override with `--gpu-name`.

Refs #244
@inureyes inureyes added status:review Under review type:enhancement New feature or request priority:medium Medium priority issue labels May 26, 2026
…ream consumers

Teach the Intel client GPU reader to classify the marketing name into a stable architecture family (Alchemist / Battlemage / Xe-LPG / Xe-LPG+ / Iris Xe / older integrated) and surface it in `GpuInfo.detail` as `Architecture` and `SYCL Capable` entries. This mirrors the `INTEL_GPU_PATTERNS` table that lablup/backend.ai-go currently maintains in `src-tauri/src/engine/gpu.rs`, so every downstream consumer of all-smi can rely on a single source of truth for "is this Intel GPU SYCL/oneAPI capable" without re-implementing the same name-pattern table.

What changed:

- `src/device/readers/intel_gpu_names.rs` — add the `IntelArchitecture` enum, the `classify_intel_architecture()` matcher, and the `label()` / `is_sycl_capable()` / `sycl_capable_label()` helpers. Pattern order is documented as load-bearing: older integrated checked first, then Battlemage (which contains the substring `arc`), then Alchemist (`arc` + `a3`/`a5`/`a7`), then Lunar Lake (handles both the `lunarlake` / `lunar lake` family names and the Arc 140V / 130V iGPU naming), then generic Xe-LPG (Meteor Lake `Intel Arc Graphics` iGPU), then Iris Xe.
- `src/device/readers/intel_gpu_linux.rs` — call the classifier when building the per-card static info and inject `Architecture` + `SYCL Capable` entries into the detail map.
- `src/device/readers/intel_gpu_windows.rs` — same wiring on the Windows / WMI side; extend `FAMILY_TOKENS` so marketing names like `Intel Battlemage Graphics`, `Intel LunarLake Graphics`, and `Intel Xe-LPG Graphics` (which omit the legacy `arc` / `iris` tokens) are not dropped by the name filter before classification can run.
- `src/device/readers/mod.rs` — make `intel_gpu_names` available on Windows as well as Linux. The module is pure-Rust string matching with no platform-specific deps; the existing `dead_code` allow attributes on the Linux-only PCI-ID lookup functions keep the Windows build warning-free.
- Tests — every backend.ai-go fixture is exercised: A770/A750/A580/A380/A310 → Alchemist, B-series + explicit Battlemage → Battlemage, Arc 140V / 130V + `LunarLake` → XeLpgPlus, `Intel Arc Graphics` (Meteor Lake iGPU) → XeLpg, `Iris Xe` → IrisXe, `HD/UHD Graphics` → OlderIntegrated and not SYCL-capable, unknown → Unknown. Adds regression tests for the trickiest disambiguation case (`Intel Arc Graphics` vs `Intel Arc A770 Graphics` vs `Intel Arc 140V Graphics`).
- Windows reader tests moved to a sibling `intel_gpu_windows/tests.rs` file mirroring the existing `intel_gpu_linux/tests.rs` split so the main reader file stays under the 500-line budget after extending the fixture coverage.

No new external dependencies. Pure-Rust string matching only.
@inureyes

Member Author

Implementation Review Summary

Intent

PR #245 must enumerate Intel client GPUs (Arc A/B-series discrete + Iris Xe / Xe-LPG / Arc iGPU integrated) on both Linux (sysfs / i915 / xe) and Windows (WMI), with get_gpu_info() returning a fully-populated GpuInfo, an architecture/SYCL classification surface for downstream consumers, and integration through reader-factory + doctor + mock server — without leaking new external dependencies.

Findings Addressed

None — no CRITICAL/HIGH/MEDIUM findings were discovered that warranted delegation.

Remaining Items (LOW only)

  • src/mock/templates/intel_gpu.rs is 533 lines (LOW) — slightly over the 500-line goal the PR description sets for the Intel files. src/mock/templates/amd_gpu.rs (the mirror) sits at 479 lines, so this is a minor convention drift, but not a project-wide invariant (e.g. src/mock/templates/nvidia.rs is 1222 lines). Not worth a fix on its own.
  • src/device/readers/intel_gpu_linux.rs:63 (LOW) — MAX_GPU_POWER_WATTS = 750.0 with a comment that says "largest Arc Pro variants stay <250W". The number is defensive (matches AMD's headroom philosophy) but the comment misleads — either the cap should be tightened to ~300 W or the comment should justify the 750 W headroom explicitly.

Verification

  • All stated requirements implemented — get_gpu_info() populates GpuInfo on Linux (full metric set: name, memory, frequency, temperature, power) and Windows (WMI-limited subset matching amd_windows.rs semantics)
  • No placeholder/mock code remaining — utilization-zero is explicitly documented (detail["Utilization"]) and integrated-GPU memory-zero is explicitly documented (detail["Memory"]) rather than fabricated
  • Integrated into project code flow: has_intel_gpu() detector + reader_factory registration (Linux intel_gpu_linux::IntelGpuReader, Windows intel_gpu_windows::IntelWindowsGpuReader) + doctor wiring + mock server wiring (PlatformType::Intel → IntelGpuMockGenerator)
  • Project conventions followed — mirrors amd_windows.rs on Windows, amd.rs patterns on Linux (OnceLock<DeviceStaticInfo>, MAX_DEVICES from common_cache, DetailBuilder), test files split into sibling directories matching the AMD pattern
  • Existing modules reused where applicable — DeviceStaticInfo, MAX_DEVICES, execute_command_default, get_hostname, WMI helpers all reused without reimplementation
  • No unintended structural changes — strictly additive; only 3 deletions in 17 files, all related to new module wiring
  • Tests pass:
    • cargo check --lib --tests → clean
    • cargo clippy --lib --tests -- -D warnings → clean
    • cargo test --lib device::readers::intel_gpu_names → 15/15
    • cargo test --lib device::readers::intel_gpu_linux → 14/14
    • cargo test --lib device::readers::intel_gpu_sysfs → 10/10
    • cargo test --bin all-smi-mock-server --features mock intel_gpu → 7/7
    • cargo test --lib device::platform_detection::introspection → 2/2
    • Regression cargo test --lib device::readers → 114/114, cargo test --bin all-smi-mock-server --features mock → 62/62
  • No new Cargo dependencies (verified git diff main..HEAD --name-only | grep -i cargo is empty)
  • "Closes #244" present in PR body (issue title: feat: Intel client GPU (Arc/Iris/Xe) reader for Windows and Linux)
  • All 5 commits comply with project conventions (conventional-commits prefix, English, no AI attribution, no co-authorship)

Cross-platform gating audit (the highest-risk integration surface)

  • intel_gpu_linux.rs / intel_gpu_sysfs.rs are #[cfg(target_os = "linux")] — confirmed in mod.rs:62-67
  • intel_gpu_windows.rs is #[cfg(target_os = "windows")] — confirmed in mod.rs:69-70
  • intel_gpu_names.rs is #[cfg(any(target_os = "linux", target_os = "windows"))] — confirmed in mod.rs:64-65. Both readers use crate::device::readers::intel_gpu_names::classify_intel_architecture and both successfully drive the Architecture / SYCL Capable detail entries.
  • has_intel_gpu() is unconditional with internal #[cfg] branches and a false fallback for non-Linux/Windows — safe.
  • Reader factory gates Linux on target_os = "linux" (NOT on not(target_env = "musl"), intentionally — sysfs has no glibc dep), and Windows on target_os = "windows".

PR is implementation-complete and ready to advance to the security/performance review stage.

… GPU

Apply cargo fmt formatting to intel_gpu_linux.rs, intel_gpu_linux/tests.rs, intel_gpu_names.rs, and intel_gpu_windows/tests.rs (import grouping, long line breaking, matches! arm layout).

Add test::generate_leaves_no_unreplaced_placeholders to src/mock/templates/intel_gpu.rs: calls generate() with device_count=2 and asserts no {{ markers remain, guarding against render_intel_response mis-indexing bugs that would emit non-numeric values to Prometheus scrapers.

Update docs to enumerate Intel Arc / Iris Xe / Xe client GPU support:
- README.md: intro line, Linux platform list (with sysfs details), mock platform list, metrics list
- docs/ARCHITECTURE.md: executive summary platform list, new reader subsection before Intel Gaudi
- docs/man/all-smi.1: DESCRIPTION line and new SUPPORTED PLATFORMS entry

All narrow-scope tests green (15/15 intel_gpu_names, 14/14 intel_gpu_linux, 10/10 intel_gpu_sysfs, 8/8 mock server intel_gpu, 2/2 platform_detection).
@inureyes

Member Author

PR Finalization Complete

Tests

Added one test to src/mock/templates/intel_gpu.rs:

  • generate_leaves_no_unreplaced_placeholders: calls generate() with device_count=2 and asserts no {{ markers remain in the output, guarding against render_intel_response mis-indexing bugs that would emit non-numeric values to Prometheus scrapers. This gap was real: the existing tests only exercised generate_template (which intentionally contains placeholders) and the internal get_gpu_memory_bytes logic, but never verified that the full render cycle produces clean output.
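
The idea behind that test can be shown with a toy template renderer. Everything here is hypothetical (the `render` / `generate` helpers, the `POWER_<i>` keys, and the `mock_gpu_power_watts` metric name); it only illustrates why scanning the final output for `{{` catches a mis-indexed substitution.

```rust
/// Substitutes each `{{KEY}}` marker in `template` with its value.
fn render(template: &str, values: &[(String, String)]) -> String {
    let mut out = template.to_string();
    for (key, value) in values {
        // "{{{{{key}}}}}" formats to the literal marker "{{KEY}}".
        out = out.replace(&format!("{{{{{key}}}}}"), value);
    }
    out
}

/// Builds a per-device template, then renders it with matching keys.
fn generate(device_count: usize) -> String {
    let mut template = String::new();
    let mut values = Vec::new();
    for i in 0..device_count {
        template.push_str(&format!(
            "mock_gpu_power_watts{{index=\"{i}\"}} {{{{POWER_{i}}}}}\n"
        ));
        values.push((format!("POWER_{i}"), format!("{:.1}", 120.0 + i as f64)));
    }
    render(&template, &values)
}
```

If the render step supplied `POWER_0` for a template that expects `POWER_1`, the marker would survive into the output, and a check such as `!generate(2).contains("{{")` would fail.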

Architecture classifier gap analysis: all 8 pattern families from INTEL_GPU_PATTERNS in lablup/backend.ai-go are covered by the existing 15 tests. No additional classifier tests added.

All narrow-scope tests pass: 15/15 intel_gpu_names, 14/14 intel_gpu_linux, 10/10 intel_gpu_sysfs, 8/8 mock server intel_gpu (was 7, now 8), 2/2 platform_detection::introspection.

Documentation

Updated existing docs to list Intel Arc/Iris Xe/Xe client GPU support:

  • README.md: intro sentence, Linux platform list (with sysfs/hwmon detail bullet), mock platform list, Prometheus metrics list
  • docs/ARCHITECTURE.md: executive summary platform list, new Platform-Specific Implementations subsection before Intel Gaudi
  • docs/man/all-smi.1: DESCRIPTION line and new SUPPORTED PLATFORMS .TP entry

No new doc files created.

Lint / Format

Applied cargo fmt --all formatting changes to 4 files: intel_gpu_linux.rs (import grouping), intel_gpu_linux/tests.rs (long-assert line break), intel_gpu_names.rs (matches! arm layout, long-assert breaks), intel_gpu_windows/tests.rs (long-assert line break). cargo clippy --lib --tests -- -D warnings clean.

@inureyes inureyes added status:done Completed and removed status:review Under review labels May 26, 2026
@inureyes inureyes merged commit 4db028b into main May 26, 2026
4 checks passed
@inureyes inureyes deleted the feat/issue-244-intel-client-gpu-reader branch May 26, 2026 18:19
inureyes added a commit that referenced this pull request May 27, 2026
…249)

* feat(intel-gpu): compute Linux utilization from engine-busy counters

The Intel client GPU reader merged in #245 reported `utilization = 0.0` for every Arc/Iris/Xe card with a placeholder `detail["Utilization"] = "Requires intel_gpu_top..."`. Replace that with a real engine-busy percentage computed from the kernel's per-engine monotonic busy counters in sysfs.

What changed:

- New `device::readers::intel_gpu_engine` module owning the delta tracker, lock handling, and discovery walks. Splits cleanly into `intel_gpu_engine.rs` (state + refresh + reader-facing helpers) and `intel_gpu_engine/discovery.rs` (i915 flat/nested and xe flat/nested/multi-GT sysfs probing + class-name normalisation).
- `IntelGpuCard` gains an `engine_state: Mutex<EngineState>` field initialised at discovery time. Per-card mutex shape mirrors `AmdGpuDevice.vram_usage` including the poisoning-recovery flow (warn, replace with fresh `EngineState`, continue serving).
- `IntelGpuReader::get_gpu_info` now drives one engine refresh per card, clamps to `[0, 100]`, and folds the result into the produced `GpuInfo`. The primary utilization is `max(render, compute)`; per-class breakdown lands in `detail["Engine: <class>"]`. The first call per card is a seeding refresh that returns 0.0 and stamps the baseline; subsequent calls compute deltas.
- Kernels without engine counters surface a new explanatory `detail["Utilization"]` string (`"Engine counters unavailable (kernel does not expose engine busy)"`) instead of the old `intel_gpu_top` placeholder.
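
The seeding and delta semantics above can be sketched as a small tracker. This is an illustration, not the module's code: `EngineTracker` is a hypothetical name, the busy counter is assumed to be in nanoseconds, and passing `now` as an argument stands in for the injected clock.

```rust
use std::time::Instant;

/// Remembers the previous busy-counter sample for one engine class.
struct EngineTracker {
    last: Option<(u64, Instant)>,
}

impl EngineTracker {
    fn new() -> Self {
        Self { last: None }
    }

    /// Feeds one sample and returns busy percent since the previous sample.
    /// `busy_ns` is assumed to be a monotonic busy time in nanoseconds.
    fn refresh(&mut self, busy_ns: u64, now: Instant) -> f64 {
        let previous = self.last.replace((busy_ns, now));
        let Some((prev_busy, prev_at)) = previous else {
            return 0.0; // Seeding call: only stamps the baseline.
        };

        let wall_ns = now.saturating_duration_since(prev_at).as_nanos() as f64;
        if wall_ns == 0.0 {
            return 0.0; // Zero wall-clock delta: avoid dividing by zero.
        }

        // A counter that went backwards (reset) yields 0 instead of a huge delta.
        let Some(busy_delta) = busy_ns.checked_sub(prev_busy) else {
            return 0.0;
        };
        (busy_delta as f64 / wall_ns * 100.0).clamp(0.0, 100.0)
    }
}

/// Primary utilization is the busier of the render and compute classes.
fn primary_utilization(render: f64, compute: f64) -> f64 {
    render.max(compute)
}
```

For example, a counter that advances by 50 ms of busy time across a 100 ms polling interval yields roughly 50 percent, while the very first call for a card always yields 0.0.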

Tests added:

- 19 engine-module tests covering class-name normalisation, all four discovery layouts (i915 flat/nested, xe flat/nested/multi-GT), seeding semantics, delta computation, the `[0, 100]` clamp, counter-reset safety, multi-engine aggregation (render vs copy vs compute, multiple instances of the same class), graceful read-failure handling, and zero-wall-delta short-circuit. Wall clock injected via `EngineState::with_clock(now_fn)` so no real sleeps are involved.
- 2 reader-level tests confirming the seeding-call detail entry and the post-seeding `Engine: render` detail key + cleared `Utilization` note.
- Pre-existing reader test tightened to assert the new no-counter message rather than just `contains_key("Utilization")`.

v1 scope limitations (documented in module header and intentionally deferred):

- Sysfs only; the PMU `perf_event_open(2)` fallback used by `intel_gpu_top` on locked-down kernels is not shipped.
- First refresh per card returns `0.0` (seeding); real values appear from the second refresh onward.
- Primary `utilization` is `max(render, compute)`, not an aggregate across all engines.
- Per-process engine-time deltas (`/proc/<pid>/fdinfo`-driven) remain deferred — `get_process_info` still returns an empty Vec.

Verification (narrow scopes per the watchdog guard):

- `cargo check --lib --tests` clean
- `cargo clippy --lib --tests -- -D warnings` clean
- `cargo test --lib device::readers::intel_gpu_linux` -> 16/16 pass
- `cargo test --lib device::readers::intel_gpu_engine` -> 19/19 pass
- `cargo test --lib device::readers::intel_gpu_sysfs` -> 10/10 pass

All touched files stay under the 500-line cap (engine discovery split into `intel_gpu_engine/discovery.rs` and tests into `intel_gpu_engine/tests.rs`).

No public API or `Cargo.toml` changes.

Closes #246

* chore(intel-gpu): add mutex-poisoning test, normalize case variants, and doc updates

Add refresh_with_lock_recovers_from_poisoned_mutex test to cover the recovery path in refresh_with_lock: spawn a thread that panics while holding the lock, confirm the mutex is poisoned, then verify refresh_with_lock returns a valid readout without panicking and that the recovered EngineState is correctly reset. Note that std::sync::Mutex does not clear the poison flag via into_inner(), so the post-recovery lock is still technically poisoned and is acquired with unwrap_or_else.
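
The recovery path that this test exercises can be sketched as below. The `EngineState` here is a stand-in with a single field, and `lock_or_reset` / `poison` are hypothetical helper names; the real `refresh_with_lock` also logs a warning and performs a refresh.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Stand-in for the real per-card state.
#[derive(Debug, Default, PartialEq)]
struct EngineState {
    samples: u32,
}

/// Locks the state; on poisoning, takes the guard anyway and resets the state.
fn lock_or_reset(state: &Mutex<EngineState>) -> MutexGuard<'_, EngineState> {
    state.lock().unwrap_or_else(|poisoned| {
        // The data may be half-updated, so replace it with a fresh value.
        let mut guard = poisoned.into_inner();
        *guard = EngineState::default();
        guard
    })
}

/// Poisons the mutex by panicking in another thread while holding the lock.
fn poison(state: &Arc<Mutex<EngineState>>) {
    let shared = Arc::clone(state);
    let _ = thread::spawn(move || {
        let mut guard = shared.lock().unwrap();
        guard.samples = 42;
        panic!("simulated failure while holding the lock");
    })
    .join();
}
```

Because `PoisonError::into_inner()` hands back the guard without clearing the poison flag, every later acquisition must go through the same `unwrap_or_else` path, which is why the mutex still reports `is_poisoned()` after recovery.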

Add COMPUTE and VIDEO_DECODE upper-case assertions to normalize_engine_class_handles_known_tokens, covering the all-caps xe path variants that the normalize_engine_class function handles via to_ascii_lowercase.

Update ARCHITECTURE.md Intel GPU section to describe the engine-counter module and its v1 constraints (sysfs-only, seeding call, max(render,compute) primary, PMU deferred). Update README.md Linux feature list to mention engine-busy utilization and the seeding semantics. Update manpage Intel GPU entry to include engine-busy utilization.

* fix(intel-gpu): narrow re-export to silence bin-target clippy

The pre-existing 'pub use discovery::normalize_engine_class' is only consumed by the unit-test module via the 'use super::*' glob. When clippy runs in CI without --lib filtering (the default 'cargo clippy -- -D warnings'), the binary build sees the import as unused. Narrow the re-export to discover_engine_counters only and have tests import normalize_engine_class directly from the discovery submodule, matching the existing pattern for split_class_instance.
Labels

priority:medium Medium priority issue status:done Completed type:enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

feat: Intel client GPU (Arc/Iris/Xe) reader for Windows and Linux

1 participant