feat(intel-gpu): Level Zero (oneAPI) integration for advanced metrics on Linux and Windows #248

## Problem / Background

PR #245 (closes #244) shipped Intel client GPU readers that are **sysfs-only on Linux** and **WMI-only on Windows**. PR #249 (closes #246) then added real engine-busy utilization on Linux via the new `intel_gpu_engine` module. Level Zero (`libze_intel_gpu`) integration was deferred from PR #245 because adding a new external library dependency required a different gating strategy and substantially expanded the PR surface.

Current state per platform:
- **Linux** (`src/device/readers/intel_gpu_linux.rs` + `intel_gpu_engine.rs`): name, memory, frequency, temperature, instantaneous power, and engine-busy utilization (`render` / `compute` / `copy` / `video` / `video-enhance`) all work via sysfs. **Missing**: XMX / AI engine activity (Intel's dedicated AI accelerator on Arc and newer Xe parts), fine-grained power capping data, per-engine power breakdown, and any per-process L0-style stats. Sampling resolution is also coarser, because sysfs counters update at longer intervals than L0's metric streamer.
- **Windows** (`src/device/readers/intel_gpu_windows.rs`): name + `AdapterRAM` only (with the same WMI 32-bit caveat as the AMD-on-Windows reader); zero utilization, zero temperature, zero power. This is at parity with the AMD-on-Windows situation but is a real gap relative to NVIDIA, where NVML provides full metrics on Windows.

Intel's Level Zero is the cross-platform compute / management API on top of which Intel's own `xpu-smi` is built. It exposes utilization (including XMX), frequency, temperature, power, memory state, and engine breakdown via a stable C ABI — and unlike `xpu-smi`, calling it does not require spawning an external process per refresh.

## Proposed Solution

Add an **opt-in** Level Zero backend that **augments** PR #249's sysfs-based Linux path and **fills the Windows metrics gap**:

1. **Dynamic library loading** via `libloading` (the same pattern used for `tpu_pjrt` in `src/device/readers/tpu_pjrt.rs`). Do **not** add a hard `cc`-compiled or `bindgen`-generated dependency on `libze_intel_gpu` — the all-smi binary must continue to run on Intel-GPU hosts that do NOT have the Level Zero runtime installed.
2. **Feature flag** in `Cargo.toml` — e.g. `level_zero`, default-off — for builds that want to ship the FFI shim. The runtime detection (whether the host has `libze_intel_gpu.so` / `ze_intel_gpu64.dll`) is orthogonal and always-on.
3. **Backend coexistence**, not displacement:
   - **Linux**: L0 supplements the existing sysfs engine-counter path. L0 provides metrics sysfs cannot (XMX activity, fine-grained power, sub-engine breakdown); the sysfs path keeps providing render/compute/copy/video utilization when L0 is unavailable. The `EngineState` machinery from PR #249 stays in charge of basic utilization; L0 readouts populate additional `detail` entries.
   - **Windows**: L0 is the primary metrics source. WMI continues providing name + `AdapterRAM`; L0 fills in utilization, temperature, power, and engine breakdown. When L0 is unavailable, the reader stays at the current WMI-only level.
   - `detail["Metrics Source"]` records which path was active: `"sysfs (engine counters)"` / `"sysfs + Level Zero"` on Linux; `"WMI"` / `"WMI + Level Zero"` on Windows. Users can debug "why is XMX activity missing" by inspecting this entry.
4. **Cross-platform coverage** — the FFI surface for L0 is identical on Linux and Windows, so a single backend module (gated by `cfg(target_os)` only where the dynamic-library filename differs) should serve both.
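
The selection logic in items 3 and 4 can be sketched as below. This is a minimal illustration under stated assumptions, not the actual module: `Platform`, `library_candidates`, and `metrics_source` are hypothetical names, `libze_loader.so.1` (the Linux loader name) does not appear in this issue, and the probe order is a guess. The label strings are the ones proposed above.

```rust
/// Host platform as seen by the reader (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Platform {
    Linux,
    Windows,
}

/// Library names to probe, in order. Only this list differs per OS; the
/// rest of the backend is shared. Each name would be handed to
/// `libloading::Library::new`, and the first successful load wins.
/// ASSUMPTION: loader first, vendor backend second.
pub fn library_candidates(platform: Platform) -> &'static [&'static str] {
    match platform {
        Platform::Linux => &["libze_loader.so.1", "libze_intel_gpu.so"],
        Platform::Windows => &["ze_loader.dll", "ze_intel_gpu64.dll"],
    }
}

/// Value for `detail["Metrics Source"]`, given whether the Level Zero
/// backend produced data during this refresh.
pub fn metrics_source(platform: Platform, level_zero_active: bool) -> &'static str {
    match (platform, level_zero_active) {
        (Platform::Linux, false) => "sysfs (engine counters)",
        (Platform::Linux, true) => "sysfs + Level Zero",
        (Platform::Windows, false) => "WMI",
        (Platform::Windows, true) => "WMI + Level Zero",
    }
}
```

Keeping the label a pure function of platform and backend state makes it cheap to unit-test without a GPU or a runtime on the build host.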

## Acceptance Criteria

- [x] `cargo build --features level_zero` compiles cleanly on Linux (verified locally) and Windows (validated by the CI cross-compile).
- [x] `cargo build` (default) continues to compile cleanly with no Level Zero dependency in the binary (verified: `nm -D target/debug/all-smi` shows zero `zes_*` / `ze_loader` symbols on the default build).
- [ ] On a host with the Level Zero runtime installed and a supported Intel GPU (Arc, Battlemage, Lunar Lake, Meteor Lake), `IntelGpuReader::get_gpu_info()` surfaces XMX / AI compute-engine activity in the `detail` map. (awaits hardware verification by maintainer)
- [ ] On Windows with the L0 runtime installed, `IntelGpuReader::get_gpu_info()` returns non-zero utilization, temperature, and power (fields that are 0 today on the WMI-only path). (awaits hardware verification by maintainer; the v1 PR fills `utilization` and `power_consumption` only — temperature is deferred to a follow-up)
- [ ] On Linux with the L0 runtime installed, `GpuInfo.utilization` agrees with PR #249's sysfs computation within reasonable noise tolerance (the two paths see the same hardware; large divergence indicates a sampling-window mismatch worth investigating). (awaits hardware verification by maintainer)
- [x] On any host without the runtime installed, the reader behaves identically to PR #249's sysfs/WMI implementation (no panic, no error log spam, sensible defaults). Verified via `try_load_library_returns_none_for_nonexistent_path`, `enumerated_pci_bdfs_empty_when_runtime_absent`, and `refresh_returns_none_without_runtime` unit tests; the existing `intel_gpu_linux` tests pass identically with and without `--features level_zero`.
- [x] `detail["Metrics Source"]` reports the active path so users can observe which backend is supplying which fields. Verified via the `apply_to_gpu_info_*` tests and an explicit `get_gpu_info_populates_basic_fields` assertion that locks in the `"sysfs (engine counters)"` baseline on the default build.
- [x] Implementation is integrated into the existing readers, not added as a sibling module that no consumer reaches. The Linux reader calls `level_zero_glue::augment` after pushing the baseline `GpuInfo`; the Windows reader calls `augment_with_level_zero` between `query_intel_gpus` and the public `get_gpu_info` return.

## Technical Considerations

- **API stability**: Level Zero is a versioned C ABI. Pin to a minimum supported version (the v1.x spec is currently stable across Intel's actively shipping runtimes) and use feature-test queries (`zeInit` return value, `zeDriverGetExtensionProperties`) rather than version-string parsing.
- **Sysman vs. core**: GPU monitoring lives under the `zes_*` Sysman API, not the `ze_*` core API. The reader must initialize with `ZES_ENABLE_SYSMAN=1` (env var, set before `zeInit`) or call `zesInit` directly on newer runtimes.
- **Coexistence with PR #249's `EngineState`**: the sysfs delta tracker is already locked per-card via `Mutex<EngineState>`. The L0 path should hold its own per-card L0 handle state (likely another `Mutex` field on `IntelGpuCard`) and merge its readout into the existing `detail` map. Do not unify the two state machines — they sample on different clocks and would race if intertwined.
- **Concurrency**: Level Zero handles are not freely shareable across threads, so the reader must only touch them from the thread that owns them. `GpuReader::get_gpu_info` is currently called from a single collector thread, which satisfies this, but the assumption should be documented in the code.
- **Build complexity**: prefer dynamic loading via `libloading` (no `build.rs` C compilation, no `bindgen` dependency, no system header requirements on developer machines). Hand-write the small FFI surface needed for the metrics this reader actually consumes — do not vendor the full `ze_api.h`.
- **Windows DLL search path**: the Level Zero loader (`ze_loader.dll`) handles finding the per-vendor backend; the reader only needs to dynamically load `ze_loader.dll` itself (or `ze_intel_gpu64.dll` directly for the Intel-only path).
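
To make the hand-written FFI surface and the separate per-card state concrete, here is a rough sketch of a busy-percentage tracker fed by Sysman engine counters. `ZesEngineStats` mirrors the two `uint64_t` fields of `zes_engine_stats_t` (`activeTime`, `timestamp`, both in microseconds); `L0EngineState`, `IntelGpuCardSketch`, and the `l0_state` field name are hypothetical, and the call that would fill the struct (`zesEngineGetActivity`) is left out because it needs the runtime.

```rust
use std::sync::Mutex;

/// Hand-written mirror of `zes_engine_stats_t`.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct ZesEngineStats {
    /// Microseconds the engine group was busy (monotonic counter).
    pub active_time: u64,
    /// Microsecond timestamp at which `active_time` was sampled.
    pub timestamp: u64,
}

/// Level Zero delta tracker for one card, kept apart from the sysfs
/// `EngineState` because the two sample on different clocks.
#[derive(Debug, Default)]
pub struct L0EngineState {
    previous: Option<ZesEngineStats>,
}

impl L0EngineState {
    /// Feed a new sample. Returns the busy percentage since the previous
    /// sample, or `None` on the first sample or when the timestamp did
    /// not advance.
    pub fn update(&mut self, sample: ZesEngineStats) -> Option<f64> {
        let prev = self.previous.replace(sample)?;
        // wrapping_sub tolerates a counter that wraps around.
        let elapsed = sample.timestamp.wrapping_sub(prev.timestamp);
        if elapsed == 0 {
            return None;
        }
        let active = sample.active_time.wrapping_sub(prev.active_time);
        Some((active as f64 / elapsed as f64 * 100.0).clamp(0.0, 100.0))
    }
}

/// Hypothetical slice of `IntelGpuCard`: a second `Mutex` field next to
/// the existing sysfs one, so neither path blocks or races the other.
pub struct IntelGpuCardSketch {
    pub l0_state: Mutex<L0EngineState>,
}
```

A collector would lock `l0_state` once per refresh, call `update` with the fresh counters, and write the result into `detail` without touching the sysfs tracker.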

## References

- PR #249 — sysfs engine-busy utilization machinery this issue augments (`src/device/readers/intel_gpu_engine{.rs,/discovery.rs}`).
- v1 scope limitation documented in PR #245 body, "v1 scope limitations" section.
- Existing dynamic-library-loading example: `src/device/readers/tpu_pjrt.rs`.
- Intel Level Zero spec: https://oneapi-src.github.io/level-zero-spec/
- Intel `xpu-smi` (reference implementation): https://github.com/intel/xpumanager


