Problem / Background
PR #245 (closes #244) shipped Intel client GPU readers that are sysfs-only on Linux and WMI-only on Windows. PR #249 (closes #246) then added real engine-busy utilization on Linux via the new intel_gpu_engine module. Level Zero (libze_intel_gpu) integration was deferred from PR #245 because adding a new external library dependency required a different gating strategy and substantially expanded the PR surface.
Current state per platform:
Linux (src/device/readers/intel_gpu_linux.rs + intel_gpu_engine.rs): name, memory, frequency, temperature, instantaneous power, and engine-busy utilization (render / compute / copy / video / video-enhance) all work via sysfs. Missing: XMX / AI engine activity (Intel's dedicated AI accelerator on Arc and newer Xe parts), fine-grained power capping data, per-engine power breakdown, and any per-process L0-style stats. Sub-microsecond polling precision is also lost — sysfs counters update at coarser intervals than L0's metric streamer.
Windows (src/device/readers/intel_gpu_windows.rs): name + AdapterRAM only (with the same WMI 32-bit caveat as the AMD-on-Windows reader); utilization, temperature, and power are all reported as zero. This is at parity with the AMD-on-Windows situation but is a real gap relative to NVIDIA, where NVML provides full metrics on Windows.
Intel's Level Zero is the cross-platform compute / management API on top of which Intel's own xpu-smi is built. It exposes utilization (including XMX), frequency, temperature, power, memory state, and engine breakdown via a stable C ABI — and unlike xpu-smi, calling it does not require spawning an external process per refresh.
Proposed Solution
Add an opt-in Level Zero backend that augments PR #249's sysfs-based Linux path and fills the Windows metrics gap:
Dynamic library loading via libloading (the same pattern used for tpu_pjrt in src/device/readers/tpu_pjrt.rs). Do not add a hard cc-compiled or bindgen-generated dependency on libze_intel_gpu — the all-smi binary must continue to run on Intel-GPU hosts that do NOT have the Level Zero runtime installed.
Feature flag in Cargo.toml — e.g. level_zero, default-off — for builds that want to ship the FFI shim. The runtime detection (whether the host has libze_intel_gpu.so / ze_intel_gpu64.dll) is orthogonal and always-on.
Backend coexistence, not displacement:
Linux: L0 supplements the existing sysfs engine-counter path. L0 provides metrics sysfs cannot (XMX activity, fine-grained power, sub-engine breakdown); the sysfs path keeps providing render/compute/copy/video utilization when L0 is unavailable. The EngineState machinery from PR #249 stays in charge of basic utilization; L0 readouts populate additional detail entries.
Windows: L0 is the primary metrics source. WMI continues providing name + AdapterRAM; L0 fills in utilization, temperature, power, and engine breakdown. When L0 is unavailable, the reader stays at the current WMI-only level.
detail["Metrics Source"] records which path was active: "sysfs (engine counters)" / "sysfs + Level Zero" on Linux; "WMI" / "WMI + Level Zero" on Windows. Users can debug "why is XMX activity missing" by inspecting this entry.
Cross-platform coverage — the FFI surface for L0 is identical on Linux and Windows, so a single backend module (gated by cfg(target_os) only where the dynamic-library filename differs) should serve both.
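A minimal sketch of how the per-platform library filename could be the only cfg(target_os)-gated piece. The names ze_loader.dll, ze_intel_gpu64.dll, and libze_intel_gpu.so come from this issue; libze_loader.so.1 is an assumed Linux loader name, and CANDIDATES and first_loadable are hypothetical helpers. The open callback stands in for the real dynamic-loading call (for example via libloading), which is deliberately left out so the sketch has no external dependency.

```rust
// Candidate library filenames, in the order a backend might try them.
// Only this constant differs between platforms; everything else is shared.
#[cfg(target_os = "windows")]
const CANDIDATES: &[&str] = &["ze_loader.dll", "ze_intel_gpu64.dll"];
#[cfg(target_os = "linux")]
const CANDIDATES: &[&str] = &["libze_loader.so.1", "libze_intel_gpu.so"]; // loader name is an assumption
#[cfg(not(any(target_os = "windows", target_os = "linux")))]
const CANDIDATES: &[&str] = &[];

/// Try each candidate with the supplied opener and return the first one that loads,
/// together with the filename that succeeded. Returns `None` when the runtime is
/// absent, so callers can silently stay on the sysfs/WMI baseline.
fn first_loadable<T>(
    candidates: &[&str],
    mut open: impl FnMut(&str) -> Option<T>,
) -> Option<(String, T)> {
    candidates
        .iter()
        .find_map(|&name| open(name).map(|handle| (name.to_string(), handle)))
}
```

A caller would pass CANDIDATES and a closure wrapping the actual library open; a None result corresponds to the "runtime not installed" case and should not be logged as an error.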
Acceptance Criteria
cargo build --features level_zero compiles cleanly on Linux (verified locally) and Windows (CI cross-compile validates).
cargo build (default) continues to compile cleanly with no Level Zero dependency in the binary (verified: nm -D target/debug/all-smi shows zero zes_* / ze_loader symbols on the default build).
On a host with the Level Zero runtime installed and a supported Intel GPU (Arc, Battlemage, Lunar Lake, Meteor Lake), IntelGpuReader::get_gpu_info() surfaces XMX / AI compute-engine activity in the detail map. (awaits hardware verification by maintainer)
On Windows with the L0 runtime installed, IntelGpuReader::get_gpu_info() returns non-zero utilization, temperature, and power (fields that are 0 today on the WMI-only path). (awaits hardware verification by maintainer; the v1 PR fills utilization and power_consumption only — temperature is deferred to a follow-up)
On Linux with the L0 runtime installed, GpuInfo.utilization agrees with PR #249's sysfs computation within reasonable noise tolerance (the two paths see the same hardware; large divergence indicates a sampling-window mismatch worth investigating). (awaits hardware verification by maintainer)
On any host without the runtime installed, the reader behaves identically to PR #249's sysfs/WMI implementation (no panic, no error log spam, sensible defaults). Verified via try_load_library_returns_none_for_nonexistent_path, enumerated_pci_bdfs_empty_when_runtime_absent, and refresh_returns_none_without_runtime unit tests; the existing intel_gpu_linux tests pass identically with and without --features level_zero.
detail["Metrics Source"] reports the active path so users can observe which backend is supplying which fields. Verified via the apply_to_gpu_info_* tests and an explicit get_gpu_info_populates_basic_fields assertion that locks in the "sysfs (engine counters)" baseline on the default build.
Implementation is integrated into the existing readers, not added as a sibling module that no consumer reaches. The Linux reader calls level_zero_glue::augment after pushing the baseline GpuInfo; the Windows reader calls augment_with_level_zero between query_intel_gpus and the public get_gpu_info return.
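The augmentation step described in these criteria could look roughly like the sketch below. It is not the project's actual code: GpuInfo is reduced to three fields, and L0Readout, Baseline, and the "XMX Utilization" detail key are hypothetical names chosen for illustration. The "Metrics Source" strings are the ones listed in this issue.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the reader's GpuInfo; the real struct has more fields.
#[derive(Debug, Default)]
struct GpuInfo {
    utilization: f64,
    power_consumption: f64,
    detail: HashMap<String, String>,
}

/// Hypothetical Level Zero readout; `None` means the runtime did not report the value.
#[derive(Debug, Default, Clone)]
struct L0Readout {
    utilization: Option<f64>,
    power_watts: Option<f64>,
    xmx_busy_percent: Option<f64>,
}

/// Which baseline path produced `GpuInfo` before augmentation.
#[derive(Clone, Copy)]
enum Baseline {
    Sysfs,
    Wmi,
}

/// Merge an optional L0 readout into a baseline `GpuInfo` and record the active path.
fn augment(info: &mut GpuInfo, baseline: Baseline, l0: Option<&L0Readout>) {
    let Some(l0) = l0 else {
        // Runtime absent: leave every field untouched and label the baseline only.
        let label = match baseline {
            Baseline::Sysfs => "sysfs (engine counters)",
            Baseline::Wmi => "WMI",
        };
        info.detail.insert("Metrics Source".into(), label.into());
        return;
    };
    // On Windows, L0 is the primary source for fields WMI leaves at zero.
    // On Linux, the sysfs path stays in charge of basic utilization.
    if matches!(baseline, Baseline::Wmi) {
        if let Some(u) = l0.utilization {
            info.utilization = u;
        }
        if let Some(p) = l0.power_watts {
            info.power_consumption = p;
        }
    }
    // Extra metrics go into the detail map on both platforms (key name is illustrative).
    if let Some(x) = l0.xmx_busy_percent {
        info.detail.insert("XMX Utilization".into(), format!("{x:.1}%"));
    }
    let label = match baseline {
        Baseline::Sysfs => "sysfs + Level Zero",
        Baseline::Wmi => "WMI + Level Zero",
    };
    info.detail.insert("Metrics Source".into(), label.into());
}
```

Passing None for the readout models a host without the runtime, which is the case the runtime-absent unit tests above are meant to lock in.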
Technical Considerations
API stability: Level Zero is a versioned C ABI. Pin to a minimum supported version (currently the v1.x spec is stable across Intel's actively-shipping runtimes) and use feature-test queries (zeInit return value, zeDriverGetExtensionProperties) rather than version-string parsing.
Sysman vs. core: GPU monitoring lives under the zes_* Sysman API, not the ze_* core API. The reader must initialize with ZES_ENABLE_SYSMAN=1 (env var, set before zeInit) or call zesInit directly on newer runtimes.
Coexistence with PR #249's EngineState: the sysfs delta tracker is already locked per-card via Mutex<EngineState>. The L0 path should hold its own per-card L0 handle state (likely another Mutex field on IntelGpuCard) and merge its readout into the existing detail map. Do not unify the two state machines: they sample on different clocks and would race if intertwined.
Concurrency: Level Zero handles are not freely shareable across threads, so the reader must only touch them from a thread model the runtime allows. Currently GpuReader::get_gpu_info is called from a single collector thread, so this is fine, but the assumption should be documented.
Build complexity: prefer dynamic loading via libloading (no build.rs C compilation, no bindgen dependency, no system header requirements on developer machines). Hand-write the small FFI surface needed for the metrics this reader actually consumes — do not vendor the full ze_api.h.
Windows DLL search path: the Level Zero loader (ze_loader.dll) handles finding the per-vendor backend; the reader only needs to dynamically load ze_loader.dll itself (or ze_intel_gpu64.dll directly for the Intel-only path).
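To illustrate the "separate state, separate lock" point above, here is a hedged sketch with two independent delta trackers on one card. EngineState and IntelGpuCard are named in this issue, but their fields here are invented for illustration; L0State, busy_percent, sample_sysfs, sample_l0, and the "Level Zero Engine Busy" key are hypothetical. The (busy, timestamp) pair is assumed to mirror the cumulative counters both sources expose.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Busy percentage between two (busy_time, timestamp) samples taken on the same clock.
/// Returns `None` when the window is empty or the timestamp went backwards.
fn busy_percent(prev: (u64, u64), cur: (u64, u64)) -> Option<f64> {
    let dt = cur.1.checked_sub(prev.1).filter(|d| *d > 0)?;
    let db = cur.0.saturating_sub(prev.0);
    Some((db as f64 / dt as f64 * 100.0).min(100.0))
}

/// Hypothetical stand-in for the sysfs delta tracker.
#[derive(Default)]
struct EngineState {
    prev: Option<(u64, u64)>,
}

/// Hypothetical per-card Level Zero state with its own previous sample.
#[derive(Default)]
struct L0State {
    prev: Option<(u64, u64)>,
}

/// Each state machine has its own lock; neither method ever holds both.
#[derive(Default)]
struct IntelGpuCard {
    engine: Mutex<EngineState>,
    l0: Mutex<L0State>,
}

impl IntelGpuCard {
    /// Sysfs path: stays in charge of basic utilization.
    fn sample_sysfs(&self, busy: u64, now: u64) -> Option<f64> {
        let mut st = self.engine.lock().unwrap();
        let out = st.prev.and_then(|p| busy_percent(p, (busy, now)));
        st.prev = Some((busy, now));
        out
    }

    /// L0 path: only adds a detail entry, using its own clock and its own lock.
    fn sample_l0(&self, active: u64, timestamp: u64, detail: &mut HashMap<String, String>) {
        let pct = {
            let mut st = self.l0.lock().unwrap();
            let out = st.prev.and_then(|p| busy_percent(p, (active, timestamp)));
            st.prev = Some((active, timestamp));
            out
        };
        if let Some(p) = pct {
            detail.insert("Level Zero Engine Busy".into(), format!("{p:.1}%"));
        }
    }
}
```

Because each tracker only compares samples from its own source, the differing update intervals of sysfs and L0 never mix inside one delta computation.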
References
src/device/readers/intel_gpu_engine{.rs,/discovery.rs}
src/device/readers/tpu_pjrt.rs
xpu-smi (reference implementation): https://github.com/intel/xpumanager