
feat(intel-gpu): make Level Zero Sysman the primary metrics source #252

Description

@inureyes

Problem / Background

PR #251 added an opt-in Intel Level Zero backend behind the level_zero Cargo feature. The implementation correctly keeps the default build free of Level Zero symbols and dynamically loads libze_loader.so.1 / libze_loader.so on Linux or ze_loader.dll on Windows only when the feature is enabled. It currently resolves a narrow Sysman surface: zeInit, driver/device enumeration, PCI BDF lookup, engine activity (zesDeviceEnumEngineGroups + zesEngineGetActivity), and power energy counters (zesDeviceEnumPowerDomains + zesPowerGetEnergyCounter).

That is a good v1 safety shape, but the merged behavior is not yet the ideal metric-source architecture for Intel client GPUs. On Linux, sysfs remains the primary source for most fields and Level Zero mostly adds detail entries. On Windows, Level Zero fills only the WMI gaps for utilization and power. Temperature, memory state, frequency state, and fan state are still not read through Sysman at all.

For long-term maintainability and cross-platform parity, Intel GPU metrics should treat Level Zero Sysman as the preferred vendor API when it is available and fresh, then fall back to the existing OS-specific readers. This matches Intel's intended management API shape, avoids Linux/Windows drift, and keeps the current graceful degradation contract for hosts without the loader or without the level_zero feature.

Related implementation history:

Target Architecture

Keep the existing feature gate and dynamic-loading model:

default build
  -> no Level Zero module compiled
  -> no libze_loader dependency
  -> existing sysfs/WMI behavior unchanged

--features level_zero
  -> dynamically load libze_loader.so.1 / libze_loader.so on Linux, ze_loader.dll on Windows
  -> initialize Sysman with zesInit when the loader exports it
  -> support legacy loaders only when ZES_ENABLE_SYSMAN=1 was set before zeInit
  -> enumerate devices by PCI BDF where possible
  -> collect Sysman metric families
  -> merge only fresh, valid Sysman values into GpuInfo
  -> fall back to existing OS-specific sources when Sysman is absent, stale, unsupported, or unavailable

Desired source priority by metric:

temperature: fresh Sysman temperature sensor -> hwmon -> unavailable
power:       fresh Sysman energy-counter delta -> hwmon power -> unavailable
memory:      Sysman memory state -> DRM/sysfs VRAM -> unavailable
engine util: fresh Sysman engine-activity delta -> DRM/i915/xe sysfs/perf -> unavailable
fan:         hwmon -> Sysman fan state if available -> unavailable
frequency:   Sysman frequency state -> DRM/sysfs frequency -> unavailable

Important nuance: delta-based Sysman metrics (power, engine util) must not overwrite a valid fallback with a seeded zero. The first Sysman sample establishes a baseline. Only the second and later fresh delta samples are eligible to become the primary source.
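Here is a minimal sketch of that baseline rule. It uses a hypothetical DeltaBaseline helper, which is not an existing all-smi type. The first sample only seeds the baseline and returns None, so the caller keeps its fallback. A counter regression re-seeds the baseline instead of producing a bogus value.

```rust
/// Hypothetical helper: remembers the previous (counter, timestamp) sample
/// for one Sysman delta metric (energy counter or engine active time).
#[derive(Debug, Default)]
pub struct DeltaBaseline {
    prev: Option<(u64, u64)>,
}

impl DeltaBaseline {
    /// Feeds a raw sample and returns (counter delta, timestamp delta).
    /// Returns None when this sample only seeds the baseline, or when the
    /// counters did not move forward (reset, wrap, duplicate timestamp).
    /// In every case the new sample becomes the baseline for the next call.
    pub fn update(&mut self, counter: u64, timestamp: u64) -> Option<(u64, u64)> {
        let (prev_counter, prev_ts) = self.prev.replace((counter, timestamp))?;
        let d_counter = counter.checked_sub(prev_counter)?;
        let d_ts = timestamp.checked_sub(prev_ts)?;
        if d_ts == 0 {
            return None;
        }
        Some((d_counter, d_ts))
    }
}
```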

Proposed Solution

Extend src/device/readers/intel_gpu_level_zero/ from a small engine/power augmentor into a reusable Sysman metrics provider that returns a richer per-card readout.

Suggested module shape:

  • ffi.rs: add the minimal Level Zero Sysman structs, enums, and function-pointer typedefs for temperature, memory, frequency, and fan. Continue hand-writing only the subset all-smi consumes; do not vendor full Level Zero headers and do not add bindgen.
  • loader.rs: continue dynamic loading with libloading; resolve new symbols optionally where the Level Zero spec/runtime allows older loaders to omit them. Missing optional symbols should degrade that metric family only, not disable the whole backend.
  • refresh.rs: keep delta-tracked engine/power logic and add point-in-time refresh helpers for temperature, memory, frequency, and fan.
  • intel_gpu_level_zero.rs: expose a richer LevelZeroReadout with explicit freshness/source flags per metric family, not just a boolean had_any_data.
  • intel_gpu_linux/level_zero_glue.rs: merge Sysman values into the Linux sysfs baseline using the priority rules above.
  • intel_gpu_windows.rs: merge Sysman values into the Windows WMI baseline using the same policy, with Windows-specific matching limitations documented.

The target readout type should make source decisions explicit. One possible shape:

pub struct LevelZeroReadout {
    pub engines: Vec<(&'static str, f64)>,
    pub primary_engine_utilization: Option<FreshValue<f64>>,
    pub power_watts: Option<FreshValue<f64>>,
    pub temperature_celsius: Option<FreshValue<u32>>,
    pub memory: Option<LevelZeroMemoryReadout>,
    pub frequency_mhz: Option<FreshValue<u32>>,
    pub fan: Option<LevelZeroFanReadout>,
    pub diagnostics: LevelZeroDiagnostics,
}

pub struct FreshValue<T> {
    pub value: T,
    pub source: &'static str,
}

The exact names are flexible, but the implementation must distinguish "unsupported/unavailable", "seeded but not fresh yet", and "fresh value". A single had_any_data: bool is too coarse once Sysman becomes a priority source.
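With Option<FreshValue<T>>, "unavailable" and "seeded" both collapse into None. One way to keep the three states apart is a small enum plus a resolver. SysmanMetric and resolve below are hypothetical names, and the source strings are assumed labels.

```rust
/// Hypothetical tri-state for one Sysman metric family.
#[derive(Debug, Clone, PartialEq)]
pub enum SysmanMetric<T> {
    /// Symbol missing, domain unsupported, permission denied, or call failed.
    Unavailable,
    /// Delta metric that so far has only a baseline sample.
    Seeded,
    /// Valid value read during this refresh.
    Fresh(T),
}

/// Resolves one field. A fresh Sysman value wins. Otherwise the OS fallback
/// (hwmon/sysfs/WMI), if any, is kept together with its source label.
pub fn resolve<T>(
    sysman: SysmanMetric<T>,
    fallback: Option<(T, &'static str)>,
) -> Option<(T, &'static str)> {
    match sysman {
        SysmanMetric::Fresh(v) => Some((v, "Level Zero Sysman")),
        SysmanMetric::Seeded | SysmanMetric::Unavailable => fallback,
    }
}
```

Because Fresh(0.0) still wins, a real zero from a fresh domain can replace the fallback. A seeded delta never can, which matches the rule in the safety section.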

Required Level Zero Surface

The implementation should verify names and layouts against the upstream zes_api.h and the published Level Zero Sysman specification before coding. Add the following families and nothing beyond them:

Temperature:

  • zesDeviceEnumTemperatureSensors
  • zesTemperatureGetProperties
  • zesTemperatureGetState
  • Track sensor type where exposed. Prefer GPU/global sensors over memory/unknown sensors when multiple are present. Ignore non-finite or out-of-range readings.
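Here is a sketch of the sensor-selection rule. It assumes readings have already been converted to (kind, degrees Celsius) pairs. SensorKind is a hypothetical Rust-side enum. The GPU-before-global ordering and the 150 °C cap are assumptions, and a zero or negative reading is treated as invalid.

```rust
/// Hypothetical mirror of the sensor kinds all-smi cares about. Real values
/// come from the sensor type in the Sysman temperature properties.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SensorKind {
    Global,
    Gpu,
    Memory,
    Other,
}

/// Assumed sanity cap; in practice reuse the Intel sysfs reader's bound.
const MAX_SANE_TEMP_C: f64 = 150.0;

/// Lower rank wins: GPU, then global, then memory, then anything else.
fn rank(kind: SensorKind) -> u8 {
    match kind {
        SensorKind::Gpu => 0,
        SensorKind::Global => 1,
        SensorKind::Memory => 2,
        SensorKind::Other => 3,
    }
}

/// Picks the best-ranked sane reading. Non-finite, non-positive, or
/// over-cap readings are skipped. Ties keep the first sensor enumerated.
pub fn pick_temperature(readings: &[(SensorKind, f64)]) -> Option<u32> {
    readings
        .iter()
        .filter(|(_, c)| c.is_finite() && *c > 0.0 && *c <= MAX_SANE_TEMP_C)
        .min_by_key(|(kind, _)| rank(*kind))
        .map(|(_, c)| c.round() as u32)
}
```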

Power:

  • Keep zesDeviceEnumPowerDomains
  • Keep zesPowerGetEnergyCounter
  • Continue delta tracking: ΔµJ / Δµs = J/s = W, so no unit scaling is needed.
  • A seeded sample is not fresh and must not overwrite hwmon. A valid delta is fresh and should become primary over hwmon.
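As a worked example of the unit rule, 15,000,000 µJ consumed over 1,000,000 µs is 15 J/s, or 15 W. The sketch below assumes the energy counter is in microjoules and the timestamp in microseconds, as the power bullets state. watts_from_energy and the 1,000 W cap are hypothetical.

```rust
/// Assumed power cap; in practice reuse the Intel sysfs reader's bound.
const MAX_SANE_POWER_W: f64 = 1_000.0;

/// Converts two (energy_uj, timestamp_us) energy-counter samples into
/// average watts. Because ΔµJ / Δµs = J/s = W, no extra scaling is needed.
/// Returns None on counter reset, zero interval, or an implausible result.
pub fn watts_from_energy(prev: (u64, u64), curr: (u64, u64)) -> Option<f64> {
    let d_energy_uj = curr.0.checked_sub(prev.0)?;
    let d_time_us = curr.1.checked_sub(prev.1)?;
    if d_time_us == 0 {
        return None;
    }
    let watts = d_energy_uj as f64 / d_time_us as f64;
    (watts.is_finite() && watts <= MAX_SANE_POWER_W).then_some(watts)
}
```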

Memory:

  • zesDeviceEnumMemoryModules
  • zesMemoryGetProperties
  • zesMemoryGetState
  • Prefer dedicated/local memory for discrete GPUs. For integrated GPUs, report semantics carefully: shared/GTT/system-backed memory must not be presented as a fixed dedicated VRAM budget if the API does not support that interpretation.
  • Preserve the current integrated-GPU behavior of not fabricating a total from system RAM.
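Here is a sketch of the memory rule. It assumes modules have been read into a hypothetical MemModule that carries a device-local or system location tag. Only device-local modules contribute to the reported VRAM budget. An integrated GPU with only system-backed memory gets None, so the existing zero-with-detail output is preserved.

```rust
/// Hypothetical location tag mirroring Sysman's memory-module location
/// (device-local vs. system). Verify the real enum against zes_api.h.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemLocation {
    Device,
    System,
}

/// One module's state in bytes.
#[derive(Debug, Clone, Copy)]
pub struct MemModule {
    pub location: MemLocation,
    pub size: u64,
    pub free: u64,
}

/// Sums device-local modules into (total, used) VRAM bytes. System-backed
/// modules are never presented as dedicated VRAM, and inconsistent modules
/// (free > size) are ignored.
pub fn dedicated_vram(modules: &[MemModule]) -> Option<(u64, u64)> {
    let (total, free) = modules
        .iter()
        .filter(|m| m.location == MemLocation::Device && m.free <= m.size)
        .fold((0u64, 0u64), |(t, f), m| {
            (t.saturating_add(m.size), f.saturating_add(m.free))
        });
    (total > 0).then_some((total, total.saturating_sub(free)))
}
```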

Frequency:

  • zesDeviceEnumFrequencyDomains
  • zesFrequencyGetProperties
  • zesFrequencyGetState
  • Select frequency domains by the domain type reported in the frequency properties, preferring GPU/media/compute domains where the spec defines them. If multiple domains are present, choose the domain that best represents the GPU core clock and expose the other domains in detail.
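Here is a sketch of the domain choice. It assumes each domain has been reduced to a (kind, current MHz) pair. FreqDomain, split_frequencies, the detail labels, and the 10,000 MHz cap are all hypothetical.

```rust
/// Hypothetical mirror of the frequency-domain kinds. Confirm names and
/// values against zes_api.h before writing the FFI enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FreqDomain {
    Gpu,
    Memory,
    Media,
}

/// Assumed cap in MHz; in practice reuse the sysfs reader's bound.
const MAX_SANE_FREQ_MHZ: f64 = 10_000.0;

/// Returns the core clock (first sane GPU-domain reading) plus the other
/// sane domains as (detail label, MHz) pairs.
pub fn split_frequencies(domains: &[(FreqDomain, f64)]) -> (Option<u32>, Vec<(&'static str, u32)>) {
    let mut core = None;
    let mut extra = Vec::new();
    for &(kind, mhz) in domains {
        // Skip non-finite, non-positive, or over-cap readings.
        if !mhz.is_finite() || mhz <= 0.0 || mhz > MAX_SANE_FREQ_MHZ {
            continue;
        }
        let mhz = mhz.round() as u32;
        match kind {
            FreqDomain::Gpu if core.is_none() => core = Some(mhz),
            FreqDomain::Gpu => extra.push(("GPU Frequency (extra domain)", mhz)),
            FreqDomain::Memory => extra.push(("Memory Frequency", mhz)),
            FreqDomain::Media => extra.push(("Media Frequency", mhz)),
        }
    }
    (core, extra)
}
```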

Engine:

  • Keep zesDeviceEnumEngineGroups
  • Keep zesEngineGetProperties
  • Keep zesEngineGetActivity
  • Continue tracking singleton groups and avoid double counting aggregate _ALL groups.
  • For Linux primary utilization, allow fresh Sysman primary engine utilization to override sysfs. If Sysman is unavailable or still seeded, retain the sysfs primary value.
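Engine utilization follows the same delta pattern as power. It is busy time divided by elapsed time between two samples of one singleton engine group. This sketch assumes both counters are in microseconds, which should be checked against the Sysman engine stats definition. EngineSample is a hypothetical type. An aggregate _ALL group would simply not be fed into this function.

```rust
/// Hypothetical engine sample: active time and timestamp counters as
/// returned by the engine activity call, both assumed to be in microseconds.
#[derive(Debug, Clone, Copy)]
pub struct EngineSample {
    pub active_us: u64,
    pub timestamp_us: u64,
}

/// Busy percentage between two samples of the same engine group, clamped to
/// [0, 100]. None means "not fresh" (reset or zero interval), which must
/// leave the sysfs value in place.
pub fn engine_utilization(prev: EngineSample, curr: EngineSample) -> Option<f64> {
    let d_active = curr.active_us.checked_sub(prev.active_us)?;
    let d_time = curr.timestamp_us.checked_sub(prev.timestamp_us)?;
    if d_time == 0 {
        return None;
    }
    Some((d_active as f64 / d_time as f64 * 100.0).clamp(0.0, 100.0))
}
```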

Fan:

  • zesDeviceEnumFans
  • zesFanGetProperties
  • zesFanGetState
  • Use hwmon first on Linux because fan telemetry is often exposed there and may reflect board-level control more reliably. Use Sysman fan only if hwmon is absent/unavailable. On Windows, Sysman fan can be the primary source if available.
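The platform-dependent fan priority can be isolated in one function, which keeps the Linux/Windows difference out of the merge code. Platform and pick_fan_rpm are hypothetical. The sketch assumes both sources already report RPM. Sysman fan speed can also be requested in other units (such as percent), and a real implementation must convert or skip those values.

```rust
/// Hypothetical platform tag used by the merge code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Platform {
    Linux,
    Windows,
}

/// Fan speed (RPM) plus its source label. On Linux, hwmon wins and Sysman is
/// only a fallback. Windows has no hwmon, so Sysman is primary there.
pub fn pick_fan_rpm(
    platform: Platform,
    hwmon: Option<u32>,
    sysman: Option<u32>,
) -> Option<(u32, &'static str)> {
    let from_sysman = sysman.map(|r| (r, "Level Zero Sysman"));
    match platform {
        Platform::Linux => hwmon.map(|r| (r, "hwmon")).or(from_sysman),
        Platform::Windows => from_sysman,
    }
}
```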

Sysman initialization:

  • Prefer zesInit when the loader exports it. The current Level Zero documentation deprecates relying on ZES_ENABLE_SYSMAN for new applications.
  • Preserve compatibility with older loaders that do not export zesInit by allowing the CLI to set ZES_ENABLE_SYSMAN=1 during single-threaded process startup before any zeInit.
  • Library callers must not mutate process environment from lazy reader initialization. If zesInit is absent and the environment variable was not set before initialization, degrade Level Zero cleanly and keep the sysfs/WMI baseline.
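The initialization rules reduce to a pure decision that never writes to the environment. SysmanInit, choose_init, and sysman_env_enabled are hypothetical names. The only environment access is a read, which is meant to happen during CLI startup before any worker threads exist.

```rust
/// Hypothetical outcome of Sysman bring-up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SysmanInit {
    /// The loader exports zesInit, so use it.
    ZesInit,
    /// Legacy loader: zeInit is usable only because ZES_ENABLE_SYSMAN=1
    /// was already set before any zeInit call.
    LegacyZeInit,
    /// Neither path is safe, so keep the sysfs/WMI baseline.
    Disabled,
}

/// Chooses the init path. This function never calls std::env::set_var.
pub fn choose_init(has_zes_init: bool, env_enabled_at_startup: bool) -> SysmanInit {
    if has_zes_init {
        SysmanInit::ZesInit
    } else if env_enabled_at_startup {
        SysmanInit::LegacyZeInit
    } else {
        SysmanInit::Disabled
    }
}

/// Read-only check of the legacy variable; only "1" enables the legacy path.
pub fn sysman_env_enabled() -> bool {
    std::env::var("ZES_ENABLE_SYSMAN").map(|v| v == "1").unwrap_or(false)
}
```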

Merge Policy

Linux:

  • GpuInfo.temperature: Sysman fresh GPU/global temperature if available and sane, else existing hwmon temperature, else 0.
  • GpuInfo.power_consumption: Sysman fresh energy-counter watts if available and sane, else existing hwmon watts, else 0.0.
  • GpuInfo.total_memory / used_memory: Sysman memory state if it represents dedicated/local VRAM for the card, else existing DRM/sysfs VRAM, else existing integrated-GPU zero-with-detail semantics.
  • GpuInfo.frequency: Sysman frequency state if available and sane, else existing sysfs frequency, else 0.
  • GpuInfo.utilization: Sysman fresh engine utilization if available, else existing i915/xe sysfs engine-busy utilization, else 0.0 with an explanatory detail["Utilization"] note.
  • detail["Metrics Source"]: should not be a single ambiguous string once different fields may come from different places. Add per-field source details such as detail["Source: Temperature"] = "Level Zero Sysman" or detail["Source: Power"] = "hwmon". Keeping the existing Metrics Source summary is fine, but it must not hide mixed-source results.
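Here is a sketch of per-field source recording, assuming detail is a HashMap<String, String>. The "Source: <Field>" keys follow the examples above. Joining backends with " + " for the summary Metrics Source value is an assumption. It is chosen so that a mixed-source result also stays visible in the summary.

```rust
use std::collections::HashMap;

/// Writes one "Source: <Field>" entry per merged field, then a summary
/// "Metrics Source" that lists every backend involved, in first-seen order.
pub fn record_sources(detail: &mut HashMap<String, String>, sources: &[(&str, &str)]) {
    let mut backends: Vec<&str> = Vec::new();
    for &(field, backend) in sources {
        detail.insert(format!("Source: {field}"), backend.to_string());
        if !backends.contains(&backend) {
            backends.push(backend);
        }
    }
    if !backends.is_empty() {
        detail.insert("Metrics Source".to_string(), backends.join(" + "));
    }
}
```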

Windows:

  • WMI remains the baseline for name, UUID/PNP ID, driver version, video processor, and AdapterRAM when Sysman memory is unavailable.
  • Sysman should fill utilization, power, temperature, frequency, memory, and fan when available.
  • Ordinal BDF matching from PR #251 is acceptable for v1 but should be isolated behind a small helper and documented. If this issue touches matching, prefer adding a future-compatible Win32_PnPEntity.LocationInformation parser rather than scattering ordinal assumptions through the merge code.
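Here is a sketch of the suggested LocationInformation parser. It assumes the value uses the "PCI bus N, device N, function N" wording that Windows shows as a device's location. Bdf and parse_location_information are hypothetical. The PCI domain/segment is not carried by that string, so this parser alone cannot disambiguate multi-segment hosts.

```rust
/// PCI bus/device/function triple.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Bdf {
    pub bus: u8,
    pub device: u8,
    pub function: u8,
}

/// Parses a string of the assumed form "PCI bus 3, device 0, function 0".
/// Any other shape returns None, so callers can fall back to the v1
/// ordinal matching.
pub fn parse_location_information(s: &str) -> Option<Bdf> {
    let (mut bus, mut device, mut function) = (None, None, None);
    for part in s.split(',') {
        let mut words = part.split_whitespace();
        let (key, value) = match (words.next()?, words.next()?, words.next()) {
            ("PCI", "bus", Some(v)) => ("bus", v),
            (k, v, None) => (k, v),
            _ => return None,
        };
        let value: u8 = value.parse().ok()?;
        match key {
            "bus" => bus = Some(value),
            "device" => device = Some(value),
            "function" => function = Some(value),
            _ => return None,
        }
    }
    Some(Bdf { bus: bus?, device: device?, function: function? })
}
```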

Validation / Safety Rules

  • Keep level_zero default-off.
  • Default cargo build must still contain no Level Zero symbol strings and no dynamic dependency on libze_loader.
  • Missing loader, missing symbols, unsupported sensors/domains, permission failures, and bad driver return codes must all degrade to the existing sysfs/WMI baseline without panic or log spam.
  • Cap every driver-reported count before allocating a handle vector. Reuse the existing MAX_L0_HANDLES = 256 guard.
  • All FFI structs must have #[repr(C)], correct stype, correct pNext, correct ze_bool_t width (u8), and size/layout tests on 64-bit targets.
  • Never read uninitialized driver output. Initialize every struct with Default, check return codes before reading, and ignore failed subdomains instead of failing the whole device.
  • Clamp reported values to the same sanity bounds already used by the Intel sysfs reader: utilization [0, 100], temperature cap, power cap, frequency cap, memory cap. Add new bounds only if justified.
  • Do not overwrite a meaningful fallback with a Sysman zero unless Sysman explicitly reports a real zero and the domain is fresh. Seeded delta metrics are "not fresh", not "zero".
  • Avoid per-refresh library loading. libloading::Library and function pointers should remain process-cached as they do now.
  • Avoid repeated enumeration when not needed. It is acceptable to enumerate handles once per bound card and reuse them in LevelZeroState; re-enumeration should happen only if the driver reports a stale/failing handle pattern that requires recovery.
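Two of the safety rules can be expressed in code: capping a driver-reported count before allocation, and locking an FFI struct layout in a test. MAX_L0_HANDLES = 256 comes from the existing backend, though its u32 type here is assumed. ExampleProperties is a hypothetical struct that only illustrates the stype / pNext / u8 ze_bool_t header pattern. Its stype of 0 is a placeholder, and real code must use the spec's structure-type constant.

```rust
use std::ffi::c_void;

/// Existing guard value from the current backend (type assumed).
pub const MAX_L0_HANDLES: u32 = 256;

/// Caps a driver-reported count before any Vec allocation.
pub fn capped_handle_count(reported: u32) -> usize {
    reported.min(MAX_L0_HANDLES) as usize
}

/// Hypothetical descriptor showing the common Sysman header: 32-bit stype,
/// pNext pointer (null), then payload. ze_bool_t is u8, not Rust bool.
#[repr(C)]
#[allow(non_snake_case)]
pub struct ExampleProperties {
    pub stype: u32,
    pub pNext: *mut c_void,
    pub onSubdevice: u8,
    pub subdeviceId: u32,
}

impl Default for ExampleProperties {
    fn default() -> Self {
        // Placeholder stype; real structs must use the spec constant.
        Self { stype: 0, pNext: std::ptr::null_mut(), onSubdevice: 0, subdeviceId: 0 }
    }
}

/// On 64-bit targets: 4 (stype) + 4 pad + 8 (pNext) + 1 + 3 pad + 4 = 24 bytes.
pub fn example_layout() -> (usize, usize) {
    (
        std::mem::size_of::<ExampleProperties>(),
        std::mem::align_of::<ExampleProperties>(),
    )
}
```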

Acceptance Criteria

  • With default features, cargo build --bin all-smi and cargo build --no-default-features --lib do not compile or link the Level Zero module; the produced binary has no zeInit, zes*, libze_loader, or ze_loader.dll symbol/string references attributable to the all-smi L0 backend.
  • With --features level_zero, the backend dynamically loads libze_loader.so.1 / libze_loader.so on Linux and ze_loader.dll on Windows via libloading; absence of the loader degrades silently to the existing baseline.
  • Sysman initialization prefers zesInit when available and never calls std::env::set_var from lazy reader initialization after worker threads may exist.
  • Temperature is sourced as Sysman -> hwmon -> unavailable, with tests covering Sysman fresh value winning over hwmon and hwmon winning when Sysman is unavailable.
  • Power is sourced as fresh Sysman energy-counter delta -> hwmon -> unavailable, with tests proving the first seeded Sysman sample does not overwrite hwmon.
  • Memory is sourced as Sysman memory state -> DRM/sysfs -> unavailable, with tests for discrete local memory and integrated shared-memory semantics.
  • Engine utilization is sourced as fresh Sysman engine activity -> i915/xe sysfs engine busy -> unavailable, with tests proving seeded Sysman activity does not overwrite sysfs.
  • Frequency is sourced as Sysman frequency state -> sysfs -> unavailable.
  • Fan is sourced as hwmon -> Sysman fan if available -> unavailable on Linux, and Sysman fan if available on Windows.
  • detail exposes per-field metric source information so users can see mixed-source results, not only a coarse Metrics Source.
  • All new FFI structs and enum constants have unit tests locking size/layout and spec values.
  • All driver-reported handle counts are capped before allocation.
  • Linux Intel reader tests cover mixed Sysman/sysfs fallback behavior without requiring real Intel hardware.
  • Windows merge tests cover WMI baseline + Sysman override behavior without requiring real Intel hardware.
  • Documentation updates README, docs/ARCHITECTURE.md, and docs/man/all-smi.1 to describe the Sysman-first priority model and the default-off feature gate.

Suggested Test Plan

  • cargo test --lib device::readers::intel_gpu_level_zero --features level_zero
  • cargo test --lib device::readers::intel_gpu_linux
  • cargo test --lib device::readers::intel_gpu_linux --features level_zero
  • cargo test --lib device::readers::intel_gpu_sysfs
  • cargo test --lib device::readers::intel_gpu_engine
  • cargo test --lib device::readers::intel_gpu_fdinfo
  • cargo check --lib --tests
  • cargo check --lib --tests --features level_zero
  • cargo clippy --lib --tests -- -D warnings
  • cargo clippy --lib --tests --features level_zero -- -D warnings
  • cargo clippy -- -D warnings
  • cargo clippy --features level_zero -- -D warnings
  • cargo fmt --check
  • cargo build --bin all-smi followed by a symbol/string audit proving no Level Zero references in the default build.

Hardware validation, when available:

  • On Linux with an Intel Arc / Iris Xe / Xe-LPG host and Level Zero runtime installed, compare all-smi temperature/power/memory/frequency/utilization against xpu-smi or Intel's reference tooling within reasonable sampling noise.
  • On Windows with an Intel Arc / Iris Xe / Xe-LPG host and Level Zero runtime installed, verify non-zero utilization/power/temperature/frequency when a GPU workload is active.
  • Verify no regression on hosts without Level Zero runtime: output should match the pre-issue sysfs/WMI baseline.

Non-Goals / Deferred

  • Per-process Level Zero stats (zesDeviceProcessesGetState) remain out of scope unless the implementation naturally needs it. The Linux fdinfo path from #247 already covers per-process GPU memory.
  • Power-limit control (zesPowerGetLimits*, zesPowerSetLimits*) remains out of scope.
  • RAS / error reporting and performance-factor controls remain out of scope.
  • Replacing the existing Linux fdinfo process accounting with Level Zero is out of scope.

References

Refs #244
Refs #246
Refs #247
Refs #248
