Problem / Background
PR #245 (closes #244) shipped a sysfs-based Intel client GPU reader for Linux. As a documented v1 limitation, IntelGpuLinuxReader::get_gpu_info() always reports utilization = 0.0 and tags the detail map with "Utilization": "Requires intel_gpu_top (perf engine counters)". The follow-up to compute a real value was explicitly deferred.
src/device/readers/intel_gpu_linux.rs does not currently read any engine-busy counters. As a result, downstream consumers that select an inference backend based on observed accelerator activity (e.g. the SYCL/oneAPI accelerator-auto-selection layer referenced by issue #244) see Intel hosts as "GPU present, but always idle" and cannot distinguish a free Intel GPU from a fully loaded one.
The AMD reader (src/device/readers/amd.rs) computes real utilization via libamdgpu_top; the NVIDIA reader uses NVML's nvmlDeviceGetUtilizationRates. The Intel reader is the only GPU reader currently reporting a fabricated-zero utilization.
Proposed Solution
Add real engine-busy% computation to the Intel Linux reader by reading kernel perf-style engine counters and tracking deltas across polling intervals:
- For i915: read /sys/class/drm/card<N>/engine/<class>/<instance>/busy_ns (or the equivalent perf event)
- For xe: read the equivalent under /sys/class/drm/card<N>/device/tile0/gt0/engines/...
- Track the previous (busy_ns, wall_ns) pair per device in reader-owned state
- Compute (delta_busy / delta_wall) * 100.0, clamped to [0, 100]
- Aggregate across engine classes (render, compute, video, copy): surface render+compute as the primary utilization, expose the per-class breakdown via the detail map
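As a rough illustration of the delta and aggregation steps listed above, here is a minimal sketch that works purely on already-read counter values, with no sysfs or PMU access. EngineSample, busy_percent, per_class_percent and primary_utilization are hypothetical names, not existing code in the reader. The issue does not say how render and compute should be combined, so taking the larger of the two is an assumption made only for this example.

```rust
use std::collections::BTreeMap;

/// Cumulative counters captured at one polling instant (all values in nanoseconds).
/// Hypothetical type for illustration only.
#[derive(Clone, Debug, Default)]
pub struct EngineSample {
    /// Monotonic wall-clock timestamp of the sample.
    pub wall_ns: u64,
    /// Cumulative busy time keyed by engine class name (names are illustrative).
    pub busy_ns: BTreeMap<String, u64>,
}

/// Busy% for one engine class between two samples, clamped to [0, 100].
/// Returns None when no wall time has elapsed, so callers never divide by zero.
pub fn busy_percent(prev_busy: u64, now_busy: u64, prev_wall: u64, now_wall: u64) -> Option<f64> {
    let delta_wall = now_wall.checked_sub(prev_wall).filter(|d| *d > 0)?;
    // A counter that went backwards (e.g. after a driver reload) is treated as idle.
    let delta_busy = now_busy.saturating_sub(prev_busy);
    Some((delta_busy as f64 / delta_wall as f64 * 100.0).clamp(0.0, 100.0))
}

/// Per-class busy% for every engine class present in both samples.
pub fn per_class_percent(prev: &EngineSample, now: &EngineSample) -> BTreeMap<String, f64> {
    now.busy_ns
        .iter()
        .filter_map(|(class, now_busy)| {
            let prev_busy = *prev.busy_ns.get(class)?;
            let pct = busy_percent(prev_busy, *now_busy, prev.wall_ns, now.wall_ns)?;
            Some((class.clone(), pct))
        })
        .collect()
}

/// Primary utilization. ASSUMPTION: the larger of the render and compute busy%;
/// the issue only says "surface render+compute" and does not fix the formula.
pub fn primary_utilization(per_class: &BTreeMap<String, f64>) -> f64 {
    ["render", "compute"]
        .iter()
        .filter_map(|class| per_class.get(*class).copied())
        .fold(0.0, f64::max)
}
```

The map returned by per_class_percent could feed the detail map directly, while primary_utilization would back the utilization field. Returning None for a zero-length interval leaves it to the caller to decide what to report on the first poll.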
Mirror the existing per-call vs. cached-static-info split used by the AMD reader (AmdGpuDevice.static_info: OnceLock<DeviceStaticInfo>).
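One way the per-call vs. cached split might look on the Intel side is sketched below. IntelGpuDevice, StaticInfo and Sample are hypothetical names chosen for this example; the actual field layout of the AMD reader is not reproduced here. Static facts sit behind a OnceLock and are probed once, while the previous counter sample sits behind a Mutex so a &self method can update it on every poll.

```rust
use std::sync::{Mutex, OnceLock};

/// Facts that do not change while the device is present (hypothetical fields).
#[derive(Clone, Debug, PartialEq)]
struct StaticInfo {
    name: String,
    driver: String,
}

/// One cumulative counter reading, both values in nanoseconds.
#[derive(Clone, Copy, Debug)]
struct Sample {
    busy_ns: u64,
    wall_ns: u64,
}

/// Hypothetical per-device state: immutable info cached once, counters updated per call.
struct IntelGpuDevice {
    static_info: OnceLock<StaticInfo>,
    last_sample: Mutex<Option<Sample>>,
}

impl IntelGpuDevice {
    fn new() -> Self {
        Self {
            static_info: OnceLock::new(),
            last_sample: Mutex::new(None),
        }
    }

    /// Runs `probe` only on the first call; later calls return the cached value.
    fn static_info(&self, probe: impl FnOnce() -> StaticInfo) -> &StaticInfo {
        self.static_info.get_or_init(probe)
    }

    /// Stores `now` as the new baseline and returns busy% since the previous sample.
    /// Reports 0.0 on the first call or when no wall time has elapsed.
    fn utilization(&self, now: Sample) -> f64 {
        // Recover the guard even if another thread panicked while holding the lock.
        let mut last = self.last_sample.lock().unwrap_or_else(|e| e.into_inner());
        let prev = last.replace(now);
        match prev {
            Some(p) if now.wall_ns > p.wall_ns => {
                let busy = now.busy_ns.saturating_sub(p.busy_ns) as f64;
                let wall = (now.wall_ns - p.wall_ns) as f64;
                (busy / wall * 100.0).clamp(0.0, 100.0)
            }
            _ => 0.0,
        }
    }
}
```

With this shape, the first poll has no baseline and reports 0.0, so a caller that needs a meaningful value straight away would have to take two samples. Whether that first-poll behaviour is acceptable is a design decision this sketch leaves open.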
Acceptance Criteria
- IntelGpuLinuxReader::get_gpu_info() returns a non-zero utilization value when the GPU is actively executing work, verified on at least one Arc (discrete) or Iris Xe / Xe-LPG (integrated) host. (awaits hardware verification by maintainer)
- The "Utilization": "Requires intel_gpu_top (perf engine counters)" placeholder in the detail map is removed (or replaced with an engine-class breakdown).
- [...] detail map.
Technical Considerations
- The kernel exposes engine-busy via PMU events when CONFIG_PERF_EVENTS is enabled; these are the same counters intel_gpu_top reads. The sysfs path (engine/.../busy_ns) is the simpler entry point but is not universally available across kernel versions; PMU is more portable but requires perf_event_open(2) syscalls.
- Engine class names and instance counts differ between i915 and xe. The reader already handles driver detection in discover_cards; extend that to drive the engine enumeration.
- Holding mutable per-device state in the reader changes the existing stateless shape. Use the same OnceLock + interior-mutability pattern (Mutex<EngineCounters>) the AMD reader uses for vram_usage.
References
- src/device/readers/amd.rs
- intel_gpu_top source for the canonical engine-busy reading approach