feat(intel-gpu): real Linux utilization via perf engine-busy counters

## Problem / Background

PR #245 (closes #244) shipped a sysfs-based Intel client GPU reader for Linux. As a documented v1 limitation, `IntelGpuLinuxReader::get_gpu_info()` always reports `utilization = 0.0` and tags the `detail` map with `"Utilization": "Requires intel_gpu_top (perf engine counters)"`. The follow-up to compute a real value was explicitly deferred.

`src/device/readers/intel_gpu_linux.rs` does not currently read any engine-busy counters. As a result, downstream consumers that select an inference backend based on observed accelerator activity (e.g. the SYCL/oneAPI accelerator-auto-selection layer referenced by issue #244) see Intel hosts as "GPU present, but always idle" and cannot distinguish a free Intel GPU from a fully loaded one.

The AMD reader (`src/device/readers/amd.rs`) computes real utilization via `libamdgpu_top`; the NVIDIA reader uses NVML's `nvmlDeviceGetUtilizationRates`. The Intel reader is the only GPU reader that still reports a hard-coded zero instead of a measured utilization.

## Proposed Solution

Add real engine-busy% computation to the Intel Linux reader by reading kernel perf-style engine counters and tracking deltas across polling intervals:

- For `i915`: read `/sys/class/drm/card<N>/engine/<class>/<instance>/busy_ns` (or the equivalent perf event)
- For `xe`: read the equivalent under `/sys/class/drm/card<N>/device/tile0/gt0/engines/...`
- Track the previous `(busy_ns, wall_ns)` pair per device in reader-owned state
- Compute `(delta_busy / delta_wall) * 100.0`, clamped to [0, 100]
- Aggregate across engine classes (render, compute, video, copy): surface render+compute as the primary `utilization`, and expose the per-class breakdown via the `detail` map
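
The delta step above can be sketched as follows. This is a minimal illustration, not the reader's actual code: `Sample`, `busy_percent`, and `primary_utilization` are hypothetical names, and because the proposal does not say how render and compute are combined, the sketch assumes the busier of the two is reported.

```rust
/// One reading of an engine's cumulative busy time, paired with the
/// monotonic wall-clock time at which it was taken (both in nanoseconds).
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sample {
    pub busy_ns: u64,
    pub wall_ns: u64,
}

/// Busy percentage between two samples, clamped to [0, 100].
/// Returns `None` when no usable interval exists: wall time did not
/// advance, or either counter went backwards (e.g. after a driver reset).
pub fn busy_percent(prev: Sample, curr: Sample) -> Option<f64> {
    let delta_wall = curr.wall_ns.checked_sub(prev.wall_ns)?;
    if delta_wall == 0 {
        return None;
    }
    let delta_busy = curr.busy_ns.checked_sub(prev.busy_ns)?;
    let pct = (delta_busy as f64 / delta_wall as f64) * 100.0;
    Some(pct.clamp(0.0, 100.0))
}

/// Primary utilization from the render and compute classes.
/// ASSUMPTION: the two are combined by taking the maximum; a sum clamped
/// to 100 would be an equally plausible reading of the proposal.
pub fn primary_utilization(render: Option<f64>, compute: Option<f64>) -> f64 {
    match (render, compute) {
        (Some(r), Some(c)) => r.max(c),
        (Some(v), None) | (None, Some(v)) => v,
        (None, None) => 0.0,
    }
}
```

Returning `Option` keeps the "no valid interval" case distinct from a genuinely idle GPU, so the caller can decide whether to fall back to the last-known value or to `0.0`.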

Mirror the existing per-call vs. cached-static-info split used by the AMD reader (`AmdGpuDevice.static_info: OnceLock<DeviceStaticInfo>`).

## Acceptance Criteria

- [ ] `IntelGpuLinuxReader::get_gpu_info()` returns a non-zero `utilization` value when the GPU is actively executing work, verified on at least one Arc (discrete) or Iris Xe / Xe-LPG (integrated) host. *(awaits hardware verification by maintainer)*
- [x] The `"Utilization": "Requires intel_gpu_top (perf engine counters)"` placeholder in the `detail` map is removed (or replaced with an engine-class breakdown).
- [x] Per-engine-class utilization (render / compute / video / copy where available) is surfaced via the `detail` map.
- [x] State tracking handles the device-removed and clock-skew cases gracefully (return last-known or 0.0, never panic).
- [x] Unit tests cover the delta computation logic with synthetic timestamps.
- [x] The implementation is fully integrated into the codebase (registered in the existing reader, no orphaned modules).

## Technical Considerations

- The kernel exposes engine-busy via PMU events when `CONFIG_PERF_EVENTS` is enabled — these are the same counters `intel_gpu_top` reads. The sysfs path (`engine/.../busy_ns`) is the simpler entry point but is not universally available across kernel versions; PMU is more portable but requires `perf_event_open(2)` syscalls.
- Engine class names and instance counts differ between `i915` and `xe`. The reader already handles driver detection in `discover_cards` — extend that to drive the engine enumeration.
- Holding mutable per-device state in the reader changes the existing stateless shape. Use the same `OnceLock` + interior-mutability pattern (`Mutex<EngineCounters>`) the AMD reader uses for `vram_usage`.
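
One possible shape for the reader-owned state is sketched below. It is an assumption-laden illustration: the fields of `EngineCounters`, the `UtilizationTracker` wrapper, and the `update` / `forget` methods are all hypothetical, and the `OnceLock` half of the pattern (cached static info) is omitted. It shows only the `Mutex`-guarded counters and the "last-known or 0.0, never panic" fallback from the acceptance criteria.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical layout for the per-device counter state.
#[derive(Debug, Default)]
pub struct EngineCounters {
    /// Previous (busy_ns, wall_ns) pair per card, keyed by card name.
    prev: HashMap<String, (u64, u64)>,
    /// Last utilization returned for each card.
    last_pct: HashMap<String, f64>,
}

/// Hypothetical wrapper; in the real reader this state would live on the
/// reader struct so that `&self` methods can update it.
#[derive(Debug, Default)]
pub struct UtilizationTracker {
    counters: Mutex<EngineCounters>,
}

impl UtilizationTracker {
    /// Record a new sample for `card` and return its utilization in percent.
    pub fn update(&self, card: &str, busy_ns: u64, wall_ns: u64) -> f64 {
        // A poisoned lock still holds usable data, so recover it rather
        // than panicking.
        let mut state = self.counters.lock().unwrap_or_else(|e| e.into_inner());
        let prev = state.prev.insert(card.to_owned(), (busy_ns, wall_ns));
        let last = state.last_pct.get(card).copied().unwrap_or(0.0);
        let pct = match prev {
            Some((prev_busy, prev_wall)) if wall_ns > prev_wall && busy_ns >= prev_busy => {
                let ratio = (busy_ns - prev_busy) as f64 / (wall_ns - prev_wall) as f64;
                (ratio * 100.0).clamp(0.0, 100.0)
            }
            // Clock skew or counter reset: fall back to the last-known value.
            Some(_) => last,
            // First sample for this card: no interval to measure yet.
            None => 0.0,
        };
        state.last_pct.insert(card.to_owned(), pct);
        pct
    }

    /// Drop all state for a card that has been removed.
    pub fn forget(&self, card: &str) {
        let mut state = self.counters.lock().unwrap_or_else(|e| e.into_inner());
        state.prev.remove(card);
        state.last_pct.remove(card);
    }
}
```

With this shape, the first call for a card returns `0.0`, a call with advancing counters returns the measured percentage, and a call whose wall time or busy counter went backwards repeats the previous result while re-basing the stored sample.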

## References

- v1 scope limitation documented in PR #245 body, "v1 scope limitations" section
- Closes the "always-zero utilization" gap explicitly called out in #244 acceptance criteria discussion
- Existing AMD utilization implementation: `src/device/readers/amd.rs`
- `intel_gpu_top` source for the canonical engine-busy reading approach

