feat(intel-gpu): per-process GPU memory accounting via fdinfo on Linux #247

## Problem / Background

PR #245 (closes #244) shipped a sysfs-based Intel client GPU reader for Linux; PR #249 (closes #246) then added real engine-busy utilization via the new `intel_gpu_engine` module. Per-process GPU memory accounting is the remaining v1 limitation: `IntelGpuReader::get_process_info()` still returns `Vec::new()`. Per-process `/proc/<pid>/fdinfo/*` DRM client parsing was deferred from PR #245 because the field schema differs materially between the `i915` and `xe` kernel drivers.

The AMD reader (`src/device/readers/amd.rs`) reports per-process VRAM usage via `libamdgpu_top::stat::FdInfoStat`. The NVIDIA reader uses NVML's process enumeration. The Intel reader has no process-level visibility, so any TUI tab, dashboard, or scraper that consumes `ProcessInfo` (e.g. the existing per-process columns in the `all-smi view` TUI) shows Intel hosts as having zero GPU-using processes regardless of actual load.

## Foundation available from PR #249

PR #249 already added per-card mutable state to `IntelGpuCard` (`src/device/readers/intel_gpu_linux.rs`):

- `engine_state: Mutex<EngineState>` with mutex-poisoning recovery mirroring the AMD pattern.
- `intel_gpu_engine::{EngineState, EngineSample, refresh_with_lock}` machinery for delta-tracked counters with an injected `Instant` clock for testability.
- `intel_gpu_engine::discovery::{discover_engine_counters, normalize_engine_class}` covering both i915 (`engine/rcs0/busy`, `engine/rcs/0/busy`) and xe (`device/tile*/gt*/engines/<CLASS>/<i>/busy_ns`) sysfs layouts.

The primary deliverable of this issue — per-process **memory** — is point-in-time and does not need delta tracking, so it can ship without touching `EngineState`. The stretch-goal extension (per-process **engine-time** utilization) reuses the existing scaffold cleanly: add a parallel `Mutex<ProcessEngineState>` (or extend `EngineState` with a per-PID map) and call into a `refresh_processes_with_lock` helper sibling to the one introduced for #246.
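
The recovery pattern PR #249 uses (summarized under Technical Considerations as `catch_unwind` + `into_inner` + replace with `empty()`) carries over directly to a parallel `Mutex<ProcessEngineState>`. Below is a minimal sketch of that shape. The field layout and `with_process_state` are hypothetical, and the real `refresh_with_lock` may be structured differently. One detail worth keeping: the sketch calls `Mutex::clear_poison` (Rust 1.77+) after recovering. Without it, every later `lock()` would still return `Err` and wipe the state again.

```rust
use std::collections::HashMap;
use std::panic::{self, AssertUnwindSafe};
use std::sync::Mutex;

/// Hypothetical per-client state: last busy counter keyed on (pid, drm-client-id).
#[derive(Debug, Default)]
pub struct ProcessEngineState {
    pub last_busy_ns: HashMap<(u32, u64), u64>,
}

impl ProcessEngineState {
    pub fn empty() -> Self {
        Self::default()
    }
}

/// Runs `f` against the locked state (hypothetical helper).
/// - A mutex poisoned by an earlier panic is recovered via `into_inner`,
///   its contents are reset to `empty()`, and the poison flag is cleared.
/// - A panic inside `f` is caught, the state is reset, and `None` is returned.
pub fn with_process_state<R>(
    m: &Mutex<ProcessEngineState>,
    f: impl FnOnce(&mut ProcessEngineState) -> R,
) -> Option<R> {
    let mut guard = match m.lock() {
        Ok(g) => g,
        Err(poisoned) => {
            let mut g = poisoned.into_inner();
            // Half-updated counters are not trustworthy for deltas; start over.
            *g = ProcessEngineState::empty();
            m.clear_poison();
            g
        }
    };
    // The guard lives outside the closure, so a caught panic does not poison the mutex.
    match panic::catch_unwind(AssertUnwindSafe(|| f(&mut *guard))) {
        Ok(r) => Some(r),
        Err(_) => {
            *guard = ProcessEngineState::empty();
            None
        }
    }
}
```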

## Proposed Solution

Implement per-process GPU memory accounting for the Intel Linux reader by parsing `/proc/<pid>/fdinfo/<fd>` entries that reference an Intel DRM device:

- Walk `/proc/<pid>/fd/` for each running PID, follow each symlink, and check whether it points at `/dev/dri/card<N>` or `/dev/dri/renderD<N>` belonging to an Intel GPU enumerated by the reader.
- For each matching `fd`, read `/proc/<pid>/fdinfo/<fd>` and parse the DRM client fields:
  - `i915`: `drm-engine-render`, `drm-engine-copy`, `drm-engine-video`, `drm-total-system`, `drm-total-local0` (VRAM on discrete).
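  - `xe`: `drm-total-vram0`, `drm-total-gtt`, and per-class engine cycle counters `drm-cycles-rcs`, `drm-total-cycles-rcs`, `drm-cycles-bcs`, ... (xe reports engine time as cycle counts rather than `drm-engine-<class>` nanoseconds).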
- Aggregate per-PID across all matching `fd`s.
- Populate `ProcessInfo` with `pid`, `command` (from `/proc/<pid>/comm`), `used_memory`, `uses_gpu = true`.
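
The per-fd parsing step can be sketched with the standard library alone. `DrmFdInfo` and `parse_fdinfo` are hypothetical names. The key handling follows the `key: value` layout described in `drm-usage-stats.rst`: memory values are bytes with an optional `KiB`/`MiB` suffix, and i915 engine times are printed as `<n> ns`. Which regions to sum into `used_memory` is left to the caller.

```rust
use std::collections::BTreeMap;

/// Fields of interest from one DRM `/proc/<pid>/fdinfo/<fd>` entry.
/// Hypothetical type for this sketch, not an existing all-smi struct.
#[derive(Debug, Default, PartialEq)]
pub struct DrmFdInfo {
    pub driver: Option<String>,           // `drm-driver`: "i915" or "xe"
    pub client_id: Option<u64>,           // `drm-client-id`
    pub pdev: Option<String>,             // `drm-pdev`, e.g. "0000:03:00.0"
    pub mem_total: BTreeMap<String, u64>, // `drm-total-<region>`, in bytes
    pub engine_ns: BTreeMap<String, u64>, // i915 `drm-engine-<class>`, in ns
}

/// Converts "<n>", "<n> KiB" or "<n> MiB" into bytes; any other suffix yields None.
fn parse_size(value: &str) -> Option<u64> {
    let mut parts = value.split_whitespace();
    let n: u64 = parts.next()?.parse().ok()?;
    let multiplier = match parts.next() {
        None => 1,
        Some("KiB") => 1 << 10,
        Some("MiB") => 1 << 20,
        Some(_) => return None,
    };
    n.checked_mul(multiplier)
}

pub fn parse_fdinfo(text: &str) -> DrmFdInfo {
    let mut info = DrmFdInfo::default();
    for line in text.lines() {
        // Split on the first ':' only, so "drm-pdev: 0000:03:00.0" keeps its value intact.
        let Some((key, value)) = line.split_once(':') else { continue };
        let (key, value) = (key.trim(), value.trim());
        match key {
            "drm-driver" => info.driver = Some(value.to_string()),
            "drm-client-id" => info.client_id = value.parse().ok(),
            "drm-pdev" => info.pdev = Some(value.to_string()),
            _ => {
                // xe's `drm-total-cycles-<class>` shares the prefix but is not memory.
                if let Some(region) = key
                    .strip_prefix("drm-total-")
                    .filter(|r| !r.starts_with("cycles-"))
                {
                    if let Some(bytes) = parse_size(value) {
                        info.mem_total.insert(region.to_string(), bytes);
                    }
                } else if let Some(class) = key.strip_prefix("drm-engine-") {
                    // Only "<n> ns" values are busy times; this skips `drm-engine-capacity-<class>`.
                    if let Some(ns) = value.strip_suffix(" ns").and_then(|n| n.trim().parse().ok()) {
                        info.engine_ns.insert(class.to_string(), ns);
                    }
                }
            }
        }
    }
    info
}
```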

For per-process *engine-time* deltas (stretch goal), reuse PR #249's `intel_gpu_engine::refresh_with_lock` pattern: define a per-PID `EngineState` keyed on `(pid, drm_client_id)`, sample the per-class busy counters (`drm-engine-<class>` in nanoseconds on i915; `drm-cycles-<class>` paired with `drm-total-cycles-<class>` on xe), and compute the same `delta_busy / delta_wall` percentage with the injected clock (on xe, the `drm-total-cycles-<class>` delta stands in for `delta_wall`). Engine-class names should pass through the existing `normalize_engine_class` so the `detail` map labels stay consistent.
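
A sketch of the per-client delta computation, assuming i915-style `drm-engine-<class>` counters in nanoseconds. `ProcessEngineDeltas` is a hypothetical name. `now` is passed in explicitly, mirroring the injected-`Instant` approach, so tests can advance time deterministically.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Previous busy-time sample per DRM client, keyed on (pid, drm-client-id).
/// Hypothetical type for this sketch.
#[derive(Default)]
pub struct ProcessEngineDeltas {
    prev: HashMap<(u32, u64), (Instant, HashMap<String, u64>)>,
}

impl ProcessEngineDeltas {
    /// Stores `busy_ns` (class -> cumulative busy ns) observed at `now` and returns
    /// class -> busy percentage since this client's previous sample.
    /// The first sample of a client returns an empty map.
    pub fn sample(
        &mut self,
        pid: u32,
        client_id: u64,
        busy_ns: &HashMap<String, u64>,
        now: Instant,
    ) -> HashMap<String, f64> {
        let mut pct = HashMap::new();
        if let Some((then, prev)) = self.prev.get(&(pid, client_id)) {
            let wall_ns = now.saturating_duration_since(*then).as_nanos() as f64;
            if wall_ns > 0.0 {
                for (class, &cur) in busy_ns {
                    // Skip classes that are new or whose counter went backwards.
                    if let Some(delta) = prev.get(class).and_then(|&p| cur.checked_sub(p)) {
                        // Not clamped: a class backed by several engines can exceed 100%.
                        pct.insert(class.clone(), delta as f64 / wall_ns * 100.0);
                    }
                }
            }
        }
        self.prev.insert((pid, client_id), (now, busy_ns.clone()));
        pct
    }

    /// Drops clients for which `alive(pid, client_id)` is false, e.g. after a rescan.
    pub fn prune(&mut self, alive: impl Fn(u32, u64) -> bool) {
        self.prev.retain(|&(pid, client_id), _| alive(pid, client_id));
    }

    pub fn len(&self) -> usize {
        self.prev.len()
    }
}
```

Skipping a missing or regressed counter, rather than reporting it, avoids a bogus spike when a client is new or its fd was reopened. Classes with more than one engine would need dividing by their engine capacity to stay within 0 to 100%.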

## Acceptance Criteria

- [ ] `IntelGpuReader::get_process_info()` returns a non-empty `Vec<ProcessInfo>` on a host with Intel-GPU-using processes (e.g. a SYCL inference workload, a Vulkan game, `glxgears`).
- [ ] Returned `ProcessInfo` entries have correct `used_memory` (matching what `intel_gpu_top` reports for the same workload, within rounding tolerance).
- [ ] Permission errors (read of `/proc/<pid>/fdinfo/<fd>` for processes the user does not own) degrade gracefully — skip the process, do not panic, do not log per-process noise.
- [ ] Both `i915` and `xe` drivers are handled (the field names differ).
- [ ] Implementation is integrated end-to-end: the new `ProcessInfo` entries flow into existing TUI / API consumers without per-vendor special casing.
- [ ] *(Stretch)* Per-process engine-time utilization computed via the same delta-tracking machinery used by #246, surfaced via `ProcessInfo` or the existing per-process detail surface.
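
The permission criterion mostly comes down to treating every I/O error during the scan as "skip this entry". Below is a minimal sketch, assuming the reader already knows its Intel device node paths (e.g. `/dev/dri/renderD128`). `find_drm_fds` and `read_fdinfo` are hypothetical helpers, and `proc_root` is a parameter only so the walk can be tested against a fake tree.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Returns (pid, fdinfo path) for each fd under `<proc_root>/<pid>/fd/` whose
/// symlink target is one of `intel_nodes` (e.g. "/dev/dri/renderD128").
/// Every I/O error (EACCES on another user's process, a process or fd that
/// disappears mid-scan) skips that entry silently; nothing panics or logs.
pub fn find_drm_fds(proc_root: &Path, intel_nodes: &[PathBuf]) -> Vec<(u32, PathBuf)> {
    let mut hits = Vec::new();
    let Ok(entries) = fs::read_dir(proc_root) else { return hits };
    for entry in entries.flatten() {
        // Only numeric directory names are PIDs ("self", "sys", ... are not).
        let name = entry.file_name();
        let Some(pid) = name.to_str().and_then(|s| s.parse::<u32>().ok()) else { continue };
        let Ok(fds) = fs::read_dir(entry.path().join("fd")) else { continue };
        for fd in fds.flatten() {
            let Ok(target) = fs::read_link(fd.path()) else { continue };
            if intel_nodes.contains(&target) {
                hits.push((pid, entry.path().join("fdinfo").join(fd.file_name())));
            }
        }
    }
    hits
}

/// Reads one fdinfo file; permission and not-found errors both map to None.
pub fn read_fdinfo(path: &Path) -> Option<String> {
    fs::read_to_string(path).ok()
}
```

Matching on the symlink text is the simplest option. Because `/proc/<pid>/fd` links are resolved relative to each process's root, comparing device numbers (`MetadataExt::rdev`) may be more robust for containerized workloads.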

## Technical Considerations

- `/proc/<pid>/fd/` enumeration on a busy host can be slow, since it costs one `readlink` per open fd of every process. Bound the cost as the AMD/NVIDIA readers do: typically one process scan per refresh, on the same cadence as `get_gpu_info`, rather than one per call.
- DRM client `fdinfo` fields are stable kernel ABI: the i915 engine-busy keys since ~5.19 (its `drm-total-*` memory keys arrived in later kernels) and the xe keys from the initial xe upstreaming. On older kernels, or when the memory keys are absent, gracefully degrade to empty.
- `drm-client-id` is the stable per-client identifier: multiple `fd`s sharing the same `drm-client-id` (e.g. after `dup()`) belong to one client and should be deduplicated before aggregation. Deduplication is also required for the stretch-goal delta tracking, since the counters reset when the client closes its fd.
- The kernel may not expose VRAM-vs-GTT separately on integrated GPUs — that's fine, return total reserved memory and document the integrated-GPU semantics in the `detail` map.
- Mirror the mutex-poisoning recovery pattern PR #249 added (`catch_unwind` + `into_inner` + replace with `empty()` — see `src/device/readers/intel_gpu_engine.rs::refresh_with_lock`) if any new `Mutex<…>` state is introduced.
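
The deduplication rule can be sketched as a two-step fold: collapse fds that share a client, then sum distinct clients per PID. `FdSample`, `ProcUsage`, and `aggregate` are hypothetical stand-ins (the real `ProcessInfo` carries more fields, such as `command`), and `bytes` is whatever per-fd memory total the caller chooses to report. The key also includes `drm-pdev`, because `drm-usage-stats.rst` permits client IDs that are unique only within one device.

```rust
use std::collections::{BTreeMap, HashMap};

/// One parsed DRM fd (hypothetical input shape for this sketch).
pub struct FdSample {
    pub pid: u32,
    pub pdev: String,
    pub client_id: u64,
    pub bytes: u64,
}

/// Hypothetical per-process row; the real `ProcessInfo` has more fields.
#[derive(Debug, PartialEq)]
pub struct ProcUsage {
    pub pid: u32,
    pub used_memory: u64,
    pub uses_gpu: bool,
}

pub fn aggregate(samples: &[FdSample]) -> Vec<ProcUsage> {
    // Step 1: fds of the same client report identical stats; keep one per (pid, device, client).
    let mut clients: HashMap<(u32, &str, u64), u64> = HashMap::new();
    for s in samples {
        clients
            .entry((s.pid, s.pdev.as_str(), s.client_id))
            .or_insert(s.bytes);
    }
    // Step 2: sum distinct clients per PID; BTreeMap yields PID-sorted output.
    let mut per_pid: BTreeMap<u32, u64> = BTreeMap::new();
    for ((pid, _, _), bytes) in clients {
        *per_pid.entry(pid).or_insert(0) += bytes;
    }
    per_pid
        .into_iter()
        .map(|(pid, used_memory)| ProcUsage { pid, used_memory, uses_gpu: true })
        .collect()
}
```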

## References

- PR #249 — engine-busy utilization machinery this issue can extend (`src/device/readers/intel_gpu_engine{.rs,/discovery.rs}`).
- v1 scope limitation documented in PR #245 body, "v1 scope limitations" section.
- Existing AMD per-process implementation: `src/device/readers/amd.rs` (FdInfoStat usage).
- DRM client fdinfo kernel doc: `Documentation/gpu/drm-usage-stats.rst` in the Linux source tree.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(intel-gpu): per-process GPU memory accounting via fdinfo on Linux #247

Problem / Background

Foundation available from PR #249

Proposed Solution

Acceptance Criteria

Technical Considerations

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

feat(intel-gpu): per-process GPU memory accounting via fdinfo on Linux #247

Description

Problem / Background

Foundation available from PR #249

Proposed Solution

Acceptance Criteria

Technical Considerations

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions