
feat(intel-gpu): per-process GPU memory accounting via fdinfo on Linux #247

Description

@inureyes

Problem / Background

PR #245 (closes #244) shipped a sysfs-based Intel client GPU reader for Linux; PR #249 (closes #246) then added real engine-busy utilization via the new intel_gpu_engine module. Per-process GPU memory accounting is the remaining v1 limitation: IntelGpuReader::get_process_info() still returns Vec::new(). Per-process /proc/<pid>/fdinfo/* DRM client parsing was deferred from PR #245 because the field schema differs materially between the i915 and xe kernel drivers.

The AMD reader (src/device/readers/amd.rs) reports per-process VRAM usage via libamdgpu_top::stat::FdInfoStat. The NVIDIA reader uses NVML's process enumeration. The Intel reader has no process-level visibility, so any TUI tab, dashboard, or scraper that consumes ProcessInfo (e.g. the existing per-process columns in the all-smi view TUI) shows Intel hosts as having zero GPU-using processes regardless of actual load.

Foundation available from PR #249

PR #249 already added per-card mutable state to IntelGpuCard (src/device/readers/intel_gpu_linux.rs):

  • engine_state: Mutex<EngineState> with mutex-poisoning recovery mirroring the AMD pattern.
  • intel_gpu_engine::{EngineState, EngineSample, refresh_with_lock} machinery for delta-tracked counters with an injected Instant clock for testability.
  • intel_gpu_engine::discovery::{discover_engine_counters, normalize_engine_class} covering both i915 (engine/rcs0/busy, engine/rcs/0/busy) and xe (device/tile*/gt*/engines/<CLASS>/<i>/busy_ns) sysfs layouts.

The primary deliverable of this issue — per-process memory — is point-in-time and does not need delta tracking, so it can ship without touching EngineState. The stretch-goal extension (per-process engine-time utilization) reuses the existing scaffold cleanly: add a parallel Mutex<ProcessEngineState> (or extend EngineState with a per-PID map) and call into a refresh_processes_with_lock helper sibling to the one introduced for #246.

Proposed Solution

Implement per-process GPU memory accounting for the Intel Linux reader by parsing /proc/<pid>/fdinfo/<fd> entries that reference an Intel DRM device:

  • Walk /proc/<pid>/fd/ for each running PID, follow each symlink, and check whether it points at a /dev/dri/card<N> or /dev/dri/renderD<N> node belonging to an Intel GPU enumerated by the reader.
  • For each matching fd, read /proc/<pid>/fdinfo/<fd> and parse the DRM client fields:
    • i915: drm-engine-render, drm-engine-copy, drm-engine-video, drm-total-system, drm-total-local0 (VRAM on discrete).
    • xe: drm-total-vram0, drm-total-gtt, drm-engine-rcs, drm-engine-bcs, ...
  • Aggregate per-PID across all matching fds.
  • Populate ProcessInfo with pid, command (from /proc/<pid>/comm), used_memory, uses_gpu = true.
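
The parsing step above could be sketched roughly as follows. This is a minimal illustration, not the reader's actual code: DrmFdInfo, parse_size, parse_fdinfo, and used_memory are hypothetical names, and summing every drm-total-* region into one number is an assumption about how used_memory would be defined. The drm-total-cycles-* exclusion is there because xe kernels are understood to emit engine-cycle counters under that prefix.

```rust
use std::collections::HashMap;

/// Parsed subset of one /proc/<pid>/fdinfo/<fd> entry (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct DrmFdInfo {
    pub driver: Option<String>,
    pub client_id: Option<u64>,
    /// Memory totals in bytes, keyed by region name (e.g. "system", "local0", "vram0").
    pub total_bytes: HashMap<String, u64>,
}

/// Convert a value such as "2048 KiB", "3 MiB", or "512" into bytes.
fn parse_size(value: &str) -> Option<u64> {
    let mut parts = value.split_whitespace();
    let number: u64 = parts.next()?.parse().ok()?;
    let multiplier: u64 = match parts.next() {
        None => 1,
        Some("KiB") => 1024,
        Some("MiB") => 1024 * 1024,
        Some(_) => return None, // unknown unit: ignore the field
    };
    number.checked_mul(multiplier)
}

/// Parse the text of one fdinfo file, keeping only the DRM fields of interest.
pub fn parse_fdinfo(text: &str) -> DrmFdInfo {
    let mut info = DrmFdInfo::default();
    for line in text.lines() {
        let Some((key, value)) = line.split_once(':') else { continue };
        let (key, value) = (key.trim(), value.trim());
        if key == "drm-driver" {
            info.driver = Some(value.to_string());
        } else if key == "drm-client-id" {
            info.client_id = value.parse().ok();
        } else if let Some(region) = key.strip_prefix("drm-total-") {
            // Assumption: drm-total-cycles-* are engine counters, not memory.
            if region.starts_with("cycles-") {
                continue;
            }
            if let Some(bytes) = parse_size(value) {
                info.total_bytes.insert(region.to_string(), bytes);
            }
        }
    }
    info
}

/// Sum of all memory regions; an assumed definition of used_memory.
pub fn used_memory(info: &DrmFdInfo) -> u64 {
    info.total_bytes.values().sum()
}
```

Keeping the parser a pure function over a &str means both driver schemas can be covered by unit tests with captured fdinfo text, without needing Intel hardware on the CI host.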

For per-process engine-time deltas (stretch goal), reuse PR #249's intel_gpu_engine::refresh_with_lock pattern: define a per-PID EngineState keyed on (pid, drm_client_id), sample the same drm-engine-* counters, and compute the same delta_busy / delta_wall percentage with the injected clock. Engine-class names should pass through the existing normalize_engine_class so the detail map labels stay consistent.
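
A rough sketch of the per-client delta computation described above, assuming one cumulative busy counter in nanoseconds per key. ProcessEngineState and its update method are hypothetical names and do not reflect the real EngineState API from PR #249; the clock is passed in as an Instant so the logic stays testable.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// (pid, drm_client_id), as suggested in the text above.
type ClientKey = (u32, u64);

/// Last observed sample per client (hypothetical type).
#[derive(Default)]
pub struct ProcessEngineState {
    /// (sample time, cumulative busy nanoseconds)
    last: HashMap<ClientKey, (Instant, u64)>,
}

impl ProcessEngineState {
    /// Store the new sample and return delta_busy / delta_wall as a percentage.
    /// Returns None on the first sample, on a zero or negative wall-clock delta,
    /// or when the counter went backwards (e.g. the client reopened its fd).
    pub fn update(&mut self, key: ClientKey, busy_ns: u64, now: Instant) -> Option<f64> {
        let previous = self.last.insert(key, (now, busy_ns));
        let (prev_time, prev_busy) = previous?;
        let wall_ns = now.checked_duration_since(prev_time)?.as_nanos();
        if wall_ns == 0 || busy_ns < prev_busy {
            return None;
        }
        let pct = (busy_ns - prev_busy) as f64 / wall_ns as f64 * 100.0;
        // Clamping to 100% is a design choice of this sketch.
        Some(pct.clamp(0.0, 100.0))
    }
}
```

A real implementation would track one counter per engine class for each key and would also need to evict entries for clients that have disappeared.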

Acceptance Criteria

  • IntelGpuReader::get_process_info() returns a non-empty Vec<ProcessInfo> on a host with Intel-GPU-using processes (e.g. a SYCL inference workload, a Vulkan game, glxgears).
  • Returned ProcessInfo entries have correct used_memory (matching what intel_gpu_top -p reports for the same workload, within rounding tolerance).
  • Permission errors (read of /proc/<pid>/fdinfo/<fd> for processes the user does not own) degrade gracefully — skip the process, do not panic, do not log per-process noise.
  • Both i915 and xe drivers are handled (the field names differ).
  • Implementation is integrated end-to-end: the new ProcessInfo entries flow into existing TUI / API consumers without per-vendor special casing.
  • (Stretch) Per-process engine-time utilization computed via the same delta-tracking machinery used by #246, surfaced via ProcessInfo or the existing per-process detail surface.
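
The graceful-degradation criterion could look something like the sketch below, which turns every I/O error (permission denied, process already exited) into "no matching fds" without logging. drm_node_name and drm_fds_for_pid are hypothetical helpers; proc_root is a parameter only so tests can point at a fake tree, and the caller is assumed to check the returned node names against the Intel devices the reader enumerated.

```rust
use std::fs;
use std::path::Path;

/// Return the node name ("card0", "renderD128") if `target` is a DRM node
/// directly under /dev/dri, otherwise None.
pub fn drm_node_name(target: &Path) -> Option<&str> {
    let name = target.strip_prefix("/dev/dri").ok()?.to_str()?;
    let digits = name
        .strip_prefix("card")
        .or_else(|| name.strip_prefix("renderD"))?;
    (!digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())).then_some(name)
}

/// List (fd number, DRM node name) pairs for one PID.
/// Any error while reading the fd directory or a symlink is swallowed.
pub fn drm_fds_for_pid(proc_root: &Path, pid: u32) -> Vec<(u32, String)> {
    let fd_dir = proc_root.join(pid.to_string()).join("fd");
    let entries = match fs::read_dir(&fd_dir) {
        Ok(entries) => entries,
        // Not our process, or it exited between enumeration and read: skip silently.
        Err(_) => return Vec::new(),
    };
    entries
        .flatten()
        .filter_map(|entry| {
            let fd: u32 = entry.file_name().to_str()?.parse().ok()?;
            let target = fs::read_link(entry.path()).ok()?;
            let node = drm_node_name(&target)?;
            Some((fd, node.to_string()))
        })
        .collect()
}
```

Returning an empty Vec rather than a Result keeps the caller's loop free of per-process error handling, which matches the "skip the process, do not panic" requirement.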

Technical Considerations

  • /proc/<pid>/fd/ enumeration on a busy host can be slow. Use the same approach the AMD/NVIDIA readers use to bound cost — typically a one-time process scan refreshed on the same cadence as get_gpu_info, not per-call.
  • DRM client fdinfo fields are stable kernel ABI since ~5.19 (i915) and from the initial xe upstreaming. Older kernels: gracefully degrade to empty.
  • drm-client-id identifies a DRM client rather than a single fd number: duplicated or shared fds report the same drm-client-id, belong to one client, and should be deduplicated before aggregation. This is also required for the stretch-goal delta tracking (counters reset when the client closes its fd).
  • The kernel may not expose VRAM-vs-GTT separately on integrated GPUs — that's fine, return total reserved memory and document the integrated-GPU semantics in the detail map.
  • Mirror the mutex-poisoning recovery pattern that PR #249 added (catch_unwind + into_inner + replace with empty(); see src/device/readers/intel_gpu_engine.rs::refresh_with_lock) if any new Mutex<…> state is introduced.
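
The drm-client-id deduplication mentioned above might be sketched as follows. FdSample and aggregate_per_pid are hypothetical names, and keying on (pid, client_id) is an assumption: it charges each process that holds a shared client, and on multi-GPU hosts the key would likely also need the device, since client ids may only be unique per device.

```rust
use std::collections::{HashMap, HashSet};

/// One parsed fdinfo entry reduced to what aggregation needs (hypothetical type).
pub struct FdSample {
    pub pid: u32,
    pub client_id: u64,
    pub bytes: u64,
}

/// Sum memory per PID, counting each (pid, client_id) pair only once.
pub fn aggregate_per_pid(samples: &[FdSample]) -> HashMap<u32, u64> {
    let mut seen: HashSet<(u32, u64)> = HashSet::new();
    let mut totals: HashMap<u32, u64> = HashMap::new();
    for sample in samples {
        // `insert` returns false when this client was already counted for the PID,
        // which is the case for dup()'d fds reporting identical totals.
        if seen.insert((sample.pid, sample.client_id)) {
            *totals.entry(sample.pid).or_insert(0) += sample.bytes;
        }
    }
    totals
}
```

Without this step, a process that duplicates its render-node fd would appear to use twice the memory it actually holds.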
