Problem / Background
PR #245 (closes #244) shipped a sysfs-based Intel client GPU reader for Linux; PR #249 (closes #246) then added real engine-busy utilization via the new intel_gpu_engine module. Per-process GPU memory accounting is the remaining v1 limitation: IntelGpuReader::get_process_info() still returns Vec::new(). Per-process /proc/<pid>/fdinfo/* DRM client parsing was deferred from PR #245 because the field schema differs materially between the i915 and xe kernel drivers.
The AMD reader (src/device/readers/amd.rs) reports per-process VRAM usage via libamdgpu_top::stat::FdInfoStat. The NVIDIA reader uses NVML's process enumeration. The Intel reader has no process-level visibility, so any TUI tab, dashboard, or scraper that consumes ProcessInfo (e.g. the existing per-process columns in the all-smi view TUI) shows Intel hosts as having zero GPU-using processes regardless of actual load.
Foundation available from PR #249
PR #249 already added per-card mutable state to IntelGpuCard (src/device/readers/intel_gpu_linux.rs):
engine_state: Mutex<EngineState> with mutex-poisoning recovery mirroring the AMD pattern.
intel_gpu_engine::{EngineState, EngineSample, refresh_with_lock} machinery for delta-tracked counters with an injected Instant clock for testability.
intel_gpu_engine::discovery::{discover_engine_counters, normalize_engine_class} covering both i915 (engine/rcs0/busy, engine/rcs/0/busy) and xe (device/tile*/gt*/engines/<CLASS>/<i>/busy_ns) sysfs layouts.
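The mutex-poisoning recovery mentioned for engine_state can be sketched roughly as follows. This is a minimal illustration, not the code from PR #249: lock_or_reset is a hypothetical helper name, T::default() stands in for whatever "empty" constructor the real state type provides, and the catch_unwind wrapping used in the repository is omitted.

```rust
use std::sync::{Mutex, MutexGuard};

/// Lock `m`; if a previous holder panicked and poisoned the mutex,
/// take the guard anyway and reset the contents to a known-empty value.
/// (Hypothetical helper; `T::default()` stands in for an `empty()` constructor.)
pub fn lock_or_reset<T: Default>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(|poisoned| {
        // `into_inner` hands back the guard despite the poison flag.
        let mut guard = poisoned.into_inner();
        // The state may be half-updated, so discard it rather than trust it.
        *guard = T::default();
        guard
    })
}
```

Resetting to an empty state costs one sample of history after a panic, which is usually acceptable for delta-tracked counters because the next refresh simply re-seeds them.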
The primary deliverable of this issue — per-process memory — is point-in-time and does not need delta tracking, so it can ship without touching EngineState. The stretch-goal extension (per-process engine-time utilization) reuses the existing scaffold cleanly: add a parallel Mutex<ProcessEngineState> (or extend EngineState with a per-PID map) and call into a refresh_processes_with_lock helper sibling to the one introduced for #246.
Proposed Solution
Implement per-process GPU memory accounting for the Intel Linux reader by parsing /proc/<pid>/fdinfo/<fd> entries that reference an Intel DRM device:
For each running PID, walk /proc/<pid>/fd/, follow each symlink, and check whether it points at a /dev/dri/card<N> or /dev/dri/renderD<N> node belonging to an Intel GPU enumerated by the reader.
For each matching fd, read /proc/<pid>/fdinfo/<fd> and parse the DRM client fields:
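i915: drm-engine-render, drm-engine-copy, drm-engine-video, drm-total-system, drm-total-local0 (VRAM on discrete).
xe: drm-total-vram0, drm-total-gtt, drm-engine-rcs, drm-engine-bcs, ...
Aggregate the memory totals per process across its matching fds.
Emit one ProcessInfo per process with pid, command (from /proc/<pid>/comm), used_memory, uses_gpu = true.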
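The parse-and-aggregate steps above could look something like the sketch below. It is only an illustration under stated assumptions: DrmClientSample, parse_fdinfo, and aggregate_bytes are hypothetical names, memory is taken as the sum of every drm-total-<region> key, and drm-total-cycles-* keys are skipped because the kernel's usage-stats format defines them as cycle counters rather than memory. Clients are deduplicated on (drm-pdev, drm-client-id).

```rust
use std::collections::HashSet;

/// Memory-related fields from one fdinfo file (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct DrmClientSample {
    pub driver: String,
    pub pdev: String,
    pub client_id: u64,
    /// Sum of all `drm-total-<region>` values, in bytes.
    pub total_bytes: u64,
}

/// Parse "<n>", "<n> KiB" or "<n> MiB" into bytes.
fn parse_size(value: &str) -> Option<u64> {
    let mut parts = value.split_whitespace();
    let n: u64 = parts.next()?.parse().ok()?;
    let scale: u64 = match parts.next() {
        None => 1,
        Some("KiB") => 1 << 10,
        Some("MiB") => 1 << 20,
        Some(_) => return None,
    };
    n.checked_mul(scale)
}

/// Parse one fdinfo text. Returns None unless it describes an i915 or xe
/// client with a readable `drm-client-id`.
pub fn parse_fdinfo(text: &str) -> Option<DrmClientSample> {
    let mut sample = DrmClientSample::default();
    let mut client_id = None;
    for line in text.lines() {
        let Some((key, value)) = line.split_once(':') else { continue };
        let value = value.trim();
        match key.trim() {
            "drm-driver" => sample.driver = value.to_string(),
            "drm-pdev" => sample.pdev = value.to_string(),
            "drm-client-id" => client_id = value.parse::<u64>().ok(),
            // Assumption: every remaining drm-total-* key is a memory region.
            k if k.starts_with("drm-total-") && !k.starts_with("drm-total-cycles-") => {
                sample.total_bytes =
                    sample.total_bytes.saturating_add(parse_size(value).unwrap_or(0));
            }
            _ => {}
        }
    }
    if !matches!(sample.driver.as_str(), "i915" | "xe") {
        return None;
    }
    sample.client_id = client_id?;
    Some(sample)
}

/// Sum memory over one process's fdinfo texts, counting each DRM client once
/// even when several fds share it.
pub fn aggregate_bytes(fdinfo_texts: &[&str]) -> u64 {
    let mut seen = HashSet::new();
    let mut total = 0u64;
    for text in fdinfo_texts {
        if let Some(sample) = parse_fdinfo(text) {
            if seen.insert((sample.pdev.clone(), sample.client_id)) {
                total = total.saturating_add(sample.total_bytes);
            }
        }
    }
    total
}
```

Matching on the drm-total- prefix rather than on fixed key names keeps the parser tolerant of region names that differ between i915 and xe, at the cost of not distinguishing system memory from VRAM.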
For per-process engine-time deltas (stretch goal), reuse PR #249's intel_gpu_engine::refresh_with_lock pattern: define a per-PID EngineState keyed on (pid, drm_client_id), sample the same drm-engine-* counters, and compute the same delta_busy / delta_wall percentage with the injected clock. Engine-class names should pass through the existing normalize_engine_class so the detail map labels stay consistent.
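A rough sketch of the per-client delta computation described above, with the clock passed in so tests can control it. ProcessEngineState and its methods are hypothetical here (the issue only proposes the name), and the busy counter is assumed to be a monotonically increasing nanosecond value per (pid, drm_client_id).

```rust
use std::collections::HashMap;
use std::time::Instant;

#[derive(Clone, Copy)]
struct ClientSample {
    busy_ns: u64,
    at: Instant,
}

/// Last-seen busy counter per (pid, drm_client_id). Hypothetical type.
#[derive(Default)]
pub struct ProcessEngineState {
    last: HashMap<(u32, u64), ClientSample>,
}

impl ProcessEngineState {
    /// Record a new sample and return busy% = delta_busy / delta_wall * 100.
    /// Returns None on the first sample for a client, when no wall time has
    /// elapsed, or when the counter went backwards (client closed and reopened).
    pub fn update(&mut self, pid: u32, client_id: u64, busy_ns: u64, now: Instant) -> Option<f64> {
        let prev = self
            .last
            .insert((pid, client_id), ClientSample { busy_ns, at: now })?;
        let wall_ns = now.checked_duration_since(prev.at)?.as_nanos();
        if wall_ns == 0 {
            return None;
        }
        let delta_busy = busy_ns.checked_sub(prev.busy_ns)?;
        Some((delta_busy as f64 / wall_ns as f64 * 100.0).clamp(0.0, 100.0))
    }

    /// Drop clients that were not seen in the latest scan so stale counters
    /// cannot be paired with a recycled PID later.
    pub fn retain_seen(&mut self, live: &[(u32, u64)]) {
        self.last.retain(|key, _| live.contains(key));
    }
}
```

Because the new sample is stored before the delta is checked, a counter reset costs one reporting interval and then recovers on the next refresh.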
Acceptance Criteria
IntelGpuReader::get_process_info() returns a non-empty Vec<ProcessInfo> on a host with Intel-GPU-using processes (e.g. a SYCL inference workload, a Vulkan game, glxgears).
Returned ProcessInfo entries have correct used_memory (matching what intel_gpu_top -p reports for the same workload, within rounding tolerance).
Permission errors (read of /proc/<pid>/fdinfo/<fd> for processes the user does not own) degrade gracefully — skip the process, do not panic, do not log per-process noise.
Both i915 and xe drivers are handled (the field names differ).
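Implementation is integrated end-to-end: the new ProcessInfo entries flow into existing TUI / API consumers without per-vendor special casing.
Stretch goal: per-process engine-time utilization is exposed through ProcessInfo or the existing per-process detail surface.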
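To illustrate the graceful-degradation criterion, here is one possible shape for the fd walk. It is a sketch only: dri_minor and intel_fdinfo_paths are hypothetical names, the Intel device minors are assumed to be known from the reader's own enumeration, and the /proc root is a parameter purely so the function can be exercised against a fake directory.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Return the DRM minor if `target` is /dev/dri/card<N> or /dev/dri/renderD<N>.
pub fn dri_minor(target: &Path) -> Option<u32> {
    let name = target.strip_prefix("/dev/dri").ok()?.to_str()?;
    let digits = name
        .strip_prefix("renderD")
        .or_else(|| name.strip_prefix("card"))?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// List the fdinfo paths of `pid` whose fds point at one of `intel_minors`.
/// Any I/O error (permission denied, process exited mid-scan) yields fewer or
/// no results instead of an error or a log line.
pub fn intel_fdinfo_paths(proc_root: &Path, pid: u32, intel_minors: &[u32]) -> Vec<PathBuf> {
    let pid_dir = proc_root.join(pid.to_string());
    let Ok(entries) = fs::read_dir(pid_dir.join("fd")) else {
        return Vec::new(); // not our process, or it is already gone
    };
    entries
        .flatten() // skip directory entries that failed to read
        .filter_map(|entry| {
            let target = fs::read_link(entry.path()).ok()?;
            let minor = dri_minor(&target)?;
            intel_minors
                .contains(&minor)
                .then(|| pid_dir.join("fdinfo").join(entry.file_name()))
        })
        .collect()
}
```

Returning an empty Vec for unreadable processes means the caller needs no per-process error handling, which keeps the scan quiet when run as an unprivileged user.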
Technical Considerations
/proc/<pid>/fd/ enumeration on a busy host can be slow. Bound the cost the same way the AMD/NVIDIA readers do: typically one process scan per refresh cycle, on the same cadence as get_gpu_info, rather than a fresh scan on every call.
DRM client fdinfo fields are stable kernel ABI since ~5.19 (i915) and from the initial xe upstreaming. On older kernels, degrade gracefully to an empty process list.
drm-client-id is the stable identifier of a DRM client; multiple fds sharing the same drm-client-id belong to one client and should be deduplicated before aggregation. Deduplication is also required for the stretch-goal delta tracking, because counters reset when the client closes its fd.
The kernel may not expose VRAM-vs-GTT separately on integrated GPUs — that's fine, return total reserved memory and document the integrated-GPU semantics in the detail map.
Follow the existing mutex-poisoning recovery pattern (catch_unwind + into_inner + replace with empty(); see src/device/readers/intel_gpu_engine.rs::refresh_with_lock) if any new Mutex<…> state is introduced.
References
PR #249 engine machinery (src/device/readers/intel_gpu_engine{.rs,/discovery.rs}).
src/device/readers/amd.rs (FdInfoStat usage).
Documentation/gpu/drm-usage-stats.rst in the Linux source tree.