feat(intel-gpu): per-process GPU memory accounting via fdinfo #250
Introduces a new stateless module `intel_gpu_fdinfo` that parses Intel DRM client `/proc/<pid>/fdinfo/<fd>` blocks and correlates fds back to a reader-known card index. Provides:

- `parse_fdinfo`: pure-string parser for the i915 and xe schemas. Handles truncated / malformed input without panicking, rejects foreign drivers (amdgpu / nvidia), and normalises memory values from kB to bytes.
- `build_intel_drm_basenames`: walks `/sys/class/drm` once to find every `cardN` and `renderD<M>` minor that maps to one of the reader-enumerated Intel PCI devices. Both nodes share a card index, so modern Vulkan / oneAPI / ffmpeg workloads that open the render node are captured.
- `intel_drm_fds_for_pid`: reads `/proc/<pid>/fd/` and returns the fds pointing at known Intel DRM nodes. Permission errors degrade silently.
- `collect_intel_gpu_processes`: top-level aggregator. Walks `/proc`, dedupes fds by `drm-client-id` per process+card (avoiding N× over-counting from `dup(2)`d fds), but sums across distinct clients (multi-context workloads). Returns deterministic, PID-sorted output.

The module is path-injection friendly: every public helper accepts an explicit `proc_root` / `drm_root`, so the entire walker is testable under `tempfile::tempdir` fixtures without touching the real `/proc` or `/sys`.

23 unit tests cover: i915 + xe parsing, integrated vs discrete schemas, foreign-driver rejection, truncated input, kB-to-bytes conversion, the card/renderD mapping for one and two cards, AMD render-node rejection, connector-child filtering, client-id dedup, distinct-client summing, multi-card grouping, and the graceful no-Intel-card short-circuit.

Refs #247
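A minimal sketch of the parsing step described above, assuming the fdinfo body arrives as one `&str`. The resident-memory key sets are the ones this PR lists (`drm-resident-{local0,system}` for i915, `drm-resident-{vram0,gtt}` for xe); `FdinfoSample`, `value_to_bytes`, and the `KiB`/`MiB` unit handling are illustrative assumptions, not the module's actual types or code.

```rust
use std::collections::HashMap;

/// Resident-memory keys summed per driver, as listed in this PR's description.
const I915_KEYS: &[&str] = &["drm-resident-local0", "drm-resident-system"];
const XE_KEYS: &[&str] = &["drm-resident-vram0", "drm-resident-gtt"];

/// Hypothetical parsed view of one fdinfo block (not the module's real type).
#[derive(Debug, PartialEq)]
pub struct FdinfoSample {
    pub driver: String,
    pub client_id: Option<u64>,
    pub resident_bytes: u64,
}

/// Convert a value such as "16384 kB" to bytes; a bare number is taken as bytes.
fn value_to_bytes(raw: &str) -> Option<u64> {
    let mut parts = raw.split_whitespace();
    let amount: u64 = parts.next()?.parse().ok()?;
    let multiplier = match parts.next() {
        None => 1,
        Some("kB") | Some("KiB") => 1024,
        Some("MiB") => 1024 * 1024,
        Some(_) => return None, // unknown unit: skip rather than guess
    };
    amount.checked_mul(multiplier)
}

/// Parse one `/proc/<pid>/fdinfo/<fd>` body. Returns `None` for non-Intel
/// drivers or when `drm-driver` is missing (e.g. content truncated at teardown).
pub fn parse_fdinfo(content: &str) -> Option<FdinfoSample> {
    let mut fields: HashMap<&str, &str> = HashMap::new();
    for line in content.lines() {
        // Split on the first ':' only, so values like "0000:03:00.0" survive.
        // Lines without ':' (a truncated tail) are skipped, not fatal.
        if let Some((key, value)) = line.split_once(':') {
            fields.insert(key.trim(), value.trim());
        }
    }
    let driver = *fields.get("drm-driver")?;
    let keys = match driver {
        "i915" => I915_KEYS,
        "xe" => XE_KEYS,
        _ => return None, // amdgpu, nvidia, nouveau, ...: another reader's process
    };
    let resident_bytes = keys
        .iter()
        .filter_map(|key| fields.get(key).and_then(|v| value_to_bytes(v)))
        .sum::<u64>();
    let client_id = fields.get("drm-client-id").and_then(|v| v.parse().ok());
    Some(FdinfoSample {
        driver: driver.to_string(),
        client_id,
        resident_bytes,
    })
}
```

Branching on `drm-driver` before summing anything is what lets the parser drop amdgpu or nouveau blocks early, and an integrated i915 part that only reports the system key still produces a correct total because missing keys contribute nothing.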
Replaces the `Vec::new()` stub in `IntelGpuReader::get_process_info()` with a full implementation built on top of the new `intel_gpu_fdinfo` module. The reader now:

- Caches an `intel_drm_basenames` map at construction (one entry per `cardN` and `renderD<M>` known to belong to an Intel PCI device), so the per-process refresh is a flat `/proc` walk with no extra sysfs probing.
- Threads a `proc_root` field through the constructor for test injection; production stays `/proc`.
- Builds a `card_index -> uuid` table on each call and delegates to `build_intel_process_infos`, which collects `(pid, card_index, used_memory_bytes)` aggregates, performs one minimal sysinfo refresh, and merges sysinfo metadata (cpu_percent, user, state, rss, vms, etc.) into the final `ProcessInfo` rows. This matches the pattern of AMD's reader exactly, so cross-vendor consumers see a consistent shape.

To stay under the 500-line per-file budget after the integration, two cohesive subsections moved into siblings:

- `intel_gpu_linux/detection.rs`: `has_intel_client_gpu_from_root` and `line_matches_intel_gpu`. The public `has_intel_client_gpu` is now a 4-line wrapper.
- `intel_gpu_fdinfo/enrichment.rs`: the sysinfo merge helper. The parent module re-exports `build_intel_process_infos`, so the public API is unchanged.

File sizes after refactor, in lines (all <500):

- `intel_gpu_linux.rs` 474
- `intel_gpu_fdinfo.rs` 483
- `intel_gpu_fdinfo/enrichment.rs` 107
- `intel_gpu_fdinfo/tests.rs` 494
- `intel_gpu_linux/detection.rs` 86

Behaviour on hosts without Intel-GPU-using processes is unchanged: an empty basename map (no Intel GPUs detected) short-circuits to `Vec::new()`, and the fdinfo walker returns empty when no process holds an Intel DRM fd.

The stretch goal (per-process engine-time deltas) is intentionally deferred to a follow-up; v1 reports `gpu_utilization = 0.0` per process.

Refs #247
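The cached basename map can be sketched as a single pass over an injected DRM root. This assumes each `cardN` / `renderD<M>` entry exposes a `device` symlink whose last path component is the PCI bus id, and that the reader passes its enumerated Intel buses in card-index order; `build_basenames` and `is_minor_node` are hypothetical names, not the module's real helpers.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// True for `cardN` / `renderD<M>` minors; rejects connector children like `card0-DP-1`.
fn is_minor_node(name: &str) -> bool {
    let digits = name
        .strip_prefix("renderD")
        .or_else(|| name.strip_prefix("card"));
    matches!(digits, Some(d) if !d.is_empty() && d.bytes().all(|b| b.is_ascii_digit()))
}

/// Map every `cardN` / `renderD<M>` under `drm_root` whose `device` symlink
/// ends in one of `intel_buses` to that bus's index (used as the card index).
pub fn build_basenames(drm_root: &Path, intel_buses: &[&str]) -> HashMap<String, usize> {
    let mut map = HashMap::new();
    if intel_buses.is_empty() {
        return map; // no Intel GPU enumerated: skip the sysfs walk entirely
    }
    let Ok(entries) = fs::read_dir(drm_root) else {
        return map;
    };
    for entry in entries.flatten() {
        let name = entry.file_name().to_string_lossy().into_owned();
        if !is_minor_node(&name) {
            continue;
        }
        // `<node>/device` points at the PCI device dir, e.g. `../../../0000:03:00.0`.
        let Ok(target) = fs::read_link(entry.path().join("device")) else {
            continue;
        };
        let Some(bus) = target.file_name().and_then(|b| b.to_str()) else {
            continue;
        };
        // A node on a non-Intel bus (e.g. an AMD renderD) is simply not inserted.
        if let Some(index) = intel_buses.iter().position(|b| *b == bus) {
            map.insert(name, index);
        }
    }
    map
}
```

Because `card0` and `renderD128` resolve to the same bus, they land on the same card index, which is what lets render-node-only workloads be attributed to the right GPU.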
Adds three end-to-end tests against `IntelGpuReader::get_process_info()` driven by a synthetic procfs and DRM sysfs tree:

- `get_process_info_returns_empty_when_no_intel_cards`: guarantees no regression on AMD-only or NVIDIA-only hosts; the empty basename map short-circuits to `Vec::new()` without touching `/proc`.
- `get_process_info_collects_fdinfo_from_render_node`: full pipeline. A synthetic Intel card with a matching `renderD<M>` render node, plus a synthetic `/proc/<pid>/fdinfo/<fd>` containing the i915 schema, must yield exactly one populated `ProcessInfo` with the correct PID, `device_id`, `device_uuid` (`Intel-GPU-<bus>` format matching `get_gpu_info`), and `used_memory` (16384 kB -> 16777216 bytes).
- `get_process_info_default_filter_keeps_uses_gpu_processes`: verifies the Intel reader is compatible with the trait's default `get_gpu_processes` filter (every emitted row has `uses_gpu = true`).

Also updates `docs/ARCHITECTURE.md`: the Intel client GPU section now lists per-process GPU memory accounting alongside engine-busy utilization, including the i915 / xe key sets parsed and the `drm-client-id` dedup behaviour.

Refs #247
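Fixtures like these need only `std`: a fake `<proc_root>/<pid>/fd/` directory of symlinks is enough to exercise the fd-to-card correlation that `intel_drm_fds_for_pid` performs. The sketch below is an illustrative approximation of that step; `drm_fds_for_pid` and its `(fd, card_index)` return shape are assumptions, and a missing or unreadable `fd/` directory simply yields an empty result.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// Return `(fd, card_index)` for each fd of `pid` whose symlink target's file
/// name (e.g. `renderD128`) is a known Intel DRM basename, sorted by fd.
pub fn drm_fds_for_pid(
    proc_root: &Path,
    pid: u32,
    basenames: &HashMap<String, usize>,
) -> Vec<(u32, usize)> {
    let fd_dir = proc_root.join(pid.to_string()).join("fd");
    // EACCES (another user's process) or ENOENT (process exited): no fds, no log.
    let Ok(entries) = fs::read_dir(&fd_dir) else {
        return Vec::new();
    };
    let mut fds = Vec::new();
    for entry in entries.flatten() {
        let Some(fd) = entry.file_name().to_str().and_then(|s| s.parse::<u32>().ok()) else {
            continue; // not a numeric fd entry
        };
        let Ok(target) = fs::read_link(entry.path()) else {
            continue; // fd closed between read_dir and read_link
        };
        let card = target
            .file_name()
            .and_then(|n| n.to_str())
            .and_then(|n| basenames.get(n));
        if let Some(&card) = card {
            fds.push((fd, card));
        }
    }
    fds.sort_unstable();
    fds
}
```

In a test, `std::os::unix::fs::symlink("/dev/dri/renderD128", fd_dir.join("5"))` creates a dangling-but-readable link, which is all `read_link` needs, so no real device nodes are required.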
Implementation Review Summary

Intent
Findings Addressed

None. The implementation is correct and complete as submitted. No fixes required.

Verification
Critical correctness checks (ultrathink)
Module structure
Remaining items
Minor docs nit (LOW)
Final verdict

Ready to advance to security/perf review. Implementation is correct, complete, properly integrated, conforms to project conventions, and preserves …
`cargo fmt --check` (which CI enforces) flagged three formatting nits in the new module: a path-join chain that fits on one line, a `let` binding with an unnecessary multi-line break, and a function signature that fits on one line. Run rustfmt to bring the new code in line with the rest of the workspace. No behaviour change. All 42 fdinfo / intel_gpu_linux tests still pass; clippy clean on both `--lib --tests` and the default bin target.
Security and performance review

Comprehensive review of the per-process fdinfo accounting work against the surface areas called out in the briefing. The reviewer ran …

Findings

MEDIUM (auto-fixed in …)
LOW (informational, not blocking)
Critical surface areas — verdict
Verdict

Ready for finalizer. One MEDIUM (formatting) was auto-fixed on this branch (…).
…dinfo in README and manpage

`new_with_roots` is called by the production `IntelGpuReader::new()`, so it is not test-only. Update the doc comment to describe it as an internal constructor accepting arbitrary roots, with production code routing through `IntelGpuReader::new`.

README and manpage were updated in the engine-utilization PR (#249), but neither mentioned the per-process GPU memory tracking added by this PR. Add a bullet under the Intel Arc section in README and extend the manpage Intel entry to include the fdinfo-based process accounting, mirroring the detail already present in ARCHITECTURE.md line 217.
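The constructor split this doc-comment fix describes can be sketched as follows. `ReaderSketch` is a hypothetical stand-in for `IntelGpuReader` that keeps only the two root fields; the real reader also caches `intel_drm_basenames` at construction, which is omitted here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `IntelGpuReader`, showing only the injected roots.
pub struct ReaderSketch {
    drm_root: PathBuf,
    proc_root: PathBuf,
}

impl ReaderSketch {
    /// Production entry point: always the real sysfs and procfs roots.
    pub fn new() -> Self {
        Self::new_with_roots(Path::new("/sys/class/drm"), Path::new("/proc"))
    }

    /// Internal constructor accepting arbitrary roots. Production code routes
    /// through `new()`; unit tests pass tempdir fixture paths instead.
    fn new_with_roots(drm_root: &Path, proc_root: &Path) -> Self {
        Self {
            drm_root: drm_root.to_path_buf(),
            proc_root: proc_root.to_path_buf(),
        }
    }
}

impl Default for ReaderSketch {
    fn default() -> Self {
        Self::new()
    }
}
```

Keeping `new_with_roots` private means the only public way to build a reader is the production path, while in-module tests can still point it at fixtures.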
PR Finalization Complete

Summary

- Lint/Format: `cargo fmt` is idempotent (no changes after the security checker's prior run). `cargo clippy --lib --tests` and the `cargo clippy` bin target are both clean with `-D warnings`.
- Tests: 1120 lib tests pass. No new tests added; reviewer coverage of `drm-client-id` dedup, `cardN`/`renderD` mapping, and the i915/xe schema branch was assessed as complete.
- Docs: Three targeted fixes committed as f8743b1.
All checks passing. Ready for merge.
Summary
Implements per-process GPU memory accounting for the Intel client GPU reader on Linux by parsing DRM client `/proc/<pid>/fdinfo/<fd>` blocks for both the `i915` and `xe` kernel drivers. Replaces the `Vec::new()` stub in `IntelGpuReader::get_process_info()`, so any TUI tab, dashboard, or scraper that consumes `ProcessInfo` (e.g. the existing per-process columns in the `all-smi view` TUI) now sees Intel-GPU-using processes alongside AMD/NVIDIA ones.

What changed
New module
`src/device/readers/intel_gpu_fdinfo.rs` (and `intel_gpu_fdinfo/enrichment.rs`, `intel_gpu_fdinfo/tests.rs`):

- `parse_fdinfo`: pure-Rust parser for the i915 and xe fdinfo schemas. Sums `drm-resident-{local0,system}` (i915) or `drm-resident-{vram0,gtt}` (xe) into a single `resident_bytes` total (correct on integrated GPUs too, where only the system/GTT key is present). Rejects foreign drivers (amdgpu / nvidia / nouveau) so cross-vendor hosts cannot leak the wrong reader's processes. Tolerates truncated content during process teardown without panicking.
- `build_intel_drm_basenames`: walks `/sys/class/drm/` once and resolves both `cardN` and `renderD<M>` minors that point at each reader-enumerated Intel PCI bus. Both nodes share a card index, so modern Vulkan / oneAPI / ffmpeg workloads that open the render node (no master/setmaster permission flow) are captured.
- `intel_drm_fds_for_pid`: reads `/proc/<pid>/fd/` and returns the fds pointing at known Intel DRM nodes. Permission errors (`EACCES` for fds owned by another user) degrade silently per-process.
- `collect_intel_gpu_processes`: top-level aggregator. Walks `/proc`, dedupes fds by `drm-client-id` per `(pid, card_index)` (avoiding N× over-counting from `dup(2)`d fds), but sums across distinct clients (multi-context workloads). Bounded by a `MAX_GPU_PROCESSES = 4096` cap. Returns deterministic, PID-sorted output.
- `build_intel_process_infos` (in `enrichment.rs`): runs one minimal sysinfo `refresh_processes_specifics` (cpu + memory + user) and merges sysinfo metadata into `ProcessInfo` rows.
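The dedup rule in `collect_intel_gpu_processes` (count each `drm-client-id` once per `(pid, card_index)`, sum distinct clients) can be sketched on already-parsed samples. `Sample`, `ClientKey`, and `aggregate` are hypothetical; treating an fd that lacks `drm-client-id` as its own client, and applying the 4096 cap per `(pid, card)` group, are assumptions of this sketch rather than documented behaviour.

```rust
use std::collections::{BTreeMap, HashMap};

/// Cap value taken from the PR; applying it per (pid, card) group is assumed.
const MAX_GPU_PROCESSES: usize = 4096;

/// One observed Intel DRM fd after its fdinfo has been parsed.
pub struct Sample {
    pub pid: u32,
    pub fd: u32,
    pub card: usize,
    pub client_id: Option<u64>,
    pub resident_bytes: u64,
}

/// Identity of a DRM client; an fd without `drm-client-id` counts as its own
/// client (an assumption of this sketch).
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum ClientKey {
    Id(u64),
    Fd(u32),
}

/// Returns `(pid, card_index, used_memory_bytes)`, sorted by pid then card.
pub fn aggregate(samples: &[Sample]) -> Vec<(u32, usize, u64)> {
    let mut groups: BTreeMap<(u32, usize), HashMap<ClientKey, u64>> = BTreeMap::new();
    for s in samples {
        let group_key = (s.pid, s.card);
        if !groups.contains_key(&group_key) && groups.len() >= MAX_GPU_PROCESSES {
            continue; // cap reached: ignore new (pid, card) groups
        }
        let client = s.client_id.map_or(ClientKey::Fd(s.fd), ClientKey::Id);
        // A dup(2)'d fd reports the same client id, so it overwrites, not adds.
        groups.entry(group_key).or_default().insert(client, s.resident_bytes);
    }
    groups
        .into_iter()
        .map(|((pid, card), clients)| (pid, card, clients.values().sum::<u64>()))
        .collect()
}
```

Keying the outer `BTreeMap` on `(pid, card)` yields the deterministic, PID-sorted output directly, while the inner `HashMap` only needs set semantics per client.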
`build_intel_process_infos` matches the pattern in `src/device/readers/amd.rs` exactly, so cross-vendor consumers see a consistent shape.

`src/device/readers/intel_gpu_linux.rs`:

- `IntelGpuReader` gains an `intel_drm_basenames: HashMap<String, usize>` cached at construction time and a `proc_root: PathBuf` field for test injection.
- The `IntelGpuCard` struct shape is unchanged (no new `Mutex<...>` fields), so the upcoming Level Zero work in #248 (feat(intel-gpu): Level Zero (oneAPI) integration for advanced metrics on Linux and Windows) has no rebase friction.
- The `get_process_info()` body is ~10 lines: it builds the `(card_index, uuid)` table fresh each call (so `ProcessInfo.device_uuid` always matches the contemporaneous `GpuInfo.uuid`) and delegates to `build_intel_process_infos`.
- `has_intel_client_gpu_from_root` and `line_matches_intel_gpu` moved into a new sibling `intel_gpu_linux/detection.rs` to stay under the 500-line per-file budget. The public `has_intel_client_gpu()` is a 4-line wrapper.

Path-injection coverage: every public helper in `intel_gpu_fdinfo` accepts an explicit `proc_root` / `drm_root`, and `IntelGpuReader` has a private `new_with_roots(drm_root, proc_root)` constructor. The full pipeline is tested under `tempfile::tempdir` fixtures without touching the real `/proc` or `/sys/class/drm`.

Files touched:
- `src/device/readers/mod.rs` (registered new module)
- `src/device/readers/intel_gpu_linux.rs` (integration + detection split)
- `src/device/readers/intel_gpu_linux/detection.rs` (new, extracted)
- `src/device/readers/intel_gpu_linux/tests.rs` (3 new end-to-end tests)
- `src/device/readers/intel_gpu_fdinfo.rs` (new module)
- `src/device/readers/intel_gpu_fdinfo/enrichment.rs` (new, sysinfo merge)
- `src/device/readers/intel_gpu_fdinfo/tests.rs` (23 unit tests)
- `docs/ARCHITECTURE.md` (Intel section updated)

Deferred: per-process engine-time utilization
The issue body lists per-process engine-time utilization as a stretch goal that would reuse PR #249's `EngineState` machinery with a per-PID delta tracker keyed on `drm-client-id`. That work is intentionally deferred to a separate follow-up PR. Reasoning: the stretch goal would add a `Mutex<ProcessEngineState>` field to `IntelGpuCard`, and the next issue in the queue (#248, Level Zero integration) will also touch the struct shape. Keeping `IntelGpuCard` unchanged here makes #248's rebase trivial. The deferred work has a clean integration point (the same fdinfo walker can sample `drm-engine-*` counters next to the `drm-resident-*` keys it already reads), so the follow-up is mechanical.

Test plan
- `cargo check --lib --tests` clean
- `cargo clippy --lib --tests -- -D warnings` clean
- `cargo clippy -- -D warnings` clean (the bin-target path that caught the bin-only unused-import issue in PR #249, feat(intel-gpu): compute Linux utilization from engine-busy counters)
- `cargo test --lib device::readers::intel_gpu_fdinfo`: 23 tests pass (i915 + xe parsing, integrated vs discrete, foreign-driver rejection, truncated/malformed input, kB-to-bytes, card+render mapping for one and two cards, AMD render-node rejection, connector-child filtering, client-id dedup, distinct-client summing, multi-card grouping, no-Intel-card short-circuit, missing-client-id tolerance)
- `cargo test --lib device::readers::intel_gpu_linux`: 19 tests pass (16 prior + 3 new: empty-when-no-cards, render-node fdinfo end-to-end, trait default filter compatibility)

Acceptance criteria status
- `get_process_info()` returns non-empty on a host with Intel-GPU-using processes (awaits hardware verification by maintainer)
- `used_memory` matches `intel_gpu_top -p` within rounding tolerance (awaits hardware verification by maintainer)
- `EACCES` on `/proc/<pid>/fd/` is silently skipped per-process, with no panics and no per-process log spam
- `i915` and `xe` drivers are handled: the parser branches on `drm-driver` and accepts both schemas; unit tests cover both
- `ProcessInfo` flows through the existing `GpuReader` trait without per-vendor special casing (the trait's default `get_gpu_processes` filter is verified compatible)

Closes #247