
feat(intel-gpu): per-process GPU memory accounting via fdinfo #250

Merged
inureyes merged 5 commits into main from feat/issue-247-intel-fdinfo-per-process on May 27, 2026

Conversation

@inureyes

Member

Summary

Implements per-process GPU memory accounting for the Intel client GPU reader on Linux by parsing DRM client /proc/<pid>/fdinfo/<fd> blocks for both the i915 and xe kernel drivers. Replaces the Vec::new() stub in IntelGpuReader::get_process_info() so any TUI tab, dashboard, or scraper that consumes ProcessInfo (e.g. the existing per-process columns in the all-smi view TUI) now sees Intel-GPU-using processes alongside AMD/NVIDIA ones.

What changed

New module src/device/readers/intel_gpu_fdinfo.rs (and intel_gpu_fdinfo/enrichment.rs, intel_gpu_fdinfo/tests.rs):

  • parse_fdinfo — pure-Rust parser for the i915 and xe fdinfo schemas. Sums drm-resident-{local0,system} (i915) or drm-resident-{vram0,gtt} (xe) into a single resident_bytes total (correct on integrated GPUs too, where only the system/GTT key is present). Rejects foreign drivers (amdgpu / nvidia / nouveau) so cross-vendor hosts cannot leak the wrong reader's processes. Tolerates truncated content during process teardown without panicking.
  • build_intel_drm_basenames — walks /sys/class/drm/ once and resolves both cardN and renderD<M> minors that point at each reader-enumerated Intel PCI bus. Both nodes share a card index so modern Vulkan / oneAPI / ffmpeg workloads that open the render node (no master/setmaster permission flow) are captured.
  • intel_drm_fds_for_pid — reads /proc/<pid>/fd/ and returns the fds pointing at known Intel DRM nodes. Permission errors (EACCES for fds owned by another user) degrade silently per-process.
  • collect_intel_gpu_processes — top-level aggregator. Walks /proc, dedupes fds by drm-client-id per (pid, card_index) (avoiding N× over-counting from dup(2)d fds), but sums across distinct clients (multi-context workloads). Bounded by a MAX_GPU_PROCESSES = 4096 cap. Returns deterministic, PID-sorted output.
  • build_intel_process_infos (in enrichment.rs) — runs one minimal sysinfo refresh_processes_specifics (cpu + memory + user) and merges sysinfo metadata into ProcessInfo rows. Pattern matches src/device/readers/amd.rs exactly so cross-vendor consumers see a consistent shape.
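The parsing rules listed for parse_fdinfo can be illustrated with a minimal sketch. It assumes the documented `key:<tab>value` fdinfo layout and an optional KiB/MiB unit suffix; `FdInfoSketch`, `parse_fdinfo_sketch`, and `parse_size` are hypothetical names for illustration, not the PR's actual types or signatures.

```rust
/// Hypothetical result type for this sketch; the PR's real struct may differ.
#[derive(Debug, PartialEq)]
pub struct FdInfoSketch {
    pub driver: String,
    pub client_id: Option<u64>,
    pub resident_bytes: u64,
}

/// Parse one fdinfo block. Returns `None` unless `drm-driver` is i915 or xe.
pub fn parse_fdinfo_sketch(content: &str) -> Option<FdInfoSketch> {
    let mut driver: Option<String> = None;
    let mut client_id: Option<u64> = None;
    let mut resident_bytes: u64 = 0;

    for line in content.lines() {
        // Lines without a ':' (for example a truncated final line) are
        // skipped rather than treated as an error.
        let Some((key, value)) = line.split_once(':') else { continue };
        let value = value.trim();
        match key.trim() {
            "drm-driver" => driver = Some(value.to_string()),
            "drm-client-id" => client_id = value.parse().ok(),
            // Sum every resident key regardless of region suffix
            // (-system, -local0, -gtt, -vram0).
            k if k.starts_with("drm-resident-") => {
                if let Some(bytes) = parse_size(value) {
                    resident_bytes = resident_bytes.saturating_add(bytes);
                }
            }
            _ => {}
        }
    }

    let driver = driver?;
    if driver != "i915" && driver != "xe" {
        return None; // foreign driver (amdgpu, nouveau, ...)
    }
    Some(FdInfoSketch { driver, client_id, resident_bytes })
}

/// Convert "<number> [KiB|kB|MiB]" to bytes; malformed values yield `None`.
fn parse_size(value: &str) -> Option<u64> {
    let mut parts = value.split_whitespace();
    let number: u64 = parts.next()?.parse().ok()?;
    let unit = parts.next().map(str::to_ascii_lowercase);
    let multiplier: u64 = match unit.as_deref() {
        None => 1,
        Some("kib") | Some("kb") => 1024,
        Some("mib") => 1024 * 1024,
        Some(_) => return None,
    };
    Some(number.saturating_mul(multiplier))
}
```

On an integrated GPU only the system/GTT key is present, so the same loop yields the single-key total without a special case.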

src/device/readers/intel_gpu_linux.rs:

  • IntelGpuReader gains an intel_drm_basenames: HashMap<String, usize> cached at construction time and a proc_root: PathBuf field for test injection.
  • IntelGpuCard struct shape is unchanged — no new Mutex<...> fields — so the upcoming Level Zero work in feat(intel-gpu): Level Zero (oneAPI) integration for advanced metrics on Linux and Windows #248 has no rebase friction.
  • get_process_info() body is ~10 lines: builds the (card_index, uuid) table fresh each call (so ProcessInfo.device_uuid always matches the contemporaneous GpuInfo.uuid) and delegates to build_intel_process_infos.
  • Detection helpers (has_intel_client_gpu_from_root, line_matches_intel_gpu) moved into a new sibling intel_gpu_linux/detection.rs to stay under the 500-line per-file budget. The public has_intel_client_gpu() is a 4-line wrapper.

Path-injection coverage: every public helper in intel_gpu_fdinfo accepts an explicit proc_root / drm_root, and IntelGpuReader has a private new_with_roots(drm_root, proc_root) constructor. The full pipeline is tested under tempfile::tempdir fixtures without touching the real /proc or /sys/class/drm.
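As an illustration of the path-injection approach, here is a rough sketch of a basename mapper that takes an explicit `drm_root`. It assumes each `<node>/device` entry is a symlink whose last path component is the PCI bus id; `drm_basenames_sketch` and the `intel_bus_ids` parameter are hypothetical stand-ins, not the PR's `build_intel_drm_basenames` signature.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// Map DRM node basenames ("card0", "renderD128") to the index of the Intel
/// card whose PCI bus id matches the node's `device` symlink target.
/// `intel_bus_ids` stands in for the reader's pre-enumerated card list.
pub fn drm_basenames_sketch(drm_root: &Path, intel_bus_ids: &[&str]) -> HashMap<String, usize> {
    let mut map = HashMap::new();
    let Ok(entries) = fs::read_dir(drm_root) else { return map };
    for entry in entries.flatten() {
        let name = entry.file_name().to_string_lossy().into_owned();
        // Keep cardN and renderD<M>; skip connector children such as card0-DP-1.
        let is_node =
            (name.starts_with("card") || name.starts_with("renderD")) && !name.contains('-');
        if !is_node {
            continue;
        }
        // Read the link text only; the target itself is never opened.
        let Ok(target) = fs::read_link(entry.path().join("device")) else { continue };
        let Some(bus) = target.file_name().and_then(|b| b.to_str()) else { continue };
        // Nodes on a non-Intel PCI bus find no match and are left out.
        if let Some(index) = intel_bus_ids.iter().position(|id| *id == bus) {
            map.insert(name, index);
        }
    }
    map
}
```

Because the root is a parameter, a test can point it at a temporary directory populated with fake `cardN` / `renderD<M>` entries instead of the real `/sys/class/drm`.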

Files touched:

  • src/device/readers/mod.rs (registered new module)
  • src/device/readers/intel_gpu_linux.rs (integration + detection split)
  • src/device/readers/intel_gpu_linux/detection.rs (new, extracted)
  • src/device/readers/intel_gpu_linux/tests.rs (3 new end-to-end tests)
  • src/device/readers/intel_gpu_fdinfo.rs (new module)
  • src/device/readers/intel_gpu_fdinfo/enrichment.rs (new, sysinfo merge)
  • src/device/readers/intel_gpu_fdinfo/tests.rs (23 unit tests)
  • docs/ARCHITECTURE.md (Intel section updated)

Deferred: per-process engine-time utilization

The issue body lists per-process engine-time utilization as a stretch goal that would reuse PR #249's EngineState machinery with a per-PID delta tracker keyed on drm-client-id. That work is intentionally deferred to a separate follow-up PR. Reasoning: the stretch goal would add a Mutex<ProcessEngineState> field to IntelGpuCard, and the next issue in the queue (#248, Level Zero integration) will also touch the struct shape. Keeping IntelGpuCard unchanged here makes #248's rebase trivial. The deferred work has a clean integration point — the same fdinfo walker can sample drm-engine-* counters next to the drm-resident-* keys it already reads — so the follow-up is mechanical.

Test plan

  • cargo check --lib --tests clean
  • cargo clippy --lib --tests -- -D warnings clean
  • cargo clippy -- -D warnings clean (bin-target path that caught the bin-only unused-import issue in PR feat(intel-gpu): compute Linux utilization from engine-busy counters #249)
  • cargo test --lib device::readers::intel_gpu_fdinfo — 23 tests pass (i915 + xe parsing, integrated vs discrete, foreign-driver rejection, truncated/malformed input, kB-to-bytes, card+render mapping for one and two cards, AMD render-node rejection, connector-child filtering, client-id dedup, distinct-client summing, multi-card grouping, no-Intel-card short-circuit, missing-client-id tolerance)
  • cargo test --lib device::readers::intel_gpu_linux — 19 tests pass (16 prior + 3 new: empty-when-no-cards, render-node fdinfo end-to-end, trait default filter compatibility)

Acceptance criteria status

  • get_process_info() returns non-empty on a host with Intel-GPU-using processes (awaits hardware verification by maintainer)
  • used_memory matches intel_gpu_top -p within rounding tolerance (awaits hardware verification by maintainer)
  • Permission errors degrade gracefully — EACCES on /proc/<pid>/fd/ is silently skipped per-process, no panics, no per-process log spam
  • Both i915 and xe drivers are handled — the parser branches on drm-driver and accepts both schemas; unit tests cover both
  • Integrated end-to-end — ProcessInfo flows through the existing GpuReader trait without per-vendor special casing (the trait's default get_gpu_processes filter is verified compatible)
  • (Stretch) Per-process engine-time utilization — deferred to follow-up per the scope decision above

Closes #247

inureyes added 3 commits May 27, 2026 12:05
Introduces a new stateless module `intel_gpu_fdinfo` that parses Intel DRM client `/proc/<pid>/fdinfo/<fd>` blocks and correlates fds back to a reader-known card index. Provides:

- `parse_fdinfo` — pure-string parser for the i915 and xe schemas. Handles truncated / malformed input without panicking, rejects foreign drivers (amdgpu / nvidia), and normalises memory values from kB to bytes.
- `build_intel_drm_basenames` — walks `/sys/class/drm` once to find every `cardN` and `renderD<M>` minor that maps to one of the reader-enumerated Intel PCI devices. Both nodes share a card index so modern Vulkan / oneAPI / ffmpeg workloads opening the render node are captured.
- `intel_drm_fds_for_pid` — reads `/proc/<pid>/fd/` and returns the fds pointing at known Intel DRM nodes. Permission errors degrade silently.
- `collect_intel_gpu_processes` — top-level aggregator. Walks `/proc`, dedupes fds by `drm-client-id` per process+card (avoiding N× over-counting from `dup(2)`d fds) but sums across distinct clients (multi-context workloads). Returns deterministic, PID-sorted output.

The module is path-injection friendly: every public helper accepts an explicit `proc_root` / `drm_root` so the entire walker is testable under `tempfile::tempdir` fixtures without touching the real `/proc` or `/sys`.

23 unit tests cover: i915 + xe parsing, integrated vs discrete schemas, foreign-driver rejection, truncated input, kB-to-bytes conversion, the card/renderD mapping for one and two cards, AMD render-node rejection, connector-child filtering, client-id dedup, distinct-client summing, multi-card grouping, and graceful no-Intel-card short-circuit.

Refs #247
Replaces the `Vec::new()` stub in `IntelGpuReader::get_process_info()` with a full implementation built on top of the new `intel_gpu_fdinfo` module. The reader now:

- Caches an `intel_drm_basenames` map at construction (one entry per `cardN` and `renderD<M>` known to belong to an Intel PCI device), so the per-process refresh is a flat `/proc` walk with no extra sysfs probing.
- Threads a `proc_root` field through the constructor for test injection — production stays `/proc`.
- Builds a `card_index -> uuid` table on each call and delegates to `build_intel_process_infos`, which collects `(pid, card_index, used_memory_bytes)` aggregates, performs one minimal sysinfo refresh, and merges sysinfo metadata (cpu_percent, user, state, rss, vms, etc.) into the final `ProcessInfo` rows. Pattern matches AMD's reader exactly so cross-vendor consumers see a consistent shape.
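The table-then-merge step described above might look roughly like the following. This is a sketch only: `Aggregate` and `RowSketch` are hypothetical, trimmed-down stand-ins for the walker output and `ProcessInfo`, the sysinfo enrichment is omitted, and the `Intel-GPU-<bus>` format is taken from the test description in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical per-process aggregate produced by the fdinfo walker.
pub struct Aggregate {
    pub pid: u32,
    pub card_index: usize,
    pub used_memory_bytes: u64,
}

/// Hypothetical, trimmed-down stand-in for a `ProcessInfo` row.
#[derive(Debug, PartialEq)]
pub struct RowSketch {
    pub pid: u32,
    pub device_uuid: String,
    pub used_memory: u64,
    pub uses_gpu: bool,
}

/// Rebuilt on every call so the UUIDs match the current GPU list.
pub fn card_uuid_table(bus_ids: &[&str]) -> HashMap<usize, String> {
    bus_ids
        .iter()
        .enumerate()
        .map(|(index, bus)| (index, format!("Intel-GPU-{bus}")))
        .collect()
}

/// Turn aggregates into rows; aggregates for unknown cards are dropped.
pub fn rows_from_aggregates(aggs: &[Aggregate], uuids: &HashMap<usize, String>) -> Vec<RowSketch> {
    aggs.iter()
        .filter_map(|a| {
            let uuid = uuids.get(&a.card_index)?;
            Some(RowSketch {
                pid: a.pid,
                device_uuid: uuid.clone(),
                used_memory: a.used_memory_bytes,
                uses_gpu: true,
            })
        })
        .collect()
}
```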

To stay under the 500-line per-file budget after the integration, two cohesive subsections moved into siblings:

- `intel_gpu_linux/detection.rs` — `has_intel_client_gpu_from_root` and `line_matches_intel_gpu`. The public `has_intel_client_gpu` is now a 4-line wrapper.
- `intel_gpu_fdinfo/enrichment.rs` — the sysinfo merge helper. The parent module re-exports `build_intel_process_infos`, so the public API is unchanged.

File sizes after refactor (all <500):
- `intel_gpu_linux.rs`        474
- `intel_gpu_fdinfo.rs`       483
- `intel_gpu_fdinfo/enrichment.rs` 107
- `intel_gpu_fdinfo/tests.rs`     494
- `intel_gpu_linux/detection.rs`  86

Behaviour on hosts without Intel-GPU-using processes is unchanged: an empty basename map (no Intel GPUs detected) short-circuits to `Vec::new()`, and the fdinfo walker returns empty when no process holds an Intel DRM fd. The stretch goal (per-process engine-time deltas) is intentionally deferred to a follow-up; v1 reports `gpu_utilization = 0.0` per process.

Refs #247
Adds three end-to-end tests against `IntelGpuReader::get_process_info()` driven by a synthetic procfs and DRM sysfs tree:

- `get_process_info_returns_empty_when_no_intel_cards` — guarantees no regression on AMD-only or NVIDIA-only hosts: the empty basename map short-circuits to `Vec::new()` without touching `/proc`.
- `get_process_info_collects_fdinfo_from_render_node` — full pipeline: a synthetic Intel card with a matching `renderD<M>` render node, plus a synthetic `/proc/<pid>/fdinfo/<fd>` containing the i915 schema, must yield exactly one populated `ProcessInfo` with the correct PID, `device_id`, `device_uuid` (`Intel-GPU-<bus>` format matching `get_gpu_info`), and `used_memory` (16384 kB -> 16777216 bytes).
- `get_process_info_default_filter_keeps_uses_gpu_processes` — verifies the Intel reader is compatible with the trait's default `get_gpu_processes` filter (every emitted row has `uses_gpu = true`).

Also updates `docs/ARCHITECTURE.md`: the Intel client GPU section now lists per-process GPU memory accounting alongside engine-busy utilization, including the i915 / xe key sets parsed and the `drm-client-id` dedup behaviour.

Refs #247
@inureyes inureyes added the status:review, type:enhancement, and priority:low labels May 27, 2026
@inureyes

Member Author

Implementation Review Summary

Intent

Fill in IntelGpuReader::get_process_info() (previously Vec::new()) with real per-process GPU memory accounting from /proc/<pid>/fdinfo/<fd> for both i915 and xe drivers. Stretch goal (per-process engine-time) intentionally deferred to keep IntelGpuCard struct shape stable for #248's rebase.

Findings Addressed

None — the implementation is correct and complete as submitted. No fixes required.

Verification

  • All stated requirements implemented (memory accounting via fdinfo for i915 + xe, render+card node mapping, drm-client-id dedup, EACCES silent skip, kB→bytes conversion, sysinfo enrichment mirroring AMD pattern)
  • No placeholder/mock code remaining (the Vec::new() stub is fully replaced; gpu_utilization: 0.0 is a documented stretch-goal deferral, not a stub)
  • Integrated into project code flow (IntelGpuReader::new() already registered in reader_factory.rs; get_process_info() flows through the GpuReader trait without per-vendor specialization)
  • Project conventions followed (conventional-commits subjects all ≤72 chars; English-only; no AI attribution in commits, PR body, or code; no unwrap()/expect() in library code; let-else and Rust 1.58+ inline format args used)
  • Existing modules reused (with_global_system, crate::device::process_list::get_all_processes, ProcessRefreshKind::nothing().with_cpu().with_memory().with_user(UpdateKind::OnlyIfNotSet) — exact mirror of the AMD reader pattern at src/device/readers/amd.rs:599-608)
  • No unintended structural changes (IntelGpuCard struct shape is unchanged per the explicit orchestrator instruction — diff confirms only IntelGpuReader gained intel_drm_basenames and proc_root fields, and the 25-line reduction in intel_gpu_linux.rs is pure code movement to detection.rs)
  • Tests pass (cargo check --lib --tests, cargo clippy --lib --tests -- -D warnings, cargo clippy -- -D warnings, 23 fdinfo tests, 19 linux tests — all green from a clean build)

Critical correctness checks

  • cardN / renderD<M> mapping: build_intel_drm_basenames correctly registers BOTH nodes by walking /sys/class/drm and matching the basename of each node's device symlink target (the PCI bus identifier) against the pre-enumerated card list. Tests build_intel_drm_basenames_maps_card_and_render_to_same_index and build_intel_drm_basenames_two_cards_two_render_nodes cover the common case. Non-Intel render nodes are excluded by PCI-bus mismatch (test build_intel_drm_basenames_ignores_non_intel_render_nodes).
  • drm-client-id deduplication: Per-(pid, card_index) the parser keeps a HashMap<client_id, max_bytes> (so N fds sharing one client report ONE memory amount, not N) and sums by_client.values() across distinct client IDs. Test collect_intel_gpu_processes_dedupes_by_client_id proves no double-count; collect_intel_gpu_processes_sums_distinct_clients proves distinct clients still aggregate. NOT a per-(pid, fd) dedup.
  • i915 vs xe schema: Parser uses k.starts_with("drm-resident-") to sum every resident-memory key regardless of suffix (-system, -local0, -gtt, -vram0). Integrated GPUs (with only one key present) resolve correctly. Three tests cover both schemas explicitly.
  • Permission errors: EACCES on /proc/<pid>/fd/, read_link, and fdinfo all degrade silently via let-else/match — no eprintln! in the entire new module.
  • kB → bytes: parse_memory_value multiplies by 1024 for kb/kib. Test parse_fdinfo_kb_multiplier_is_1024 verifies 1 kB → 1024 bytes.
  • Malformed fdinfo: Truncated mid-key, non-numeric values, missing client-id — all skipped or treated as None without panic (4 tests cover these paths).
  • /proc injection: IntelGpuReader::new_with_roots(drm_root, proc_root) enables tempdir-based integration tests; IntelGpuReader::new() passes Path::new("/proc") for production. Integration tests in intel_gpu_linux/tests.rs exercise the full pipeline end-to-end.
  • No regression for non-Intel hosts: collect_intel_gpu_processes short-circuits when intel_drm_basenames is empty. Test get_process_info_returns_empty_when_no_intel_cards covers this directly. AMD-only hosts skip the IntelGpuReader entirely via reader_factory.rs registration.
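The drm-client-id deduplication check above can be sketched as follows. The `Sample` tuple and `aggregate_sketch` are hypothetical names; the sketch only shows the max-per-client, sum-across-clients rule and assumes the samples have already been parsed.

```rust
use std::collections::{BTreeMap, HashMap};

/// One parsed fdinfo sample: (pid, card_index, drm-client-id, resident bytes).
pub type Sample = (u32, usize, u64, u64);

/// Collapse samples into one (pid, card_index, bytes) row per process and card.
pub fn aggregate_sketch(samples: &[Sample]) -> Vec<(u32, usize, u64)> {
    // BTreeMap iterates in key order, so the output is sorted by (pid, card).
    let mut by_key: BTreeMap<(u32, usize), HashMap<u64, u64>> = BTreeMap::new();
    for &(pid, card, client, bytes) in samples {
        let slot = by_key.entry((pid, card)).or_default().entry(client).or_insert(0);
        // fds created with dup(2) share a client id: keep the max, never add.
        *slot = (*slot).max(bytes);
    }
    by_key
        .into_iter()
        .map(|((pid, card), clients)| {
            // Distinct clients are separate contexts, so their memory adds up.
            let total = clients.values().fold(0u64, |acc, b| acc.saturating_add(*b));
            (pid, card, total)
        })
        .collect()
}
```

For example, two samples for PID 10 with client id 7 at 1024 bytes plus one sample with client id 8 at 2048 bytes collapse to a single 3072-byte row rather than 4096.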

Module structure

  • intel_gpu_fdinfo{.rs,/enrichment.rs,/tests.rs} mirrors intel_gpu_engine{.rs,/discovery.rs,/tests.rs} 3-file pattern.
  • File-size budgets: intel_gpu_fdinfo.rs 483 lines, enrichment.rs 107, tests.rs 494, intel_gpu_linux.rs 474 (was 499 — kept under 500 by extracting detection.rs), detection.rs 86. All under 500.
  • mod.rs declares intel_gpu_fdinfo under the same #[cfg(target_os = "linux")] gate as intel_gpu_engine.

Remaining items

  • Hardware verification ACs (get_process_info non-empty on real Intel GPU host, used_memory matches intel_gpu_top -p) remain unchecked and are marked "(awaits hardware verification by maintainer)", which is appropriate per the convention.
  • Stretch goal (per-process engine-time utilization) intentionally deferred to a follow-up PR; documented in PR body and tracked by ACs.

Minor docs nit (LOW)

intel_gpu_linux.rs:146 comment says "Test-only constructor" for new_with_roots, but it is also called by IntelGpuReader::new() (production). The function is correctly not #[cfg(test)] gated — the comment should read "Internal constructor that accepts arbitrary DRM and proc roots (production code uses default paths via [IntelGpuReader::new])." Not worth a separate fix cycle.

Final verdict

Ready to advance to security/perf review. Implementation is correct, complete, properly integrated, conforms to project conventions, and preserves IntelGpuCard struct shape for #248's rebase. All narrow-scope verifications pass on a clean build.

`cargo fmt --check` (which CI enforces) flagged three formatting nits in the new module: a path-join chain that fits on one line, a `let` binding with an unnecessary multi-line break, and a function signature that fits on one line. Run rustfmt to bring the new code in line with the rest of the workspace.

No behaviour change. All 42 fdinfo / intel_gpu_linux tests still pass; clippy clean on both `--lib --tests` and the default bin target.
@inureyes

Member Author

Security and performance review

Comprehensive review of the per-process fdinfo accounting work against the surface areas called out in the briefing. The reviewer ran cargo check --lib --tests, cargo clippy --lib --tests -- -D warnings, cargo clippy -- -D warnings, cargo test --lib device::readers::intel_gpu_fdinfo (23 pass), and cargo test --lib device::readers::intel_gpu_linux (19 pass) on this branch.

Findings

MEDIUM (auto-fixed in 9bb3d75)

  • cargo fmt --check flagged three formatting nits in the new module (one path-join chain, one let binding, one function signature). CI runs cargo fmt --check (see .github/workflows/ci.yml:49), so this would have blocked merge. Applied cargo fmt, verified all 42 tests still pass, pushed as a separate style(intel-gpu) commit on this branch.

LOW (informational, not blocking)

  • collect_intel_gpu_processes only increments the MAX_GPU_PROCESSES cap counter for processes that have at least one Intel DRM fd (src/device/readers/intel_gpu_fdinfo.rs:407-410). On a host with thousands of idle PIDs and zero Intel-using processes, the full /proc walk still pays one read_dir(/proc/<pid>/fd/) per PID. This matches the AMD reader's cost shape via libamdgpu_top::FdInfoStat::update_proc_usage and is acceptable for v1, but it means the cap limits how many GPU-using processes are reported, not how many PIDs are enumerated. Worth noting in a follow-up if /proc walk latency ever becomes a hotspot.

  • The missing-drm-client-id fallback at src/device/readers/intel_gpu_fdinfo.rs:428-437 keys by fdinfo_path (which is unique per fd). Two fds in the same process to the same DRM client on a pre-drm-client-id kernel would still double-count. In practice this is a non-issue: the kernels that expose drm-resident-* (Linux >=5.19 i915, all xe) also emit drm-client-id, so the over-count only triggers on a hypothetical kernel that emits memory keys without the client-id — and on such a kernel resident_bytes is typically zero anyway.

  • read_to_string on fdinfo files has no take(N) upper bound (src/device/readers/intel_gpu_fdinfo.rs:413). In practice the kernel emits at most ~30 lines (~2 KB) per fdinfo, but a defence-in-depth byte cap would harden against a future kernel change emitting unbounded counters.
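A byte cap of the kind suggested in the last finding could be sketched with `Read::take`. This is not part of the PR: `MAX_FDINFO_BYTES` and `read_fdinfo_capped` are hypothetical names, and the 64 KiB limit is an arbitrary assumption chosen to sit well above the ~2 KB the kernel emits today.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Hypothetical cap; the PR does not define one.
pub const MAX_FDINFO_BYTES: u64 = 64 * 1024;

/// Read at most `MAX_FDINFO_BYTES` from an fdinfo file.
/// Any I/O error (process exited, EACCES, non-UTF-8 data) yields `None`.
pub fn read_fdinfo_capped(path: &Path) -> Option<String> {
    let file = File::open(path).ok()?;
    let mut buf = String::new();
    // `take` stops reading at the limit even if the file has more data.
    file.take(MAX_FDINFO_BYTES).read_to_string(&mut buf).ok()?;
    Some(buf)
}
```

A capped read can end mid-line, which is harmless if the parser already skips truncated lines as described elsewhere in this review.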

Critical surface areas — verdict

  • /proc/<pid>/fd/ traversal & EACCES: intel_drm_fds_for_pid uses let Ok(entries) = std::fs::read_dir(&fd_dir) else { return Vec::new(); } (line 307). Silently degrades on EACCES and ENOENT. No eprintln!, no unwrap, no panic! anywhere in the production path. Pass.
  • Symlink-follow safety: The walker uses std::fs::read_link(&fd_path) (line 320), returning only the symlink target as a PathBuf without following it. It then matches target.file_name() (basename only) against the HashMap<String, usize> of known Intel DRM nodes. No read_to_string on the fd entry itself; no path-join with the read_link result; no opportunity to follow a fd pointing at a sensitive file. Pass.
  • Path-injection via drm-pdev: drm_pdev is captured into FdInfo (line 159) but never used to construct a path anywhere in the pipeline. Pass.
  • TOCTOU on process exit: read_to_string(&fd.fdinfo_path) Err → continue (line 415); read_link Err → continue (line 321); read_dir(proc_root) Err → empty Vec. Process disappearance is a normal-not-error case throughout. Pass.
  • fdinfo parsing safety: All parse::<u64>() calls use .ok() (line 129) or ? chains; truncated/malformed input skips lines via let Some(...) = ... else { continue; } patterns; saturating_add/saturating_mul used on every memory math op (lines 141, 181, 456, 461). Non-UTF8 fdinfo would fail at read_to_string and skip cleanly. Pass.
  • resident vs total selection: Only drm-resident-* keys are summed (line 139: k.starts_with("drm-resident-")). drm-total-* is intentionally ignored: per the kernel drm-usage-stats.rst spec, total counts every buffer the client has allocated, including buffers not currently resident in the region. Tests verify this. Pass.
  • Cache reuse: build_intel_drm_basenames is called exactly once at IntelGpuReader::new_with_roots (line 154 of intel_gpu_linux.rs); subsequent get_process_info calls read self.intel_drm_basenames (line 299). The only per-call rebuild is card_uuids (O(cards), microseconds). Pass.
  • sysinfo refresh narrowness: ProcessRefreshKind::nothing().with_cpu().with_memory().with_user(UpdateKind::OnlyIfNotSet) (enrichment.rs:63-66). Matches AMD exactly; no with_disk_usage, no with_environ. Pass.
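The fd traversal and symlink-safety points above can be sketched as below. The function name `intel_fds_sketch` and its return shape are hypothetical; the sketch assumes the `/proc/<pid>/fd/<n>` and `/proc/<pid>/fdinfo/<n>` layout and takes `proc_root` as a parameter so it can run against a fixture directory.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};

/// Return (card_index, fdinfo_path) for each fd of `pid` whose symlink
/// target has the basename of a known Intel DRM node.
pub fn intel_fds_sketch(
    proc_root: &Path,
    pid: u32,
    basenames: &HashMap<String, usize>,
) -> Vec<(usize, PathBuf)> {
    let pid_dir = proc_root.join(pid.to_string());
    // EACCES or ENOENT (process owned by another user, or already gone)
    // degrades to an empty result without logging.
    let Ok(entries) = fs::read_dir(pid_dir.join("fd")) else { return Vec::new() };
    let mut out = Vec::new();
    for entry in entries.flatten() {
        // read_link returns the link text only; the target is never opened,
        // and the result is never joined into a new path.
        let Ok(target) = fs::read_link(entry.path()) else { continue };
        let Some(base) = target.file_name().and_then(|b| b.to_str()) else { continue };
        if let Some(&card_index) = basenames.get(base) {
            out.push((card_index, pid_dir.join("fdinfo").join(entry.file_name())));
        }
    }
    out
}
```

An fd that points at an unrelated file simply fails the basename lookup, so its contents are never read.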

Verdict

Ready for finalizer. One MEDIUM (formatting) was auto-fixed on this branch (9bb3d75). The remaining LOW findings are informational and do not block merge.

…dinfo in README and manpage

new_with_roots is called by production IntelGpuReader::new(), not test-only. Update the doc comment to reflect that it is an internal constructor accepting arbitrary roots, with production code routing through IntelGpuReader::new.

README and manpage were updated in the engine-utilization PR (#249) but neither mentioned the per-process GPU memory tracking added by this PR. Add a bullet under the Intel Arc section in README and extend the manpage Intel entry to include the fdinfo-based process accounting, mirroring the detail already present in ARCHITECTURE.md line 217.
@inureyes

Member Author

PR Finalization Complete

Summary

Lint/Format: cargo fmt idempotent (no changes after security checker's prior run). cargo clippy --lib --tests and cargo clippy bin target both clean with -D warnings.

Tests: 1120 lib tests pass. No new tests added — reviewer coverage of drm-client-id dedup, cardN/renderD mapping, and i915/xe schema branch was assessed as complete.

Docs: Three targeted fixes committed as f8743b1.

  • src/device/readers/intel_gpu_linux.rs line 146: stale "Test-only constructor" doc comment replaced with accurate description of the function as an internal constructor called by both production code and tests.
  • README.md: added per-process fdinfo bullet under the Intel Arc section (was missing despite ARCHITECTURE.md already documenting it); updated the Processes metrics line to include Intel Arc/Xe alongside AMD.
  • docs/man/all-smi.1: extended the Intel Arc entry to mention per-process GPU memory tracking via /proc fdinfo, consistent with the ARCHITECTURE.md description.

All checks passing. Ready for merge.

@inureyes inureyes added the status:done label and removed the status:review label May 27, 2026
@inureyes inureyes merged commit c95c559 into main May 27, 2026
4 checks passed
@inureyes inureyes deleted the feat/issue-247-intel-fdinfo-per-process branch May 27, 2026 03:41

Labels

priority:low Low priority issue status:done Completed type:enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

feat(intel-gpu): per-process GPU memory accounting via fdinfo on Linux

1 participant