feat(view): cluster-wide user/process aggregation tab ('V' key) #189

## Summary

Add a new "Users" tab to remote `view` mode that aggregates process information across all scraped hosts, grouping by user so operators can answer "who is using the cluster right now and how much?" at a glance. Drill-down to a selected user shows per-node, per-GPU, per-process detail.

## Motivation

`all-smi` already supports `--processes` in `api` mode, which emits per-process metrics with a `username` label. But the remote `view` has no tab that consumes those metrics cluster-wide: process data remains a local concept. For platform operators (Slurm, Backend.AI, Kubernetes GPU pools) the most valuable operational question is cluster-level: who is using what, where, for how long, at what power cost? An aggregation tab turns per-node process data into a first-class operator signal.

## Current state

- `all-smi api --processes` emits per-process Prometheus metrics with labels including `pid`, `user`, `command`, and the host tag.
- The remote view currently renders an "All" tab (summary across hosts) and per-host tabs, but no user-grouped view.
- `src/network/metrics_parser.rs` parses the GPU/CPU/memory metric families. Process metrics may not yet be fully parsed into a remote-side structure; this must be verified and the parser extended as needed.
- `src/api/metrics/` emits the process metric family when `--processes` is passed.

## Proposed design

### Tab and keybindings

- New tab accessible via `V` (mnemonic: "Users").
- Within the tab:
  - `u` sort by username (default)
  - `m` sort by total GPU memory
  - `p` sort by total power (derived)
  - `n` sort by node count
  - `t` sort by oldest process start / longest TIME+
  - `Enter` drill down on the highlighted user
  - `ESC` exits drill-down
  - `e` exports the current view to CSV at `~/.cache/all-smi/users-<timestamp>.csv`
  - `f` toggles a system-process filter (hide uid < 1000 / root by default)

### Top-level aggregation table

Columns (all visible only if width permits; collapse low-priority columns on narrow terminals):

```
USER            NODES   GPUs   PROCS   VRAM        POWER*    LONGEST    CMD (top-1 by GPU mem)
inureyes        3       12     18      384 GiB     2.3 kW   6d 03:12   python train.py --bs=...
yeonji          1       4      1       48 GiB      0.9 kW   2:15:02    /opt/llm/infer -m ...
root            5       0      7       0           —        —          containerd-shim
```

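`POWER*` is an approximation. For each GPU a user touches, that GPU's power draw is attributed in proportion to the user's share of the VRAM in use on it: `gpu.power_consumption * (user_vram_on_gpu / gpu_total_vram_used_across_all_users)`. The per-GPU values are then summed across all GPUs the user touches. The `*` in the column header marks the value as approximate; explain it in the header tooltip and document the methodology.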
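The approximation could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `GpuShare` struct, its field names, and the function name are hypothetical, and clamping the share to `[0, 1]` is an added assumption to guard against inconsistent samples.

```rust
/// Per-GPU inputs for the approximation. Hypothetical type for illustration.
pub struct GpuShare {
    /// Measured power draw of this GPU, in watts.
    pub power_watts: f64,
    /// VRAM held by the user's processes on this GPU, in bytes.
    pub user_vram_bytes: u64,
    /// VRAM held by all users' processes on this GPU, in bytes.
    pub total_used_vram_bytes: u64,
}

/// Attribute each GPU's power to a user in proportion to their share of used
/// VRAM, then sum over the GPUs the user touches.
pub fn approx_user_power_watts(gpus: &[GpuShare]) -> f64 {
    gpus.iter()
        .map(|g| {
            if g.total_used_vram_bytes == 0 {
                // No VRAM attributed to anyone: avoid dividing by zero.
                return 0.0;
            }
            let share = g.user_vram_bytes as f64 / g.total_used_vram_bytes as f64;
            // Assumption: clamp the share and floor the result at zero so the
            // value can never go negative or exceed the GPU's own power.
            (g.power_watts * share.clamp(0.0, 1.0)).max(0.0)
        })
        .sum()
}
```

For example, a user holding 30 of the 40 bytes in use on a 300 W GPU would be attributed 225 W for that GPU under this sketch.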

`LONGEST` uses `TIME+` fields already tracked per process (wall clock since start).

### Drill-down view

On Enter, show per-node rows for that user:

```
inureyes / 3 nodes, 12 GPUs
─ dgx-01   GPU 0-3     VRAM 128 GiB   Power 760 W   4 PIDs    python train.py --bs=128 ...
─ dgx-02   GPU 4-7     VRAM 128 GiB   Power 780 W   4 PIDs    python train.py --bs=128 ...
─ dgx-03   GPU 0-3     VRAM 128 GiB   Power 760 W   4 PIDs    python eval.py --split=val
```

`Enter` again drills to the full process list for that user on the selected node (reuse the existing process renderer).

### Partial coverage indicator

If some hosts don't have `--processes` enabled in their API, show a chip `⚠ partial coverage: 3 of 5 nodes reporting process data` at the top of the tab so operators don't misread the numbers.
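One way the chip text might be produced is sketched below. The function name and signature are hypothetical; the sketch assumes the caller already knows how many hosts report process data, and it returns nothing when zero hosts report, on the assumption that this case is covered by a separate "no process data" hint.

```rust
/// Returns the chip text when only some hosts report process data.
/// Returns `None` when coverage is complete, or when no host reports at all
/// (assumed to be handled by a separate empty-state hint).
pub fn coverage_chip(reporting: usize, total: usize) -> Option<String> {
    if total == 0 || reporting == 0 || reporting >= total {
        return None;
    }
    Some(format!(
        "⚠ partial coverage: {} of {} nodes reporting process data",
        reporting, total
    ))
}
```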

## Implementation plan

Files to add / modify:

- `src/network/metrics_parser.rs` — confirm process-family parsing; extend to build a `Vec<ParsedProcessRow>` with `host`, `pid`, `user`, `command`, `gpu_index`, `gpu_memory_bytes`, `cpu_pct`, `start_time`.
- `src/api/metrics/` (or equivalent submodule) — verify the per-process metric family includes labels `host`, `user`, `pid`, `gpu_index`. Add any missing labels, and add an `all_smi_process_start_time_seconds` gauge for TIME+ derivation (a start timestamp is a gauge in Prometheus terms, not a counter).
- New `src/ui/aggregation/user.rs`:
  - `aggregate_users(&[HostSnapshot]) -> Vec<UserAggregate>` pure function (heavily unit-testable).
  - Handles (host, pid) identity keying so the same PID on different hosts isn't collapsed.
  - Computes the power approximation with documented formula.
- `src/ui/tabs.rs` — add the `Users` variant; integrate with tab cycling and `V` hotkey.
- New `src/ui/renderers/user_renderer.rs` — table + drill-down rendering using the existing `widgets` for styling.
- `src/app_state.rs` — add `users_tab_state { sort, selected_user, drill_host }`, CSV export path.
- `src/view/render_snapshot.rs` — include the aggregated user view so it flows through the snapshot/replay pipeline (so replays also include the Users tab).
- `src/mock/generator.rs` — generate synthetic process entries per mock node so the mock server exercises this tab. Either always emit them in mock mode (since `--processes` is already a flag) or gate them behind `ALL_SMI_MOCK_PROCESSES=1`, defaulting to on. Follow the existing convention for flagged mock data if there is one.
- `src/ui/help.rs` — add the `V` shortcut and the in-tab keys.
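The core of the aggregation step might look something like the sketch below. It is deliberately simplified and makes several assumptions: it works on a flat slice of rows rather than `&[HostSnapshot]` (whose shape is not specified here), the `ProcessRow` and `UserAggregate` fields are a hypothetical subset, and it assumes one input row per (host, pid, GPU) combination. Power and TIME+ are left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, reduced form of a parsed process row.
#[derive(Clone, Debug)]
pub struct ProcessRow {
    pub host: String,
    pub pid: u32,
    pub user: String,
    pub gpu_index: Option<u32>,
    pub gpu_memory_bytes: u64,
}

/// Hypothetical per-user totals for the top-level table.
#[derive(Debug, Default, PartialEq)]
pub struct UserAggregate {
    pub user: String,
    pub nodes: usize,
    pub gpus: usize,
    pub procs: usize,
    pub vram_bytes: u64,
}

/// Single pass over all rows, grouping by user. Processes and GPUs are keyed
/// by (host, id) so the same PID or GPU index on different hosts stays distinct.
pub fn aggregate_users_sketch(rows: &[ProcessRow]) -> Vec<UserAggregate> {
    #[derive(Default)]
    struct Acc<'a> {
        hosts: BTreeSet<&'a str>,
        gpus: BTreeSet<(&'a str, u32)>,
        procs: BTreeSet<(&'a str, u32)>,
        vram_bytes: u64,
    }

    let mut by_user: BTreeMap<&str, Acc> = BTreeMap::new();
    for r in rows {
        let acc = by_user.entry(r.user.as_str()).or_default();
        acc.hosts.insert(r.host.as_str());
        acc.procs.insert((r.host.as_str(), r.pid));
        if let Some(gpu) = r.gpu_index {
            acc.gpus.insert((r.host.as_str(), gpu));
        }
        acc.vram_bytes += r.gpu_memory_bytes;
    }

    // BTreeMap iteration yields users in lexicographic order, which matches
    // the default "sort by username" ordering.
    by_user
        .into_iter()
        .map(|(user, a)| UserAggregate {
            user: user.to_string(),
            nodes: a.hosts.len(),
            gpus: a.gpus.len(),
            procs: a.procs.len(),
            vram_bytes: a.vram_bytes,
        })
        .collect()
}
```

Keeping the function pure (rows in, aggregates out) is what makes the "same PID on two hosts" case easy to cover with a plain unit test.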

## Acceptance criteria

- [x] `all-smi view --hostfile hosts.csv` with hosts in API mode `--processes` shows a `Users` tab.
- [x] Sorting hotkeys (`u`, `m`, `p`, `n`, `t`) re-rank the table correctly.
- [x] `Enter` drills down; `ESC` returns.
- [x] CSV export produces a well-formed file (header + one row per user).
- [x] If zero hosts report process data, the tab shows a clear "no process data; enable --processes on API mode" hint — not an empty table.
- [x] Partial-coverage chip is visible when some hosts report and others don't.
- [x] Unit tests on `aggregate_users` covering: empty input, same PID on two hosts, root filter, user with processes on multiple GPUs across multiple hosts, oldest TIME+ computation.
- [x] Mock cluster with 5 simulated nodes × 3 simulated users renders correctly.
- [x] Works over replayed snapshots (see companion record/replay issue).
- [x] Documentation updated: README Users tab section; help overlay lists the new shortcuts.

## Edge cases & non-goals

- Very large clusters (100+ nodes, 50+ processes each): aggregation must complete within one render budget (< 50 ms). Use a single pass; do not re-group on every keypress — cache the aggregation keyed on snapshot version.
- Power approximation: document the formula; ensure it never produces negative values (cap at zero if numerical issues arise). Mark as "approximation" in the UI.
- Username may be missing on some hosts (Windows API mode). Render as `?` and include such rows in a separate "unattributed" group.
- Non-goal: querying external user directories (LDAP) — `user` is whatever the API reports.
- Non-goal: alerts on user-level usage — follow-up.
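The "cache the aggregation keyed on snapshot version" point from the large-cluster case could be sketched as below. `VersionedCache` is a hypothetical helper, and the sketch assumes each snapshot carries a monotonically changing `u64` version; neither is confirmed by the existing code.

```rust
/// Caches one computed value together with the snapshot version it came from.
/// Hypothetical helper for illustration.
pub struct VersionedCache<T> {
    cached: Option<(u64, T)>,
}

impl<T> VersionedCache<T> {
    pub fn new() -> Self {
        Self { cached: None }
    }

    /// Returns the cached value for `version`, calling `compute` only when
    /// nothing is cached yet or the cached value belongs to another version.
    pub fn get_or_compute(&mut self, version: u64, compute: impl FnOnce() -> T) -> &T {
        let fresh = matches!(&self.cached, Some((v, _)) if *v == version);
        if !fresh {
            self.cached = Some((version, compute()));
        }
        &self.cached.as_ref().expect("cache was just filled").1
    }
}
```

With this shape, sort keypresses would re-order the cached aggregates without re-grouping, and a new snapshot version would trigger exactly one recomputation.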

