feat(tui): topology view tab ('T') — NvLink/NUMA/PCIe graph and matrix

## Summary

Add a dedicated topology tab (`T`) that visualizes intra-node GPU interconnect structure: NvLink connections (GPU↔GPU, GPU↔NvSwitch), NUMA affinity, and PCIe lanes. Includes a graph-style ASCII layout and a `nvidia-smi topo -m`-style matrix mode. Works on systems without NvLink by falling back to a PCIe-only rendering. This surfaces data that the project already collects but that currently has no home in the TUI.

## Motivation

Modern GPU work depends heavily on topology: collective communication (NCCL all-reduce) performance is dominated by NvLink availability, NUMA affinity between CPU and GPU changes throughput meaningfully, and operators frequently need to answer "are these 8 GPUs fully meshed or switch-connected?" without a second tool. `all-smi` already reads `nvlink_remote_devices`, `numa_node_id`, and PCIe info per GPU (from PR #175 and the hardware-details series), and the data round-trips through the Prometheus exporter too. But nothing in the TUI renders this topology, so it stays invisible to anyone who is not querying the metrics directly.

## Current state

- `GpuInfo.nvlink_remote_devices: Vec<NvLinkRemoteDevice>` is populated by the NVIDIA reader.
- `GpuInfo.numa_node_id: Option<i32>` is populated on Linux hosts with NUMA topology.
- PCIe info lives in the `GpuInfo.detail` map.
- None of the above is rendered anywhere in the TUI — only visible via Prometheus metrics.

## Proposed design

### Tab and modes

- New tab `T` (mnemonic: "Topology"). In remote mode, default to showing the currently selected host's topology (same host as the per-host tab); `Tab`/`Shift-Tab` cycles nodes. In local mode, always the local node.
- Two render modes, toggled with `M`:
  - **Graph mode** (default): ASCII graph showing NUMA zones as boxes, GPUs inside, edges for NvLink and NvSwitch. Active links green, degraded yellow, inactive gray.
  - **Matrix mode**: `nvidia-smi topo -m`-equivalent table with CPU affinity and NUMA columns.
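
The two-mode toggle could be modeled roughly as below. This is a minimal sketch, not the project's code: `TopologyMode` and `handle_key` are hypothetical names, and matching only the uppercase `M` is an assumption about how the real key handler normalizes input.

```rust
/// Render modes for the topology tab (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TopologyMode {
    /// ASCII graph with NUMA boxes and NvLink edges (the default mode).
    #[default]
    Graph,
    /// `nvidia-smi topo -m`-style table.
    Matrix,
}

impl TopologyMode {
    /// Returns the mode that a toggle should switch to.
    pub fn toggled(self) -> Self {
        match self {
            TopologyMode::Graph => TopologyMode::Matrix,
            TopologyMode::Matrix => TopologyMode::Graph,
        }
    }
}

/// Applies a key press to the current mode. Only `M` changes it here;
/// case handling is assumed to live in the real key handler.
pub fn handle_key(mode: TopologyMode, key: char) -> TopologyMode {
    match key {
        'M' => mode.toggled(),
        _ => mode,
    }
}
```

Keeping the toggle as a pure function of the current mode makes the "`M` again returns to graph view" behavior easy to unit-test without a terminal.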

### Graph rendering example

```
NODE: dgx-01   |   8× H200 (SXM)   |   PCIe Gen5 x16
┌─────── NUMA 0 ──────────┐   ┌─────── NUMA 1 ──────────┐
│ [GPU 0] ── NV8 ── [GPU 1]│   │ [GPU 4] ── NV8 ── [GPU 5]│
│    │  \            /   │ │   │    │  \            /   │ │
│   NV4   nvsw ── nvsw  NV4│   │   NV4   nvsw ── nvsw  NV4│
│    │  /            \   │ │   │    │  /            \   │ │
│ [GPU 2] ── NV8 ── [GPU 3]│   │ [GPU 6] ── NV8 ── [GPU 7]│
└─────────────────────────┘   └─────────────────────────┘

Active NvLinks: 16 / 16    Remote classes: gpu=8, switch=8
CPU Affinity:   GPU0-3 → CPU 0-55, 112-167   GPU4-7 → CPU 56-111, 168-223
```

### Matrix rendering example

```
            GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7   CPU Affinity   NUMA
GPU0         X    NV8   NV8   NV8   SYS   SYS   SYS   SYS    0-55,112-167   0
GPU1        NV8    X    NV8   NV8   SYS   SYS   SYS   SYS    0-55,112-167   0
...
Legend:  X=self   NVn=bonded set of n NvLinks   PXB=PCIe bridge   SYS=PCIe across NUMA   NODE=PCIe same NUMA
```
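
As a rough illustration of how matrix cells might be derived, the sketch below maps a GPU pair to `X`, `NVn`, `SYS`, or `NODE`. `GpuRow`, `cell`, and `render_matrix` are hypothetical names rather than existing project types. The sketch omits `PXB` and the CPU Affinity column, and it assumes that a pair with unknown NUMA placement is shown as `NODE`.

```rust
/// Minimal per-GPU input for the matrix (hypothetical, not the project's `GpuInfo`).
pub struct GpuRow {
    pub index: usize,
    pub numa_node: Option<i32>,
    /// `nvlinks[j]` is the number of active NvLinks to GPU `j` (0 if none).
    pub nvlinks: Vec<u32>,
}

/// Classifies the connection from GPU `a` to GPU `b` as a matrix cell label.
pub fn cell(a: &GpuRow, b: &GpuRow) -> String {
    if a.index == b.index {
        return "X".to_string();
    }
    let links = a.nvlinks.get(b.index).copied().unwrap_or(0);
    if links > 0 {
        return format!("NV{links}");
    }
    match (a.numa_node, b.numa_node) {
        // Different NUMA nodes: traffic crosses the inter-socket interconnect.
        (Some(x), Some(y)) if x != y => "SYS".to_string(),
        // Same NUMA node, or NUMA unknown (assumption: treat as same node).
        _ => "NODE".to_string(),
    }
}

/// Renders a header line plus one line per GPU, with a trailing NUMA column.
pub fn render_matrix(gpus: &[GpuRow]) -> Vec<String> {
    let mut lines = Vec::new();
    let mut header = format!("{:<6}", "");
    for g in gpus {
        header.push_str(&format!("{:^6}", format!("GPU{}", g.index)));
    }
    header.push_str("NUMA");
    lines.push(header);
    for a in gpus {
        let mut line = format!("{:<6}", format!("GPU{}", a.index));
        for b in gpus {
            line.push_str(&format!("{:^6}", cell(a, b)));
        }
        line.push_str(&a.numa_node.map_or("-".to_string(), |n| n.to_string()));
        lines.push(line);
    }
    lines
}
```

The fixed cell width of 6 is only a placeholder; the plan calls for column widths sized to the GPU count.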

### Graceful degradation

- No NvLink present → graph shows NUMA boxes with PCIe-annotated GPUs only.
- Not NVIDIA → topology mode still renders NUMA + PCIe groupings; no `SYS`/`NVn` entries in the matrix.
- `nvlink_remote_devices` empty → dim "no active NvLinks" placeholder.
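
The fallback rules above can be sketched as a single decision function. `GraphVariant` and `choose_variant` are hypothetical names. The sketch also assumes that "no NvLink present" and "`nvlink_remote_devices` empty" collapse into the same variant, since the two are indistinguishable from per-GPU link counts alone.

```rust
/// Which flavor of the graph view to draw (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum GraphVariant {
    /// NUMA boxes plus NvLink / NvSwitch edges.
    NvLinkGraph,
    /// NVIDIA GPUs with no active links: NUMA + PCIe plus a dim placeholder.
    PcieWithPlaceholder,
    /// Non-NVIDIA hardware: NUMA + PCIe groupings only.
    PcieOnly,
}

/// Picks a variant from the vendor flag and each GPU's count of NvLink remote devices.
pub fn choose_variant(is_nvidia: bool, nvlink_counts: &[usize]) -> GraphVariant {
    if !is_nvidia {
        return GraphVariant::PcieOnly;
    }
    if nvlink_counts.iter().any(|&n| n > 0) {
        GraphVariant::NvLinkGraph
    } else {
        GraphVariant::PcieWithPlaceholder
    }
}
```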

## Implementation plan

Files to add / modify:

- `src/ui/tabs.rs` — add `Topology` tab variant, integrate with tab cycle, `T` hotkey.
- New `src/ui/topology/mod.rs`:
  - `TopologyModel` built from `GpuInfo + chassis/NUMA info`.
  - `layout.rs` — groups GPUs by NUMA, decides horizontal vs vertical stacking based on terminal width; places NvSwitch nodes deterministically.
  - `graph_render.rs` — ASCII edges; uses box-drawing chars already used in `src/ui/widgets.rs`.
  - `matrix_render.rs` — tabular view; column widths sized to fit GPU count.
  - `classify_edge()` — derives `NVn` bandwidth hint from active link count (if NVML provides per-link bandwidth; otherwise fallback to "NV"). If per-link bandwidth isn't currently captured, extend `NvLinkRemoteDevice` with `bandwidth_mb_s: Option<u32>` and populate from NVML. This extension must serialize cleanly through Prometheus and the network parser.
- New `src/ui/renderers/topology_renderer.rs` — draws a Topology panel for the selected host.
- `src/device/readers/nvidia_hardware.rs` / `nvidia.rs` — ensure all needed data (per-link state, remote type, bandwidth) is exposed; extend if needed.
- `src/device/types.rs` — optional new fields as needed; bump a `mod-internal` schema version for recorded snapshots (see Record/Replay issue) if the shape changes.
- `src/api/metrics/` — export any new fields added above as Prometheus labels/metrics so remote view sees them. Follow the `gpu_index`/`gpu_uuid` label convention established in PR #181.
- `src/network/metrics_parser.rs` — parse those new labels/metrics back into `GpuInfo` for the remote path.
- `src/mock/` — mock templates for DGX-like topology (8 GPUs, 2 NUMA nodes, full NvLink mesh) so the topology tab is exercised without real hardware. Gate with `ALL_SMI_MOCK_TOPOLOGY=1` following the existing mock env-var convention.
- `src/ui/help.rs` — document `T` and the in-tab `M` mode toggle.
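
Two of the pieces named in this plan, NUMA grouping in `layout.rs` and `classify_edge()`, might look roughly like the sketch below. The signatures are assumptions made for illustration. Placing GPUs without a NUMA id in bucket 0 is a guess that matches the "single NUMA box" criterion. The edge label follows one reading of the bullet above: `NVn` when per-link bandwidth is known, plain `NV` otherwise.

```rust
use std::collections::BTreeMap;

/// Groups GPU indices by NUMA node. GPUs with no NUMA id land in bucket 0
/// (assumption), so a host without NUMA yields a single group.
pub fn group_by_numa(numa_ids: &[Option<i32>]) -> BTreeMap<i32, Vec<usize>> {
    let mut groups: BTreeMap<i32, Vec<usize>> = BTreeMap::new();
    for (gpu_index, numa) in numa_ids.iter().enumerate() {
        groups.entry(numa.unwrap_or(0)).or_default().push(gpu_index);
    }
    groups
}

/// Labels an edge between two endpoints.
/// Returns `None` when there are no active links, `NVn` (n = active link count)
/// when per-link bandwidth is available, and plain `NV` otherwise.
pub fn classify_edge(active_links: u32, bandwidth_mb_s: Option<u32>) -> Option<String> {
    if active_links == 0 {
        return None;
    }
    match bandwidth_mb_s {
        Some(_) => Some(format!("NV{active_links}")),
        None => Some("NV".to_string()),
    }
}
```

A `BTreeMap` keeps the NUMA groups in ascending key order, which is one simple way to get the deterministic placement the layout step asks for.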

## Acceptance criteria

- [x] On a system with NvLink, `T` opens a topology tab showing GPUs grouped by NUMA with NvLink edges.
- [x] `M` within the tab switches to matrix view; `M` again returns to graph view.
- [x] Non-NVIDIA systems show NUMA + PCIe layout without errors.
- [x] Systems without NUMA (e.g., single-socket workstation) show a single NUMA box.
- [x] Remote view: selecting different nodes via tabs updates the topology accordingly.
- [x] With `ALL_SMI_MOCK_TOPOLOGY=1` the mock server produces a DGX-like topology the tab renders correctly.
- [x] Any new fields added to `GpuInfo`/`NvLinkRemoteDevice` serialize through Prometheus and round-trip via `src/network/metrics_parser.rs` with backward compatibility — old exporters continue to be parseable.
- [x] Layout must not overflow on 80-column terminals (fall back to matrix-only under a minimum width).
- [x] `cargo test` covers: layout with 2 GPUs, 4 GPUs, 8 GPUs on 1 / 2 NUMAs; edge classification; matrix formatting.
- [x] README gains a "Topology View" section.
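
The backward-compatibility criterion could be met by treating any new field as optional on the parsing side. The sketch below assumes the field is exported as a label named `bandwidth_mb_s` (hypothetical; the real label name and parser structure in `src/network/metrics_parser.rs` may differ).

```rust
use std::collections::HashMap;

/// Reads the optional per-link bandwidth from a metric's label set.
/// Old exporters omit the label, so a missing or unparsable value maps to `None`
/// instead of failing the whole parse.
pub fn parse_bandwidth(labels: &HashMap<String, String>) -> Option<u32> {
    labels
        .get("bandwidth_mb_s")
        .and_then(|value| value.parse::<u32>().ok())
}
```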

## Edge cases & non-goals

- 16+ GPU systems (HGX / Grace): layout must flow into multiple rows; matrix mode remains the reliable fallback.
- Active link counts differing between endpoints of a link: treat as degraded, color yellow.
- Non-goal: inter-node topology (NVLink Fabric across chassis) — v1 is per-node only.
- Non-goal: animating traffic — static structural view only.
- NvSwitch classification relies on `NvLinkRemoteType::Switch`. If it isn't exposed on a particular system, show a neutral `?` node.
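
The asymmetric-link rule can be expressed as a small state function. `LinkState`, `link_state`, and `color_name` are hypothetical names, and the color strings stand in for whatever style type the TUI actually uses.

```rust
/// Health of a link between two endpoints (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum LinkState {
    Active,
    Degraded,
    Inactive,
}

/// Compares the active link count reported by each endpoint.
/// Both zero means inactive; a mismatch means degraded; otherwise active.
pub fn link_state(a_to_b: u32, b_to_a: u32) -> LinkState {
    if a_to_b == 0 && b_to_a == 0 {
        LinkState::Inactive
    } else if a_to_b != b_to_a {
        LinkState::Degraded
    } else {
        LinkState::Active
    }
}

/// Maps a state to the color named in the graph-mode description.
pub fn color_name(state: &LinkState) -> &'static str {
    match state {
        LinkState::Active => "green",
        LinkState::Degraded => "yellow",
        LinkState::Inactive => "gray",
    }
}
```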

