Add NVIDIA vGPU monitoring support via nvml-wrapper 0.12 #129

## Problem / Background

The `nvml-wrapper` crate v0.12.0 introduced several vGPU-related APIs that are now available for integration:

- `vgpu_scheduler_capabilities` – query scheduler capabilities
- `vgpu_host_mode` – check vGPU host mode
- `vgpu_accounting_pids` – list PIDs with vGPU accounting enabled
- `vgpu_accounting_stats` – per-vGPU utilization and memory stats
- `vgpu_scheduler_log` – scheduler log entries
- `vgpu_scheduler_state` – current scheduler state
- `set_vgpu_scheduler_state` – configure scheduler state

Currently, all-smi has no awareness of vGPU environments. Hosts running NVIDIA vGPU (commonly used for virtualized GPU sharing in cloud and enterprise workloads) are treated as standard GPU hosts, so scheduling and per-vGPU utilization data are not reported.

## Goal

Expose vGPU metrics in both the TUI view mode and the Prometheus API mode, giving operators visibility into vGPU scheduling, utilization, and memory allocation across virtualized GPU environments.

## Scope

### Detection
- Detect whether the host is vGPU-enabled by querying `vgpu_host_mode` or scheduler capabilities.
- Gracefully skip vGPU collection on non-vGPU hosts (no errors, no empty sections).
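
The detection rule above could be sketched as follows. This is only an illustration: `HostMode`, `ProbeError`, `VgpuProbe`, and `detect_vgpu` are hypothetical names introduced here, not types from `nvml-wrapper` or all-smi. A real implementation would put the actual `vgpu_host_mode` call behind the trait.

```rust
/// Hypothetical host-mode value; a stand-in, not the nvml-wrapper type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostMode {
    NonSriov,
    Sriov,
}

/// Hypothetical error type standing in for whatever the NVML binding returns.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProbeError {
    NotSupported,
    Other(String),
}

/// Hypothetical abstraction over a GPU device so detection can be tested
/// without NVML. A real implementation would wrap the `vgpu_host_mode` query.
pub trait VgpuProbe {
    fn host_mode(&self) -> Result<HostMode, ProbeError>;
}

/// Returns `Some(mode)` on a vGPU host and `None` otherwise.
/// Every failure maps to `None`, so non-vGPU hosts produce no error
/// and no empty vGPU section.
pub fn detect_vgpu<P: VgpuProbe>(device: &P) -> Option<HostMode> {
    match device.host_mode() {
        Ok(mode) => Some(mode),
        // Expected on standard GPU hosts and older drivers.
        Err(ProbeError::NotSupported) => None,
        // Unexpected failure: still skip vGPU collection rather than fail
        // the whole GPU read. Real code might log this at debug level.
        Err(ProbeError::Other(_)) => None,
    }
}
```

Callers would run the vGPU collection path only when `detect_vgpu` returns `Some`, which leaves the existing GPU path untouched on standard hosts.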

### Data Collection
- Read vGPU scheduler capabilities and current scheduler state.
- Collect per-vGPU accounting stats (utilization, memory usage) via `vgpu_accounting_pids` and `vgpu_accounting_stats`.
- Optionally collect scheduler log entries for diagnostic display.
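
One possible shape for the collection step, assuming a companion `VgpuInfo` struct is chosen. The field names, the `VgpuSource` trait, and `collect_vgpus` are assumptions made for this sketch. The real reader would call `vgpu_accounting_pids` and `vgpu_accounting_stats` behind such a trait, and those signatures are not reproduced here.

```rust
/// Hypothetical per-vGPU record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct VgpuInfo {
    pub gpu_index: u32,
    pub vgpu_id: u32,
    pub utilization_percent: f64,
    pub memory_used_bytes: u64,
    pub memory_total_bytes: u64,
}

/// Hypothetical data source. A production implementation would wrap the
/// nvml-wrapper vGPU calls; the mock server could provide a second one.
pub trait VgpuSource {
    /// IDs of the vGPU instances active on one physical GPU.
    fn vgpu_ids(&self, gpu_index: u32) -> Result<Vec<u32>, String>;
    /// Stats for a single vGPU instance.
    fn stats(&self, gpu_index: u32, vgpu_id: u32) -> Result<VgpuInfo, String>;
}

/// Collects stats for every vGPU on `gpu_index`.
/// If listing fails, the result is empty. If one instance fails, only that
/// instance is skipped, so a single bad read does not hide the others.
pub fn collect_vgpus<S: VgpuSource>(source: &S, gpu_index: u32) -> Vec<VgpuInfo> {
    let ids = match source.vgpu_ids(gpu_index) {
        Ok(ids) => ids,
        Err(_) => return Vec::new(),
    };
    ids.into_iter()
        .filter_map(|id| source.stats(gpu_index, id).ok())
        .collect()
}
```

Keeping the source behind a trait is a design choice for testability: the same collector can run against NVML or against simulated vGPU responses.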

### TUI Display
- Display vGPU information in the TUI, either as:
  - A sub-tab under the existing GPU tab, or
  - An additional collapsible section within each GPU's detail view.
- Show per-vGPU utilization, memory, and scheduler state.

### Prometheus API
- Export vGPU metrics at the `/metrics` endpoint in Prometheus format, including:
  - `allsmi_vgpu_utilization` (per vGPU instance)
  - `allsmi_vgpu_memory_used_bytes` / `allsmi_vgpu_memory_total_bytes`
  - `allsmi_vgpu_scheduler_state`
  - `allsmi_vgpu_host_mode`
- Use appropriate labels (`gpu_index`, `vgpu_id`, `host`, etc.).
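
A minimal sketch of how the per-vGPU gauges might be rendered in the Prometheus text exposition format. The metric and label names come from the list above. `VgpuSample`, `escape_label`, and `render_vgpu_metrics` are hypothetical helpers, and the use of percent as the utilization unit is an assumption.

```rust
/// Hypothetical flattened sample used only for rendering.
pub struct VgpuSample {
    pub host: String,
    pub gpu_index: u32,
    pub vgpu_id: u32,
    pub utilization_percent: f64, // assumed unit: percent
    pub memory_used_bytes: u64,
    pub memory_total_bytes: u64,
}

/// Escapes backslash, double quote, and newline, as the Prometheus text
/// format requires for label values.
fn escape_label(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Builds the shared label set for one sample.
fn labels(s: &VgpuSample) -> String {
    format!(
        "gpu_index=\"{}\",vgpu_id=\"{}\",host=\"{}\"",
        s.gpu_index,
        s.vgpu_id,
        escape_label(&s.host)
    )
}

/// Renders three gauge families. Returns an empty string when there are
/// no samples, so non-vGPU hosts export nothing extra.
pub fn render_vgpu_metrics(samples: &[VgpuSample]) -> String {
    let mut out = String::new();
    if samples.is_empty() {
        return out;
    }
    out.push_str("# TYPE allsmi_vgpu_utilization gauge\n");
    for s in samples {
        out.push_str(&format!(
            "allsmi_vgpu_utilization{{{}}} {}\n",
            labels(s),
            s.utilization_percent
        ));
    }
    out.push_str("# TYPE allsmi_vgpu_memory_used_bytes gauge\n");
    for s in samples {
        out.push_str(&format!(
            "allsmi_vgpu_memory_used_bytes{{{}}} {}\n",
            labels(s),
            s.memory_used_bytes
        ));
    }
    out.push_str("# TYPE allsmi_vgpu_memory_total_bytes gauge\n");
    for s in samples {
        out.push_str(&format!(
            "allsmi_vgpu_memory_total_bytes{{{}}} {}\n",
            labels(s),
            s.memory_total_bytes
        ));
    }
    out
}
```

Grouping all samples of one metric under a single `# TYPE` line keeps each metric family contiguous, which the text format expects.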

## Acceptance Criteria

- [x] vGPU-enabled hosts are correctly detected; non-vGPU hosts are unaffected
- [x] Per-vGPU utilization and memory stats are collected via nvml-wrapper 0.12 APIs
- [x] vGPU scheduler capabilities and state are readable
- [x] TUI view displays vGPU information in a clear, navigable layout
- [x] Prometheus `/metrics` endpoint exports all vGPU metrics with correct labels
- [x] Existing GPU monitoring functionality is not regressed
- [x] Feature is integration-tested with the mock server or a vGPU-capable environment

## Technical Considerations

- **Dependency**: Requires upgrading `nvml-wrapper` to >= 0.12.0.
- **Fallback**: vGPU APIs may return errors on non-vGPU hosts or older drivers. Every call must handle its error result explicitly so that standard GPU hosts see neither panics nor degraded behavior.
- **Architecture**: The vGPU reader logic should integrate into the existing `GpuReader` trait flow in `src/gpu/nvidia.rs`, extending `GpuInfo` or introducing a companion `VgpuInfo` struct.
- **Mock server**: The mock server should be extended to optionally simulate vGPU responses for testing.
