Problem / Background
NVIDIA has released DGX Spark, a desktop AI system based on the GB10 Grace Blackwell chip. This system uses a Unified Memory Architecture (UMA) in which the CPU and GPU share the same physical memory, which is fundamentally different from traditional discrete GPUs with dedicated VRAM.
Current all-smi NVIDIA GPU monitoring assumes discrete GPUs with:
- Dedicated GPU memory (VRAM) separate from system RAM
- device.memory_info() returning GPU-specific memory metrics
- Clear distinction between used_memory and total_memory for the GPU
On UMA systems like DGX Spark:
- CPU and GPU share the same physical memory pool
- Traditional memory reporting concepts may not apply directly
- NVML may report memory differently or require different API calls
- Memory usage attribution between CPU and GPU workloads may differ
Affected Products
- NVIDIA DGX Spark (GB10 Grace Blackwell)
- Future Grace-based products with unified memory
- Similar architectures that NVIDIA may release
Verified on Real Hardware (2026-04-08)
Tested on DGX Spark (GB10), Driver 580.126.09, CUDA 13.0, Linux 6.17.0-1008-nvidia (aarch64).
What Works
| Metric | Value | Status |
|---|---|---|
| GPU Detection | NVIDIA GB10 | OK |
| Architecture | Blackwell | OK |
| GPU Utilization | 0% (idle) | OK |
| Temperature | 56°C | OK |
| Power Draw | ~5W | OK |
| GPU Frequency | 208 MHz | OK |
| System Memory | 121.7 GB / 130.7 GB | OK (via system memory reader) |
What's Broken
| Metric | Reported | Expected | Root Cause |
|---|---|---|---|
| GPU Memory Used | 0 bytes | Should reflect shared memory usage | NVML memory_info() returns [N/A] |
| GPU Memory Total | 0 bytes | ~128 GB (unified with system) | NVML memory_info() returns [N/A] |
| Power Limit | N/A | N/A | Not available on GB10 |
| PCIe Gen/Width | Gen 1 x1 (max x16) | N/A | Misleading: GB10 uses internal interconnect, not PCIe |
| Brand | NvidiaRTX | DGX Spark or similar | Does not distinguish UMA architecture |
Key Findings
- NVML nvmlDeviceGetMemoryInfo() returns [N/A] on GB10. This was confirmed via both nvidia-smi --query-gpu=memory.total and device.memory_info() in nvml-wrapper, which surfaces the failure as an error. The code at src/device/readers/nvidia.rs:184-185 then falls back to 0 via unwrap_or(0).
- nvidia-smi shows per-process GPU memory (e.g., Xorg 18MiB, gnome-shell 6MiB) even though aggregate memory queries fail, suggesting process-level memory attribution may still work via NVML.
- System memory reader works correctly. It reports 130.7 GB total, which is the unified memory pool. The information exists but is not associated with the GPU.
- PCIe reporting is misleading. GB10 uses an internal interconnect, so reporting "Gen 1 x1 (max x16)" is confusing and inaccurate.
Proposed Solution
Priority 1: UMA Memory Fallback
- Detect UMA architecture: when memory_info() returns 0 for total AND the device is GB10/Blackwell, flag it as UMA
- Use system memory as GPU memory: fall back to the /proc/meminfo total as total_memory, similar to the Jetson approach in nvidia_jetson.rs
- Reference: src/device/readers/nvidia_jetson.rs:145 already does this for Jetson
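The fallback described above could look roughly like the following sketch. It is not taken from the all-smi codebase: the function names (parse_mem_total_bytes, looks_like_uma, resolve_total_memory) are hypothetical, and matching on the substring "GB10" in the device name is an assumption. The architecture alone is probably not a sufficient signal, since Blackwell also covers discrete cards.

```rust
/// Parse the `MemTotal` line from the contents of `/proc/meminfo`.
/// The kernel reports the value in kB (units of 1024 bytes); this returns bytes.
fn parse_mem_total_bytes(meminfo: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        let rest = line.strip_prefix("MemTotal:")?;
        let kib: u64 = rest.split_whitespace().next()?.parse().ok()?;
        kib.checked_mul(1024)
    })
}

/// Hypothetical UMA heuristic: NVML gave no usable total AND the device
/// name identifies a GB10 part. The name check is an assumption.
fn looks_like_uma(nvml_total: Option<u64>, device_name: &str) -> bool {
    nvml_total.unwrap_or(0) == 0 && device_name.contains("GB10")
}

/// Choose the total memory to report: the system total for devices flagged
/// as UMA, otherwise whatever NVML returned (0 if the query failed).
fn resolve_total_memory(nvml_total: Option<u64>, device_name: &str, meminfo: &str) -> u64 {
    if looks_like_uma(nvml_total, device_name) {
        parse_mem_total_bytes(meminfo).unwrap_or(0)
    } else {
        nvml_total.unwrap_or(0)
    }
}
```

In a real reader, the meminfo argument would come from std::fs::read_to_string("/proc/meminfo"), and nvml_total from device.memory_info().ok().map(|m| m.total). Taking the file contents as a string keeps the logic testable without the hardware.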
Priority 2: Suppress Misleading Metrics
- PCIe metrics: suppress or annotate PCIe Gen/Width for UMA devices (not meaningful)
- Power Limit: handle N/A gracefully (currently absorbed by unwrap_or, but the handling should be explicit)
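One way to make both behaviors explicit is to carry the values as Option and decide at display time. The sketch below is illustrative only: pcie_link_label and power_limit_label are hypothetical helpers, and the assumption that the power limit arrives in milliwatts should be checked against the actual reader.

```rust
/// Render the PCIe link for display. Returns `None` for UMA devices so the
/// caller can omit the row instead of printing a misleading "Gen 1 x1".
fn pcie_link_label(is_uma: bool, link_gen: Option<u32>, link_width: Option<u32>) -> Option<String> {
    if is_uma {
        return None;
    }
    match (link_gen, link_width) {
        (Some(g), Some(w)) => Some(format!("Gen {g} x{w}")),
        _ => None,
    }
}

/// Render the power limit, naming the unsupported case explicitly rather
/// than silently defaulting to 0. Input is assumed to be in milliwatts.
fn power_limit_label(limit_milliwatts: Option<u32>) -> String {
    match limit_milliwatts {
        Some(mw) => format!("{:.1} W", f64::from(mw) / 1000.0),
        None => "N/A".to_string(),
    }
}
```

Returning None for the PCIe label leaves the choice between hiding the row and annotating it to the UI layer.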
Priority 3: UMA Identification
- Add memory type indicator: "Unified" vs "Discrete" in device detail
- Fix brand detection: identify DGX Spark properly instead of NvidiaRTX
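A minimal sketch of how the indicator and the brand override might be modeled follows. The MemoryType enum, memory_type_for, and display_brand are hypothetical names, and mapping every device whose name contains "GB10" to the label "DGX Spark" is an assumption. Other GB10-based systems, if supported, would need their own mapping.

```rust
use std::fmt;

/// Hypothetical memory type indicator for the device detail view.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryType {
    Discrete,
    Unified,
}

impl fmt::Display for MemoryType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let label = match self {
            MemoryType::Discrete => "Discrete",
            MemoryType::Unified => "Unified",
        };
        f.write_str(label)
    }
}

/// Map the UMA flag to the indicator shown to the user.
fn memory_type_for(is_uma: bool) -> MemoryType {
    if is_uma {
        MemoryType::Unified
    } else {
        MemoryType::Discrete
    }
}

/// Override the NVML brand string for GB10 devices (assumed name match);
/// every other device keeps the brand NVML reported.
fn display_brand(device_name: &str, nvml_brand: &str) -> String {
    if device_name.contains("GB10") {
        "DGX Spark".to_string()
    } else {
        nvml_brand.to_string()
    }
}
```

An enum rather than a free-form string keeps the set of values closed, at the cost of a code change whenever a new memory type such as "Shared" is introduced.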
Acceptance Criteria
Technical Considerations
Confirmed NVML Behavior on GB10
$ nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv
memory.total [MiB], memory.used [MiB], memory.free [MiB]
[N/A], [N/A], [N/A]
$ nvidia-smi --query-gpu=name,compute_mode,temperature.gpu,power.draw,utilization.gpu --format=csv
NVIDIA GB10, Default, 56, 5.05 W, 0 %
- nvmlDeviceGetMemoryInfo() → returns [N/A] (nvml-wrapper maps to error, unwrap_or(0))
- nvmlDeviceGetArchitecture() → Blackwell
- nvmlDeviceGetBrand() → NvidiaRTX (incorrect for DGX Spark)
- nvmlDeviceGetPciInfo() → Gen 1 x1 (not meaningful for UMA)
- Temperature, power, utilization, frequency → all work correctly
Architecture Reference
Current relevant implementations:
- src/device/readers/nvidia.rs:184-185: where a failed memory_info() falls back to 0 on UMA
- src/device/readers/nvidia_jetson.rs: Jetson reader handling integrated GPU with shared memory (uses /proc/meminfo fallback)
- src/device/memory_linux.rs: System memory reader (already reports the correct 128 GB unified pool)
Potential New Fields in GpuInfo
// Consider adding to device detail or as new fields
memory_type: Option<String>, // "Discrete", "Unified", "Shared"
unified_memory_total: Option<u64>, // Total unified memory pool
Graceful Degradation
If NVML on UMA systems doesn't provide expected metrics:
- Fall back to system memory reporting (similar to Jetson approach)
- Use /proc/meminfo for unified memory systems
- Log warnings for unsupported queries
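The "log warnings" step could be factored into a small helper that turns a failed query into None while recording why. This is a sketch under assumptions: ok_or_warn is a hypothetical name, and collecting messages into a Vec stands in for whatever logging facility the project actually uses.

```rust
use std::fmt::Display;

/// Convert a failed query into `None` and record a warning naming the
/// metric, so the caller can fall back instead of reporting 0.
fn ok_or_warn<T, E: Display>(
    metric: &str,
    result: Result<T, E>,
    warnings: &mut Vec<String>,
) -> Option<T> {
    match result {
        Ok(value) => Some(value),
        Err(err) => {
            warnings.push(format!("{metric}: unsupported query ({err})"));
            None
        }
    }
}
```

Keeping the result as an Option until the last moment lets the reader distinguish "query unsupported" from a real value of 0, which unwrap_or(0) cannot do.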
Additional Context
Related Implementations
- NVIDIA Jetson (nvidia_jetson.rs): Uses tegrastats and system memory fallback for integrated GPU
- Apple Silicon (apple.rs): Unified memory architecture with shared CPU/GPU memory pool
References