[DGX Spark][Onboard] preflight GPU detection prints "1 GPU(s), 284208 MB VRAM" without GPU model name (GB300) #2669

@hulynn

Description


On a DGX Spark / NVIDIA GB300 host, `nemoclaw onboard` Phase [1/8] Preflight checks prints:
    ✓ NVIDIA GPU detected: 1 GPU(s), 284208 MB VRAM
The output reports the GPU count and VRAM but never names the GPU model. nvidia-smi on the same host reports the model correctly (NVIDIA GB300). For QA, support, and docs, the GPU model is more useful identifying information than "1 GPU(s)", and the DevTest case 517913 cross-check explicitly expects the format `NVIDIA GPU detected (<model>, <VRAM> MB)`. The data is available to the installer (nvidia-smi --query-gpu=name --format=csv,noheader); it just isn't surfaced.
Environment
Device:        DGX Spark / NVIDIA GB300 (host: galaxy-ts2-052)
OS:            Ubuntu 24.04.3 LTS (Linux 6.17.0-1008-nvidia-64k)
Architecture:  aarch64
Node.js:       v22.22.2
npm:           10.9.7
Docker:        Docker Engine 29.1.3
OpenShell CLI: openshell 0.0.36
NemoClaw:      v0.0.29
OpenClaw:      N/A (issue surfaces in preflight, before sandbox creation)
Steps to Reproduce
1. On a host with an NVIDIA GPU, run:
     curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash -s -- --non-interactive --yes-i-accept-third-party-software
   or:
     nemoclaw onboard

2. Watch [1/8] Preflight checks output.

3. Compare with the GPU model the host actually has:
     nvidia-smi --query-gpu=name --format=csv,noheader
Expected Result
Preflight prints the GPU model (and VRAM, and count if multi-GPU). Format suggested by DevTest case 517913 cross-check:
     ✓ NVIDIA GPU detected (NVIDIA GB300, 284208 MB)

For multi-GPU hosts, list models or summarize, e.g.:
     ✓ NVIDIA GPU detected: 2x NVIDIA H100 80GB, 163840 MB VRAM total
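
The two formats above could be produced by a small formatter along these lines. This is a minimal Python sketch for illustration, not NemoClaw's actual code: the function name `format_gpu_line` and the `(model_name, vram_mb)` input shape are hypothetical, and the handling of mixed-model hosts is an assumption, since this issue only shows homogeneous examples.

```python
from collections import Counter


def format_gpu_line(gpus):
    """Build the preflight line from a list of (model_name, vram_mb) tuples.

    Hypothetical helper that mirrors the formats suggested in this issue.
    """
    if not gpus:
        raise ValueError("no GPUs to report")
    if len(gpus) == 1:
        name, mb = gpus[0]
        return f"✓ NVIDIA GPU detected ({name}, {mb} MB)"
    # Multi-GPU: summarize as "<count>x <model>" in first-seen order
    # and report the VRAM total across all GPUs.
    total_mb = sum(mb for _, mb in gpus)
    counts = Counter(name for name, _ in gpus)
    summary = ", ".join(f"{n}x {name}" for name, n in counts.items())
    return f"✓ NVIDIA GPU detected: {summary}, {total_mb} MB VRAM total"


print(format_gpu_line([("NVIDIA GB300", 284208)]))
print(format_gpu_line([("NVIDIA H100 80GB", 81920)] * 2))
```

Keeping the single-GPU case separate avoids printing a redundant "1x" prefix on hosts like the one in this report.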
Actual Result
On the GB300 lab host:

[1/8] Preflight checks
  ──────────────────────────────────────────────────
  ✓ Docker is running
  ✓ Container DNS resolution works
  ✓ Container runtime: docker
  ✓ openshell CLI: openshell 0.0.36
  ✓ Port 8080 available (OpenShell gateway)
  ✓ NVIDIA GPU detected: 1 GPU(s), 284208 MB VRAM        <-- model name absent
  ✓ Memory OK: 806139 MB RAM + 0 MB swap

Meanwhile, on the same host:
  $ nvidia-smi --query-gpu=name --format=csv,noheader
  NVIDIA GB300

The data is available; preflight just doesn't surface it.
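
One way preflight could collect the model name together with VRAM is a single nvidia-smi query. The sketch below uses Python for illustration only; `query_gpus` and its injectable `run` parameter are hypothetical names, and it assumes `nvidia-smi` is on PATH. The `--query-gpu=name,memory.total` and `--format=csv,noheader,nounits` options are existing nvidia-smi flags; how NemoClaw actually performs detection is not shown in this issue.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"]


def query_gpus(run=subprocess.run):
    """Return [(model_name, vram_mb_or_None), ...], or [] if the query fails."""
    try:
        result = run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []  # nvidia-smi is missing or exited non-zero
    gpus = []
    for line in result.stdout.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside a model name is tolerated.
        name, _, mem = line.rpartition(",")
        mem = mem.strip()
        # Some platforms report a non-numeric value for memory.total.
        gpus.append((name.strip(), int(mem) if mem.isdigit() else None))
    return gpus
```

Returning an empty list on failure would let preflight fall back to its existing message instead of aborting the check.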


Note (secondary observation): when onboard is re-run with the gateway already healthy, the preflight line for port 8080 changes to "✓ Port 8080 already owned by healthy NemoClaw runtime (OpenShell gateway)". That line adapts to the host state, which is a useful pattern, but the GPU line never adapts to include the model name, even on subsequent runs.

Bug Details

Priority:      Unprioritized
Action:        Dev - Open - To fix
Disposition:   Open issue
Module:        Machine Learning - NemoClaw
Keyword:       NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard

[NVB#6126096]

Metadata

Labels

NV QA (Bugs found by the NVIDIA QA Team)
area: cli (Command line interface, flags, terminal UX, or output)
platform: dgx-spark (Affects DGX Spark hardware or workflows)
