[Ubuntu 24.04][Onboard] preflight reports "nvidia-smi is not available" when nvidia-smi works but nvidia-container-toolkit is missing #3174

@zNeill

Description

nemoclaw onboard preflight prints "NVIDIA GPU hardware detected but nvidia-smi is not available" on a host where /usr/bin/nvidia-smi works correctly. The actual missing component is nvidia-container-toolkit (docker GPU runtime), so the preflight message is misleading and sends users to debug nvidia-smi/driver instead of the container toolkit.
Environment
Device:        Ubuntu 24.04 server (NVIDIA RTX 6000 Ada Generation)
OS:            Ubuntu 24.04.4 LTS (Linux 6.17.0-19-generic)
Architecture:  x86_64 / amd64
Node.js:       v22.22.2
npm:           10.9.7
Docker:        29.4.3, build 055a478
OpenShell CLI: openshell 0.0.36
NemoClaw:      nemoclaw v0.0.36
OpenClaw:      2026.4.24 (cbcfdf6)
Steps to Reproduce
1. Use an Ubuntu 24.04 host with the NVIDIA driver installed (nvidia-smi works) but nvidia-container-toolkit NOT installed.
2. Run: nemoclaw onboard --fresh --no-gpu --name ollama-claw --yes-i-accept-third-party-software
3. Observe the [1/8] Preflight checks output.
Expected Result
The preflight error message should accurately identify the missing component, e.g.:
  "NVIDIA Container Toolkit not configured: docker run --gpus failed (no known GPU vendor found from CDI). Install nvidia-container-toolkit and run: sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker."
That way the user knows to install the toolkit, not waste time on the driver.
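
To illustrate the expected behavior, here is a minimal sketch of how the preflight could keep the driver check and the Docker GPU check separate, so each failure names its own cause. It is not NemoClaw's actual code: the function name `gpu_preflight_message` and its parameters are hypothetical, and the three booleans are assumed to come from probes the preflight already runs (hardware detection, `nvidia-smi`, and a `docker run --gpus` test).

```python
from typing import Optional


# Hypothetical helper, not NemoClaw's actual implementation: maps three
# independent probe results to a single preflight message.
def gpu_preflight_message(
    gpu_detected: bool,
    nvidia_smi_ok: bool,
    docker_gpu_ok: bool,
    docker_error: str = "",
) -> Optional[str]:
    if not gpu_detected:
        return None  # no NVIDIA hardware, nothing to report
    if not nvidia_smi_ok:
        # Driver-level problem: reported only when nvidia-smi itself fails.
        return (
            "NVIDIA GPU hardware detected but nvidia-smi is not available. "
            "Install NVIDIA drivers."
        )
    if not docker_gpu_ok:
        # Driver works, but Docker cannot hand the GPU to a container.
        detail = f" ({docker_error})" if docker_error else ""
        return (
            "NVIDIA Container Toolkit not configured: docker run --gpus failed"
            f"{detail}. Install nvidia-container-toolkit and run: "
            "sudo nvidia-ctk runtime configure --runtime=docker && "
            "sudo systemctl restart docker."
        )
    return None  # driver and container runtime both usable
```

For the host in this report, `gpu_preflight_message(True, True, False, "no known GPU vendor found from CDI")` would return the Container Toolkit message rather than the nvidia-smi one.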
Actual Result
[1/8] Preflight checks
  --------------------------------------------------
  ✓ NVIDIA GPU detected (NVIDIA RTX 6000 Ada Generation, 46068 MB)
  ...
  NVIDIA GPU hardware detected but nvidia-smi is not available.
  Install NVIDIA drivers and the Container Toolkit for default GPU passthrough.

But on the same host, in the same shell, BOTH of these succeed:
  $ which nvidia-smi
  /usr/bin/nvidia-smi
  $ nvidia-smi
  NVIDIA-SMI 595.58.03  Driver Version: 595.58.03  CUDA Version: 13.2
  | NVIDIA RTX 6000 Ada Gene...   46068 MiB | ...

The real failure is at the docker GPU layer, not nvidia-smi:
  $ docker run --rm --gpus all ubuntu:22.04 nvidia-smi -L
  docker: Error response from daemon: failed to discover GPU vendor from CDI: no known GPU vendor found

  $ which nvidia-container-toolkit
  (empty -- package not installed)
  $ docker info | grep -i runtime
  Runtimes: io.containerd.runc.v2 runc
  Default Runtime: runc        # no nvidia runtime
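
The `docker info` check above could also be automated. The following is a rough sketch, assuming a host configured with `nvidia-ctk runtime configure --runtime=docker` lists a runtime named `nvidia` on the `Runtimes:` line. The helper name `has_nvidia_runtime` is hypothetical, and this is only a heuristic: a CDI-based setup might expose GPUs without listing that runtime, so a real `docker run --gpus` probe remains the more reliable test.

```python
def has_nvidia_runtime(docker_info_text: str) -> bool:
    """Return True if the 'Runtimes:' line of `docker info` output lists 'nvidia'."""
    for line in docker_info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Match only the exact 'Runtimes' key, not 'Default Runtime'.
        if sep and key == "Runtimes":
            return "nvidia" in value.split()
    return False  # no 'Runtimes:' line found


# Output taken from the host in this report.
sample = """Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc"""
print(has_nvidia_runtime(sample))  # prints False
```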
Logs
Not captured -- preflight output above is the relevant excerpt.

Bug Details

Priority:      Unprioritized
Action:        Dev - Open - To fix
Disposition:   Open issue
Module:        Machine Learning - NemoClaw
Keyword:       NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard

[NVB#6154930]

Metadata

    Labels

    NV QA (Bugs found by the NVIDIA QA Team)
    area: cli (Command line interface, flags, terminal UX, or output)
    platform: ubuntu (Affects Ubuntu Linux environments)