[Jetson Orin][CLI&UX] nemoclaw status shows "Sandbox GPU: enabled" but CUDA is unusable inside sandbox — misleading status #4231

@mercl-lau

Description

On Jetson Orin, nemoclaw status reports "Sandbox GPU: enabled (auto)" after onboarding with GPU passthrough. The GPU device files (/dev/nvidia0, /dev/nvmap, /dev/nvhost-*) are mapped into the sandbox, and the Docker container runs with --runtime=nvidia. However, CUDA is unusable inside the sandbox: /dev/nvmap is owned by root:video with mode cr--r-----, and the sandbox user (uid=998) is not in the video group. Any attempt to use CUDA inside the sandbox fails with "NvRmMemInitNvmap failed: error Permission denied", and cuInit returns 100 (CUDA_ERROR_NO_DEVICE).

The status display misleads users into thinking GPU compute is available inside the sandbox when it is not. Normal inference (cloud API or host Ollama) is unaffected since it does not use sandbox-side CUDA, but users who want to run GPU-accelerated code inside the sandbox will be surprised.

Additionally, the onboard GPU proof check ("nvidia-smi when available") silently passes on Jetson because nvidia-smi is not in the sandbox base image: it prints "skipping optional visibility check" and exits 0. The two other proofs (proc-comm-write and cuInit) are marked optional. proc-comm-write fails, and that optional failure triggers an early return, so the cuInit proof never runs.

Environment

Device:        Jetson Orin (aarch64, JetPack R39 release 1.0)
OS:            Ubuntu 24.04.4 LTS (Linux 6.8.12-1018-tegra aarch64)
Architecture:  aarch64
Node.js:       v24.15.0
npm:           11.12.1
Docker:        Docker version 29.4.0
OpenShell CLI: 0.0.44
NemoClaw:      v0.0.50
OpenClaw:      v2026.5.18

Steps to Reproduce

  1. Fresh install NemoClaw v0.0.50 on Jetson Orin via curl|bash
  2. Onboard with any provider (cloud API or Ollama)
  3. Observe onboard output: "✓ GPU proof passed: nvidia-smi when available"
  4. Run: nemoclaw my-assistant status
  5. Run: nemoclaw my-assistant exec -- id
  6. Run: nemoclaw my-assistant exec -- ls -la /dev/nvmap
  7. Run: nemoclaw my-assistant exec -- python3 -c "import ctypes; cuda=ctypes.CDLL('libcuda.so.1'); r=cuda.cuInit(0); print(f'cuInit={r}')"
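
Steps 5 to 7 can also be collected in one pass. The following is a minimal sketch of a diagnostic script that could be run inside the sandbox; it is not part of NemoClaw, and the helper names (describe_device, try_cuinit, collect) are made up for illustration. It uses only the Python standard library and the same ctypes call to cuInit as step 7.

```python
import ctypes
import grp
import os
import stat


def describe_device(path):
    """Return ownership and access info for a device node, or None if it is missing."""
    try:
        st = os.stat(path)
    except OSError:
        return None
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid has no name inside this container
    return {
        "mode": stat.filemode(st.st_mode),
        "uid": st.st_uid,
        "group": group,
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


def try_cuinit():
    """Return (cuInit result, None), or (None, error text) if libcuda cannot be loaded."""
    try:
        cuda = ctypes.CDLL("libcuda.so.1")
    except OSError as exc:
        return None, str(exc)
    return cuda.cuInit(0), None


def collect():
    """Gather the facts from steps 5 to 7 in one dictionary."""
    return {
        "uid": os.getuid(),
        "groups": sorted(os.getgroups()),
        "nvmap": describe_device("/dev/nvmap"),
        "cuinit": try_cuinit(),
    }


if __name__ == "__main__":
    for key, value in collect().items():
        print(f"{key}: {value}")
```

Saved as a file inside the sandbox, it could be invoked through the same exec path as the steps above. A cuinit result other than 0, together with nvmap reporting readable: False, matches the failure described in this issue.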

Expected Result

Either:

  • Sandbox user has video group membership, CUDA works, status correctly shows "enabled"
  • Or status honestly reports that GPU devices are mapped but CUDA is not functional (e.g. "Sandbox GPU: mapped (CUDA not verified)")
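
As an illustration of the second option, here is a small sketch of how a status label could be derived from a functional check rather than from device mapping alone. The function name and every label except "mapped (CUDA not verified)" are hypothetical; this is not NemoClaw's actual status code.

```python
def sandbox_gpu_status(devices_mapped, cuinit_result):
    """Derive a status label.

    devices_mapped: True if GPU device files are present in the sandbox.
    cuinit_result:  integer returned by cuInit(0), or None if it was never run.
    """
    if not devices_mapped:
        return "disabled"
    if cuinit_result is None:
        return "mapped (CUDA not verified)"
    if cuinit_result == 0:  # CUDA_SUCCESS
        return "enabled"
    return f"mapped (CUDA unusable, cuInit={cuinit_result})"


# The situation in this report: devices mapped, cuInit returned 100.
print(sandbox_gpu_status(True, 100))  # mapped (CUDA unusable, cuInit=100)
```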

Actual Result

  1. Status shows: "Sandbox GPU: enabled (auto)" — implies GPU is usable
  2. Sandbox user: uid=998(sandbox) gid=998(sandbox) groups=998(sandbox) — NOT in video group
  3. /dev/nvmap: cr--r----- root:video — requires video group membership
  4. CUDA output:
NvRmMemInitNvmap failed: error Permission denied
NvRmMemMgrInit failed: Memory Manager Not supported, line 340
cuInit=100  (CUDA_ERROR_NO_DEVICE)
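
To make the link between items 2 and 3 explicit, the sketch below picks the permission triplet that applies to a user from an ls-style mode string. It is a simplified model for illustration only: it ignores root's override, ACLs, and capabilities, and the function name is made up.

```python
def applicable_permissions(mode, owner, group, user, user_groups):
    """Return the rwx triplet from an ls-style mode string that applies to `user`."""
    perms = mode[1:10]  # drop the file-type character ("c" for a character device)
    if user == owner:
        return perms[0:3]
    if group in user_groups:
        return perms[3:6]
    return perms[6:9]


# Values taken from this report: /dev/nvmap is cr--r----- root:video.
print(applicable_permissions("cr--r-----", "root", "video", "sandbox", ["sandbox"]))           # ---
print(applicable_permissions("cr--r-----", "root", "video", "sandbox", ["sandbox", "video"]))  # r--
```

With only the sandbox group, the "other" triplet applies and grants nothing, which is consistent with the "Permission denied" error in the CUDA output.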

GPU proof during onboard passed because:

  • nvidia-smi probe: nvidia-smi not in sandbox image → "skipping optional visibility check" → exit 0 (false pass)
  • proc-comm-write probe: optional=true → failure causes early return, skips remaining proofs
  • cuInit probe: never reached (previous optional failure returned early)
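
One way to avoid this false pass is to run every proof, record skips separately from passes, and require at least one real pass. The sketch below is an assumption about how that could look, not the existing onboard code; the Proof type, run_all, and gpu_verified are hypothetical names, and the lambdas stand in for the real probes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Proof:
    name: str
    # Returns True for pass, False for fail, None when the probe cannot run.
    run: Callable[[], Optional[bool]]


def run_all(proofs: List[Proof]) -> Dict[str, str]:
    """Run every proof without returning early; a skip is never counted as a pass."""
    results = {}
    for proof in proofs:
        outcome = proof.run()
        if outcome is None:
            results[proof.name] = "skipped"
        else:
            results[proof.name] = "passed" if outcome else "failed"
    return results


def gpu_verified(results: Dict[str, str]) -> bool:
    """Verified only if at least one proof actually passed and none failed."""
    outcomes = list(results.values())
    return "passed" in outcomes and "failed" not in outcomes


# Outcomes modeled on this report: nvidia-smi missing, the other two probes fail.
jetson = [
    Proof("nvidia-smi", lambda: None),
    Proof("proc-comm-write", lambda: False),
    Proof("cuInit", lambda: False),
]
results = run_all(jetson)
print(results)
print(gpu_verified(results))  # False
```

Under this scheme the onboard step would report the GPU as unverified on the affected Jetson setup instead of printing "GPU proof passed".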

Logs

$ nemoclaw my-assistant status | grep -i gpu
    Host GPU: yes
    Sandbox GPU: enabled (auto)

$ nemoclaw my-assistant exec -- id
uid=998(sandbox) gid=998(sandbox) groups=998(sandbox)

$ nemoclaw my-assistant exec -- ls -la /dev/nvmap
cr--r----- 1 root video 10, 261 May 26 08:40 /dev/nvmap

$ nemoclaw my-assistant exec -- python3 -c "import ctypes; cuda=ctypes.CDLL('libcuda.so.1'); r=cuda.cuInit(0); print(f'cuInit={r}')"
NvRmMemInitNvmap failed: error Permission denied
NvRmMemMgrInit failed: Memory Manager Not supported, line 340
NvRmMemMgrInit failed: error type 196626
cuInit=100

sandboxes.json shows: "gpuEnabled": true, "sandboxGpuEnabled": true

NVB#6222618

Metadata

Labels

  • NV QA (Bugs found by the NVIDIA QA Team)
  • area: cli (Command line interface, flags, terminal UX, or output)
  • area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
  • platform: jetson (Affects Jetson AGX Thor or Orin)
  • v0.0.58 (Release target)
