[All Platforms (GPU)][Onboard] Existing gateway started without GPU passthrough — recreate-sandbox aborts #3578

@zNeill

Description

On any GPU-capable host (ubuntu24-gpu / dgxspark / dgx-station, x86_64 and aarch64), nemoclaw onboard brings up the openshell-gateway without GPU passthrough on the first install. Any later onboard call that needs to recreate the sandbox (NEMOCLAW_RECREATE_SANDBOX=1, which every sandbox-lifecycle, policy, and multi-sandbox test uses) aborts with "Existing gateway was started without GPU passthrough". The CLI's own error message recommends nemoclaw uninstall && nemoclaw onboard --gpu as the only way forward, which is a full reinstall just to reset gateway state.

In the v0.0.43 sanity matrix (pipeline 51356759), this single bug caused 20+ cascading beforeAll/afterAll hook failures per GPU host (ubuntu24-gpu: 26 of 38 failed; dgxspark: 26 of 37 failed). It does NOT occur on non-GPU hosts (ubuntu22 / ubuntu24 / ubuntu26), which pass 29–31 of 38 templates on the same code.

Environment

Device: DGX Station (galaxy-ts2-052, GB300 GPU); also reproduced on DGX Spark (GB10) and the Ubuntu 24.04 GPU runner
OS: Ubuntu 24.04 (Linux 6.17.0-1008-nvidia-64k)
Architecture: aarch64 (DGX Station / Spark) and x86_64 (ubuntu24-gpu)
Node.js: v22.22.2
npm: 10.9.7
Docker: 29.1.3, build f52814d
OpenShell CLI: openshell 0.0.39
NemoClaw: nemoclaw v0.0.43
OpenClaw: N/A (recreate-sandbox aborts before the sandbox container starts)

Steps to Reproduce

  1. Fresh GPU-capable host, no prior nemoclaw state
  2. nemoclaw onboard --non-interactive (any provider; gateway comes up without GPU passthrough by default)
  3. NEMOCLAW_RECREATE_SANDBOX=1 NEMOCLAW_PROVIDER=custom NEMOCLAW_SANDBOX_NAME=my-assistant nemoclaw onboard --non-interactive

Expected Result

Either the existing gateway is automatically restarted in GPU-passthrough mode, or --recreate-sandbox proceeds and creates the new sandbox without GPU (matching --no-gpu semantics). The user should not have to uninstall the entire CLI just to flip a gateway flag.

Actual Result

[1/8] Preflight checks
──────────────────────────────────────────────────
✓ NVIDIA GPU detected (NVIDIA GB300, 284208 MB)
✓ Docker CDI GPU support detected (/etc/cdi/nvidia.yaml)
✓ Sandbox GPU: enabled (auto)
NVIDIA GPU detected; enabling OpenShell GPU passthrough. Use --no-gpu to opt out.
Docker-driver GPU patch will use host networking; local inference providers will use sandbox loopback.
Existing gateway was started without GPU passthrough.
No sandboxes are registered, so there is nothing for nemoclaw destroy to act on.
Clear the stale gateway state and re-onboard with GPU enabled:
nemoclaw uninstall && nemoclaw onboard --gpu

exit 1
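The Steps to Reproduce can be scripted so a GPU runner detects this abort directly, rather than reporting it as dozens of cascading hook failures. The sketch below is a minimal Python harness, assuming the nemoclaw binary is on PATH and that any provider settings step 2 needs are inherited from the caller's environment. The helper names (run_onboard, reproduce_gpu_gateway_abort, ABORT_MESSAGE) are hypothetical; only the commands, environment variables, and error text come from this report. The runner parameter lets the check be exercised with a stub on a machine without a GPU.

```python
import os
import subprocess
from typing import Callable, Mapping

# Error text copied from the "Actual Result" log in this report.
ABORT_MESSAGE = "Existing gateway was started without GPU passthrough"

# Hypothetical alias: any callable shaped like subprocess.run.
Runner = Callable[..., subprocess.CompletedProcess]


def run_onboard(extra_env: Mapping[str, str], runner: Runner = subprocess.run) -> subprocess.CompletedProcess:
    """Run `nemoclaw onboard --non-interactive` with extra environment variables."""
    env = {**os.environ, **extra_env}  # inherit provider settings from the caller
    return runner(
        ["nemoclaw", "onboard", "--non-interactive"],
        env=env,
        capture_output=True,
        text=True,
        check=False,  # inspect the exit code ourselves instead of raising
    )


def reproduce_gpu_gateway_abort(runner: Runner = subprocess.run) -> bool:
    """Return True if step 3 aborts with the GPU-passthrough error (bug reproduced)."""
    # Step 2: first onboard on a fresh GPU-capable host.
    first = run_onboard({}, runner)
    if first.returncode != 0:
        raise RuntimeError(f"initial onboard failed with exit code {first.returncode}")

    # Step 3: recreate the sandbox, as the lifecycle/policy/multi-sandbox tests do.
    second = run_onboard(
        {
            "NEMOCLAW_RECREATE_SANDBOX": "1",
            "NEMOCLAW_PROVIDER": "custom",
            "NEMOCLAW_SANDBOX_NAME": "my-assistant",
        },
        runner,
    )
    output = (second.stdout or "") + (second.stderr or "")
    return second.returncode != 0 and ABORT_MESSAGE in output
```

On an affected GPU host, calling reproduce_gpu_gateway_abort() with no arguments runs the real CLI; going by the log above, v0.0.43 would be expected to return True, and a fixed build False.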

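Rather than aborting as in the Actual Result log, the two outcomes listed under Expected Result reduce to a small preflight decision. The Python sketch below models it under stated assumptions: Action, decide_gateway_action, and the allow_gateway_restart switch are hypothetical and do not describe NemoClaw's actual code; --gpu and --no-gpu are the flags named in this report. Which outcome should be the default is a product decision, so it is left as a parameter.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical preflight outcomes; only the mismatch case differs from today's abort.
    REUSE_GATEWAY = "reuse existing gateway"
    RESTART_GATEWAY_WITH_GPU = "restart gateway with GPU passthrough"
    PROCEED_WITHOUT_GPU = "recreate sandbox without GPU (as with --no-gpu)"


def decide_gateway_action(
    want_gpu: bool,
    gateway_has_gpu: bool,
    allow_gateway_restart: bool = True,
) -> Action:
    """Choose a non-aborting action when a sandbox is recreated on an existing gateway.

    want_gpu: GPU passthrough requested for the sandbox (auto-detected or --gpu,
        and --no-gpu not given).
    gateway_has_gpu: the running gateway was started with GPU passthrough.
    allow_gateway_restart: hypothetical policy switch; False means never touch
        the gateway and fall back to a CPU-only sandbox instead.
    """
    if not want_gpu or gateway_has_gpu:
        # No mismatch: the existing gateway already satisfies the request.
        return Action.REUSE_GATEWAY
    if allow_gateway_restart:
        # Outcome 1 from "Expected Result": restart only the gateway, in GPU mode.
        return Action.RESTART_GATEWAY_WITH_GPU
    # Outcome 2: proceed like --no-gpu instead of requiring a full uninstall.
    return Action.PROCEED_WITHOUT_GPU
```

For the scenario reported here (GPU auto-detected, gateway started without passthrough), decide_gateway_action(True, False) returns Action.RESTART_GATEWAY_WITH_GPU, where v0.0.43 instead exits with status 1.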
Bug Details

Priority: Unprioritized
Action: Dev - Open - To fix
Disposition: Open issue
Module: Machine Learning - NemoClaw
Keyword: NemoClaw, NemoClaw_Automation, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard, NemoClaw_Sandbox, NemoClaw-SWQA-RelBlckr-Recommended

[NVB#6180214]

Labels

NV QA (Bugs found by the NVIDIA QA Team)
UAT (Issues flagged for User Acceptance Testing)
area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
