Description
On any GPU-capable host (ubuntu24-gpu / dgxspark / dgx-station, x86_64 and aarch64), nemoclaw onboard brings the openshell-gateway up without GPU passthrough on the first install. Any subsequent onboard call that needs to recreate the sandbox (NEMOCLAW_RECREATE_SANDBOX=1, which every sandbox-lifecycle / policy / multi-sandbox test uses) aborts with "Existing gateway was started without GPU passthrough". The only recovery the product itself recommends is nemoclaw uninstall && nemoclaw onboard --gpu, which is a full reinstall just to reset gateway state. In the v0.0.43 sanity matrix (pipeline 51356759) this single bug produced 20+ cascading beforeAll/afterAll hook failures per GPU host (ubuntu24-gpu: 26 failures of 38, dgxspark: 26 failures of 37). It does not happen on non-GPU hosts (ubuntu22 / ubuntu24 / ubuntu26), which pass 29–31 of 38 templates on the same code.
Environment
Device: DGX Station (galaxy-ts2-052, GB300 GPU) — also reproduced on DGX Spark (GB10) and Ubuntu 24.04 GPU runner
OS: Ubuntu 24.04 (Linux 6.17.0-1008-nvidia-64k)
Architecture: aarch64 (DGX Station / Spark) and x86_64 (ubuntu24-gpu)
Node.js: v22.22.2
npm: 10.9.7
Docker: 29.1.3, build f52814d
OpenShell CLI: openshell 0.0.39
NemoClaw: nemoclaw v0.0.43
OpenClaw: N/A (recreate-sandbox aborts before sandbox container starts)
Steps to Reproduce
- Fresh GPU-capable host, no prior nemoclaw state
- nemoclaw onboard --non-interactive (any provider; gateway comes up without GPU passthrough by default)
- NEMOCLAW_RECREATE_SANDBOX=1 NEMOCLAW_PROVIDER=custom NEMOCLAW_SANDBOX_NAME=my-assistant nemoclaw onboard --non-interactive
Expected Result
Either the existing gateway is restarted in GPU-passthrough mode automatically, or --recreate-sandbox proceeds without GPU on the new sandbox (matching --no-gpu semantics). The user should not have to uninstall the entire CLI just to flip a gateway flag.
Actual Result
[1/8] Preflight checks
──────────────────────────────────────────────────
✓ NVIDIA GPU detected (NVIDIA GB300, 284208 MB)
✓ Docker CDI GPU support detected (/etc/cdi/nvidia.yaml)
✓ Sandbox GPU: enabled (auto)
NVIDIA GPU detected; enabling OpenShell GPU passthrough. Use --no-gpu to opt out.
Docker-driver GPU patch will use host networking; local inference providers will use sandbox loopback.
Existing gateway was started without GPU passthrough.
No sandboxes are registered, so there is nothing for nemoclaw destroy to act on.
Clear the stale gateway state and re-onboard with GPU enabled:
nemoclaw uninstall && nemoclaw onboard --gpu
exit 1
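For triage, the reproduction steps above can be scripted so that a runner reports this specific abort once instead of surfacing it as a series of hook failures. The following is a minimal sketch, not part of the product or the test suite: the function name repro_gpu_recreate and the NEMOCLAW_BIN override are hypothetical names introduced for illustration, while the onboard commands, environment variables, and abort message are taken from this report. It assumes the host is already in the fresh state described in the first step.

```shell
#!/usr/bin/env bash
# Sketch only: repro_gpu_recreate and NEMOCLAW_BIN are hypothetical helper
# names; the commands and environment variables are those quoted in this report.

NEMOCLAW_BIN="${NEMOCLAW_BIN:-nemoclaw}"
ABORT_MSG="Existing gateway was started without GPU passthrough"

repro_gpu_recreate() {
  local log
  log="$(mktemp)"

  # Step 2: first onboard on a fresh host (gateway comes up without GPU passthrough).
  "$NEMOCLAW_BIN" onboard --non-interactive >"$log" 2>&1 || {
    echo "first onboard failed; see $log" >&2
    return 2
  }

  # Step 3: ask onboard to recreate the sandbox against the existing gateway.
  NEMOCLAW_RECREATE_SANDBOX=1 NEMOCLAW_PROVIDER=custom \
    NEMOCLAW_SANDBOX_NAME=my-assistant \
    "$NEMOCLAW_BIN" onboard --non-interactive >"$log" 2>&1
  local status=$?

  # The bug is reproduced when the recreate call fails with the known message.
  if [ "$status" -ne 0 ] && grep -qF "$ABORT_MSG" "$log"; then
    echo "REPRODUCED: recreate aborted (exit $status)"
    rm -f "$log"
    return 1
  fi

  echo "NOT REPRODUCED: recreate exited $status"
  rm -f "$log"
  return 0
}
```

Calling repro_gpu_recreate on a GPU host is expected to return 1 if the behavior described in this report is present, and 0 if the recreate call completes or fails for a different reason.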
Bug Details
| Field | Value |
| --- | --- |
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NemoClaw_Automation, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard, NemoClaw_Sandbox, NemoClaw-SWQA-RelBlckr-Recommended |
[NVB#6180214]