Description
On DGX Spark, re-running Express setup (curl | bash) fails repeatedly because each failed onboard attempt leaves an orphaned openclaw-gateway process listening on port 18789. The next onboard detects the port conflict and falls back to 18790, but the sandbox never reaches the Ready state (180s timeout). The destroy/uninstall cycle does not kill the stale openclaw-gateway process.
Root cause analysis:
1. destroy.ts:stopDockerDriverGatewayProcess() only kills openshell-gateway (checks cmdline for "openshell-gateway"), NOT openclaw-gateway
2. uninstall run-plan.ts kills openshell processes but does not specifically target openclaw-gateway
3. After a failed onboard, openclaw-gateway (spawned inside the sandbox container) survives: the sandbox container may be removed, but the gateway process, which was exposed on the host network, keeps running
4. nemoclaw onboard --fresh does not check for or kill stale openclaw-gateway processes before starting
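A minimal sketch of the broader cmdline check requested for stopDockerDriverGatewayProcess(), assuming the function reads /proc/<pid>/cmdline and matches by substring as described in item 1. The names isGatewayCmdline and GATEWAY_NAMES are hypothetical, not taken from destroy.ts. Matching on cmdline rather than /proc/<pid>/comm matters here: the kernel truncates comm to 15 characters, which is why ss reports the process name as "openclaw-gatewa".

```typescript
// Hypothetical list of gateway process names to stop on destroy; per this
// report, the current check only looks for the first entry.
const GATEWAY_NAMES: readonly string[] = ["openshell-gateway", "openclaw-gateway"];

// rawCmdline is the contents of /proc/<pid>/cmdline: argv entries separated
// (and usually terminated) by NUL bytes. Returns true if any argv entry
// contains one of the gateway names.
function isGatewayCmdline(rawCmdline: string): boolean {
  const argv = rawCmdline.split("\0").filter((arg) => arg.length > 0);
  return argv.some((arg) => GATEWAY_NAMES.some((name) => arg.includes(name)));
}
```

Substring matching is deliberately loose, like the existing check; a stricter variant could compare only the basename of argv[0].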
The port fallback path (18789→18790) also appears broken: the sandbox is created but never reaches Ready when using the fallback port, which suggests a mismatch between the CHAT_UI_URL set by the Dockerfile ARG and the port actually forwarded.
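The suspected CHAT_UI_URL mismatch could be caught up front instead of surfacing as a 180s timeout. The sketch below assumes CHAT_UI_URL is a full URL whose port should equal the forwarded host port; checkChatUiPort is a hypothetical helper, not an existing NemoClaw function. It returns an actionable message rather than throwing, so the caller decides whether to abort onboarding.

```typescript
// Hypothetical pre-flight check: compares the port in CHAT_UI_URL (e.g. from
// a Dockerfile ARG) with the port actually forwarded to the host.
// Returns null when they agree, or an actionable error message otherwise.
function checkChatUiPort(chatUiUrl: string, forwardedPort: number): string | null {
  let url: URL;
  try {
    url = new URL(chatUiUrl);
  } catch {
    return `CHAT_UI_URL is not a valid URL: ${chatUiUrl}`;
  }
  // URL.port is "" when the URL relies on the scheme's default port.
  const defaultPort = url.protocol === "https:" ? 443 : 80;
  const urlPort = url.port === "" ? defaultPort : Number(url.port);
  if (urlPort !== forwardedPort) {
    return (
      `CHAT_UI_URL points at port ${urlPort}, but the gateway was forwarded on ` +
      `port ${forwardedPort}. Rebuild with a matching CHAT_UI_URL or free port ${urlPort}.`
    );
  }
  return null;
}
```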
Environment
Device: DGX Spark (spark-6087)
OS: Ubuntu (aarch64)
Architecture: aarch64
Node.js: v22.22.2
npm: 10.9.7
Docker: Docker CE 28.3.3
OpenShell CLI: 0.0.37
NemoClaw: v0.0.39
OpenClaw: 2026.4.24
Steps to Reproduce
1. On DGX Spark, run: curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
2. Select Express setup
3. Sandbox creation fails with the 180s timeout, for any reason (slow build, network issues, etc.)
4. Observe: orphaned openclaw-gateway still running on port 18789
5. Run: nemoclaw uninstall --yes
6. Run the installer again: curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
7. Observe: "Port 18789 is taken. Using port 18790 instead."
8. Sandbox creation fails again with the same 180s timeout
9. Each retry leaves another openclaw-gateway process
Verification:
ss -tlnp | grep 18789
→ LISTEN 127.0.0.1:18789 openclaw-gatewa (stale pid from previous attempt)
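A programmatic equivalent of the ss check, sketched with Node's net module: binding 127.0.0.1:<port> fails with EADDRINUSE while a stale gateway still holds the port. isPortInUse is a hypothetical name. The probe only detects listeners whose address conflicts with the one probed, and its answer can be stale by the time the caller acts on it.

```typescript
import * as net from "node:net";

// Probes whether host:port is already taken by trying to listen on it.
// EADDRINUSE means another process (such as a stale openclaw-gateway)
// holds the port; any other error is passed through to the caller.
function isPortInUse(port: number, host = "127.0.0.1"): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once("error", (err: Error) => {
      if ((err as NodeJS.ErrnoException).code === "EADDRINUSE") {
        resolve(true);
      } else {
        reject(err);
      }
    });
    probe.once("listening", () => {
      // The bind succeeded, so the port was free; release it right away.
      probe.close(() => resolve(false));
    });
    probe.listen(port, host);
  });
}
```

For example, isPortInUse(18789).then((inUse) => console.log(inUse ? "stale listener on 18789" : "18789 is free")) reproduces the Verification step without parsing ss output.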
Expected Result
1. nemoclaw onboard --fresh should detect and kill any stale openclaw-gateway processes before starting
2. nemoclaw uninstall should kill openclaw-gateway processes (not just openshell-gateway)
3. destroy.ts:stopDockerDriverGatewayProcess() should also match "openclaw-gateway" in cmdline check
4. Port fallback (18789→18790) should produce a working sandbox, or fail with an actionable error
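Expected results 1 and 2 amount to a cleanup pass before onboarding and during uninstall. A minimal sketch, assuming a Linux host with /proc and that a substring match on the full cmdline is specific enough; findStaleGatewayPids and killPids are hypothetical helpers, equivalent in spirit to the pkill -f workaround rather than NemoClaw's actual code. It sends SIGTERM only; a real implementation might wait and escalate to SIGKILL, and the loose match could also hit unrelated processes whose arguments mention the name (for example, a tail on a gateway log file).

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: returns PIDs under procRoot whose cmdline mentions
// openclaw-gateway. procRoot defaults to /proc (Linux only) and is a
// parameter so the scan can be exercised against a fake directory tree.
function findStaleGatewayPids(procRoot = "/proc", selfPid = process.pid): number[] {
  const pids: number[] = [];
  for (const entry of fs.readdirSync(procRoot)) {
    if (!/^\d+$/.test(entry)) continue; // only numeric entries are processes
    const pid = Number(entry);
    if (pid === selfPid) continue; // never match the cleanup process itself
    let cmdline = "";
    try {
      cmdline = fs.readFileSync(path.join(procRoot, entry, "cmdline"), "utf8");
    } catch {
      continue; // process exited after readdir, or cmdline is unreadable
    }
    // cmdline holds NUL-separated argv entries.
    if (cmdline.split("\0").some((arg) => arg.includes("openclaw-gateway"))) {
      pids.push(pid);
    }
  }
  return pids.sort((a, b) => a - b);
}

// Sends SIGTERM to each PID. ESRCH (process already gone) is ignored; other
// errors, such as EPERM for another user's process, are rethrown.
function killPids(pids: number[]): void {
  for (const pid of pids) {
    try {
      process.kill(pid, "SIGTERM");
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "ESRCH") throw err;
    }
  }
}
```

Usage would be a single call such as killPids(findStaleGatewayPids()) at the start of onboard --fresh and in the uninstall plan.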
Actual Result
- Each failed onboard leaves orphaned openclaw-gateway on 18789
- uninstall does not clean it up
- onboard --fresh does not clean it up
- Port fallback to 18790 also fails (sandbox never reaches Ready)
- The user is stuck in a retry loop that cannot be escaped without manually killing the process
- Workaround: manually run "pkill -f openclaw-gatewa" before retrying
Logs
! Port 18789 is taken. Using port 18790 instead.
Direct sandbox GPU enabled; allowing only /proc task comm writes.
Creating sandbox 'my-assistant' (this takes a few minutes on first run)...
...63 Docker build steps complete...
Create stream exited with code 1 after sandbox was created.
Sandbox 'my-assistant' was created but did not become ready within 180s.
The orphaned sandbox has been removed — you can safely retry.
Bug Details
| Field | Value |
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NemoClaw_Automation, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Install, NemoClaw_Sandbox, NemoClaw-SWQA-RelBlckr-Recommended |
[NVB#6168123]