[DGX Spark][Upgrade] In-place upgrade leaves stale openclaw-gateway on port 18789 — new sandbox creation fails with 180s timeout #3397

@zNeill

Description

Upgrading NemoClaw on DGX Spark from a previous version to v0.0.39 leaves the old openclaw-gateway process running on port 18789. The installer detects the port conflict and falls back to 18790, but the new sandbox fails to reach Ready state within 180s and is automatically destroyed. The user is left with no working sandbox and no actionable recovery guidance.

This affects all DGX Spark users upgrading from any prior version with an existing sandbox. The old openclaw-gateway (pid owned by the user) is not killed or stopped by the installer or by nemoclaw onboard --fresh.
Environment
Device:        DGX Spark (spark-6087)
OS:            Ubuntu (aarch64)
Architecture:  aarch64
Node.js:       v22.22.2
npm:           10.9.7
Docker:        Docker CE 28.3.3
OpenShell CLI: 0.0.37
NemoClaw:      v0.0.39
OpenClaw:      2026.4.24
Steps to Reproduce
1. Have a working NemoClaw sandbox on a previous version (e.g. v0.0.38)
2. Upgrade: curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
3. Select Express setup or run nemoclaw onboard --fresh
4. Observe port conflict message: "Port 18789 is taken. Using port 18790 instead."
5. Wait for sandbox creation (Docker build completes ~8 min)
6. Observe: "Sandbox was created but did not become ready within 180s"
7. Sandbox is auto-destroyed. Old openclaw-gateway still running on 18789.
Expected Result
The installer or onboard should detect the stale openclaw-gateway process from the previous version and either:
a) Stop it gracefully before creating the new sandbox, OR
b) Provide actionable guidance: "Kill the old gateway process (pid XXXX) and retry"

After upgrade, the new sandbox should start on the default port 18789 without manual intervention.
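
Option (b) above could be approximated along these lines. This is a minimal sketch, not NemoClaw's actual installer logic: `port_conflict_guidance` is a hypothetical helper name, and it assumes the usual `ss -tlnp` column layout, where the process column looks like `users:(("name",pid=1234,fd=5))`. It reads `ss` output on stdin so it can be checked against captured output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NemoClaw): reads `ss -tlnp` output on stdin
# and prints recovery guidance for each process listening on the given port.
# Exits 0 if at least one listener with a visible pid was found, 1 otherwise.
port_conflict_guidance() {
  local port="$1"
  awk -v port="$port" '
    # Data rows: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port Process
    $1 == "LISTEN" && $4 ~ (":" port "$") {
      name = "unknown"; pid = ""
      # Process column format: users:(("name",pid=1234,fd=5))
      if (match($0, /\(\("[^"]*"/)) name = substr($0, RSTART + 3, RLENGTH - 4)
      if (match($0, /pid=[0-9]+/))  pid  = substr($0, RSTART + 4, RLENGTH - 4)
      # ss omits the process column for sockets the caller may not inspect,
      # so a listener without a visible pid produces no guidance here.
      if (pid != "") {
        printf "Port %s is held by %s (pid %s). Stop it with: kill %s, then retry.\n", port, name, pid, pid
        found = 1
      }
    }
    END { exit(found ? 0 : 1) }'
}
```

A typical call would be `ss -tlnp | port_conflict_guidance 18789`. Because the pid comes from the live socket table rather than a saved pid file, the message stays accurate even if the old gateway was restarted since the previous install.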
Actual Result
- Old openclaw-gateway (pid 2522044) keeps running on 18789
- New sandbox forced to 18790, fails to reach Ready in 180s
- Sandbox auto-destroyed, user left with no working sandbox
- No guidance to kill the old process
- ss -tlnp shows:
  LISTEN 127.0.0.1:18789 openclaw-gatewa (pid 2522044) — stale from old version
  LISTEN 127.0.0.1:8080  openshell-gatew (pid 842854) — new gateway
Logs
! Port 18789 is taken. Using port 18790 instead.
Create stream exited with code 1 after sandbox was created.
Sandbox 'my-assistant' was created but did not become ready within 180s.
The orphaned sandbox has been removed — you can safely retry.
Workaround
kill <pid>
nemoclaw onboard --fresh
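
The kill step of this workaround can be made a little safer by asking the process to exit before forcing it. The following is a sketch only: `stop_gracefully` is a hypothetical helper, not a NemoClaw or OpenShell command, and the 10 second default is an arbitrary assumption. It sends SIGTERM, polls with `kill -0` (which checks that the process exists without signalling it), and falls back to SIGKILL only after the timeout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask a process to exit with SIGTERM, wait up to
# <timeout> seconds (default 10, an arbitrary choice), then fall back to SIGKILL.
stop_gracefully() {
  local pid="$1" timeout="${2:-10}" waited=0
  # If the signal cannot be delivered, the process is already gone
  # or is not ours to signal; there is nothing more to do here.
  kill -TERM "$pid" 2>/dev/null || return 0
  # kill -0 sends no signal; it only reports whether the pid still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

With the pid of the stale openclaw-gateway in `$pid`, the workaround becomes `stop_gracefully "$pid" 10` followed by `nemoclaw onboard --fresh`.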

Bug Details

Priority:     Unprioritized
Action:       Dev - Open - To fix
Disposition:  Open issue
Module:       Machine Learning - NemoClaw
Keyword:      NemoClaw, NemoClaw_Automation, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Install, NemoClaw_Upgrade, NemoClaw-SWQA-RelBlckr-Recommended

[NVB#6168039]

Labels

NV QA (bugs found by the NVIDIA QA Team), UAT (issues flagged for User Acceptance Testing), platform: dgx-spark (affects DGX Spark hardware or workflows)