Description
NemoClaw v0.0.21 onboard fails fast when onboarding a second sandbox while another sandbox already holds the default dashboard forward (port 18789). The error requires the user to
manually set CHAT_UI_URL, with no CLI flag, no automatic port allocation, and no way to discover the assigned port from nemoclaw list / status. This is a breaking change for automation
and test pipelines that relied on the previous behavior.
Environment
- Device: 245 GiB RAM host, CPU-only (no NVIDIA GPU attached)
- OS: Ubuntu 25.10
- OpenShell CLI: v0.0.26
- NemoClaw: v0.0.21 (installed globally via npm)
- OpenClaw: v2026.4.2
- Inference: NVIDIA build endpoint (https://integrate.api.nvidia.com/v1), model nvidia/nemotron-3-super-120b-a12b
- Reproduced in: NemoClaw DevTest automation (nemoclaw-test) during full regression run
Reproduction Steps
1. Onboard the first sandbox (uses default port 18789):
nemoclaw onboard --non-interactive
2. After success, openshell forward list shows:
my-assistant 127.0.0.1 18789 running
3. Without destroying my-assistant, attempt to onboard a second sandbox with a different name:
nemoclaw onboard --non-interactive # pick a different sandbox name, e.g. my-assistant-temp
4. The second onboard crashes in phase [6/8] Creating sandbox inside ensureDashboardForward.
Actual Result
Onboard crashes with an uncaught error:
Error: Port 18789 is already forwarded for sandbox 'my-assistant'.
Set CHAT_UI_URL to a different local port (e.g. http://127.0.0.1:18790) before onboarding a second sandbox.
at ensureDashboardForward (/.nemoclaw/source/dist/lib/onboard.js:4880:15)
at createSandbox (dist/lib/onboard.js:3033:5)
at async Object.onboard (dist/lib/onboard.js:5448:27)
at async runOnboardCommand (dist/lib/onboard-command.js:82:5)
at async onboard (dist/nemoclaw.js:723:5)
The second sandbox's creation is aborted mid-flight. The openshell gateway may already have a partially-created record, but nemoclaw list shows only the first sandbox.
The failure reproduces every time, and the output is byte-for-byte identical to the QA-machine failure log for T67 ([T5882262]).
Expected Result
The second onboard should complete successfully. Acceptable options:
- (a) Auto-allocate the next free dashboard port (18790, 18791, …), store it with the sandbox, and expose it in nemoclaw list / nemoclaw status.
- (b) Add a first-class CLI flag such as --control-ui-port (today the only override is the CHAT_UI_URL env var).
- (c) On conflict, emit a warning and auto-pick the next free port instead of throwing.
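Options (a) and (c) both depend on picking the next free port. A minimal sketch of that step follows, as an illustration only and not NemoClaw's code: the helper name nextFreeDashboardPort, the shape of the forwards array, and the 18789-18799 range are assumptions made up for this example. A real implementation would also need to confirm that the chosen port is not bound by an unrelated process.

```javascript
// Hypothetical helper, not part of NemoClaw: return the lowest dashboard port
// in [start, end] that no existing forward is using.
// Assumed input shape: [{ sandbox: "my-assistant", port: 18789 }, ...]
function nextFreeDashboardPort(forwards, start = 18789, end = 18799) {
  const taken = new Set(forwards.map((f) => f.port));
  for (let port = start; port <= end; port++) {
    if (!taken.has(port)) return port;
  }
  // Range exhausted: fail loudly rather than reuse another sandbox's port.
  throw new Error(`No free dashboard port in range ${start}-${end}`);
}

const forwards = [{ sandbox: "my-assistant", port: 18789 }];
console.log(nextFreeDashboardPort(forwards)); // prints 18790
```

Scanning upward from the default keeps the first sandbox on 18789 and gives later sandboxes predictable ports, which is what the test pipelines in this report would need.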
Analysis
1. New guard location — dist/lib/onboard.js:4878-4884 (v0.0.21):
if (portOwner !== null && portOwner !== sandboxName) {
throw new Error(`Port ${portToStop} is already forwarded for sandbox '${portOwner}'. ` +
`Set CHAT_UI_URL to a different local port ...`);
}
2. v0.0.20 behavior (prior): the same code path silently called openshell forward stop and then openshell forward start, effectively stealing the dashboard
forward from the previous sandbox with no warning. That silent stealing was itself a latent bug. The new v0.0.21 guard is a step in the right direction, but it surfaces two downstream
problems:
- Breaking change for automation: flows that previously "worked" (even if the old sandbox's dashboard was quietly broken afterward) now throw.
- Error message leaves the user stranded:
- Doesn't name the next free port the user should pick.
- CHAT_UI_URL is an env var only; no CLI flag equivalent.
- After successful onboard with a non-default port, there is no nemoclaw list / status field showing the dashboard URL, so the user has no way to rediscover it later.
3. Observed blast radius in DevTest automation: at least 12 P0/P1 test cases currently fail due to this single change (examples: T15 NVIDIA Cloud, T35 OpenAI-compatible, T36
Anthropic-compatible, T37 Onboard interrupt/resume, T66 Destroy-and-cleanup, T67 Re-onboard, T83 CI non-interactive, T22 No-GPU fallback, T87 Quickstart E2E, T40.1 npm preset, T157
--dangerously-skip-permissions). All share the pattern: a baseline my-assistant sandbox is kept alive while a short-lived second sandbox is onboarded for the test. None of them set
CHAT_UI_URL, so all hit the conflict on port 18789.
4. Suggested fix (keeping the fail-fast semantics):
- Add a --control-ui-port CLI flag (takes precedence over CHAT_UI_URL env).
- In ensureDashboardForward, when a conflict is detected, auto-allocate the next free port in a sane range and emit a warning instead of throwing.
- Surface the assigned dashboard port in nemoclaw list and nemoclaw status so users can find it post-hoc.
- If still throwing, include a concrete "use this port next" suggestion in the error message rather than the generic 18790 example.
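The precedence rule and the more specific error message suggested above could look roughly like the sketch below. It is illustrative only: resolveDashboardPort, validatePort, conflictMessage, and the flagPort option are hypothetical names, and --control-ui-port is the flag proposed in this report, not an existing NemoClaw option.

```javascript
// Hypothetical sketch, not NemoClaw's code.
const DEFAULT_DASHBOARD_PORT = 18789;

// Reject anything that is not a valid TCP port number.
function validatePort(port, source) {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port from ${source}: ${port}`);
  }
  return port;
}

// Precedence: proposed --control-ui-port flag, then CHAT_UI_URL, then default.
function resolveDashboardPort({ flagPort, env = {} } = {}) {
  if (flagPort !== undefined) {
    return validatePort(Number(flagPort), "--control-ui-port");
  }
  if (env.CHAT_UI_URL) {
    const url = new URL(env.CHAT_UI_URL);
    // URL.port is "" when the URL carries no explicit (non-default) port.
    if (url.port !== "") return validatePort(Number(url.port), "CHAT_UI_URL");
  }
  return DEFAULT_DASHBOARD_PORT;
}

// Error text that names a concrete free port instead of a generic example.
function conflictMessage(port, owner, suggestedPort) {
  return (
    `Port ${port} is already forwarded for sandbox '${owner}'. ` +
    `Port ${suggestedPort} is free: set CHAT_UI_URL=http://127.0.0.1:${suggestedPort} and retry.`
  );
}

console.log(
  resolveDashboardPort({
    flagPort: "18791",
    env: { CHAT_UI_URL: "http://127.0.0.1:18790" },
  })
); // prints 18791, because the flag takes precedence over the env var
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the precedence logic easy to unit test.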
Bug Details
| Field | Value |
| --- | --- |
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NemoClaw_Automation, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard, NemoClaw-SWQA-RelBlckr-Recommended, NemoClaw-SWQA-Test-Blocker |
[NVB#6099899]