feat(onboard): auto-allocate dashboard port for multi-sandbox#2411
Conversation
…2174) When onboarding a second sandbox while the first holds port 18789, the dashboard forward now auto-allocates the next free port in range 18789-18799 instead of crashing with an unhandled error. Changes: - Add --control-ui-port <N> CLI flag (takes precedence over CHAT_UI_URL env var and defaults) - Auto-allocate next free port when preferred port is taken, with warning message showing the reassigned port - Persist dashboardPort in the sandbox registry so it survives across sessions and resume flows - Display dashboard URL in `nemoclaw list` output - Display dashboard port in `nemoclaw status` output - Throw only when the entire 18789-18799 range is exhausted, with actionable error listing each port's owner - Fix pre-existing local-inference preset test (node binary added) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIntroduces dashboard port auto-allocation to support multiple simultaneous sandboxes. Adds Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…et (#2402) - The `messaging-providers-e2e` nightly test fails at **S1: Gateway is not serving on port 18789** because the Slack SDK hangs or crashes during initialization - **Root cause**: the base network policy (`openclaw-sandbox.yaml`) includes Telegram and Discord rules but **not Slack** — Slack access requires the `slack.yaml` preset, which onboard applies in Step 8 *after* the sandbox container starts in Step 6. By the time the preset is applied, the Slack SDK has already hung on a blocked connection or crashed on a CONNECT 403 - **Fix (E2E test)**: pre-merge the Slack network policy rules into the base policy BEFORE `install.sh` runs, so the sandbox is created with Slack access from the start. The SDK gets a fast `invalid_auth` rejection, the channel guard catches it, and the gateway serves normally - **Fix (guard)**: add `slack.com` domain check to `isSlackRejection()` so proxy CONNECT tunnel failures (e.g. `CONNECT tunnel to api.slack.com:443 failed with status 403`) are also caught — these errors lack `@slack/` in the stack trace because they originate from the HTTP client, not the SDK Fixes the nightly failure: https://github.com/NVIDIA/NemoClaw/actions/runs/24866963508/job/72805036431 | File | What | |------|------| | `test/e2e/test-messaging-providers.sh` | Pre-merge Slack policy into base policy before `install.sh`; remove broken Phase 1b restart (`openshell sandbox restart` does not exist); fail fast on missing policy files | | `scripts/nemoclaw-start.sh` | Add `slack.com` domain check to `isSlackRejection()` for proxy/network errors | | `test/nemoclaw-start.test.ts` | Update guard detection test to cover domain-based matching | | `test/local-slack-auth-test.sh` | Add T3b scenario for CONNECT tunnel failure to `api.slack.com` | - [ ] Re-run the `nightly-e2e` workflow — S1 and S2 should now pass - [ ] Verify the Slack policy pre-merge check correctly appends rules (idempotent: skips if `api.slack.com` already present) - [ ] Verify `npm test` passes (unit test updated for domain-based guard detection) - [ ] Run `test/local-slack-auth-test.sh` to verify T3b (CONNECT tunnel failure) is caught by the guard 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> * **New Features** * Injects a Slack network policy into the sandbox during setup when missing. * **Bug Fixes** * Expanded safety net to classify/log Slack-related network errors (domain-referenced rejections) without terminating the process. * **Tests** * Added/updated tests to verify sandbox policy injection, Slack connection handling, and updated preset expectations to include additional runtime tools (node, curl, python3). <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/lib/onboard.ts (2)
4020-4043:⚠️ Potential issue | 🟠 MajorPropagate the selected port into the current onboarding session.
actualDashboardPortis persisted, but the rest of this run still derives dashboard URLs fromprocess.env.CHAT_UI_URL/CONTROL_UI_PORT. On the first auto-allocation, the post-onboard banner can still tell the user to open the wrong port.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 4020 - 4043, The onboard flow persists actualDashboardPort but still builds post-onboard URLs from process.env.CHAT_UI_URL / CONTROL_UI_PORT; update the session state after computing actualDashboardPort so later URL generation uses the allocated port: set the in-memory chatUiUrl/dashboardUrl variables (or mutate the onboarding session object) to reflect actualDashboardPort and update process.env.CONTROL_UI_PORT or process.env.CHAT_UI_URL equivalents so functions that build the banner/URLs (where chatUiUrl and CONTROL_UI_PORT are used) pick up the allocated port; locate the code around ensureDashboardForward, actualDashboardPort, and registry.registerSandbox to perform this propagation.
6938-6950:⚠️ Potential issue | 🟠 MajorApply
--control-ui-portbefore preflight, not after it.The new flag only reaches
createSandbox(). Preflight still hard-checks the default dashboard port, sonemoclaw onboard --control-ui-port <free-port>can still fail early if18789is occupied. That makes the override unable to bypass the conflict it was added for.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 6938 - 6950, Preflight is still validating the default dashboard port (18789) before the CLI flag is applied, so move the application of opts.controlUiPort to run before preflight: ensure the port override (opts.controlUiPort) is assigned to the same variable or config that preflight uses (or pass it into the preflight call) so the preflight check uses the user-specified port instead of the hardcoded 18789; specifically, apply opts.controlUiPort to the dashboard port variable/config prior to invoking the preflight routine and keep passing it into createSandbox(...) as currently done.
🧹 Nitpick comments (3)
src/lib/onboard-command.test.ts (1)
223-290: Add one regression test for precedence overCHAT_UI_URL.The new tests validate parsing, but not the contract that
--control-ui-portshould win over env-driven dashboard URL/port selection.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard-command.test.ts` around lines 223 - 290, Add a regression test asserting that CLI --control-ui-port takes precedence over the CHAT_UI_URL env var: call parseOnboardArgs (or runOnboardCommand) with env containing CHAT_UI_URL set to a URL with a different port and provide "--control-ui-port" on the args; then assert the returned result.controlUiPort equals the CLI port (not the port parsed from CHAT_UI_URL) and that runOnboardCommand's logged/used port matches the CLI value; reference parseOnboardArgs and runOnboardCommand to locate where to add the test.src/lib/inventory-commands.ts (1)
127-129: Prefer explicit null checks fordashboardPortrendering.Using truthiness works for current ports, but
!= nullis clearer and avoids accidental suppression if semantics ever expand.♻️ Proposed refactor
- if (sb.dashboardPort) { + if (sb.dashboardPort != null) { log(` dashboard: http://127.0.0.1:${sb.dashboardPort}/`); } @@ - const portSuffix = sb.dashboardPort ? ` :${sb.dashboardPort}` : ""; + const portSuffix = sb.dashboardPort != null ? ` :${sb.dashboardPort}` : "";Also applies to: 158-159
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/inventory-commands.ts` around lines 127 - 129, Replace the truthy check for sb.dashboardPort with an explicit null/undefined check to avoid accidental suppression of valid falsy port values; update the condition in the block that logs the dashboard URL (currently using "if (sb.dashboardPort)") to "if (sb.dashboardPort != null)" and apply the same change to the other similar check around the later logging (the second occurrence at the block referenced by lines ~158-159) so both use explicit != null checks on sb.dashboardPort.src/lib/ports.ts (1)
43-46: Avoid duplicating the default dashboard port literal.
DASHBOARD_PORT_RANGE_STARTduplicatesSANDBOX_DASHBOARD_PORT(18789). Deriving from the existing constant avoids future drift.♻️ Proposed refactor
-export const DASHBOARD_PORT_RANGE_START = 18789; +export const DASHBOARD_PORT_RANGE_START = SANDBOX_DASHBOARD_PORT;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/ports.ts` around lines 43 - 46, The DASHBOARD_PORT_RANGE_START constant duplicates the SANDBOX_DASHBOARD_PORT numeric literal; change DASHBOARD_PORT_RANGE_START to derive from the existing SANDBOX_DASHBOARD_PORT constant (e.g., set DASHBOARD_PORT_RANGE_START = SANDBOX_DASHBOARD_PORT) so the default port is defined in one place; update any exports or references to continue using DASHBOARD_PORT_RANGE_START and ensure SANDBOX_DASHBOARD_PORT is in scope (import or reference the same module) so there’s no duplicated literal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard-command.ts`:
- Around line 92-109: The CLI flag parsing correctly sets controlUiPort but
downstream onboarding still lets process.env.CHAT_UI_URL override it; update the
chatUiUrl construction in onboard.ts so that when the parsed controlUiPort (from
onboard-command.ts) is non-null it takes precedence and is used to build the
chat UI URL, otherwise fall back to process.env.CHAT_UI_URL and then the
existing default; locate the chatUiUrl assignment and replace the current
`process.env.CHAT_UI_URL || ...` precedence with a conditional that uses
controlUiPort first (e.g., check controlUiPort !== null) so the CLI flag cannot
be bypassed by the environment variable.
In `@src/lib/onboard.ts`:
- Around line 6212-6224: Existing forwards owned by the same sandbox are not
cleaned up; after computing existingForwards (from
runCaptureOpenshell(["forward","list"])) and before starting the new forward,
iterate the parsed existing forwards to find entries whose owner equals
sandboxName and call runOpenshell(["forward","stop", port], { ignoreError: true
}) for each stale port (skip stopping the port that will be used as actualPort
if you already handle it separately). Update the block around
existingForwards/actualPort/getDashboardForwardTarget and the runOpenshell
stop/start calls so all forwards owned by sandboxName (except the desired active
port) are stopped before starting the new background forward. Ensure you use the
existing runOpenshell helper, respect ignoreError: true, and reference
sandboxName, existingForwards, actualPort, runOpenshell and runCaptureOpenshell
when locating the code to change.
- Around line 3368-3370: The port-selection logic currently uses CHAT_UI_URL
before the CLI flag and never reads a persisted dashboardPort; change it so
effectivePort follows the desired precedence: if controlUiPort is set use it;
else if process.env.CHAT_UI_URL is set use that URL as chatUiUrl and parse its
port into effectivePort; else if agent?.dashboardPort use that; else if
agent?.forwardPort use that; otherwise fallback to CONTROL_UI_PORT. Ensure
chatUiUrl is the raw CHAT_UI_URL when provided, otherwise build it from the
computed effectivePort. Update references to effectivePort, chatUiUrl,
controlUiPort, process.env.CHAT_UI_URL, agent.dashboardPort and
agent.forwardPort accordingly.
In `@src/lib/registry.ts`:
- Line 25: The registerSandbox routine is building and persisting a sandbox
object but omits the dashboardPort property so any provided dashboardPort is
lost; update the object constructed/returned in registerSandbox to include
dashboardPort (preserve null/undefined as appropriate) and ensure the persisted
payload written by registerSandbox includes the dashboardPort field (update the
create/update payload and any mappings) so the stored sandbox record contains
the dashboardPort value.
In `@test/policies.test.ts`:
- Around line 167-172: The test "local-inference preset restricts binaries to
expected set" in policies.test.ts is missing an assertion for the alternate Node
path; update the test that calls policies.loadPreset("local-inference") to also
assert that the returned content contains "/usr/bin/node" (in addition to the
existing checks for "/usr/local/bin/node", "/usr/local/bin/openclaw", and
"/usr/local/bin/claude") so the allowed apt-based Node binary path is covered.
---
Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 4020-4043: The onboard flow persists actualDashboardPort but still
builds post-onboard URLs from process.env.CHAT_UI_URL / CONTROL_UI_PORT; update
the session state after computing actualDashboardPort so later URL generation
uses the allocated port: set the in-memory chatUiUrl/dashboardUrl variables (or
mutate the onboarding session object) to reflect actualDashboardPort and update
process.env.CONTROL_UI_PORT or process.env.CHAT_UI_URL equivalents so functions
that build the banner/URLs (where chatUiUrl and CONTROL_UI_PORT are used) pick
up the allocated port; locate the code around ensureDashboardForward,
actualDashboardPort, and registry.registerSandbox to perform this propagation.
- Around line 6938-6950: Preflight is still validating the default dashboard
port (18789) before the CLI flag is applied, so move the application of
opts.controlUiPort to run before preflight: ensure the port override
(opts.controlUiPort) is assigned to the same variable or config that preflight
uses (or pass it into the preflight call) so the preflight check uses the
user-specified port instead of the hardcoded 18789; specifically, apply
opts.controlUiPort to the dashboard port variable/config prior to invoking the
preflight routine and keep passing it into createSandbox(...) as currently done.
---
Nitpick comments:
In `@src/lib/inventory-commands.ts`:
- Around line 127-129: Replace the truthy check for sb.dashboardPort with an
explicit null/undefined check to avoid accidental suppression of valid falsy
port values; update the condition in the block that logs the dashboard URL
(currently using "if (sb.dashboardPort)") to "if (sb.dashboardPort != null)" and
apply the same change to the other similar check around the later logging (the
second occurrence at the block referenced by lines ~158-159) so both use
explicit != null checks on sb.dashboardPort.
In `@src/lib/onboard-command.test.ts`:
- Around line 223-290: Add a regression test asserting that CLI
--control-ui-port takes precedence over the CHAT_UI_URL env var: call
parseOnboardArgs (or runOnboardCommand) with env containing CHAT_UI_URL set to a
URL with a different port and provide "--control-ui-port" on the args; then
assert the returned result.controlUiPort equals the CLI port (not the port
parsed from CHAT_UI_URL) and that runOnboardCommand's logged/used port matches
the CLI value; reference parseOnboardArgs and runOnboardCommand to locate where
to add the test.
In `@src/lib/ports.ts`:
- Around line 43-46: The DASHBOARD_PORT_RANGE_START constant duplicates the
SANDBOX_DASHBOARD_PORT numeric literal; change DASHBOARD_PORT_RANGE_START to
derive from the existing SANDBOX_DASHBOARD_PORT constant (e.g., set
DASHBOARD_PORT_RANGE_START = SANDBOX_DASHBOARD_PORT) so the default port is
defined in one place; update any exports or references to continue using
DASHBOARD_PORT_RANGE_START and ensure SANDBOX_DASHBOARD_PORT is in scope (import
or reference the same module) so there’s no duplicated literal.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8b3e74e3-4d7b-462d-8158-a669dade4f64
📒 Files selected for processing (8)
src/lib/inventory-commands.tssrc/lib/onboard-command.test.tssrc/lib/onboard-command.tssrc/lib/onboard.tssrc/lib/ports.tssrc/lib/registry.tstest/onboard.test.tstest/policies.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-messaging-providers.sh`:
- Line 217: The new trap replaces the existing EXIT handler (registered by
sandbox-teardown.sh) and prevents sandbox teardown; preserve and chain the
existing handler instead of overwriting it by reading the current EXIT trap
(trap -p EXIT) into a variable and installing a new EXIT trap that first invokes
the original handler and then performs the policy cleanup (using BASE_POLICY_BAK
and BASE_POLICY), or wrap the cleanup in a function that calls the saved
original handler before running cp "$BASE_POLICY_BAK" "$BASE_POLICY" && rm -f
"$BASE_POLICY_BAK".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: eec121bf-ed86-40f3-8f5d-c40632547d93
📒 Files selected for processing (5)
scripts/nemoclaw-start.shtest/e2e/test-messaging-providers.shtest/local-slack-auth-test.shtest/nemoclaw-start.test.tstest/policies.test.ts
✅ Files skipped from review due to trivial changes (1)
- test/policies.test.ts
- Fix --control-ui-port precedence over CHAT_UI_URL env var - Consult registry dashboardPort for resumed sandboxes - Update chatUiUrl after auto-allocation for correct banner - Pass controlUiPort to preflight port check - Use explicit != null checks for dashboardPort rendering - Derive DASHBOARD_PORT_RANGE_START from constant Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)
2842-2864:⚠️ Potential issue | 🔴 CriticalAuto-allocation never gets a chance in preflight.
_preflightDashboardPortis documented asnull => skip dashboard port check, butconst dashboardPortToCheck = _preflightDashboardPort ?? DASHBOARD_PORT;turns that back into18789. The default path still hard-fails in preflight when18789is busy, so onboarding never reachesensureDashboardForward(). The hard-codedDASHBOARD_PORTchecks below also miss non-default--control-ui-portvalues.Suggested fix
- const dashboardPortToCheck = _preflightDashboardPort ?? DASHBOARD_PORT; + const dashboardPortToCheck = _preflightDashboardPort; const requiredPorts = [ { port: GATEWAY_PORT, label: "OpenShell gateway" }, ...(dashboardPortToCheck !== null ? [{ port: dashboardPortToCheck, label: "NemoClaw dashboard" }] : []), ]; @@ - if ((port === GATEWAY_PORT || port === DASHBOARD_PORT) && gatewayReuseState === "healthy") { + if ( + (port === GATEWAY_PORT || port === dashboardPortToCheck) && + gatewayReuseState === "healthy" + ) { console.log(` ✓ Port ${port} already owned by healthy NemoClaw runtime (${label})`); continue; } @@ - if (port === DASHBOARD_PORT && portCheck.process === "ssh" && portCheck.pid) { + if (port === dashboardPortToCheck && portCheck.process === "ssh" && portCheck.pid) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 2842 - 2864, The preflight logic forces a dashboard port check by computing dashboardPortToCheck = _preflightDashboardPort ?? DASHBOARD_PORT which converts a documented null (meaning "skip check") into the default DASHBOARD_PORT and prevents auto-allocation and non-default --control-ui-port handling; change the logic so that when _preflightDashboardPort is strictly null you do not include a dashboard entry in requiredPorts (i.e. set dashboardPortToCheck to null when _preflightDashboardPort === null, or build requiredPorts conditionally), and ensure checks that compare against DASHBOARD_PORT/GATEWAY_PORT use the actual requested port variable (dashboardPortToCheck or the explicit control-ui port) when deciding the gatewayReuseState and SSH cleanup branches; update references in this block around dashboardPortToCheck, requiredPorts, checkPortAvailable and any comparisons to DASHBOARD_PORT so preflight will skip the dashboard check when _preflightDashboardPort is null and honor non-default control-ui-port values before calling ensureDashboardForward().
♻️ Duplicate comments (1)
src/lib/onboard.ts (1)
6228-6241:⚠️ Potential issue | 🟠 MajorStop this sandbox's old forwards before starting the new one.
If this sandbox already owns a different forwarded port, only stopping
actualPortleaves the old forward behind. That leaks stale dashboard URLs and can eventually consume the whole18789-18799pool.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 6228 - 6241, The code currently only stops the forward at actualPort, leaving any other forwards owned by this sandbox running and leaking ports; modify the flow around runOpenshell/forward to first parse existingForwards (the result of runCaptureOpenshell(["forward", "list"])) and stop any forwards whose owner equals sandboxName and whose port !== String(actualPort) before starting the new forward; reference findAvailableDashboardPort, existingForwards, sandboxName, actualPort and use runOpenshell(["forward","stop", <port>], { ignoreError: true }) for each stale port to ensure all previous forwards for this sandbox are cleaned up.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 6182-6219: The parser in getOccupiedPorts currently treats any
forward list line with a numeric port as occupied, causing stopped/stale
forwards to block ports; update getOccupiedPorts (or switch to the canonical
parser from sandbox-session-state.ts) to only include entries whose status is
"running" (i.e., filter out stopped entries) before mapping port → owner so
findAvailableDashboardPort only considers active forwards within
DASHBOARD_PORT_RANGE_START..END.
- Around line 4034-4037: The local rewrite of chatUiUrl (using
actualDashboardPort and getDashboardForwardPort) doesn't affect what
printDashboard() / getDashboardAccessInfo() output because they read
process.env.CHAT_UI_URL or the default port; update the shared source of truth
instead: after computing actualDashboardPort and chatUiUrl, set
process.env.CHAT_UI_URL (or the persisted dashboardPort variable used by
getDashboardAccessInfo/printDashboard) to reflect the new URL/port so subsequent
calls to getDashboardAccessInfo() and printDashboard() produce links with the
allocated port; reference chatUiUrl, actualDashboardPort,
getDashboardForwardPort, process.env.CHAT_UI_URL, getDashboardAccessInfo, and
printDashboard when making the change.
- Around line 3377-3383: The code picks effectivePort early but discovers
actualDashboardPort only after sandbox creation, causing the UI URL to point to
a port with no listener; before building the sandbox (and before calling
patchStagedDockerfile(), setting sandbox env, and the readiness probe), resolve
and lock the final dashboard port (compute actualDashboardPort using the same
logic that handles forwardPort conflicts and registry.getSandbox(sandboxName)
resolution), then set effectivePort/chatUiUrl to that resolved
actualDashboardPort so patchStagedDockerfile(), sandbox environment variables,
and readiness probe all use the final allocated port rather than a provisional
one; update references to persistedPort, effectivePort, chatUiUrl, and usages in
patchStagedDockerfile()/readiness probe to use this precomputed
actualDashboardPort.
---
Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 2842-2864: The preflight logic forces a dashboard port check by
computing dashboardPortToCheck = _preflightDashboardPort ?? DASHBOARD_PORT which
converts a documented null (meaning "skip check") into the default
DASHBOARD_PORT and prevents auto-allocation and non-default --control-ui-port
handling; change the logic so that when _preflightDashboardPort is strictly null
you do not include a dashboard entry in requiredPorts (i.e. set
dashboardPortToCheck to null when _preflightDashboardPort === null, or build
requiredPorts conditionally), and ensure checks that compare against
DASHBOARD_PORT/GATEWAY_PORT use the actual requested port variable
(dashboardPortToCheck or the explicit control-ui port) when deciding the
gatewayReuseState and SSH cleanup branches; update references in this block
around dashboardPortToCheck, requiredPorts, checkPortAvailable and any
comparisons to DASHBOARD_PORT so preflight will skip the dashboard check when
_preflightDashboardPort is null and honor non-default control-ui-port values
before calling ensureDashboardForward().
---
Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 6228-6241: The code currently only stops the forward at
actualPort, leaving any other forwards owned by this sandbox running and leaking
ports; modify the flow around runOpenshell/forward to first parse
existingForwards (the result of runCaptureOpenshell(["forward", "list"])) and
stop any forwards whose owner equals sandboxName and whose port !==
String(actualPort) before starting the new forward; reference
findAvailableDashboardPort, existingForwards, sandboxName, actualPort and use
runOpenshell(["forward","stop", <port>], { ignoreError: true }) for each stale
port to ensure all previous forwards for this sandbox are cleaned up.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: af741e6b-030e-4647-932d-5dcef0fdf5f0
📒 Files selected for processing (3)
src/lib/inventory-commands.tssrc/lib/onboard.tssrc/lib/ports.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/lib/inventory-commands.ts
- src/lib/ports.ts
|
✨ Thanks for submitting this pull request that proposes a way to fix a bug that prevents first-time config sets under unset namespaces. Related open issues: |
Merge conflicts resolved in src/lib/onboard.ts, test/onboard.test.ts, and test/policies.test.ts. In each case, kept the feature branch's auto-allocate additions while incorporating main's refactors (requirePresetContent helper, regex update, ensureDashboardForward refactoring). CodeRabbit feedback addressed: 1. (Critical) Resolve dashboard port before building sandbox so CHAT_UI_URL baked into the Dockerfile, sandbox env, and readiness probe all use the final forwarded port. Previously effectivePort was provisional and actualDashboardPort was only discovered post-build, causing a host→container port mismatch on the conflict path. 2. Set process.env.CHAT_UI_URL after port allocation so printDashboard and getDashboardAccessInfo see the actual port instead of re-reading a stale env var or the default. 3. Clean up stale forwards owned by the same sandbox on other ports before starting the new forward, preventing forward leaks that exhaust the 18789-18799 range over time. 4. Persist dashboardPort in registerSandbox — the field existed on SandboxEntry but was silently dropped during serialization. 5. Add /usr/bin/node assertion to local-inference preset test. 6. Preserve existing EXIT trap handler in test-messaging-providers.sh so sandbox-teardown.sh cleanup is not silently discarded. 7. Filter stopped forwards in getOccupiedPorts so stale entries don't block port allocation or trigger false "range exhausted" errors. Also skips the header row and adds TypeScript type annotations. 8. Add TypeScript types to getOccupiedPorts, findAvailableDashboardPort, ensureDashboardForward, and controlUiPort parameter/OnboardOptions.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/lib/onboard.ts (3)
6794-6805:⚠️ Potential issue | 🟠 MajorTreat only running forwards as occupied.
getOccupiedPorts()skips"stopped", but it still counts any other status or malformed row as occupied. That can falsely block allocation or trigger a bogus range-exhausted error. The canonical parser insrc/lib/sandbox-session-state.ts:59-92already modelsstatus; this helper should either reuse it or filter tostatus === "running"only.Suggested fix
- const status = (parts[4] || "").toLowerCase(); - if (status === "stopped") continue; + const status = parts.slice(4).join(" ").toLowerCase(); + if (status !== "running") continue; occupied.set(parts[2], parts[0]);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 6794 - 6805, getOccupiedPorts currently treats any non-"stopped" row as occupied and can falsely block ports; update getOccupiedPorts to only mark ports occupied when the parsed status equals "running" (i.e., replace the current check that skips "stopped" with an explicit status === "running" guard), or alternatively reuse the canonical parser from sandbox-session-state.ts (the parser that models status) to determine status and only set occupied.set(parts[2], parts[0]) when status === "running".
3995-3997:⚠️ Potential issue | 🟠 MajorUpdate the shared dashboard URL before these early returns.
These branches persist
dashboardPort, but they never rewriteprocess.env.CHAT_UI_URL.printDashboard()/getDashboardAccessInfo()still read from that env var (or fall back to18789), so reused sandboxes can still print the wrong URL after auto-allocation.Suggested fix
const reusedPort = ensureDashboardForward(sandboxName, chatUiUrl); + const parsedChatUiUrl = new URL( + chatUiUrl.includes("://") ? chatUiUrl : `http://${chatUiUrl}`, + ); + parsedChatUiUrl.port = String(reusedPort); + process.env.CHAT_UI_URL = parsedChatUiUrl.toString(); registry.updateSandbox(sandboxName, { dashboardPort: reusedPort }); return sandboxName;Apply the same rewrite in each early-return branch above.
Also applies to: 4025-4027, 4070-4072, 4087-4089
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 3995 - 3997, The branches that call ensureDashboardForward(sandboxName, chatUiUrl) and then registry.updateSandbox(sandboxName, { dashboardPort: reusedPort }) return early without updating the shared CHAT_UI_URL, so printDashboard/getDashboardAccessInfo still read the old env var; fix by setting process.env.CHAT_UI_URL to the new chatUiUrl (or to the computed host:port value) immediately after registry.updateSandbox in each early-return branch (e.g., the branches that call ensureDashboardForward and return sandboxName), and apply the same change to the other similar early-return branches that update dashboardPort so the environment always reflects the reused sandbox URL.
3830-3843:⚠️ Potential issue | 🟠 MajorHonor
CHAT_UI_URLwhen deriving the effective dashboard port.This still ignores the env URL's port when computing
preferredPort/effectivePort. IfCHAT_UI_URLishttp://127.0.0.1:19000, the Dockerfile/env get19000but the allocator and readiness probe still operate on the persisted/default port, so the sandbox can start on one port while onboarding probes another.Suggested fix
- const persistedPort = registry.getSandbox(sandboxName)?.dashboardPort ?? null; - const preferredPort = controlUiPort ?? persistedPort ?? (agent ? agent.forwardPort : CONTROL_UI_PORT); + const persistedPort = registry.getSandbox(sandboxName)?.dashboardPort ?? null; + const baseChatUiUrl = + process.env.CHAT_UI_URL || + `http://127.0.0.1:${persistedPort ?? (agent ? agent.forwardPort : CONTROL_UI_PORT)}`; + const parsedChatUiUrl = new URL( + baseChatUiUrl.includes("://") ? baseChatUiUrl : `http://${baseChatUiUrl}`, + ); + if (controlUiPort != null) { + parsedChatUiUrl.port = String(controlUiPort); + } + const preferredPort = Number(parsedChatUiUrl.port || CONTROL_UI_PORT); const earlyForwards = runCaptureOpenshell(["forward", "list"], { ignoreError: true }); const effectivePort = findAvailableDashboardPort(sandboxName, preferredPort, earlyForwards); if (effectivePort !== preferredPort) { console.warn(` ! Port ${preferredPort} is taken. Using port ${effectivePort} instead.`); } - let chatUiUrl = - controlUiPort != null - ? `http://127.0.0.1:${effectivePort}` - : process.env.CHAT_UI_URL || `http://127.0.0.1:${effectivePort}`; + parsedChatUiUrl.port = String(effectivePort); + let chatUiUrl = parsedChatUiUrl.toString();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 3830 - 3843, The code ignores the port in process.env.CHAT_UI_URL when computing preferredPort/effectivePort, causing probes and allocator to use a different port than CHAT_UI_URL; update the logic in onboard.ts so you parse the port from process.env.CHAT_UI_URL (e.g., new URL(process.env.CHAT_UI_URL).port) into a variable (envPort), then include that envPort in the preferredPort selection (change preferredPort = controlUiPort ?? envPort ?? persistedPort ?? (agent ? agent.forwardPort : CONTROL_UI_PORT)); keep using findAvailableDashboardPort(sandboxName, preferredPort, earlyForwards) and leave chatUiUrl assignment to use controlUiPort or process.env.CHAT_UI_URL as before so effectivePort and the URL are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3280-3289: The preflight uses DASHBOARD_PORT because
dashboardPortToCheck is set with nullish coalescing (_preflightDashboardPort ??
DASHBOARD_PORT), which forces a check even when no explicit port was provided;
change the assignment so dashboardPortToCheck is null when no explicit port was
supplied (e.g., set dashboardPortToCheck = _preflightDashboardPort === undefined
? null : _preflightDashboardPort) so the requiredPorts array will skip the
dashboard check in auto-allocation mode and allow
findAvailableDashboardPort()/ensureDashboardForward to pick a free port.
---
Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 6794-6805: getOccupiedPorts currently treats any non-"stopped" row
as occupied and can falsely block ports; update getOccupiedPorts to only mark
ports occupied when the parsed status equals "running" (i.e., replace the
current check that skips "stopped" with an explicit status === "running" guard),
or alternatively reuse the canonical parser from sandbox-session-state.ts (the
parser that models status) to determine status and only set
occupied.set(parts[2], parts[0]) when status === "running".
- Around line 3995-3997: The branches that call
ensureDashboardForward(sandboxName, chatUiUrl) and then
registry.updateSandbox(sandboxName, { dashboardPort: reusedPort }) return early
without updating the shared CHAT_UI_URL, so
printDashboard/getDashboardAccessInfo still read the old env var; fix by setting
process.env.CHAT_UI_URL to the new chatUiUrl (or to the computed host:port
value) immediately after registry.updateSandbox in each early-return branch
(e.g., the branches that call ensureDashboardForward and return sandboxName),
and apply the same change to the other similar early-return branches that update
dashboardPort so the environment always reflects the reused sandbox URL.
- Around line 3830-3843: The code ignores the port in process.env.CHAT_UI_URL
when computing preferredPort/effectivePort, causing probes and allocator to use
a different port than CHAT_UI_URL; update the logic in onboard.ts so you parse
the port from process.env.CHAT_UI_URL (e.g., new
URL(process.env.CHAT_UI_URL).port) into a variable (envPort), then include that
envPort in the preferredPort selection (change preferredPort = controlUiPort ??
envPort ?? persistedPort ?? (agent ? agent.forwardPort : CONTROL_UI_PORT)); keep
using findAvailableDashboardPort(sandboxName, preferredPort, earlyForwards) and
leave chatUiUrl assignment to use controlUiPort or process.env.CHAT_UI_URL as
before so effectivePort and the URL are consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 082f3b3f-825d-4cc1-bcb4-cbc2e864e83c
📒 Files selected for processing (7)
nemoclaw-blueprint/scripts/ws-proxy-fix.jssrc/lib/onboard-command.test.tssrc/lib/onboard.tssrc/lib/registry.tstest/e2e/test-messaging-providers.shtest/onboard.test.tstest/policies.test.ts
✅ Files skipped from review due to trivial changes (1)
- nemoclaw-blueprint/scripts/ws-proxy-fix.js
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/onboard-command.test.ts
…ctive When no explicit --control-ui-port is set, the preflight check was still falling back to DASHBOARD_PORT (18789), which would fail if that port was already taken by another sandbox — before findAvailableDashboardPort ever got a chance to auto-allocate a free port. This reintroduced the #2174 regression for the common multi-sandbox case. Fix: default dashboardPortToCheck to null instead of DASHBOARD_PORT so the preflight skips the dashboard port check entirely when auto-allocation is active. The explicit --control-ui-port path still checks the user-specified port.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/lib/onboard.ts (2)
3995-3997:⚠️ Potential issue | 🟠 MajorUpdate the shared dashboard URL before these early returns.
These branches persist
dashboardPort, but they never normalizechatUiUrl/process.env.CHAT_UI_URLto the port returned byensureDashboardForward().printDashboard()will still read the old env/default value and can announce the wrong URL for reuse and backup-abort flows.Also applies to: 4025-4027, 4070-4072, 4087-4089
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 3995 - 3997, The early-return branches call ensureDashboardForward(sandboxName, chatUiUrl) and registry.updateSandbox(sandboxName, { dashboardPort: reusedPort }) but never normalize the shared CHAT_UI_URL, so printDashboard reads the old URL; after calling ensureDashboardForward and before returning from functions that reuse/backup-abort, update the normalized chat UI URL (e.g. reconstruct chatUiUrl with the returned reusedPort and set process.env.CHAT_UI_URL or the module-level shared variable) so printDashboard sees the correct URL; apply the same change around the other reuse branches that call ensureDashboardForward/registry.updateSandbox (the occurrences around the reusedPort returns).
3830-3843:⚠️ Potential issue | 🟠 MajorKeep
effectivePortandCHAT_UI_URLon the same port.When
CHAT_UI_URLis set and--control-ui-portis not, Line 3843 keeps the env URL, but Lines 3834-3836 still resolveeffectivePortfrom the registry/default path. The image is then built with one port while Line 4490 probes another, so onboarding can wait on a listener that will never exist.Also applies to: 4488-4491
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 3830 - 3843, The code can end up building the image with one port and using a different port for readiness because preferredPort is chosen without considering process.env.CHAT_UI_URL; ensure CHAT_UI_URL and effectivePort stay in sync by parsing a port from process.env.CHAT_UI_URL when --control-ui-port is not provided and using that port as part of preferredPort (so findAvailableDashboardPort(..., preferredPort, ...) uses the same port), or alternatively always construct chatUiUrl from the computed effectivePort; update the logic around preferredPort, findAvailableDashboardPort, effectivePort and chatUiUrl to pick the CHAT_UI_URL port when present (and fall back to registry.getSandbox(...).dashboardPort, agent.forwardPort, CONTROL_UI_PORT) so the built image and readiness probe use the same final port.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 6815-6835: findAvailableDashboardPort currently trusts openshell
"forward list" (via getOccupiedPorts and forwardListOutput) and can return ports
already bound on the host; change the allocator to validate host-port
availability before selecting or returning a port. Specifically, in
findAvailableDashboardPort (and the related ensureDashboardForward usage), after
determining a candidate port (preferredPort or loop candidate between
DASHBOARD_PORT_RANGE_START and DASHBOARD_PORT_RANGE_END) perform a real host
check (e.g., attempt a short-lived bind/listen or OS-level port probe) and skip
any candidate that fails the host-availability check; only return or persist a
port that passes both the occupied-owner check and the host-port-availability
check, and include this validation when populating the owners/error message if
no usable port is found.
---
Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 3995-3997: The early-return branches call
ensureDashboardForward(sandboxName, chatUiUrl) and
registry.updateSandbox(sandboxName, { dashboardPort: reusedPort }) but never
normalize the shared CHAT_UI_URL, so printDashboard reads the old URL; after
calling ensureDashboardForward and before returning from functions that
reuse/backup-abort, update the normalized chat UI URL (e.g. reconstruct
chatUiUrl with the returned reusedPort and set process.env.CHAT_UI_URL or the
module-level shared variable) so printDashboard sees the correct URL; apply the
same change around the other reuse branches that call
ensureDashboardForward/registry.updateSandbox (the occurrences around the
reusedPort returns).
- Around line 3830-3843: The code can end up building the image with one port
and using a different port for readiness because preferredPort is chosen without
considering process.env.CHAT_UI_URL; ensure CHAT_UI_URL and effectivePort stay
in sync by parsing a port from process.env.CHAT_UI_URL when --control-ui-port is
not provided and using that port as part of preferredPort (so
findAvailableDashboardPort(..., preferredPort, ...) uses the same port), or
alternatively always construct chatUiUrl from the computed effectivePort; update
the logic around preferredPort, findAvailableDashboardPort, effectivePort and
chatUiUrl to pick the CHAT_UI_URL port when present (and fall back to
registry.getSandbox(...).dashboardPort, agent.forwardPort, CONTROL_UI_PORT) so
the built image and readiness probe use the same final port.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 33651536-aecb-4124-93c3-2ffed53955ab
📒 Files selected for processing (1)
src/lib/onboard.ts
…rd port - getOccupiedPorts: filter to status === "running" only, not just skip "stopped" — prevents unknown/malformed statuses from falsely blocking ports - findAvailableDashboardPort: add host-port availability check via lsof so ports bound by non-OpenShell processes are skipped during auto-allocation - Parse CHAT_UI_URL env port into preferredPort precedence chain so the allocator, Dockerfile, and readiness probe all use the same port - Build chatUiUrl from effectivePort while preserving CHAT_UI_URL hostname so remote-access URLs stay correct after auto-allocation - Set process.env.CHAT_UI_URL before returning from all 4 early-return branches (reuse, interactive-reuse, backup-abort, backup-throw) so printDashboard/getDashboardAccessInfo see the allocated port - Add regression test: --control-ui-port parsed independently of CHAT_UI_URL
Addressing remaining CodeRabbit review commentsCommit 619d182 resolves all open CodeRabbit findings from the latest review rounds. Here's the mapping: Inline comment (new)
Duplicate comments (repeated across 3 review rounds)
Nitpick comments
|
PR Review: feat(onboard): auto-allocate dashboard port for multi-sandboxFiles Changed: 9 | Lines: +304 / -43 🔴 Blockers (must fix before merge)1. Duplicate PR #2454 — merge conflict guaranteed PR #2454 by @cjagwani implements the identical feature (auto-allocate dashboard port, Decision needed: Which PR to land? #2411 was opened first, but #2454 has more unit tests (6 port-allocation tests, 2 inventory tests) and extracts 2. Double port-conflict warning —
Suggested fix: Have 🟡 Warnings (should fix)1. This follows the existing 2.
3. Missing unit tests for port allocation logic The onboard-command tests cover CLI flag parsing well (5 new tests), but 4. 🔵 Suggestions (nice to have)1. Extract port helpers into
2. Status format — 3. Port range size — ✅ What's Good
|
…e-dashboard-port # Conflicts: # src/lib/onboard-command.ts
## Summary Refreshes the 0.0.29 documentation for user-facing changes merged in the past 24 hours. Version metadata stays on `0.0.29`. ## Changes - `docs/get-started/quickstart.md`, `docs/reference/commands.md`, and `docs/reference/troubleshooting.md`: Document dashboard port auto-allocation, `--control-ui-port`, and `nemoclaw list` dashboard URL output from [#2411](#2411). - `docs/inference/inference-options.md` and `docs/inference/switch-inference-providers.md`: Document local Ollama and local vLLM credential isolation from `OPENAI_API_KEY` from [#2580](#2580). - `docs/inference/inference-options.md`: Document Local NVIDIA NIM validation behavior from [#2505](#2505). - `docs/reference/commands.md`: Document the cloud-only NIM status display behavior from [#2622](#2622). - `docs/deployment/deploy-to-remote-gpu.md`: Clarify runtime propagation for `NEMOCLAW_PROXY_HOST` and `NEMOCLAW_PROXY_PORT` from [#2581](#2581). - `docs/workspace/backup-restore.md`: Document snapshot restore symlink handling for sandbox data paths from [#2488](#2488). - `docs/reference/commands.md`: Document `skill install --help` and OpenClaw plugin-shaped directory guidance from [#2585](#2585). ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [x] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure - [x] AI-assisted — tool: Codex --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Added `--control-ui-port` flag for explicit dashboard port control * Implemented automatic port selection (18789–18799) when the default port is occupied * Clarified that local inference routes (Ollama, local vLLM) don't require `OPENAI_API_KEY` * Improved dashboard URL display in list and status commands * Enhanced symlink handling in workspace backup restoration * Updated multi-sandbox quickstart and troubleshooting guidance <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…2174 followup) (#2627) ## Summary Followup to #2411 (merged). Closes the second half of #2174's title — "leaks ghost sandbox" — that the merged fix didn't address. When `findAvailableDashboardPort` exhausts the port range it `throw`s. By the time `ensureDashboardForward` runs (`src/lib/onboard.ts:3830`), the openshell sandbox has already been streamed into existence (`streamSandboxCreate` at line 3734, ready-wait at lines 3789–3803). The throw escapes unhandled, leaving the sandbox in `openshell sandbox list` with no NemoClaw registry entry — exactly the drift reported in #2174. Issue comment with the analysis: #2174 (comment) ## Related Issue Followup to #2174 (closed by #2411). #2411 fixed the crash; this PR fixes the leak. ## Changes - `src/lib/onboard.ts` — `ensureDashboardForward` takes a new `{ rollbackSandboxOnFailure?: boolean }` option. On unrecoverable port-allocation failure (range exhaustion), the just-created openshell sandbox is deleted, an actionable error is printed, and the process exits with code 1 — mirroring the not-ready rollback pattern already in place at lines 3789–3803. - The post-create call site (line 3830) opts in (`rollbackSandboxOnFailure: true`). - The four reuse-path call sites (lines 3295, 3326, 3372, 3390) keep the default (`false`) — they operate on already-existing sandboxes where deleting on alloc failure would destroy user state. ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - [x] `npm run build:cli` + `npm run typecheck:cli` green - [x] `npx vitest run --project cli` — 2250/2250 passed - [x] `npx vitest run --project plugin` — 410/410 passed - [x] `npx prek run --files src/lib/onboard.ts` — green - [x] No new tests added: the rollback path mirrors the existing not-ready rollback (3789–3803), which also has no dedicated unit test. Adding test infrastructure here would exceed the minimum scope of this followup. - [x] No secrets, API keys, or credentials committed ## Notes for reviewer - Diff is ~35 lines on a single file. Surgical fix matching an existing pattern in the same function. - Doc comment on `ensureDashboardForward` updated to describe the new option and reference the pattern it mirrors. ## AI Disclosure - [x] AI-assisted — tool: Claude Code Signed-off-by: Charan Jagwani <cjagwani@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Sandbox initialization now attempts automatic rollback if dashboard port allocation fails, and exits safely when rollback succeeds. * When automatic cleanup fails, users are shown clear manual-cleanup instructions and the exact command to remove the orphaned sandbox. * **Tests** * Added runtime tests verifying rollback messaging covers both successful and failed cleanup scenarios and accurate sandbox naming in instructions. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
) ## Summary Adds two regression assertions to Phase 4 of `test-double-onboard.sh` that verify the #2174 auto-allocation fix works end-to-end: 1. **Log check** — Second-sandbox onboard output contains `"is taken. Using port"` (confirms `ensureDashboardForward` auto-allocated) 2. **List check** — `nemoclaw list` shows two distinct `dashboard: http://127.0.0.1:<port>` entries for both sandboxes ## Related - Guards against regression of #2174 (fix landed via #2411 + #2627) - Supersedes #2455 (which had a stale assertion string `"allocated port"` that doesn't match any runtime output, and conflicting workflow changes already on main) ## Changes - `test/e2e/test-double-onboard.sh` — 21 lines added between the existing #849 regression check and Phase 5 ## Why this supersedes #2455 | Issue | #2455 | This PR | |-------|-------|---------| | Assertion string | `"allocated port"` (doesn't exist in codebase) | `"is taken. Using port"` (matches `ensureDashboardForward` line ~6600) | | Workflow changes | Adds a `double-onboard-e2e` job that already exists on main → merge conflict | None needed — job already runs this script | | Branch freshness | 95 commits behind main | Based on current main | ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - [x] `bash -n test/e2e/test-double-onboard.sh` — syntax check passes - [x] shellcheck clean - [x] No workflow changes needed — `double-onboard-e2e` nightly job already runs this script - [x] No secrets, API keys, or credentials committed ## AI Disclosure - [x] AI-assisted — tool: Claude (pi agent) --- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added end-to-end test assertions for multi-sandbox onboarding scenarios * Validates dashboard port auto-allocation prevents conflicts and maintains isolation across concurrent sandbox environments <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
…#2411) ## Summary Fixes NVIDIA#2174. When onboarding a second sandbox while the first holds port 18789, the dashboard forward now auto-allocates the next free port instead of crashing. ## Related Issue Fixes NVIDIA#2174 ## Changes - **`--control-ui-port <N>` CLI flag**: Explicit port override (takes precedence over `CHAT_UI_URL` env var and defaults) - **Auto-allocation**: When the preferred port is taken by another sandbox, scans 18789-18799 for the next free port and warns the user - **Registry persistence**: `dashboardPort` stored in `sandboxes.json` so it survives across sessions and `--resume` flows - **`nemoclaw list`**: Shows `dashboard: http://127.0.0.1:{port}/` per sandbox - **`nemoclaw status`**: Shows `:port` suffix per sandbox line - **Range exhaustion**: Throws with actionable error listing each port's owner and suggesting `--control-ui-port` with a port outside the range - **Pre-existing test fix**: Updated `local-inference` preset test to match current YAML (node binary was added) ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes (CLI project: 1965/1965) - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed ## AI Disclosure - [x] AI-assisted — tool: Claude Code --- Signed-off-by: Aaron Erickson <aerickson@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Added `--control-ui-port` CLI flag to specify the dashboard port during sandbox onboarding * Dashboard port information now displayed in `nemoclaw status` and `nemoclaw list` commands * **Improvements** * Enhanced dashboard port allocation to automatically find the next available port instead of failing when the preferred port is occupied * Improved port management and persistence across sandbox operations <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>
## Summary Refreshes the 0.0.29 documentation for user-facing changes merged in the past 24 hours. Version metadata stays on `0.0.29`. ## Changes - `docs/get-started/quickstart.md`, `docs/reference/commands.md`, and `docs/reference/troubleshooting.md`: Document dashboard port auto-allocation, `--control-ui-port`, and `nemoclaw list` dashboard URL output from [NVIDIA#2411](NVIDIA#2411). - `docs/inference/inference-options.md` and `docs/inference/switch-inference-providers.md`: Document local Ollama and local vLLM credential isolation from `OPENAI_API_KEY` from [NVIDIA#2580](NVIDIA#2580). - `docs/inference/inference-options.md`: Document Local NVIDIA NIM validation behavior from [NVIDIA#2505](NVIDIA#2505). - `docs/reference/commands.md`: Document the cloud-only NIM status display behavior from [NVIDIA#2622](NVIDIA#2622). - `docs/deployment/deploy-to-remote-gpu.md`: Clarify runtime propagation for `NEMOCLAW_PROXY_HOST` and `NEMOCLAW_PROXY_PORT` from [NVIDIA#2581](NVIDIA#2581). - `docs/workspace/backup-restore.md`: Document snapshot restore symlink handling for sandbox data paths from [NVIDIA#2488](NVIDIA#2488). - `docs/reference/commands.md`: Document `skill install --help` and OpenClaw plugin-shaped directory guidance from [NVIDIA#2585](NVIDIA#2585). ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [x] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure - [x] AI-assisted — tool: Codex --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Added `--control-ui-port` flag for explicit dashboard port control * Implemented automatic port selection (18789–18799) when the default port is occupied * Clarified that local inference routes (Ollama, local vLLM) don't require `OPENAI_API_KEY` * Improved dashboard URL display in list and status commands * Enhanced symlink handling in workspace backup restoration * Updated multi-sandbox quickstart and troubleshooting guidance <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…VIDIA#2174 followup) (NVIDIA#2627) ## Summary Followup to NVIDIA#2411 (merged). Closes the second half of NVIDIA#2174's title — "leaks ghost sandbox" — that the merged fix didn't address. When `findAvailableDashboardPort` exhausts the port range it `throw`s. By the time `ensureDashboardForward` runs (`src/lib/onboard.ts:3830`), the openshell sandbox has already been streamed into existence (`streamSandboxCreate` at line 3734, ready-wait at lines 3789–3803). The throw escapes unhandled, leaving the sandbox in `openshell sandbox list` with no NemoClaw registry entry — exactly the drift reported in NVIDIA#2174. Issue comment with the analysis: NVIDIA#2174 (comment) ## Related Issue Followup to NVIDIA#2174 (closed by NVIDIA#2411). NVIDIA#2411 fixed the crash; this PR fixes the leak. ## Changes - `src/lib/onboard.ts` — `ensureDashboardForward` takes a new `{ rollbackSandboxOnFailure?: boolean }` option. On unrecoverable port-allocation failure (range exhaustion), the just-created openshell sandbox is deleted, an actionable error is printed, and the process exits with code 1 — mirroring the not-ready rollback pattern already in place at lines 3789–3803. - The post-create call site (line 3830) opts in (`rollbackSandboxOnFailure: true`). - The four reuse-path call sites (lines 3295, 3326, 3372, 3390) keep the default (`false`) — they operate on already-existing sandboxes where deleting on alloc failure would destroy user state. ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - [x] `npm run build:cli` + `npm run typecheck:cli` green - [x] `npx vitest run --project cli` — 2250/2250 passed - [x] `npx vitest run --project plugin` — 410/410 passed - [x] `npx prek run --files src/lib/onboard.ts` — green - [x] No new tests added: the rollback path mirrors the existing not-ready rollback (3789–3803), which also has no dedicated unit test. Adding test infrastructure here would exceed the minimum scope of this followup. - [x] No secrets, API keys, or credentials committed ## Notes for reviewer - Diff is ~35 lines on a single file. Surgical fix matching an existing pattern in the same function. - Doc comment on `ensureDashboardForward` updated to describe the new option and reference the pattern it mirrors. ## AI Disclosure - [x] AI-assisted — tool: Claude Code Signed-off-by: Charan Jagwani <cjagwani@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Sandbox initialization now attempts automatic rollback if dashboard port allocation fails, and exits safely when rollback succeeds. * When automatic cleanup fails, users are shown clear manual-cleanup instructions and the exact command to remove the orphaned sandbox. * **Tests** * Added runtime tests verifying rollback messaging covers both successful and failed cleanup scenarios and accurate sandbox naming in instructions. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
…rd (NVIDIA#2709) ## Summary Adds two regression assertions to Phase 4 of `test-double-onboard.sh` that verify the NVIDIA#2174 auto-allocation fix works end-to-end: 1. **Log check** — Second-sandbox onboard output contains `"is taken. Using port"` (confirms `ensureDashboardForward` auto-allocated) 2. **List check** — `nemoclaw list` shows two distinct `dashboard: http://127.0.0.1:<port>` entries for both sandboxes ## Related - Guards against regression of NVIDIA#2174 (fix landed via NVIDIA#2411 + NVIDIA#2627) - Supersedes NVIDIA#2455 (which had a stale assertion string `"allocated port"` that doesn't match any runtime output, and conflicting workflow changes already on main) ## Changes - `test/e2e/test-double-onboard.sh` — 21 lines added between the existing NVIDIA#849 regression check and Phase 5 ## Why this supersedes NVIDIA#2455 | Issue | NVIDIA#2455 | This PR | |-------|-------|---------| | Assertion string | `"allocated port"` (doesn't exist in codebase) | `"is taken. Using port"` (matches `ensureDashboardForward` line ~6600) | | Workflow changes | Adds a `double-onboard-e2e` job that already exists on main → merge conflict | None needed — job already runs this script | | Branch freshness | 95 commits behind main | Based on current main | ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - [x] `bash -n test/e2e/test-double-onboard.sh` — syntax check passes - [x] shellcheck clean - [x] No workflow changes needed — `double-onboard-e2e` nightly job already runs this script - [x] No secrets, API keys, or credentials committed ## AI Disclosure - [x] AI-assisted — tool: Claude (pi agent) --- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added end-to-end test assertions for multi-sandbox onboarding scenarios * Validates dashboard port auto-allocation prevents conflicts and maintains isolation across concurrent sandbox environments <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Summary
Fixes #2174. When onboarding a second sandbox while the first holds port 18789, the dashboard forward now auto-allocates the next free port instead of crashing.
Related Issue
Fixes #2174
Changes
--control-ui-port <N>CLI flag: Explicit port override (takes precedence overCHAT_UI_URLenv var and defaults)dashboardPortstored insandboxes.jsonso it survives across sessions and--resumeflowsnemoclaw list: Showsdashboard: http://127.0.0.1:{port}/per sandboxnemoclaw status: Shows:portsuffix per sandbox line--control-ui-portwith a port outside the rangelocal-inferencepreset test to match current YAML (node binary was added)Type of Change
Verification
npx prek run --all-filespassesnpm testpasses (CLI project: 1965/1965)AI Disclosure
Signed-off-by: Aaron Erickson aerickson@nvidia.com
Summary by CodeRabbit
Release Notes
New Features
--control-ui-portCLI flag to specify the dashboard port during sandbox onboardingnemoclaw statusandnemoclaw listcommandsImprovements