fix(onboard): auto-allocate dashboard port on multi-sandbox conflict by cjagwani · Pull Request #2444 · NVIDIA/NemoClaw

cjagwani · 2026-04-24T18:38:22Z

Summary

Second nemoclaw onboard previously crashed mid-flight when port 18789 was held by the first sandbox, blocking 12 P0/P1 QA tests that keep a baseline sandbox alive while onboarding a short-lived second one. This PR auto-allocates the next free port on loopback conflict, persists it, surfaces it via list/status, and adds a --control-ui-port flag for pinning. Also closes the "leaks ghost sandbox" path by rolling back the openshell sandbox if forward setup or registry write throws.

Related Issue

Fixes #2174

Changes

src/lib/ports.ts — findFreeDashboardPort picker with 10-port window and injectable probe seam. Clamped to 1024-65535 with boundary guard. 6 unit tests.
src/lib/onboard.ts — ensureDashboardForward auto-allocates on loopback conflict and returns the chosen port; throws only on user-pinned CHAT_UI_URL or window exhaustion. Rollback try/catch around forward + registry deletes the openshell sandbox on failure so nemoclaw list and openshell sandbox list no longer drift. Picker uses a real lsof bind probe. Reuse paths seed chatUiUrl from the sandbox's stored dashboardPort via reuseChatUiUrlFor so re-onboarding an auto-allocated sandbox doesn't ratchet its port upward.
src/lib/registry.ts — dashboardPort?: number on SandboxEntry, persisted in the stored entry by registerSandbox.
src/lib/inventory-commands.ts — dashboard URL in nemoclaw list and nemoclaw status. 2 unit tests.
src/nemoclaw.ts — dashboard URL in nemoclaw <name> status.
src/lib/onboard-command.ts — --control-ui-port <n> flag (precedence: flag > CHAT_UI_URL env > stored port > auto-alloc > default). 2 unit tests.
src/lib/onboard.ts (dashboard display) — getDashboardAccessInfo and getDashboardGuidanceLines read the stored dashboardPort as a precedence tier so the onboard completion banner shows the actually-allocated port, not the default.
test/e2e/test-double-onboard.sh — three new assertions in the existing two-sandbox phase: second onboard must log "allocated port", and nemoclaw list must show two distinct dashboard URLs.
.github/workflows/nightly-e2e.yaml — new double-onboard-e2e job wires the script into nightly so regressions land within 24h.

Verification

Unit suite: 10 new tests, 1829 total passing locally (picker × 6, list/status display × 2, CLI flag × 2).

E2E: test/e2e/test-double-onboard.sh now asserts the #2174 auto-alloc path end-to-end. Runs nightly on an Ubuntu runner against a fake OpenAI endpoint — no NVIDIA_API_KEY needed. Regressions in the two-sandbox dance will auto-file a GitHub issue via the existing notify-on-failure job.

Manual repro: two back-to-back nemoclaw onboard --non-interactive — first takes 18789, second logs Dashboard port 18789 in use by 'alpha'; allocated port 18790 for 'beta', nemoclaw <name> status prints each port, both dashboards reachable.

Type of Change

Code change (feature, bug fix, or refactor)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed

Notes for reviewer

Net diff is the [All Platform][Onboard][Regression] Second onboard crashes with unhandled exception and leaks ghost sandbox when dashboard port 18789 is already forwarded #2174 changes. Branch history includes three commits (timeout bump + revert) from an investigation into an unrelated local-machine flake that turned out to be a stuck OneDrive sync on my Mac, not a codebase bug. The flake commits cancel out.
The picker's default 10-port window (18789–18798) matches the issue's "18790 or higher" hint. Exhaustion throws with an actionable error listing which sandbox holds each port.
ensureDashboardForward throws rather than process.exit(1) (unlike refactor(cli): extract dashboard delivery chain into contract/health/recover modules #2398's pattern) because the ghost-rollback try/catch in createSandbox needs the exception to know it should clean up.
Picker uses lsof for the bind probe so auto-alloc avoids ports bound by non-openshell processes (Docker, etc.). Falls back gracefully if lsof is missing.

AI Disclosure

AI-assisted — tool: Claude Code

Signed-off-by: Charan Jagwani cjagwani@nvidia.com

Root cause: install.sh sources scripts/install.sh (1374-line payload), which runs resolve_installer_version() → resolve_repo_root() → git describe at source time. In isolation the test takes ~0.9s. Under vitest's default parallelism on macOS (up to 14 workers), fork()/execve() serializes and each subprocess spawn balloons 50-200x, pushing total runtime to 5-7s and blowing past the default 5000ms testTimeout. The default timeout is correct for every other test in the file; this one has a legitimately different workload (real bash + real subprocess chain). Bumping its per-test timeout is the minimum-scope fix; caching the payload's init subshells in scripts/install.sh is the general improvement and is tracked for follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

Second onboard previously crashed mid-flight when port 18789 was held by the first sandbox, blocking 12 P0/P1 QA tests that keep a baseline sandbox alive while onboarding a short-lived second one. The guard from v0.0.21 fixed silent forward stealing but left no recovery path — users had to set CHAT_UI_URL manually and the assigned port was not discoverable post-hoc. Changes: - src/lib/ports.ts: add findFreeDashboardPort picker with 10-port window and injectable probe seam for tests. - src/lib/onboard.ts: ensureDashboardForward now auto-allocates on loopback conflict and returns the chosen port; throws only on user-pinned CHAT_UI_URL or window exhaustion. Rollback try/catch around the forward + registry block deletes the openshell sandbox on failure so nemoclaw list and openshell sandbox list no longer drift. - src/lib/registry.ts: persist dashboardPort on SandboxEntry so the assigned port survives across CLI invocations. - src/lib/inventory-commands.ts: surface dashboard URL in nemoclaw list and nemoclaw status. - src/nemoclaw.ts: surface dashboard URL in nemoclaw <name> status. - src/lib/onboard-command.ts: add --control-ui-port <n> flag with precedence flag > CHAT_UI_URL env > auto-alloc > default. Fixes #2174 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

coderabbitai · 2026-04-24T18:38:33Z

📝 Walkthrough

Walkthrough

Adds CLI flag and runtime logic to select, persist, and surface a local dashboard port during sandbox onboarding; implements port-finding/probing, updates registry and display commands, and adds tests and e2e coverage for allocated dashboard URLs. Rollback/cleanup is added on onboarding failure.

Changes

Cohort / File(s)	Summary
Port discovery `src/lib/ports.ts`, `src/lib/ports.test.ts`	New `PortProbe` abstraction and `findFreeDashboardPort(preferred, { probe })` which validates preferred port (1024–65535), scans a 10-port window, respects existing forwards and OS probes, and returns an available port or `null`. Tests added for selection logic and edge cases.
Onboard command CLI `src/lib/onboard-command.ts`, `src/lib/onboard-command.test.ts`	Adds `--control-ui-port <n>` parsing/validation (1024–65535). `OnboardCommandOptions` gains `controlUiPort: number
Onboard core & forwarding `src/lib/onboard.ts`	Refactors dashboard forwarding: `ensureDashboardForward` can choose/return an effective allocated port, avoids stealing user-pinned or non-loopback ports, consults OS/listeners, and returns the actual port. `createSandbox` now records `dashboardPort`, persists it via `registry.updateSandbox`, and performs rollback (stop forward, remove registry entry, delete sandbox) on post-creation failures.
Registry / model `src/lib/registry.ts`	`SandboxEntry` gains optional `dashboardPort?: number`. `registerSandbox`/persistence now validate and persist `dashboardPort` only when integer within 1024–65535.
Display & CLI output `src/lib/inventory-commands.ts`, `src/lib/inventory-commands.test.ts`, `src/nemoclaw.ts`	`listSandboxesCommand` and `showStatusCommand` print `dashboard: http://127.0.0.1:<port>` when sandbox has numeric `dashboardPort`. Tests added to assert dashboard URL rendering. `sandboxStatus` includes same output line.
E2E workflow & script `.github/workflows/nightly-e2e.yaml`, `test/e2e/test-double-onboard.sh`	New nightly E2E job `double-onboard-e2e` added; e2e script now asserts two distinct dashboard ports after double onboard and extracts ports from `nemoclaw list`. Failure artifact upload added.

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as "Onboard CLI"
    participant Onboard as "Onboard Core"
    participant Port as "findFreeDashboardPort"
    participant OS as "OS / probe / lsof"
    participant Shell as "OpenShell (forward)"
    participant Registry as "Registry"

    User->>CLI: nemoclaw onboard [--control-ui-port N]
    CLI->>CLI: parse & validate controlUiPort
    CLI->>Onboard: runOnboardCommand(controlUiPort)
    Onboard->>Port: findFreeDashboardPort(preferred)
    Port->>OS: list forwarded ports & probe availability
    OS-->>Port: existing forwards / probe results
    Port-->>Onboard: effectivePort (or null)
    Onboard->>Shell: openshell forward start effectivePort <sandbox>
    Shell-->>Onboard: forward active (allocated port)
    Onboard->>Registry: updateSandbox(name, { dashboardPort: effectivePort })
    Registry-->>Onboard: persisted
    Onboard-->>CLI: onboard complete
    CLI-->>User: prints dashboard: http://127.0.0.1:effectivePort

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through ports both near and far,
Found free loops where conflicts were a scar.
I saved the port and cleaned the mess,
Two sandboxes now each have their address.
Hooray — dashboard joys for every jar! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: automatic allocation of a dashboard port when multiple sandboxes conflict on the default port.
Linked Issues check	✅ Passed	The PR fully addresses all coding requirements from issue `#2174`: auto-allocates free dashboard ports on conflict [ports.ts], persists the chosen port in sandbox registry [registry.ts], surfaces URLs in list/status commands [inventory-commands.ts, nemoclaw.ts], adds --control-ui-port flag [onboard-command.ts], and implements rollback logic [onboard.ts].
Out of Scope Changes check	✅ Passed	All changes are directly related to issue `#2174`: port allocation, persistence, CLI flag, inventory output, rollback logic, and E2E test verification. No unrelated modifications detected.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/2174-dashboard-port-autoalloc

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

src/lib/registry.ts (1)

25-26: Keep dashboardPort round-trippable through registration.

SandboxEntry now supports dashboardPort, but registerSandbox(...) currently drops it. Persisting it there keeps the type/storage contract consistent and avoids silent loss if any caller starts setting it during initial registration.

Proposed patch

diff --git a/src/lib/registry.ts b/src/lib/registry.ts
@@
     data.sandboxes[entry.name] = {
       name: entry.name,
@@
       disabledChannels:
         Array.isArray(entry.disabledChannels) && entry.disabledChannels.length > 0
           ? [...entry.disabledChannels]
           : undefined,
+      dashboardPort:
+        Number.isInteger(entry.dashboardPort) &&
+        (entry.dashboardPort as number) >= 1024 &&
+        (entry.dashboardPort as number) <= 65535
+          ? entry.dashboardPort
+          : undefined,
     };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/lib/registry.ts` around lines 25 - 26, registerSandbox currently drops
the new SandboxEntry.dashboardPort field during registration; update the
registerSandbox implementation to persist dashboardPort when creating or merging
the stored SandboxEntry. Specifically, ensure the code paths in registerSandbox
that construct the saved entry (and any merge/update logic) copy
sandbox.dashboardPort into the stored object (preserving nullable/optional
semantics), and update any object spread/assign calls that assemble the entry so
they include dashboardPort alongside existing fields to keep the type/storage
contract round-trippable.

src/lib/ports.test.ts (1)

96-105: Add an upper-bound test for preferred port near 65535.

A boundary regression test here would protect against accidental allocation of out-of-range ports when the preferred port is at/near the max.

Suggested test case

diff --git a/src/lib/ports.test.ts b/src/lib/ports.test.ts
@@
   it("returns null when the 10-port window is exhausted", () => {
@@
     expect(port).toBeNull();
   });
+
+  it("does not allocate ports above 65535", () => {
+    const port = findFreeDashboardPort(65535, {
+      probe: {
+        listForwardPorts: () => [65535],
+        probePortFree: () => true,
+      },
+    });
+    expect(port).toBeNull();
+  });
 });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/lib/ports.test.ts` around lines 96 - 105, Add a boundary test for
findFreeDashboardPort that uses a preferred port at/near 65535 to ensure the
function does not return or consider ports >65535; create a test similar to the
existing "10-port window exhausted" case but set preferred port to 65535 (or
65534) and mock probe.listForwardPorts and probe.probePortFree so all candidate
ports in the search window are "held" or unavailable, then assert the result is
null (or within valid port range) to verify no out-of-range port is returned by
findFreeDashboardPort.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3457-3458: The code calls ensureDashboardForward(sandboxName,
chatUiUrl) with chatUiUrl built from the default port instead of using the
sandbox’s recorded dashboard port; change the logic so the requested URL is
seeded from registry.getSandbox(sandboxName)?.dashboardPort (use that port to
build chatUiUrl) unless an explicit user override is provided, then pass that
URL into ensureDashboardForward and continue to call
registry.updateSandbox(sandboxName, { dashboardPort: reusePort }); apply the
same fix for the other occurrences that call ensureDashboardForward (the blocks
around where registry.updateSandbox is used).
- Around line 3958-3984: The allocated dashboard port from
ensureDashboardForward (allocatedDashboardPort) is registered in
registry.registerSandbox but not propagated into the live onboarding flow that
builds the completion banner URL from CHAT_UI_URL/CONTROL_UI_PORT; update
createSandbox (or the surrounding onboarding function) to return the chosen port
or set the in-memory dashboard URL/CONTROL_UI_PORT to allocatedDashboardPort
before the final banner is rendered so the printed URL matches the port saved in
registry.registerSandbox; touch the code paths that call createSandbox (and the
completion banner formatter) to consume the returned port or read the updated
in-memory variable so the completion output uses allocatedDashboardPort
consistently.
- Around line 6104-6107: The code is passing a stub probe (probePortFree: () =>
true) to findFreeDashboardPort which ignores non-OpenShell listeners and can
select ports already used by other local processes; update the call so
findFreeDashboardPort performs a real availability check by either removing the
custom probe argument (so it uses its normal port-availability probe) or replace
probePortFree with an actual local-bind probe implementation that attempts to
bind the candidate port and returns true only if bind succeeds; apply the same
fix to the other occurrence around the block referencing
heldPorts/forwardEntries at the later lines.

In `@src/lib/ports.ts`:
- Around line 70-80: The loop in findFreeDashboardPort can generate ports above
65535 when preferred is near the upper bound; guard and clamp candidates to the
valid port range by first validating preferred (return null if > 65535 or < 0),
compute a safe max offset like min(PORT_WINDOW, 65535 - preferred) and iterate
only up to that offset, and ensure you use the same probe helpers
(probe.listForwardPorts and probe.probePortFree) when checking held/probed
ports; this prevents returning out‑of‑range ports while preserving existing
behavior.

---

Nitpick comments:
In `@src/lib/ports.test.ts`:
- Around line 96-105: Add a boundary test for findFreeDashboardPort that uses a
preferred port at/near 65535 to ensure the function does not return or consider
ports >65535; create a test similar to the existing "10-port window exhausted"
case but set preferred port to 65535 (or 65534) and mock probe.listForwardPorts
and probe.probePortFree so all candidate ports in the search window are "held"
or unavailable, then assert the result is null (or within valid port range) to
verify no out-of-range port is returned by findFreeDashboardPort.

In `@src/lib/registry.ts`:
- Around line 25-26: registerSandbox currently drops the new
SandboxEntry.dashboardPort field during registration; update the registerSandbox
implementation to persist dashboardPort when creating or merging the stored
SandboxEntry. Specifically, ensure the code paths in registerSandbox that
construct the saved entry (and any merge/update logic) copy
sandbox.dashboardPort into the stored object (preserving nullable/optional
semantics), and update any object spread/assign calls that assemble the entry so
they include dashboardPort alongside existing fields to keep the type/storage
contract round-trippable.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6c7f0cd6-d20b-4e98-a1d9-55f03842de20

📥 Commits

Reviewing files that changed from the base of the PR and between b804db0 and 49dbf1c.

📒 Files selected for processing (10)

src/lib/inventory-commands.test.ts
src/lib/inventory-commands.ts
src/lib/onboard-command.test.ts
src/lib/onboard-command.ts
src/lib/onboard.ts
src/lib/ports.test.ts
src/lib/ports.ts
src/lib/registry.ts
src/nemoclaw.ts
test/install-preflight.test.ts

This reverts commit 682320a. Investigation showed the flake only reproduces when the local machine is already CPU-saturated (a stuck OneDrive sync service was burning a full core on my Mac). Raising the test timeout papers over that rather than fixing anything. Keeping the flake fix out of #2174 so the PR is purely about dashboard port auto-allocation. Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

Five fixes from CodeRabbit's review, all within #2174 scope: - src/lib/ports.ts: clamp findFreeDashboardPort to valid range — if preferred is out of 1024-65535 return null, and break the loop when candidates would exceed 65535. + 2 boundary tests. - src/lib/registry.ts: registerSandbox now persists dashboardPort in the stored entry (body was silently dropping the field). - src/lib/onboard.ts: reuse paths seed chatUiUrl from the sandbox's stored dashboardPort via new reuseChatUiUrlFor helper, so re-onboarding an auto-allocated sandbox no longer ratchets its port upward on each run. - src/lib/onboard.ts: picker now uses a real lsof-based bind probe instead of () => true, so auto-alloc won't hand out a port held by a non-openshell process (Docker container, etc.). - src/lib/onboard.ts: getDashboardAccessInfo and getDashboardGuidanceLines read the stored dashboardPort as a precedence tier so the completion banner shows the port ensureDashboardForward actually allocated (precedence: caller option > CHAT_UI_URL env > stored > default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

coderabbitai

♻️ Duplicate comments (2)

src/lib/onboard.ts (2)
3543-3544: ⚠️ Potential issue | 🟠 Major

Use reuseChatUiUrlFor() on this fallback reuse path too.

This is the one remaining reuse branch that still seeds ensureDashboardForward() from chatUiUrl, so a sandbox already recorded on an auto-allocated port can still ratchet upward here when the backup path aborts.
Suggested fix
-        const reusePort = ensureDashboardForward(sandboxName, chatUiUrl);
+        const reuseUrl = reuseChatUiUrlFor(sandboxName, chatUiUrl);
+        const reusePort = ensureDashboardForward(sandboxName, reuseUrl);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 3543 - 3544, The fallback path currently
calls ensureDashboardForward(sandboxName, chatUiUrl) and seeds it directly with
chatUiUrl which can cause an auto-allocated port to be overwritten; instead,
call reuseChatUiUrlFor(sandboxName, chatUiUrl) to compute the proper URL/port
reuse value and pass that result into ensureDashboardForward, then update the
registry (registry.updateSandbox) with the returned dashboardPort as
before—i.e., replace the direct chatUiUrl argument with the value from
reuseChatUiUrlFor for the ensureDashboardForward invocation.
6099-6134: ⚠️ Potential issue | 🟠 Major

Requested-port probing still misses local listeners.

The availability check only runs when another sandbox already owns requestedPort, and it relies on lsof alone. If --control-ui-port or a stored dashboardPort points at a non-OpenShell listener, or non-root lsof returns empty for a root-owned socket, forward start can fail at Line 6168 while this function still returns and persists the dead port. Probe the requested port before forward start, and use a real bind fallback instead of treating empty lsof output as authoritative. Based on learnings, "Empty lsof string output is NOT treated as authoritative — non-root lsof silently drops root-owned sockets ... Empty lsof output must fall through to the net probe."

Also applies to: 6156-6177
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 6099 - 6134, The port-probing logic
currently only runs when another sandbox owns requestedPort and treats an empty
lsof result as authoritative; update the flow so the requestedPort is always
probed before calling `forward start` (i.e., call the same probe used by
`findFreeDashboardPort` for `requestedPort` up-front) and change
`isPortBoundLocally` (and the `probePortFree` you pass into
`findFreeDashboardPort`) to return a tri-state/unknown (e.g., null/undefined)
when `runCapture([... "lsof" ...])` yields an empty string so the caller falls
back to a real bind/net-probe instead of assuming free; ensure the
`probePortFree` implementation then attempts an actual bind probe when lsof is
inconclusive so `chosen` never points at a dead/non-bindable port.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 3543-3544: The fallback path currently calls
ensureDashboardForward(sandboxName, chatUiUrl) and seeds it directly with
chatUiUrl which can cause an auto-allocated port to be overwritten; instead,
call reuseChatUiUrlFor(sandboxName, chatUiUrl) to compute the proper URL/port
reuse value and pass that result into ensureDashboardForward, then update the
registry (registry.updateSandbox) with the returned dashboardPort as
before—i.e., replace the direct chatUiUrl argument with the value from
reuseChatUiUrlFor for the ensureDashboardForward invocation.
- Around line 6099-6134: The port-probing logic currently only runs when another
sandbox owns requestedPort and treats an empty lsof result as authoritative;
update the flow so the requestedPort is always probed before calling `forward
start` (i.e., call the same probe used by `findFreeDashboardPort` for
`requestedPort` up-front) and change `isPortBoundLocally` (and the
`probePortFree` you pass into `findFreeDashboardPort`) to return a
tri-state/unknown (e.g., null/undefined) when `runCapture([... "lsof" ...])`
yields an empty string so the caller falls back to a real bind/net-probe instead
of assuming free; ensure the `probePortFree` implementation then attempts an
actual bind probe when lsof is inconclusive so `chosen` never points at a
dead/non-bindable port.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f65396b5-5843-47e3-ab22-8a085ab5fe69

📥 Commits

Reviewing files that changed from the base of the PR and between 49dbf1c and c1a68a2.

📒 Files selected for processing (4)

src/lib/onboard.ts
src/lib/ports.test.ts
src/lib/ports.ts
src/lib/registry.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/lib/registry.ts

Extends test/e2e/test-double-onboard.sh with three #2174 regression checks: - Second-sandbox onboard output must log "allocated port" (auto-alloc fired) - nemoclaw list must show two distinct dashboard ports for A and B The existing script already exercises the two-sandbox flow (Phase 4) but didn't assert anything port-specific — a silent regression of the auto-alloc path would have passed. Also wires the script into nightly-e2e.yaml as a new double-onboard-e2e job. Uses the script's fake OpenAI endpoint so no NVIDIA_API_KEY is required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-double-onboard.sh`:
- Around line 365-367: The test currently suppresses failures from the command
"run_nemoclaw list" by appending "|| true" which masks real regressions; change
the block around LIST_LOG/run_nemoclaw/list_output so that you run "run_nemoclaw
list >\"$LIST_LOG\" 2>&1" without "|| true", capture its exit code (e.g.,
rc=$?), read list_output="$(cat "$LIST_LOG")", and if rc is non-zero print or
dump "$LIST_LOG" for diagnostics and exit with that non-zero code to fail the
test; keep the temporary log capture (LIST_LOG) and the list_output parsing but
do not swallow the command's non-zero exit.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9a1953a5-16df-44e0-96aa-05e8b0e36aae

📥 Commits

Reviewing files that changed from the base of the PR and between c1a68a2 and d877ec2.

📒 Files selected for processing (2)

.github/workflows/nightly-e2e.yaml
test/e2e/test-double-onboard.sh

coderabbitai · 2026-04-24T19:50:41Z

+LIST_LOG="$(mktemp)"
+run_nemoclaw list >"$LIST_LOG" 2>&1 || true
+list_output="$(cat "$LIST_LOG")"


⚠️ Potential issue | 🟠 Major

Do not suppress nemoclaw list failures here.

Ignoring the exit code with || true can hide a real list regression while this block still passes on partial output. Keep parsing for diagnostics, but fail on non-zero exit.

Suggested fix

LIST_LOG="$(mktemp)" -run_nemoclaw list >"$LIST_LOG" 2>&1 || true +if run_nemoclaw list >"$LIST_LOG" 2>&1; then + list_exit=0 +else + list_exit=$? +fi list_output="$(cat "$LIST_LOG")" rm -f "$LIST_LOG" + +if [ "$list_exit" -eq 0 ]; then + pass "nemoclaw list succeeded" +else + fail "nemoclaw list exited $list_exit" +fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-double-onboard.sh` around lines 365 - 367, The test currently suppresses failures from the command "run_nemoclaw list" by appending "|| true" which masks real regressions; change the block around LIST_LOG/run_nemoclaw/list_output so that you run "run_nemoclaw list >\"$LIST_LOG\" 2>&1" without "|| true", capture its exit code (e.g., rc=$?), read list_output="$(cat "$LIST_LOG")", and if rc is non-zero print or dump "$LIST_LOG" for diagnostics and exit with that non-zero code to fail the test; keep the temporary log capture (LIST_LOG) and the list_output parsing but do not swallow the command's non-zero exit.

…to nightly" This reverts commit d877ec2. Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

cjagwani · 2026-04-24T20:57:49Z

Superseded by #2454 — same changes, clean single-commit history rebased on latest main (integrates with #2398's dashboard-contract architecture and #2369's live-inference display).

cjagwani and others added 2 commits April 24, 2026 11:36

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

Comment thread src/lib/onboard.ts Outdated

Comment thread src/lib/onboard.ts

Comment thread src/lib/onboard.ts

Comment thread src/lib/ports.ts

cjagwani self-assigned this Apr 24, 2026

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

Revert "test(e2e): add dashboard port auto-alloc assertions and wire …

c2f0cdc

…to nightly" This reverts commit d877ec2. Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

cjagwani mentioned this pull request Apr 24, 2026

fix(onboard): auto-allocate dashboard port on multi-sandbox conflict #2454

Closed

6 tasks

cjagwani closed this Apr 24, 2026

wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(onboard): auto-allocate dashboard port on multi-sandbox conflict#2444

fix(onboard): auto-allocate dashboard port on multi-sandbox conflict#2444
cjagwani wants to merge 6 commits into
mainfrom
fix/2174-dashboard-port-autoalloc

cjagwani commented Apr 24, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

cjagwani commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cjagwani commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Verification

Type of Change

Verification

Notes for reviewer

AI Disclosure

Uh oh!

coderabbitai Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

cjagwani commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cjagwani commented Apr 24, 2026 •

edited

Loading

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading