Skip to content

fix(onboard): auto-allocate dashboard port on multi-sandbox conflict#2444

Closed
cjagwani wants to merge 6 commits into
mainfrom
fix/2174-dashboard-port-autoalloc
Closed

fix(onboard): auto-allocate dashboard port on multi-sandbox conflict#2444
cjagwani wants to merge 6 commits into
mainfrom
fix/2174-dashboard-port-autoalloc

Conversation

@cjagwani

@cjagwani cjagwani commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Second nemoclaw onboard previously crashed mid-flight when port 18789 was held by the first sandbox, blocking 12 P0/P1 QA tests that keep a baseline sandbox alive while onboarding a short-lived second one. This PR auto-allocates the next free port on loopback conflict, persists it, surfaces it via list/status, and adds a --control-ui-port flag for pinning. Also closes the "leaks ghost sandbox" path by rolling back the openshell sandbox if forward setup or registry write throws.

Related Issue

Fixes #2174

Changes

  • src/lib/ports.tsfindFreeDashboardPort picker with 10-port window and injectable probe seam. Clamped to 1024-65535 with boundary guard. 6 unit tests.
  • src/lib/onboard.tsensureDashboardForward auto-allocates on loopback conflict and returns the chosen port; throws only on user-pinned CHAT_UI_URL or window exhaustion. Rollback try/catch around forward + registry deletes the openshell sandbox on failure so nemoclaw list and openshell sandbox list no longer drift. Picker uses a real lsof bind probe. Reuse paths seed chatUiUrl from the sandbox's stored dashboardPort via reuseChatUiUrlFor so re-onboarding an auto-allocated sandbox doesn't ratchet its port upward.
  • src/lib/registry.tsdashboardPort?: number on SandboxEntry, persisted in the stored entry by registerSandbox.
  • src/lib/inventory-commands.ts — dashboard URL in nemoclaw list and nemoclaw status. 2 unit tests.
  • src/nemoclaw.ts — dashboard URL in nemoclaw <name> status.
  • src/lib/onboard-command.ts--control-ui-port <n> flag (precedence: flag > CHAT_UI_URL env > stored port > auto-alloc > default). 2 unit tests.
  • src/lib/onboard.ts (dashboard display) — getDashboardAccessInfo and getDashboardGuidanceLines read the stored dashboardPort as a precedence tier so the onboard completion banner shows the actually-allocated port, not the default.
  • test/e2e/test-double-onboard.sh — three new assertions in the existing two-sandbox phase: second onboard must log "allocated port", and nemoclaw list must show two distinct dashboard URLs.
  • .github/workflows/nightly-e2e.yaml — new double-onboard-e2e job wires the script into nightly so regressions land within 24h.

Verification

Unit suite: 10 new tests, 1829 total passing locally (picker × 6, list/status display × 2, CLI flag × 2).

E2E: test/e2e/test-double-onboard.sh now asserts the #2174 auto-alloc path end-to-end. Runs nightly on an Ubuntu runner against a fake OpenAI endpoint — no NVIDIA_API_KEY needed. Regressions in the two-sandbox dance will auto-file a GitHub issue via the existing notify-on-failure job.

Manual repro: two back-to-back nemoclaw onboard --non-interactive — first takes 18789, second logs Dashboard port 18789 in use by 'alpha'; allocated port 18790 for 'beta', nemoclaw <name> status prints each port, both dashboards reachable.

Type of Change

  • Code change (feature, bug fix, or refactor)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed

Notes for reviewer

AI Disclosure

  • AI-assisted — tool: Claude Code

Signed-off-by: Charan Jagwani cjagwani@nvidia.com

cjagwani and others added 2 commits April 24, 2026 11:36
Root cause: install.sh sources scripts/install.sh (1374-line payload),
which runs resolve_installer_version() → resolve_repo_root() → git describe
at source time. In isolation the test takes ~0.9s. Under vitest's default
parallelism on macOS (up to 14 workers), fork()/execve() serializes and
each subprocess spawn balloons 50-200x, pushing total runtime to 5-7s and
blowing past the default 5000ms testTimeout.

The default timeout is correct for every other test in the file; this
one has a legitimately different workload (real bash + real subprocess
chain). Bumping its per-test timeout is the minimum-scope fix; caching
the payload's init subshells in scripts/install.sh is the general
improvement and is tracked for follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Second onboard previously crashed mid-flight when port 18789 was held by
the first sandbox, blocking 12 P0/P1 QA tests that keep a baseline sandbox
alive while onboarding a short-lived second one. The guard from v0.0.21
fixed silent forward stealing but left no recovery path — users had to set
CHAT_UI_URL manually and the assigned port was not discoverable post-hoc.

Changes:
- src/lib/ports.ts: add findFreeDashboardPort picker with 10-port window
  and injectable probe seam for tests.
- src/lib/onboard.ts: ensureDashboardForward now auto-allocates on loopback
  conflict and returns the chosen port; throws only on user-pinned
  CHAT_UI_URL or window exhaustion. Rollback try/catch around the
  forward + registry block deletes the openshell sandbox on failure so
  nemoclaw list and openshell sandbox list no longer drift.
- src/lib/registry.ts: persist dashboardPort on SandboxEntry so the
  assigned port survives across CLI invocations.
- src/lib/inventory-commands.ts: surface dashboard URL in nemoclaw list
  and nemoclaw status.
- src/nemoclaw.ts: surface dashboard URL in nemoclaw <name> status.
- src/lib/onboard-command.ts: add --control-ui-port <n> flag with
  precedence flag > CHAT_UI_URL env > auto-alloc > default.

Fixes #2174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Adds CLI flag and runtime logic to select, persist, and surface a local dashboard port during sandbox onboarding; implements port-finding/probing, updates registry and display commands, and adds tests and e2e coverage for allocated dashboard URLs. Rollback/cleanup is added on onboarding failure.

Changes

Cohort / File(s) Summary
Port discovery
src/lib/ports.ts, src/lib/ports.test.ts
New PortProbe abstraction and findFreeDashboardPort(preferred, { probe }) which validates preferred port (1024–65535), scans a 10-port window, respects existing forwards and OS probes, and returns an available port or null. Tests added for selection logic and edge cases.
Onboard command CLI
src/lib/onboard-command.ts, src/lib/onboard-command.test.ts
Adds --control-ui-port <n> parsing/validation (1024–65535). OnboardCommandOptions gains `controlUiPort: number
Onboard core & forwarding
src/lib/onboard.ts
Refactors dashboard forwarding: ensureDashboardForward can choose/return an effective allocated port, avoids stealing user-pinned or non-loopback ports, consults OS/listeners, and returns the actual port. createSandbox now records dashboardPort, persists it via registry.updateSandbox, and performs rollback (stop forward, remove registry entry, delete sandbox) on post-creation failures.
Registry / model
src/lib/registry.ts
SandboxEntry gains optional dashboardPort?: number. registerSandbox/persistence now validate and persist dashboardPort only when integer within 1024–65535.
Display & CLI output
src/lib/inventory-commands.ts, src/lib/inventory-commands.test.ts, src/nemoclaw.ts
listSandboxesCommand and showStatusCommand print dashboard: http://127.0.0.1:<port> when sandbox has numeric dashboardPort. Tests added to assert dashboard URL rendering. sandboxStatus includes same output line.
E2E workflow & script
.github/workflows/nightly-e2e.yaml, test/e2e/test-double-onboard.sh
New nightly E2E job double-onboard-e2e added; e2e script now asserts two distinct dashboard ports after double onboard and extracts ports from nemoclaw list. Failure artifact upload added.

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as "Onboard CLI"
    participant Onboard as "Onboard Core"
    participant Port as "findFreeDashboardPort"
    participant OS as "OS / probe / lsof"
    participant Shell as "OpenShell (forward)"
    participant Registry as "Registry"

    User->>CLI: nemoclaw onboard [--control-ui-port N]
    CLI->>CLI: parse & validate controlUiPort
    CLI->>Onboard: runOnboardCommand(controlUiPort)
    Onboard->>Port: findFreeDashboardPort(preferred)
    Port->>OS: list forwarded ports & probe availability
    OS-->>Port: existing forwards / probe results
    Port-->>Onboard: effectivePort (or null)
    Onboard->>Shell: openshell forward start effectivePort <sandbox>
    Shell-->>Onboard: forward active (allocated port)
    Onboard->>Registry: updateSandbox(name, { dashboardPort: effectivePort })
    Registry-->>Onboard: persisted
    Onboard-->>CLI: onboard complete
    CLI-->>User: prints dashboard: http://127.0.0.1:effectivePort
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through ports both near and far,
Found free loops where conflicts were a scar.
I saved the port and cleaned the mess,
Two sandboxes now each have their address.
Hooray — dashboard joys for every jar! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: automatic allocation of a dashboard port when multiple sandboxes conflict on the default port.
Linked Issues check ✅ Passed The PR fully addresses all coding requirements from issue #2174: auto-allocates free dashboard ports on conflict [ports.ts], persists the chosen port in sandbox registry [registry.ts], surfaces URLs in list/status commands [inventory-commands.ts, nemoclaw.ts], adds --control-ui-port flag [onboard-command.ts], and implements rollback logic [onboard.ts].
Out of Scope Changes check ✅ Passed All changes are directly related to issue #2174: port allocation, persistence, CLI flag, inventory output, rollback logic, and E2E test verification. No unrelated modifications detected.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/2174-dashboard-port-autoalloc

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
src/lib/registry.ts (1)

25-26: Keep dashboardPort round-trippable through registration.

SandboxEntry now supports dashboardPort, but registerSandbox(...) currently drops it. Persisting it there keeps the type/storage contract consistent and avoids silent loss if any caller starts setting it during initial registration.

Proposed patch
diff --git a/src/lib/registry.ts b/src/lib/registry.ts
@@
     data.sandboxes[entry.name] = {
       name: entry.name,
@@
       disabledChannels:
         Array.isArray(entry.disabledChannels) && entry.disabledChannels.length > 0
           ? [...entry.disabledChannels]
           : undefined,
+      dashboardPort:
+        Number.isInteger(entry.dashboardPort) &&
+        (entry.dashboardPort as number) >= 1024 &&
+        (entry.dashboardPort as number) <= 65535
+          ? entry.dashboardPort
+          : undefined,
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/registry.ts` around lines 25 - 26, registerSandbox currently drops
the new SandboxEntry.dashboardPort field during registration; update the
registerSandbox implementation to persist dashboardPort when creating or merging
the stored SandboxEntry. Specifically, ensure the code paths in registerSandbox
that construct the saved entry (and any merge/update logic) copy
sandbox.dashboardPort into the stored object (preserving nullable/optional
semantics), and update any object spread/assign calls that assemble the entry so
they include dashboardPort alongside existing fields to keep the type/storage
contract round-trippable.
src/lib/ports.test.ts (1)

96-105: Add an upper-bound test for preferred port near 65535.

A boundary regression test here would protect against accidental allocation of out-of-range ports when the preferred port is at/near the max.

Suggested test case
diff --git a/src/lib/ports.test.ts b/src/lib/ports.test.ts
@@
   it("returns null when the 10-port window is exhausted", () => {
@@
     expect(port).toBeNull();
   });
+
+  it("does not allocate ports above 65535", () => {
+    const port = findFreeDashboardPort(65535, {
+      probe: {
+        listForwardPorts: () => [65535],
+        probePortFree: () => true,
+      },
+    });
+    expect(port).toBeNull();
+  });
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/ports.test.ts` around lines 96 - 105, Add a boundary test for
findFreeDashboardPort that uses a preferred port at/near 65535 to ensure the
function does not return or consider ports >65535; create a test similar to the
existing "10-port window exhausted" case but set preferred port to 65535 (or
65534) and mock probe.listForwardPorts and probe.probePortFree so all candidate
ports in the search window are "held" or unavailable, then assert the result is
null (or within valid port range) to verify no out-of-range port is returned by
findFreeDashboardPort.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3457-3458: The code calls ensureDashboardForward(sandboxName,
chatUiUrl) with chatUiUrl built from the default port instead of using the
sandbox’s recorded dashboard port; change the logic so the requested URL is
seeded from registry.getSandbox(sandboxName)?.dashboardPort (use that port to
build chatUiUrl) unless an explicit user override is provided, then pass that
URL into ensureDashboardForward and continue to call
registry.updateSandbox(sandboxName, { dashboardPort: reusePort }); apply the
same fix for the other occurrences that call ensureDashboardForward (the blocks
around where registry.updateSandbox is used).
- Around line 3958-3984: The allocated dashboard port from
ensureDashboardForward (allocatedDashboardPort) is registered in
registry.registerSandbox but not propagated into the live onboarding flow that
builds the completion banner URL from CHAT_UI_URL/CONTROL_UI_PORT; update
createSandbox (or the surrounding onboarding function) to return the chosen port
or set the in-memory dashboard URL/CONTROL_UI_PORT to allocatedDashboardPort
before the final banner is rendered so the printed URL matches the port saved in
registry.registerSandbox; touch the code paths that call createSandbox (and the
completion banner formatter) to consume the returned port or read the updated
in-memory variable so the completion output uses allocatedDashboardPort
consistently.
- Around line 6104-6107: The code is passing a stub probe (probePortFree: () =>
true) to findFreeDashboardPort which ignores non-OpenShell listeners and can
select ports already used by other local processes; update the call so
findFreeDashboardPort performs a real availability check by either removing the
custom probe argument (so it uses its normal port-availability probe) or replace
probePortFree with an actual local-bind probe implementation that attempts to
bind the candidate port and returns true only if bind succeeds; apply the same
fix to the other occurrence around the block referencing
heldPorts/forwardEntries at the later lines.

In `@src/lib/ports.ts`:
- Around line 70-80: The loop in findFreeDashboardPort can generate ports above
65535 when preferred is near the upper bound; guard and clamp candidates to the
valid port range by first validating preferred (return null if > 65535 or < 0),
compute a safe max offset like min(PORT_WINDOW, 65535 - preferred) and iterate
only up to that offset, and ensure you use the same probe helpers
(probe.listForwardPorts and probe.probePortFree) when checking held/probed
ports; this prevents returning out‑of‑range ports while preserving existing
behavior.

---

Nitpick comments:
In `@src/lib/ports.test.ts`:
- Around line 96-105: Add a boundary test for findFreeDashboardPort that uses a
preferred port at/near 65535 to ensure the function does not return or consider
ports >65535; create a test similar to the existing "10-port window exhausted"
case but set preferred port to 65535 (or 65534) and mock probe.listForwardPorts
and probe.probePortFree so all candidate ports in the search window are "held"
or unavailable, then assert the result is null (or within valid port range) to
verify no out-of-range port is returned by findFreeDashboardPort.

In `@src/lib/registry.ts`:
- Around line 25-26: registerSandbox currently drops the new
SandboxEntry.dashboardPort field during registration; update the registerSandbox
implementation to persist dashboardPort when creating or merging the stored
SandboxEntry. Specifically, ensure the code paths in registerSandbox that
construct the saved entry (and any merge/update logic) copy
sandbox.dashboardPort into the stored object (preserving nullable/optional
semantics), and update any object spread/assign calls that assemble the entry so
they include dashboardPort alongside existing fields to keep the type/storage
contract round-trippable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6c7f0cd6-d20b-4e98-a1d9-55f03842de20

📥 Commits

Reviewing files that changed from the base of the PR and between b804db0 and 49dbf1c.

📒 Files selected for processing (10)
  • src/lib/inventory-commands.test.ts
  • src/lib/inventory-commands.ts
  • src/lib/onboard-command.test.ts
  • src/lib/onboard-command.ts
  • src/lib/onboard.ts
  • src/lib/ports.test.ts
  • src/lib/ports.ts
  • src/lib/registry.ts
  • src/nemoclaw.ts
  • test/install-preflight.test.ts

Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/onboard.ts
Comment thread src/lib/onboard.ts
Comment thread src/lib/ports.ts
This reverts commit 682320a.

Investigation showed the flake only reproduces when the local machine
is already CPU-saturated (a stuck OneDrive sync service was burning a
full core on my Mac). Raising the test timeout papers over that
rather than fixing anything. Keeping the flake fix out of #2174 so the
PR is purely about dashboard port auto-allocation.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
@cjagwani cjagwani self-assigned this Apr 24, 2026
Five fixes from CodeRabbit's review, all within #2174 scope:

- src/lib/ports.ts: clamp findFreeDashboardPort to valid range — if
  preferred is out of 1024-65535 return null, and break the loop when
  candidates would exceed 65535. + 2 boundary tests.
- src/lib/registry.ts: registerSandbox now persists dashboardPort in
  the stored entry (body was silently dropping the field).
- src/lib/onboard.ts: reuse paths seed chatUiUrl from the sandbox's
  stored dashboardPort via new reuseChatUiUrlFor helper, so
  re-onboarding an auto-allocated sandbox no longer ratchets its port
  upward on each run.
- src/lib/onboard.ts: picker now uses a real lsof-based bind probe
  instead of () => true, so auto-alloc won't hand out a port held by
  a non-openshell process (Docker container, etc.).
- src/lib/onboard.ts: getDashboardAccessInfo and getDashboardGuidanceLines
  read the stored dashboardPort as a precedence tier so the completion
  banner shows the port ensureDashboardForward actually allocated
  (precedence: caller option > CHAT_UI_URL env > stored > default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/lib/onboard.ts (2)

3543-3544: ⚠️ Potential issue | 🟠 Major

Use reuseChatUiUrlFor() on this fallback reuse path too.

This is the one remaining reuse branch that still seeds ensureDashboardForward() from chatUiUrl, so a sandbox already recorded on an auto-allocated port can still ratchet upward here when the backup path aborts.

Suggested fix
-        const reusePort = ensureDashboardForward(sandboxName, chatUiUrl);
+        const reuseUrl = reuseChatUiUrlFor(sandboxName, chatUiUrl);
+        const reusePort = ensureDashboardForward(sandboxName, reuseUrl);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 3543 - 3544, The fallback path currently
calls ensureDashboardForward(sandboxName, chatUiUrl) and seeds it directly with
chatUiUrl which can cause an auto-allocated port to be overwritten; instead,
call reuseChatUiUrlFor(sandboxName, chatUiUrl) to compute the proper URL/port
reuse value and pass that result into ensureDashboardForward, then update the
registry (registry.updateSandbox) with the returned dashboardPort as
before—i.e., replace the direct chatUiUrl argument with the value from
reuseChatUiUrlFor for the ensureDashboardForward invocation.

6099-6134: ⚠️ Potential issue | 🟠 Major

Requested-port probing still misses local listeners.

The availability check only runs when another sandbox already owns requestedPort, and it relies on lsof alone. If --control-ui-port or a stored dashboardPort points at a non-OpenShell listener, or non-root lsof returns empty for a root-owned socket, forward start can fail at Line 6168 while this function still returns and persists the dead port. Probe the requested port before forward start, and use a real bind fallback instead of treating empty lsof output as authoritative. Based on learnings, "Empty lsof string output is NOT treated as authoritative — non-root lsof silently drops root-owned sockets ... Empty lsof output must fall through to the net probe."

Also applies to: 6156-6177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 6099 - 6134, The port-probing logic
currently only runs when another sandbox owns requestedPort and treats an empty
lsof result as authoritative; update the flow so the requestedPort is always
probed before calling `forward start` (i.e., call the same probe used by
`findFreeDashboardPort` for `requestedPort` up-front) and change
`isPortBoundLocally` (and the `probePortFree` you pass into
`findFreeDashboardPort`) to return a tri-state/unknown (e.g., null/undefined)
when `runCapture([... "lsof" ...])` yields an empty string so the caller falls
back to a real bind/net-probe instead of assuming free; ensure the
`probePortFree` implementation then attempts an actual bind probe when lsof is
inconclusive so `chosen` never points at a dead/non-bindable port.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 3543-3544: The fallback path currently calls
ensureDashboardForward(sandboxName, chatUiUrl) and seeds it directly with
chatUiUrl which can cause an auto-allocated port to be overwritten; instead,
call reuseChatUiUrlFor(sandboxName, chatUiUrl) to compute the proper URL/port
reuse value and pass that result into ensureDashboardForward, then update the
registry (registry.updateSandbox) with the returned dashboardPort as
before—i.e., replace the direct chatUiUrl argument with the value from
reuseChatUiUrlFor for the ensureDashboardForward invocation.
- Around line 6099-6134: The port-probing logic currently only runs when another
sandbox owns requestedPort and treats an empty lsof result as authoritative;
update the flow so the requestedPort is always probed before calling `forward
start` (i.e., call the same probe used by `findFreeDashboardPort` for
`requestedPort` up-front) and change `isPortBoundLocally` (and the
`probePortFree` you pass into `findFreeDashboardPort`) to return a
tri-state/unknown (e.g., null/undefined) when `runCapture([... "lsof" ...])`
yields an empty string so the caller falls back to a real bind/net-probe instead
of assuming free; ensure the `probePortFree` implementation then attempts an
actual bind probe when lsof is inconclusive so `chosen` never points at a
dead/non-bindable port.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f65396b5-5843-47e3-ab22-8a085ab5fe69

📥 Commits

Reviewing files that changed from the base of the PR and between 49dbf1c and c1a68a2.

📒 Files selected for processing (4)
  • src/lib/onboard.ts
  • src/lib/ports.test.ts
  • src/lib/ports.ts
  • src/lib/registry.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/registry.ts

Extends test/e2e/test-double-onboard.sh with three #2174 regression checks:
- Second-sandbox onboard output must log "allocated port" (auto-alloc fired)
- nemoclaw list must show two distinct dashboard ports for A and B

The existing script already exercises the two-sandbox flow (Phase 4) but
didn't assert anything port-specific — a silent regression of the auto-alloc
path would have passed. Also wires the script into nightly-e2e.yaml as a
new double-onboard-e2e job. Uses the script's fake OpenAI endpoint so no
NVIDIA_API_KEY is required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-double-onboard.sh`:
- Around line 365-367: The test currently suppresses failures from the command
"run_nemoclaw list" by appending "|| true" which masks real regressions; change
the block around LIST_LOG/run_nemoclaw/list_output so that you run "run_nemoclaw
list >\"$LIST_LOG\" 2>&1" without "|| true", capture its exit code (e.g.,
rc=$?), read list_output="$(cat "$LIST_LOG")", and if rc is non-zero print or
dump "$LIST_LOG" for diagnostics and exit with that non-zero code to fail the
test; keep the temporary log capture (LIST_LOG) and the list_output parsing but
do not swallow the command's non-zero exit.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9a1953a5-16df-44e0-96aa-05e8b0e36aae

📥 Commits

Reviewing files that changed from the base of the PR and between c1a68a2 and d877ec2.

📒 Files selected for processing (2)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-double-onboard.sh

Comment thread test/e2e/test-double-onboard.sh Outdated
Comment on lines +365 to +367
LIST_LOG="$(mktemp)"
run_nemoclaw list >"$LIST_LOG" 2>&1 || true
list_output="$(cat "$LIST_LOG")"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not suppress nemoclaw list failures here.

Ignoring the exit code with || true can hide a real list regression while this block still passes on partial output. Keep parsing for diagnostics, but fail on non-zero exit.

Suggested fix
 LIST_LOG="$(mktemp)"
-run_nemoclaw list >"$LIST_LOG" 2>&1 || true
+if run_nemoclaw list >"$LIST_LOG" 2>&1; then
+  list_exit=0
+else
+  list_exit=$?
+fi
 list_output="$(cat "$LIST_LOG")"
 rm -f "$LIST_LOG"
+
+if [ "$list_exit" -eq 0 ]; then
+  pass "nemoclaw list succeeded"
+else
+  fail "nemoclaw list exited $list_exit"
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-double-onboard.sh` around lines 365 - 367, The test currently
suppresses failures from the command "run_nemoclaw list" by appending "|| true"
which masks real regressions; change the block around
LIST_LOG/run_nemoclaw/list_output so that you run "run_nemoclaw list
>\"$LIST_LOG\" 2>&1" without "|| true", capture its exit code (e.g., rc=$?),
read list_output="$(cat "$LIST_LOG")", and if rc is non-zero print or dump
"$LIST_LOG" for diagnostics and exit with that non-zero code to fail the test;
keep the temporary log capture (LIST_LOG) and the list_output parsing but do not
swallow the command's non-zero exit.

…to nightly"

This reverts commit d877ec2.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
@cjagwani

Copy link
Copy Markdown
Contributor Author

Superseded by #2454 — same changes, clean single-commit history rebased on latest main (integrates with #2398's dashboard-contract architecture and #2369's live-inference display).

@cjagwani cjagwani closed this Apr 24, 2026
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression

Projects

None yet

2 participants