Skip to content

fix(onboard): auto-allocate dashboard port on multi-sandbox conflict#2454

Closed
cjagwani wants to merge 16 commits into
mainfrom
fix/2174-dashboard-port-autoalloc-v2
Closed

fix(onboard): auto-allocate dashboard port on multi-sandbox conflict#2454
cjagwani wants to merge 16 commits into
mainfrom
fix/2174-dashboard-port-autoalloc-v2

Conversation

@cjagwani

@cjagwani cjagwani commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Second nemoclaw onboard previously crashed mid-flight when port 18789 was held by the first sandbox, blocking 12 P0/P1 QA tests that keep a baseline sandbox alive while onboarding a short-lived second one. This PR auto-allocates the next free port on loopback conflict, persists it, surfaces it via list / status, and adds a --control-ui-port flag for pinning. Also closes the "leaks ghost sandbox" path from the issue title — unrecoverable forward-setup failures now roll back the openshell sandbox before exiting.

Supersedes #2444 (clean history, same changes).

Related Issue

Fixes #2174

Changes

  • src/lib/ports.tsfindFreeDashboardPort picker with 10-port window, injectable probe seam, and 1024-65535 range clamp. 6 unit tests.
  • src/lib/onboard.tsensureDashboardForward auto-allocates on loopback conflict (real lsof bind probe to avoid ports held by non-openshell processes) and returns the chosen port. On user-pinned CHAT_UI_URL or exhaustion it rolls back the openshell sandbox before exiting. Reuse paths seed chatUiUrl from the sandbox's stored dashboardPort via new reuseChatUiUrlFor helper so re-onboarding an auto-allocated sandbox doesn't ratchet its port upward on each run.
  • src/lib/registry.tsdashboardPort?: number on SandboxEntry, persisted by registerSandbox.
  • src/lib/inventory-commands.ts — dashboard URL in nemoclaw list and nemoclaw status. 2 unit tests.
  • src/nemoclaw.ts — dashboard URL in nemoclaw <name> status.
  • src/lib/onboard-command.ts--control-ui-port <n> flag (precedence: flag > CHAT_UI_URL env > stored port > default). 2 unit tests.
  • src/lib/onboard.ts (banner) — getDashboardAccessInfo and getDashboardGuidanceLines read stored dashboardPort as a precedence tier so the completion banner shows the port actually allocated.

Type of Change

  • Code change (feature, bug fix, or refactor)

Verification

  • npx prek run --all-files passes (pre-commit hooks green — prettier, ESLint, shellcheck, hadolint, gitleaks, markdownlint, Test (plugin), YAML)
  • npm test passes (10 new unit tests; 1829+ total passing locally)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed

Manual repro for reviewer: two back-to-back nemoclaw onboard --non-interactive — first takes 18789, second logs Dashboard port 18789 in use by 'alpha'; allocated port 18790 for 'beta', nemoclaw <name> status prints each port, both dashboards reachable.

Notes for reviewer

AI Disclosure

  • AI-assisted — tool: Claude Code

Signed-off-by: Charan Jagwani cjagwani@nvidia.com

Summary by CodeRabbit

  • New Features

    • Added a CLI flag to set a custom control UI port.
    • Sandboxes now allocate and persist dashboard ports; local dashboard URLs are shown in inventory and status when available.
  • Tests

    • Added coverage for dashboard URL emission and port-allocation behavior.
  • Chores

    • End-to-end install script now captures diagnostics on success and exits with a non-zero code.

Second nemoclaw onboard crashed mid-flight when port 18789 was held by the
first sandbox, blocking 12 P0/P1 QA tests that keep a baseline sandbox
alive while onboarding a short-lived second one. The v0.0.21 guard stopped
silent forward-stealing but left no recovery path — users had to set
CHAT_UI_URL manually and the assigned port was not discoverable post-hoc.

Changes:
- src/lib/ports.ts: findFreeDashboardPort picker with 10-port window,
  injectable probe seam, and 1024-65535 range clamp. 6 unit tests.
- src/lib/onboard.ts: ensureDashboardForward auto-allocates on loopback
  conflict (with a real lsof bind probe to avoid ports bound by other
  processes) and returns the chosen port. On user-pinned CHAT_UI_URL or
  window exhaustion it rolls back the openshell sandbox before exiting,
  closing the "leaks ghost sandbox" path from the issue title. Reuse
  paths seed chatUiUrl from the sandbox's stored dashboardPort via new
  reuseChatUiUrlFor helper so re-onboarding an auto-allocated sandbox
  doesn't ratchet its port upward.
- src/lib/registry.ts: dashboardPort?: number on SandboxEntry, persisted
  by registerSandbox.
- src/lib/inventory-commands.ts: dashboard URL in nemoclaw list and
  nemoclaw status. 2 unit tests.
- src/nemoclaw.ts: dashboard URL in nemoclaw <name> status.
- src/lib/onboard-command.ts: --control-ui-port <n> flag (precedence:
  flag > CHAT_UI_URL env > stored port > default). 2 unit tests.
- src/lib/onboard.ts (banner): getDashboardAccessInfo and
  getDashboardGuidanceLines read stored dashboardPort as a precedence
  tier so the completion banner shows the port actually allocated.

Fixes #2174

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds optional dashboard port allocation and propagation: a new control-ui-port CLI flag, automatic free-port discovery on conflicts, persistence of allocated ports to the sandbox registry, and emission of per-sandbox local dashboard URLs in list/status outputs. Tests for ports, onboarding parsing, and inventory output were added.

Changes

Cohort / File(s) Summary
Port allocation core
src/lib/ports.ts, src/lib/ports.test.ts
Adds PortProbe and findFreeDashboardPort(preferred, {probe?}) which validates preferred port (1024–65535) and scans up to 10 consecutive ports, skipping forwarded/unavailable ports; tests cover probing, bounds, and exhaustion.
Onboard core & forwarding
src/lib/onboard.ts
ensureDashboardForward now returns the chosen port, accepts rollback/cleanup options, probes for an alternate free port on conflict, persists the allocated dashboardPort to registry, and uses reuse/rewind flows to avoid port escalation.
Registry persistence
src/lib/registry.ts
Adds optional dashboardPort?: number to SandboxEntry; registerSandbox validates and persists only numeric ports within 1024–65535.
CLI flag & command parsing
src/lib/onboard-command.ts, src/lib/onboard-command.test.ts
Adds --control-ui-port <n> to OnboardCommandOptions (`number
Inventory / status output
src/lib/inventory-commands.ts, src/lib/inventory-commands.test.ts, src/nemoclaw.ts
Adds optional dashboardPort handling to list/status commands and prints dashboard: http://127.0.0.1:<port> when present; tests verify output for multiple sandboxes.
E2E script change
test/e2e/test-sandbox-survival.sh
Alters success path to emit diagnostic dump and exit with code 99 instead of exiting 0 (script-only change).

Sequence Diagram(s)

sequenceDiagram
  participant CLI as "nemoclaw onboard"
  participant Onboard as "onboard (ensureDashboardForward)"
  participant Probe as "PortProbe / openshell forward list"
  participant Registry as "sandbox registry"
  participant Run as "sandbox runtime (chat UI forward)"

  CLI->>Onboard: parse args (--control-ui-port?)
  Onboard->>Probe: check preferred port ownership & list forwarded ports
  alt preferred free
    Onboard->>Run: start forward at preferred
    Run-->>Onboard: returns allocated port (preferred)
  else conflict
    Onboard->>Probe: findFreeDashboardPort(preferred)
    alt found port
      Onboard->>Run: start forward at chosen port
      Run-->>Onboard: returns allocated port
    else none found
      Onboard-->>CLI: error / exit(1)
    end
  end
  Onboard->>Registry: updateSandbox(..., { dashboardPort: allocatedPort })
  Registry-->>Onboard: persisted
  Onboard-->>CLI: finish (CHAT_UI_URL temporarily set/restored)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through ports and found a spare,
Turned conflicts into numbers with careful care.
A flag to steer the dashboard's door,
Saved in the registry, shown once more.
Hooray—no more ghostly ports to scare!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The test-e2e script modification that changes success exit code from 0 to 99 and adds file descriptor diagnostics appears out of scope to the core dashboard port allocation fix. Clarify whether the e2e script changes are necessary for dashboard port verification or move to a separate diagnostic PR.
Docstring Coverage ⚠️ Warning Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: implementing auto-allocation of the dashboard port on multi-sandbox conflicts, which directly addresses the core issue from #2174.
Linked Issues check ✅ Passed The PR implements all primary objectives from #2174: auto-allocation of dashboard port [ports.ts, onboard.ts], CLI override via --control-ui-port [onboard-command.ts], port persistence in registry [registry.ts, inventory-commands.ts], and rollback on unrecoverable failures [onboard.ts].

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/2174-dashboard-port-autoalloc-v2

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard-command.ts`:
- Around line 143-149: The code sets process.env.CHAT_UI_URL when
options.controlUiPort is provided but never restores the previous value, causing
cross-invocation leakage; modify the block around deps.runOnboard(options) to
save the original process.env.CHAT_UI_URL (e.g., const _orig =
process.env.CHAT_UI_URL), set it when options.controlUiPort !== null, await
deps.runOnboard(options), and then restore the original value afterwards
(reassign or delete process.env.CHAT_UI_URL if it was undefined) so repeated
calls to runOnboardCommand / deps.runOnboard do not retain the pinned URL.

In `@src/lib/onboard.ts`:
- Around line 7093-7102: printDashboard() is reconstructing
chatUiUrl/chain/guidance from process.env.CHAT_UI_URL and CONTROL_UI_PORT
instead of reusing the stored-port precedence logic, so change printDashboard()
to derive its UI URL and guidance via the same helpers/values used earlier (use
registry.getSandbox(sandboxName)?.dashboardPort, the storedUrl/ storedPort
logic, and the computed chatUiUrl variable or the same precedence:
options.chatUiUrl || process.env.CHAT_UI_URL || storedUrl ||
`http://127.0.0.1:${CONTROL_UI_PORT}`) so the post-onboard banner reflects an
auto-allocated dashboardPort; update any inline uses at printDashboard and the
similar block around lines 7135-7142 to reference those symbols instead of
rebuilding from env/defaults.
- Around line 6793-6827: ensureDashboardForward currently unconditionally
deletes the sandbox on port-forward failures; change its signature to accept an
options flag (e.g., rollbackSandboxOnFailure: boolean = false) and only call
runOpenshell(["sandbox","delete", sandboxName], ...) when
rollbackSandboxOnFailure is true (both in the userPinned/isLoopback early-exit
branch and the chosen===null branch). Leave default behavior false so reuse
paths (the callers at the earlier reuse flows) keep existing sandboxes intact,
and update only the new-sandbox creation path to call
ensureDashboardForward(..., { rollbackSandboxOnFailure: true }) after the
sandbox is freshly created.
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2459036f-c0d1-49c6-ae11-67924d315e1c

📥 Commits

Reviewing files that changed from the base of the PR and between 260b237 and 196ada1.

📒 Files selected for processing (9)
  • src/lib/inventory-commands.test.ts
  • src/lib/inventory-commands.ts
  • src/lib/onboard-command.test.ts
  • src/lib/onboard-command.ts
  • src/lib/onboard.ts
  • src/lib/ports.test.ts
  • src/lib/ports.ts
  • src/lib/registry.ts
  • src/nemoclaw.ts

Comment thread src/lib/onboard-command.ts Outdated
Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/onboard.ts
@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 3 file(s) based on 3 unresolved review comments.

Files modified:

  • nemoclaw-blueprint/scripts/ws-proxy-fix.js
  • src/lib/onboard-command.ts
  • src/lib/onboard.ts

Commit: f13c77602ee703db5682b30f5255d1ef9a9e4a94

The changes have been pushed to the fix/2174-dashboard-port-autoalloc-v2 branch.

Time taken: 3m 55s

Fixed 3 file(s) based on 3 unresolved review comments.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
@copy-pr-bot

copy-pr-bot Bot commented Apr 24, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)

6786-6873: ⚠️ Potential issue | 🟠 Major

Auto-allocation still misses non-OpenShell port conflicts.

findFreeDashboardPort() only runs when forward list already reports another sandbox owner. If the requested loopback port is held by any other local listener, portOwner stays null, forward start fails, and this code only warns. On the create path at Lines 4496-4520, that still persists dashboardPort and prints a broken URL even though nothing is forwarded. Probe the requested port for local binds before forward start, and treat a rejected start as a hard failure for fresh creates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 6786 - 6873, When creating a sandbox, probe
the requested loopback port for local listeners before attempting openshell
forwarding and treat a start rejection as a hard failure for fresh creates: add
the same local-bind probe used in this block (isPortBoundLocally using
runCapture + lsof) earlier when portOwner === null so that if requestedPort is
already bound you call findFreeDashboardPort (or allocate a new effectivePort
and forwardTarget using buildChain) instead of proceeding to start;
additionally, if runOpenshell(...) /
runCaptureOpenshell(["forward","start",...]) (fwdResult) returns non-zero during
the create path, perform the same rollback (runOpenshell(["sandbox","delete",
sandboxName], { ignoreError: true })) and exit with an error message rather than
merely warning — reference the symbols requestedPort, portOwner,
isPortBoundLocally, findFreeDashboardPort, effectivePort, forwardTarget,
buildChain, fwdResult, and sandboxName to locate the insertion points.
♻️ Duplicate comments (2)
src/lib/onboard.ts (2)

6805-6839: ⚠️ Potential issue | 🔴 Critical

Scope rollback to newly created sandboxes only.

ensureDashboardForward() is also used by the reuse paths at Lines 3971-3975, 4004-4008, 4052-4056, and 4072-4076. The unconditional sandbox delete here turns a port-forward setup error into destructive data loss for an already-existing sandbox. Gate rollback behind an explicit "fresh create" flag and leave reuse flows intact.

🔧 Suggested scope guard
 function ensureDashboardForward(
   sandboxName: string,
   chatUiUrl = `http://127.0.0.1:${CONTROL_UI_PORT}`,
+  options: { rollbackSandboxOnFailure?: boolean } = {},
 ): number {
+  const rollback = () => {
+    if (options.rollbackSandboxOnFailure !== true) return;
+    runOpenshell(["sandbox", "delete", sandboxName], { ignoreError: true });
+  };
   const chain = buildChain({ chatUiUrl, isWsl: isWsl() });
   ...
     if (userPinnedEnv || !isLoopback) {
-      runOpenshell(["sandbox", "delete", sandboxName], { ignoreError: true });
+      rollback();
       console.error(`  Port ${requestedPort} is already forwarded for sandbox '${portOwner}'.`);
       ...
     }
   ...
     if (chosen === null) {
-      runOpenshell(["sandbox", "delete", sandboxName], { ignoreError: true });
+      rollback();
       console.error(`  All ports ${requestedPort}-${requestedPort + 9} are forwarded.`);
       ...
     }

Pass { rollbackSandboxOnFailure: true } only from the new-sandbox path after creation succeeds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 6805 - 6839, The current unconditional
rollback in ensureDashboardForward (the runOpenshell(["sandbox","delete",
sandboxName]) call) will delete already-existing sandboxes during reuse flows;
change ensureDashboardForward to accept an options flag (e.g.,
rollbackSandboxOnFailure) and only perform runOpenshell(...sandbox delete...)
when that flag is true, keep the existing error logging and process.exit
behavior otherwise, and update the new-sandbox call site to pass {
rollbackSandboxOnFailure: true } while leaving reuse-path callers untouched.

7189-7206: ⚠️ Potential issue | 🟠 Major

Reuse the stored-port helpers in the completion banner.

printDashboard() still rebuilds chatUiUrl, chain, access URLs, and guidance from process.env.CHAT_UI_URL || CONTROL_UI_PORT instead of using the stored-port precedence added in getDashboardAccessInfo() and getDashboardGuidanceLines(). After auto-allocating 18790, the onboarding banner will still tell the user to open 18789. Route this block through the helpers, or derive the same stored-port-aware chatUiUrl here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 7189 - 7206, The code rebuilds chatUiUrl,
chain, dashboardAccess and guidanceLines manually and can return stale port
info; replace this block with the stored-port-aware helpers: call
getDashboardAccessInfo(...) to obtain the dashboardAccess array (pass token and
sandboxName or the same params those helpers expect) and call
getDashboardGuidanceLines(...) to obtain guidanceLines; remove the manual
buildChain/buildControlUiUrls logic and ensure token is passed through so any
WSL/port-precedence logic remains consistent with getDashboardAccessInfo and
getDashboardGuidanceLines (references: getDashboardAccessInfo,
getDashboardGuidanceLines, buildChain, buildControlUiUrls).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 6786-6873: When creating a sandbox, probe the requested loopback
port for local listeners before attempting openshell forwarding and treat a
start rejection as a hard failure for fresh creates: add the same local-bind
probe used in this block (isPortBoundLocally using runCapture + lsof) earlier
when portOwner === null so that if requestedPort is already bound you call
findFreeDashboardPort (or allocate a new effectivePort and forwardTarget using
buildChain) instead of proceeding to start; additionally, if runOpenshell(...) /
runCaptureOpenshell(["forward","start",...]) (fwdResult) returns non-zero during
the create path, perform the same rollback (runOpenshell(["sandbox","delete",
sandboxName], { ignoreError: true })) and exit with an error message rather than
merely warning — reference the symbols requestedPort, portOwner,
isPortBoundLocally, findFreeDashboardPort, effectivePort, forwardTarget,
buildChain, fwdResult, and sandboxName to locate the insertion points.

---

Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 6805-6839: The current unconditional rollback in
ensureDashboardForward (the runOpenshell(["sandbox","delete", sandboxName])
call) will delete already-existing sandboxes during reuse flows; change
ensureDashboardForward to accept an options flag (e.g.,
rollbackSandboxOnFailure) and only perform runOpenshell(...sandbox delete...)
when that flag is true, keep the existing error logging and process.exit
behavior otherwise, and update the new-sandbox call site to pass {
rollbackSandboxOnFailure: true } while leaving reuse-path callers untouched.
- Around line 7189-7206: The code rebuilds chatUiUrl, chain, dashboardAccess and
guidanceLines manually and can return stale port info; replace this block with
the stored-port-aware helpers: call getDashboardAccessInfo(...) to obtain the
dashboardAccess array (pass token and sandboxName or the same params those
helpers expect) and call getDashboardGuidanceLines(...) to obtain guidanceLines;
remove the manual buildChain/buildControlUiUrls logic and ensure token is passed
through so any WSL/port-precedence logic remains consistent with
getDashboardAccessInfo and getDashboardGuidanceLines (references:
getDashboardAccessInfo, getDashboardGuidanceLines, buildChain,
buildControlUiUrls).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8185b0f9-9583-4ca7-844c-11ec32f2218c

📥 Commits

Reviewing files that changed from the base of the PR and between 196ada1 and d56382e.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • src/lib/registry.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/registry.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)

6790-6881: ⚠️ Potential issue | 🟠 Major

Run the port picker for local-bind conflicts too.

The fallback only triggers when forward list already owns requestedPort. If the requested port is held by any other local process, portOwner is null, we skip findFreeDashboardPort(), and then persist/print a dead dashboard port after forward start fails. This misses the exact lsof-backed conflict case the new allocator is meant to handle.

🔧 Suggested change
   const existingForwards = runCaptureOpenshell(["forward", "list"], { ignoreError: true });
   const portOwner = findDashboardForwardOwner(existingForwards, String(requestedPort));
+  const heldPorts = (existingForwards ?? "")
+    .split("\n")
+    .map((l) => l.trim())
+    .filter(Boolean)
+    .map((l) => Number(l.split(/\s+/)[2]))
+    .filter((p) => Number.isFinite(p) && p > 0);
+  const isPortBoundLocally = (p: number): boolean => {
+    const out = runCapture(["lsof", "-i", `:${p}`, "-sTCP:LISTEN", "-P", "-n"], {
+      ignoreError: true,
+    });
+    return typeof out === "string" && out.trim().length > 0;
+  };
+  const requestedPortUnavailable =
+    heldPorts.includes(requestedPort) || isPortBoundLocally(requestedPort);
 
   let effectivePort = requestedPort;
   let forwardTarget = chain.forwardTarget;
-  if (portOwner !== null && portOwner !== sandboxName) {
+  if (requestedPortUnavailable && portOwner !== sandboxName) {
      ...
-    const heldPorts = (existingForwards ?? "")
-      .split("\n")
-      .map((l) => l.trim())
-      .filter(Boolean)
-      .map((l) => Number(l.split(/\s+/)[2]))
-      .filter((p) => Number.isFinite(p) && p > 0);
-    const isPortBoundLocally = (p: number): boolean => {
-      const out = runCapture(["lsof", "-i", `:${p}`, "-sTCP:LISTEN", "-P", "-n"], {
-        ignoreError: true,
-      });
-      return typeof out === "string" && out.trim().length > 0;
-    };
     const chosen = findFreeDashboardPort(requestedPort, {
       probe: {
         listForwardPorts: () => heldPorts,
         probePortFree: (p: number) => !isPortBoundLocally(p),
       },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 6790 - 6881, The current port-allocation
fallback only runs when findDashboardForwardOwner(...) returns a different
sandbox, but misses the case where portOwner === null yet the requestedPort is
bound locally; move or hoist the local-probe logic so the code also runs
findFreeDashboardPort when portOwner === null but
isPortBoundLocally(requestedPort) is true. Concretely: compute
isLoopback/userPinned as done now, then either (a) merge the condition to enter
the allocator when (portOwner !== null && portOwner !== sandboxName) OR
(portOwner === null && isPortBoundLocally(requestedPort)), or factor out the
allocator block (the heldPorts mapping, isPortBoundLocally probe, and call to
findFreeDashboardPort) into a helper used in both cases; ensure you still
perform rollbackSandboxOnFailure, update forwardTarget via buildChain(...), and
print the same user-facing messages when a free port is chosen or when no ports
are available (preserve use of runCaptureOpenshell, runOpenshell,
findFreeDashboardPort, and buildChain identifiers).
🧹 Nitpick comments (1)
src/lib/onboard-command.ts (1)

146-159: Scope CHAT_UI_URL restoration to the override path only.

The finally block always rewrites process.env.CHAT_UI_URL (Line 154-Line 158), even when --control-ui-port was not provided. Guarding restoration with options.controlUiPort !== null keeps global env mutation strictly tied to this feature path.

♻️ Suggested patch
   const _origChatUiUrl = process.env.CHAT_UI_URL;
   try {
     if (options.controlUiPort !== null) {
       process.env.CHAT_UI_URL = `http://127.0.0.1:${options.controlUiPort}`;
     }
     await deps.runOnboard(options);
   } finally {
-    // Restore original value to prevent cross-invocation leakage
-    if (_origChatUiUrl === undefined) {
-      delete process.env.CHAT_UI_URL;
-    } else {
-      process.env.CHAT_UI_URL = _origChatUiUrl;
+    // Restore original value only if we overrode it in this invocation
+    if (options.controlUiPort !== null) {
+      if (_origChatUiUrl === undefined) {
+        delete process.env.CHAT_UI_URL;
+      } else {
+        process.env.CHAT_UI_URL = _origChatUiUrl;
+      }
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-command.ts` around lines 146 - 159, The finally block
currently always restores process.env.CHAT_UI_URL even when we never overrode
it; change the restore logic to only run when we previously set the env (i.e.,
when options.controlUiPort !== null). In the finally block around
deps.runOnboard, wrap the existing restore/delete logic in a guard checking
options.controlUiPort !== null so that _origChatUiUrl is only applied back if we
actually changed process.env.CHAT_UI_URL.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard-command.ts`:
- Around line 82-89: The validation for --control-ui-port currently logs an
error and exits in two places (the /^\d+$/ check and the 1024–65535 range check)
but doesn't print the CLI usage like other argument validators; update both
failure branches that call error(...) and exit(1) to also invoke the parser's
usage printing function (the same helper used elsewhere in this file, e.g.,
printUsage() or usage()) before exiting so that when raw or parsed fail
validation (variables raw and parsed, and the branches that call error(...) and
exit(1)) the usage/help text is shown consistently.

---

Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 6790-6881: The current port-allocation fallback only runs when
findDashboardForwardOwner(...) returns a different sandbox, but misses the case
where portOwner === null yet the requestedPort is bound locally; move or hoist
the local-probe logic so the code also runs findFreeDashboardPort when portOwner
=== null but isPortBoundLocally(requestedPort) is true. Concretely: compute
isLoopback/userPinned as done now, then either (a) merge the condition to enter
the allocator when (portOwner !== null && portOwner !== sandboxName) OR
(portOwner === null && isPortBoundLocally(requestedPort)), or factor out the
allocator block (the heldPorts mapping, isPortBoundLocally probe, and call to
findFreeDashboardPort) into a helper used in both cases; ensure you still
perform rollbackSandboxOnFailure, update forwardTarget via buildChain(...), and
print the same user-facing messages when a free port is chosen or when no ports
are available (preserve use of runCaptureOpenshell, runOpenshell,
findFreeDashboardPort, and buildChain identifiers).

---

Nitpick comments:
In `@src/lib/onboard-command.ts`:
- Around line 146-159: The finally block currently always restores
process.env.CHAT_UI_URL even when we never overrode it; change the restore logic
to only run when we previously set the env (i.e., when options.controlUiPort !==
null). In the finally block around deps.runOnboard, wrap the existing
restore/delete logic in a guard checking options.controlUiPort !== null so that
_origChatUiUrl is only applied back if we actually changed
process.env.CHAT_UI_URL.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7421a1da-48bb-42bf-8d4c-35bcef286488

📥 Commits

Reviewing files that changed from the base of the PR and between d56382e and f13c776.

📒 Files selected for processing (3)
  • nemoclaw-blueprint/scripts/ws-proxy-fix.js
  • src/lib/onboard-command.ts
  • src/lib/onboard.ts
✅ Files skipped from review due to trivial changes (1)
  • nemoclaw-blueprint/scripts/ws-proxy-fix.js

Comment on lines +82 to +89
if (!/^\d+$/.test(raw)) {
error(` --control-ui-port '${raw}' must be an integer between 1024 and 65535`);
exit(1);
}
const parsed = Number(raw);
if (parsed < 1024 || parsed > 65535) {
error(` --control-ui-port '${raw}' must be an integer between 1024 and 65535`);
exit(1);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Print usage text for all --control-ui-port validation failures.

Line 83 and Line 88 exit with an error but skip usage output, while other argument-validation failures in this parser print usage. Keeping this consistent improves CLI guidance.

💡 Suggested patch
     if (!/^\d+$/.test(raw)) {
       error(`  --control-ui-port '${raw}' must be an integer between 1024 and 65535`);
+      printOnboardUsage(error, noticeAcceptFlag);
       exit(1);
     }
     const parsed = Number(raw);
     if (parsed < 1024 || parsed > 65535) {
       error(`  --control-ui-port '${raw}' must be an integer between 1024 and 65535`);
+      printOnboardUsage(error, noticeAcceptFlag);
       exit(1);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-command.ts` around lines 82 - 89, The validation for
--control-ui-port currently logs an error and exits in two places (the /^\d+$/
check and the 1024–65535 range check) but doesn't print the CLI usage like other
argument validators; update both failure branches that call error(...) and
exit(1) to also invoke the parser's usage printing function (the same helper
used elsewhere in this file, e.g., printUsage() or usage()) before exiting so
that when raw or parsed fail validation (variables raw and parsed, and the
branches that call error(...) and exit(1)) the usage/help text is shown
consistently.

ericksoa pushed a commit that referenced this pull request Apr 24, 2026
…2458)

## Summary

Tip of `main` currently fails `npm run typecheck:cli` with 8 errors in
`test/onboard.test.ts` — patterns introduced by #2130 (Slack token
validation tests) don't satisfy the `strict: true` setting introduced by
#2422. Every PR opened against main inherits these errors, which makes
the `checks` CI job red on unrelated PRs (e.g. #2454).

## Related

Follows from #2130 + #2422 interaction. No linked issue — local-only
breakage caught by CI on downstream PRs.

## Changes

`test/onboard.test.ts` — 8 annotations, zero behavior change:

- **4× `.pop()!`** — `Array.pop()` returns `string | undefined`. Each
site is immediately preceded by an `assert.equal(result.status, 0)` that
guarantees non-empty stdout, so the non-null assertion is safe.
- **3× `(c: { key: string }) => c.key === …`** — `saveCalls` entries
only read `.key`.
- **1× `(c: { name: string }) => c.name === …`** — `MESSAGING_CHANNELS`
entry lookup only reads `.name`.

The tests themselves still exercise the same cases; these changes are
purely to make the TypeScript compiler happy.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [x] `npx tsc -p tsconfig.cli.json` — exits 0 (was 2 with 8 errors)
- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed

## AI Disclosure
- [x] AI-assisted — tool: Claude Code

---
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Tests
* Improved reliability of JSON payload extraction in onboarding tests
with defensive null-checks before parsing.
* Enhanced type safety in test callbacks and iterators with explicit
TypeScript type annotations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wscurran wscurran added bug Something fails against expected or documented behavior NemoClaw CLI dependencies Pull requests that update a dependency file labels Apr 27, 2026
Resolves conflicts in src/lib/onboard-command.ts and onboard-command.test.ts:
combine main's --resume | --fresh mutual-exclusion handling with this branch's
--control-ui-port flag and tests.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)

6069-6160: ⚠️ Potential issue | 🟠 Major

Handle non-OpenShell listeners before attempting the forward.

findFreeDashboardPort() only runs when openshell forward list already shows another sandbox on requestedPort. If the port is occupied by any other local process, portOwner stays null, the picker is skipped, and Lines 6143-6159 only warn after forward start fails. In the new-sandbox path that still leaves onboarding succeeding with an unreachable dashboard, which misses the auto-allocation/rollback goal.

💡 Suggested fix
   const existingForwards = runCaptureOpenshell(["forward", "list"], { ignoreError: true });
   const portOwner = findDashboardForwardOwner(existingForwards, String(requestedPort));
+  const heldPorts = (existingForwards ?? "")
+    .split("\n")
+    .map((l) => l.trim())
+    .filter(Boolean)
+    .map((l) => Number(l.split(/\s+/)[2]))
+    .filter((p) => Number.isFinite(p) && p > 0);
+  const isPortBoundLocally = (p: number): boolean => {
+    const out = runCapture(["lsof", "-i", `:${p}`, "-sTCP:LISTEN", "-P", "-n"], {
+      ignoreError: true,
+    });
+    return typeof out === "string" && out.trim().length > 0;
+  };
+  const portBusyLocally = isPortBoundLocally(requestedPort);

   let effectivePort = requestedPort;
   let forwardTarget = chain.forwardTarget;
-  if (portOwner !== null && portOwner !== sandboxName) {
+  if ((portOwner !== null && portOwner !== sandboxName) || portBusyLocally) {
     const userPinnedEnv = !!process.env.CHAT_UI_URL;
     let isLoopback = true;
     try {
       const parsed = new URL(
         /^[a-z]+:\/\//i.test(chatUiUrl) ? chatUiUrl : `http://${chatUiUrl}`,
@@
     const chosen = findFreeDashboardPort(requestedPort, {
       probe: {
         listForwardPorts: () => heldPorts,
         probePortFree: (p: number) => !isPortBoundLocally(p),
       },
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 6069 - 6160, The code currently only probes
for local listeners when an openshell forward exists; to fix, detect
non-OpenShell listeners on the requested port before attempting forward and
treat that like a conflicting owner: call the existing isPortBoundLocally logic
(reuse the runCapture(["lsof", ...]) pattern) when portOwner === null and if it
returns true then run the same allocation/rollback flow that uses
findFreeDashboardPort, buildChain, and runOpenshell(["sandbox","delete",...]) as
in the portOwner !== sandboxName branch; ensure you reference and reuse the same
heldPorts/probe implementation and then set effectivePort/forwardTarget before
calling runOpenshell(["forward","start",...]) so onboarding fails over to an
allocated loopback port instead of silently producing an unreachable dashboard.
♻️ Duplicate comments (1)
src/lib/onboard-command.ts (1)

84-91: ⚠️ Potential issue | 🟡 Minor

Show usage for invalid --control-ui-port values.

Line 84 and Line 89 still exit without the usage banner, unlike the other parser errors in this function.

💡 Suggested patch
     if (!/^\d+$/.test(raw)) {
       error(`  --control-ui-port '${raw}' must be an integer between 1024 and 65535`);
+      printOnboardUsage(error, noticeAcceptFlag);
       exit(1);
     }
     const parsed = Number(raw);
     if (parsed < 1024 || parsed > 65535) {
       error(`  --control-ui-port '${raw}' must be an integer between 1024 and 65535`);
+      printOnboardUsage(error, noticeAcceptFlag);
       exit(1);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-command.ts` around lines 84 - 91, The invalid
--control-ui-port branches currently call error(...) and exit(1) without showing
the usage banner; modify both validation failure branches (the /^\d+$/.test(raw)
check and the range check for parsed) to invoke the same usage display used by
other parser errors (e.g. call showUsage() or the existing usage/display
function in this module) before calling exit(1) so the usage banner is printed
when raw or parsed is invalid.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 6069-6160: The code currently only probes for local listeners when
an openshell forward exists; to fix, detect non-OpenShell listeners on the
requested port before attempting forward and treat that like a conflicting
owner: call the existing isPortBoundLocally logic (reuse the runCapture(["lsof",
...]) pattern) when portOwner === null and if it returns true then run the same
allocation/rollback flow that uses findFreeDashboardPort, buildChain, and
runOpenshell(["sandbox","delete",...]) as in the portOwner !== sandboxName
branch; ensure you reference and reuse the same heldPorts/probe implementation
and then set effectivePort/forwardTarget before calling
runOpenshell(["forward","start",...]) so onboarding fails over to an allocated
loopback port instead of silently producing an unreachable dashboard.

---

Duplicate comments:
In `@src/lib/onboard-command.ts`:
- Around line 84-91: The invalid --control-ui-port branches currently call
error(...) and exit(1) without showing the usage banner; modify both validation
failure branches (the /^\d+$/.test(raw) check and the range check for parsed) to
invoke the same usage display used by other parser errors (e.g. call showUsage()
or the existing usage/display function in this module) before calling exit(1) so
the usage banner is printed when raw or parsed is invalid.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: df140fe6-2930-40ea-aba3-837bda2e3718

📥 Commits

Reviewing files that changed from the base of the PR and between f13c776 and 20ef745.

📒 Files selected for processing (6)
  • src/lib/inventory-commands.test.ts
  • src/lib/onboard-command.test.ts
  • src/lib/onboard-command.ts
  • src/lib/onboard.ts
  • src/lib/registry.ts
  • src/nemoclaw.ts
✅ Files skipped from review due to trivial changes (2)
  • src/nemoclaw.ts
  • src/lib/inventory-commands.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lib/registry.ts
  • src/lib/onboard-command.test.ts

Temporary instrumentation to root-cause the post-success-message hang on
sandbox-survival-e2e for PR #2454. Captures lsof output, ps tree, and
every process holding bash's stdout/stderr pipe FDs into the upload
artifact path. exit 99 forces the failure step so the artifact uploads.

Revert before merge.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-sandbox-survival.sh`:
- Around line 718-762: The diagnostic block that writes to DIAG_FILE ends with a
forced "exit 99", which causes the script to fail even when assertions passed;
change the script so the success path exits with 0 (restore "exit 0" instead of
"exit 99") or gate the diagnostic+nonzero exit behind an explicit failure
condition, keeping the DIAG_FILE logging intact but ensuring the normal end of
the script uses exit 0 (look for the DIAG_FILE variable and the block that
prints "=== EXIT-DIAG" and remove or conditionalize the "exit 99").
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e26fe829-a1e6-4326-b66e-243460807484

📥 Commits

Reviewing files that changed from the base of the PR and between 20ef745 and d2a74d2.

📒 Files selected for processing (1)
  • test/e2e/test-sandbox-survival.sh

Comment thread test/e2e/test-sandbox-survival.sh Outdated
v1 saw fd 1 = the diag file (its own redirection) instead of the runner
pipe. Save the original to fds 9/8 first, then walk /proc to find every
process holding the same pipe inode.
Records when bash actually starts to exit (via EXIT trap, fires right
before exit syscall). Comparing that to the runner's Process completed
timestamp tells us whether the ~60s post-printf gap is in bash or in
the runner's exit-detection path.
If the trap fires immediately after diag (within ms), bash exits cleanly
on exit 0 and the 50s+ PR-specific extra tail is runner-side. If trap
fires 50s+ later, bash is hanging in fflush() before exit.
The sandbox-survival-e2e job has a 900s internal wall clock and was
flaking on PRs whose install hits the slow side of GitHub runner
variance. Investigation traced the failure to runner-side log batching:
on a clean exit 0 with no follow-up step (if: failure() skipped), the
runner sat in its ~60s log-flush window before reporting the step's
exit code. A guaranteed follow-up step (if: always()) makes the runner
finalize immediately — tail dropped from 60s to 1.7s in a controlled
test, reclaiming ~58s of the 900s budget.

The artifact upload is harmless on success: the production script
doesn't write /tmp/nemoclaw-e2e-install.log, and if-no-files-found:
ignore no-ops the upload.
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25080893392
Branch: fix/2174-dashboard-port-autoalloc-v2
Requested jobs: all (no filter)
Summary: 0 passed, 16 failed, 1 skipped

Job Result
cloud-e2e ❌ failure
cloud-experimental-e2e ❌ failure
deployment-services-e2e ⚠️ cancelled
diagnostics-e2e ❌ failure
gpu-e2e ⏭️ skipped
hermes-e2e ❌ failure
inference-routing-e2e ❌ failure
messaging-providers-e2e ❌ failure
network-policy-e2e ❌ failure
overlayfs-autofix-e2e ⚠️ cancelled
rebuild-hermes-e2e ❌ failure
rebuild-openclaw-e2e ❌ failure
sandbox-operations-e2e ❌ failure
sandbox-survival-e2e ❌ failure
shields-config-e2e ❌ failure
skip-permissions-e2e ❌ failure
snapshot-commands-e2e ❌ failure
token-rotation-e2e ❌ failure
upgrade-stale-sandbox-e2e ❌ failure

Failed jobs: cloud-e2e, cloud-experimental-e2e, messaging-providers-e2e, token-rotation-e2e, sandbox-survival-e2e, hermes-e2e, skip-permissions-e2e, sandbox-operations-e2e, inference-routing-e2e, network-policy-e2e, diagnostics-e2e, snapshot-commands-e2e, shields-config-e2e, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e, rebuild-hermes-e2e. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25081353877
Branch: fix/2174-dashboard-port-autoalloc-v2
Requested jobs: all (no filter)
Summary: 13 passed, 3 failed, 1 skipped

Job Result
cloud-e2e ✅ success
cloud-experimental-e2e ❌ failure
deployment-services-e2e ✅ success
diagnostics-e2e ✅ success
gpu-e2e ⏭️ skipped
hermes-e2e ✅ success
inference-routing-e2e ✅ success
messaging-providers-e2e ✅ success
network-policy-e2e ✅ success
overlayfs-autofix-e2e ✅ success
rebuild-hermes-e2e ✅ success
rebuild-openclaw-e2e ❌ failure
sandbox-operations-e2e ⚠️ cancelled
sandbox-survival-e2e ⚠️ cancelled
shields-config-e2e ✅ success
skip-permissions-e2e ✅ success
snapshot-commands-e2e ✅ success
token-rotation-e2e ✅ success
upgrade-stale-sandbox-e2e ❌ failure

Failed jobs: cloud-experimental-e2e, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25082492604
Branch: fix/2174-dashboard-port-autoalloc-v2
Requested jobs: all (no filter)
Summary: 14 passed, 4 failed, 1 skipped

Job Result
cloud-e2e ✅ success
cloud-experimental-e2e ❌ failure
deployment-services-e2e ❌ failure
diagnostics-e2e ✅ success
gpu-e2e ⏭️ skipped
hermes-e2e ✅ success
inference-routing-e2e ✅ success
messaging-providers-e2e ✅ success
network-policy-e2e ✅ success
overlayfs-autofix-e2e ❌ failure
rebuild-hermes-e2e ✅ success
rebuild-openclaw-e2e ❌ failure
sandbox-operations-e2e ✅ success
sandbox-survival-e2e ✅ success
shields-config-e2e ✅ success
skip-permissions-e2e ✅ success
snapshot-commands-e2e ✅ success
token-rotation-e2e ✅ success
upgrade-stale-sandbox-e2e ✅ success

Failed jobs: cloud-experimental-e2e, deployment-services-e2e, rebuild-openclaw-e2e, overlayfs-autofix-e2e. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25086059313
Branch: fix/2174-dashboard-port-autoalloc-v2
Requested jobs: all (no filter)
Summary: 17 passed, 1 failed, 1 skipped

Job Result
cloud-e2e ✅ success
cloud-experimental-e2e ❌ failure
deployment-services-e2e ✅ success
diagnostics-e2e ✅ success
gpu-e2e ⏭️ skipped
hermes-e2e ✅ success
inference-routing-e2e ✅ success
messaging-providers-e2e ✅ success
network-policy-e2e ✅ success
overlayfs-autofix-e2e ✅ success
rebuild-hermes-e2e ✅ success
rebuild-openclaw-e2e ✅ success
sandbox-operations-e2e ✅ success
sandbox-survival-e2e ✅ success
shields-config-e2e ✅ success
skip-permissions-e2e ✅ success
snapshot-commands-e2e ✅ success
token-rotation-e2e ✅ success
upgrade-stale-sandbox-e2e ✅ success

Failed jobs: cloud-experimental-e2e. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 25093314104
Branch: fix/2174-dashboard-port-autoalloc-v2
Requested jobs: all (no filter)
Summary: 16 passed, 2 failed, 1 skipped

Job Result
cloud-e2e ✅ success
cloud-experimental-e2e ❌ failure
deployment-services-e2e ✅ success
diagnostics-e2e ✅ success
gpu-e2e ⏭️ skipped
hermes-e2e ✅ success
inference-routing-e2e ✅ success
messaging-providers-e2e ✅ success
network-policy-e2e ✅ success
overlayfs-autofix-e2e ✅ success
rebuild-hermes-e2e ✅ success
rebuild-openclaw-e2e ✅ success
sandbox-operations-e2e ❌ failure
sandbox-survival-e2e ✅ success
shields-config-e2e ✅ success
skip-permissions-e2e ✅ success
snapshot-commands-e2e ✅ success
token-rotation-e2e ✅ success
upgrade-stale-sandbox-e2e ✅ success

Failed jobs: cloud-experimental-e2e, sandbox-operations-e2e. Check run artifacts for logs.

@cjagwani

Copy link
Copy Markdown
Contributor Author

Superseded.

The crash half of #2174 was addressed by #2411 (merged 2026-04-28). The "leaks ghost sandbox" half — which this PR also covered — landed separately as #2627 (merged 2026-04-29) on top of main.

Closing this branch in favour of the two-PR sequence that actually shipped:

The salvageable hardening from this branch is in #2627. The lsof-bind-probe differentiator turned out to have a reachability bug (CodeRabbit caught it) — Aaron's version in #2411 implemented the same idea correctly, so it's not in #2627. The reuseChatUiUrlFor re-onboard port-stickiness wasn't carried forward; #2411's stale-forward cleanup achieves a different but adequate outcome, and reopening that design discussion is out of scope.

@cjagwani cjagwani closed this Apr 29, 2026
jyaunches added a commit to jyaunches/NemoClaw that referenced this pull request Apr 29, 2026
Introduce src/lib/openshell-timeouts.ts with categorized timeout
constants for all openshell child-process calls:

- OPENSHELL_PROBE_TIMEOUT_MS (15s) — read-only queries
- OPENSHELL_OPERATION_TIMEOUT_MS (30s) — mutating commands
- OPENSHELL_HEAVY_TIMEOUT_MS (60s) — destructive/long-running ops
- OPENSHELL_DOWNLOAD_TIMEOUT_MS (30s) — file transfers over SSH

Uses the same OPENSHELL_PROBE_TIMEOUT_MS name introduced locally
in PR NVIDIA#2454 so that PR can import from the shared module on rebase.

This is Phase 1 of NVIDIA#2562 — pure constants, no behavioral change.

Refs: NVIDIA#2562
jyaunches added a commit that referenced this pull request Apr 29, 2026
…2683)

## Summary

Introduces a shared module of named timeout constants for all openshell
child-process calls. This is Phase 1 of #2562 — pure constants with no
behavioral change, establishing the vocabulary that subsequent PRs will
use to bound every unbounded call site.

## Related Issue

Refs #2562 (partial — Phase 1 of 4)

## Changes

- Add `src/lib/openshell-timeouts.ts` with four categorized constants:
- `OPENSHELL_PROBE_TIMEOUT_MS` (15s) — read-only queries (list, status,
info, ssh-config)
- `OPENSHELL_OPERATION_TIMEOUT_MS` (30s) — mutating commands (provider
CRUD, forward, gateway select)
- `OPENSHELL_HEAVY_TIMEOUT_MS` (60s) — destructive/long-running (sandbox
delete, gateway destroy)
  - `OPENSHELL_DOWNLOAD_TIMEOUT_MS` (30s) — file transfers over SSH
- Add `src/lib/openshell-timeouts.test.ts` with unit tests for
positivity, ordering, and PR #2454 alignment
- Uses the same `OPENSHELL_PROBE_TIMEOUT_MS` name from PR #2454 so it
can import from this shared module on rebase

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [ ] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes

## AI Disclosure

- [x] AI-assisted — tool: Claude Code (pi)

---
Signed-off-by: Jessica Yaunches <jyaunches@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Centralized timeout settings for shell interactions to standardize
behavior and improve reliability across operations.
* **Tests**
* Added tests that validate timeout values are numeric, positive, and
follow the expected ordering to prevent regressions.
* **Stability**
* Applied the refined probe-level timeout to quick checks to make probes
and lightweight operations more consistent and predictable.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…VIDIA#2458)

## Summary

Tip of `main` currently fails `npm run typecheck:cli` with 8 errors in
`test/onboard.test.ts` — patterns introduced by NVIDIA#2130 (Slack token
validation tests) don't satisfy the `strict: true` setting introduced by
NVIDIA#2422. Every PR opened against main inherits these errors, which makes
the `checks` CI job red on unrelated PRs (e.g. NVIDIA#2454).

## Related

Follows from NVIDIA#2130 + NVIDIA#2422 interaction. No linked issue — local-only
breakage caught by CI on downstream PRs.

## Changes

`test/onboard.test.ts` — 8 annotations, zero behavior change:

- **4× `.pop()!`** — `Array.pop()` returns `string | undefined`. Each
site is immediately preceded by an `assert.equal(result.status, 0)` that
guarantees non-empty stdout, so the non-null assertion is safe.
- **3× `(c: { key: string }) => c.key === …`** — `saveCalls` entries
only read `.key`.
- **1× `(c: { name: string }) => c.name === …`** — `MESSAGING_CHANNELS`
entry lookup only reads `.name`.

The tests themselves still exercise the same cases; these changes are
purely to make the TypeScript compiler happy.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [x] `npx tsc -p tsconfig.cli.json` — exits 0 (was 2 with 8 errors)
- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed

## AI Disclosure
- [x] AI-assisted — tool: Claude Code

---
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Tests
* Improved reliability of JSON payload extraction in onboarding tests
with defensive null-checks before parsing.
* Enhanced type safety in test callbacks and iterators with explicit
TypeScript type annotations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…VIDIA#2683)

## Summary

Introduces a shared module of named timeout constants for all openshell
child-process calls. This is Phase 1 of NVIDIA#2562 — pure constants with no
behavioral change, establishing the vocabulary that subsequent PRs will
use to bound every unbounded call site.

## Related Issue

Refs NVIDIA#2562 (partial — Phase 1 of 4)

## Changes

- Add `src/lib/openshell-timeouts.ts` with four categorized constants:
- `OPENSHELL_PROBE_TIMEOUT_MS` (15s) — read-only queries (list, status,
info, ssh-config)
- `OPENSHELL_OPERATION_TIMEOUT_MS` (30s) — mutating commands (provider
CRUD, forward, gateway select)
- `OPENSHELL_HEAVY_TIMEOUT_MS` (60s) — destructive/long-running (sandbox
delete, gateway destroy)
  - `OPENSHELL_DOWNLOAD_TIMEOUT_MS` (30s) — file transfers over SSH
- Add `src/lib/openshell-timeouts.test.ts` with unit tests for
positivity, ordering, and PR NVIDIA#2454 alignment
- Uses the same `OPENSHELL_PROBE_TIMEOUT_MS` name from PR NVIDIA#2454 so it
can import from this shared module on rebase

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [ ] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes

## AI Disclosure

- [x] AI-assisted — tool: Claude Code (pi)

---
Signed-off-by: Jessica Yaunches <jyaunches@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Centralized timeout settings for shell interactions to standardize
behavior and improve reliability across operations.
* **Tests**
* Added tests that validate timeout values are numeric, positive, and
follow the expected ordering to prevent regressions.
* **Stability**
* Applied the refined probe-level timeout to quick checks to make probes
and lightweight operations more consistent and predictable.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
ksapru pushed a commit to ksapru/NemoClaw that referenced this pull request May 12, 2026
…VIDIA#2458)

## Summary

Tip of `main` currently fails `npm run typecheck:cli` with 8 errors in
`test/onboard.test.ts` — patterns introduced by NVIDIA#2130 (Slack token
validation tests) don't satisfy the `strict: true` setting introduced by
NVIDIA#2422. Every PR opened against main inherits these errors, which makes
the `checks` CI job red on unrelated PRs (e.g. NVIDIA#2454).

## Related

Follows from NVIDIA#2130 + NVIDIA#2422 interaction. No linked issue — local-only
breakage caught by CI on downstream PRs.

## Changes

`test/onboard.test.ts` — 8 annotations, zero behavior change:

- **4× `.pop()!`** — `Array.pop()` returns `string | undefined`. Each
site is immediately preceded by an `assert.equal(result.status, 0)` that
guarantees non-empty stdout, so the non-null assertion is safe.
- **3× `(c: { key: string }) => c.key === …`** — `saveCalls` entries
only read `.key`.
- **1× `(c: { name: string }) => c.name === …`** — `MESSAGING_CHANNELS`
entry lookup only reads `.name`.

The tests themselves still exercise the same cases; these changes are
purely to make the TypeScript compiler happy.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [x] `npx tsc -p tsconfig.cli.json` — exits 0 (was 2 with 8 errors)
- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed

## AI Disclosure
- [x] AI-assisted — tool: Claude Code

---
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Tests
* Improved reliability of JSON payload extraction in onboarding tests
with defensive null-checks before parsing.
* Enhanced type safety in test callbacks and iterators with explicit
TypeScript type annotations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wscurran wscurran added area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression chore Build, CI, dependency, or tooling maintenance and removed NemoClaw CLI bug Something fails against expected or documented behavior chore Build, CI, dependency, or tooling maintenance labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression dependencies Pull requests that update a dependency file

Projects

None yet

3 participants