
fix(onboard): strengthen dashboard port detection and roll back on forward-start failure #3313

Merged
cv merged 20 commits into main from fix/3260-retry-dashboard-forward-on-port-failure on May 14, 2026


@laitingsheng laitingsheng commented May 9, 2026

Contributor

Summary

On the createSandbox path, nemoclaw onboard can return a dashboard port that was free at preflight but bound by something else by the time openshell forward start runs (TOCTOU during the multi-minute build), or a port that lsof failed to flag on the first probe. The result is a freshly-built sandbox baked with an unreachable CHAT_UI_URL. This PR strengthens the host-side port detection, fails fast on forward-start error with a clean rollback, and tightens the test seam so the new behaviour is covered without spawning real probes.

Related Issue

Fixes #3260.

Changes

  • Strengthen isPortBoundOnHost with a layered probe chain — direct lsof, sudo -n lsof (catches root-owned listeners that unprivileged lsof misses on macOS), then a synchronous Node net bind via spawnSync (mirrors what openshell forward start actually attempts). Any positive signal short-circuits.
  • Fail fast on TOCTOU during the multi-minute build: when ensureDashboardForward gets a non-zero status from openshell forward start on the createSandbox path (rollbackSandboxOnFailure: true), delete the just-built sandbox, report the failure via buildOrphanedSandboxRollbackMessage, and exit non-zero with a clear retry message instead of returning a port that is already known to be unreachable. The reuse paths keep the existing soft-warn UX.
  • Export findAvailableDashboardPort and add an injectable isPortBoundCheck test seam so the new unit tests don't have to spawn real lsof / Node probes.
  • Behavioural unit tests for findAvailableDashboardPort (port reservation semantics with stub probes) plus source-shape guards for the strengthened detection chain and rollback branch — extended in test/onboard.test.ts.
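
The layered probe chain from the first bullet can be sketched roughly as follows. This is an illustration only, not the PR's code: `Probe`, `isPortBoundLayered`, and the three stubs are hypothetical names, and the real isPortBoundOnHost shells out to lsof, sudo -n lsof, and a Node bind probe instead of taking probes as arguments. The sketch assumes each probe answers `true` (listener found), `false` (none seen), or `null` (probe could not run), that any positive short-circuits, and that the port is treated as free when no probe reports a listener.

```typescript
// Hypothetical probe signature: true = listener found, false = none seen,
// null = the probe itself could not run (tool missing, sudo refused, ...).
type Probe = (port: number) => boolean | null;

// Runs the probes in order and stops at the first positive signal.
// Falls back optimistically to "not bound" when no probe reports a listener.
function isPortBoundLayered(port: number, probes: readonly Probe[]): boolean {
  for (const probe of probes) {
    if (probe(port) === true) return true;
  }
  return false;
}

// Stubs standing in for lsof, sudo -n lsof, and the bind probe.
const calls: string[] = [];
const lsofStub: Probe = () => {
  calls.push("lsof");
  return false; // unprivileged lsof sees nothing
};
const sudoLsofStub: Probe = (port) => {
  calls.push("sudo-lsof");
  return port === 18789; // pretend a root-owned listener holds 18789
};
const bindStub: Probe = () => {
  calls.push("bind");
  return null; // pretend the bind probe could not run
};

const bound = isPortBoundLayered(18789, [lsofStub, sudoLsofStub, bindStub]);
console.log(bound, calls);
```

Because the second stub reports a listener, the bind stub is never called for port 18789. Passing the probes in as arguments is the same idea as the injectable isPortBoundCheck seam: tests can exercise the ordering without spawning real processes.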

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • `npx prek run --all-files` passes
  • `npm test` passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • `make docs` builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Implemented proactive detection of occupied host ports to prevent conflicts during sandbox creation and improve dashboard port allocation reliability.
  • Bug Fixes

    • Strengthened sandbox rollback behavior when port conflicts occur with automatic rollback enabled.
    • Enhanced failure diagnostics to report which processes occupy conflicting ports.

Review Change Stack

… forward-start failure

Closes #3260.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 9, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 9, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a dedicated dashboard-port allocator with layered host probing, integrates it into onboard.ts, tightens rollback for TOCTOU and forward-start port-conflict failures, and expands tests for allocation, host-probe sequencing, and rollback behaviors.

Changes

Dashboard Port Allocation with Multi-Stage Conflict Detection

| Layer | File(s) | Summary |
|---|---|---|
| Module bootstrap & helpers | src/lib/onboard/dashboard-port.ts | Module header, runCapture interop, ANSI stripping, and isLiveForwardStatus predicate. |
| Multi-Stage Port Detection Chain | src/lib/onboard/dashboard-port.ts | probePortBoundSync implements a synchronous Node bind probe; isPortBoundOnHost layers lsof → sudo -n lsof → bind probe with optimistic fallback and injectable test seam. |
| Port Allocation with Occupancy Tracking | src/lib/onboard/dashboard-port.ts | getOccupiedPorts parses openshell forward list; findAvailableDashboardPort selects preferred or scans configured range avoiding re-probes, and throws a detailed exhaustion error including host-bound blockers. |
| Integrate allocator into sandbox creation | src/lib/onboard.ts | Imports findAvailableDashboardPort, getOccupiedPorts, isLiveForwardStatus from new module; removes inline implementations; re-exports allocator for tests. |
| Stronger rollback on port conflicts and forward failure | src/lib/onboard.ts | ensureDashboardForward deletes sandbox and exits when allocator returns a different port than the baked-in preferred port (TOCTOU) if rollback enabled; classifies forward start failures and deletes sandbox on host-port conflict when rollback enabled. |
| Test helper and runtime guards | test/onboard.test.ts | Extends OnboardTestInternals with findAvailableDashboardPort helper and updates the runtime type-guard and destructuring. |
| Functional and regression tests | test/onboard.test.ts | Adds a findAvailableDashboardPort port-conflict test suite, asserts probe-chain composition, and tests ensureDashboardForward rollback on forward-start failure and TOCTOU reallocation. |
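
The allocation behaviour summarised in the walkthrough (prefer the requested port, scan the range, avoid re-probing, and name host-bound blockers on exhaustion) might look roughly like this. It is a sketch under assumptions, not the module's actual code: `pickDashboardPort`, the numeric-keyed `occupied` map, and the stub probe are hypothetical, and the 18789-18799 range is taken from the linked-issue check in this review.

```typescript
// Range quoted in the linked-issue check; constant names are illustrative.
const RANGE_START = 18789;
const RANGE_END = 18799;

// occupied maps a port to the sandbox that owns an existing forward
// (hypothetical shape; the real parser reads `openshell forward list`).
function pickDashboardPort(
  sandboxName: string,
  preferredPort: number,
  occupied: ReadonlyMap<number, string>,
  isPortBound: (port: number) => boolean,
): number {
  const hostBound = new Set<number>();
  const probed = new Set<number>();
  const candidates: number[] = [preferredPort];
  for (let p = RANGE_START; p <= RANGE_END; p++) candidates.push(p);

  for (const port of candidates) {
    if (probed.has(port)) continue; // never examine the same port twice
    probed.add(port);
    const owner = occupied.get(port);
    if (owner === sandboxName) return port; // this sandbox already owns the forward
    if (owner !== undefined) continue; // claimed by another sandbox
    if (!isPortBound(port)) return port;
    hostBound.add(port); // remember the blocker for the error message
  }

  const blockers = [
    ...Array.from(occupied.entries())
      .filter(([p]) => p >= RANGE_START && p <= RANGE_END)
      .map(([p, s]) => `  ${p} -> ${s}`),
    ...Array.from(hostBound).map((p) => `  ${p} -> non-OpenShell host listener`),
  ];
  throw new Error(`All dashboard ports are occupied:\n${blockers.join("\n")}`);
}

// Stubbed usage: another sandbox holds 18789 and a host process holds 18790.
const occupied = new Map<number, string>([[18789, "other-sandbox"]]);
const probedPorts: number[] = [];
const stubProbe = (port: number): boolean => {
  probedPorts.push(port);
  return port === 18790;
};
const chosen = pickDashboardPort("my-sandbox", 18789, occupied, stubProbe);
console.log(chosen, probedPorts);
```

With these stubs the sketch skips 18789 without probing it (a forward already claims it), records 18790 as host-bound, and settles on 18791.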

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I probed the ports with careful paws,
Three checks to stop collision's flaws.
When sandboxes clash mid-creation,
Rollback hops in to save the station,
Tests snug as carrots guard allocation.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately describes the two main changes: strengthening dashboard port detection and rolling back on forward-start failure, which align perfectly with the core objectives. |
| Linked Issues check | ✅ Passed | The PR successfully addresses all coding requirements from issue #3260: host-side port binding detection via layered probes (lsof → sudo lsof → net.bind), auto-allocation of free ports within the 18789-18799 range, fail-fast rollback on forward-start failures during sandbox creation, and prevented broken sandboxes via stricter port validation. |
| Out of Scope Changes check | ✅ Passed | All changes are tightly scoped to dashboard port detection and rollback logic as required by issue #3260; new module, onboard refactoring, test additions, and exports all directly support the objectives with no unrelated modifications. |


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)

8980-8999: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include host-bound ports in the “range exhausted” error.

If every candidate is skipped by isPortBoundCheck, the failure text built below still only lists OpenShell forwards, so users can get “all dashboard ports are occupied” with an empty owner list. Please track host-bound candidates during the scan and include them in the thrown error so the remediation stays actionable.

Suggested direction
 function findAvailableDashboardPort(
   sandboxName: string,
   preferredPort: number,
   forwardListOutput: string | null,
   isPortBoundCheck: (port: number) => boolean = isPortBoundOnHost,
 ): number {
   const occupied = getOccupiedPorts(forwardListOutput);
+  const hostBoundPorts = new Set<number>();
   const preferredStr = String(preferredPort);
   const owner = occupied.get(preferredStr) ?? null;
   // If this sandbox already owns the forward, keep it.
   if (owner === sandboxName) return preferredPort;
   // If no forward claims the port, also check the host so we don't collide
   // with non-OpenShell processes.
-  if (owner === null && !isPortBoundCheck(preferredPort)) return preferredPort;
+  if (owner === null) {
+    if (!isPortBoundCheck(preferredPort)) return preferredPort;
+    hostBoundPorts.add(preferredPort);
+  }

   for (let p = DASHBOARD_PORT_RANGE_START; p <= DASHBOARD_PORT_RANGE_END; p++) {
     const pStr = String(p);
     const pOwner = occupied.get(pStr) ?? null;
     if (pOwner === sandboxName) return p;
-    if (pOwner === null && !isPortBoundCheck(p)) return p;
+    if (pOwner === null) {
+      if (!isPortBoundCheck(p)) return p;
+      hostBoundPorts.add(p);
+    }
   }

-  const owners = [...occupied.entries()]
+  const owners = [
+    ...[...occupied.entries()]
+      .filter(
+        ([p]) => Number(p) >= DASHBOARD_PORT_RANGE_START && Number(p) <= DASHBOARD_PORT_RANGE_END,
+      )
+      .map(([p, s]) => `  ${p} → ${s}`),
+    ...[...hostBoundPorts].map((p) => `  ${p} → non-OpenShell host listener`),
+  ]
-    .filter(
-      ([p]) => Number(p) >= DASHBOARD_PORT_RANGE_START && Number(p) <= DASHBOARD_PORT_RANGE_END,
-    )
-    .map(([p, s]) => `  ${p} → ${s}`)
     .join("\n");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` around lines 8980 - 8999, In findAvailableDashboardPort,
currently only OpenShell-forward owners from occupied (map from
getOccupiedPorts) are collected for the "range exhausted" error, so when
candidates are skipped by isPortBoundCheck(host-bound) the owner list can be
empty; update the scanning loop in findAvailableDashboardPort to also record
host-bound ports (e.g., push p into a hostBoundPorts array when pOwner === null
&& isPortBoundCheck(p) returns true) as you iterate, and when throwing the final
error include both the occupied owners (from occupied) and the collected
hostBoundPorts so the message shows which ports are bound by the host and which
are claimed by sandboxes (use the existing occupied/owner variables and
isPortBoundCheck to determine and report both lists).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 9105-9129: The rollback branch unconditionally fabricates a
port-conflict Error losing the real `forward start` output; change it to
preserve and inspect the actual `fwdResult` output (from the result returned by
`runOpenshell`/the `forward start` call) and only create the EADDRINUSE-style
Error when the captured stderr/stdout contains an EADDRINUSE/EADDRINUSE-like
text; otherwise construct `err` using the original command output (include
stdout/stderr or result.message) so
`buildOrphanedSandboxRollbackMessage(sandboxName, err, ...)` shows the real
forward failure; keep existing uses of `rollbackSandboxOnFailure`, `actualPort`,
`cliName()`, `sandboxName`, and `runOpenshell()` and ensure you propagate the
preserved output into the delete/rollback logging path.
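
One way to read the inline comment above is as a small classifier over the captured forward-start result: report a port conflict only when the output says so, and otherwise keep the original text. The sketch below assumes a hypothetical `CommandResult` shape and a hypothetical `classifyForwardFailure` helper; the matched wording is a guess at common bind-failure text, not confirmed openshell output.

```typescript
// Hypothetical shape of a captured command result.
interface CommandResult {
  status: number | null;
  stdout: string;
  stderr: string;
}

type ForwardFailure =
  | { kind: "port-conflict"; detail: string }
  | { kind: "other"; detail: string };

// Assumed wording of a bind failure; matched case-insensitively.
const PORT_CONFLICT_PATTERN = /EADDRINUSE|address already in use/i;

// Returns null on success; otherwise keeps the real output as the detail
// and labels it a port conflict only when the text supports that.
function classifyForwardFailure(result: CommandResult): ForwardFailure | null {
  if (result.status === 0) return null;
  const captured = [result.stderr, result.stdout]
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .join("\n");
  const detail = captured || `forward start exited with status ${result.status}`;
  return PORT_CONFLICT_PATTERN.test(detail)
    ? { kind: "port-conflict", detail }
    : { kind: "other", detail };
}

// The stderr strings below are made up for illustration.
const conflict = classifyForwardFailure({
  status: 1,
  stdout: "",
  stderr: "Error: listen EADDRINUSE: address already in use 127.0.0.1:18789",
});
const unrelated = classifyForwardFailure({
  status: 1,
  stdout: "",
  stderr: "gateway unreachable",
});
console.log(conflict?.kind, unrelated?.kind);
```

Keeping `detail` on both branches means whatever builds the rollback message can show the real failure text rather than a fabricated one.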


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0dd8bf16-a464-470a-ac2c-ffd730358a4f

📥 Commits

Reviewing files that changed from the base of the PR and between c4aaec3 and 3c62e66.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/onboard.test.ts

Comment thread src/lib/onboard.ts
…d ports

Addresses CodeRabbit findings on #3260.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng marked this pull request as ready for review May 9, 2026 15:04

@cv cv left a comment

Collaborator


Can you extract these functions out of onboard.ts, please? This file is getting far too big.

@cv cv closed this May 12, 2026
@cv cv reopened this May 12, 2026
…dashboard-port.ts

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…ilure

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…ecomes host-bound during build

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions

github-actions Bot commented May 12, 2026

Contributor

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e, double-onboard-e2e
Optional E2E: onboard-resume-e2e, onboard-repair-e2e, dashboard-remote-bind-e2e, launchable-smoke-e2e

Dispatch hint: cloud-onboard-e2e,double-onboard-e2e

Workflow run

Full advisor summary

Pi Semantic E2E Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • cloud-onboard-e2e: Exercises the full non-interactive onboard create path (sandbox build → ensureDashboardForward with rollbackSandboxOnFailure=true → dashboard forward start). This is the exact path whose rollback semantics changed in this PR; a regression here would leave users with orphaned sandboxes or false 'port unreachable' exits on every fresh onboard.
  • double-onboard-e2e: Onboards twice and explicitly stops/reuses dashboard forwards on port 18789, exercising getOccupiedPorts / findAvailableDashboardPort against real openshell forward list output and the stale-forward cleanup branch of ensureDashboardForward. Directly covers the moved/refactored allocator code.

Optional E2E

  • onboard-resume-e2e: Stops the dashboard forward on 18789 and re-runs onboard via the resume path. Exercises ensureDashboardForward on a re-onboard where the create-path rollback should NOT fire — useful confidence check that the rollback gating on rollbackSandboxOnFailure is correct.
  • onboard-repair-e2e: Repair-path onboarding goes through ensureDashboardForward without rollback semantics; verifies the warn-and-continue branch of the actualPort != preferredPort handler still works after the refactor.
  • dashboard-remote-bind-e2e: Restarts the dashboard forward and asserts the bind interface — touches the same openshell forward start --background invocation the PR replaced with FD-based stdio. Worth running to confirm the diagnostic-file plumbing didn't regress remote-bind behaviour.
  • launchable-smoke-e2e: End-to-end install + onboard smoke that includes port-forward liveness checks; provides a broader signal that the allocator/forward-start refactor doesn't regress the user-visible install-and-go path.

New E2E recommendations

  • onboarding/dashboard-port-toctou (medium): The PR's headline behaviour (issue #3260, "nemoclaw onboard silently assigns dashboard 18789 when port already held") is a TOCTOU window: a non-OpenShell process binds the chosen dashboard port DURING the multi-minute sandbox build, after findAvailableDashboardPort cleared it. No existing E2E forces that race; they all run on clean hosts where the port stays free. Without coverage, the rollback path (delete sandbox + exit 1 with port-conflict messaging) is only validated by source-shape regex assertions in onboard.test.ts.
  • onboarding/host-bind-detection-sudo (low): isPortBoundOnHost now escalates via sudo -n lsof to catch root-owned listeners (docker-proxy on macOS). No existing E2E pre-binds a port as root and asserts the allocator skips it; behaviour will silently fall back to the Node bind probe if sudo escalation fails on the runner.
    • Suggested test: E2E that binds a dashboard-range port as root (or via docker-proxy) on a Linux runner, runs nemoclaw onboard, and asserts the resulting CHAT_UI_URL points at a different port — not at the root-bound one.
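
The create-path versus reuse-path split that these recommendations keep returning to can be sketched with injected side effects, which is also one way such a branch could be unit-tested without regex assertions over source text. Everything named here (`MismatchDeps`, `handlePortMismatch`, the stub dependencies) is hypothetical; only the two message fragments are taken from the review comments in this thread.

```typescript
// Side effects are injected so the branch can be exercised without a real sandbox.
interface MismatchDeps {
  deleteSandbox: (name: string) => void;
  warn: (message: string) => void;
  exit: (code: number) => void;
}

function handlePortMismatch(
  sandboxName: string,
  preferredPort: number,
  actualPort: number,
  rollbackSandboxOnFailure: boolean,
  deps: MismatchDeps,
): number {
  if (actualPort === preferredPort) return actualPort;
  if (rollbackSandboxOnFailure) {
    // Create path: the sandbox was baked with preferredPort, so it is unusable.
    deps.warn(`Port ${preferredPort} became host-bound during sandbox build.`);
    deps.deleteSandbox(sandboxName);
    deps.exit(1);
    return actualPort; // only reached when exit is stubbed, as in tests
  }
  // Reuse paths: warn and continue on the reallocated port.
  deps.warn(`Port ${preferredPort} is taken. Using port ${actualPort} instead.`);
  return actualPort;
}

// Stub dependencies that record what happened.
const events: string[] = [];
const deps: MismatchDeps = {
  deleteSandbox: (name) => {
    events.push(`delete:${name}`);
  },
  warn: (message) => {
    events.push(`warn:${message}`);
  },
  exit: (code) => {
    events.push(`exit:${code}`);
  },
};

handlePortMismatch("demo", 18789, 18790, true, deps);
const createPathEvents = events.splice(0, events.length);
const reusePort = handlePortMismatch("demo", 18789, 18790, false, deps);
console.log(createPathEvents, reusePort, events);
```

On the create path the recorded events are a warning, a delete, and an exit with code 1; on the reuse path only the warning is recorded and the reallocated port is returned.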

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: cloud-onboard-e2e,double-onboard-e2e

@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (1)
test/onboard.test.ts (1)

9265-9341: ⚡ Quick win

Harden source-shape assertions to reduce regex brittleness.

A few assertions are tightly coupled to formatting/tokenization (e.g., exact array literal text) or use broad multiline capture (actualPort !== preferredPort block), which can become flaky on harmless refactors.

Proposed tightening (scope-first + semantic checks)
-const mismatchBranch = source.match(
-  /if \(actualPort !== preferredPort\) \{[\s\S]*?\n  \}/,
-);
-assert.ok(mismatchBranch, "Expected actualPort !== preferredPort branch in ensureDashboardForward");
-const branchBody = mismatchBranch[0];
+const ensureFnStart = source.indexOf("function ensureDashboardForward");
+assert.ok(ensureFnStart !== -1, "ensureDashboardForward not found");
+const ensureFnBody = source.slice(ensureFnStart, source.indexOf("\nfunction ", ensureFnStart + 1) === -1 ? source.length : source.indexOf("\nfunction ", ensureFnStart + 1));
+const mismatchStart = ensureFnBody.indexOf("if (actualPort !== preferredPort)");
+assert.ok(mismatchStart !== -1, "Expected actualPort !== preferredPort branch in ensureDashboardForward");
+const branchBody = ensureFnBody.slice(mismatchStart, ensureFnBody.indexOf("return actualPort;", mismatchStart));
-assert.match(source, /\["lsof", "-i", `:\$\{port\}`, "-sTCP:LISTEN", "-P", "-n"\]/);
+assert.match(source, /lsof/);
+assert.match(source, /-sTCP:LISTEN/);
+assert.match(source, /probePortBoundSync/);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard.test.ts` around lines 9265 - 9341, The tests use brittle,
formatting-sensitive regexes; relax them by matching semantic tokens instead of
exact array literal formatting and by narrowing the multiline capture for the
actualPort mismatch branch. Update assertions in this file to (1) for
isPortBoundOnHost/probePortBoundSync/EADDRINUSE, assert presence of
"isPortBoundOnHost", "lsof" and each flag ("-i", "-sTCP:LISTEN", "-P", "-n")
separately (and a separate assertion for the sudo variant) rather than the exact
array text, (2) for ensureDashboardForward rollback checks, locate the
actualPort !== preferredPort branch by matching the conditional anchor "if
(actualPort !== preferredPort)" then assert the branchBody contains the key
strings/behaviours ("if (rollbackSandboxOnFailure)", "became host-bound during
sandbox build", the runOpenshell delete call pattern,
"buildOrphanedSandboxRollbackMessage", "process.exit(1)") and separately assert
the reuse-path warning text ("is taken. Using port .* instead") without
depending on a wide single multiline regex capture. Use case-insensitive checks
for EADDRINUSE text where appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1fad900b-74b4-434c-8a5c-2f1413e3de7c

📥 Commits

Reviewing files that changed from the base of the PR and between 024a24a and 30f301c.

📒 Files selected for processing (3)
  • src/lib/onboard.ts
  • src/lib/onboard/dashboard-port.ts
  • test/onboard.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard.ts

@laitingsheng laitingsheng marked this pull request as draft May 12, 2026 02:35
@laitingsheng laitingsheng marked this pull request as ready for review May 12, 2026 02:57
@laitingsheng laitingsheng added fix platform: macos Affects macOS, including Apple Silicon and removed priority: medium labels May 12, 2026
@laitingsheng laitingsheng marked this pull request as draft May 12, 2026 11:14
@laitingsheng laitingsheng removed platform: macos Affects macOS, including Apple Silicon NemoClaw CLI labels May 12, 2026
@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25778615451
Branch: fix/3260-retry-dashboard-forward-on-port-failure
Requested jobs: double-onboard-e2e,onboard-resume-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job Result
double-onboard-e2e ⚠️ cancelled
onboard-resume-e2e ⚠️ cancelled

Comment thread src/lib/onboard.ts Fixed
Comment thread src/lib/onboard.ts Fixed
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25779926291
Branch: fix/3260-retry-dashboard-forward-on-port-failure
Requested jobs: double-onboard-e2e,onboard-resume-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job Result
double-onboard-e2e ⚠️ cancelled
onboard-resume-e2e ⚠️ cancelled

Comment thread src/lib/onboard/forward-start.ts Fixed
Comment thread src/lib/onboard/forward-start.ts Fixed
@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25780108062
Branch: fix/3260-retry-dashboard-forward-on-port-failure
Requested jobs: double-onboard-e2e,onboard-resume-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
double-onboard-e2e ✅ success
onboard-resume-e2e ✅ success

Comment thread src/lib/onboard.ts Fixed
Comment thread src/lib/onboard.ts Fixed
@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25780952174
Branch: fix/3260-retry-dashboard-forward-on-port-failure
Requested jobs: double-onboard-e2e,onboard-resume-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job Result
double-onboard-e2e ⚠️ cancelled
onboard-resume-e2e ⚠️ cancelled

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25781050208
Branch: fix/3260-retry-dashboard-forward-on-port-failure
Requested jobs: double-onboard-e2e,onboard-resume-e2e
Summary: 0 passed, 0 failed, 0 skipped

Job Result
double-onboard-e2e ⚠️ cancelled
onboard-resume-e2e ⚠️ cancelled

@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25781126807
Branch: fix/3260-retry-dashboard-forward-on-port-failure
Requested jobs: double-onboard-e2e,onboard-resume-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
double-onboard-e2e ✅ success
onboard-resume-e2e ✅ success

@cv cv added v0.0.42 and removed v0.0.41 labels May 14, 2026
@cv cv merged commit 2bed2c6 into main May 14, 2026
69 checks passed
@miyoungc miyoungc mentioned this pull request May 14, 2026
12 tasks
miyoungc added a commit that referenced this pull request May 14, 2026
## Summary
Refreshes the NemoClaw documentation for the local `main` changes
included in the 0.0.42 release. The update adds release notes, updates
the affected user-facing setup and troubleshooting pages, bumps docs
metadata to 0.0.42, and regenerates the matching user skills.

## Changes
- #3537 -> `docs/reference/commands.md`,
`docs/reference/troubleshooting.md`: Documented host-level status
fields, cloudflared state-specific recovery hints, and Local Ollama auth
proxy status diagnostics.
- #3454 -> `docs/get-started/prerequisites.md`,
`docs/get-started/quickstart.md`: Documented macOS Docker-driver
onboarding and removed the expectation that standard macOS setup needs a
VM driver helper.
- #3514 -> `docs/inference/use-local-inference.md`: Documented
compatible-endpoint retry behavior for reasoning-only smoke responses.
- #3448 -> `docs/reference/commands.md`,
`docs/manage-sandboxes/messaging-channels.md`: Documented canonical
channel names and policy preset hints after `channels add`.
- #3520 -> `docs/about/release-notes.md`: Captured clearer GPU recovery
and uninstall wording in the 0.0.42 release notes.
- #3313 -> `docs/get-started/quickstart.md`,
`docs/reference/troubleshooting.md`: Documented stronger dashboard port
detection and rollback when a forward cannot start.
- #3502 -> `docs/about/release-notes.md`: Captured batched onboarding
policy preset application in the 0.0.42 release notes.
- #3505 -> `docs/reference/troubleshooting.md`: Documented the top-level
Colima socket path.
- #3421 -> `docs/about/release-notes.md`: Captured idempotent installer
shim logging in the 0.0.42 release notes.
- Updated `docs/project.json`, `docs/versions1.json`, and regenerated
`.agents/skills/nemoclaw-user-*` outputs.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes - v0.0.42

* **Documentation**
  * Enhanced macOS onboarding guidance for Docker gateway setup
  * Improved dashboard port conflict handling with automatic rollback
  * Better local Ollama inference diagnostics and authentication proxy
    checks
  * Clarified status command output and recovery procedures
  * Refined messaging channel setup documentation

* **Chores**
  * Version bump to 0.0.42


<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
@wscurran wscurran added area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression and removed NemoClaw CLI labels Jun 3, 2026

Labels

area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression


Development

Successfully merging this pull request may close these issues.

[macOS][Onboard] nemoclaw onboard silently assigns dashboard 18789 when port already held — no auto-allocation; sandbox created in broken state

6 participants