fix(onboard): probe TCP ports without requiring nc by Dongni-Yang · Pull Request #5012 · NVIDIA/NemoClaw

Dongni-Yang · 2026-06-09T03:09:28Z

Summary

waitForPort() in src/lib/core/wait.ts probed TCP ports by shelling out to nc -z. On hosts where nc (netcat) is not installed — minimal Linux distros (e.g. CachyOS, per #4974) and Windows — spawnSync returns ENOENT with status: null, so every probe reads as "port not ready". Ollama onboarding then aborts with the misleading error:

Ollama auth proxy did not become ready on :11435 within timeout.

even though the proxy started fine.

Fix

Keep nc -z as the probe when it is present, and fall back to a short-lived Node subprocess (process.execPath -e <script>) that opens a TCP connection when nc is unavailable (ENOENT). Any host that can run NemoClaw has Node by definition, so this closes the gap on minimal Linux distros and Windows while leaving behavior unchanged where nc exists (keeping existing onboarding test stubs valid). The port is passed as an argv value and never interpolated into the script, so it can never be treated as code.

This mirrors the existing spawn(process.execPath, …) pattern already used in src/lib/hermes-tool-gateway-broker.ts.

Tests

test/wait.test.ts adds coverage for waitForPort (previously untested):

returns true for a listening port with PATH emptied — exercising the nc-absent fallback (this case fails on the old implementation)
returns false when nothing is listening

Notes

Scope is limited to waitForPort / the nc dependency per the issue. waitForHttp still uses curl (far more universally present, and it relies on the NO_PROXY loopback env from [macOS][Onboard] Ollama readiness check fails when HTTP_PROXY is set — waitForHttp does not inject NO_PROXY for localhost probes #4181).

Fixes #4974

Summary by CodeRabbit

Bug Fixes
- Improved port availability detection to prefer the external probe but gracefully fall back to an internal TCP check when external tools are unavailable, reducing false negatives and improving reliability across environments.
Tests
- Added regression tests verifying successful detection when external tools are absent and correct timeout behavior when no service is listening.

Signed-off-by: Dongni Yang dongniy@nvidia.com

waitForPort shelled out to `nc -z`, which is not installed on many hosts (minimal Linux distros such as CachyOS, and Windows). With nc missing, spawnSync returns ENOENT (status null), so every probe read as "not ready" and Ollama onboarding aborted with the misleading "Ollama auth proxy did not become ready on :11435 within timeout". Probe by connecting from a short-lived Node subprocess (process.execPath -e) instead, removing the external-tool dependency entirely. The port is passed as argv so it can never be treated as code. Fixes #4974 Signed-off-by: Dongni Yang <dongniy@nvidia.com>

coderabbitai · 2026-06-09T03:09:42Z

📝 Walkthrough

Walkthrough

waitForPort now prefers nc -z 127.0.0.1 <port> but falls back to spawning a short-lived Node subprocess running an inline TCP probe script that connects to 127.0.0.1:<port> and uses the subprocess exit code to indicate reachability. Tests added validate both success and failure cases with PATH cleared.

Changes

TCP Probe Implementation and Validation

Layer / File(s)	Summary
Probe script constant and docs `src/lib/core/wait.ts`	Add `TCP_PROBE_SCRIPT` (inline Node TCP connect script) and update `waitForPort` JSDoc to document preferring `nc` with a Node fallback.
waitForPort runtime probe `src/lib/core/wait.ts`	Modify polling probe to run `nc -z` first and, if nc did not produce a numeric status, spawn `process.execPath -e TCP_PROBE_SCRIPT <port>` with a timeout and use `probe.status === 0` as readiness verdict.
Regression tests `test/wait.test.ts`	Import `createServer`/`AddressInfo` and `waitForPort`; add tests that clear `process.env.PATH` and assert `waitForPort` returns true when a loopback server listens and false when the port is closed.

sequenceDiagram
  participant Test
  participant waitForPort
  participant ncShell
  participant NodeProbe
  participant LoopbackServer
  Test->>LoopbackServer: createServer on port
  Test->>waitForPort: call waitForPort(port) with PATH cleared
  alt nc available and numeric status
    waitForPort->>ncShell: run `nc -z 127.0.0.1 <port>`
    ncShell-->>waitForPort: numeric exit status (0/1)
  else nc missing or non-numeric status
    waitForPort->>NodeProbe: spawn `node -e TCP_PROBE_SCRIPT <port>`
    NodeProbe->>LoopbackServer: net.connect 127.0.0.1:port
    LoopbackServer-->>NodeProbe: accept or refuse
    NodeProbe-->>waitForPort: exit 0 (success) or 1 (failure)
  end
  waitForPort-->>Test: return true or false

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

area: networking, bug-fix

Suggested reviewers

cv

Poem

🐰 I hopped a small script through Node's bright door,
No nc needed on this local shore,
One quick connect, a blink and a cheer,
Ports answered true, or stayed far, my dear —
A tiny rabbit test, and onboarding's clear.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix(onboard): probe TCP ports without requiring nc' clearly describes the main change: making port probing work without the nc dependency.
Linked Issues check	✅ Passed	The changes implement the objective from `#4974` by adding a fallback Node-based TCP probe when nc is unavailable, making port verification work on systems without nc installed.
Out of Scope Changes check	✅ Passed	All changes are scoped to waitForPort implementation and its tests; no unrelated modifications are present.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/4974-port-probe-without-nc

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-09T03:11:42Z

E2E Advisor Recommendation

Required E2E: gpu-repo-local-ollama-openclaw
Optional E2E: ollama-proxy-e2e, wsl-repo-cloud-openclaw

Dispatch hint: gpu-repo-local-ollama-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

gpu-repo-local-ollama-openclaw (high): This is the existing scenario that best exercises the impacted real user flow: install/onboard with NEMOCLAW_PROVIDER=ollama, start local Ollama, start the auth proxy through the CLI path that calls waitForPort(), create the sandbox, and verify local inference through the sandbox/proxy route.

Optional E2E

ollama-proxy-e2e (medium): Useful adjacent confidence for the Ollama auth proxy runtime, token handling, and real inference path, although it starts/probes the proxy from the E2E script rather than directly exercising the CLI waitForPort() readiness gate.
wsl-repo-cloud-openclaw (high): Optional platform confidence because the fallback was added for nc-missing/Windows-adjacent environments. This scenario validates the WSL install/onboard path, but it does not specifically cover local Ollama auth proxy startup.

New E2E recommendations

local Ollama auth proxy readiness without nc (medium): Existing E2E coverage does not appear to force PATH to hide nc while running the real CLI onboarding/proxy startup path. The new unit test covers waitForPort() directly, but an E2E regression could still occur in startOllamaAuthProxy() integration.
- Suggested test: Add a focused regression E2E that starts the real Ollama auth proxy through the CLI/library path with PATH sanitized so nc is unavailable, then asserts readiness succeeds via the Node fallback.

Dispatch hint

Workflow: .github/workflows/e2e-scenarios.yaml
jobs input: gpu-repo-local-ollama-openclaw

github-actions · 2026-06-09T03:11:43Z

E2E Scenario Advisor Recommendation

Required scenario E2E: gpu-repo-local-ollama-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

gpu-repo-local-ollama-openclaw: src/lib/core/wait.ts changes waitForPort, which is exercised by the Ollama auth proxy startup path; the only dispatchable scenario covering the ollama-proxy suite is gpu-repo-local-ollama-openclaw.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

None.

Relevant changed files

src/lib/core/wait.ts

github-actions · 2026-06-09T03:11:49Z

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 0 new items found

Consider writing more tests for

**Runtime validation** — Ollama auth proxy startup detects a listening proxy when nc is unavailable or PATH is empty.. The added unit tests directly cover the utility-level regression and negative path. Because this utility gates Ollama onboarding/runtime readiness, targeted runtime validation would further increase confidence without being required by the diff.
**Runtime validation** — waitForPort returns false without throwing for invalid, non-integer, NaN, negative, zero, and out-of-range port arguments.. The added unit tests directly cover the utility-level regression and negative path. Because this utility gates Ollama onboarding/runtime readiness, targeted runtime validation would further increase confidence without being required by the diff.
**Acceptance clause:** Reproduction Steps: "1. Run `nemoclaw onboard`" — add test evidence or identify existing coverage. The changed utility is the one used by Ollama onboarding readiness, and the unit test exercises the missing-nc condition. The PR does not add an end-to-end onboarding test, so this reproduction step is covered indirectly.
**Acceptance clause:** Reproduction Steps: "2. Select `ollama`" — add test evidence or identify existing coverage. Ollama proxy startup depends on waitForPort, but the added tests target waitForPort directly rather than driving the interactive Ollama onboarding flow.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

The prior commit replaced the nc probe outright, which broke existing tests that stub spawnSync on `cmd === "nc"` to simulate the Ollama proxy port becoming ready (onboard-ollama-autostart, onboard-selection). Keep nc as the primary probe so that contract holds, and use the Node TCP subprocess only as a fallback when nc is unavailable (ENOENT). Signed-off-by: Dongni Yang <dongniy@nvidia.com>

github-actions · 2026-06-09T03:37:40Z

Selective E2E Results — ⚠️ No requested jobs ran

Run: 27182241041
Target ref: fix/4974-port-probe-without-nc
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job	Result
gpu-e2e	⏭️ skipped

## Summary - Add v0.0.62 release notes from Discussion #5100 and link release highlights to the relevant docs pages. - Document the release's GPU sandbox recreation, sandbox-side local inference verification, and Hermes dashboard port guard in the command and inference references. - Refresh generated NemoClaw user skills for the release-prep docs set. ## Source Summary - #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU recreation behavior for Linux Docker-driver sandboxes. - #5024 -> `docs/inference/use-local-inference.mdx`: Document sandbox-runtime verification of the `inference.local` local inference route. - #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra device-node group propagation for sandbox CUDA initialization. - #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`: Summarize onboarding and recovery reliability fixes, including the reserved Hermes API port guard. - #5017 and #5043 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Summarize mutable OpenClaw config recovery and host-side `agents list` coverage. - #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes upstream metadata visibility and WhatsApp QR rendering reliability. - #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`: Refresh generated user-skill references from the current docs source. ## Skipped - #5019 -> skipped for new prose because it touched `openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`. Existing source docs remain the source for generated skill synchronization. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning) - Pre-commit hooks passed during commit, including docs-to-skills verification, markdown lint, gitleaks, and skills YAML tests.  ## Summary by CodeRabbit * **New Features** * Added `nemoclaw <name> agents list` command. * v0.0.62 release notes added summarizing onboarding and recovery improvements. * **Bug Fixes** * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling). * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes. * Quieter/earlier handling of onboarding drift and port collisions. * **Documentation** * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples.  --------- Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>

wscurran added the bug-fix PR fixes a bug or regression label Jun 9, 2026

Merge branch 'main' into fix/4974-port-probe-without-nc

4aa5345

Dongni-Yang added the v0.0.62 Release target label Jun 9, 2026

cv approved these changes Jun 9, 2026

View reviewed changes

cv merged commit ac3d69c into main Jun 9, 2026
110 of 111 checks passed

cv deleted the fix/4974-port-probe-without-nc branch June 9, 2026 06:51

This was referenced Jun 9, 2026

fix(onboard): diagnose Ollama auth-proxy port conflicts and recover startup #5040

Merged

chore(biome): cover and format all TypeScript files #5020

Merged

miyoungc mentioned this pull request Jun 10, 2026

docs: refresh v0.0.62 release docs #5157

Merged

Conversation

Dongni-Yang commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Tests

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

Uh oh!

github-actions Bot commented Jun 9, 2026

Selective E2E Results — ⚠️ No requested jobs ran

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Dongni-Yang commented Jun 9, 2026 •

edited

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading