Skip to content

fix(onboard): probe TCP ports without requiring nc#5012

Merged
cv merged 3 commits into
mainfrom
fix/4974-port-probe-without-nc
Jun 9, 2026
Merged

fix(onboard): probe TCP ports without requiring nc#5012
cv merged 3 commits into
mainfrom
fix/4974-port-probe-without-nc

Conversation

@Dongni-Yang

@Dongni-Yang Dongni-Yang commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

waitForPort() in src/lib/core/wait.ts probed TCP ports by shelling out to nc -z. On hosts where nc (netcat) is not installed — minimal Linux distros (e.g. CachyOS, per #4974) and Windows — spawnSync returns ENOENT with status: null, so every probe reads as "port not ready". Ollama onboarding then aborts with the misleading error:

Ollama auth proxy did not become ready on :11435 within timeout.

even though the proxy started fine.

Fix

Keep nc -z as the probe when it is present, and fall back to a short-lived Node subprocess (process.execPath -e <script>) that opens a TCP connection when nc is unavailable (ENOENT). Any host that can run NemoClaw has Node by definition, so this closes the gap on minimal Linux distros and Windows while leaving behavior unchanged where nc exists (keeping existing onboarding test stubs valid). The port is passed as an argv value and never interpolated into the script, so it can never be treated as code.

This mirrors the existing spawn(process.execPath, …) pattern already used in src/lib/hermes-tool-gateway-broker.ts.

Tests

test/wait.test.ts adds coverage for waitForPort (previously untested):

  • returns true for a listening port with PATH emptied — exercising the nc-absent fallback (this case fails on the old implementation)
  • returns false when nothing is listening

Notes

Fixes #4974

Summary by CodeRabbit

  • Bug Fixes

    • Improved port availability detection to prefer the external probe but gracefully fall back to an internal TCP check when external tools are unavailable, reducing false negatives and improving reliability across environments.
  • Tests

    • Added regression tests verifying successful detection when external tools are absent and correct timeout behavior when no service is listening.

Signed-off-by: Dongni Yang dongniy@nvidia.com

waitForPort shelled out to `nc -z`, which is not installed on many hosts
(minimal Linux distros such as CachyOS, and Windows). With nc missing,
spawnSync returns ENOENT (status null), so every probe read as "not ready"
and Ollama onboarding aborted with the misleading "Ollama auth proxy did
not become ready on :11435 within timeout".

Probe by connecting from a short-lived Node subprocess (process.execPath
-e) instead, removing the external-tool dependency entirely. The port is
passed as argv so it can never be treated as code.

Fixes #4974

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

waitForPort now prefers nc -z 127.0.0.1 <port> but falls back to spawning a short-lived Node subprocess running an inline TCP probe script that connects to 127.0.0.1:<port> and uses the subprocess exit code to indicate reachability. Tests added validate both success and failure cases with PATH cleared.

Changes

TCP Probe Implementation and Validation

Layer / File(s) Summary
Probe script constant and docs
src/lib/core/wait.ts
Add TCP_PROBE_SCRIPT (inline Node TCP connect script) and update waitForPort JSDoc to document preferring nc with a Node fallback.
waitForPort runtime probe
src/lib/core/wait.ts
Modify polling probe to run nc -z first and, if nc did not produce a numeric status, spawn process.execPath -e TCP_PROBE_SCRIPT <port> with a timeout and use probe.status === 0 as readiness verdict.
Regression tests
test/wait.test.ts
Import createServer/AddressInfo and waitForPort; add tests that clear process.env.PATH and assert waitForPort returns true when a loopback server listens and false when the port is closed.
sequenceDiagram
  participant Test
  participant waitForPort
  participant ncShell
  participant NodeProbe
  participant LoopbackServer
  Test->>LoopbackServer: createServer on port
  Test->>waitForPort: call waitForPort(port) with PATH cleared
  alt nc available and numeric status
    waitForPort->>ncShell: run `nc -z 127.0.0.1 <port>`
    ncShell-->>waitForPort: numeric exit status (0/1)
  else nc missing or non-numeric status
    waitForPort->>NodeProbe: spawn `node -e TCP_PROBE_SCRIPT <port>`
    NodeProbe->>LoopbackServer: net.connect 127.0.0.1:port
    LoopbackServer-->>NodeProbe: accept or refuse
    NodeProbe-->>waitForPort: exit 0 (success) or 1 (failure)
  end
  waitForPort-->>Test: return true or false
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

area: networking, bug-fix

Suggested reviewers

  • cv

Poem

🐰 I hopped a small script through Node's bright door,
No nc needed on this local shore,
One quick connect, a blink and a cheer,
Ports answered true, or stayed far, my dear —
A tiny rabbit test, and onboarding's clear.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(onboard): probe TCP ports without requiring nc' clearly describes the main change: making port probing work without the nc dependency.
Linked Issues check ✅ Passed The changes implement the objective from #4974 by adding a fallback Node-based TCP probe when nc is unavailable, making port verification work on systems without nc installed.
Out of Scope Changes check ✅ Passed All changes are scoped to waitForPort implementation and its tests; no unrelated modifications are present.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/4974-port-probe-without-nc

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: gpu-repo-local-ollama-openclaw
Optional E2E: ollama-proxy-e2e, wsl-repo-cloud-openclaw

Dispatch hint: gpu-repo-local-ollama-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • gpu-repo-local-ollama-openclaw (high): This is the existing scenario that best exercises the impacted real user flow: install/onboard with NEMOCLAW_PROVIDER=ollama, start local Ollama, start the auth proxy through the CLI path that calls waitForPort(), create the sandbox, and verify local inference through the sandbox/proxy route.

Optional E2E

  • ollama-proxy-e2e (medium): Useful adjacent confidence for the Ollama auth proxy runtime, token handling, and real inference path, although it starts/probes the proxy from the E2E script rather than directly exercising the CLI waitForPort() readiness gate.
  • wsl-repo-cloud-openclaw (high): Optional platform confidence because the fallback was added for nc-missing/Windows-adjacent environments. This scenario validates the WSL install/onboard path, but it does not specifically cover local Ollama auth proxy startup.

New E2E recommendations

  • local Ollama auth proxy readiness without nc (medium): Existing E2E coverage does not appear to force PATH to hide nc while running the real CLI onboarding/proxy startup path. The new unit test covers waitForPort() directly, but an E2E regression could still occur in startOllamaAuthProxy() integration.
    • Suggested test: Add a focused regression E2E that starts the real Ollama auth proxy through the CLI/library path with PATH sanitized so nc is unavailable, then asserts readiness succeeds via the Node fallback.

Dispatch hint

  • Workflow: .github/workflows/e2e-scenarios.yaml
  • jobs input: gpu-repo-local-ollama-openclaw

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: gpu-repo-local-ollama-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • gpu-repo-local-ollama-openclaw: src/lib/core/wait.ts changes waitForPort, which is exercised by the Ollama auth proxy startup path; the only dispatchable scenario covering the ollama-proxy suite is gpu-repo-local-ollama-openclaw.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Optional scenario E2E

  • None.

Relevant changed files

  • src/lib/core/wait.ts

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 0 new items found

Consider writing more tests for
  • **Runtime validation** — Ollama auth proxy startup detects a listening proxy when nc is unavailable or PATH is empty.. The added unit tests directly cover the utility-level regression and negative path. Because this utility gates Ollama onboarding/runtime readiness, targeted runtime validation would further increase confidence without being required by the diff.
  • **Runtime validation** — waitForPort returns false without throwing for invalid, non-integer, NaN, negative, zero, and out-of-range port arguments.. The added unit tests directly cover the utility-level regression and negative path. Because this utility gates Ollama onboarding/runtime readiness, targeted runtime validation would further increase confidence without being required by the diff.
  • **Acceptance clause:** Reproduction Steps: "1. Run `nemoclaw onboard`" — add test evidence or identify existing coverage. The changed utility is the one used by Ollama onboarding readiness, and the unit test exercises the missing-nc condition. The PR does not add an end-to-end onboarding test, so this reproduction step is covered indirectly.
  • **Acceptance clause:** Reproduction Steps: "2. Select `ollama`" — add test evidence or identify existing coverage. Ollama proxy startup depends on waitForPort, but the added tests target waitForPort directly rather than driving the interactive Ollama onboarding flow.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

The prior commit replaced the nc probe outright, which broke existing
tests that stub spawnSync on `cmd === "nc"` to simulate the Ollama proxy
port becoming ready (onboard-ollama-autostart, onboard-selection). Keep
nc as the primary probe so that contract holds, and use the Node TCP
subprocess only as a fallback when nc is unavailable (ENOENT).

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 9, 2026
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ⚠️ No requested jobs ran

Run: 27182241041
Target ref: fix/4974-port-probe-without-nc
Requested jobs: gpu-e2e
Summary: 0 passed, 0 failed, 1 skipped

Job Result
gpu-e2e ⏭️ skipped

@Dongni-Yang Dongni-Yang added the v0.0.62 Release target label Jun 9, 2026
@cv cv merged commit ac3d69c into main Jun 9, 2026
110 of 111 checks passed
@cv cv deleted the fix/4974-port-probe-without-nc branch June 9, 2026 06:51
jyaunches pushed a commit that referenced this pull request Jun 10, 2026
## Summary
- Add v0.0.62 release notes from Discussion #5100 and link release
highlights to the relevant docs pages.
- Document the release's GPU sandbox recreation, sandbox-side local
inference verification, and Hermes dashboard port guard in the command
and inference references.
- Refresh generated NemoClaw user skills for the release-prep docs set.

## Source Summary
- #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU
recreation behavior for Linux Docker-driver sandboxes.
- #5024 -> `docs/inference/use-local-inference.mdx`: Document
sandbox-runtime verification of the `inference.local` local inference
route.
- #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra
device-node group propagation for sandbox CUDA initialization.
- #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`:
Summarize onboarding and recovery reliability fixes, including the
reserved Hermes API port guard.
- #5017 and #5043 -> `docs/about/release-notes.mdx`,
`docs/reference/commands.mdx`: Summarize mutable OpenClaw config
recovery and host-side `agents list` coverage.
- #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes
upstream metadata visibility and WhatsApp QR rendering reliability.
- #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`:
Refresh generated user-skill references from the current docs source.

## Skipped
- #5019 -> skipped for new prose because it touched
`openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`.
Existing source docs remain the source for generated skill
synchronization.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning)
- Pre-commit hooks passed during commit, including docs-to-skills
verification, markdown lint, gitleaks, and skills YAML tests.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Added `nemoclaw <name> agents list` command.
* v0.0.62 release notes added summarizing onboarding and recovery
improvements.

* **Bug Fixes**
* Improved GPU sandbox onboarding reliability (NVIDIA CDI path,
Jetson/Tegra device handling).
* Better local inference verification and recovery for Linux
Docker-driver GPU sandboxes.
  * Quieter/earlier handling of onboarding drift and port collisions.

* **Documentation**
* Expanded GPU passthrough, inference verification, writable paths
(`/dev/pts`), port 8642 restriction, and command examples.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression v0.0.62 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

When onboarding to ollama, verifying the proxy fails because nc isn't installed.

3 participants