fix(onboard): probe TCP ports without requiring nc #5012
waitForPort shelled out to `nc -z`, which is not installed on many hosts (minimal Linux distros such as CachyOS, and Windows). With nc missing, spawnSync returns ENOENT (status null), so every probe read as "not ready" and Ollama onboarding aborted with the misleading "Ollama auth proxy did not become ready on :11435 within timeout". Probe by connecting from a short-lived Node subprocess (process.execPath -e) instead, removing the external-tool dependency entirely. The port is passed as argv so it can never be treated as code. Fixes #4974 Signed-off-by: Dongni Yang <dongniy@nvidia.com>
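A minimal sketch of the subprocess probe this commit describes. The script body, the helper name `probePortWithNode`, and the timeouts are hypothetical; the PR's actual probe script may differ. The port travels as an argument and is read back from `process.argv` inside the child, so it never becomes part of the evaluated source.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical probe script (an assumption, not the PR's actual script).
// With `node -e <script> <port>`, process.argv is [execPath, <port>].
const PROBE_SCRIPT = `
const net = require("net");
const port = Number(process.argv[1]);
const socket = net.connect({ host: "127.0.0.1", port });
socket.setTimeout(1000);
socket.once("connect", () => { socket.destroy(); process.exit(0); });
socket.once("timeout", () => { socket.destroy(); process.exit(1); });
socket.once("error", () => process.exit(1));
`;

// Returns true only if the child connected to 127.0.0.1:<port> and exited 0.
export function probePortWithNode(port: number, timeoutMs = 2000): boolean {
  // Reject anything that is not a valid TCP port before spawning.
  if (!Number.isInteger(port) || port < 1 || port > 65535) return false;
  const result = spawnSync(
    process.execPath,
    ["-e", PROBE_SCRIPT, String(port)], // port is data (argv), not code
    { stdio: "ignore", timeout: timeoutMs },
  );
  return result.status === 0;
}
```

Because the script text is a constant, a caller cannot change what code the child evaluates; only the numeric argument varies.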
📝 Walkthrough
waitForPort now prefers …
Changes: TCP Probe Implementation and Validation
sequenceDiagram
participant Test
participant waitForPort
participant ncShell
participant NodeProbe
participant LoopbackServer
Test->>LoopbackServer: createServer on port
Test->>waitForPort: call waitForPort(port) with PATH cleared
alt nc available and numeric status
waitForPort->>ncShell: run `nc -z 127.0.0.1 <port>`
ncShell-->>waitForPort: numeric exit status (0/1)
else nc missing or non-numeric status
waitForPort->>NodeProbe: spawn `node -e TCP_PROBE_SCRIPT <port>`
NodeProbe->>LoopbackServer: net.connect 127.0.0.1:port
LoopbackServer-->>NodeProbe: accept or refuse
NodeProbe-->>waitForPort: exit 0 (success) or 1 (failure)
end
waitForPort-->>Test: return true or false
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
PR Review Advisor
Findings: 0 needs attention, 0 worth checking, 0 nice ideas. Consider writing more tests for …
This is an automated advisory review. A human maintainer must make the final merge decision.
The prior commit replaced the nc probe outright, which broke existing tests that stub spawnSync on `cmd === "nc"` to simulate the Ollama proxy port becoming ready (onboard-ollama-autostart, onboard-selection). Keep nc as the primary probe so that contract holds, and use the Node TCP subprocess only as a fallback when nc is unavailable (ENOENT). Signed-off-by: Dongni Yang <dongniy@nvidia.com>
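The nc-first ordering described in this commit could be sketched as follows. The names `classifyNcStatus` and `probeOnce`, the `fallbackProbe` parameter, and the 2-second timeout are assumptions for illustration, not the code in `src/lib/core/wait.ts`.

```typescript
import { spawnSync } from "node:child_process";

type NcOutcome = "ready" | "not-ready" | "use-fallback";

// Hypothetical helper: interpret the exit status of `nc -z`.
// A non-numeric status (for example null after ENOENT) means nc gave no
// answer, so the caller should fall back to another probe.
export function classifyNcStatus(status: number | null): NcOutcome {
  if (typeof status !== "number") return "use-fallback";
  return status === 0 ? "ready" : "not-ready";
}

// Hypothetical single probe. `fallbackProbe` stands in for the Node TCP
// subprocess and is only consulted when nc produced no numeric status.
export function probeOnce(
  port: number,
  fallbackProbe: (port: number) => boolean,
): boolean {
  const nc = spawnSync("nc", ["-z", "127.0.0.1", String(port)], {
    stdio: "ignore",
    timeout: 2000,
  });
  const outcome = classifyNcStatus(nc.status);
  if (outcome === "use-fallback") return fallbackProbe(port);
  return outcome === "ready";
}
```

Keeping `nc` as the first call means tests that stub `spawnSync` on `cmd === "nc"` and return a numeric status never reach the fallback.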
Selective E2E Results

| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
## Summary
- Add v0.0.62 release notes from Discussion #5100 and link release highlights to the relevant docs pages.
- Document the release's GPU sandbox recreation, sandbox-side local inference verification, and Hermes dashboard port guard in the command and inference references.
- Refresh generated NemoClaw user skills for the release-prep docs set.

## Source Summary
- #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU recreation behavior for Linux Docker-driver sandboxes.
- #5024 -> `docs/inference/use-local-inference.mdx`: Document sandbox-runtime verification of the `inference.local` local inference route.
- #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra device-node group propagation for sandbox CUDA initialization.
- #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`: Summarize onboarding and recovery reliability fixes, including the reserved Hermes API port guard.
- #5017 and #5043 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Summarize mutable OpenClaw config recovery and host-side `agents list` coverage.
- #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes upstream metadata visibility and WhatsApp QR rendering reliability.
- #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`: Refresh generated user-skill references from the current docs source.

## Skipped
- #5019 -> skipped for new prose because it touched `openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`. Existing source docs remain the source for generated skill synchronization.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning)
- Pre-commit hooks passed during commit, including docs-to-skills verification, markdown lint, gitleaks, and skills YAML tests.

## Summary by CodeRabbit
* **New Features**
  * Added `nemoclaw <name> agents list` command.
  * v0.0.62 release notes added summarizing onboarding and recovery improvements.
* **Bug Fixes**
  * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling).
  * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes.
  * Quieter/earlier handling of onboarding drift and port collisions.
* **Documentation**
  * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples.

---------

Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
Summary
`waitForPort()` in `src/lib/core/wait.ts` probed TCP ports by shelling out to `nc -z`. On hosts where `nc` (netcat) is not installed, such as minimal Linux distros (e.g. CachyOS, per #4974) and Windows, `spawnSync` returns `ENOENT` with `status: null`, so every probe reads as "port not ready". Ollama onboarding then aborts with the misleading error "Ollama auth proxy did not become ready on :11435 within timeout", even though the proxy started fine.
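The failure mode can be reproduced in isolation. The sketch below spawns a binary name that is assumed not to exist on the host (`nc-binary-that-does-not-exist` is a made-up stand-in for a missing `nc`) and reports what `spawnSync` hands back; the helper name `runMissingTool` is hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: run a command that is expected to be missing and
// report the two fields a readiness check would look at.
export function runMissingTool(command: string): {
  status: number | null;
  code: string | undefined;
} {
  const result = spawnSync(command, ["-z", "127.0.0.1", "11435"], {
    stdio: "ignore",
  });
  // When the executable cannot be found, spawnSync does not throw; it sets
  // result.error and leaves result.status as null.
  const code = (result.error as NodeJS.ErrnoException | undefined)?.code;
  return { status: result.status, code };
}

const outcome = runMissingTool("nc-binary-that-does-not-exist");
console.log(outcome);
```

A check of the form `status === 0` cannot tell "port closed" apart from "tool missing" here, which is why every probe looked like "not ready".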
Fix
Keep `nc -z` as the probe when it is present, and fall back to a short-lived Node subprocess (`process.execPath -e <script>`) that opens a TCP connection when `nc` is unavailable (`ENOENT`). Any host that can run NemoClaw has Node by definition, so this closes the gap on minimal Linux distros and Windows while leaving behavior unchanged where `nc` exists (keeping existing onboarding test stubs valid). The port is passed as an `argv` value and never interpolated into the script, so it can never be treated as code.

This mirrors the existing `spawn(process.execPath, …)` pattern already used in `src/lib/hermes-tool-gateway-broker.ts`.

Tests
`test/wait.test.ts` adds coverage for `waitForPort` (previously untested):
- `true` for a listening port with `PATH` emptied, exercising the `nc`-absent fallback (this case fails on the old implementation)
- `false` when nothing is listening

Notes
`waitForPort` / the `nc` dependency per the issue. `waitForHttp` still uses `curl` (far more universally present, and it relies on the `NO_PROXY` loopback env from #4181, "[macOS][Onboard] Ollama readiness check fails when HTTP_PROXY is set: waitForHttp does not inject NO_PROXY for localhost probes").

Fixes #4974
Signed-off-by: Dongni Yang <dongniy@nvidia.com>