fix(onboard): front Ollama with the auth proxy on WSL native Docker too#3732
Conversation
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughDetects whether sandbox containers can reach host loopback, centralizes topology decisions, chooses Ollama container ports and proxy gating by capability, always binds Ollama to 127.0.0.1 on start, and repairs systemd loopback overrides on Linux. ChangesOllama Loopback Networking
Sequence DiagramsequenceDiagram
participant Onboard as Onboard
participant Topology as local_inference_topology
participant Platform as platform.containerCanReachHostLoopback
participant Systemd as ensure/mergeOllamaLoopbackSystemdOverride
participant OllamaDaemon as OllamaService
participant AuthProxy as AuthenticatedProxy
Onboard->>Topology: getContainerRuntime()
Topology->>Platform: containerCanReachHostLoopback(runtime, opts)
Platform-->>Topology: canReachLoopback (bool)
Topology-->>Onboard: shouldFrontOllamaWithProxy()
Onboard->>Systemd: repairLocalInferenceSystemdOverrideOrExit(provider)
Systemd-->>OllamaDaemon: ensure loopback binding and restart
Onboard->>OllamaDaemon: start `ollama serve` with OLLAMA_HOST=127.0.0.1:PORT
Onboard->>AuthProxy: start/configure (only if shouldFrontOllamaWithProxy() == true)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-double-onboard-e2e | ⏭️ skipped |
| gpu-e2e | ⏭️ skipped |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 7017-7020: The setupInference flow still gates proxy credential
wiring on isWsl(), causing credentials to be set incorrectly when proxy startup
moved to reachability-based logic; update setupInference to use the same
reachability check (containerCanReachHostLoopback(getContainerRuntime()))
instead of isWsl() when deciding whether to wire proxy tokens/credentials for
the provider (the proxy-token wiring logic inside setupInference should follow
the reachability gate used where proxy startup is decided, not the old isWsl()
gate). Ensure references to containerCanReachHostLoopback, getContainerRuntime,
setupInference, and any existing proxy token/credential variables are updated so
the provider receives proxy credentials whenever the reachability check
indicates the proxy will be used.
- Around line 7003-7006: Two identical explanatory comment blocks about the
Ollama loopback / Docker Desktop-on-WSL behavior are duplicated and inflate the
onboard entrypoint file; collapse them into a single short comment mentioning
that Ollama is loopback-only and reachable via the authenticated proxy on
OLLAMA_PROXY_PORT except for Docker Desktop on WSL, remove the duplicated
paragraph at the second occurrence (the other similar block around the later
Ollama proxy usage), and leave all surrounding code and behavior (proxy ports
and authentication) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2ec86a76-f847-4be8-950d-d176a2898416
📒 Files selected for processing (5)
src/lib/onboard.tssrc/lib/onboard/ollama-systemd.tssrc/lib/platform.tstest/onboard-selection.test.tstest/platform.test.ts
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
🚀 Docs preview ready! |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/lib/onboard.ts (1)
7606-7606:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the same reachability gate for the fallback proxy recovery.
This hunk switches credential wiring to
shouldFrontOllamaWithProxy(), but Lines 7578-7581 still decide whether to start the recovery proxy with!isWsl(). On native dockerd-on-WSL, that means validation can fail and exit before the proxy is ever started, even though this branch now expects the proxy path.Suggested fix
- if (!isWsl()) { + if (shouldFrontOllamaWithProxy()) { ensureOllamaAuthProxy(); proxyReady = isProxyHealthy(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` at line 7606, The reachability gate for starting the fallback recovery proxy must match the new credential branch: replace the legacy check that uses !isWsl() (the condition that decides whether to start/skip the recovery proxy and whether to run validation) with shouldFrontOllamaWithProxy() so the code that starts the recovery proxy and the validation path use the same predicate; update the logic around the recovery proxy startup and any early-exit validation that currently runs under !isWsl() to instead use shouldFrontOllamaWithProxy() (referencing the existing shouldFrontOllamaWithProxy() call and the recovery proxy start/validation code paths) so validation won’t fail/exit before the proxy is started.
🧹 Nitpick comments (1)
docs/inference/use-local-inference.md (1)
99-99: ⚡ Quick winRemove unnecessary bold (LLM pattern detected).
The bold on
**not**is emphatic emphasis rather than a UI label, parameter name, or genuine warning. This is a common LLM-generated pattern that should be avoided.Suggested fix
-**Native Docker installed inside a WSL distro** (the more common WSL2 setup with `docker-ce`) does **not** see the host's loopback from its bridge network, so the proxy runs there too. +Native Docker installed inside a WSL distro (the more common WSL2 setup with `docker-ce`) does not see the host's loopback from its bridge network, so the proxy runs there too.As per coding guidelines: Bold is reserved for UI labels, parameter names, and genuine warnings.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/inference/use-local-inference.md` at line 99, In the sentence containing "Native Docker installed inside a WSL distro (the more common WSL2 setup with `docker-ce`) does **not** see the host's loopback from its bridge network, so the proxy runs there too.", remove the unnecessary bold markup around "not" so it reads "...does not see the host's loopback...", preserving the inline code span for `docker-ce` and leaving the rest of the sentence unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/inference/use-local-inference.md`:
- Line 98: The sentence "The proxy is skipped only when Docker Desktop on WSL
bridges the host's loopback into containers via `host.docker.internal`" uses
passive voice; change it to active voice by naming the actor (e.g., "NemoClaw
skips the proxy only when Docker Desktop on WSL bridges the host's loopback into
containers via `host.docker.internal`, because raw Ollama on `127.0.0.1:11434`
is directly reachable and the auth proxy adds no value.") Replace the passive
sentence in use-local-inference.md with a similar active-voice construction that
identifies the actor (NemoClaw or the application) and preserves the existing
technical details.
In `@src/lib/onboard.ts`:
- Around line 9765-9771: The resume fast-path currently calls
ensureOllamaLoopbackSystemdOverride({ isNonInteractive }) for provider ===
"ollama-local" but ignores a failure result; update the block to check the
return value and abort the resume when ensureOllamaLoopbackSystemdOverride
returns "failed" (match the fatal behavior used in the other Ollama paths) by
propagating the error/returning failure or throwing so re-onboard stops if the
loopback repair did not succeed.
---
Duplicate comments:
In `@src/lib/onboard.ts`:
- Line 7606: The reachability gate for starting the fallback recovery proxy must
match the new credential branch: replace the legacy check that uses !isWsl()
(the condition that decides whether to start/skip the recovery proxy and whether
to run validation) with shouldFrontOllamaWithProxy() so the code that starts the
recovery proxy and the validation path use the same predicate; update the logic
around the recovery proxy startup and any early-exit validation that currently
runs under !isWsl() to instead use shouldFrontOllamaWithProxy() (referencing the
existing shouldFrontOllamaWithProxy() call and the recovery proxy
start/validation code paths) so validation won’t fail/exit before the proxy is
started.
---
Nitpick comments:
In `@docs/inference/use-local-inference.md`:
- Line 99: In the sentence containing "Native Docker installed inside a WSL
distro (the more common WSL2 setup with `docker-ce`) does **not** see the host's
loopback from its bridge network, so the proxy runs there too.", remove the
unnecessary bold markup around "not" so it reads "...does not see the host's
loopback...", preserving the inline code span for `docker-ce` and leaving the
rest of the sentence unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cb8b00e8-7933-4c46-9b55-21b78e2a5433
📒 Files selected for processing (6)
docs/inference/use-local-inference.mdsrc/lib/inference/local.test.tssrc/lib/inference/local.tssrc/lib/onboard.tssrc/lib/onboard/local-inference-topology.tssrc/lib/onboard/ollama-systemd.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/onboard/ollama-systemd.ts
Selective E2E Results — ✅ All requested jobs passedRun: 26044567282
|
…doc to active voice Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/lib/onboard.ts (1)
7609-7623:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the same reachability gate for proxy recovery and token wiring.
Line 7609 now keys proxy credentials off
shouldFrontOllamaWithProxy(), but Line 7581 still only probes/starts the proxy behind!isWsl(). On native dockerd-in-WSL that leaves the recovery path on the old heuristic, so local validation can fail and abort before the proxy path is ever considered.Suggested fix
if (!validation.ok) { // The container reachability check uses Docker's --add-host host-gateway, // which may not work on all Docker configurations (e.g., Brev, rootless). // The real sandbox uses k3s CoreDNS + NodeHosts — a different path. // Try to start/restart the auth proxy before probing — this recovers // from stale or missing proxy processes before we decide to abort. - if (!isWsl()) { + if (shouldFrontOllamaWithProxy()) { ensureOllamaAuthProxy(); proxyReady = isProxyHealthy(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` around lines 7609 - 7623, The proxy recovery probe is gated by the old isWsl() heuristic while token wiring uses shouldFrontOllamaWithProxy(), causing mismatched behavior; change the recovery/start condition to use shouldFrontOllamaWithProxy() so both paths align: where the code currently checks !isWsl() before calling ensureOllamaAuthProxy()/probing the proxy (the block that sets proxyReady), update that condition to shouldFrontOllamaWithProxy(), ensure proxyReady is set the same way, and keep the later token wiring logic that calls getOllamaProxyToken(), ensureOllamaAuthProxy(), and persistAndProbeOllamaProxy() unchanged so both recovery and token wiring use the same reachability gate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/lib/onboard.ts`:
- Around line 7609-7623: The proxy recovery probe is gated by the old isWsl()
heuristic while token wiring uses shouldFrontOllamaWithProxy(), causing
mismatched behavior; change the recovery/start condition to use
shouldFrontOllamaWithProxy() so both paths align: where the code currently
checks !isWsl() before calling ensureOllamaAuthProxy()/probing the proxy (the
block that sets proxyReady), update that condition to
shouldFrontOllamaWithProxy(), ensure proxyReady is set the same way, and keep
the later token wiring logic that calls getOllamaProxyToken(),
ensureOllamaAuthProxy(), and persistAndProbeOllamaProxy() unchanged so both
recovery and token wiring use the same reachability gate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e9d229bf-d4dc-48a5-a10b-c9de0647da19
📒 Files selected for processing (3)
docs/inference/use-local-inference.mdsrc/lib/onboard.tssrc/lib/onboard/local-inference-topology.ts
✅ Files skipped from review due to trivial changes (1)
- docs/inference/use-local-inference.md
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-double-onboard-e2e | ⏭️ skipped |
| gpu-e2e | ⏭️ skipped |
|
🌿 Preview your docs: https://nvidia-preview-pr-3732.docs.buildwithfern.com/nemoclaw |
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-double-onboard-e2e | ⏭️ skipped |
| gpu-e2e | ⏭️ skipped |
Selective E2E Results — ✅ All requested jobs passedRun: 26074876520
|
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-double-onboard-e2e | ⏭️ skipped |
| gpu-e2e | ⏭️ skipped |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/onboard.test.ts (1)
925-927: ⚡ Quick winAdd an explicit assertion that the proxy token never appears in argv.
This test already validates credential-name wiring, but it should also fail if the literal token is accidentally embedded in the command string.
Suggested test hardening
assert.match(providerCommand.command, /--credential NEMOCLAW_OLLAMA_PROXY_TOKEN/); assert.equal(providerCommand.env?.NEMOCLAW_OLLAMA_PROXY_TOKEN, "proxy-token"); + assert.doesNotMatch(providerCommand.command, /proxy-token/);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard.test.ts` around lines 925 - 927, Add an assertion ensuring the literal proxy token never appears in the generated command argv; specifically, after the existing checks on providerCommand.command and providerCommand.env (references: providerCommand.command and providerCommand.env), add a negative assertion that providerCommand.command (or providerCommand.argv if present) does not include the string "proxy-token" (e.g., assert.doesNotMatch or assert.ok(!...includes("proxy-token"))), so the test fails if the secret token is accidentally embedded into the command line.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/onboard.test.ts`:
- Around line 925-927: Add an assertion ensuring the literal proxy token never
appears in the generated command argv; specifically, after the existing checks
on providerCommand.command and providerCommand.env (references:
providerCommand.command and providerCommand.env), add a negative assertion that
providerCommand.command (or providerCommand.argv if present) does not include
the string "proxy-token" (e.g., assert.doesNotMatch or
assert.ok(!...includes("proxy-token"))), so the test fails if the secret token
is accidentally embedded into the command line.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1c29666b-ed56-4e04-8663-a2947b1e0070
📒 Files selected for processing (2)
src/lib/onboard.tstest/onboard.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/onboard.ts
Selective E2E Results —
|
| Job | Result |
|---|---|
| gpu-double-onboard-e2e | ⏭️ skipped |
| gpu-e2e | ⏭️ skipped |
ericksoa
left a comment
There was a problem hiding this comment.
Approved after adversarial review and follow-up fixes. The WSL native Docker Ollama proxy recovery path now uses the same topology predicate as token wiring, and the regression test covers proxy recovery without leaking the proxy token into argv. CI and CodeRabbit are green on head 04bd603.
## Summary Refreshes the NemoClaw docs for v0.0.46 by updating version metadata, release notes, and generated user skills. The refresh also keeps public docs aligned with the docs skip list by removing non-public experimental references from the generated output. ## Related Issue None. ## Changes - #3744 and #3824 -> `docs/about/release-notes.mdx`: Added Windows bootstrap and WSL express install coverage for v0.0.46. - #3392 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/commands.mdx`, `docs/reference/network-policies.mdx`, and policy examples: Refreshed public messaging channel docs around WhatsApp and matching policy presets. - #3742, #3767, #3732, #3786, #3777, and #3808 -> `docs/about/release-notes.mdx`: Added release-note coverage for Hermes managed tools, Bedrock Runtime endpoint detection, WSL Ollama proxying, Model Router Python fallback, plugin command registration, and tool-catalog latency improvements. - #3124 -> `docs/about/release-notes.mdx`: Added release-note coverage for hosted uninstall flag guidance. - Generated `nemoclaw-user-*` skills from the updated MDX docs for the v0.0.46 release. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [x] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [ ] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Verification notes: - Commit hooks passed, including markdownlint, gitleaks, docs-to-skills verification, env-var docs, and skills YAML checks. - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` passed. - `bash test/e2e/e2e-cloud-experimental/check-docs.sh --only-links --local-only --with-skills` passed. - `git diff --check` passed. - `make docs` was attempted but blocked before MDX validation because `npx` received HTTP 403 fetching `fern-api` from npm. --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Released v0.0.46: improved Windows setup, WhatsApp messaging support, Hermes sandbox/tool routing, Anthropic endpoint compatibility, Ollama proxy routing, model-router fallback, OpenClaw plugin/backup compatibility, sandbox build tooling fixes, and updated uninstall flag behavior. * **Documentation** * Removed WeChat from messaging flows and presets across guides and CLI docs; clarified onboarding and channel setup for WhatsApp. Clarified runtime mutability and filesystem (Landlock) behavior — some changes require sandbox rebuilds; prefer host-side commands for durable config. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3911?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Closes a long-standing assumption that WSL2 sandboxes always reach the host's loopback directly. That's only true under Docker Desktop; with native dockerd installed inside the WSL distro the sandbox container goes through the Docker bridge gateway (typically
172.17.0.1), which can't see Ollama bound to127.0.0.1. Front Ollama with the existing auth proxy in that topology, and harden the systemd loopback override so legacy0.0.0.0:11434drop-ins are removed rather than just shadowed by an appended127.0.0.1:11434line.Related Issue
Fixes #3695
Fixes #3342
Changes
containerCanReachHostLoopback(runtime, opts)predicate insrc/lib/platform.ts. Returnstrueonly when WSL detection is positive anddocker inforeportsdocker-desktop(the only topology where container-to-host loopback works without a proxy).src/lib/onboard.ts: replace the fourisWsl()gates in the plain-Ollama andinstall-ollamabranches withcontainerCanReachHostLoopback(getContainerRuntime()). The manualOLLAMA_HOST=127.0.0.1:<port>fallback launch is now unconditional (loopback is the right policy everywhere); the auth proxy starts in every topology except Docker Desktop on WSL.src/lib/onboard/ollama-systemd.ts:ensureOllamaLoopbackSystemdOverride. The function now also runs on WSL systemd-backed Ollama (Win11 22H2+ enables systemd in WSL by default).mergeOllamaLoopbackSystemdOverrideto strip any existing non-commentOLLAMA_HOST=line in[Service]before re-appending the loopback line. systemd's "last value wins" rule for repeatedEnvironment=would usually pick the appended override, but legacy0.0.0.0drop-ins from older NemoClaw versions occasionally survived the restart on some hosts. Removing the stale line keeps the policy unambiguous and addresses [DGX Spark][Onboard] Re-onboard does not repair legacy 0.0.0.0 Ollama systemd override to loopback-only #3342 even on the resume / non-interactive fast-paths that skipsetupNim.test/platform.test.tscovers all four(isWsl × runtime)cells plus non-Docker runtimes.test/onboard-selection.test.tsfixture had aString.rawescape bug (join("\\n")produced literal backslash-n instead of newlines) that silently masked the merge behaviour for the past few releases. Fixture corrected; the assertion now requires the legacyOLLAMA_HOST=0.0.0.0:11434to be removed, with adjacent settings (OLLAMA_MODELS,HTTPS_PROXY) preserved.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit
Bug Fixes
New Features
Documentation
Tests