fix(onboard): keep managed Ollama loopback-only#3039
Conversation
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughNon-WSL Ollama startup/install flows now bind Ollama to loopback (127.0.0.1:) instead of 0.0.0.0. Linux systemd installs apply a drop‑in override to force loopback and abort onboarding on failure. Docs, inline comments, tests, and e2e script comments updated; WSL behavior unchanged. ChangesOllama Loopback-Only Binding Migration
Sequence DiagramsequenceDiagram
actor Container as Docker Container
participant Proxy as Authenticated Reverse Proxy
participant Ollama as Ollama (127.0.0.1:11434)
Container->>Proxy: HTTP request with Bearer token
Proxy->>Proxy: Validate token
Proxy->>Ollama: Forward request to loopback
Ollama->>Proxy: Response
Proxy->>Container: Return response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)
6194-6206:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail closed if the systemd loopback override cannot be applied.
This path currently ignores override/restart failures, then treats plain reachability as success. That can leave a previously broad bind in place while onboarding continues.
🔧 Suggested guard
- runShell( + const overrideResult = runShell( `sudo install -D -m 0644 ${shellQuote(tmpDropIn)} ${shellQuote("/etc/systemd/system/ollama.service.d/override.conf")} && sudo systemctl daemon-reload && sudo systemctl restart ollama`, { ignoreError: true }, ); cleanupTempDir(tmpDropIn, "nemoclaw-ollama-override"); + if (overrideResult.status !== 0) { + console.error(" Failed to apply Ollama systemd loopback override."); + console.error(" Refusing to continue with a potentially non-loopback Ollama bind."); + process.exit(1); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` around lines 6194 - 6206, The code currently calls runShell(..., { ignoreError: true }) to install the systemd override and immediately continues; change this to fail closed by checking the result of runShell (or removing ignoreError) and aborting onboarding if the install/restart fails: capture the runShell return/error when installing the override in the same block where cleanupTempDir(tmpDropIn, "nemoclaw-ollama-override") and the subsequent probe loop run, and if runShell indicates failure then log/throw/exit (so the loop that calls findReachableOllamaHost() is not treated as success when the override failed). Ensure the guard references the same symbols: runShell, cleanupTempDir, and findReachableOllamaHost, and perform a proper failure path (throw or process.exit) before entering the retry probe loop.
🧹 Nitpick comments (1)
docs/inference/use-local-inference.md (1)
79-79: ⚡ Quick winUse active voice for this sentence.
Line 79 is passive (“is never exposed”). Rephrase to active voice to match docs style requirements (for example, “On that native path, NemoClaw never exposes Ollama without authentication.”).
As per coding guidelines, "Active voice required. Flag passive constructions."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/inference/use-local-inference.md` at line 79, Replace the passive sentence "On that native path, Ollama itself is never exposed without authentication." with an active-voice construction that names the agent (e.g., "On that native path, NemoClaw never exposes Ollama without authentication.") so it conforms to the active-voice guideline; update the sentence wording wherever the same phrasing appears in the document to use a similar active formulation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 6194-6206: The code currently calls runShell(..., { ignoreError:
true }) to install the systemd override and immediately continues; change this
to fail closed by checking the result of runShell (or removing ignoreError) and
aborting onboarding if the install/restart fails: capture the runShell
return/error when installing the override in the same block where
cleanupTempDir(tmpDropIn, "nemoclaw-ollama-override") and the subsequent probe
loop run, and if runShell indicates failure then log/throw/exit (so the loop
that calls findReachableOllamaHost() is not treated as success when the override
failed). Ensure the guard references the same symbols: runShell, cleanupTempDir,
and findReachableOllamaHost, and perform a proper failure path (throw or
process.exit) before entering the retry probe loop.
---
Nitpick comments:
In `@docs/inference/use-local-inference.md`:
- Line 79: Replace the passive sentence "On that native path, Ollama itself is
never exposed without authentication." with an active-voice construction that
names the agent (e.g., "On that native path, NemoClaw never exposes Ollama
without authentication.") so it conforms to the active-voice guideline; update
the sentence wording wherever the same phrasing appears in the document to use a
similar active formulation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 68c2f02e-62e8-47ef-a5d5-ef7d489362a1
📒 Files selected for processing (6)
docs/inference/use-local-inference.mdsrc/lib/local-inference.tssrc/lib/onboard.tstest/e2e/test-gpu-double-onboard.shtest/e2e/test-gpu-e2e.shtest/onboard-selection.test.ts
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/onboard-selection.test.ts`:
- Line 3879: The test is brittle because credentials.prompt is hard-coded to
return "7" (menu-index dependent); change the stub for credentials.prompt in the
"systemd failure" test to compute and return the correct menu index at runtime
instead of a literal "7" — for example, inspect the provider choice list used by
the onboarding flow (or the array that builds the menu) to find the index of the
desired provider (e.g., the "systemd" or known provider display name) and return
that index as a string; update the stub in the test (the credentials.prompt
replacement) to use that computed index so the test selects the intended
provider regardless of menu ordering.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c94add35-4a01-4551-8632-132cf2fe13fb
📒 Files selected for processing (3)
docs/inference/use-local-inference.mdsrc/lib/onboard.tstest/onboard-selection.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/inference/use-local-inference.md
- src/lib/onboard.ts
Selective E2E Results — ✅ All requested jobs passedRun: 25381402324
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)
6276-6293:⚠️ Potential issue | 🟠 Major | ⚡ Quick winVerify the listen address after the systemd restart.
This only checks that Ollama is reachable again. A daemon still bound to
0.0.0.0:${OLLAMA_PORT}will also satisfyfindReachableOllamaHost(), so onboarding can continue while raw Ollama remains externally exposed. The restart path should inspect the actual listener and fail unless it is loopback-only.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` around lines 6276 - 6293, After restarting systemd we only probe reachability via findReachableOllamaHost(), which will be true even if Ollama is bound to 0.0.0.0; instead, after the restart loop (and before continuing) call a check that inspects the actual listener address/port (e.g. by parsing ss/netstat output or using a local connect+getsockname) to ensure the process is only bound to loopback: use the same restart path around runShell()/overrideResult/tmpDropIn and, after findReachableOllamaHost() succeeds, verify the listener is 127.0.0.1:OLLAMA_PORT (fail and process.exit(1) if not) so onboarding refuses to continue with a non-loopback bind.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Line 6277: The current install command writes tmpDropIn to
/etc/systemd/system/ollama.service.d/override.conf which can overwrite
user-managed drop-ins; change it to install to a NemoClaw-specific filename
(e.g. /etc/systemd/system/ollama.service.d/nemoclaw.conf) instead of
override.conf, update the string that builds the shell command where tmpDropIn
is used, and add a one-time cleanup step that removes the old NemoClaw-managed
override (if present) rather than unconditionally clobbering override.conf;
ensure you still run systemctl daemon-reload and systemctl restart ollama after
the install.
---
Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 6276-6293: After restarting systemd we only probe reachability via
findReachableOllamaHost(), which will be true even if Ollama is bound to
0.0.0.0; instead, after the restart loop (and before continuing) call a check
that inspects the actual listener address/port (e.g. by parsing ss/netstat
output or using a local connect+getsockname) to ensure the process is only bound
to loopback: use the same restart path around
runShell()/overrideResult/tmpDropIn and, after findReachableOllamaHost()
succeeds, verify the listener is 127.0.0.1:OLLAMA_PORT (fail and process.exit(1)
if not) so onboarding refuses to continue with a non-loopback bind.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: eaf3932d-24dc-4ffb-a182-daeaff9dec87
📒 Files selected for processing (2)
docs/inference/use-local-inference.mdsrc/lib/onboard.ts
| fs.writeFileSync(tmpDropIn, dropInBody, { mode: 0o644 }); | ||
| runShell( | ||
| const overrideResult = runShell( | ||
| `sudo install -D -m 0644 ${shellQuote(tmpDropIn)} ${shellQuote("/etc/systemd/system/ollama.service.d/override.conf")} && sudo systemctl daemon-reload && sudo systemctl restart ollama`, |
There was a problem hiding this comment.
Avoid clobbering an existing override.conf.
Writing directly to /etc/systemd/system/ollama.service.d/override.conf will replace any user-managed drop-in already living at that path. That can silently discard unrelated Ollama service customizations. A NemoClaw-specific drop-in filename is safer, with legacy cleanup limited to the old NemoClaw-managed override.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/onboard.ts` at line 6277, The current install command writes
tmpDropIn to /etc/systemd/system/ollama.service.d/override.conf which can
overwrite user-managed drop-ins; change it to install to a NemoClaw-specific
filename (e.g. /etc/systemd/system/ollama.service.d/nemoclaw.conf) instead of
override.conf, update the string that builds the shell command where tmpDropIn
is used, and add a one-time cleanup step that removes the old NemoClaw-managed
override (if present) rather than unconditionally clobbering override.conf;
ensure you still run systemctl daemon-reload and systemctl restart ollama after
the install.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/onboard-selection.test.ts (1)
850-855:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAvoid hard-coded provider indices in the new Ollama tests.
Both stubs still assume the target provider stays at option
7. That makes the tests menu-order dependent again; adding or reordering providers can select the wrong path and hide regressions. Please derive the choice from the rendered menu, like thefindInstallOllamaChoice()pattern used below.Also applies to: 3730-3731, 3767-3772
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard-selection.test.ts` around lines 850 - 855, The test currently hard-codes provider menu selections by returning "7" from the credentials.prompt stub (credentials.prompt), which is brittle; replace those hard-coded indices with logic that inspects the rendered menu text collected in messages and computes the correct choice string like the existing findInstallOllamaChoice() helper does. Update the stub(s) that push messages and shift answers to parse messages (or call findInstallOllamaChoice(messages)) to derive the numeric choice for the target provider and return that instead of "7"; apply the same change to the other occurrences referenced (around the findInstallOllamaChoice usage locations) so tests select the provider dynamically from the menu output.
🧹 Nitpick comments (2)
test/onboard-selection.test.ts (2)
838-845: ⚡ Quick winAssert the auth-proxy launch too, not just the raw Ollama bind.
Right now this test only inspects
shellCommands, so it still passes if the proxy is never started. Capturingchild_process.spawnarguments and assertingollama-auth-proxyis launched after the loopbackollama servewould make the test match its name and cover the container-access guarantee from this PR.Also applies to: 867-874, 911-922
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard-selection.test.ts` around lines 838 - 845, The test currently stubs child_process.spawn but never asserts it was called with the auth-proxy, so update the test to capture spawn invocations (wrap/spy child_process.spawn) and assert that after the loopback "ollama serve" invocation the code launches "ollama-auth-proxy" with the expected arguments (e.g., verify the spawn call sequence and that one call's command/first arg contains "ollama-auth-proxy"); also update the other test stubs that override child_process.spawn/spawnSync in the same file (the other two test blocks) to use the same capture-and-assert pattern so each test verifies the proxy was actually started in addition to inspecting shellCommands.
3867-3940: ⚡ Quick winAdd a happy-path systemd override test alongside this fail-closed case.
This covers the refusal path, but it will not catch regressions where the systemd branch “succeeds” with the wrong bind or incorrectly falls back to manual
ollama serve. A companion success-path test that records the generated override/restart flow and asserts loopback-only configuration would close the remaining coverage gap here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard-selection.test.ts` around lines 3867 - 3940, Add a companion happy-path test to the existing "fails closed when the Linux systemd loopback override cannot be applied" case: create a new test that uses the same script scaffolding (reusing setupNim via require(${onboardPath}) and the runner/credentials/platform stubs) but change runner.runShell/runner.runCapture behavior to simulate a successful systemd override (e.g., return { status: 0 } for the "sudo install -D -m 0644" and any restart commands, and ensure systemctl shows the override applied), run setupNim("systemd-success-test", null) and assert process exit status is 0 and that stdout/stderr contains the expected messages indicating the override file was written, systemctl daemon-reload/restart were invoked, and that the configured bind is loopback-only (match strings produced by setupNim that indicate 127.0.0.1 or "loopback" in the override). Use the same helper names (setupNim, runner.runShell, runner.runCapture, platform.isWsl, credentials.prompt) so the test mirrors the fail case but verifies the successful branch and generated override content.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@test/onboard-selection.test.ts`:
- Around line 850-855: The test currently hard-codes provider menu selections by
returning "7" from the credentials.prompt stub (credentials.prompt), which is
brittle; replace those hard-coded indices with logic that inspects the rendered
menu text collected in messages and computes the correct choice string like the
existing findInstallOllamaChoice() helper does. Update the stub(s) that push
messages and shift answers to parse messages (or call
findInstallOllamaChoice(messages)) to derive the numeric choice for the target
provider and return that instead of "7"; apply the same change to the other
occurrences referenced (around the findInstallOllamaChoice usage locations) so
tests select the provider dynamically from the menu output.
---
Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Around line 838-845: The test currently stubs child_process.spawn but never
asserts it was called with the auth-proxy, so update the test to capture spawn
invocations (wrap/spy child_process.spawn) and assert that after the loopback
"ollama serve" invocation the code launches "ollama-auth-proxy" with the
expected arguments (e.g., verify the spawn call sequence and that one call's
command/first arg contains "ollama-auth-proxy"); also update the other test
stubs that override child_process.spawn/spawnSync in the same file (the other
two test blocks) to use the same capture-and-assert pattern so each test
verifies the proxy was actually started in addition to inspecting shellCommands.
- Around line 3867-3940: Add a companion happy-path test to the existing "fails
closed when the Linux systemd loopback override cannot be applied" case: create
a new test that uses the same script scaffolding (reusing setupNim via
require(${onboardPath}) and the runner/credentials/platform stubs) but change
runner.runShell/runner.runCapture behavior to simulate a successful systemd
override (e.g., return { status: 0 } for the "sudo install -D -m 0644" and any
restart commands, and ensure systemctl shows the override applied), run
setupNim("systemd-success-test", null) and assert process exit status is 0 and
that stdout/stderr contains the expected messages indicating the override file
was written, systemctl daemon-reload/restart were invoked, and that the
configured bind is loopback-only (match strings produced by setupNim that
indicate 127.0.0.1 or "loopback" in the override). Use the same helper names
(setupNim, runner.runShell, runner.runCapture, platform.isWsl,
credentials.prompt) so the test mirrors the fail case but verifies the
successful branch and generated override content.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 24d751c3-7f53-4550-a01f-2662473a17c7
📒 Files selected for processing (1)
test/onboard-selection.test.ts
Summary
Validation
Signed-off-by: Aaron Erickson aerickson@nvidia.com
Summary by CodeRabbit
Bug Fixes
Documentation
Tests