fix(onboard): repair existing Ollama loopback override by yimoj · Pull Request #3384 · NVIDIA/NemoClaw

yimoj · 2026-05-12T04:43:04Z

Summary

Repair Local Ollama onboarding so existing native Linux systemd installs get the same loopback override as fresh installs. This keeps raw Ollama bound to localhost and repairs stale or missing OLLAMA_HOST drop-ins before the auth proxy is used.

Related Issue

Fixes #3337

Changes

Extracted Ollama systemd loopback override handling into a shared onboarding helper.
Applied the helper to both existing/running Ollama and install-Ollama provider paths.
Preserved existing systemd override settings while appending the NemoClaw loopback OLLAMA_HOST repair.
Added targeted onboarding tests for existing systemd Ollama, already-loopback repair, preserved override settings, and fail-closed override failure.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Yimo Jiang yimoj@nvidia.com

Summary by CodeRabbit

New Features
- Improved Linux Ollama setup: enforce and validate loopback binding via a systemd drop-in when applicable, prefer safe service restart, and only fall back to manual start when necessary; explicit ready/failed outcomes.
Tests
- Added Linux onboarding tests covering override installation, preservation of existing settings, restart/reprobe behavior, and failure scenarios.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

yimoj · 2026-05-12T04:47:09Z

@coderabbitai review

coderabbitai · 2026-05-12T04:47:25Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-12T04:48:16Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a Linux-only systemd drop-in at /etc/systemd/system/ollama.service.d/override.conf with idempotent merge logic, a helper to apply/restart and re-probe loopback reachability, integrates this into provider-selection and install flows, and adds Vitest coverage for apply/repair/idempotency and failure cases.

Changes

Ollama Systemd Loopback Override

Layer / File(s)	Summary
Systemd Loopback Override Helper Functions `src/lib/onboard.ts`	Adds `OLLAMA_SYSTEMD_OVERRIDE_PATH` and implements `mergeOllamaLoopbackSystemdOverride()` and `ensureOllamaLoopbackSystemdOverride()` to idempotently write/merge the `[Service]` `Environment="OLLAMA_HOST=127.0.0.1:${OLLAMA_PORT}"` line, install the drop-in, restart the `ollama` systemd unit, clear cached reachability, and re-probe loopback readiness (returns `not-applicable`, `ready`, or `failed`).
Provider Selection Flow Integration `src/lib/onboard.ts`	Calls `ensureOllamaLoopbackSystemdOverride()` during Ollama provider selection to recompute `ollamaReady` from post-override reachability; exits onboarding if override application fails; manual `ollama serve` start is attempted only if readiness remains unsatisfied.
Installation Path Integration `src/lib/onboard.ts`	Replaces prior inline systemd detection/override code in the Linux `install-ollama` path with `ensureOllamaLoopbackSystemdOverride()` (after a short sleep), exiting on `failed`; falls back to manual `ollama serve` only when override is `not-applicable` and loopback probe still fails.
Test Coverage `test/onboard-selection.test.ts`	Adds `writeOllamaToolCallingCurl()` curl mock helper and Vitest tests validating: drop-in install and `systemctl` restart, repair while preserving unrelated override lines, already-loopback idempotency without manual daemon start and re-probe, and two fail-closed cases (restart unreachable, drop-in install failure). Also updates one failure mock expectation to `install -D -m 0644`.

Sequence Diagram(s)

sequenceDiagram
  participant OnboardCLI
  participant FileSystem
  participant Systemd
  participant OllamaService
  participant NetworkProbe
  OnboardCLI->>FileSystem: ensure/read/merge override -> /etc/systemd/system/ollama.service.d/override.conf
  OnboardCLI->>Systemd: systemctl daemon-reload
  OnboardCLI->>Systemd: systemctl --no-block restart ollama
  Systemd->>OllamaService: restart request
  OnboardCLI->>OllamaService: clear cached host reachability
  OnboardCLI->>NetworkProbe: probe 127.0.0.1:${OLLAMA_PORT}
  NetworkProbe->>OnboardCLI: reachability result (ready/failed)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#3420: Updates to Linux install-ollama flow and systemd/override expectations that overlap with this PR's install/override refactor.

Suggested reviewers

ericksoa

Poem

🐰 I nudged the service to stay near,
wrote a drop-in, soft and clear.
Systemd wakes, the socket sings,
tests approve the humble things.
hop, apply, the loopback cheers.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(onboard): repair existing Ollama loopback override' accurately describes the main change—fixing the onboarding logic to repair/apply the Ollama systemd loopback override for existing installations.
Linked Issues check	✅ Passed	The PR implements all requirements from issue `#3337`: adds systemd drop-in override creation/repair at the specified path with correct OLLAMA_HOST value, applies logic to both fresh and existing installs, includes visible transcript messaging, and adds comprehensive test coverage.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to Ollama loopback override handling in onboarding (src/lib/onboard.ts) and corresponding test updates (test/onboard-selection.test.ts). No extraneous changes detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

test/onboard-selection.test.ts (2)

1256-1271: ⚡ Quick win

Assert the visible repair transcript in the already-loopback path.

This is the silent-skip regression called out in #3337, but the test only checks side effects. Please also assert that this path prints Configuring Ollama systemd loopback override... so the required user-visible confirmation cannot regress.

Suggested assertion

     assert.ok(
       payload.shellCommands.some((command: string) =>
         command.includes("/etc/systemd/system/ollama.service.d/override.conf"),
       ),
       "already-loopback systemd Ollama still needs the persistent drop-in",
     );
+    assert.ok(
+      payload.lines.some((line: string) =>
+        line.includes("Configuring Ollama systemd loopback override"),
+      ),
+      "already-loopback repair should emit the visible loopback-override transcript",
+    );
     assert.ok(
       !payload.shellCommands.some((command: string) =>
         command.includes("OLLAMA_HOST=127.0.0.1:11434 ollama serve"),
       ),

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` around lines 1256 - 1271, Add an assertion to
verify the user-visible repair transcript for the already-loopback path: after
the existing assertions that inspect payload.shellCommands and
payload.tagsProbeCount in onboard-selection.test.ts, assert that
payload.transcript (or the equivalent transcript/log field used in this test)
includes the string "Configuring Ollama systemd loopback override..." so the
visible confirmation cannot regress; use an assert.ok check with a clear failure
message referencing the already-loopback path.

1196-1216: ⚡ Quick win

Make the re-probe assertion order-aware.

tagsProbeCount >= 2 does not prove a /api/tags probe happened after the systemd restart. Two early probes would still pass. Record an event timeline and assert that at least one tags probe occurs after the restart command.

Suggested tightening

-let tagsProbeCount = 0;
+const events = [];
 const shellCommands = [];
 
 runner.runCapture = (command) => {
   const cmd = Array.isArray(command) ? command.join(" ") : command;
   if (cmd.includes("command -v ollama")) return "/usr/bin/ollama";
   if (cmd.includes("127.0.0.1:11434/api/tags")) {
-    tagsProbeCount += 1;
+    events.push("tags");
     return JSON.stringify({ models: [{ name: "qwen3:8b" }] });
   }
@@
 runner.runShell = (command) => {
   shellCommands.push(command);
+  if (command.includes("sudo systemctl restart ollama")) events.push("restart");
   return { status: 0 };
 };
@@
-    originalLog(JSON.stringify({ result, lines, shellCommands, tagsProbeCount }));
+    originalLog(JSON.stringify({ result, lines, shellCommands, events }));
@@
-    assert.ok(
-      payload.tagsProbeCount >= 2,
-      "should re-probe after the systemd restart instead of trusting a stale loopback cache",
-    );
+    const restartIndex = payload.events.indexOf("restart");
+    assert.ok(restartIndex >= 0, "expected a systemd restart");
+    assert.ok(
+      payload.events.slice(restartIndex + 1).includes("tags"),
+      "should re-probe after the systemd restart instead of trusting a stale loopback cache",
+    );

Also applies to: 1268-1271

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` around lines 1196 - 1216, The current test
only counts tags probes via tagsProbeCount, which can’t prove a probe happened
after the restart; change the stubbing in runner.runCapture and runner.runShell
to record ordered events (e.g., push timestamped strings into shellCommands or a
new eventTimeline array inside runner.runCapture when the "/api/tags" branch
runs and inside runner.runShell when the systemd restart command is issued),
keep tagsProbeCount if you want but append entries like "tags-probe" and
"systemctl-restart" to the timeline, then replace the existing assertion
(tagsProbeCount >= 2) with an order-aware assertion that verifies at least one
"tags-probe" event appears after the "systemctl-restart" event in the recorded
timeline; update the analogous checks at 1268-1271 the same way.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 8204-8208: The code currently allows falling back to the unmanaged
"ollama serve" path after ensureOllamaLoopbackSystemdOverride() has modified a
systemd unit by setting ollamaReady based on findReachableOllamaHost(); instead,
if ensureOllamaLoopbackSystemdOverride() returns true and
findReachableOllamaHost() returns null (probe failed), abort onboarding
immediately (e.g., return an error or throw) rather than continuing to the
background daemon path; update the block around ollamaRunning/ollamaReady (and
the analogous block that uses the same pattern elsewhere) so that a touched
systemd override + failed post-restart probe triggers an abort/early return
instead of falling back to unmanaged start.
- Around line 7242-7245: The privileged shell call using runShell with
interactive sudo can block in non-interactive/CI; change the command to use
non-interactive sudo (add sudo -n) and make systemctl restart non-blocking (add
--no-block), and pass a short timeout to runShell (e.g., 20–30s) so it fails
fast instead of hanging; update handling around overrideResult to treat timeout
or sudo-failure as a recoverable error (log and continue or surface a clear
error) and reference overrideResult, runShell, tmpDropIn, and
OLLAMA_SYSTEMD_OVERRIDE_PATH when making the changes.

---

Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Around line 1256-1271: Add an assertion to verify the user-visible repair
transcript for the already-loopback path: after the existing assertions that
inspect payload.shellCommands and payload.tagsProbeCount in
onboard-selection.test.ts, assert that payload.transcript (or the equivalent
transcript/log field used in this test) includes the string "Configuring Ollama
systemd loopback override..." so the visible confirmation cannot regress; use an
assert.ok check with a clear failure message referencing the already-loopback
path.
- Around line 1196-1216: The current test only counts tags probes via
tagsProbeCount, which can’t prove a probe happened after the restart; change the
stubbing in runner.runCapture and runner.runShell to record ordered events
(e.g., push timestamped strings into shellCommands or a new eventTimeline array
inside runner.runCapture when the "/api/tags" branch runs and inside
runner.runShell when the systemd restart command is issued), keep tagsProbeCount
if you want but append entries like "tags-probe" and "systemctl-restart" to the
timeline, then replace the existing assertion (tagsProbeCount >= 2) with an
order-aware assertion that verifies at least one "tags-probe" event appears
after the "systemctl-restart" event in the recorded timeline; update the
analogous checks at 1268-1271 the same way.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ec1c1168-228f-4f8f-928d-a26b158f23ef

📥 Commits

Reviewing files that changed from the base of the PR and between 4b47ab4 and e41c559.

📒 Files selected for processing (2)

src/lib/onboard.ts
test/onboard-selection.test.ts

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/onboard-selection.test.ts (1)

1380-1413: ⚡ Quick win

Also assert that override-apply failure never falls back to ollama serve.

This branch is meant to fail closed, but the test currently only checks stderr text. A regression that starts an unmanaged daemon before throwing would still pass here. Mirroring the manual-start sentinel used in the restart-failure test would close that gap.

Suggested fix

 runner.runShell = (command) => {
+  if (command.includes("ollama serve")) console.error("manual-start");
   if (command.includes("install -D -m 0644")) return { status: 1 };
   return { status: 0 };
 };
@@
     assert.equal(result.status, 1);
     assert.match(result.stderr, /Failed to apply Ollama systemd loopback override/);
     assert.match(result.stderr, /Refusing to continue/);
+    assert.doesNotMatch(result.stderr, /manual-start/);
   });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` around lines 1380 - 1413, The test must also
assert that a failed override-apply never falls back to spawning an unmanaged
Ollama daemon; after running the script that calls setupNim, add an assertion
(mirroring the restart-failure test's "manual-start" sentinel) that the
manual-start sentinel file was NOT created in the test's HOME/tmpDir area so we
guarantee no unmanaged "ollama serve" was started; locate this check alongside
the existing assertions on result and stderr (referencing variables result and
tmpDir and the setupNim invocation).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/onboard-selection.test.ts`:
- Around line 1119-1120: The install-command regex only matches when the command
begins with "sudo"; update the matcher used to set installedBody so it also
accepts bare installs by making the "sudo ( -n)?" portion optional (e.g., allow
an optional "sudo" + optional "-n" and whitespace before "install"), then keep
extracting the first capture group into installedBody as before (adjust the
regex used where `const match = command.match(...)` and `if (match)
installedBody = fs.readFileSync(match[1], "utf8");`).

---

Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Around line 1380-1413: The test must also assert that a failed override-apply
never falls back to spawning an unmanaged Ollama daemon; after running the
script that calls setupNim, add an assertion (mirroring the restart-failure
test's "manual-start" sentinel) that the manual-start sentinel file was NOT
created in the test's HOME/tmpDir area so we guarantee no unmanaged "ollama
serve" was started; locate this check alongside the existing assertions on
result and stderr (referencing variables result and tmpDir and the setupNim
invocation).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 42efda51-7359-41bc-923d-350182bd9f33

📥 Commits

Reviewing files that changed from the base of the PR and between e41c559 and 49f7f63.

📒 Files selected for processing (2)

src/lib/onboard.ts
test/onboard-selection.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/lib/onboard.ts

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

prekshivyas

LGTM.

Verifies the intended PR #3039 contract: existing-running-Ollama path now applies the systemd loopback drop-in, with a real merge step that preserves user-set Environment= lines and relies on systemd's last-wins for the OLLAMA_HOST override. The ActiveEnterTimestampMonotonic + MainPID poll correctly distinguishes a real restart from "still up". All error paths fail closed instead of silently spawning a duplicate ollama serve.

Test coverage is strong — 5 cases covering merge, idempotent reapply, restart-failure, and install-failure. Non-blocking suggestions for a follow-up if useful:

Tight timeout headroom in runShell({ timeout: 45_000 }) vs ~30s poll + daemon-reload — consider 60_000.
The "Failed to inspect existing Ollama systemd override" error could name sudo -n non-interactive failure as a likely cause.
Optional test assertion to lock in the single-desired-line invariant: assert.equal((installedBody.match(/OLLAMA_HOST=127\.0\.0\.1:11434/g) || []).length, 1).

None block the merge.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 6650-6748: Extract the Ollama systemd helper functions into a new
module: move ensureOllamaLoopbackSystemdOverride and
mergeOllamaLoopbackSystemdOverride (and any helper constants they rely on such
as OLLAMA_PORT, OLLAMA_SYSTEMD_OVERRIDE_PATH and utility calls like
secureTempFile, runShell, runCapture, cleanupTempDir, resetOllamaHostCache,
findReachableOllamaHost, sleep, isWsl, isNonInteractive, shellQuote) into a new
file (e.g., an onboarding helper module) preserving exact behavior, export
ensureOllamaLoopbackSystemdOverride from that module, and replace the
implementation here with a simple import and call site that uses the same
symbol; ensure all referenced utilities are imported into the new module and
that exported types/values match so callers (e.g., calls to
ensureOllamaLoopbackSystemdOverride) compile and behavior remains identical.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1cd99aeb-e0ab-46c3-b0ba-20fdc4930596

📥 Commits

Reviewing files that changed from the base of the PR and between c5ed492 and 8defe06.

📒 Files selected for processing (2)

src/lib/onboard.ts
test/onboard-selection.test.ts

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

…ack-existing Signed-off-by: Aaron Erickson <aerickson@nvidia.com> # Conflicts: # test/onboard-selection.test.ts

ericksoa

Approved after follow-up hardening: helper extracted out of onboard entrypoint, CodeRabbit/budget issue resolved, local build and full onboard-selection suite passed, and relevant Ollama proxy E2E passed.

fix(onboard): repair existing Ollama loopback override

e41c559

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Comment thread src/lib/onboard.ts Outdated

Comment thread src/lib/onboard.ts

fix(onboard): harden Ollama systemd restart handling

49f7f63

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Comment thread test/onboard-selection.test.ts Outdated

test(onboard): harden Ollama systemd failure coverage

169d30b

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

yimoj added the v0.0.40 label May 12, 2026

cv assigned prekshivyas May 12, 2026

Merge branch 'main' into fix/3337-ollama-loopback-existing

c5ed492

prekshivyas approved these changes May 12, 2026

View reviewed changes

wscurran added fix provider: ollama Ollama local model provider behavior labels May 12, 2026

Merge main into fix/3337-ollama-loopback-existing

8defe06

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

Comment thread src/lib/onboard.ts Outdated

ericksoa added 2 commits May 12, 2026 21:44

fix(onboard): harden Ollama systemd follow-up

44359fe

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

Merge remote-tracking branch 'origin/main' into fix/3337-ollama-loopb…

34917c5

…ack-existing Signed-off-by: Aaron Erickson <aerickson@nvidia.com> # Conflicts: # test/onboard-selection.test.ts

ericksoa approved these changes May 13, 2026

View reviewed changes

ericksoa merged commit 1b7357b into NVIDIA:main May 13, 2026
21 checks passed

This was referenced May 14, 2026

fix(onboard): allow sudo prompts in express install #3492

Merged

fix(onboard): front Ollama with the auth proxy on WSL native Docker too #3732

Merged

wscurran added area: local-models Local model providers, downloads, launch, or connectivity area: providers Inference provider integrations and provider behavior bug-fix PR fixes a bug or regression and removed Local Models labels Jun 3, 2026

Conversation

yimoj commented May 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

yimoj commented May 12, 2026

Uh oh!

coderabbitai Bot commented May 12, 2026

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

prekshivyas left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ericksoa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

yimoj commented May 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 12, 2026 •

edited

Loading