Skip to content

fix(onboard): repair existing Ollama loopback override#3384

Merged
ericksoa merged 7 commits into
NVIDIA:mainfrom
yimoj:fix/3337-ollama-loopback-existing
May 13, 2026
Merged

fix(onboard): repair existing Ollama loopback override#3384
ericksoa merged 7 commits into
NVIDIA:mainfrom
yimoj:fix/3337-ollama-loopback-existing

Conversation

@yimoj

@yimoj yimoj commented May 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Repair Local Ollama onboarding so existing native Linux systemd installs get the same loopback override as fresh installs. This keeps raw Ollama bound to localhost and repairs stale or missing OLLAMA_HOST drop-ins before the auth proxy is used.

Related Issue

Fixes #3337

Changes

  • Extracted Ollama systemd loopback override handling into a shared onboarding helper.
  • Applied the helper to both existing/running Ollama and install-Ollama provider paths.
  • Preserved existing systemd override settings while appending the NemoClaw loopback OLLAMA_HOST repair.
  • Added targeted onboarding tests for existing systemd Ollama, already-loopback repair, preserved override settings, and fail-closed override failure.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Yimo Jiang yimoj@nvidia.com

Summary by CodeRabbit

  • New Features

    • Improved Linux Ollama setup: enforce and validate loopback binding via a systemd drop-in when applicable, prefer safe service restart, and only fall back to manual start when necessary; explicit ready/failed outcomes.
  • Tests

    • Added Linux onboarding tests covering override installation, preservation of existing settings, restart/reprobe behavior, and failure scenarios.

Review Change Stack

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
@yimoj

yimoj commented May 12, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Linux-only systemd drop-in at /etc/systemd/system/ollama.service.d/override.conf with idempotent merge logic, a helper to apply/restart and re-probe loopback reachability, integrates this into provider-selection and install flows, and adds Vitest coverage for apply/repair/idempotency and failure cases.

Changes

Ollama Systemd Loopback Override

Layer / File(s) Summary
Systemd Loopback Override Helper Functions
src/lib/onboard.ts
Adds OLLAMA_SYSTEMD_OVERRIDE_PATH and implements mergeOllamaLoopbackSystemdOverride() and ensureOllamaLoopbackSystemdOverride() to idempotently write/merge the [Service] Environment="OLLAMA_HOST=127.0.0.1:${OLLAMA_PORT}" line, install the drop-in, restart the ollama systemd unit, clear cached reachability, and re-probe loopback readiness (returns not-applicable, ready, or failed).
Provider Selection Flow Integration
src/lib/onboard.ts
Calls ensureOllamaLoopbackSystemdOverride() during Ollama provider selection to recompute ollamaReady from post-override reachability; exits onboarding if override application fails; manual ollama serve start is attempted only if readiness remains unsatisfied.
Installation Path Integration
src/lib/onboard.ts
Replaces prior inline systemd detection/override code in the Linux install-ollama path with ensureOllamaLoopbackSystemdOverride() (after a short sleep), exiting on failed; falls back to manual ollama serve only when override is not-applicable and loopback probe still fails.
Test Coverage
test/onboard-selection.test.ts
Adds writeOllamaToolCallingCurl() curl mock helper and Vitest tests validating: drop-in install and systemctl restart, repair while preserving unrelated override lines, already-loopback idempotency without manual daemon start and re-probe, and two fail-closed cases (restart unreachable, drop-in install failure). Also updates one failure mock expectation to install -D -m 0644.

Sequence Diagram(s)

sequenceDiagram
  participant OnboardCLI
  participant FileSystem
  participant Systemd
  participant OllamaService
  participant NetworkProbe
  OnboardCLI->>FileSystem: ensure/read/merge override -> /etc/systemd/system/ollama.service.d/override.conf
  OnboardCLI->>Systemd: systemctl daemon-reload
  OnboardCLI->>Systemd: systemctl --no-block restart ollama
  Systemd->>OllamaService: restart request
  OnboardCLI->>OllamaService: clear cached host reachability
  OnboardCLI->>NetworkProbe: probe 127.0.0.1:${OLLAMA_PORT}
  NetworkProbe->>OnboardCLI: reachability result (ready/failed)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#3420: Updates to Linux install-ollama flow and systemd/override expectations that overlap with this PR's install/override refactor.

Suggested reviewers

  • ericksoa

Poem

🐰 I nudged the service to stay near,
wrote a drop-in, soft and clear.
Systemd wakes, the socket sings,
tests approve the humble things.
hop, apply, the loopback cheers.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(onboard): repair existing Ollama loopback override' accurately describes the main change—fixing the onboarding logic to repair/apply the Ollama systemd loopback override for existing installations.
Linked Issues check ✅ Passed The PR implements all requirements from issue #3337: adds systemd drop-in override creation/repair at the specified path with correct OLLAMA_HOST value, applies logic to both fresh and existing installs, includes visible transcript messaging, and adds comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes are directly scoped to Ollama loopback override handling in onboarding (src/lib/onboard.ts) and corresponding test updates (test/onboard-selection.test.ts). No extraneous changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
test/onboard-selection.test.ts (2)

1256-1271: ⚡ Quick win

Assert the visible repair transcript in the already-loopback path.

This is the silent-skip regression called out in #3337, but the test only checks side effects. Please also assert that this path prints Configuring Ollama systemd loopback override... so the required user-visible confirmation cannot regress.

Suggested assertion
     assert.ok(
       payload.shellCommands.some((command: string) =>
         command.includes("/etc/systemd/system/ollama.service.d/override.conf"),
       ),
       "already-loopback systemd Ollama still needs the persistent drop-in",
     );
+    assert.ok(
+      payload.lines.some((line: string) =>
+        line.includes("Configuring Ollama systemd loopback override"),
+      ),
+      "already-loopback repair should emit the visible loopback-override transcript",
+    );
     assert.ok(
       !payload.shellCommands.some((command: string) =>
         command.includes("OLLAMA_HOST=127.0.0.1:11434 ollama serve"),
       ),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` around lines 1256 - 1271, Add an assertion to
verify the user-visible repair transcript for the already-loopback path: after
the existing assertions that inspect payload.shellCommands and
payload.tagsProbeCount in onboard-selection.test.ts, assert that
payload.transcript (or the equivalent transcript/log field used in this test)
includes the string "Configuring Ollama systemd loopback override..." so the
visible confirmation cannot regress; use an assert.ok check with a clear failure
message referencing the already-loopback path.

1196-1216: ⚡ Quick win

Make the re-probe assertion order-aware.

tagsProbeCount >= 2 does not prove a /api/tags probe happened after the systemd restart. Two early probes would still pass. Record an event timeline and assert that at least one tags probe occurs after the restart command.

Suggested tightening
-let tagsProbeCount = 0;
+const events = [];
 const shellCommands = [];
 
 runner.runCapture = (command) => {
   const cmd = Array.isArray(command) ? command.join(" ") : command;
   if (cmd.includes("command -v ollama")) return "/usr/bin/ollama";
   if (cmd.includes("127.0.0.1:11434/api/tags")) {
-    tagsProbeCount += 1;
+    events.push("tags");
     return JSON.stringify({ models: [{ name: "qwen3:8b" }] });
   }
@@
 runner.runShell = (command) => {
   shellCommands.push(command);
+  if (command.includes("sudo systemctl restart ollama")) events.push("restart");
   return { status: 0 };
 };
@@
-    originalLog(JSON.stringify({ result, lines, shellCommands, tagsProbeCount }));
+    originalLog(JSON.stringify({ result, lines, shellCommands, events }));
@@
-    assert.ok(
-      payload.tagsProbeCount >= 2,
-      "should re-probe after the systemd restart instead of trusting a stale loopback cache",
-    );
+    const restartIndex = payload.events.indexOf("restart");
+    assert.ok(restartIndex >= 0, "expected a systemd restart");
+    assert.ok(
+      payload.events.slice(restartIndex + 1).includes("tags"),
+      "should re-probe after the systemd restart instead of trusting a stale loopback cache",
+    );

Also applies to: 1268-1271

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` around lines 1196 - 1216, The current test
only counts tags probes via tagsProbeCount, which can’t prove a probe happened
after the restart; change the stubbing in runner.runCapture and runner.runShell
to record ordered events (e.g., push timestamped strings into shellCommands or a
new eventTimeline array inside runner.runCapture when the "/api/tags" branch
runs and inside runner.runShell when the systemd restart command is issued),
keep tagsProbeCount if you want but append entries like "tags-probe" and
"systemctl-restart" to the timeline, then replace the existing assertion
(tagsProbeCount >= 2) with an order-aware assertion that verifies at least one
"tags-probe" event appears after the "systemctl-restart" event in the recorded
timeline; update the analogous checks at 1268-1271 the same way.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 8204-8208: The code currently allows falling back to the unmanaged
"ollama serve" path after ensureOllamaLoopbackSystemdOverride() has modified a
systemd unit by setting ollamaReady based on findReachableOllamaHost(); instead,
if ensureOllamaLoopbackSystemdOverride() returns true and
findReachableOllamaHost() returns null (probe failed), abort onboarding
immediately (e.g., return an error or throw) rather than continuing to the
background daemon path; update the block around ollamaRunning/ollamaReady (and
the analogous block that uses the same pattern elsewhere) so that a touched
systemd override + failed post-restart probe triggers an abort/early return
instead of falling back to unmanaged start.
- Around line 7242-7245: The privileged shell call using runShell with
interactive sudo can block in non-interactive/CI; change the command to use
non-interactive sudo (add sudo -n) and make systemctl restart non-blocking (add
--no-block), and pass a short timeout to runShell (e.g., 20–30s) so it fails
fast instead of hanging; update handling around overrideResult to treat timeout
or sudo-failure as a recoverable error (log and continue or surface a clear
error) and reference overrideResult, runShell, tmpDropIn, and
OLLAMA_SYSTEMD_OVERRIDE_PATH when making the changes.

---

Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Around line 1256-1271: Add an assertion to verify the user-visible repair
transcript for the already-loopback path: after the existing assertions that
inspect payload.shellCommands and payload.tagsProbeCount in
onboard-selection.test.ts, assert that payload.transcript (or the equivalent
transcript/log field used in this test) includes the string "Configuring Ollama
systemd loopback override..." so the visible confirmation cannot regress; use an
assert.ok check with a clear failure message referencing the already-loopback
path.
- Around line 1196-1216: The current test only counts tags probes via
tagsProbeCount, which can’t prove a probe happened after the restart; change the
stubbing in runner.runCapture and runner.runShell to record ordered events
(e.g., push timestamped strings into shellCommands or a new eventTimeline array
inside runner.runCapture when the "/api/tags" branch runs and inside
runner.runShell when the systemd restart command is issued), keep tagsProbeCount
if you want but append entries like "tags-probe" and "systemctl-restart" to the
timeline, then replace the existing assertion (tagsProbeCount >= 2) with an
order-aware assertion that verifies at least one "tags-probe" event appears
after the "systemctl-restart" event in the recorded timeline; update the
analogous checks at 1268-1271 the same way.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ec1c1168-228f-4f8f-928d-a26b158f23ef

📥 Commits

Reviewing files that changed from the base of the PR and between 4b47ab4 and e41c559.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/onboard-selection.test.ts

Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/onboard.ts
Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/onboard-selection.test.ts (1)

1380-1413: ⚡ Quick win

Also assert that override-apply failure never falls back to ollama serve.

This branch is meant to fail closed, but the test currently only checks stderr text. A regression that starts an unmanaged daemon before throwing would still pass here. Mirroring the manual-start sentinel used in the restart-failure test would close that gap.

Suggested fix
 runner.runShell = (command) => {
+  if (command.includes("ollama serve")) console.error("manual-start");
   if (command.includes("install -D -m 0644")) return { status: 1 };
   return { status: 0 };
 };
@@
     assert.equal(result.status, 1);
     assert.match(result.stderr, /Failed to apply Ollama systemd loopback override/);
     assert.match(result.stderr, /Refusing to continue/);
+    assert.doesNotMatch(result.stderr, /manual-start/);
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` around lines 1380 - 1413, The test must also
assert that a failed override-apply never falls back to spawning an unmanaged
Ollama daemon; after running the script that calls setupNim, add an assertion
(mirroring the restart-failure test's "manual-start" sentinel) that the
manual-start sentinel file was NOT created in the test's HOME/tmpDir area so we
guarantee no unmanaged "ollama serve" was started; locate this check alongside
the existing assertions on result and stderr (referencing variables result and
tmpDir and the setupNim invocation).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/onboard-selection.test.ts`:
- Around line 1119-1120: The install-command regex only matches when the command
begins with "sudo"; update the matcher used to set installedBody so it also
accepts bare installs by making the "sudo ( -n)?" portion optional (e.g., allow
an optional "sudo" + optional "-n" and whitespace before "install"), then keep
extracting the first capture group into installedBody as before (adjust the
regex used where `const match = command.match(...)` and `if (match)
installedBody = fs.readFileSync(match[1], "utf8");`).

---

Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Around line 1380-1413: The test must also assert that a failed override-apply
never falls back to spawning an unmanaged Ollama daemon; after running the
script that calls setupNim, add an assertion (mirroring the restart-failure
test's "manual-start" sentinel) that the manual-start sentinel file was NOT
created in the test's HOME/tmpDir area so we guarantee no unmanaged "ollama
serve" was started; locate this check alongside the existing assertions on
result and stderr (referencing variables result and tmpDir and the setupNim
invocation).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 42efda51-7359-41bc-923d-350182bd9f33

📥 Commits

Reviewing files that changed from the base of the PR and between e41c559 and 49f7f63.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/onboard-selection.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard.ts

Comment thread test/onboard-selection.test.ts Outdated
Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Verifies the intended PR #3039 contract: existing-running-Ollama path now applies the systemd loopback drop-in, with a real merge step that preserves user-set Environment= lines and relies on systemd's last-wins for the OLLAMA_HOST override. The ActiveEnterTimestampMonotonic + MainPID poll correctly distinguishes a real restart from "still up". All error paths fail closed instead of silently spawning a duplicate ollama serve.

Test coverage is strong — 5 cases covering merge, idempotent reapply, restart-failure, and install-failure. Non-blocking suggestions for a follow-up if useful:

  • Tight timeout headroom in runShell({ timeout: 45_000 }) vs ~30s poll + daemon-reload — consider 60_000.
  • The "Failed to inspect existing Ollama systemd override" error could name sudo -n non-interactive failure as a likely cause.
  • Optional test assertion to lock in the single-desired-line invariant: assert.equal((installedBody.match(/OLLAMA_HOST=127\.0\.0\.1:11434/g) || []).length, 1).

None block the merge.

@wscurran wscurran added fix provider: ollama Ollama local model provider behavior labels May 12, 2026
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 6650-6748: Extract the Ollama systemd helper functions into a new
module: move ensureOllamaLoopbackSystemdOverride and
mergeOllamaLoopbackSystemdOverride (and any helper constants they rely on such
as OLLAMA_PORT, OLLAMA_SYSTEMD_OVERRIDE_PATH and utility calls like
secureTempFile, runShell, runCapture, cleanupTempDir, resetOllamaHostCache,
findReachableOllamaHost, sleep, isWsl, isNonInteractive, shellQuote) into a new
file (e.g., an onboarding helper module) preserving exact behavior, export
ensureOllamaLoopbackSystemdOverride from that module, and replace the
implementation here with a simple import and call site that uses the same
symbol; ensure all referenced utilities are imported into the new module and
that exported types/values match so callers (e.g., calls to
ensureOllamaLoopbackSystemdOverride) compile and behavior remains identical.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1cd99aeb-e0ab-46c3-b0ba-20fdc4930596

📥 Commits

Reviewing files that changed from the base of the PR and between c5ed492 and 8defe06.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/onboard-selection.test.ts

Comment thread src/lib/onboard.ts Outdated
ericksoa added 2 commits May 12, 2026 21:44
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…ack-existing

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

# Conflicts:
#	test/onboard-selection.test.ts

@ericksoa ericksoa left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved after follow-up hardening: helper extracted out of onboard entrypoint, CodeRabbit/budget issue resolved, local build and full onboard-selection suite passed, and relevant Ollama proxy E2E passed.

@ericksoa ericksoa merged commit 1b7357b into NVIDIA:main May 13, 2026
21 checks passed
@wscurran wscurran added area: local-models Local model providers, downloads, launch, or connectivity area: providers Inference provider integrations and provider behavior bug-fix PR fixes a bug or regression and removed Local Models labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: local-models Local model providers, downloads, launch, or connectivity area: providers Inference provider integrations and provider behavior bug-fix PR fixes a bug or regression provider: ollama Ollama local model provider behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Brev][Onboard] NemoClaw onboard skips Ollama systemd loopback drop-in per PR #3039 contract

4 participants