fix(onboard): repair existing Ollama loopback override#3384
Conversation
Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Linux-only systemd drop-in at /etc/systemd/system/ollama.service.d/override.conf with idempotent merge logic, a helper to apply/restart and re-probe loopback reachability, integrates this into provider-selection and install flows, and adds Vitest coverage for apply/repair/idempotency and failure cases. ChangesOllama Systemd Loopback Override
Sequence Diagram(s)sequenceDiagram
participant OnboardCLI
participant FileSystem
participant Systemd
participant OllamaService
participant NetworkProbe
OnboardCLI->>FileSystem: ensure/read/merge override -> /etc/systemd/system/ollama.service.d/override.conf
OnboardCLI->>Systemd: systemctl daemon-reload
OnboardCLI->>Systemd: systemctl --no-block restart ollama
Systemd->>OllamaService: restart request
OnboardCLI->>OllamaService: clear cached host reachability
OnboardCLI->>NetworkProbe: probe 127.0.0.1:${OLLAMA_PORT}
NetworkProbe->>OnboardCLI: reachability result (ready/failed)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
test/onboard-selection.test.ts (2)
1256-1271: ⚡ Quick winAssert the visible repair transcript in the already-loopback path.
This is the silent-skip regression called out in
#3337, but the test only checks side effects. Please also assert that this path printsConfiguring Ollama systemd loopback override...so the required user-visible confirmation cannot regress.Suggested assertion
assert.ok( payload.shellCommands.some((command: string) => command.includes("/etc/systemd/system/ollama.service.d/override.conf"), ), "already-loopback systemd Ollama still needs the persistent drop-in", ); + assert.ok( + payload.lines.some((line: string) => + line.includes("Configuring Ollama systemd loopback override"), + ), + "already-loopback repair should emit the visible loopback-override transcript", + ); assert.ok( !payload.shellCommands.some((command: string) => command.includes("OLLAMA_HOST=127.0.0.1:11434 ollama serve"), ),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard-selection.test.ts` around lines 1256 - 1271, Add an assertion to verify the user-visible repair transcript for the already-loopback path: after the existing assertions that inspect payload.shellCommands and payload.tagsProbeCount in onboard-selection.test.ts, assert that payload.transcript (or the equivalent transcript/log field used in this test) includes the string "Configuring Ollama systemd loopback override..." so the visible confirmation cannot regress; use an assert.ok check with a clear failure message referencing the already-loopback path.
1196-1216: ⚡ Quick winMake the re-probe assertion order-aware.
tagsProbeCount >= 2does not prove a/api/tagsprobe happened after the systemd restart. Two early probes would still pass. Record an event timeline and assert that at least one tags probe occurs after the restart command.Suggested tightening
-let tagsProbeCount = 0; +const events = []; const shellCommands = []; runner.runCapture = (command) => { const cmd = Array.isArray(command) ? command.join(" ") : command; if (cmd.includes("command -v ollama")) return "/usr/bin/ollama"; if (cmd.includes("127.0.0.1:11434/api/tags")) { - tagsProbeCount += 1; + events.push("tags"); return JSON.stringify({ models: [{ name: "qwen3:8b" }] }); } @@ runner.runShell = (command) => { shellCommands.push(command); + if (command.includes("sudo systemctl restart ollama")) events.push("restart"); return { status: 0 }; }; @@ - originalLog(JSON.stringify({ result, lines, shellCommands, tagsProbeCount })); + originalLog(JSON.stringify({ result, lines, shellCommands, events })); @@ - assert.ok( - payload.tagsProbeCount >= 2, - "should re-probe after the systemd restart instead of trusting a stale loopback cache", - ); + const restartIndex = payload.events.indexOf("restart"); + assert.ok(restartIndex >= 0, "expected a systemd restart"); + assert.ok( + payload.events.slice(restartIndex + 1).includes("tags"), + "should re-probe after the systemd restart instead of trusting a stale loopback cache", + );Also applies to: 1268-1271
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard-selection.test.ts` around lines 1196 - 1216, The current test only counts tags probes via tagsProbeCount, which can’t prove a probe happened after the restart; change the stubbing in runner.runCapture and runner.runShell to record ordered events (e.g., push timestamped strings into shellCommands or a new eventTimeline array inside runner.runCapture when the "/api/tags" branch runs and inside runner.runShell when the systemd restart command is issued), keep tagsProbeCount if you want but append entries like "tags-probe" and "systemctl-restart" to the timeline, then replace the existing assertion (tagsProbeCount >= 2) with an order-aware assertion that verifies at least one "tags-probe" event appears after the "systemctl-restart" event in the recorded timeline; update the analogous checks at 1268-1271 the same way.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 8204-8208: The code currently allows falling back to the unmanaged
"ollama serve" path after ensureOllamaLoopbackSystemdOverride() has modified a
systemd unit by setting ollamaReady based on findReachableOllamaHost(); instead,
if ensureOllamaLoopbackSystemdOverride() returns true and
findReachableOllamaHost() returns null (probe failed), abort onboarding
immediately (e.g., return an error or throw) rather than continuing to the
background daemon path; update the block around ollamaRunning/ollamaReady (and
the analogous block that uses the same pattern elsewhere) so that a touched
systemd override + failed post-restart probe triggers an abort/early return
instead of falling back to unmanaged start.
- Around line 7242-7245: The privileged shell call using runShell with
interactive sudo can block in non-interactive/CI; change the command to use
non-interactive sudo (add sudo -n) and make systemctl restart non-blocking (add
--no-block), and pass a short timeout to runShell (e.g., 20–30s) so it fails
fast instead of hanging; update handling around overrideResult to treat timeout
or sudo-failure as a recoverable error (log and continue or surface a clear
error) and reference overrideResult, runShell, tmpDropIn, and
OLLAMA_SYSTEMD_OVERRIDE_PATH when making the changes.
---
Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Around line 1256-1271: Add an assertion to verify the user-visible repair
transcript for the already-loopback path: after the existing assertions that
inspect payload.shellCommands and payload.tagsProbeCount in
onboard-selection.test.ts, assert that payload.transcript (or the equivalent
transcript/log field used in this test) includes the string "Configuring Ollama
systemd loopback override..." so the visible confirmation cannot regress; use an
assert.ok check with a clear failure message referencing the already-loopback
path.
- Around line 1196-1216: The current test only counts tags probes via
tagsProbeCount, which can’t prove a probe happened after the restart; change the
stubbing in runner.runCapture and runner.runShell to record ordered events
(e.g., push timestamped strings into shellCommands or a new eventTimeline array
inside runner.runCapture when the "/api/tags" branch runs and inside
runner.runShell when the systemd restart command is issued), keep tagsProbeCount
if you want but append entries like "tags-probe" and "systemctl-restart" to the
timeline, then replace the existing assertion (tagsProbeCount >= 2) with an
order-aware assertion that verifies at least one "tags-probe" event appears
after the "systemctl-restart" event in the recorded timeline; update the
analogous checks at 1268-1271 the same way.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ec1c1168-228f-4f8f-928d-a26b158f23ef
📒 Files selected for processing (2)
src/lib/onboard.tstest/onboard-selection.test.ts
Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/onboard-selection.test.ts (1)
1380-1413: ⚡ Quick winAlso assert that override-apply failure never falls back to
ollama serve.This branch is meant to fail closed, but the test currently only checks stderr text. A regression that starts an unmanaged daemon before throwing would still pass here. Mirroring the
manual-startsentinel used in the restart-failure test would close that gap.Suggested fix
runner.runShell = (command) => { + if (command.includes("ollama serve")) console.error("manual-start"); if (command.includes("install -D -m 0644")) return { status: 1 }; return { status: 0 }; }; @@ assert.equal(result.status, 1); assert.match(result.stderr, /Failed to apply Ollama systemd loopback override/); assert.match(result.stderr, /Refusing to continue/); + assert.doesNotMatch(result.stderr, /manual-start/); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard-selection.test.ts` around lines 1380 - 1413, The test must also assert that a failed override-apply never falls back to spawning an unmanaged Ollama daemon; after running the script that calls setupNim, add an assertion (mirroring the restart-failure test's "manual-start" sentinel) that the manual-start sentinel file was NOT created in the test's HOME/tmpDir area so we guarantee no unmanaged "ollama serve" was started; locate this check alongside the existing assertions on result and stderr (referencing variables result and tmpDir and the setupNim invocation).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/onboard-selection.test.ts`:
- Around line 1119-1120: The install-command regex only matches when the command
begins with "sudo"; update the matcher used to set installedBody so it also
accepts bare installs by making the "sudo ( -n)?" portion optional (e.g., allow
an optional "sudo" + optional "-n" and whitespace before "install"), then keep
extracting the first capture group into installedBody as before (adjust the
regex used where `const match = command.match(...)` and `if (match)
installedBody = fs.readFileSync(match[1], "utf8");`).
---
Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Around line 1380-1413: The test must also assert that a failed override-apply
never falls back to spawning an unmanaged Ollama daemon; after running the
script that calls setupNim, add an assertion (mirroring the restart-failure
test's "manual-start" sentinel) that the manual-start sentinel file was NOT
created in the test's HOME/tmpDir area so we guarantee no unmanaged "ollama
serve" was started; locate this check alongside the existing assertions on
result and stderr (referencing variables result and tmpDir and the setupNim
invocation).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 42efda51-7359-41bc-923d-350182bd9f33
📒 Files selected for processing (2)
src/lib/onboard.tstest/onboard-selection.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/onboard.ts
Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
prekshivyas
left a comment
There was a problem hiding this comment.
LGTM.
Verifies the intended PR #3039 contract: existing-running-Ollama path now applies the systemd loopback drop-in, with a real merge step that preserves user-set Environment= lines and relies on systemd's last-wins for the OLLAMA_HOST override. The ActiveEnterTimestampMonotonic + MainPID poll correctly distinguishes a real restart from "still up". All error paths fail closed instead of silently spawning a duplicate ollama serve.
Test coverage is strong — 5 cases covering merge, idempotent reapply, restart-failure, and install-failure. Non-blocking suggestions for a follow-up if useful:
- Tight timeout headroom in
runShell({ timeout: 45_000 })vs ~30s poll + daemon-reload — consider 60_000. - The
"Failed to inspect existing Ollama systemd override"error could namesudo -nnon-interactive failure as a likely cause. - Optional test assertion to lock in the single-desired-line invariant:
assert.equal((installedBody.match(/OLLAMA_HOST=127\.0\.0\.1:11434/g) || []).length, 1).
None block the merge.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 6650-6748: Extract the Ollama systemd helper functions into a new
module: move ensureOllamaLoopbackSystemdOverride and
mergeOllamaLoopbackSystemdOverride (and any helper constants they rely on such
as OLLAMA_PORT, OLLAMA_SYSTEMD_OVERRIDE_PATH and utility calls like
secureTempFile, runShell, runCapture, cleanupTempDir, resetOllamaHostCache,
findReachableOllamaHost, sleep, isWsl, isNonInteractive, shellQuote) into a new
file (e.g., an onboarding helper module) preserving exact behavior, export
ensureOllamaLoopbackSystemdOverride from that module, and replace the
implementation here with a simple import and call site that uses the same
symbol; ensure all referenced utilities are imported into the new module and
that exported types/values match so callers (e.g., calls to
ensureOllamaLoopbackSystemdOverride) compile and behavior remains identical.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1cd99aeb-e0ab-46c3-b0ba-20fdc4930596
📒 Files selected for processing (2)
src/lib/onboard.tstest/onboard-selection.test.ts
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…ack-existing Signed-off-by: Aaron Erickson <aerickson@nvidia.com> # Conflicts: # test/onboard-selection.test.ts
ericksoa
left a comment
There was a problem hiding this comment.
Approved after follow-up hardening: helper extracted out of onboard entrypoint, CodeRabbit/budget issue resolved, local build and full onboard-selection suite passed, and relevant Ollama proxy E2E passed.
Summary
Repair Local Ollama onboarding so existing native Linux systemd installs get the same loopback override as fresh installs. This keeps raw Ollama bound to localhost and repairs stale or missing
OLLAMA_HOSTdrop-ins before the auth proxy is used.Related Issue
Fixes #3337
Changes
OLLAMA_HOSTrepair.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Signed-off-by: Yimo Jiang yimoj@nvidia.com
Summary by CodeRabbit
New Features
Tests