fix(agents): route multi-instance Ollama providers by model baseUrl by matthewpapa07 · Pull Request #63218 · openclaw/openclaw

matthewpapa07 · 2026-04-08T15:24:44Z

Summary

Problem: With at least two separate Ollama instances it seems that Openclaw forces all routing to the first instance it finds.
Why it matters: In multi-GPU/ollama setups this can unload/reload the wrong model on the first Ollama instance, break provider isolation, and make subagents silently run on the parent GPU instead of the configured shard GPU. Generally speaking this breaks a ton of use cases for OpenClaw
What changed: Shared Ollama transport resolution now uses the active model/provider boundary to choose the correct baseUrl at request time, and dedicated-agent fallback resolution is kept scoped to the child agent instead of bleeding into the parent session context.
What did NOT change (scope boundary): This PR does not change Ollama model loading/warmup behavior, timeouts, or first-request latency; it only fixes provider routing and child fallback scoping.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes # (none)
Related # (none)
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: Custom providers sharing api: "ollama" could fall through a shared transport registry path that effectively closed over the wrong Ollama baseUrl, so provider/model selection looked correct while the request still went to the wrong backend.
Missing detection / guardrail: There was no regression coverage for multiple custom Ollama providers sharing the same API family but requiring different transport destinations, and no guardrail covering dedicated child agents inheriting parent fallback/provider context.
Contributing context (if known): The bug is easiest to see when different Ollama instances are pinned to different GPUs or host different model inventories, because misrouting causes visible model unload/reload behavior on the wrong instance.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
- src/agents/provider-stream.test.ts
- src/agents/provider-transport-stream.test.ts
- src/agents/pi-embedded-runner/stream-resolution.test.ts
- src/agents/agent-scope.test.ts
Scenario the test should lock in: When two custom providers share api: "ollama" but point at different baseUrl values, the selected provider/model must resolve to the correct transport destination at run time; additionally, a dedicated child agent without configured fallbacks must not reuse parent fallback/provider context.
Why this is the smallest reliable guardrail: The bug lives at the provider stream resolution boundary and child agent scope resolution boundary, so targeted stream/scope tests catch it without requiring a full live Ollama environment.
Existing test that already covers this (if any): None before this PR.
If no new test is added, why not: N/A

User-visible / Behavior Changes

Multi-instance Ollama providers now route by the configured provider/model baseUrl instead of potentially landing on the first registered Ollama backend.
Dedicated agents without configured fallbacks no longer accidentally inherit parent provider/model fallback state.
Docs now explicitly describe the multi-instance Ollama behavior and routing expectation.

Diagram (if applicable)

Before:
[subagent selects ollama-5090/model] -> [shared ollama transport closes over first provider] -> [request hits 11434] -> [wrong GPU/model reload]

After:
[subagent selects ollama-5090/model] -> [transport resolves model/provider boundary at call time] -> [request hits 11435] -> [correct GPU/provider]

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Linux
Runtime/container: OpenClaw Gateway v2026.4.7; self-hosted Ollama on a separate host
Model/provider:
- ollama -> http://127.0.0.1:11434
- ollama-5090 / ollama_shard -> http://127.0.0.1:11435
- Repro model: qwen3.5:27b
Integration/channel (if any): Local gateway / agent runtime
Relevant config (redacted):
- main agent on ollama/qwen3.5:27b
- nexus / shard on ollama-5090/qwen3.5:27b
- dual Ollama providers share api: "ollama" but use different baseUrl values (and different pinned GPUs)

Steps

Configure two custom Ollama providers that both use api: "ollama" but point to different baseUrl values.
Route the main agent to provider A (11434) and a subagent like nexus to provider B (11435).
Trigger a subagent run and inspect the provider/model metadata plus Ollama /api/ps activity on both instances.

Expected

A subagent configured for ollama-5090/... should send its request to 11435 and only touch the shard GPU/instance.

Actual

Before the fix, the run could still hit 11434, unloading/reloading the wrong model on the wrong GPU even though the selected provider appeared to be ollama-5090.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios:
- reproduced the original dual-Ollama misrouting locally before the fix
- verified after the fix that a nexus run recorded provider: "ollama-5090"
- verified that /api/ps activity advanced on 11435 while 11434 stayed unchanged
- verified dedicated-agent fallback scoping with targeted tests
Edge cases checked:
- shared api: "ollama" providers with different baseUrl values
- child agent with no explicit fallbacks
What you did not verify:
- full end-to-end coverage across every provider family that uses shared transport registration
- separate Ollama model warmup/load timeout behavior (observed locally, but out of scope for this PR)

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: Shared Ollama transport resolution now depends on the active model/provider boundary at call time and could affect other custom Ollama provider combinations.
- Mitigation: Added targeted regression tests for shared-provider stream resolution and documented the intended multi-instance behavior.
Risk: Child-agent fallback scoping may change behavior for setups that were unintentionally relying on parent fallback/provider bleed.
- Mitigation: The change is limited to dedicated child-agent scope resolution and aligns behavior with explicit per-agent config rather than accidental inheritance.

greptile-apps · 2026-04-08T15:30:57Z

Greptile Summary

This PR fixes multi-instance Ollama provider routing by replacing a closed-over static URL pattern with a createBoundaryAwareOllamaStreamFn that reads model.baseUrl dynamically at call time. It also introduces hasDedicatedAgentPrimaryWithoutFallbackOverride to prevent dedicated child agents from inheriting global fallback state. The Ollama transport fix is clean and correct; the fallback scoping change is intentional and well-tested.

Confidence Score: 5/5

Safe to merge; the core routing fix is correct and well-tested, and the only remaining findings are P2 documentation clarity issues.

The Ollama boundary-aware stream function reads model.baseUrl at call time, which is the correct fix for the shared-transport closure bug. The dedicated-agent fallback scoping is intentional, tested, and the behavior change is acknowledged in the PR description. All remaining findings (docs gap for the no-global-primary edge case) are P2 and do not affect correctness.

docs/gateway/configuration-reference.md and docs/concepts/model-failover.md — both miss the edge case where agents.defaults.model.primary is unset; src/agents/agent-scope.ts lines 213-214 for the corresponding code condition.

Vulnerabilities

No security concerns identified. The change only affects internal provider routing and fallback resolution logic; no new network surfaces, token handling, or privilege boundaries are introduced.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/agents/agent-scope.ts
Line: 213-214

Comment:
**Dedicated-lane trigger covers undocumented case**

When `agents.defaults.model` has no `primary` but does have `fallbacks`, any agent with an explicit `primary` is treated as a dedicated lane and receives `[]` fallbacks. This is broader than what both doc updates describe ("primary that differs from `agents.defaults.model.primary`"). The test for `cfgInheritDefaults` (no global primary, global fallbacks set, `linus` has own primary) confirms this path is hit, but users whose config matches that pattern will see a silent behavior change with no guidance in the docs.

Consider either narrowing the condition to require a global primary to exist, or adding a sentence in the updated doc sections noting that this also applies when `agents.defaults.model.primary` is unset.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: docs/gateway/configuration-reference.md
Line: 1565

Comment:
**Docs miss the no-global-primary edge case**

The new sentence says "Agents with a dedicated `primary` that differs from `agents.defaults.model.primary`…" but `hasDedicatedAgentPrimaryWithoutFallbackOverride` also triggers — and suppresses global fallbacks — when `agents.defaults.model.primary` is absent altogether, as long as `agents.defaults.model.fallbacks` is non-empty. The same gap exists in `docs/concepts/model-failover.md`. A user who relies on a config like `defaults: { model: { fallbacks: ["..."] } }` plus an agent-level `primary` will see their fallbacks silently drop after this change without a clear pointer in the docs.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix(agents): route multi-instance Ollama..." | Re-trigger Greptile}

greptile-apps · 2026-04-08T15:31:01Z

+  if (!defaultPrimary) {
+    return resolveAgentModelFallbackValues(cfg.agents?.defaults?.model).length > 0;


Dedicated-lane trigger covers undocumented case

When agents.defaults.model has no primary but does have fallbacks, any agent with an explicit primary is treated as a dedicated lane and receives [] fallbacks. This is broader than what both doc updates describe ("primary that differs from agents.defaults.model.primary"). The test for cfgInheritDefaults (no global primary, global fallbacks set, linus has own primary) confirms this path is hit, but users whose config matches that pattern will see a silent behavior change with no guidance in the docs.

Consider either narrowing the condition to require a global primary to exist, or adding a sentence in the updated doc sections noting that this also applies when agents.defaults.model.primary is unset.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/agents/agent-scope.ts Line: 213-214 Comment: **Dedicated-lane trigger covers undocumented case** When `agents.defaults.model` has no `primary` but does have `fallbacks`, any agent with an explicit `primary` is treated as a dedicated lane and receives `[]` fallbacks. This is broader than what both doc updates describe ("primary that differs from `agents.defaults.model.primary`"). The test for `cfgInheritDefaults` (no global primary, global fallbacks set, `linus` has own primary) confirms this path is hit, but users whose config matches that pattern will see a silent behavior change with no guidance in the docs. Consider either narrowing the condition to require a global primary to exist, or adding a sentence in the updated doc sections noting that this also applies when `agents.defaults.model.primary` is unset. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-08T15:31:02Z

 - `id`: stable agent id (required).
 - `default`: when multiple are set, first wins (warning logged). If none set, first list entry is default.
- `model`: string form overrides `primary` only; object form `{ primary, fallbacks }` overrides both (`[]` disables global fallbacks). Cron jobs that only override `primary` still inherit default fallbacks unless you set `fallbacks: []`.
+- `model`: string form overrides `primary` only; object form `{ primary, fallbacks }` overrides both (`[]` disables global fallbacks). Agents with a dedicated `primary` that differs from `agents.defaults.model.primary` no longer implicitly fall back to the global primary; set `fallbacks` explicitly if you want that behavior. Cron jobs that only override `primary` still inherit default fallbacks unless you set `fallbacks: []`.


Docs miss the no-global-primary edge case

The new sentence says "Agents with a dedicated primary that differs from agents.defaults.model.primary…" but hasDedicatedAgentPrimaryWithoutFallbackOverride also triggers — and suppresses global fallbacks — when agents.defaults.model.primary is absent altogether, as long as agents.defaults.model.fallbacks is non-empty. The same gap exists in docs/concepts/model-failover.md. A user who relies on a config like defaults: { model: { fallbacks: ["..."] } } plus an agent-level primary will see their fallbacks silently drop after this change without a clear pointer in the docs.

Prompt To Fix With AI

This is a comment left during a code review. Path: docs/gateway/configuration-reference.md Line: 1565 Comment: **Docs miss the no-global-primary edge case** The new sentence says "Agents with a dedicated `primary` that differs from `agents.defaults.model.primary`…" but `hasDedicatedAgentPrimaryWithoutFallbackOverride` also triggers — and suppresses global fallbacks — when `agents.defaults.model.primary` is absent altogether, as long as `agents.defaults.model.fallbacks` is non-empty. The same gap exists in `docs/concepts/model-failover.md`. A user who relies on a config like `defaults: { model: { fallbacks: ["..."] } }` plus an agent-level `primary` will see their fallbacks silently drop after this change without a clear pointer in the docs. How can I resolve this? If you propose a fix, please make it concise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 68531b3d13

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-08T15:31:40Z

+    normalizeLowercaseStringOrEmpty(explicitPrimary) !==
+    normalizeLowercaseStringOrEmpty(defaultPrimary)


Compare canonical model refs before marking dedicated lanes

hasDedicatedAgentPrimaryWithoutFallbackOverride currently decides dedicated-lane behavior by lowercasing raw primary strings, so equivalent model refs written in different valid forms (for example "gpt-5.4" vs "openai/gpt-5.4") are treated as different. In that scenario resolveEffectiveModelFallbacks now returns [], which silently disables inherited fallback candidates for agents that did not actually change models—only notation—leading to unexpected fallback failures after this commit.

Useful? React with 👍 / 👎.

steipete · 2026-04-27T00:34:48Z

Thanks for the detailed repro and test plan. This is now superseded by current main.

The multi-instance/custom-provider Ollama routing bug is fixed on main through the provider API-owner path plus provider-specific Ollama stream config:

custom providers with api: "ollama" resolve the Ollama plugin via the configured API owner hint
the Ollama plugin resolves the active provider's baseUrl when creating the stream function
focused tests cover both the owner-hint resolution and ollama2 baseUrl routing
related issues [Bug]: Multi-ollama provider routing still broken after #61776 — requests silently go to first provider #63775, Regression: api: "ollama" rejected for custom-URL providers; no working path for num_ctx injection through a local proxy #62735, and Cron jobs fail with custom Ollama providers: No API provider registered for api: ollama #57811 are now closed as fixed on main

Focused verification passed: pnpm test src/plugins/provider-runtime.test.ts extensions/ollama/index.test.ts.

Closing this PR as superseded. The final mainline fix is narrower than this branch and avoids carrying the unrelated fallback-scope/docs changes from this PR.

fix(agents): route multi-instance Ollama providers by model baseUrl

68531b3

openclaw-barnacle Bot added docs Improvements or additions to documentation gateway Gateway runtime agents Agent runtime and tooling size: M labels Apr 8, 2026

greptile-apps Bot reviewed Apr 8, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 8, 2026

View reviewed changes

docs(agents): clarify dedicated-lane fallback inheritance

4e1abdf

steipete closed this Apr 27, 2026

This was referenced May 11, 2026

[Bug]: Per-model api override not resolved in auth path #80487

Closed

fix: resolve per-model api override in auth hooks and transport selection #80488

Merged

Linux2010 mentioned this pull request May 11, 2026

fix(auth): resolve per-model api override in provider hook refs #80541

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agents): route multi-instance Ollama providers by model baseUrl#63218

fix(agents): route multi-instance Ollama providers by model baseUrl#63218
matthewpapa07 wants to merge 2 commits into
openclaw:mainfrom
matthewpapa07:matt/ollama-shared-api-routing

matthewpapa07 commented Apr 8, 2026

Uh oh!

greptile-apps Bot commented Apr 8, 2026

Vulnerabilities

Uh oh!

greptile-apps Bot Apr 8, 2026

Uh oh!

greptile-apps Bot Apr 8, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 8, 2026

Uh oh!

steipete commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if (!defaultPrimary) {
		return resolveAgentModelFallbackValues(cfg.agents?.defaults?.model).length > 0;

		normalizeLowercaseStringOrEmpty(explicitPrimary) !==
		normalizeLowercaseStringOrEmpty(defaultPrimary)

Uh oh!

Conversation

matthewpapa07 commented Apr 8, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Root Cause (if applicable)

Regression Test Plan (if applicable)

User-visible / Behavior Changes

Diagram (if applicable)

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Risks and Mitigations

Uh oh!

greptile-apps Bot commented Apr 8, 2026

Greptile Summary

Confidence Score: 5/5

Vulnerabilities

Uh oh!

greptile-apps Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

steipete commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants