
fix(ollama): preserve local limits and native thinking fallback #39292

Merged
vincentkoc merged 5 commits into main from vincentkoc-code/ollama-local-followups on Mar 8, 2026
Conversation

@vincentkoc
Member

Summary

  • Problem: Native Ollama responses can drop model output when a provider returns message.thinking, and merged model config can silently overwrite explicit lower contextWindow / maxTokens values with larger catalog defaults.
  • Why it matters: Native thinking-capable Ollama models can appear empty, and local-model users on constrained hardware can still hit oversized num_ctx / memory pressure even after explicitly lowering limits.
  • What changed: Preserve the thinking -> reasoning fallback handling in the native Ollama stream path and honor explicit lower token-limit overrides during the provider/model merge (a minimal sketch follows this list).
  • What did NOT change (scope boundary): This does not change embedded-run timeout behavior, compaction behavior, or release tagging.
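
For reviewers who want the gist without opening the diff, here is a minimal sketch of the fallback order described above. Variable and field names follow the excerpt quoted later in this thread; the actual code in src/agents/ollama-stream.ts may differ in structure, so treat this as illustrative only.

interface OllamaChunkMessage {
  content?: string;
  thinking?: string; // native thinking-capable models (e.g. Qwen3)
  reasoning?: string; // legacy field kept as a compatibility fallback
}

function accumulateVisibleText(chunks: { message?: OllamaChunkMessage }[]): string {
  let accumulatedContent = "";
  let fallbackContent = "";
  let sawContent = false;

  for (const chunk of chunks) {
    if (chunk.message?.content) {
      // Real assistant content always wins once it appears.
      accumulatedContent += chunk.message.content;
      sawContent = true;
    } else if (!sawContent && chunk.message?.thinking) {
      fallbackContent += chunk.message.thinking;
    } else if (!sawContent && chunk.message?.reasoning) {
      fallbackContent += chunk.message.reasoning;
    }
  }

  // Priority: content -> thinking -> reasoning.
  return accumulatedContent || fallbackContent;
}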

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Native Ollama models that emit message.thinking no longer lose their response content when message.content is empty.
  • Explicitly lower contextWindow / maxTokens values now survive merge-mode refreshes instead of being replaced by larger catalog defaults.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS 26.3.1
  • Runtime/container: Node 24 / pnpm
  • Model/provider: Ollama native + merge-mode model config
  • Integration/channel (if any): N/A
  • Relevant config (redacted): provider api: "ollama" with explicit contextWindow / maxTokens overrides (illustrated below)
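
For illustration only, the redacted config above could look roughly like the following TypeScript object literal. Only api: "ollama" and the explicit contextWindow / maxTokens overrides come from this PR; every other key, the model id, and the endpoint are assumptions.

const providerConfig = {
  api: "ollama",
  baseUrl: "http://127.0.0.1:11434", // assumed default local Ollama endpoint
  models: [
    {
      id: "qwen3:8b", // any thinking-capable native model would do
      // Explicit lower limits that must survive merge-mode refreshes:
      contextWindow: 8192,
      maxTokens: 1024,
    },
  ],
};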

Steps

  1. Configure an Ollama provider that uses a thinking-capable native model and stream a reply where message.content is empty but message.thinking is populated.
  2. Configure a provider/model entry with explicit lower contextWindow / maxTokens values than the implicit catalog defaults.
  3. Refresh merged model config and inspect the generated output.

Expected

  • Native Ollama output is preserved through thinking fallback.
  • Explicit lower token limits remain in the merged config.

Actual

  • Before this change, message.thinking could be dropped and lower token-limit overrides were replaced by larger implicit defaults.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: pnpm test -- --run src/agents/ollama-stream.test.ts src/agents/models-config.fills-missing-provider-apikey-from-env-var.test.ts src/agents/model-auth.profiles.test.ts passes (54 tests total). Targeted pnpm exec oxlint ... and pnpm exec oxfmt --check ... on the touched files also pass.
  • Edge cases checked: thinking fallback, legacy reasoning fallback, streamed thinking accumulation, preserved lower token limits, fallback to implicit limits when explicit values are invalid.
  • What you did not verify: Live Ollama end-to-end traffic, the broader src/agents/pi-embedded-runner/run/attempt.test.ts suite, and a full clean pnpm build / pnpm check run. Local pnpm build now clears the patch-specific type issue, but tsdown still exits with "Bus error: 10" after artifact generation in this environment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert the two commits in this PR.
  • Files/config to restore: src/agents/ollama-stream.ts, src/agents/models-config.ts
  • Known bad symptoms reviewers should watch for: Empty native Ollama replies on thinking-capable models, or explicit lower token limits being overwritten again during merge refresh.

Risks and Mitigations

  • Risk: Some native Ollama variants may still emit reasoning instead of thinking.
    • Mitigation: The stream/message fallback keeps both fields, with thinking preferred and reasoning retained as a compatibility fallback.
  • Risk: Users with invalid explicit token-limit values could end up with missing limits.
    • Mitigation: Invalid explicit values now fall back to valid implicit defaults, and the merge only writes numeric limits when one is actually valid.

@aisle-research-bot

aisle-research-bot Bot commented Mar 8, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue in this PR:

# Severity Title
1 🟡 Medium Ollama stream exposes thinking/reasoning fields as user-visible content (chain-of-thought leakage)

1. 🟡 Ollama stream exposes thinking/reasoning fields as user-visible content (chain-of-thought leakage)

Severity: Medium
CWE: CWE-200
Location: src/agents/ollama-stream.ts:471-505

Description

The Ollama streaming adapter now falls back to thinking/reasoning and assigns it to message.content when content is empty.

This can unintentionally disclose internal model traces (chain-of-thought) or other non-user-facing content if an Ollama model/server emits it in thinking/reasoning (common for “reasoning” models):

  • Input (untrusted): chunk.message.thinking / chunk.message.reasoning received from the Ollama server during streaming
  • Transformation: accumulated into fallbackContent
  • Sink (user-visible): written into finalResponse.message.content and then converted into a TextContent block in buildAssistantMessage(), emitted to downstream consumers via the done event

Vulnerable code (stream fallback into final content):

} else if (!sawContent && chunk.message?.thinking) {
  fallbackContent += chunk.message.thinking;
} else if (!sawContent && chunk.message?.reasoning) {
  fallbackContent += chunk.message.reasoning;
}
...
finalResponse.message.content = accumulatedContent || fallbackContent;

Even though the adapter prefers content once it appears, any conversation where the model returns an empty content and populates thinking/reasoning will surface that internal text to the user.

This is security-relevant because internal traces may contain sensitive context (e.g., parts of system prompts, policy text, tool output excerpts) that the application did not intend to display.

Recommendation

Do not automatically surface thinking/reasoning as message.content unless the caller explicitly opts in.

Options:

  1. Default-safe: ignore thinking/reasoning for user-visible output:
// Only show actual assistant content
finalResponse.message.content = accumulatedContent;
  2. Opt-in exposure: gate fallback behind a flag (e.g., options?.includeThinking === true) and otherwise keep it out of the visible message:
const includeThinking = options?.includeThinking === true;
finalResponse.message.content = accumulatedContent || (includeThinking ? fallbackContent : "");
  3. Keep separate metadata: store thinking/reasoning in a non-displayed field on the emitted event/message (if supported), rather than mapping it into TextContent.

Also consider stripping thinking/reasoning when tool_calls are present to avoid leaking tool planning text.
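
As an illustration of that last suggestion, one possible shape is sketched below; the OllamaChunkMessage and ToolCall types here are assumptions, not the project's actual types.

interface ToolCall {
  function: { name: string; arguments: unknown };
}

interface OllamaChunkMessage {
  content?: string;
  thinking?: string;
  reasoning?: string;
  tool_calls?: ToolCall[];
}

function visibleFallback(message: OllamaChunkMessage | undefined): string {
  if (!message) return "";
  // Never surface thinking/reasoning alongside tool calls, so tool-planning
  // text stays out of the user-visible message.
  if (message.tool_calls?.length) return "";
  return message.thinking || message.reasoning || "";
}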


Analyzed PR: #39292 at commit 075bc83

Last updated on: 2026-03-08T01:24:34Z


chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3468b128f6


(Inline review thread on src/agents/ollama-stream.ts; marked outdated.)
@greptile-apps
Contributor

greptile-apps Bot commented Mar 8, 2026

Greptile Summary

This PR fixes two independent but related bugs in the Ollama integration: (1) native Ollama responses from thinking-capable models (e.g. Qwen3) that populate message.thinking instead of message.content were being silently dropped; (2) an explicit lower contextWindow/maxTokens override in user config was being overwritten by a larger catalog default on every merge-mode refresh.

Changes

  • src/agents/ollama-stream.ts: Adds thinking?: string to the OllamaChatResponse type and inserts a thinking fallback before reasoning in both buildAssistantMessage (non-streaming path) and the stream accumulation loop (streaming path). The priority order is now content → thinking → reasoning.
  • src/agents/models-config.ts: Replaces the old max(explicit, implicit) token-limit resolution with resolvePreferredTokenLimit, which now checks whether the key is explicitly present in the user's config. A valid explicit value (positive finite number) is always honoured regardless of whether it is smaller than the catalog default. Invalid explicit values (0, negative, non-numeric) fall back to the implicit catalog default. When neither value is valid the field is omitted entirely instead of being set to undefined. (See the sketch after this list.)
  • Tests are correctly updated to reflect both the new thinking priority and the preserved-lower-limit semantics, with a new test covering the invalid-explicit-value fallback path.
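
To make the new merge semantics concrete, here is a sketch of the behaviour described above. The real resolvePreferredTokenLimit in src/agents/models-config.ts may differ in signature and naming; only the decision order is taken from this PR.

function resolvePreferredTokenLimit(
  explicitlyPresent: boolean, // key exists in the user's config
  explicit: unknown, // value from the user's config, if any
  implicit: number | undefined, // catalog default
): number | undefined {
  const isValid = (v: unknown): v is number =>
    typeof v === "number" && Number.isFinite(v) && v > 0;

  // A valid explicit value is always honoured, even when it is smaller
  // than the catalog default.
  if (explicitlyPresent && isValid(explicit)) return explicit;
  // Invalid explicit values (0, negative, non-numeric) fall back to the
  // implicit catalog default.
  if (isValid(implicit)) return implicit;
  // Neither value is valid: the caller omits the field entirely.
  return undefined;
}

// Example outcomes:
// resolvePreferredTokenLimit(true, 8192, 32768) === 8192 (lower explicit wins)
// resolvePreferredTokenLimit(true, 0, 32768) === 32768 (invalid explicit falls back)
// resolvePreferredTokenLimit(false, undefined, undefined) === undefined (field omitted)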

Potential edge case (pre-existing): The stream accumulation loop uses an else if chain, so if a model streams thinking tokens in early chunks and then actual content in later chunks, both phases accumulate into the same buffer and are surfaced as a single text block. This behaviour predates this PR (it existed for reasoning) and is acceptable for models that use thinking exclusively in place of content, but it is worth documenting as an assumption.

Confidence Score: 4/5

  • Safe to merge — targeted, well-tested bug fixes with clear scope boundaries and backward-compatible behavior.
  • Both fixes are narrow and correct for their stated purpose. The thinking fallback follows the same proven pattern as the existing reasoning fallback, and the new resolvePreferredTokenLimit logic is thoroughly covered by updated tests including the invalid-value edge case. No regressions to existing behavior are expected. Score is 4 rather than 5 because the streaming accumulation still carries a pre-existing design assumption (thinking-only models never mix thinking and content phases in the same stream) that is not explicitly documented or guarded.
  • No files require special attention. The comment on src/agents/ollama-stream.ts lines 475–483 is the only flagged concern, and it is a pre-existing style/documentation gap rather than a newly introduced bug.

Last reviewed commit: 3468b12

(Inline review thread on src/agents/ollama-stream.ts.)
vincentkoc merged commit d6d04f3 into main on Mar 8, 2026
10 of 11 checks passed
vincentkoc deleted the vincentkoc-code/ollama-local-followups branch on March 8, 2026 at 00:53
vincentkoc added a commit to BryanTegomoh/openclaw-upstream that referenced this pull request Mar 8, 2026
…claw#39292)

* fix(ollama): support thinking field fallback in native stream

* fix(models): honor explicit lower token limits in merge mode

* fix(ollama): prefer streamed content over fallback thinking

* changelog: note Ollama local model fixes

The same change was subsequently referenced by downstream repositories, each carrying the commit message above:

  • ziomancer pushed a commit to ziomancer/openclaw (Mar 8, 2026)
  • openperf pushed a commit to openperf/moltbot (Mar 8, 2026)
  • mcaxtr pushed a commit to mcaxtr/openclaw (Mar 8, 2026)
  • GordonSH-oss pushed a commit to GordonSH-oss/openclaw (Mar 9, 2026)
  • jenawant pushed a commit to jenawant/openclaw (Mar 10, 2026)
  • V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor (Mar 17, 2026)
  • alexey-pelykh pushed and added a commit to remoteclaw/remoteclaw (#1783, cherry picked from commit d6d04f3, co-authored by Vincent Koc <vincentkoc@ieee.org>) (Mar 22, 2026)
  • lovewanwan pushed a commit to lovewanwan/openclaw (Apr 28, 2026)
  • ogt-redknie pushed a commit to ogt-redknie/OPENX (May 2, 2026)
  • github-actions Bot pushed a commit to Desicool/openclaw (May 9, 2026)

Labels

agents (Agent runtime and tooling) · maintainer (Maintainer-authored PR) · size: S
