
fix(ollama): preserve local limits and native thinking fallback #39292

Merged
vincentkoc merged 5 commits into main from vincentkoc-code/ollama-local-followups on Mar 8, 2026
Conversation

@vincentkoc
Member

Summary

  • Problem: Native Ollama responses can drop model output when a provider returns message.thinking, and merged model config can silently overwrite explicit lower contextWindow / maxTokens values with larger catalog defaults.
  • Why it matters: Native thinking-capable Ollama models can appear empty, and local-model users on constrained hardware can still hit oversized num_ctx / memory pressure even after explicitly lowering limits.
  • What changed: Preserve the thinking -> reasoning fallback handling in the native Ollama stream path and honor explicit lower token-limit overrides during the provider/model merge (a minimal sketch follows this list).
  • What did NOT change (scope boundary): This does not change embedded-run timeout behavior, compaction behavior, or release tagging.
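
For reviewers who want the gist without opening the diff, here is a minimal sketch of the fallback order described above. Variable and field names follow the excerpt quoted later in this thread; the actual code in src/agents/ollama-stream.ts may differ in structure, so treat this as illustrative only.

interface OllamaChunkMessage {
  content?: string;
  thinking?: string; // native thinking-capable models (e.g. Qwen3)
  reasoning?: string; // legacy field kept as a compatibility fallback
}

function accumulateVisibleText(chunks: { message?: OllamaChunkMessage }[]): string {
  let accumulatedContent = "";
  let fallbackContent = "";
  let sawContent = false;

  for (const chunk of chunks) {
    if (chunk.message?.content) {
      // Real assistant content always wins once it appears.
      accumulatedContent += chunk.message.content;
      sawContent = true;
    } else if (!sawContent && chunk.message?.thinking) {
      fallbackContent += chunk.message.thinking;
    } else if (!sawContent && chunk.message?.reasoning) {
      fallbackContent += chunk.message.reasoning;
    }
  }

  // Priority: content -> thinking -> reasoning.
  return accumulatedContent || fallbackContent;
}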

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Native Ollama models that emit message.thinking no longer lose their response content when message.content is empty.
  • Explicitly lower contextWindow / maxTokens values now survive merge-mode refreshes instead of being replaced by larger catalog defaults.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS 26.3.1
  • Runtime/container: Node 24 / pnpm
  • Model/provider: Ollama native + merge-mode model config
  • Integration/channel (if any): N/A
  • Relevant config (redacted): provider api: "ollama" with explicit contextWindow / maxTokens overrides (illustrated below)
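
For illustration only, the redacted config above could look roughly like the following TypeScript object literal. Only api: "ollama" and the explicit contextWindow / maxTokens overrides come from this PR; every other key, the model id, and the endpoint are assumptions.

const providerConfig = {
  api: "ollama",
  baseUrl: "http://127.0.0.1:11434", // assumed default local Ollama endpoint
  models: [
    {
      id: "qwen3:8b", // any thinking-capable native model would do
      // Explicit lower limits that must survive merge-mode refreshes:
      contextWindow: 8192,
      maxTokens: 1024,
    },
  ],
};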

Steps

  1. Configure an Ollama provider that uses a thinking-capable native model and stream a reply where message.content is empty but message.thinking is populated.
  2. Configure a provider/model entry with explicit lower contextWindow / maxTokens values than the implicit catalog defaults.
  3. Refresh merged model config and inspect the generated output.

Expected

  • Native Ollama output is preserved through thinking fallback.
  • Explicit lower token limits remain in the merged config.

Actual

  • Before this change, message.thinking could be dropped and lower token-limit overrides were replaced by larger implicit defaults.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: pnpm test -- --run src/agents/ollama-stream.test.ts src/agents/models-config.fills-missing-provider-apikey-from-env-var.test.ts src/agents/model-auth.profiles.test.ts passes (54 tests total). Targeted pnpm exec oxlint ... and pnpm exec oxfmt --check ... on the touched files also pass.
  • Edge cases checked: thinking fallback, legacy reasoning fallback, streamed thinking accumulation, preserved lower token limits, fallback to implicit limits when explicit values are invalid.
  • What you did not verify: Live Ollama end-to-end traffic, the broader src/agents/pi-embedded-runner/run/attempt.test.ts suite, and a full clean pnpm build / pnpm check run. Local pnpm build now clears the patch-specific type issue, but tsdown still exits with "Bus error: 10" after artifact generation in this environment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert the two commits in this PR.
  • Files/config to restore: src/agents/ollama-stream.ts, src/agents/models-config.ts
  • Known bad symptoms reviewers should watch for: Empty native Ollama replies on thinking-capable models, or explicit lower token limits being overwritten again during merge refresh.

Risks and Mitigations

  • Risk: Some native Ollama variants may still emit reasoning instead of thinking.
    • Mitigation: The stream/message fallback keeps both fields, with thinking preferred and reasoning retained as a compatibility fallback.
  • Risk: Users with invalid explicit token-limit values could end up with missing limits.
    • Mitigation: Invalid explicit values now fall back to valid implicit defaults, and the merge only writes numeric limits when one is actually valid.

@aisle-research-bot

aisle-research-bot Bot commented Mar 8, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue in this PR:

# Severity Title
1 🟡 Medium Ollama stream exposes thinking/reasoning fields as user-visible content (chain-of-thought leakage)

1. 🟡 Ollama stream exposes thinking/reasoning fields as user-visible content (chain-of-thought leakage)

Severity: Medium
CWE: CWE-200
Location: src/agents/ollama-stream.ts:471-505

Description

The Ollama streaming adapter now falls back to thinking/reasoning and assigns it to message.content when content is empty.

This can unintentionally disclose internal model traces (chain-of-thought) or other non-user-facing content if an Ollama model/server emits it in thinking/reasoning (common for “reasoning” models):

  • Input (untrusted): chunk.message.thinking / chunk.message.reasoning received from the Ollama server during streaming
  • Transformation: accumulated into fallbackContent
  • Sink (user-visible): written into finalResponse.message.content and then converted into a TextContent block in buildAssistantMessage(), emitted to downstream consumers via the done event

Vulnerable code (stream fallback into final content):

} else if (!sawContent && chunk.message?.thinking) {
  fallbackContent += chunk.message.thinking;
} else if (!sawContent && chunk.message?.reasoning) {
  fallbackContent += chunk.message.reasoning;
}
...
finalResponse.message.content = accumulatedContent || fallbackContent;

Even though the adapter prefers content once it appears, any conversation where the model returns an empty content and populates thinking/reasoning will surface that internal text to the user.

This is security-relevant because internal traces may contain sensitive context (e.g., parts of system prompts, policy text, tool output excerpts) that the application did not intend to display.

Recommendation

Do not automatically surface thinking/reasoning as message.content unless the caller explicitly opts in.

Options:

  1. Default-safe: ignore thinking/reasoning for user-visible output:
// Only show actual assistant content
finalResponse.message.content = accumulatedContent;
  2. Opt-in exposure: gate fallback behind a flag (e.g., options?.includeThinking === true) and otherwise keep it out of the visible message:
const includeThinking = options?.includeThinking === true;
finalResponse.message.content = accumulatedContent || (includeThinking ? fallbackContent : "");
  3. Keep separate metadata: store thinking/reasoning in a non-displayed field on the emitted event/message (if supported), rather than mapping it into TextContent.

Also consider stripping thinking/reasoning when tool_calls are present to avoid leaking tool planning text.
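
As an illustration of that last suggestion, one possible shape is sketched below; the OllamaChunkMessage and ToolCall types here are assumptions, not the project's actual types.

interface ToolCall {
  function: { name: string; arguments: unknown };
}

interface OllamaChunkMessage {
  content?: string;
  thinking?: string;
  reasoning?: string;
  tool_calls?: ToolCall[];
}

function visibleFallback(message: OllamaChunkMessage | undefined): string {
  if (!message) return "";
  // Never surface thinking/reasoning alongside tool calls, so tool-planning
  // text stays out of the user-visible message.
  if (message.tool_calls?.length) return "";
  return message.thinking || message.reasoning || "";
}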


Analyzed PR: #39292 at commit 075bc83

Last updated on: 2026-03-08T01:24:34Z


chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3468b128f6


(Inline review thread on src/agents/ollama-stream.ts; marked outdated.)
@greptile-apps
Contributor

greptile-apps Bot commented Mar 8, 2026

Greptile Summary

This PR fixes two independent but related bugs in the Ollama integration: (1) native Ollama responses from thinking-capable models (e.g. Qwen3) that populate message.thinking instead of message.content were being silently dropped; (2) an explicit lower contextWindow/maxTokens override in user config was being overwritten by a larger catalog default on every merge-mode refresh.

Changes

  • src/agents/ollama-stream.ts: Adds thinking?: string to the OllamaChatResponse type and inserts a thinking fallback before reasoning in both buildAssistantMessage (non-streaming path) and the stream accumulation loop (streaming path). The priority order is now content → thinking → reasoning.
  • src/agents/models-config.ts: Replaces the old max(explicit, implicit) token-limit resolution with resolvePreferredTokenLimit, which now checks whether the key is explicitly present in the user's config. A valid explicit value (positive finite number) is always honoured regardless of whether it is smaller than the catalog default. Invalid explicit values (0, negative, non-numeric) fall back to the implicit catalog default. When neither value is valid the field is omitted entirely instead of being set to undefined. (See the sketch after this list.)
  • Tests are correctly updated to reflect both the new thinking priority and the preserved-lower-limit semantics, with a new test covering the invalid-explicit-value fallback path.
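
To make the new merge semantics concrete, here is a sketch of the behaviour described above. The real resolvePreferredTokenLimit in src/agents/models-config.ts may differ in signature and naming; only the decision order is taken from this PR.

function resolvePreferredTokenLimit(
  explicitlyPresent: boolean, // key exists in the user's config
  explicit: unknown, // value from the user's config, if any
  implicit: number | undefined, // catalog default
): number | undefined {
  const isValid = (v: unknown): v is number =>
    typeof v === "number" && Number.isFinite(v) && v > 0;

  // A valid explicit value is always honoured, even when it is smaller
  // than the catalog default.
  if (explicitlyPresent && isValid(explicit)) return explicit;
  // Invalid explicit values (0, negative, non-numeric) fall back to the
  // implicit catalog default.
  if (isValid(implicit)) return implicit;
  // Neither value is valid: the caller omits the field entirely.
  return undefined;
}

// Example outcomes:
// resolvePreferredTokenLimit(true, 8192, 32768) === 8192 (lower explicit wins)
// resolvePreferredTokenLimit(true, 0, 32768) === 32768 (invalid explicit falls back)
// resolvePreferredTokenLimit(false, undefined, undefined) === undefined (field omitted)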

Potential edge case (pre-existing): The stream accumulation loop uses an else if chain, so if a model streams thinking tokens in early chunks and then actual content in later chunks, both phases accumulate into the same buffer and are surfaced as a single text block. This behaviour predates this PR (it existed for reasoning) and is acceptable for models that use thinking exclusively in place of content, but it is worth documenting as an assumption.

Confidence Score: 4/5

  • Safe to merge — targeted, well-tested bug fixes with clear scope boundaries and backward-compatible behavior.
  • Both fixes are narrow and correct for their stated purpose. The thinking fallback follows the same proven pattern as the existing reasoning fallback, and the new resolvePreferredTokenLimit logic is thoroughly covered by updated tests including the invalid-value edge case. No regressions to existing behavior are expected. Score is 4 rather than 5 because the streaming accumulation still carries a pre-existing design assumption (thinking-only models never mix thinking and content phases in the same stream) that is not explicitly documented or guarded.
  • No files require special attention. The comment on src/agents/ollama-stream.ts lines 475–483 is the only flagged concern, and it is a pre-existing style/documentation gap rather than a newly introduced bug.

Last reviewed commit: 3468b12

(Inline review thread on src/agents/ollama-stream.ts.)
vincentkoc merged commit d6d04f3 into main on Mar 8, 2026
10 of 11 checks passed
vincentkoc deleted the vincentkoc-code/ollama-local-followups branch on March 8, 2026 at 00:53
vincentkoc added a commit to BryanTegomoh/openclaw-upstream that referenced this pull request Mar 8, 2026
…claw#39292)

* fix(ollama): support thinking field fallback in native stream

* fix(models): honor explicit lower token limits in merge mode

* fix(ollama): prefer streamed content over fallback thinking

* changelog: note Ollama local model fixes

The same change was subsequently referenced by downstream repositories, each carrying the commit message above:

  • ziomancer pushed a commit to ziomancer/openclaw (Mar 8, 2026)
  • openperf pushed a commit to openperf/moltbot (Mar 8, 2026)
  • mcaxtr pushed a commit to mcaxtr/openclaw (Mar 8, 2026)
  • GordonSH-oss pushed a commit to GordonSH-oss/openclaw (Mar 9, 2026)
  • jenawant pushed a commit to jenawant/openclaw (Mar 10, 2026)
  • V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor (Mar 17, 2026)
  • alexey-pelykh pushed and added a commit to remoteclaw/remoteclaw (#1783, cherry picked from commit d6d04f3, co-authored by Vincent Koc <vincentkoc@ieee.org>) (Mar 22, 2026)
  • lovewanwan pushed a commit to lovewanwan/openclaw (Apr 28, 2026)
  • ogt-redknie pushed a commit to ogt-redknie/OPENX (May 2, 2026)
  • github-actions Bot pushed a commit to Desicool/openclaw (May 9, 2026)

Labels

agents (Agent runtime and tooling) · maintainer (Maintainer-authored PR) · size: S
