fix(ollama): preserve local limits and native thinking fallback #39292

vincentkoc merged 5 commits into main
Conversation
🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

1. 🟡 Ollama stream exposes internal thinking/reasoning as user-visible content
| Property | Value |
|---|---|
| Severity | Medium |
| CWE | CWE-200 |
| Location | src/agents/ollama-stream.ts:471-505 |
Description
The Ollama streaming adapter now falls back to `thinking`/`reasoning` and assigns it to `message.content` when content is empty.
This can unintentionally disclose internal model traces (chain-of-thought) or other non-user-facing content if an Ollama model/server emits it in `thinking`/`reasoning` (common for "reasoning" models):
- Input (untrusted): `chunk.message.thinking` / `chunk.message.reasoning`, received from the Ollama server during streaming
- Transformation: accumulated into `fallbackContent`
- Sink (user-visible): written into `finalResponse.message.content` and then converted into a `TextContent` block in `buildAssistantMessage()`, emitted to downstream consumers via the `done` event
Vulnerable code (stream fallback into final content):
```ts
} else if (!sawContent && chunk.message?.thinking) {
  fallbackContent += chunk.message.thinking;
} else if (!sawContent && chunk.message?.reasoning) {
  fallbackContent += chunk.message.reasoning;
}
...
finalResponse.message.content = accumulatedContent || fallbackContent;
```

Even though the adapter prefers content once it appears, any conversation where the model returns an empty content and populates thinking/reasoning will surface that internal text to the user.
This is security-relevant because internal traces may contain sensitive context (e.g., parts of system prompts, policy text, tool output excerpts) that the application did not intend to display.
Recommendation

Do not automatically surface thinking/reasoning as `message.content` unless the caller explicitly opts in.
Options (a combined sketch follows after this list):

- Default-safe: ignore `thinking`/`reasoning` for user-visible output:

  ```ts
  // Only show actual assistant content
  finalResponse.message.content = accumulatedContent;
  ```

- Opt-in exposure: gate fallback behind a flag (e.g., `options?.includeThinking === true`) and otherwise keep it out of the visible message:

  ```ts
  const includeThinking = options?.includeThinking === true;
  finalResponse.message.content = accumulatedContent || (includeThinking ? fallbackContent : "");
  ```

- Keep separate metadata: store `thinking`/`reasoning` in a non-displayed field on the emitted event/message (if supported), rather than mapping it into `TextContent`.

Also consider stripping thinking/reasoning when tool_calls are present to avoid leaking tool planning text.
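A combined sketch of the default-safe, opt-in, and metadata options above; `includeThinking` and the separate `thinking` field are assumptions for illustration, not the adapter's confirmed API:

```ts
// Hypothetical result shape: the trace travels as metadata, never as TextContent.
interface FinalizedMessage {
  content: string; // user-visible assistant text
  thinking?: string; // internal trace, kept out of the visible output
}

function finalizeMessage(
  accumulatedContent: string,
  fallbackContent: string,
  options?: { includeThinking?: boolean },
): FinalizedMessage {
  // Opt-in exposure: only surface the trace when the caller explicitly asks.
  const includeThinking = options?.includeThinking === true;
  return {
    // Default-safe: real assistant content only, unless opted in.
    content: accumulatedContent || (includeThinking ? fallbackContent : ""),
    // Keep the trace as separate, non-displayed metadata.
    thinking: fallbackContent || undefined,
  };
}
```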
Analyzed PR: #39292 at commit 075bc83
Last updated on: 2026-03-08T01:24:34Z
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3468b128f6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Greptile Summary

This PR fixes two independent but related bugs in the Ollama integration: (1) native Ollama responses from thinking-capable models (e.g. Qwen3) that populate `message.thinking` but leave `message.content` empty were losing their response content, and (2) merged model config could silently overwrite explicit lower `contextWindow`/`maxTokens` values with larger catalog defaults.

Changes

A potential, pre-existing edge case was flagged in the stream accumulation loop.

Confidence Score: 4/5

Last reviewed commit: 3468b12
…claw#39292)

* fix(ollama): support thinking field fallback in native stream
* fix(models): honor explicit lower token limits in merge mode
* fix(ollama): prefer streamed content over fallback thinking
* changelog: note Ollama local model fixes

…claw#39292) (cherry picked from commit d6d04f3)

…claw#39292) (#1783) (cherry picked from commit d6d04f3) Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Summary

- Native Ollama responses from thinking-capable models can land only in `message.thinking`, and merged model config can silently overwrite explicit lower `contextWindow`/`maxTokens` values with larger catalog defaults.
- The result is dropped responses and unexpected `num_ctx` / memory pressure even after explicitly lowering limits.
- Fix: add `thinking` -> `reasoning` fallback handling in the native Ollama stream path and honor explicit lower token-limit overrides during provider/model merge (a sketch of the merge rule follows below).
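A minimal sketch of the token-limit merge rule, using hypothetical names (`ModelLimits`, `mergeModelLimits`); the actual implementation lives in `src/agents/models-config.ts` and may differ:

```ts
// Hypothetical shapes; the real config types may differ.
interface ModelLimits {
  contextWindow?: number;
  maxTokens?: number;
}

// Merge rule: an explicit, valid override wins even when it is lower than the
// catalog default; invalid or missing values fall back to the implicit limits.
function mergeModelLimits(
  explicit: ModelLimits,
  catalog: Required<ModelLimits>,
): Required<ModelLimits> {
  const pick = (override: number | undefined, fallback: number): number =>
    typeof override === "number" && Number.isFinite(override) && override > 0
      ? override
      : fallback;
  return {
    contextWindow: pick(explicit.contextWindow, catalog.contextWindow),
    maxTokens: pick(explicit.maxTokens, catalog.maxTokens),
  };
}

// Example: an explicit lower limit survives the merge.
// mergeModelLimits({ contextWindow: 8192 }, { contextWindow: 131072, maxTokens: 4096 })
//   => { contextWindow: 8192, maxTokens: 4096 }
```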
Change Type (select all)

Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
- Responses that arrive only in `message.thinking` no longer lose their response content when `message.content` is empty.
- Explicit `contextWindow`/`maxTokens` values now survive merge-mode refreshes instead of being replaced by larger catalog defaults.

Security Impact (required)
- No
- No
- No
- No
- No
- Yes, explain risk + mitigation:

Repro + Verification
Environment
- `api: "ollama"`, explicit `contextWindow`/`maxTokens` overrides (a hypothetical config sketch follows)
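For illustration, a hypothetical override entry matching this environment; field names follow the PR description, but the exact config schema is an assumption:

```ts
// Hypothetical models-config entry; the real schema may differ.
const localModelOverride = {
  api: "ollama",
  model: "qwen3", // a thinking-capable local model
  contextWindow: 8192, // explicitly lower than the catalog default
  maxTokens: 1024, // explicitly lower than the catalog default
};
```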
Steps

1. Stream a response from a thinking-capable local model in a case where `message.content` is empty but `message.thinking` is populated.
2. Configure explicit lower `contextWindow`/`maxTokens` values than the implicit catalog defaults.

Expected
- Responses are preserved via the `thinking` fallback, and explicit lower limits are honored.

Actual
- Responses delivered only via `message.thinking` could be dropped, and lower token-limit overrides were replaced by larger implicit defaults.

Evidence
Human Verification (required)
What you personally verified (not just CI), and how:
- `pnpm test -- --run src/agents/ollama-stream.test.ts src/agents/models-config.fills-missing-provider-apikey-from-env-var.test.ts src/agents/model-auth.profiles.test.ts` passes (54 tests total). Targeted `pnpm exec oxlint ...` and `pnpm exec oxfmt --check ...` on the touched files also pass.
- Covered cases: explicit `thinking` fallback, legacy `reasoning` fallback, streamed thinking accumulation, preserved lower token limits, fallback to implicit limits when explicit values are invalid (a standalone sketch of the fallback cases follows this list).
- Also ran the `src/agents/pi-embedded-runner/run/attempt.test.ts` suite and a full clean `pnpm build`/`pnpm check` run. Local `pnpm build` now clears the patch-specific type issue but still exits in `tsdown` with `Bus error: 10` after artifact generation in this environment.
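A standalone sketch (not the repo's actual test file) of the fallback precedence those cases exercise; `pickVisibleContent` is a hypothetical stand-in for the adapter's internal accumulation logic:

```ts
import { describe, expect, it } from "vitest";

type Chunk = {
  message?: { content?: string; thinking?: string; reasoning?: string };
};

// Hypothetical helper mirroring the precedence described in the PR: streamed
// content wins; thinking, then reasoning, only fill in when no content arrived.
function pickVisibleContent(chunks: Chunk[]): string {
  let content = "";
  let fallback = "";
  let sawContent = false;
  for (const chunk of chunks) {
    if (chunk.message?.content) {
      content += chunk.message.content;
      sawContent = true;
    } else if (!sawContent && chunk.message?.thinking) {
      fallback += chunk.message.thinking;
    } else if (!sawContent && chunk.message?.reasoning) {
      fallback += chunk.message.reasoning;
    }
  }
  return content || fallback;
}

describe("ollama stream content fallback", () => {
  it("falls back to thinking when content never arrives", () => {
    expect(pickVisibleContent([{ message: { thinking: "T" } }])).toBe("T");
  });

  it("prefers streamed content over accumulated thinking", () => {
    expect(
      pickVisibleContent([
        { message: { thinking: "T" } },
        { message: { content: "C" } },
      ]),
    ).toBe("C");
  });
});
```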
Compatibility / Migration

- Yes
- No
- No

Failure Recovery (if this breaks)
- Revert `src/agents/ollama-stream.ts` and `src/agents/models-config.ts`.

Risks and Mitigations
- Risk: some environments emit `reasoning` instead of `thinking`. Mitigation: `thinking` is preferred, with `reasoning` retained as a compatibility fallback (see the sketch below).
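A two-line sketch of that precedence, with hypothetical type and helper names:

```ts
// Prefer the modern `thinking` field; fall back to the legacy `reasoning`
// field for environments that still emit it.
type OllamaMessageTrace = { thinking?: string; reasoning?: string };

const traceOf = (message?: OllamaMessageTrace): string =>
  message?.thinking ?? message?.reasoning ?? "";
```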