Ollama provider: missing response-level reasoning stripper for Kimi models causes inline reasoning leak to chat #86129
Closed
Labels
- P1 - High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:needs-live-repro - ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
- clawsweeper:needs-security-review - ClawSweeper marked this issue as needing security-sensitive review.
- clawsweeper:no-new-fix-pr - ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- impact:auth-provider - Auth, provider routing, model choice, or SecretRef resolution may break.
- impact:security - Security boundary, credential, authz, sandbox, or sensitive-data risk.
- issue-rating: 🐚 platinum hermit - Good issue quality with a plausible reproduction path needing some confirmation.
Summary
When using ollama/kimi-k2.6:cloud (and likely kimi-k2.5:cloud) with Think: off, the model's inline reasoning text leaks into the visible chat output. The gateway correctly sends think: false (native Ollama) and thinking: { type: "disabled" } (Moonshot wrapper) on outgoing requests, but the model still emits reasoning text inline, separated from the actual response by a boundary delimiter. The Ollama provider has no response-level stripper for this inline reasoning, unlike the opencode-go provider, which has stripOpencodeGoKimiReasoningPayload.
Environment
- ollama
- ollama/kimi-k2.6:cloud (also observed with kimi-k2.5:cloud)
- Think: off
- http://192.168.1.72:11434
Reproduction
Config (relevant excerpt)
Steps
- ollama/kimi-k2.6:cloud
- thinking/Think is off
Actual behavior
The visible response contains the model's internal reasoning monologue, followed by a boundary delimiter, then the actual response. Example from session history:
The text before the ️ delimiter is raw reasoning that should be internal only.
Expected behavior
Only the text after the reasoning delimiter should be visible to the user. The reasoning block should be stripped, discarded, or stored as internal metadata — never rendered as chat content.
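A minimal sketch of that expected behavior, for a complete (non-streamed) assistant message. The helper name `splitInlineReasoning` and the `<<END>>` delimiter in the usage note are hypothetical, not taken from the codebase; the actual boundary string Kimi emits would need to be confirmed from a live reproduction.

```typescript
// Hypothetical helper: neither this name nor any delimiter value comes from the codebase.
interface SplitResult {
  reasoning: string | null; // text before the delimiter; internal metadata only
  visible: string; // text that may be rendered as chat content
}

function splitInlineReasoning(text: string, delimiter: string): SplitResult {
  // An empty delimiter would match at index 0, so treat it as "nothing configured".
  if (delimiter.length === 0) {
    return { reasoning: null, visible: text };
  }
  const index = text.indexOf(delimiter);
  if (index === -1) {
    // No boundary found: leave the text untouched rather than guess.
    return { reasoning: null, visible: text };
  }
  return {
    reasoning: text.slice(0, index).trim(),
    visible: text.slice(index + delimiter.length).trimStart(),
  };
}
```

With a placeholder delimiter, `splitInlineReasoning("The user said hi. I should greet.<<END>> Hello!", "<<END>>")` returns `visible: "Hello!"` and keeps the monologue in `reasoning`, where it can be discarded or stored as metadata.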
Root cause analysis
- Request side is correct: createConfiguredOllamaCompatStreamWrapper applies both:
  - createOllamaThinkingWrapper(..., false) → sets think: false on the native Ollama payload
  - createMoonshotThinkingWrapper(..., "disabled") → sets thinking: { type: "disabled" }
- Model ignores the disable signal: kimi-k2.6 still outputs reasoning inline, likely because the Ollama API passthrough doesn't propagate the disable parameter correctly to the underlying model, or because the model inherently emits reasoning regardless.
- Missing response stripper: the opencode-go provider has stripOpencodeGoKimiReasoningPayload, which:
  - reasoning, reasoning_details, reasoning_content, reasoning_text fields
  - type: "thinking" / type: "reasoning" content parts from messages
  - [assistant reasoning omitted]
  The ollama provider has no equivalent response sanitizer for Kimi models.
Cross-references
- #81988: opencode-go/kimi-k2.6 (and other openai-completions models): reasoning field leaks through passthrough replay policy, rejected as "Extra inputs are not permitted" on multi-turn (same family, different provider)
- opencode-go/kimi-k2.6 sends unsupported reasoning_details in replayed messages (request-side fix for opencode-go)
Suggested fix
Add a Kimi-specific response sanitizer in the Ollama provider, analogous to what exists for opencode-go:
Option A (provider-level): In createConfiguredOllamaCompatStreamWrapper, when isOllamaCloudKimiModelRef(modelId) is true, wrap the stream with a response interceptor that strips inline reasoning text from assistant message content before it reaches the user.
Option B (gateway-level): Add a general stripInlineReasoningFromAssistantText utility in the message processing pipeline that recognizes the ️ (or equivalent) delimiter and splits/omits the reasoning portion.
Option C (model registry): Mark kimi-k2.6:cloud and kimi-k2.5:cloud under the Ollama provider as reasoning: true with a reasoningOutputMode: "inline" so the gateway knows to apply stripping regardless of what the model claims.
Workarounds
- … llama3.2:3b, gemma4)
Impact
ollama/kimi-k2.6:cloud
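Returning to Option A in the suggested fix: since the leak affects streamed replies, a response interceptor would probably need to work chunk by chunk. The sketch below is one possible shape. It assumes chunks arrive as plain strings and that the delimiter is known. The class name `InlineReasoningFilter` and the `<<END>>` value are hypothetical, and the gateway's real stream event types are not modeled here.

```typescript
// Hypothetical sketch: the class name, the string-chunk shape, and any delimiter
// value are assumptions, not the gateway's actual stream types.
class InlineReasoningFilter {
  private buffer = "";
  private pastDelimiter: boolean;
  private readonly delimiter: string;

  constructor(delimiter: string) {
    this.delimiter = delimiter;
    // With no delimiter configured there is nothing to strip.
    this.pastDelimiter = delimiter.length === 0;
  }

  /** Feed one streamed text chunk; returns the text that is safe to show now. */
  push(chunk: string): string {
    if (this.pastDelimiter) return chunk;
    // Search the accumulated text so a delimiter split across chunks is still found.
    this.buffer += chunk;
    const index = this.buffer.indexOf(this.delimiter);
    if (index === -1) return "";
    this.pastDelimiter = true;
    const rest = this.buffer.slice(index + this.delimiter.length);
    this.buffer = "";
    return rest;
  }

  /** Call at end of stream; returns withheld text if no delimiter ever arrived. */
  flush(): string {
    const rest = this.pastDelimiter ? "" : this.buffer;
    this.buffer = "";
    return rest;
  }
}
```

Two trade-offs are worth noting. Nothing is shown until the delimiter arrives, so a reply without inline reasoning is delayed until `flush()`. And `flush()` fails open: if the delimiter never appears, the withheld text is emitted as-is. Given the impact:security label, failing closed (dropping the buffer) may be the safer choice.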