Ollama provider: missing response-level reasoning stripper for Kimi models causes inline reasoning leak to chat #86129

@zoltanferenczfi

Summary

When using ollama/kimi-k2.6:cloud (and likely kimi-k2.5:cloud) with Think: off, the model's inline reasoning text leaks into the visible chat output. The gateway correctly sends think: false (native Ollama) and thinking: { type: "disabled" } (Moonshot wrapper) on outgoing requests, but the model still emits reasoning text inline — separated from the actual response by a boundary delimiter. The Ollama provider has no response-level stripper for this inline reasoning, unlike the opencode-go provider which has stripOpencodeGoKimiReasoningPayload.

Environment

  • OpenClaw version: 2026.5.22 (a374c3a)
  • Provider: ollama
  • Model: ollama/kimi-k2.6:cloud (also observed with kimi-k2.5:cloud)
  • Runtime: Think: off
  • OS: Windows 10.0.26200 (x64)
  • Ollama base URL: http://192.168.1.72:11434

Reproduction

Config (relevant excerpt)

{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://192.168.1.72:11434",
        api: "ollama",
        models: [
          {
            id: "kimi-k2.6:cloud",
            name: "kimi-k2.6:cloud",
            reasoning: false,
            params: {
              num_ctx: 262144
            }
          }
        ]
      }
    }
  },
  agents: {
    defaults: {
      model: { primary: "ollama/kimi-k2.6:cloud" }
    }
  }
}

Steps

  1. Set primary model to ollama/kimi-k2.6:cloud
  2. Ensure thinking / Think is off
  3. Send any message that triggers the agent
  4. Observe the assistant response in chat

Actual behavior

The visible response contains the model's internal reasoning monologue, followed by a boundary delimiter, then the actual response. Example from session history:

"The user is asking what projects we've worked on so far. Based on my memory files...

...Let me provide a clear summary. ️ So far we've got **3 active projects** tracked:"

The text before the delimiter is raw reasoning that should be internal only.

Expected behavior

Only the text after the reasoning delimiter should be visible to the user. The reasoning block should be stripped, discarded, or stored as internal metadata — never rendered as chat content.

Root cause analysis

  1. Request side is correct. createConfiguredOllamaCompatStreamWrapper applies both:

    • createOllamaThinkingWrapper(..., false) → sets think: false on native Ollama payload
    • createMoonshotThinkingWrapper(..., "disabled") → sets thinking: { type: "disabled" }
  2. Model ignores the disable signal. kimi-k2.6 still outputs reasoning inline, likely because the Ollama API passthrough doesn't propagate the disable parameter correctly to the underlying model, or because the model emits reasoning regardless.

  3. Missing response stripper — The opencode-go provider has stripOpencodeGoKimiReasoningPayload which:

    • Deletes reasoning, reasoning_details, reasoning_content, reasoning_text fields
    • Filters out type: "thinking" / type: "reasoning" content parts from messages
    • Replaces stripped content with [assistant reasoning omitted]

    The ollama provider has no equivalent response sanitizer for Kimi models.
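
To make the gap concrete, here is a minimal sketch of the field-level stripping described in item 3. It is not the actual stripOpencodeGoKimiReasoningPayload source: the function name stripKimiReasoningFields, the message shapes, and the rule for when the placeholder is inserted are all assumptions made for illustration.

```typescript
// Hypothetical message shapes; the real OpenClaw types may differ.
type ContentPart = { type: string; text?: string };
type AssistantMessage = {
  role: string;
  content: string | ContentPart[];
  [key: string]: unknown;
};

const REASONING_FIELDS = [
  "reasoning",
  "reasoning_details",
  "reasoning_content",
  "reasoning_text",
];
const REASONING_PART_TYPES = new Set(["thinking", "reasoning"]);
const PLACEHOLDER = "[assistant reasoning omitted]";

// Hypothetical helper, named here only for illustration.
function stripKimiReasoningFields(message: AssistantMessage): AssistantMessage {
  // Shallow copy so the caller's object is not mutated.
  const cleaned: AssistantMessage = { ...message };
  for (const field of REASONING_FIELDS) {
    delete cleaned[field];
  }
  if (Array.isArray(cleaned.content)) {
    const kept = cleaned.content.filter(
      (part) => !REASONING_PART_TYPES.has(part.type),
    );
    // Assumption: the placeholder is used only when nothing but reasoning remains.
    cleaned.content =
      kept.length > 0 ? kept : [{ type: "text", text: PLACEHOLDER }];
  }
  return cleaned;
}
```

Note that a sanitizer of this shape only removes dedicated reasoning fields and typed content parts. The leak reported here sits inside the plain assistant text, so the Ollama provider would additionally need delimiter-based stripping of the text itself.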

Suggested fix

Add a Kimi-specific response sanitizer in the Ollama provider, analogous to what exists for opencode-go:

Option A (provider-level): In createConfiguredOllamaCompatStreamWrapper, when isOllamaCloudKimiModelRef(modelId) is true, wrap the stream with a response interceptor that strips inline reasoning text from assistant message content before it reaches the user.
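
A rough sketch of what the Option A interceptor could look like for streamed text. The factory name createInlineReasoningStripper and its push/flush interface are hypothetical, and the delimiter is passed in as a parameter because the exact boundary token is not reproduced in this report.

```typescript
// Hypothetical incremental stripper for streamed assistant text.
// `delimiter` is a stand-in for the real boundary token.
function createInlineReasoningStripper(delimiter: string) {
  let buffer = "";
  let pastDelimiter = false;
  return {
    // Feed one streamed chunk; returns the text that is safe to show.
    push(chunk: string): string {
      if (pastDelimiter) return chunk;
      // Accumulate so a delimiter split across chunks is still found.
      buffer += chunk;
      const index = buffer.indexOf(delimiter);
      if (index === -1) return "";
      pastDelimiter = true;
      const visible = buffer.slice(index + delimiter.length);
      buffer = "";
      return visible;
    },
    // Call at end of stream. If no delimiter ever arrived, the buffered
    // text is assumed to be an ordinary answer and is released as-is.
    flush(): string {
      if (pastDelimiter) return "";
      const rest = buffer;
      buffer = "";
      return rest;
    },
  };
}
```

The trade-off in this design is latency: a reply that never contains the delimiter is held back until flush, so a real implementation might cap the buffer size or apply the wrapper only when isOllamaCloudKimiModelRef(modelId) is true.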

Option B (gateway-level): Add a general stripInlineReasoningFromAssistantText utility in the message processing pipeline that recognizes the boundary delimiter (or an equivalent) and splits off or omits the reasoning portion.
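
As a sketch of Option B, the utility could be a pure function over the finished assistant text. The function name comes from the suggestion above, but the signature and the behavior are assumptions, and the delimiter is a parameter because the real boundary token is not reproduced in this report.

```typescript
// Sketch only: the signature and behavior are assumptions.
// `delimiter` is a stand-in for the real boundary token.
function stripInlineReasoningFromAssistantText(
  text: string,
  delimiter: string,
): string {
  // An empty delimiter would match at index 0, so treat it as "no stripping".
  if (delimiter.length === 0) return text;
  // Use the first occurrence so that a delimiter appearing later,
  // inside the real answer, does not truncate it.
  const index = text.indexOf(delimiter);
  if (index === -1) return text;
  return text.slice(index + delimiter.length).trimStart();
}
```

Text without the delimiter passes through unchanged, so the function would be safe to apply to models that never emit inline reasoning.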

Option C (model registry): Mark kimi-k2.6:cloud and kimi-k2.5:cloud under the Ollama provider as reasoning: true with a reasoningOutputMode: "inline" so the gateway knows to apply stripping regardless of what the model claims.
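
Option C might look roughly like the following. reasoningOutputMode is the field proposed above and is not known to exist in OpenClaw today. The OllamaModelEntry type and the needsInlineReasoningStripping helper are likewise hypothetical.

```typescript
// Hypothetical registry entry shape; only `reasoning` and `params`
// appear in the config excerpt from this report.
type OllamaModelEntry = {
  id: string;
  name: string;
  reasoning: boolean;
  reasoningOutputMode?: "inline"; // proposed field, not an existing option
  params?: { num_ctx?: number };
};

const kimiCloudModels: OllamaModelEntry[] = [
  {
    id: "kimi-k2.6:cloud",
    name: "kimi-k2.6:cloud",
    reasoning: true,
    reasoningOutputMode: "inline",
    params: { num_ctx: 262144 },
  },
  {
    id: "kimi-k2.5:cloud",
    name: "kimi-k2.5:cloud",
    reasoning: true,
    reasoningOutputMode: "inline",
  },
];

// The gateway could consult the registry instead of trusting the model.
function needsInlineReasoningStripping(entry: OllamaModelEntry): boolean {
  return entry.reasoning && entry.reasoningOutputMode === "inline";
}
```

With a flag like this, the decision to strip is driven by the registry rather than by whether the model honors think: false.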

Workarounds

  • Switch to a non-reasoning model (e.g., llama3.2:3b, gemma4)
  • Modify the model's Ollama Modelfile to inject a system prompt forbidding reasoning output

Impact

  • Severity: Medium-High — leaks internal decision-making and planning to user-visible chat
  • Affected channels: All (webchat, Discord, Telegram, etc.)
  • Frequency: Every multi-turn assistant message with ollama/kimi-k2.6:cloud

Labels

  • P1 - High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:needs-live-repro - ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
  • clawsweeper:needs-security-review - ClawSweeper marked this issue as needing security-sensitive review.
  • clawsweeper:no-new-fix-pr - ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • impact:auth-provider - Auth, provider routing, model choice, or SecretRef resolution may break.
  • impact:security - Security boundary, credential, authz, sandbox, or sensitive-data risk.
  • issue-rating: 🐚 platinum hermit - Good issue quality with a plausible reproduction path needing some confirmation.
