fix(ollama): strip inline kimi cloud reasoning leak #86286

Closed
jason-allen-oneal wants to merge 12 commits into openclaw:main from jason-allen-oneal:fix/issue-86129

Conversation

@jason-allen-oneal

@jason-allen-oneal jason-allen-oneal commented May 25, 2026

Contributor

Summary

  • add Ollama Kimi-cloud response sanitizer for leaked inline reasoning prefixes in visible assistant text
  • scope sanitizer to kimi-k*:cloud and boundary-delimited payloads only, with conservative guards to avoid clipping normal output
  • add regression coverage in Ollama stream runtime tests
  • buffer Kimi-cloud streaming output until the inline reasoning boundary is safe, so raw reasoning prefixes are not emitted while the response is live

Changes

  • extensions/ollama/src/stream.ts
    • added stripKimiInlineReasoningFromVisibleText(...)
    • invoke sanitizer from buildAssistantMessage(...) before publishing visible text blocks
    • route streaming visible text through the same Kimi-cloud boundary decision before text_delta, text_end, partials, and final done
  • extensions/ollama/src/stream-runtime.test.ts
    • new test: strips inline reasoning prefix for kimi-k2.6:cloud
    • new test: does not strip boundary marker for non-Kimi models
    • new test: buffers split Kimi inline reasoning until the answer boundary is safe

Real behavior proof

Behavior addressed: Ollama Kimi cloud responses can include private reasoning in the visible content stream before the final answer boundary. OpenClaw must not emit that prefix in live streaming events or in the saved final assistant message.

Real environment tested: bob@isengard, /tmp/openclaw-86286-pr at PR head aaddcf0a3e56bd3eb1afceea4546d93efc1bff24, running OpenClaw's real createOllamaStreamFn against a local Ollama-compatible HTTP NDJSON server. This exercised the actual OpenClaw Ollama stream parser and event builder rather than the final message constructor alone.
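For readers unfamiliar with the wire format, the following is a minimal sketch of how an issue-shaped payload could be split into NDJSON lines the way an Ollama-compatible chat stream delivers them. It is not the proof script referenced in this PR: `toNdjson`, `parseNdjson`, the piece size, and the sample text are hypothetical, and only the chunk fields used here are modeled.

```typescript
// Shape of one streamed chat chunk, trimmed to the fields this sketch uses.
interface OllamaChatChunk {
  model: string;
  created_at: string;
  message: { role: "assistant"; content: string };
  done: boolean;
}

// Hypothetical helper: split `content` into fixed-size pieces, serialize each
// piece as one NDJSON line, and finish with an empty `done: true` chunk.
function toNdjson(model: string, content: string, pieceSize: number): string {
  const createdAt = "2026-01-01T00:00:00Z";
  const lines: string[] = [];
  for (let i = 0; i < content.length; i += pieceSize) {
    const chunk: OllamaChatChunk = {
      model,
      created_at: createdAt,
      message: { role: "assistant", content: content.slice(i, i + pieceSize) },
      done: false,
    };
    lines.push(JSON.stringify(chunk));
  }
  const last: OllamaChatChunk = {
    model,
    created_at: createdAt,
    message: { role: "assistant", content: "" },
    done: true,
  };
  lines.push(JSON.stringify(last));
  return lines.join("\n") + "\n";
}

// Hypothetical helper: parse an NDJSON body back into chunks, skipping blanks.
function parseNdjson(body: string): OllamaChatChunk[] {
  return body
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as OllamaChatChunk);
}
```

Small piece sizes are useful here because they force the hidden prefix, the U+FE0F marker, and the answer into separate chunks, which is the case the streaming path has to handle.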

Exact steps or command run after this patch: /home/bob/.nvm/versions/node/v22.22.0/bin/node --import tsx /tmp/openclaw-86286-real-proof.mjs

Evidence after fix: Terminal output from that command included only the cleaned answer in the emitted OpenClaw stream events:

[
  {
    "type": "text_delta",
    "delta": "Final answer only.",
    "partial": { "content": [{ "type": "text", "text": "Final answer only." }] }
  },
  {
    "type": "text_end",
    "content": "Final answer only.",
    "partial": { "content": [{ "type": "text", "text": "Final answer only." }] }
  },
  {
    "type": "done",
    "message": { "content": [{ "type": "text", "text": "Final answer only." }] }
  }
]
RESULT: OpenClaw streamed only the cleaned answer; hidden prefix was not emitted.

Observed result after fix: The injected hidden prefix text (The user is asking for a short answer...) did not appear in text_delta, text_end, partial content, or the final done message. The only text the stream emitted was "Final answer only."
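The check described above can be sketched as a small scan over the captured events. The event types below are inferred only from the JSON output shown in this PR description and are not OpenClaw's actual type definitions; `visibleStrings` and `findLeaks` are hypothetical names.

```typescript
interface TextBlock {
  type: "text";
  text: string;
}
interface Snapshot {
  content: TextBlock[];
}
// Event shapes inferred from the terminal output above (an assumption).
type StreamEvent =
  | { type: "text_delta"; delta: string; partial: Snapshot }
  | { type: "text_end"; content: string; partial: Snapshot }
  | { type: "done"; message: Snapshot };

// Collect every user-visible string carried by one event.
function visibleStrings(event: StreamEvent): string[] {
  const blocks = event.type === "done" ? event.message.content : event.partial.content;
  const texts = blocks.map((block) => block.text);
  if (event.type === "text_delta") texts.push(event.delta);
  if (event.type === "text_end") texts.push(event.content);
  return texts;
}

// Return the events whose visible text contains the hidden prefix.
function findLeaks(events: StreamEvent[], hiddenPrefix: string): StreamEvent[] {
  return events.filter((event) =>
    visibleStrings(event).some((text) => text.includes(hiddenPrefix)),
  );
}
```

An empty result from `findLeaks` for all four surfaces (delta, end, partial snapshot, final message) corresponds to the observed result reported here.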

What was not tested: A live vendor response that naturally reproduces the delimiter leak was not captured; direct calls to the available Ollama Kimi cloud key did not reproduce the delimiter leak during investigation, so the proof used a local Ollama-compatible stream with the observed leaked wire shape.

Validation

  • /home/bob/.nvm/versions/node/v22.22.0/bin/node --import tsx /tmp/openclaw-86286-real-proof.mjs
  • node scripts/run-vitest.mjs extensions/ollama/src/stream-runtime.test.ts
  • node scripts/run-vitest.mjs src/agents/model-catalog-visibility.test.ts

Fixes #86129

Copilot AI review requested due to automatic review settings May 25, 2026 02:10
@openclaw-barnacle openclaw-barnacle Bot added extensions: ollama size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 25, 2026
@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 25, 2026, 9:23 AM ET / 13:23 UTC.

Summary
The PR adds an Ollama Kimi-cloud visible-content sanitizer, wires it into final and streaming assistant output, and adds Ollama stream regression tests for delimiter, buffering, emoji, short-answer, and tool-call cases.

PR surface: Source +183, Tests +473. Total +656 across 6 files.

Reproducibility: yes. source-reproducible: current main forwards Ollama message.content directly into visible text deltas and final messages, so the linked Kimi-cloud delimiter payload would be rendered. I did not run a live vendor reproduction in this read-only review.

Review metrics: 2 noteworthy metrics.

  • Changed Files: 6 files, +714/-58. The patch is large for a provider hot path, so maintainer review should focus on sanitizer scope and streaming semantics.
  • Visible Output Surfaces: 4 surfaces sanitized. The fix covers text_delta, text_end, partial snapshots, and final done message content, which are the user-visible leak points before channels render output.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • none

Risk before merge

  • The PR intentionally changes live Kimi-cloud streaming by holding short visible content until a boundary, final response, or bypass window decision, so it can alter text_delta timing even when final content is correct.
  • The sanitizer is based on the observed delimiter payload and compatible local stream proof rather than a naturally reproduced live vendor response; false negatives would leak hidden reasoning, while false positives could suppress visible text.

Maintainer options:

  1. Accept The Scoped Sanitizer (recommended)
    Merge after required checks if maintainers accept the local-compatible stream proof and the bounded Kimi-cloud buffering tradeoff.
  2. Hold For Live Vendor Capture
    Ask for a live Ollama Kimi Cloud before/after transcript or redacted logs if maintainers want proof from the natural vendor leak before landing.
  3. Pause For Provider Contract Work
    Pause this PR only if maintainers want a broader cross-provider reasoning-output contract instead of a narrow Ollama Kimi-cloud workaround.

Next step before merge
No automated repair is queued because the current head has no concrete code finding; maintainers need to accept or reject the scoped sanitizer and residual live-vendor proof gap.

Security
Cleared: No dependency, workflow, credential, or supply-chain changes were found; the patch is security-sensitive because it gates hidden reasoning visibility, but I found no concrete introduced security defect.

Review details

Best possible solution:

Land a narrow provider-local Ollama Kimi-cloud sanitizer only after maintainers accept the bounded buffering behavior, delimiter assumptions, and regression coverage as sufficient for this security-sensitive leak.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible: current main forwards Ollama message.content directly into visible text deltas and final messages, so the linked Kimi-cloud delimiter payload would be rendered. I did not run a live vendor reproduction in this read-only review.

Is this the best way to solve the issue?

Yes, with maintainer risk acceptance: keeping the workaround scoped to the Ollama provider and Kimi-cloud model refs is narrower than a gateway-wide inline reasoning stripper. The main remaining question is proof/risk tolerance, not a clearer code repair.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against e761eb8f3e1d.

Label changes

Label justifications:

  • P1: The PR addresses a user-visible hidden-reasoning leak in active Ollama Kimi-cloud chats, which is an urgent provider/message workflow defect.
  • merge-risk: 🚨 message-delivery: Merging changes when and whether Kimi-cloud visible text deltas are emitted, including buffering and possible suppression around delimiter decisions.
  • merge-risk: 🚨 security-boundary: The changed code decides whether hidden reasoning reaches user-visible output, so false negatives or false positives affect sensitive assistant content boundaries.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body and follow-up comments include after-fix terminal output from the real createOllamaStreamFn path against a local Ollama-compatible NDJSON server showing the hidden prefix absent from text_delta, text_end, partial, and done output.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and follow-up comments include after-fix terminal output from the real createOllamaStreamFn path against a local Ollama-compatible NDJSON server showing the hidden prefix absent from text_delta, text_end, partial, and done output.
Evidence reviewed

PR surface:

Source +183, Tests +473. Total +656 across 6 files.

Area       Files  Added  Removed  Net
Source     5      240    57       +183
Tests      1      474    1        +473
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      6      714    58       +656

What I checked:

  • Current main forwards visible Ollama content unsanitized: At current main e761eb8, buildAssistantMessage pushes response.message.content directly into a visible text block, and the stream path appends chunk.message.content directly into text_delta/partial state; this matches the linked issue's leak path. (extensions/ollama/src/stream.ts:954, e761eb8f3e1d)
  • PR scopes the sanitizer to Ollama Cloud Kimi refs: The new sanitizer normalizes provider-qualified refs, applies only to wire model ids starting with kimi-k and containing :cloud, and uses a whitespace/start-delimited U+FE0F marker plus prefix-length and pending-window guards. (extensions/ollama/src/sanitizers/kimi-inline-reasoning.ts:15, 0559f4d1e4e8)
  • PR sanitizes before live and final publication: The PR routes streaming visible content through createOllamaVisibleContentSanitizer before text_start/text_delta/text_end/partial state and sets finalResponse.message.content to the sanitized accumulated visible content before done. (extensions/ollama/src/stream.ts:1220, 0559f4d1e4e8)
  • Regression tests cover the risky stream surfaces: The PR adds tests for buffering hidden Kimi reasoning until the boundary is safe, preserving marker-less/bypassed output, keeping deltas append-only after bypass, avoiding double sanitization before done, and dropping hidden-prefix-only tool-call output. (extensions/ollama/src/stream-runtime.test.ts:1844, 0559f4d1e4e8)
  • Maintainer follow-up addressed earlier brittle cases: The discussion shows osolmaz asked not to merge until the sanitizer handled marker whitespace and short answers; later comments state the current head covers those cases, keeps emoji variation selectors intact, and keeps stream deltas append-only. (0559f4d1e4e8)
  • Relevant checks and proof state: GitHub check-runs for head 0559f4d show a latest Real behavior proof success and relevant node/quality checks succeeding; an earlier canceled Real behavior proof run is superseded by the later success. (0559f4d1e4e8)

Likely related people:

  • FullerStackDev: git blame attributes the current main Ollama buildAssistantMessage and streaming emission blocks to commit 44bb2be. (role: current stream implementation author; confidence: medium; commits: 44bb2be0b473; files: extensions/ollama/src/stream.ts, extensions/ollama/src/stream-runtime.test.ts)
  • Peter Steinberger: git show identifies Peter Steinberger as the committer of the current-main commit that introduced the Ollama stream/test baseline in this checkout history. (role: committer for current stream baseline; confidence: medium; commits: 44bb2be0b473; files: extensions/ollama/src/stream.ts, extensions/ollama/src/stream-runtime.test.ts)
  • osolmaz: The linked issue is assigned to osolmaz, and the PR discussion/commit list shows osolmaz drove the later sanitizer fixes after maintainer review comments. (role: linked issue assignee and reviewer/follow-up contributor; confidence: medium; commits: aaddcf0a3e56, 51636586b0d8, 0559f4d1e4e8; files: extensions/ollama/src/stream.ts, extensions/ollama/src/stream-runtime.test.ts, extensions/ollama/src/sanitizers/kimi-inline-reasoning.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Copilot AI left a comment

Contributor

Pull request overview

This PR adds an Ollama-provider response sanitizer to prevent Kimi cloud models from leaking inline “reasoning” into user-visible assistant text, and introduces regression tests around buildAssistantMessage().

Changes:

  • Add stripKimiInlineReasoningFromVisibleText(...) for kimi-k*:cloud responses and apply it when building final assistant text content.
  • Add stream-runtime tests verifying stripping for kimi-k2.6:cloud and non-stripping for non-Kimi models.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File: Description
  • extensions/ollama/src/stream.ts: Introduces and applies a Kimi cloud inline-reasoning sanitizer during final assistant message construction.
  • extensions/ollama/src/stream-runtime.test.ts: Adds regression tests for the sanitizer behavior in buildAssistantMessage().
Comments suppressed due to low confidence (1)

extensions/ollama/src/stream.ts:987

  • The sanitizer is only applied when building the final assistant message here; the streaming path above emits text_delta events using accumulatedContent without any stripping. If downstream consumers render streaming deltas, the inline reasoning still leaks until done. Consider applying the same stripping logic to the streaming partials/deltas (or maintaining a separate visible accumulator) so users never see the reasoning prefix during streaming.
  const text = stripKimiInlineReasoningFromVisibleText({
    modelId: modelInfo.id,
    text: response.message.content || "",
  });
  if (text) {

Comment thread extensions/ollama/src/stream.ts Outdated
Comment on lines +364 to +368
const KIMI_INLINE_REASONING_BOUNDARY = "\uFE0F"; // U+FE0F (variation selector-16), invisible when written literally
const KIMI_INLINE_REASONING_MIN_PREFIX_CHARS = 80;
const KIMI_INLINE_REASONING_MIN_ANSWER_CHARS = 8;

function stripKimiInlineReasoningFromVisibleText(params: {

Member

Addressed in the current head. The delimiter is now matched as whitespace/start + \uFE0F with optional following whitespace in the Kimi sanitizer module, and the old bare indexOf marker no longer exists in stream.ts.

Comment on lines +949 to +956
it("strips inline reasoning prefix from kimi cloud visible text", () => {
const response = {
model: "kimi-k2.6:cloud",
created_at: "2026-01-01T00:00:00Z",
message: {
role: "assistant" as const,
content:
"I should think privately and not leak this planning text in the answer. I need to keep deciding what to say next. ️Final answer only.",

Member

Addressed in the current head with the regression test "does not treat emoji variation selectors as Kimi inline-reasoning boundaries".

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 25, 2026
@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Neon Signal Puff

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: hums during re-review.
Image traits: location green-check meadow; accessory miniature diff map; palette moss green and polished brass; mood curious; pose standing beside its cracked shell; shell brushed metal shell; lighting golden review-room light; background gentle dashboard dots.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Neon Signal Puff in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot added P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. labels May 25, 2026
@jason-allen-oneal

Contributor Author

Addressed the two blocking patch findings in this PR head (3bc2cbcb8e):

  1. Moved Kimi inline-reasoning sanitization into the streaming output path so streamed events now sanitize before publication.

    • text_delta
    • text_end
    • partial content snapshots
    • final done message content
  2. Replaced the bare U+FE0F split with a boundary-specific matcher (/(^|\s)\uFE0F(?=\S)/u) to avoid emoji variation-selector false positives.

Added regression tests in extensions/ollama/src/stream-runtime.test.ts:

  • does not treat emoji variation selectors as Kimi inline-reasoning boundaries
  • sanitizes Kimi inline reasoning in text_delta, text_end, partial, and done output
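The boundary matcher quoted in item 2 can be exercised in isolation. The sketch below uses that regular expression as written in this comment; the `hasBoundary` wrapper and the sample strings are hypothetical. It shows why an emoji variation selector is not treated as a boundary, and also that this form of the matcher does not fire when whitespace follows the marker.

```typescript
// Matcher as quoted in this comment: U+FE0F at the start of the text or after
// whitespace, immediately followed by a non-whitespace character.
const QUOTED_BOUNDARY = /(^|\s)\uFE0F(?=\S)/u;

// Hypothetical wrapper used only for this demonstration.
function hasBoundary(text: string): boolean {
  return QUOTED_BOUNDARY.test(text);
}

// Marker preceded by a space and followed directly by the answer: matches.
const leaked = "planning text \uFE0FFinal answer only.";

// U+2764 followed by U+FE0F (an emoji presentation sequence): the selector is
// preceded by the heart character, not whitespace, so it does not match.
const emoji = "I \u2764\uFE0F this answer";

// Marker followed by whitespace: the (?=\S) lookahead fails, so no match.
const spaced = "planning text \uFE0F Final answer only.";

console.log(hasBoundary(leaked), hasBoundary(emoji), hasBoundary(spaced));
```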

Validation rerun after fix:

  • pnpm vitest run extensions/ollama/src/stream-runtime.test.ts
  • Result: 1 file passed, 86 tests passed

After-fix real behavior proof status (live Kimi Cloud): still unavailable in this runtime due to subscription gating.
Fresh attempts on this branch head:

$ printf "Please answer with one sentence only.\n" | ollama run kimi-k2.6:cloud --think false
Error: 403 Forbidden: this model requires a subscription, upgrade for access: https://ollama.com/upgrade (ref: e396ff34-3d7b-4c77-82ad-616fb47d9e59)

$ printf "Please answer with one sentence only.\n" | ollama run kimi-k2.5:cloud --think false
Error: 403 Forbidden: this model requires a subscription, upgrade for access: https://ollama.com/upgrade (ref: ca10fecd-52a2-459d-b978-878a3b610410)

Redacted local artifacts:

  • /tmp/pr-86286-proof/kimi-k26-after-fix.txt
  • /tmp/pr-86286-proof/kimi-k25-after-fix.txt

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 25, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 25, 2026
@osolmaz

osolmaz commented May 25, 2026

Member

Maintainer note: please do not merge this PR yet.

The direction is right: this should stay as a narrow Ollama Kimi cloud fix, and the stream path must be fixed before text_delta, text_end, partials, and done are emitted.

The current implementation is still too brittle in two ways:

  1. It only recognizes the boundary when the answer starts immediately after the marker. It should handle both forms:

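    • reasoning \uFE0Fanswer (marker immediately followed by the answer)
    • reasoning \uFE0F answer (marker followed by whitespace, then the answer)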
  2. It refuses to strip short answers. That can still leak reasoning for valid answers like:

    • reasoning \uFE0F OK.
    • reasoning \uFE0F Yes.

The cleaner production fix is:

  • keep the workaround scoped to Ollama Kimi cloud only
  • use one small helper to classify the accumulated visible text
  • if there is a long reasoning-looking prefix, the marker, and any non-empty answer, emit only the answer
  • during streaming, buffer Kimi cloud output until that decision is safe
  • if the stream ends without a credible marker, emit the original text normally
  • use the same helper for both live streaming and the final saved assistant message

That keeps the fix narrow and avoids guessing across other providers, while covering the actual leak shapes this PR is trying to address.
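A minimal sketch of the kind of helper described above, under stated assumptions: `classifyVisibleText` and `Decision` are hypothetical names, the 80-character prefix threshold is borrowed from the constant in the outdated diff shown earlier in this thread, and the real module may decide differently.

```typescript
// Whitespace or start of text, then U+FE0F, then optional whitespace.
const BOUNDARY = /(?:^|\s)\uFE0F\s*/u;
// Assumption: same threshold as the outdated constant quoted in this thread.
const MIN_PREFIX_CHARS = 80;

type Decision =
  | { kind: "keep"; text: string }
  | { kind: "strip"; answer: string };

// Classify accumulated visible text: strip only when a long prefix, the
// marker, and a non-empty answer are all present; otherwise keep the original.
function classifyVisibleText(text: string): Decision {
  const match = BOUNDARY.exec(text);
  if (!match) return { kind: "keep", text };
  const prefix = text.slice(0, match.index).trim();
  const answer = text.slice(match.index + match[0].length);
  if (prefix.length < MIN_PREFIX_CHARS || answer.trim() === "") {
    return { kind: "keep", text };
  }
  return { kind: "strip", answer };
}
```

Because the answer only has to be non-empty, short replies such as OK. are stripped the same way as long ones, and both marker forms listed above resolve to the same decision.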

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@clawsweeper clawsweeper Bot added the status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. label May 25, 2026
@osolmaz

osolmaz commented May 25, 2026

Member

Implementation and validation update for head 51636586b0d8a484bcd94ccc66f45b0e60bb372c:

What changed:

  • The Ollama Kimi Cloud final-message sanitizer and live stream sanitizer now use the same visible-text decision helper.
  • The delimiter matcher covers the reported space + U+FE0F + optional whitespace boundary.
  • Short valid answers such as OK. are stripped correctly after a credible hidden prefix.
  • Streaming now buffers only while the boundary decision is still safe; if the bounded hold window is exceeded, it bypasses sanitization consistently so deltas remain append-only.
  • The fix remains scoped to Ollama Cloud Kimi models.
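The buffering behavior in the list above can be illustrated with a small sketch. This is not the code in this PR: `HeldVisibleText`, the 2048-character hold window, and the choice to drop a prefix-only stream at the end are assumptions made for illustration.

```typescript
const STREAM_BOUNDARY = /(?:^|\s)\uFE0F\s*/u;
const STREAM_MIN_PREFIX_CHARS = 80; // assumption, see lead-in
const MAX_HOLD_CHARS = 2048; // hypothetical bounded hold window

// Holds visible text until the boundary decision is safe, then passes every
// later chunk through unchanged so emitted deltas stay append-only.
class HeldVisibleText {
  private buffer = "";
  private released = false;

  // Returns the text that is safe to emit for this chunk (possibly empty).
  push(chunk: string): string {
    if (this.released) return chunk;
    this.buffer += chunk;
    const match = STREAM_BOUNDARY.exec(this.buffer);
    if (match) {
      const prefix = this.buffer.slice(0, match.index).trim();
      // Marker after a short prefix is not credible: bypass sanitization.
      if (prefix.length < STREAM_MIN_PREFIX_CHARS) return this.release(this.buffer);
      const answer = this.buffer.slice(match.index + match[0].length);
      // Credible marker but no answer yet: keep holding.
      return answer.trim() === "" ? "" : this.release(answer);
    }
    // No marker within the hold window: bypass and emit everything held.
    return this.buffer.length > MAX_HOLD_CHARS ? this.release(this.buffer) : "";
  }

  // Called when the stream ends; returns whatever is still safe to emit.
  finish(): string {
    if (this.released) return "";
    const match = STREAM_BOUNDARY.exec(this.buffer);
    if (match && this.buffer.slice(0, match.index).trim().length >= STREAM_MIN_PREFIX_CHARS) {
      // Sketch choice: a credible prefix with no answer is dropped, not shown.
      return this.release(this.buffer.slice(match.index + match[0].length).trim());
    }
    return this.release(this.buffer);
  }

  private release(text: string): string {
    this.released = true;
    this.buffer = "";
    return text;
  }
}
```

Once `release` has run, the sketch never rewrites earlier output, which is what keeps the deltas append-only after either a strip or a bypass decision.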

Validation run:

  • git diff --check passed.
  • node scripts/run-vitest.mjs extensions/ollama/src/stream-runtime.test.ts passed.
  • Real behavior proof passed using createOllamaStreamFn against a local Ollama-compatible NDJSON server; streamed text_delta, text_end, and final done content were only OK., with no hidden prefix emitted.
  • codex review --base main completed cleanly with no actionable findings after the append-only streaming fix.
  • GitHub PR checks are green; the previously failing checks-node-agentic-agents shard passed on rerun, and Real behavior proof is passing.
  • ClawSweeper re-review updated the PR to status: 👀 ready for maintainer look, proof: sufficient, and rating: 🐚 platinum hermit with no rank-up moves.

Remaining note: this is behavior proof through OpenClaw's actual Ollama stream path with a compatible test server, not a live vendor Kimi Cloud capture. The live vendor prompts I tried earlier did not reproduce the delimiter payload, but the issue-shaped payload is now covered in both unit tests and stream behavior proof.

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 25, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@osolmaz

osolmaz commented May 25, 2026

Member

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. labels May 25, 2026
@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@osolmaz

osolmaz commented May 25, 2026

Member

Final implementation/validation update for head 0559f4d1e4e800308827f0a080f9f918a91d94dc.

What changed:

  • Kimi-specific delimiter/model logic now lives in extensions/ollama/src/sanitizers/kimi-inline-reasoning.ts.
  • stream.ts uses the generic visible-content sanitizer boundary instead of owning Kimi constants or heuristics.
  • The sanitizer handles provider-qualified refs like ollama/kimi-k2.6:cloud as well as bare Kimi cloud refs.
  • Streaming final message construction no longer re-sanitizes already-sanitized visible stream output.
  • Added regressions for provider-qualified refs, emoji variation selectors, empty/short answers, tool-call-only streams, bounded markerless buffering, and visible answers that contain a later U+FE0F marker.
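As an illustration of the model scoping mentioned above, here is a minimal sketch of a ref check. The rule (wire id starts with kimi-k and contains :cloud) follows the review summary in this thread; `isKimiCloudModelRef`, the lowercasing, and stripping only a leading ollama/ prefix are assumptions rather than the module's actual normalization.

```typescript
// Assumption: the only provider qualifier handled here is a leading "ollama/".
const PROVIDER_PREFIX = "ollama/";

// Returns true for bare or provider-qualified Ollama Cloud Kimi refs.
function isKimiCloudModelRef(modelRef: string): boolean {
  const normalized = modelRef.trim().toLowerCase();
  const wireId = normalized.startsWith(PROVIDER_PREFIX)
    ? normalized.slice(PROVIDER_PREFIX.length)
    : normalized;
  return wireId.startsWith("kimi-k") && wireId.includes(":cloud");
}

console.log(isKimiCloudModelRef("ollama/kimi-k2.6:cloud")); // provider-qualified ref
console.log(isKimiCloudModelRef("llama3.2:latest")); // non-Kimi model
```

Keeping this check separate from the delimiter logic means a non-Kimi model that happens to emit U+FE0F is never routed through the sanitizer at all.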

Validation:

  • git diff --check -> passed.
  • node scripts/run-vitest.mjs extensions/ollama/src/stream-runtime.test.ts -> passed, 95 tests.
  • pnpm check:architecture -> passed.
  • pnpm lint:extensions -- extensions/ollama/src/stream.ts extensions/ollama/src/model-behavior.ts extensions/ollama/src/sanitizers/visible-content-contract.ts extensions/ollama/src/sanitizers/visible-content.ts extensions/ollama/src/sanitizers/kimi-inline-reasoning.ts extensions/ollama/src/stream-runtime.test.ts -> passed.
  • NODE_OPTIONS=--max-old-space-size=8192 ./node_modules/.bin/tsc -p test/tsconfig/tsconfig.extensions.test.json --noEmit --pretty false -> passed.
  • node --import tsx /Users/onur/scratch-repo/openclaw-86286-real-proof.mjs -> passed; OpenClaw streamed only OK. in text_delta, text_end, and final done, with no hidden prefix emitted.
  • codex review --base main -> clean on the latest head; no discrete actionable regressions found.

PR checks:

  • GitHub checks are green on 0559f4d1e4, including check-additional-runtime-topology-architecture, check-dependencies, check-test-types, and Real behavior proof.
  • ClawSweeper re-review completed on 0559f4d1e4 and says the PR is ready for maintainer review with no contributor-facing blocker left.
  • Replied to the stale Copilot inline comments about bare U+FE0F matching and missing emoji regression coverage.

Remaining proof note: the behavior proof is through OpenClaw's actual Ollama stream path against a local Ollama-compatible NDJSON server. It is not a natural live vendor leak capture.

@osolmaz

osolmaz commented May 25, 2026

Member

Live vendor reproduction was not stable in our environment.

We could call Ollama Cloud with the subscribed key, but current live responses from kimi-k2.6:cloud and kimi-k2.5:cloud were clean: no inline reasoning prefix, no U+FE0F boundary, and no structured thinking/reasoning fields.

This PR still fixes the reported wire shape from the issue and has stream-path proof using OpenClaw's real Ollama handler against an Ollama-compatible stream.

@osolmaz

osolmaz commented May 25, 2026

Member

@clawsweeper automerge

@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor

🦞🔧
ClawSweeper automerge is enabled.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

  • 2026-05-25 14:02:18 UTC review queued 0559f4d1e4e8 (queued)

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 25, 2026
@clawsweeper

clawsweeper Bot commented May 25, 2026

Contributor

ClawSweeper 🐠 reef update

Thanks for the work on this. ClawSweeper opened a replacement PR only because the source branch was not writable from the available bot permissions. Branch tides, not contributor blame.

Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
Replacement PR: #86515
Why close: this run explicitly closes the superseded source PR after the credited replacement PR is open, so review continues in one place.
Closing this one because the run was configured to close superseded source PRs after opening the replacement.
The original contribution stays credited in the replacement PR context.
Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against b709229.

@clawsweeper clawsweeper Bot closed this May 25, 2026
Labels

clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge extensions: ollama merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: L status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ollama provider: missing response-level reasoning stripper for Kimi models causes inline reasoning leak to chat

3 participants