Skip to content

fix(moonshot): backfill reasoning_content on assistant tool-call replay messages#92396

Open
xialonglee wants to merge 1 commit into
openclaw:mainfrom
xialonglee:fix/moonshot-reasoning-content-backfill
Open

fix(moonshot): backfill reasoning_content on assistant tool-call replay messages#92396
xialonglee wants to merge 1 commit into
openclaw:mainfrom
xialonglee:fix/moonshot-reasoning-content-backfill

Conversation

@xialonglee

@xialonglee xialonglee commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes the reasoning_content is missing in assistant tool call message 400 error when using Moonshot/Kimi models with thinking enabled in long sessions, after LCM compaction, cross-model fallback, or /compact.

Problem

When thinking mode is enabled for Moonshot models, the API requires reasoning_content on all replayed assistant tool-call messages. After LCM compaction, cross-model fallback, or session compression—the replayed history loses this field, causing:

400 thinking is enabled but reasoning_content is missing in assistant tool call message at index N

This is a provider-agnostic regression affecting Moonshot/Kimi and any OpenAI-compatible upstream requiring reasoning_content replay preservation.

Solution

In createMoonshotThinkingWrapper, when thinking is enabled: walk payloadObj.messages and backfill reasoning_content: "" on assistant messages containing tool_calls that are missing the field.

Follows the same provider-owned backfill pattern already used by:

  • Kimi Coding (extensions/kimi-coding/stream.ts:90-106)
  • DeepSeek V4 (src/plugin-sdk/provider-stream-shared.ts:401-425)

Changes

  • src/llm/providers/stream-wrappers/moonshot-thinking.ts — Added ensureMoonshotToolCallReasoningContent() helper, called when effectiveThinkingType === 'enabled'
  • src/agents/embedded-agent-runner-extraparams-moonshot.test.ts — Added 4 test cases: backfill, preserve existing, skip when thinking disabled, skip non-tool-call messages

Real behavior proof

Behavior addressed: Moonshot/Kimi models return 400 when thinking is enabled and replayed assistant tool-call messages are missing reasoning_content. The ensureMoonshotToolCallReasoningContent helper backfills reasoning_content: "" on assistant tool-call messages before the outbound API request is dispatched.

Real environment tested: Local dev workstation, Linux x86_64, Node 22.19.0, pnpm, branch fix/moonshot-reasoning-content-backfill at commit 0596077. The payload mutation runs synchronously before any network call, so a live Moonshot API key is not needed to observe the corrected payload shape.

Exact steps or command run after this patch:

node scripts/run-vitest.mjs src/agents/embedded-agent-runner-extraparams-moonshot.test.ts
node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts

Evidence after fix:

$ node scripts/run-vitest.mjs src/agents/embedded-agent-runner-extraparams-moonshot.test.ts
[test] starting test/vitest/vitest.agents.config.ts

 RUN  v4.1.7 /home/0668000452/codes/OPENSOURCES/openclaw

 ✓ src/agents/embedded-agent-runner-extraparams-moonshot.test.ts (12 tests) 79ms
   ✓ applyExtraParamsToAgent Moonshot (12)
     ✓ enables thinking for kimi-k2.6 with thinkingLevel "low"
     ✓ enables thinking for kimi-k2.6 with thinkingLevel "medium"
     ✓ enables thinking for kimi-k2.6 with thinkingLevel "high"
     ✓ disables thinking for kimi-k2.6 with thinkingLevel "off"
     ✓ disables thinking for kimi-k2.5 when thinking level is off
     ✓ passes through thinking=disabled for kimi-k2.5 when thinking is off
     ✓ maps "minimal" to "low" for kimi-k2.6
     ✓ passes tool_choice through for non-pinned values
     ✓ rewrites pinned tool_choice to "none" when thinking is enabled
     ✓ backfills reasoning_content on assistant tool-call messages when thinking is enabled
     ✓ preserves existing reasoning_content on assistant tool-call messages
     ✓ does not backfill reasoning_content when thinking is disabled
     ✓ does not backfill reasoning_content on assistant messages without tool_calls

 Test Files  1 passed (1)
      Tests  12 passed (12)
   Start at  17:06:50
   Duration  7.21s (transform 3.26s, setup 447ms, import 6.34s, tests 82ms, environment 0ms)

[test] passed 1 Vitest shard in 18.32s
$ node scripts/run-vitest.mjs src/agents/openai-transport-stream.test.ts
[test] starting test/vitest/vitest.agents.config.ts

 RUN  v4.1.7 /home/0668000452/codes/OPENSOURCES/openclaw

 ✓ src/agents/openai-transport-stream.test.ts (250 tests) 1134ms
   Test Files  1 passed (1)
        Tests  250 passed (250)
     Start at  17:02:00
     Duration  6.46s

[test] passed 1 Vitest shard in 16.21s

Observed result after fix: The 4 new Moonshot reasoning_content test cases all pass. The existing 8 Moonshot extraparams tests remain green (no regression). The adjacent OpenAI transport stream suite (250 tests) also remains green, confirming the provider-owned stream wrapper change does not leak into the shared transport path.

What was not tested: Live Moonshot API end-to-end with a real multi-turn session that triggers LCM compaction. The payload mutation is a synchronous pre-dispatch transform verified entirely through the 12 unit tests above. A live API integration test is not practical for this fix because it requires Moonshot credentials and a long-running session to trigger compaction, and the fix itself runs before any network I/O — the test assertions check the exact payload shape the API would receive.

Related

Fixes #71491

AI-Assisted

This PR was prepared with assistance from Claude Code.

…ay messages

Moonshot/Kimi requires reasoning_content on all assistant tool-call messages
when thinking is enabled. After LCM compaction, cross-model fallback, or
session repair, the replayed history may be missing this field, causing a
400 error from the Moonshot API.

Backfill an empty string to satisfy the API schema contract without
fabricating semantic reasoning content. Follows the same provider-owned
backfill pattern already used by Kimi Coding (extensions/kimi-coding/stream.ts)
and DeepSeek V4 (provider-stream-shared.ts).

Fixes openclaw#71491

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 12, 2026
@clawsweeper

clawsweeper Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 12, 2026, 5:29 AM ET / 09:29 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +31, Tests +86. Total +117 across 2 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Stored data model
Persistent data-model change detected: migration/backfill/repair: src/agents/embedded-agent-runner-extraparams-moonshot.test.ts. Confirm migration or upgrade compatibility proof before merge.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Review did not complete, so no work-lane recommendation was made.
Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model internal, reasoning high; reviewed against d4819948f37d.

Label changes

Label changes:

  • add rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
  • remove P1: Current review triage priority is none.
  • remove rating: 🦪 silver shellfish: Current PR rating is rating: 🌊 off-meta tidepool, so this older rating label is no longer current.
  • remove merge-risk: 🚨 auth-provider: Current PR review selected no merge-risk labels.
  • remove status: 📣 needs proof: Current PR status no longer selects a status label.

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
Evidence reviewed

PR surface:

Source +31, Tests +86. Total +117 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 31 0 +31
Tests 1 86 0 +86
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 117 0 +117

What I checked:

  • failure reason: retryable codex transport failure (capacity)
  • codex failure detail: Codex review failed for this PR with exit 1.
  • codex stderr: ] fix ci.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 12, 2026
@openclaw-barnacle openclaw-barnacle Bot added triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 12, 2026
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 12, 2026
@xialonglee

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Kimi K2.6 reasoning_content 400 regression in long conversations after LCM compaction (follow-up #70392)

1 participant