Skip to content

fix(moonshot): preserve native Kimi tool_call IDs in openai-completions replay#70030

Merged
steipete merged 2 commits into
openclaw:mainfrom
Cierra0506:fix/moonshot-preserve-native-tool-call-ids
Apr 23, 2026
Merged

fix(moonshot): preserve native Kimi tool_call IDs in openai-completions replay#70030
steipete merged 2 commits into
openclaw:mainfrom
Cierra0506:fix/moonshot-preserve-native-tool-call-ids

Conversation

@Cierra0506

Copy link
Copy Markdown
Contributor

Summary

  • Problem: Moonshot's bundled OpenAI-compatible replay policy strict-sanitizes tool_call IDs down to [a-zA-Z0-9], which rewrites Kimi K2.6's native IDs (functions.<name>:<index>, e.g., functions.read:0functionsread0). Kimi's serving layer then fails to match the mangled IDs back to the original tool definitions in multi-turn history.
  • Why it matters: All multi-turn agentic flows through Kimi K2.6 break after 2–3 tool-calling rounds, with finish_reason: "stop" returned instead of "tool_calls" ~80% of the time.
  • What changed: Added a sanitizeToolCallIds opt-out to the shared openai-compatible replay family helper (buildOpenAICompatibleReplayPolicy + buildProviderReplayFamilyHooks), and wired the Moonshot plugin to opt out. Default behavior for all other openai-compatible providers is unchanged.
  • What did NOT change (scope boundary): Not touching the generic ToolCallIdMode enum or sanitizeToolCallId() implementation. Not touching the kimi-coding plugin (its policy is minimal and runs on anthropic-messages, not openai-completions; the 2026-04-10 comment about it being "also affected" is not reproducible from the code and deserves its own repro before scope expansion). No @mariozechner/pi-ai patches.

Change Type (select all)

  • Bug fix
  • Refactor required for the fix

Scope (select all touched areas)

  • API / contracts
  • Integrations

(Additive opt-out on buildProviderReplayFamilyHooks / buildOpenAICompatibleReplayPolicy; Moonshot provider plugin wiring.)

Linked Issue/PR

Root Cause

  • Root cause: Moonshot's plugin entry spread ...OPENAI_COMPATIBLE_REPLAY_HOOKS, which unconditionally returns { sanitizeToolCallIds: true, toolCallIdMode: "strict" }. Kimi K2.6 returns IDs containing . and :, which are not in [a-zA-Z0-9], so strict sanitization drops them. The mangled ID is then sent back in conversation history; Kimi's serving layer can't match it, and emits text instead of a structured tool call.
  • Missing detection / guardrail: No family-level opt-out from ID sanitization existed for openai-compatible transports — the only per-provider dial was to re-implement the whole policy locally (as mistral does with "strict9"). That friction pushed Moonshot toward copying the default.
  • Contributing context (if known): OpenAI's own call_<uuid> and the generic alphanumeric tests pass through sanitization losslessly, so the bug only surfaces for providers whose native IDs contain non-alphanumeric structure. Kimi is the first such case in the bundled set.

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file:
    • src/plugins/provider-replay-helpers.test.ts — two new cases for buildOpenAICompatibleReplayPolicy(api, { sanitizeToolCallIds: false }) on both openai-completions and openai-responses.
    • src/plugin-sdk/provider-model-shared.test.ts — new case for buildProviderReplayFamilyHooks({ family: "openai-compatible", sanitizeToolCallIds: false }).
    • extensions/moonshot/index.test.ts — flipped the existing plugin-boundary assertion: Moonshot's policy must not carry sanitizeToolCallIds or toolCallIdMode on openai-completions, while keeping applyAssistantFirstOrderingFix/validateGeminiTurns/validateAnthropicTurns.
  • Scenario the test should lock in: with the Moonshot plugin registered against openai-completions, the resolved policy passes through tool_call IDs unchanged so native functions.<name>:<index> IDs survive round-trip.
  • Why this is the smallest reliable guardrail: the bug sits on a pure helper → family hook → plugin entry chain. Unit-level assertions at each of the three seams catch regressions without requiring full transcript-replay integration runs; the existing sanitize-session-history harness already owns the cross-cutting ID-rewriting contract.
  • Existing test that already covers this (if any): src/agents/pi-embedded-runner.openai-tool-id-preservation.test.ts owns the downstream replay behavior but is keyed on toolCallIdMode: "strict". With Moonshot opting out, those paths simply don't apply — no contradiction.

User-visible / Behavior Changes

  • Moonshot / Kimi K2.6 multi-turn tool calling over openai-completions now survives more than ~2 rounds.
  • No behavioral change for any other bundled or third-party provider on the openai-compatible family; the new option defaults to true (existing behavior) and must be explicitly opted out.

Diagram

Before (strict sanitization, openai-compatible family):
  Kimi returns: functions.read:0
    -> strict sanitize [^a-zA-Z0-9] -> functionsread0
    -> sent back as tool_call_id in next turn
    -> Kimi serving layer fails to match, emits finish_reason: "stop"

After (Moonshot opts out of sanitization):
  Kimi returns: functions.read:0
    -> pass-through (no sanitize step)
    -> sent back as tool_call_id in next turn
    -> Kimi serving layer matches; finish_reason: "tool_calls"

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A.

Repro + Verification

Environment

  • OS: macOS (Darwin 25.3.0, Apple Silicon)
  • Runtime/container: local Node 22 pnpm workspace checkout
  • Model/provider: Moonshot provider plugin with Kimi K2.6 on the openai-completions transport (issue reporter's CanopyWave endpoint; bug is endpoint-independent, governed by OpenClaw's own replay policy)
  • Integration/channel (if any): N/A (replay policy is transport-layer)
  • Relevant config (redacted): N/A

Steps

  1. Register the Moonshot plugin (bundled).
  2. Resolve its replay policy for { modelApi: "openai-completions", modelId: "kimi-k2.6" }.
  3. Confirm the resolved policy carries applyAssistantFirstOrderingFix / validateGeminiTurns / validateAnthropicTurns but does not carry sanitizeToolCallIds or toolCallIdMode.
  4. Confirm other openai-compatible providers (e.g., xai on openai-completions) continue to carry sanitizeToolCallIds: true and toolCallIdMode: "strict".

Expected

  • Moonshot policy passes native tool_call IDs through untouched.
  • All other openai-compatible provider defaults unchanged.

Actual

  • Matches expected. Three test suites lock this in at pure-helper, family-hook, and plugin-entry layers.

Evidence

  • Failing test/log before + passing after

Before the fix (on the new tests):

FAIL  extensions/moonshot/index.test.ts > moonshot provider plugin > owns replay policy for OpenAI-compatible Moonshot transports without mangling native Kimi tool_call IDs
AssertionError: expected { sanitizeToolCallIds: true, …(4) } to not have property "sanitizeToolCallIds"

After the fix:

Test Files  2 passed (2)
Tests       15 passed (15)   (unit-fast)

Test Files  1 passed (1)
Tests       2 passed (2)     (extension-providers)

Issue reporter's independent streaming-API measurements (pre-merge):

ID format Pass rate (5 trials, 66K system prompt)
functionsread0 (current strict) 1/5 (20%)
functions.read:0 (native, as this PR preserves) 5/5 (100%)
call_abc123def456 (OpenAI style) 5/5 (100%)

Human Verification (required)

What I personally verified (not just CI), and how:

  • Verified scenarios:
    • Unit: buildOpenAICompatibleReplayPolicy("openai-completions", { sanitizeToolCallIds: false }) omits both sanitizeToolCallIds and toolCallIdMode and keeps the openai-completions-shaped fields.
    • Unit: same for "openai-responses" (which has the non-openai-completions shape).
    • Family hook: buildProviderReplayFamilyHooks({ family: "openai-compatible", sanitizeToolCallIds: false }) threads through to the helper.
    • Plugin entry: Moonshot plugin resolves a policy that matches the opted-out shape.
    • Ran pnpm build — bundled plugin dist emits cleanly, no [INEFFECTIVE_DYNAMIC_IMPORT] warnings.
    • Ran pnpm plugin-sdk:api:check — no baseline drift (the added optional field lives on a module-internal type).
    • Ran pnpm check:changed — typecheck (core + core tests + extensions + extension tests), lint (core + extensions), import cycles, webhook/pairing guards all green.
  • Edge cases checked:
    • Default behavior unchanged when no option passed (existing tests locked in).
    • Opt-out works independently of the openai-completions vs openai-responses branch (both covered by unit tests).
    • Other bundled openai-compatible providers (xAI, etc.) untouched because they continue to spread OPENAI_COMPATIBLE_REPLAY_HOOKS without the opt-out.
  • What I did not verify:
    • Live round-trip against a real Kimi K2.6 endpoint from this checkout. The issue reporter's pre-fix measurements (patched locally) already cover the empirical side; this PR is the structural delivery of that fix.
    • kimi-coding plugin behavior. The 2026-04-10 comment flagging it as "also affected" is not consistent with its code (api: "anthropic-messages", minimal KIMI_REPLAY_POLICY, default sanitizeToolCallIds: false merge). Left for a separate issue/PR with a fresh repro.
  • Unrelated CI signal: pnpm check:changed reports 2 failures in src/agents/tools/web-fetch.provider-fallback.test.ts (SSRF guard rejects DNS that resolves to a private IP in my local network environment). These fail identically on a stashed clean upstream/mainnot introduced by this PR. All guards relevant to this diff are green.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes (sanitizeToolCallIds defaults to true; OPENAI_COMPATIBLE_REPLAY_HOOKS unchanged; only Moonshot opts out).
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A.

Risks and Mitigations

  • Risk: A future bundled openai-compatible provider appears whose ID format does need sanitization and accidentally copies the Moonshot pattern.
    • Mitigation: The opt-out is explicit (sanitizeToolCallIds: false); defaults still sanitize. The inline comment at extensions/moonshot/index.ts documents the rationale so future readers understand the opt-out is Kimi-specific.
  • Risk: A third-party Moonshot-compatible gateway returns tool_call IDs that a different endpoint would reject.
    • Mitigation: OpenAI and Moonshot API docs both accept arbitrary string IDs on tool_call_id; Kimi's native format is already valid wire-level and works across all three formats tested in the issue.

@greptile-apps

greptile-apps Bot commented Apr 22, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a sanitizeToolCallIds opt-out to buildOpenAICompatibleReplayPolicy and buildProviderReplayFamilyHooks, then wires Moonshot to opt out, so that Kimi K2+'s native functions.<name>:<index> tool_call ID format is passed through untouched in multi-turn replay. The fix is additive and fully backward-compatible: all other openai-compatible providers continue to use the default sanitizeToolCallIds: true behavior via the unchanged OPENAI_COMPATIBLE_REPLAY_HOOKS constant. Implementation is clean, tests cover all three seams (helper → family hook → plugin entry), and the inline comment in extensions/moonshot/index.ts clearly documents the rationale.

Confidence Score: 5/5

Safe to merge — additive opt-out with correct default, well-tested at three seams, no behavioral change for any other provider.

The change is minimal and strictly additive. The ?? true fallback in buildOpenAICompatibleReplayPolicy preserves the existing behavior for all callers that don't pass the new option. Moonshot is the only bundled provider opting out, and the three-layer unit test coverage locks in the contract. No security, data-loss, or runtime-error risk identified.

No files require special attention.

Reviews (1): Last reviewed commit: "fix(moonshot): preserve native Kimi tool..." | Re-trigger Greptile

@steipete steipete force-pushed the fix/moonshot-preserve-native-tool-call-ids branch from 2a96c52 to d4fffd4 Compare April 23, 2026 00:41
@steipete steipete force-pushed the fix/moonshot-preserve-native-tool-call-ids branch from d4fffd4 to 36037fc Compare April 23, 2026 00:50
@steipete steipete merged commit c4dea58 into openclaw:main Apr 23, 2026
73 checks passed
@steipete

Copy link
Copy Markdown
Contributor

Landed via rebase merge. Thanks @LeoDu0314.

Validation:

  • Local full gate: pnpm build && pnpm check && pnpm check:test-types && pnpm test
  • Local follow-up for updated base: pnpm test:contracts:plugins
  • CI: fresh rebased branch green before merge
  • Live review smoke: moonshot/kimi-k2.6 via .profile passed direct tool-replay sanity (reasoning_content retained; native Kimi tool_call IDs replayed)

Landed SHAs:

  • c4dea58 fix(moonshot): preserve native Kimi tool_call IDs in openai-completions replay
  • 23a4489 fix(xai): declare websocket runtime dependency

@steipete

Copy link
Copy Markdown
Contributor

Follow-up issue closure:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Strict tool_call ID sanitization breaks Kimi K2.5 multi-turn tool calling

2 participants