[Frontend] Add dedicated KimiK2ReasoningParser for tool call handling#32216
[Frontend] Add dedicated KimiK2ReasoningParser for tool call handling#32216daniel-salib wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a dedicated KimiK2ReasoningParser to handle malformed tool call outputs from the Kimi K2 model. The implementation is logical and the accompanying tests are thorough for the non-streaming case.
My review identifies two main points:
- A critical issue where the streaming implementation is missing, which will cause incorrect parsing for streaming responses.
- A high-severity suggestion to refactor the non-streaming implementation to improve maintainability by leveraging inheritance properly.
Addressing these points will make the new parser robust and consistent across both streaming and non-streaming modes.
1abf21a to
c71941a
Compare
Signed-off-by: Daniel Salib <danielsalib@meta.com>
c71941a to
edc77b1
Compare
| for marker in TOOL_MARKERS: | ||
| pos = delta_text.find(marker) | ||
| if pos >= 0: | ||
| tool_positions.append(pos) |
There was a problem hiding this comment.
Streaming marker detection fails across token boundaries
Medium Severity
The extract_reasoning_streaming method checks for tool markers in previous_text and delta_text separately, but never checks current_text. When a marker like <|tool_calls_section_begin|> spans the boundary between previous and delta (which happens when the tokenizer splits markers into multiple tokens), it won't be detected in either check. The tokens that form the incomplete marker get incorrectly classified as reasoning, and only subsequent tokens after the complete marker appears in previous_text are classified as content. The companion tool parser (kimi_k2_tool_parser.py) handles this correctly using a token buffer, but this reasoning parser lacks equivalent logic.
| "get_weather:0<|tool_call_argument_begin|>{}" | ||
| "<|tool_call_end|><|tool_calls_section_end|>" | ||
| ), | ||
| "reasoning": "Let me check the weather", |
There was a problem hiding this comment.
I’m not quite sure why “Let me check the weather” is in the reasoning field.
There was a problem hiding this comment.
Is this PR mainly addressing this specific example?
Why is it in the reasoning field instead of content?
There was a problem hiding this comment.
Hey! So the issue we ran into is that Kimi K2 sometimes skips the tag before jumping into tool calls. It's a bit sporadic, but when it happens the parent parser treats the entire output as reasoning - including all the tool call markers - so they never make it to the tool parser and just get dropped.
This fix basically detects tool markers as a fallback split point when is missing. I went with putting the pre-marker text in reasoning since it's likely inside an unclosed block, but if putting it in content makes more sense we can change that - the main thing is making sure the tool markers get through to the tool parser.
I also raised a discussion post for kimi k2 here incase there's a potential chat template fix for this: https://huggingface.co/moonshotai/Kimi-K2-Instruct/discussions/61
There was a problem hiding this comment.
Also experiencing this in vLLM 13.0 with kimi-k2
|
@MoyanZitto (and Moonshot people) for visibility |
|
@qandrew @daniel-salib For example, serving.py:1551 failed to properly call chat_utils.py:1907, violating the format specification. We should verify if the bug persists after fixing this. Or, perhaps we should find a better way... |
|
Pls have a look: #32768 @daniel-salib |
perfect - I tested this fix and it fixes the issue 👍 - will close this PR and defer to the solution in #32768 |
Purpose
Add a dedicated
KimiK2ReasoningParserto handle Kimi K2's behavior of sometimes outputting tool calls without a proper</think>delimiter.Kimi K2 uses the same
<think>...</think>tokens as DeepSeek R1 for reasoning content. However, when making tool calls, the model sometimes omits the</think>token, causing tool call markers to be absorbed into the reasoning content instead of being passed to the tool parser.This PR adds a specialized reasoning parser that:
DeepSeekR1ReasoningParserfor standard behavior<|tool_calls_section_begin|>markers when</think>is missing<|tool_call_begin|>markers (when section wrapper is also omitted)Test Plan
Test Result
{ "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1736712000, "model": "moonshotai/Kimi-K2-Instruct", "choices": [ { "index": 0, "message": { "role": "assistant", "reasoning_content": "The user wants to know the weather in Tokyo. I should use the get_weather function to retrieve this information.", "content": null, "tool_calls": [ { "id": "get_weather:0", "type": "function", "function": { "name": "get_weather", "arguments": "{\"location\": \"Tokyo\"}" } } ] }, "finish_reason": "tool_calls" } ], "usage": { "prompt_tokens": 150, "completion_tokens": 45, "total_tokens": 195 } }The response shows:
reasoning_contentis properly extracted (not polluted with tool markers)tool_callsare correctly parsed and returned</think>before the tool call markersNote
Cursor Bugbot is generating a summary for commit 1abf21a9c282b192dedb9817efc8e4a483667728. Configure here.
Note
Adds dedicated parsing for Kimi K2 outputs that may omit
</think>before tool calls, ensuring reasoning and tool calls are split correctly.KimiK2ReasoningParser(extendsDeepSeekR1ReasoningParser) to detect first occurrence of"<|tool_calls_section_begin|>","<|tool_call_section_begin|>", or"<|tool_call_begin|>"and split output accordinglykimi_k2invllm/reasoning/__init__.py</think>with tool markers, marker priority, singular section variant, and first-marker splittingWritten by Cursor Bugbot for commit 1abf21a9c282b192dedb9817efc8e4a483667728. This will update automatically on new commits. Configure here.
Note
Adds a dedicated parser so Kimi K2 outputs with missing
</think>split correctly between reasoning and tool calls.KimiK2ReasoningParser(extendsDeepSeekR1ReasoningParser) to detect first occurrence of"<|tool_calls_section_begin|>","<|tool_call_section_begin|>", or"<|tool_call_begin|>"and split output accordingly, including in streamingkimi_k2invllm/reasoning/__init__.py</think>with tool markers, marker priority, singular section variant, first-marker splitting, and streamingWritten by Cursor Bugbot for commit c71941a8caa94401f30534b7d5bc96bdba625a11. This will update automatically on new commits. Configure here.
Note
Adds a specialized parser to correctly separate reasoning from tool calls when Kimi K2 omits
</think>.KimiK2ReasoningParser(extendsDeepSeekR1ReasoningParser) that detects the first"<|tool_calls_section_begin|>", "<|tool_call_section_begin|>", "<|tool_call_begin|>"and splits output accordinglyextract_reasoning_streaming"kimi_k2"invllm/reasoning/__init__.py</think>with tool markers, marker priority, singular section variant, first-marker selection, and streamingWritten by Cursor Bugbot for commit edc77b1. This will update automatically on new commits. Configure here.
Note
Adds a dedicated parser to correctly separate reasoning from tool calls in Kimi K2 outputs that may omit
</think>.KimiK2ReasoningParser(extendsDeepSeekR1ReasoningParser) to detect the first"<|tool_calls_section_begin|>", "<|tool_call_section_begin|>", "<|tool_call_begin|>"and split output accordingly, including streaming viaextract_reasoning_streaming"kimi_k2"invllm/reasoning/__init__.pytests/reasoning/test_kimi_k2_reasoning_parser.py) covering standard DeepSeek R1 cases, missing</think>with tool markers, marker priority, singular section variant, first-marker selection, and streamingWritten by Cursor Bugbot for commit edc77b1. This will update automatically on new commits. Configure here.