Skip to content

[Frontend] Add dedicated KimiK2ReasoningParser for tool call handling#32216

Closed
daniel-salib wants to merge 1 commit into
vllm-project:mainfrom
daniel-salib:add-kimi-k2-reasoning-parser
Closed

[Frontend] Add dedicated KimiK2ReasoningParser for tool call handling#32216
daniel-salib wants to merge 1 commit into
vllm-project:mainfrom
daniel-salib:add-kimi-k2-reasoning-parser

Conversation

@daniel-salib

@daniel-salib daniel-salib commented Jan 12, 2026

Copy link
Copy Markdown
Contributor

Purpose

Add a dedicated KimiK2ReasoningParser to handle Kimi K2's behavior of sometimes outputting tool calls without a proper </think> delimiter.

Kimi K2 uses the same <think>...</think> tokens as DeepSeek R1 for reasoning content. However, when making tool calls, the model sometimes omits the </think> token, causing tool call markers to be absorbed into the reasoning content instead of being passed to the tool parser.

This PR adds a specialized reasoning parser that:

  • Extends DeepSeekR1ReasoningParser for standard behavior
  • Detects <|tool_calls_section_begin|> markers when </think> is missing
  • Detects <|tool_call_begin|> markers (when section wrapper is also omitted)
  • Splits the output at the first tool marker boundary to properly hand off to the tool parser

Test Plan

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2-Instruct",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Test Result

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1736712000,
  "model": "moonshotai/Kimi-K2-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "The user wants to know the weather in Tokyo. I should use the get_weather function to retrieve this information.",
        "content": null,
        "tool_calls": [
          {
            "id": "get_weather:0",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 45,
    "total_tokens": 195
  }
}

The response shows:

  • reasoning_content is properly extracted (not polluted with tool markers)
  • tool_calls are correctly parsed and returned
  • This works even when the model omits </think> before the tool call markers

Note

Cursor Bugbot is generating a summary for commit 1abf21a9c282b192dedb9817efc8e4a483667728. Configure here.


Note

Adds dedicated parsing for Kimi K2 outputs that may omit </think> before tool calls, ensuring reasoning and tool calls are split correctly.

  • Introduces KimiK2ReasoningParser (extends DeepSeekR1ReasoningParser) to detect first occurrence of "<|tool_calls_section_begin|>", "<|tool_call_section_begin|>", or "<|tool_call_begin|>" and split output accordingly
  • Registers the new parser under kimi_k2 in vllm/reasoning/__init__.py
  • Adds tests covering standard DeepSeek R1 behavior, missing </think> with tool markers, marker priority, singular section variant, and first-marker splitting

Written by Cursor Bugbot for commit 1abf21a9c282b192dedb9817efc8e4a483667728. This will update automatically on new commits. Configure here.


Note

Adds a dedicated parser so Kimi K2 outputs with missing </think> split correctly between reasoning and tool calls.

  • Introduces KimiK2ReasoningParser (extends DeepSeekR1ReasoningParser) to detect first occurrence of "<|tool_calls_section_begin|>", "<|tool_call_section_begin|>", or "<|tool_call_begin|>" and split output accordingly, including in streaming
  • Registers the parser under kimi_k2 in vllm/reasoning/__init__.py
  • Adds tests covering standard DeepSeek R1 behavior, missing </think> with tool markers, marker priority, singular section variant, first-marker splitting, and streaming

Written by Cursor Bugbot for commit c71941a8caa94401f30534b7d5bc96bdba625a11. This will update automatically on new commits. Configure here.


Note

Adds a specialized parser to correctly separate reasoning from tool calls when Kimi K2 omits </think>.

  • Introduces KimiK2ReasoningParser (extends DeepSeekR1ReasoningParser) that detects the first "<|tool_calls_section_begin|>", "<|tool_call_section_begin|>", "<|tool_call_begin|>" and splits output accordingly
  • Implements equivalent logic for streaming via extract_reasoning_streaming
  • Registers the parser under "kimi_k2" in vllm/reasoning/__init__.py
  • Adds tests covering standard DeepSeek R1 behavior, missing </think> with tool markers, marker priority, singular section variant, first-marker selection, and streaming

Written by Cursor Bugbot for commit edc77b1. This will update automatically on new commits. Configure here.


Note

Adds a dedicated parser to correctly separate reasoning from tool calls in Kimi K2 outputs that may omit </think>.

  • Introduces KimiK2ReasoningParser (extends DeepSeekR1ReasoningParser) to detect the first "<|tool_calls_section_begin|>", "<|tool_call_section_begin|>", "<|tool_call_begin|>" and split output accordingly, including streaming via extract_reasoning_streaming
  • Registers the parser under "kimi_k2" in vllm/reasoning/__init__.py
  • Adds tests (tests/reasoning/test_kimi_k2_reasoning_parser.py) covering standard DeepSeek R1 cases, missing </think> with tool markers, marker priority, singular section variant, first-marker selection, and streaming

Written by Cursor Bugbot for commit edc77b1. This will update automatically on new commits. Configure here.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a dedicated KimiK2ReasoningParser to handle malformed tool call outputs from the Kimi K2 model. The implementation is logical and the accompanying tests are thorough for the non-streaming case.

My review identifies two main points:

  1. A critical issue where the streaming implementation is missing, which will cause incorrect parsing for streaming responses.
  2. A high-severity suggestion to refactor the non-streaming implementation to improve maintainability by leveraging inheritance properly.

Addressing these points will make the new parser robust and consistent across both streaming and non-streaming modes.

Comment thread vllm/reasoning/kimi_k2_reasoning_parser.py
Comment thread vllm/reasoning/kimi_k2_reasoning_parser.py Outdated
Comment thread vllm/reasoning/kimi_k2_reasoning_parser.py Outdated
@daniel-salib daniel-salib force-pushed the add-kimi-k2-reasoning-parser branch from 1abf21a to c71941a Compare January 12, 2026 22:42
Comment thread vllm/reasoning/kimi_k2_reasoning_parser.py Outdated
Comment thread vllm/reasoning/kimi_k2_reasoning_parser.py Outdated
Signed-off-by: Daniel Salib <danielsalib@meta.com>
@daniel-salib daniel-salib force-pushed the add-kimi-k2-reasoning-parser branch from c71941a to edc77b1 Compare January 12, 2026 23:04
for marker in TOOL_MARKERS:
pos = delta_text.find(marker)
if pos >= 0:
tool_positions.append(pos)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming marker detection fails across token boundaries

Medium Severity

The extract_reasoning_streaming method checks for tool markers in previous_text and delta_text separately, but never checks current_text. When a marker like <|tool_calls_section_begin|> spans the boundary between previous and delta (which happens when the tokenizer splits markers into multiple tokens), it won't be detected in either check. The tokens that form the incomplete marker get incorrectly classified as reasoning, and only subsequent tokens after the complete marker appears in previous_text are classified as content. The companion tool parser (kimi_k2_tool_parser.py) handles this correctly using a token buffer, but this reasoning parser lacks equivalent logic.

Fix in Cursor Fix in Web

"get_weather:0<|tool_call_argument_begin|>{}"
"<|tool_call_end|><|tool_calls_section_end|>"
),
"reasoning": "Let me check the weather",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not quite sure why “Let me check the weather” is in the reasoning field.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this PR mainly addressing this specific example?

Why is it in the reasoning field instead of content?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! So the issue we ran into is that Kimi K2 sometimes skips the tag before jumping into tool calls. It's a bit sporadic, but when it happens the parent parser treats the entire output as reasoning - including all the tool call markers - so they never make it to the tool parser and just get dropped.

This fix basically detects tool markers as a fallback split point when is missing. I went with putting the pre-marker text in reasoning since it's likely inside an unclosed block, but if putting it in content makes more sense we can change that - the main thing is making sure the tool markers get through to the tool parser.

I also raised a discussion post for kimi k2 here incase there's a potential chat template fix for this: https://huggingface.co/moonshotai/Kimi-K2-Instruct/discussions/61

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also experiencing this in vLLM 13.0 with kimi-k2

@qandrew

qandrew commented Jan 14, 2026

Copy link
Copy Markdown
Contributor

@MoyanZitto (and Moonshot people) for visibility

@MoyanZitto

MoyanZitto commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

@qandrew @daniel-salib
It's reasonable but I think the root cause of this unexpected behavior is, once again, that code refactoring broke Kimi's tool call rules: the tool_call_id must follow the format functions.func_name:idx

For example, serving.py:1551 failed to properly call chat_utils.py:1907, violating the format specification.

We should verify if the bug persists after fixing this.

Or, perhaps we should find a better way...

@MoyanZitto

Copy link
Copy Markdown
Contributor

Pls have a look: #32768 @daniel-salib

@daniel-salib

Copy link
Copy Markdown
Contributor Author

Pls have a look: #32768 @daniel-salib

perfect - I tested this fix and it fixes the issue 👍 - will close this PR and defer to the solution in #32768

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants