[Frontend] Add dedicated KimiK2ReasoningParser for tool call handling by daniel-salib · Pull Request #32216 · vllm-project/vllm

daniel-salib · 2026-01-12T22:09:49Z

Purpose

Add a dedicated KimiK2ReasoningParser to handle Kimi K2's behavior of sometimes outputting tool calls without a proper </think> delimiter.

Kimi K2 uses the same <think>...</think> tokens as DeepSeek R1 for reasoning content. However, when making tool calls, the model sometimes omits the </think> token, causing tool call markers to be absorbed into the reasoning content instead of being passed to the tool parser.

This PR adds a specialized reasoning parser that:

Extends DeepSeekR1ReasoningParser for standard behavior
Detects <|tool_calls_section_begin|> markers when </think> is missing
Detects <|tool_call_begin|> markers (when section wrapper is also omitted)
Splits the output at the first tool marker boundary to properly hand off to the tool parser
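
The fallback described in the list above can be sketched as a standalone function. This is only an illustration, not the PR's code: the marker strings are the ones named in this PR (including the singular section variant mentioned in the summary notes), while `split_reasoning` and its return shape are hypothetical names chosen for the example and do not reflect vLLM's parser interface.

```python
from typing import Optional, Tuple

THINK_START = "<think>"
THINK_END = "</think>"
# Marker strings as named in this PR; the tuple name is hypothetical.
TOOL_MARKERS = (
    "<|tool_calls_section_begin|>",
    "<|tool_call_section_begin|>",
    "<|tool_call_begin|>",
)


def split_reasoning(model_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Split one complete model output into (reasoning, content)."""
    text = model_output
    if text.startswith(THINK_START):
        text = text[len(THINK_START):]

    # Standard case: the closing delimiter is present, so split on it.
    if THINK_END in text:
        reasoning, _, content = text.partition(THINK_END)
        return reasoning or None, content or None

    # Fallback: no closing delimiter, so split at the earliest tool marker.
    positions = [p for m in TOOL_MARKERS if (p := text.find(m)) >= 0]
    if not positions:
        return text or None, None  # everything is treated as reasoning
    cut = min(positions)
    return text[:cut] or None, text[cut:]
```

The content half keeps the marker itself, so a downstream tool parser would still see the full tool call section.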

Test Plan

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2-Instruct",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Test Result

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1736712000,
  "model": "moonshotai/Kimi-K2-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": "The user wants to know the weather in Tokyo. I should use the get_weather function to retrieve this information.",
        "content": null,
        "tool_calls": [
          {
            "id": "get_weather:0",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 45,
    "total_tokens": 195
  }
}

The response shows:

reasoning_content is properly extracted (not polluted with tool markers)
tool_calls are correctly parsed and returned
This works even when the model omits </think> before the tool call markers
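
The three checks above could be automated against a parsed response body along these lines. This is a rough sketch: `summarize_tool_calls` is a hypothetical helper, and it assumes only the response shape shown in the Test Result (a `reasoning_content` field and a `tool_calls` list with JSON-encoded `arguments`).

```python
import json

# Marker strings as named in this PR; the tuple name is hypothetical.
TOOL_MARKERS = (
    "<|tool_calls_section_begin|>",
    "<|tool_call_section_begin|>",
    "<|tool_call_begin|>",
)


def summarize_tool_calls(response: dict) -> list:
    """Return [(function_name, parsed_arguments), ...] for the first choice.

    Raises ValueError if tool markers leaked into reasoning_content.
    """
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content") or ""
    leaked = [m for m in TOOL_MARKERS if m in reasoning]
    if leaked:
        raise ValueError(f"tool markers found in reasoning_content: {leaked}")
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]
```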

Note

Adds a dedicated parser to correctly separate reasoning from tool calls in Kimi K2 outputs that may omit </think>.

Introduces KimiK2ReasoningParser (extends DeepSeekR1ReasoningParser) to detect the first "<|tool_calls_section_begin|>", "<|tool_call_section_begin|>", "<|tool_call_begin|>" and split output accordingly, including streaming via extract_reasoning_streaming
Registers the parser under "kimi_k2" in vllm/reasoning/__init__.py
Adds tests (tests/reasoning/test_kimi_k2_reasoning_parser.py) covering standard DeepSeek R1 cases, missing </think> with tool markers, marker priority, singular section variant, first-marker selection, and streaming

^{Written by Cursor Bugbot for commit edc77b1. This will update automatically on new commits. Configure here.}

gemini-code-assist

Code Review

This pull request introduces a dedicated KimiK2ReasoningParser to handle malformed tool call outputs from the Kimi K2 model. The implementation is logical and the accompanying tests are thorough for the non-streaming case.

My review identifies two main points:

A critical issue where the streaming implementation is missing, which will cause incorrect parsing for streaming responses.
A high-severity suggestion to refactor the non-streaming implementation to improve maintainability by leveraging inheritance properly.

Addressing these points will make the new parser robust and consistent across both streaming and non-streaming modes.

Signed-off-by: Daniel Salib <danielsalib@meta.com>

cursor · 2026-01-12T23:25:31Z

+            for marker in TOOL_MARKERS:
+                pos = delta_text.find(marker)
+                if pos >= 0:
+                    tool_positions.append(pos)


Streaming marker detection fails across token boundaries

Medium Severity

The extract_reasoning_streaming method checks for tool markers in previous_text and delta_text separately, but never checks current_text. When a marker like <|tool_calls_section_begin|> spans the boundary between previous and delta (which happens when the tokenizer splits markers into multiple tokens), it won't be detected in either check. The tokens that form the incomplete marker get incorrectly classified as reasoning, and only subsequent tokens after the complete marker appears in previous_text are classified as content. The companion tool parser (kimi_k2_tool_parser.py) handles this correctly using a token buffer, but this reasoning parser lacks equivalent logic.
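
One way to handle a marker that straddles two deltas is to hold back any trailing text that could still turn into a marker. The sketch below illustrates that idea under stated assumptions: `StreamingSplitter`, `feed`, and `flush` are hypothetical names, this is neither the code in this PR nor the token buffer used by kimi_k2_tool_parser.py, and it works on text rather than token IDs.

```python
# Marker strings as named in this PR; the tuple name is hypothetical.
TOOL_MARKERS = (
    "<|tool_calls_section_begin|>",
    "<|tool_call_section_begin|>",
    "<|tool_call_begin|>",
)


class StreamingSplitter:
    """Routes streamed text to reasoning until the first tool marker appears."""

    def __init__(self, markers=TOOL_MARKERS):
        self.markers = tuple(markers)
        self.pending = ""        # held-back text that may be the start of a marker
        self.in_content = False  # True once a complete marker has been seen

    def _held_back_len(self, text: str) -> int:
        """Length of the longest suffix of text that is a proper marker prefix."""
        best = 0
        for marker in self.markers:
            for k in range(min(len(marker) - 1, len(text)), 0, -1):
                if text.endswith(marker[:k]):
                    best = max(best, k)
                    break
        return best

    def feed(self, delta: str):
        """Return (reasoning_delta, content_delta) for one streamed chunk."""
        if self.in_content:
            return "", delta
        text = self.pending + delta
        positions = [p for m in self.markers if (p := text.find(m)) >= 0]
        if positions:
            cut = min(positions)
            self.in_content = True
            self.pending = ""
            return text[:cut], text[cut:]
        keep = self._held_back_len(text)
        self.pending = text[len(text) - keep:]
        return text[:len(text) - keep], ""

    def flush(self) -> str:
        """At end of stream, release held-back text as reasoning."""
        leftover, self.pending = self.pending, ""
        return leftover
```

The cost of this approach is a small delay: text that looks like the start of a marker is emitted one chunk late, or at `flush()` if the stream ends first.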

chaunceyjiang · 2026-01-13T03:07:49Z

+        "get_weather:0<|tool_call_argument_begin|>{}"
+        "<|tool_call_end|><|tool_calls_section_end|>"
+    ),
+    "reasoning": "Let me check the weather",


I’m not quite sure why “Let me check the weather” is in the reasoning field.

Is this PR mainly addressing this specific example?

Why is it in the reasoning field instead of content?

daniel-salib · 2026-01-13

Hey! So the issue we ran into is that Kimi K2 sometimes skips the </think> tag before jumping into tool calls. It's a bit sporadic, but when it happens the parent parser treats the entire output as reasoning, including all the tool call markers, so they never make it to the tool parser and just get dropped.

This fix basically detects tool markers as a fallback split point when </think> is missing. I went with putting the pre-marker text in reasoning since it's likely inside an unclosed <think> block, but if putting it in content makes more sense we can change that. The main thing is making sure the tool markers get through to the tool parser.

I also raised a discussion post for Kimi K2 here in case there's a potential chat template fix for this: https://huggingface.co/moonshotai/Kimi-K2-Instruct/discussions/61

christianbalbin · 2026-01-14

Also experiencing this in vLLM 13.0 with kimi-k2

qandrew · 2026-01-14T05:54:24Z

@MoyanZitto (and Moonshot people) for visibility

MoyanZitto · 2026-01-20T11:20:25Z

@qandrew @daniel-salib
It's reasonable but I think the root cause of this unexpected behavior is, once again, that code refactoring broke Kimi's tool call rules: the tool_call_id must follow the format functions.func_name:idx

For example, serving.py:1551 failed to properly call chat_utils.py:1907, violating the format specification.

We should verify if the bug persists after fixing this.

Or, perhaps we should find a better way...
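
The ID format mentioned above, functions.func_name:idx, can be checked with a small helper like the one below. This is a sketch only: both function names are hypothetical, and the set of characters allowed in a function name is an assumption made for the example, not something stated in this thread.

```python
import re
from typing import Optional, Tuple

# Assumption: function names consist of letters, digits, "_", "." and "-".
_TOOL_CALL_ID = re.compile(r"functions\.(?P<name>[A-Za-z0-9_.\-]+):(?P<idx>\d+)")


def make_tool_call_id(func_name: str, idx: int) -> str:
    """Build an ID in the functions.func_name:idx shape."""
    return f"functions.{func_name}:{idx}"


def parse_tool_call_id(tool_call_id: str) -> Optional[Tuple[str, int]]:
    """Return (func_name, idx) if the ID matches the expected shape, else None."""
    match = _TOOL_CALL_ID.fullmatch(tool_call_id)
    if match is None:
        return None
    return match.group("name"), int(match.group("idx"))
```

A check like this could run on tool call IDs in conversation history before they are rendered back into the prompt, to flag IDs that were regenerated in a different format.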

MoyanZitto · 2026-01-21T10:35:29Z

Pls have a look: #32768 @daniel-salib

daniel-salib · 2026-01-21T11:22:15Z

(quoting MoyanZitto) Pls have a look: #32768 @daniel-salib

Perfect, I tested this fix and it fixes the issue 👍. I will close this PR and defer to the solution in #32768.

daniel-salib requested review from aarnphm and chaunceyjiang as code owners January 12, 2026 22:09

gemini-code-assist Bot reviewed Jan 12, 2026 (comment threads on vllm/reasoning/kimi_k2_reasoning_parser.py)

cursor Bot reviewed Jan 12, 2026 (comment thread on vllm/reasoning/kimi_k2_reasoning_parser.py)

daniel-salib force-pushed the add-kimi-k2-reasoning-parser branch from 1abf21a to c71941a January 12, 2026 22:42

cursor Bot reviewed Jan 12, 2026 (comment threads on vllm/reasoning/kimi_k2_reasoning_parser.py)

Commit edc77b1: [Frontend] Add dedicated KimiK2ReasoningParser for tool call handling

Signed-off-by: Daniel Salib <danielsalib@meta.com>

daniel-salib force-pushed the add-kimi-k2-reasoning-parser branch from c71941a to edc77b1 January 12, 2026 23:04

cursor Bot reviewed Jan 12, 2026

chaunceyjiang reviewed Jan 13, 2026

wangln19 mentioned this pull request Jan 21, 2026 in "fix: preserve native tool call ID in multi-turn tool calling" #32768 (Merged)

daniel-salib closed this Jan 21, 2026