Your current environment
current main - irrespective of env
🐛 Describe the bug
Sometimes with Kimi K2.5, the parser drops tool calls when using tool_choice: auto (without constrained decoding).
When generating tool calls during streaming, the model sometimes emits a \n (newline) between the <|tool_call_begin|> token and the function name. For example, instead of producing:
<|tool_call_begin|>functions.edit:15<|tool_call_argument_begin|>{"path": "..."}
it produces:
<|tool_call_begin|>
functions.edit:15<|tool_call_argument_begin|>{"path": "..."}
(I am unable to provide the full trace here because it has a lot of personal crap in it. I only observe this with long contexts and have not been able to reduce it to a canonical repro.)
https://github.com/vllm-project/vllm/blob/main/vllm/tool_parsers/kimi_k2_tool_parser.py#L70
self.stream_tool_call_portion_regex = re.compile(
r"(?P<tool_call_id>.+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*)"
)
self.stream_tool_call_name_regex = re.compile(r"(?P<tool_call_id>.+:\d+)\s*")
From my reading:
stream_tool_call_portion_regex -- matches the full pattern: tool call ID + arguments
stream_tool_call_name_regex -- matches just the tool call ID portion
Both use .+ to match the tool call ID (e.g. functions.edit:15). The problem is that in Python, . does not match \n by default. So when the model inserts that stray newline, .+ fails to match across it, and the parser silently fails to detect the tool call -- the streaming response just drops the tool call entirely.
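A minimal sketch of the mismatch, using the two patterns quoted above. It assumes the parser applies them with `match` (anchored at the start of the text after <|tool_call_begin|>), which I have not confirmed here; the sample strings are made up for illustration.

```python
import re

# Patterns copied from the snippet above (current main, per this report).
portion_re = re.compile(
    r"(?P<tool_call_id>.+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*)"
)
name_re = re.compile(r"(?P<tool_call_id>.+:\d+)\s*")

# Hypothetical text following <|tool_call_begin|>, with and without the stray newline.
normal = 'functions.edit:15<|tool_call_argument_begin|>{"path": "..."}'
with_newline = "\n" + normal

ok = portion_re.match(normal)
bad_portion = portion_re.match(with_newline)
bad_name = name_re.match(with_newline)

print(ok.group("tool_call_id"))  # functions.edit:15
print(bad_portion, bad_name)     # None None: `.` cannot consume the leading "\n"
```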
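Resolution: adding a leading \s* and _re.DOTALL worked for me.
The leading \s* consumes the stray newline before the tool call ID, so the capture group (?P<tool_call_id>.+:\d+) starts at functions.edit:15 and captures the full tool call ID. With DOTALL, . additionally matches any character including \n, so .+ and .* can also span newlines elsewhere in the text (for example inside the arguments).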
self.stream_tool_call_portion_regex = _re.compile(
r"\s*(?P<tool_call_id>.+:\d+)\s*" r"<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*)",
_re.DOTALL,
)
self.stream_tool_call_name_regex = _re.compile(r"\s*(?P<tool_call_id>.+:\d+)\s*", _re.DOTALL)
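A quick check of the patched patterns against the newline case, as a sketch only. It uses the plain `re` module instead of the `_re` alias above (an assumption about how it is imported), and the input strings are hypothetical.

```python
import re

# Patched patterns from the snippet above, written with `re` instead of `_re`.
fixed_portion_re = re.compile(
    r"\s*(?P<tool_call_id>.+:\d+)\s*"
    r"<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*)",
    re.DOTALL,
)
fixed_name_re = re.compile(r"\s*(?P<tool_call_id>.+:\d+)\s*", re.DOTALL)

# Hypothetical text following <|tool_call_begin|>, including the stray newline.
with_newline = '\nfunctions.edit:15<|tool_call_argument_begin|>{"path": "..."}'
name_only = "\nfunctions.edit:15"

portion_match = fixed_portion_re.match(with_newline)
name_match = fixed_name_re.match(name_only)

print(portion_match.group("tool_call_id"))        # functions.edit:15
print(portion_match.group("function_arguments"))  # {"path": "..."}
print(name_match.group("tool_call_id"))           # functions.edit:15
```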
Call for further investigation: in my testing on Kimi K2.5 I don't see any side effects from this change. It might still affect Kimi K2 Thinking, but I haven't had time to test that.
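One possible side effect worth checking, sketched below: with DOTALL the greedy .+ in the name regex can run past a newline and stop at a later ":<digit>" inside the arguments. Whether the parser ever applies the name regex to a buffer like this is an assumption I have not verified, and the buffer itself is made up. The last pattern (leading \s* only, no DOTALL) is just a comparison point, not a tested fix.

```python
import re

# Hypothetical buffer: tool call ID plus multi-line JSON arguments containing ":<digit>".
buffered = 'functions.edit:15<|tool_call_argument_begin|>{\n"line":3}'

name_re_current = re.compile(r"(?P<tool_call_id>.+:\d+)\s*")             # current main
name_re_dotall = re.compile(r"\s*(?P<tool_call_id>.+:\d+)\s*", re.DOTALL)  # patch above
name_re_ws_only = re.compile(r"\s*(?P<tool_call_id>.+:\d+)\s*")            # comparison only

current_id = name_re_current.match(buffered).group("tool_call_id")
dotall_id = name_re_dotall.match(buffered).group("tool_call_id")
# The whitespace-only variant still handles the stray leading newline.
ws_only_id = name_re_ws_only.match("\n" + buffered).group("tool_call_id")

print(repr(current_id))  # 'functions.edit:15'
print(repr(dotall_id))   # ID runs on into the arguments, ending at '"line":3'
print(repr(ws_only_id))  # 'functions.edit:15'
```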
Before submitting a new issue...