[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser#35687
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
There was a problem hiding this comment.
Code Review
This pull request introduces a bugfix for the Qwen3 reasoning parser to correctly handle cases where <tool_call> appears inside a <think> block without a closing </think> tag. The change treats <tool_call> as an implicit end-of-reasoning marker, which prevents tool calls from being silently dropped. The implementation correctly addresses this in both streaming and non-streaming paths, and is supported by new, specific test cases.
My review identifies one area for improvement regarding performance. The implementation of is_reasoning_end can be made more efficient by avoiding multiple passes over the input tokens. A detailed suggestion is provided in the comments.
3332f54 to
a5f6f1f
Compare
73150e3 to
36f48bd
Compare
@qmx PTAL.
|
36f48bd to
4a4afe7
Compare
|
@chaunceyjiang I'll squash the commits once CI fully passes! |
|
Hi @qmx, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
0076a84 to
ea1ed91
Compare
|
@chaunceyjiang ready for review! |
91873d5 to
1a83702
Compare
|
seems like I'm hitting a few flaky tests? - rebased and pushed again |
71edf76 to
93da8f4
Compare
|
@chaunceyjiang looks like all tests are passing here now :) |
chaunceyjiang
left a comment
There was a problem hiding this comment.
LGTM.
/cc @sfeng33
|
Hi @qmx, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
|
@qmx please fix the pre-commit. |
c021d4b to
cb45ead
Compare
|
Hi @qmx, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
Qwen3.5 models sometimes emit <tool_call> inside the <think> block without closing </think>. The reasoning parser classifies the entire output as reasoning, the tool parser receives empty content, and the tool call is silently dropped. Treat <tool_call> as an implicit end-of-reasoning marker, matching the approach in KimiK2ReasoningParser. Paired <tool_call>...</tool_call> from chat template examples are skipped so is_reasoning_end() only triggers on actual model output. Signed-off-by: Doug Campos <qmx@qmx.me>
…g parser When enable_thinking=False and model output contains <tool_call>, the extract_reasoning() method incorrectly split output into reasoning + content because the <tool_call> check ran before the thinking_enabled check. - Move the thinking_enabled early-return above the <tool_call> index lookup so disabled thinking always treats everything as content - Add test case for thinking_disabled with tool_call output Signed-off-by: Doug Campos <qmx@qmx.me>
cb45ead to
57292ea
Compare
|
@chaunceyjiang rebased + ci passing - ready to go! |
|
based |
|
@qmx I'm pretty sure you should not just check for exact tool_call token match but also partial outputed ones, Because the system prompt tells how to use the tool_call and it (the system prompt) might be tokenized without special tokens, the
Please note this is what is done there But careful there, I think they don't do it correctly because they will never have the guarantee that the delta hold more than one token Edit please take inspiration from this PR instead : #40783 |
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
…5687 mirror) Qwen3.5/3.6 models sometimes emit <tool_call> inside a <think> block without closing </think> first. With the previous parser the entire output was classified as reasoning, the qwen3_coder tool parser received empty content, and the tool call was silently dropped. Verbatim mirror of upstream vLLM PR #35687: - __init__ records <tool_call>/</tool_call> token ids - new is_reasoning_end / is_reasoning_end_streaming / extract_content_ids overrides match KimiK2ReasoningParser pattern - extract_reasoning falls back to <tool_call> as implicit reasoning end when </think> is missing - streaming variant mirrors the same 3-way branch - pair-checks <tool_call> vs </tool_call> so chat-template examples embedded in prompts do not false-fire Also drops a copy into windows_patches/ so a venv reinstall can recover the fix per the existing patch convention.
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me> Signed-off-by: Adrian <info@zzit.ch>
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
Addresses silent tool-call drops where the model emits <tool_call> without closing </think>, causing Qwen3ReasoningParser to swallow the entire output as reasoning (QwenLM/Qwen3.6#150). Two-layer fix: 1. Bump pinned vLLM image from 2026-04-20 to 2026-05-07 nightly (commit 51f22dcf, v0.20.2rc1.dev93). Crosses the merge of vllm-project/vllm#35687 which treats <tool_call> as an implicit reasoning end in the Qwen3 parser. Applied to all 11 compose variants (35b-a3b-gguf untouched — different base image). 2. Add froggeric/Qwen-Fixed-Chat-Templates qwen3.6 template at patches/templates/qwen3.6-enhanced.jinja, mounted into each variant and passed via --chat-template. Provides defense-in-depth for the same bug (auto-closes unclosed <think> before <tool_call>) plus developer role, think_on/think_off toggles, </thinking> recognition, and non-ASCII JSON escaping. TOOL_CALLING_ISSUES.md documents the cause, what's evaluated, alternatives considered (fakezeta merged gist), and rollback steps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>
…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Purpose
Qwen3.5 models sometimes emit
<tool_call>inside the<think>block without closing</think>first. When this happens,Qwen3ReasoningParser.extract_reasoning()classifies the entire output as reasoning, the tool parser receives empty content, and the tool call is silently dropped.This is the same class of issue fixed for Kimi K2 in #33646. The fix treats
<tool_call>as an implicit end-of-reasoning marker in both streaming and non-streaming paths. Paired<tool_call>...</tool_call>from chat template examples are skipped sois_reasoning_end()only triggers on actual model output.Observed intermittently with
Qwen/Qwen3.5-35B-A3Busing--reasoning-parser qwen3 --tool-call-parser qwen3_xmlduring long multi-turn tool-calling sessions.Test Plan
New test cases added:
tool_call_no_think_end/tool_call_no_think_end_stream—<tool_call>without</think>, no<think>prefixtool_call_with_think_no_end/tool_call_with_think_no_end_stream—<tool_call>without</think>, with<think>prefixtool_call_implicit_reasoning_end— multi-token streaming delta with<tool_call>Test Result
All existing Qwen3 reasoning parser tests continue to pass. New test cases verify that
<tool_call>correctly splits reasoning from content in both streaming and non-streaming extraction.