[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser by qmx · Pull Request #35687 · vllm-project/vllm

qmx · 2026-03-01T23:30:39Z

Purpose

Qwen3.5 models sometimes emit <tool_call> inside the <think> block without closing </think> first. When this happens, Qwen3ReasoningParser.extract_reasoning() classifies the entire output as reasoning, the tool parser receives empty content, and the tool call is silently dropped.

This is the same class of issue fixed for Kimi K2 in #33646. The fix treats <tool_call> as an implicit end-of-reasoning marker in both streaming and non-streaming paths. Paired <tool_call>...</tool_call> from chat template examples are skipped so is_reasoning_end() only triggers on actual model output.

Observed intermittently with Qwen/Qwen3.5-35B-A3B using --reasoning-parser qwen3 --tool-call-parser qwen3_xml during long multi-turn tool-calling sessions.

Test Plan

pytest tests/reasoning/test_qwen3_reasoning_parser.py -v

New test cases added:

tool_call_no_think_end / tool_call_no_think_end_stream — <tool_call> without </think>, no <think> prefix
tool_call_with_think_no_end / tool_call_with_think_no_end_stream — <tool_call> without </think>, with <think> prefix
tool_call_implicit_reasoning_end — multi-token streaming delta with <tool_call>

Test Result

All existing Qwen3 reasoning parser tests continue to pass. New test cases verify that <tool_call> correctly splits reasoning from content in both streaming and non-streaming extraction.

github-actions · 2026-03-01T23:30:49Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors.

You ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

gemini-code-assist

Code Review

This pull request introduces a bugfix for the Qwen3 reasoning parser to correctly handle cases where <tool_call> appears inside a <think> block without a closing </think> tag. The change treats <tool_call> as an implicit end-of-reasoning marker, which prevents tool calls from being silently dropped. The implementation correctly addresses this in both streaming and non-streaming paths, and is supported by new, specific test cases.

My review identifies one area for improvement regarding performance. The implementation of is_reasoning_end can be made more efficient by avoiding multiple passes over the input tokens. A detailed suggestion is provided in the comments.

chaunceyjiang · 2026-03-03T07:23:46Z

@qmx PTAL.

qmx · 2026-03-04T02:31:39Z

@chaunceyjiang I'll squash the commits once CI fully passes!

mergify · 2026-03-04T02:35:29Z

Hi @qmx, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

qmx · 2026-03-04T14:02:47Z

@chaunceyjiang ready for review!

qmx · 2026-03-06T14:04:03Z

seems like I'm hitting a few flaky tests? - rebased and pushed again

bbartels · 2026-03-16T12:10:43Z

@chaunceyjiang looks like all tests are passing here now :)

chaunceyjiang

LGTM.

/cc @sfeng33

mergify · 2026-04-17T05:49:04Z

Hi @qmx, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

chaunceyjiang · 2026-04-17T05:50:03Z

@qmx please fix the pre-commit.

mergify · 2026-04-20T03:20:29Z

Hi @qmx, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Qwen3.5 models sometimes emit <tool_call> inside the <think> block without closing </think>. The reasoning parser classifies the entire output as reasoning, the tool parser receives empty content, and the tool call is silently dropped. Treat <tool_call> as an implicit end-of-reasoning marker, matching the approach in KimiK2ReasoningParser. Paired <tool_call>...</tool_call> from chat template examples are skipped so is_reasoning_end() only triggers on actual model output. Signed-off-by: Doug Campos <qmx@qmx.me>

…g parser When enable_thinking=False and model output contains <tool_call>, the extract_reasoning() method incorrectly split output into reasoning + content because the <tool_call> check ran before the thinking_enabled check. - Move the thinking_enabled early-return above the <tool_call> index lookup so disabled thinking always treats everything as content - Add test case for thinking_disabled with tool_call output Signed-off-by: Doug Campos <qmx@qmx.me>

qmx · 2026-04-24T00:19:00Z

@chaunceyjiang rebased + ci passing - ready to go!

ExtReMLapin · 2026-04-24T06:32:37Z

based

ExtReMLapin · 2026-04-24T07:00:10Z

@qmx I'm pretty sure you should not just check for exact tool_call token match but also partial outputed ones,

Because the system prompt tells how to use the tool_call and it (the system prompt) might be tokenized without special tokens, the <tool_call> instruction from system prompt might be outputed as the following tokens instead of the exact tool call token :

< + tool + _call + >

Please note this is what is done there

vllm/vllm/tool_parsers/qwen3coder_tool_parser.py

Line 396 in 01acf96

or self.tool_call_start_token in delta_text

But careful there, I think they don't do it correctly because they will never have the guarantee that the delta hold more than one token

Edit please take inspiration from this PR instead : #40783

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

…5687 mirror) Qwen3.5/3.6 models sometimes emit <tool_call> inside a <think> block without closing </think> first. With the previous parser the entire output was classified as reasoning, the qwen3_coder tool parser received empty content, and the tool call was silently dropped. Verbatim mirror of upstream vLLM PR #35687: - __init__ records <tool_call>/</tool_call> token ids - new is_reasoning_end / is_reasoning_end_streaming / extract_content_ids overrides match KimiK2ReasoningParser pattern - extract_reasoning falls back to <tool_call> as implicit reasoning end when </think> is missing - streaming variant mirrors the same 3-way branch - pair-checks <tool_call> vs </tool_call> so chat-template examples embedded in prompts do not false-fire Also drops a copy into windows_patches/ so a venv reinstall can recover the fix per the existing patch convention.

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me> Signed-off-by: Adrian <info@zzit.ch>

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>

Addresses silent tool-call drops where the model emits <tool_call> without closing </think>, causing Qwen3ReasoningParser to swallow the entire output as reasoning (QwenLM/Qwen3.6#150). Two-layer fix: 1. Bump pinned vLLM image from 2026-04-20 to 2026-05-07 nightly (commit 51f22dcf, v0.20.2rc1.dev93). Crosses the merge of vllm-project/vllm#35687 which treats <tool_call> as an implicit reasoning end in the Qwen3 parser. Applied to all 11 compose variants (35b-a3b-gguf untouched — different base image). 2. Add froggeric/Qwen-Fixed-Chat-Templates qwen3.6 template at patches/templates/qwen3.6-enhanced.jinja, mounted into each variant and passed via --chat-template. Provides defense-in-depth for the same bug (auto-closes unclosed <think> before <tool_call>) plus developer role, think_on/think_off toggles, </thinking> recognition, and non-ASCII JSON escaping. TOOL_CALLING_ISSUES.md documents the cause, what's evaluated, alternatives considered (fakezeta merged gist), and rollback steps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

mergify Bot added qwen Related to Qwen models bug Something isn't working labels Mar 1, 2026

gemini-code-assist Bot reviewed Mar 1, 2026

View reviewed changes

Comment thread vllm/reasoning/qwen3_reasoning_parser.py

qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch 2 times, most recently from 3332f54 to a5f6f1f Compare March 1, 2026 23:40

qmx marked this pull request as ready for review March 2, 2026 00:28

qmx requested review from aarnphm and chaunceyjiang as code owners March 2, 2026 00:28

chaunceyjiang self-assigned this Mar 2, 2026

chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 2, 2026

qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch from 73150e3 to 36f48bd Compare March 3, 2026 03:42

chaunceyjiang mentioned this pull request Mar 3, 2026

[Bug]: Streaming mode with --tool-call-parser hermes returns raw text instead of parsed tool_calls #31871

Closed

1 task

qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch from 36f48bd to 4a4afe7 Compare March 4, 2026 02:30

qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch 2 times, most recently from 0076a84 to ea1ed91 Compare March 4, 2026 05:18

chaunceyjiang reviewed Mar 5, 2026

View reviewed changes

Comment thread vllm/reasoning/qwen3_reasoning_parser.py Outdated

qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch 3 times, most recently from 91873d5 to 1a83702 Compare March 6, 2026 14:03

qmx requested a review from chaunceyjiang March 6, 2026 19:38

qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch 3 times, most recently from 71edf76 to 93da8f4 Compare March 10, 2026 00:52

chaunceyjiang approved these changes Apr 17, 2026

View reviewed changes

Sandermage mentioned this pull request Apr 17, 2026

[Bug/Feature] TurboQuant + Hybrid MoE (Qwen3.6-35B-A3B) broken on Ampere (SM 80-86) — 13 patches with fixes #40124

Open

qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch from c021d4b to cb45ead Compare April 20, 2026 03:16

qmx requested a review from bbrowning as a code owner April 20, 2026 03:16

qmx added 2 commits April 22, 2026 22:26

qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch from cb45ead to 57292ea Compare April 23, 2026 02:27

chaunceyjiang merged commit 92762ed into vllm-project:main Apr 24, 2026
47 checks passed

qmx deleted the fix-qwen3.5-tool-call-in-reasoning branch April 24, 2026 12:46

drrros mentioned this pull request Apr 26, 2026

Eval bug: Answer in think tags. Qwen 3.6 27B ggml-org/llama.cpp#22398

Open

JJJYmmm mentioned this pull request Apr 28, 2026

Qwen3.6-27B frequently stopped with empty tool call QwenLM/Qwen3.6#150

Open

1 task

Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026

[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser (v…

8b910ba

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me> Signed-off-by: Adrian <info@zzit.ch>

LaZzyMan mentioned this pull request May 13, 2026

After the update, the Qwen code automatically instructs the user to stop the task. QwenLM/qwen-code#3730

Closed

weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026

[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser (v…

d1b9a1f

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser (v…

bc42016

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser (v…

143057e

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026

[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser (v…

68d953c

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser (v…

71f413a

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>

brian-dellabetta pushed a commit to neuralmagic/vllm that referenced this pull request May 29, 2026

[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser (v…

b242bf3

…llm-project#35687) Signed-off-by: Doug Campos <qmx@qmx.me>

Uh oh!

Conversation

qmx commented Mar 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

github-actions Bot commented Mar 1, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

chaunceyjiang commented Mar 3, 2026

Uh oh!

qmx commented Mar 4, 2026

Uh oh!

mergify Bot commented Mar 4, 2026

Uh oh!

qmx commented Mar 4, 2026

Uh oh!

Uh oh!

qmx commented Mar 6, 2026

Uh oh!

bbartels commented Mar 16, 2026

Uh oh!

chaunceyjiang left a comment

Choose a reason for hiding this comment

Uh oh!

mergify Bot commented Apr 17, 2026

Uh oh!

chaunceyjiang commented Apr 17, 2026

Uh oh!

mergify Bot commented Apr 20, 2026

Uh oh!

qmx commented Apr 24, 2026

Uh oh!

Uh oh!

ExtReMLapin commented Apr 24, 2026

Uh oh!

ExtReMLapin commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

qmx commented Mar 1, 2026 •

edited

Loading

ExtReMLapin commented Apr 24, 2026 •

edited

Loading