[Bugfix] Treat <tool_call> as implicit reasoning end in Qwen3 parser #35687

Merged
chaunceyjiang merged 2 commits into vllm-project:main from qmx:fix-qwen3.5-tool-call-in-reasoning
Apr 24, 2026
@qmx qmx commented Mar 1, 2026

Contributor

Purpose

Qwen3.5 models sometimes emit <tool_call> inside the <think> block without closing </think> first. When this happens, Qwen3ReasoningParser.extract_reasoning() classifies the entire output as reasoning, the tool parser receives empty content, and the tool call is silently dropped.

This is the same class of issue fixed for Kimi K2 in #33646. The fix treats <tool_call> as an implicit end-of-reasoning marker in both streaming and non-streaming paths. Paired <tool_call>...</tool_call> from chat template examples are skipped so is_reasoning_end() only triggers on actual model output.
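As a rough illustration of the non-streaming fallback described above, the sketch below splits a finished response into reasoning and content. It works on plain strings; `split_reasoning` and the constants are hypothetical names chosen for this example, not the parser's actual code.

```python
from typing import Optional, Tuple

THINK_START = "<think>"
THINK_END = "</think>"
TOOL_CALL_START = "<tool_call>"


def split_reasoning(model_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (reasoning, content) for a finished response."""
    # The opening <think> may or may not be present in the generated text.
    text = model_output.removeprefix(THINK_START)

    # Normal case: an explicit </think> closes the reasoning block.
    if THINK_END in text:
        reasoning, _, content = text.partition(THINK_END)
        return reasoning or None, content or None

    # Fallback: the block was never closed but a tool call started.
    # Treat <tool_call> as the implicit end and keep the marker in the
    # content so a downstream tool parser still sees the full call.
    idx = text.find(TOOL_CALL_START)
    if idx != -1:
        return text[:idx] or None, text[idx:]

    # No marker at all: everything is still reasoning.
    return text or None, None
```

Without the fallback branch, the unclosed case would land in the final `return`, which is the "entire output classified as reasoning" behaviour the PR describes.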

Observed intermittently with Qwen/Qwen3.5-35B-A3B using --reasoning-parser qwen3 --tool-call-parser qwen3_xml during long multi-turn tool-calling sessions.

Test Plan

pytest tests/reasoning/test_qwen3_reasoning_parser.py -v

New test cases added:

  • tool_call_no_think_end / tool_call_no_think_end_stream: <tool_call> without </think>, no <think> prefix
  • tool_call_with_think_no_end / tool_call_with_think_no_end_stream: <tool_call> without </think>, with <think> prefix
  • tool_call_implicit_reasoning_end: multi-token streaming delta with <tool_call>

Test Result

All existing Qwen3 reasoning parser tests continue to pass. New test cases verify that <tool_call> correctly splits reasoning from content in both streaming and non-streaming extraction.
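The streaming cases in the test plan can be pictured with a small stateful splitter. This is a minimal sketch that assumes each marker arrives whole inside a single delta; `StreamingSplitter` and `Delta` are hypothetical names for illustration and do not mirror vLLM's streaming interface.

```python
from dataclasses import dataclass
from typing import Optional

THINK_END = "</think>"
TOOL_CALL_START = "<tool_call>"


@dataclass
class Delta:
    """One streamed chunk, split into its reasoning and content parts."""
    reasoning: Optional[str] = None
    content: Optional[str] = None


class StreamingSplitter:
    """Hypothetical splitter: routes each text delta to reasoning or content."""

    def __init__(self) -> None:
        self.reasoning_done = False

    def feed(self, delta_text: str) -> Delta:
        # Once reasoning has ended, every later delta is content.
        if self.reasoning_done:
            return Delta(content=delta_text)

        # Explicit end of reasoning inside this delta.
        if THINK_END in delta_text:
            before, _, after = delta_text.partition(THINK_END)
            self.reasoning_done = True
            return Delta(reasoning=before or None, content=after or None)

        # Implicit end: a multi-token delta that contains <tool_call>.
        idx = delta_text.find(TOOL_CALL_START)
        if idx != -1:
            self.reasoning_done = True
            return Delta(reasoning=delta_text[:idx] or None,
                         content=delta_text[idx:])

        # Still inside the reasoning block.
        return Delta(reasoning=delta_text)
```

Feeding `"Let me check"`, then `" the weather<tool_call>{"`, then `"}"` yields a reasoning-only delta, a mixed delta split at the marker, and a content-only delta.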

@github-actions

github-actions Bot commented Mar 1, 2026


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify Bot added the qwen (Related to Qwen models) and bug (Something isn't working) labels Mar 1, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces a bugfix for the Qwen3 reasoning parser to correctly handle cases where <tool_call> appears inside a <think> block without a closing </think> tag. The change treats <tool_call> as an implicit end-of-reasoning marker, which prevents tool calls from being silently dropped. The implementation correctly addresses this in both streaming and non-streaming paths, and is supported by new, specific test cases.

My review identifies one area for improvement regarding performance. The implementation of is_reasoning_end can be made more efficient by avoiding multiple passes over the input tokens. A detailed suggestion is provided in the comments.
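The detailed suggestion is not reproduced in this thread. Purely as an illustration of a single-pass check that also skips paired <tool_call>...</tool_call> examples (as the PR description requires), a reverse scan over token ids might look like the sketch below. The token ids are made-up placeholders and `is_reasoning_end` here is a standalone function, not the merged method.

```python
from typing import Sequence

# Placeholder ids; a real parser would look these up in the tokenizer vocab.
THINK_START_ID = 1
THINK_END_ID = 2
TOOL_CALL_START_ID = 3
TOOL_CALL_END_ID = 4


def is_reasoning_end(input_ids: Sequence[int]) -> bool:
    """Single reverse pass: has reasoning ended by the last token?"""
    pending_closes = 0  # </tool_call> tokens seen but not yet matched
    for token_id in reversed(input_ids):
        if token_id == THINK_START_ID:
            # A newer <think> with no end marker after it: still reasoning.
            return False
        if token_id == THINK_END_ID:
            return True
        if token_id == TOOL_CALL_END_ID:
            pending_closes += 1
        elif token_id == TOOL_CALL_START_ID:
            if pending_closes == 0:
                # Unpaired <tool_call>: the model started a call mid-reasoning.
                return True
            # Paired with a later </tool_call>: e.g. a chat-template example.
            pending_closes -= 1
    return False
```

One limitation of this sketch: a tool call that the model has already closed looks the same as a template example, so the check is only meaningful while the call is still open.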

Comment thread vllm/reasoning/qwen3_reasoning_parser.py
@qmx qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch 2 times, most recently from 3332f54 to a5f6f1f Compare March 1, 2026 23:40
@qmx qmx marked this pull request as ready for review March 2, 2026 00:28
@chaunceyjiang chaunceyjiang self-assigned this Mar 2, 2026
@chaunceyjiang chaunceyjiang added the ready (ONLY add when PR is ready to merge/full CI is needed) label Mar 2, 2026
@qmx qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch from 73150e3 to 36f48bd Compare March 3, 2026 03:42
@chaunceyjiang

Collaborator
[image] @qmx PTAL.

@qmx

qmx commented Mar 4, 2026

Contributor Author

@chaunceyjiang I'll squash the commits once CI fully passes!

@mergify

mergify Bot commented Mar 4, 2026

Contributor

Hi @qmx, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@qmx qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch 2 times, most recently from 0076a84 to ea1ed91 Compare March 4, 2026 05:18
@qmx

qmx commented Mar 4, 2026

Contributor Author

@chaunceyjiang ready for review!

Comment thread vllm/reasoning/qwen3_reasoning_parser.py Outdated
@qmx qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch 3 times, most recently from 91873d5 to 1a83702 Compare March 6, 2026 14:03
@qmx

qmx commented Mar 6, 2026

Contributor Author

seems like I'm hitting a few flaky tests? - rebased and pushed again

@qmx qmx requested a review from chaunceyjiang March 6, 2026 19:38
@qmx qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch 3 times, most recently from 71edf76 to 93da8f4 Compare March 10, 2026 00:52
@bbartels

Contributor

@chaunceyjiang looks like all tests are passing here now :)

@chaunceyjiang chaunceyjiang left a comment

Collaborator


LGTM.

/cc @sfeng33

@mergify

mergify Bot commented Apr 17, 2026

Contributor

Hi @qmx, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@chaunceyjiang

Collaborator

@qmx please fix the pre-commit.

@mergify

mergify Bot commented Apr 20, 2026

Contributor

Hi @qmx, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

qmx added 2 commits April 22, 2026 22:26
Qwen3.5 models sometimes emit <tool_call> inside the <think> block
without closing </think>. The reasoning parser classifies the entire
output as reasoning, the tool parser receives empty content, and the
tool call is silently dropped.

Treat <tool_call> as an implicit end-of-reasoning marker, matching
the approach in KimiK2ReasoningParser. Paired <tool_call>...</tool_call>
from chat template examples are skipped so is_reasoning_end() only
triggers on actual model output.

Signed-off-by: Doug Campos <qmx@qmx.me>
…g parser

When enable_thinking=False and model output contains <tool_call>, the
extract_reasoning() method incorrectly split output into reasoning +
content because the <tool_call> check ran before the thinking_enabled
check.

- Move the thinking_enabled early-return above the <tool_call> index
  lookup so disabled thinking always treats everything as content
- Add test case for thinking_disabled with tool_call output

Signed-off-by: Doug Campos <qmx@qmx.me>
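The ordering fix in the second commit can be sketched as follows. This is a simplified string-level illustration; `extract_with_thinking_flag` is a hypothetical name, and the position of the </think> check relative to the early return is an assumption, since the commit message only states that the thinking_enabled check must precede the <tool_call> lookup.

```python
from typing import Optional, Tuple

THINK_START = "<think>"
THINK_END = "</think>"
TOOL_CALL_START = "<tool_call>"


def extract_with_thinking_flag(
    model_output: str, thinking_enabled: bool
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (reasoning, content)."""
    text = model_output.removeprefix(THINK_START)

    # Assumption: an explicit </think> is honoured regardless of the flag.
    if THINK_END in text:
        reasoning, _, content = text.partition(THINK_END)
        return reasoning or None, content or None

    # Early return BEFORE the <tool_call> lookup: with thinking disabled,
    # text in front of a tool call is ordinary content, not reasoning.
    if not thinking_enabled:
        return None, text or None

    # Only now fall back to <tool_call> as an implicit reasoning end.
    idx = text.find(TOOL_CALL_START)
    if idx != -1:
        return text[:idx] or None, text[idx:]

    return text or None, None
```

If the two middle branches were swapped, the `thinking_enabled=False` call would wrongly report the text before <tool_call> as reasoning, which is the regression the commit message describes.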
@qmx qmx force-pushed the fix-qwen3.5-tool-call-in-reasoning branch from cb45ead to 57292ea Compare April 23, 2026 02:27
@qmx

qmx commented Apr 24, 2026

Contributor Author

@chaunceyjiang rebased + ci passing - ready to go!

@chaunceyjiang chaunceyjiang merged commit 92762ed into vllm-project:main Apr 24, 2026
47 checks passed
@ExtReMLapin

Contributor

based

@ExtReMLapin

ExtReMLapin commented Apr 24, 2026

Contributor

@qmx I'm pretty sure you should not just check for an exact tool_call token match but also for partially output ones.

Because the system prompt explains how to use tool_call, and it (the system prompt) might be tokenized without special tokens, the <tool_call> instruction from the system prompt might be output as the following tokens instead of the exact tool call token:

< + tool + _call + >

Please note this is what is done there:

or self.tool_call_start_token in delta_text

But be careful there: I think they don't do it correctly, because there is never a guarantee that the delta holds more than one token.

Edit: please take inspiration from this PR instead: #40783
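One way to read this suggestion is to detect the marker at the text level and buffer any trailing fragment that could still grow into <tool_call>, so detection does not depend on a delta holding more than one token. The sketch below is an assumption-laden illustration with hypothetical names (`BufferedMarkerDetector`, `held_back_suffix_len`); it is not taken from this PR or from #40783.

```python
from typing import Tuple

MARKER = "<tool_call>"


def held_back_suffix_len(text: str, marker: str = MARKER) -> int:
    """Length of the longest suffix of text that is a proper prefix of marker."""
    for n in range(min(len(marker) - 1, len(text)), 0, -1):
        if text.endswith(marker[:n]):
            return n
    return 0


class BufferedMarkerDetector:
    """Hypothetical detector for a marker that is split across deltas."""

    def __init__(self) -> None:
        self.pending = ""   # text withheld because it may start the marker
        self.found = False

    def feed(self, delta_text: str) -> Tuple[str, str]:
        """Return (reasoning_text, content_text) that is safe to emit now."""
        if self.found:
            return "", delta_text

        self.pending += delta_text
        idx = self.pending.find(MARKER)
        if idx != -1:
            # Full marker assembled: split and flush everything.
            self.found = True
            reasoning, content = self.pending[:idx], self.pending[idx:]
            self.pending = ""
            return reasoning, content

        # Emit all text except a tail that could still become the marker.
        keep = held_back_suffix_len(self.pending)
        cut = len(self.pending) - keep
        emit, self.pending = self.pending[:cut], self.pending[cut:]
        return emit, ""
```

The trade-off is a small emission delay: a lone `<` is withheld until the next delta shows whether it begins the marker. A real implementation would also need to flush `pending` when generation ends.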

@qmx qmx deleted the fix-qwen3.5-tool-call-in-reasoning branch April 24, 2026 12:46
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…llm-project#35687)

Signed-off-by: Doug Campos <qmx@qmx.me>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
devnen referenced this pull request in devnen/vllm-windows Apr 29, 2026
…5687 mirror)

Qwen3.5/3.6 models sometimes emit <tool_call> inside a <think> block
without closing </think> first. With the previous parser the entire
output was classified as reasoning, the qwen3_coder tool parser
received empty content, and the tool call was silently dropped.

Verbatim mirror of upstream vLLM PR #35687:
- __init__ records <tool_call>/</tool_call> token ids
- new is_reasoning_end / is_reasoning_end_streaming /
  extract_content_ids overrides match KimiK2ReasoningParser pattern
- extract_reasoning falls back to <tool_call> as implicit reasoning
  end when </think> is missing
- streaming variant mirrors the same 3-way branch
- pair-checks <tool_call> vs </tool_call> so chat-template examples
  embedded in prompts do not false-fire

Also drops a copy into windows_patches/ so a venv reinstall can
recover the fix per the existing patch convention.
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
…llm-project#35687)

Signed-off-by: Doug Campos <qmx@qmx.me>
Signed-off-by: Adrian <info@zzit.ch>
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request May 7, 2026
…llm-project#35687)

Signed-off-by: Doug Campos <qmx@qmx.me>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
tfriedel added a commit to tfriedel/qwen3.6-rtx3090-lab that referenced this pull request May 7, 2026
Addresses silent tool-call drops where the model emits <tool_call>
without closing </think>, causing Qwen3ReasoningParser to swallow the
entire output as reasoning (QwenLM/Qwen3.6#150).

Two-layer fix:

1. Bump pinned vLLM image from 2026-04-20 to 2026-05-07 nightly
   (commit 51f22dcf, v0.20.2rc1.dev93). Crosses the merge of
   vllm-project/vllm#35687 which treats <tool_call> as an implicit
   reasoning end in the Qwen3 parser. Applied to all 11 compose
   variants (35b-a3b-gguf untouched — different base image).

2. Add froggeric/Qwen-Fixed-Chat-Templates qwen3.6 template at
   patches/templates/qwen3.6-enhanced.jinja, mounted into each
   variant and passed via --chat-template. Provides defense-in-depth
   for the same bug (auto-closes unclosed <think> before <tool_call>)
   plus developer role, think_on/think_off toggles, </thinking>
   recognition, and non-ASCII JSON escaping.

TOOL_CALLING_ISSUES.md documents the cause, what's evaluated,
alternatives considered (fakezeta merged gist), and rollback steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
brian-dellabetta pushed a commit to neuralmagic/vllm that referenced this pull request May 29, 2026
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…llm-project#35687)

Signed-off-by: Doug Campos <qmx@qmx.me>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Labels

bug (Something isn't working), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed)

7 participants