Add specialized tagged thinking tool parser #24202
Open
bartdeboer wants to merge 1 commit into
Hi @bartdeboer, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
6e20360 to 872f2ba
Overview
This PR adds a specialized parser for tagged thinking/tool-call outputs.
It was motivated by testing Qwen3.5-9B with reasoning enabled and XML-style tool calls in an agent/tool-loop setup.
Two parser issues showed up:
- A `<tool_call>` emitted while the parser was inside a `<think>...</think>` reasoning block was surfaced as `reasoning_content` instead of being parsed as a tool call.
- Tagged parameters such as `<parameter=old_string>` were required to appear in schema/property iteration order, even though the parameter name is already carried in-band by the tag.

Observed Qwen behavior
Qwen3.5 appears to use three practical response modes:
The important detail is that Qwen does not always emit a complete `<think>...</think>` block for every response mode.
For pure reasoning messages, Qwen reliably emits/uses reasoning tags and closes with `</think>`.
For tool-call messages, Qwen may emit reasoning-like text before `<tool_call>`, but it may not emit a matching `</think>` before the tool call.
For final-answer messages, Qwen may emit plain assistant content without closing a template-prefilled `<think>` block.
The standard Qwen Jinja template helps by prefilling assistant generation with an opening `<think>` tag.
That works well for pure reasoning messages, because Qwen then emits:
`reasoning... </think>`
However, for tool-call and final-answer messages, Qwen may effectively ignore that prefilled reasoning state. Some downstream template variants, such as no-prefill template fixes, remove this `<think>` prefill entirely.
So the parser should not rely only on whether the Jinja template prefilled `<think>`. It needs to classify the emitted completion itself.
Parser behavior added by this PR
This PR adds a specialized parser for the tagged thinking/tool-call protocol family used by Qwen-style templates:
The parser kicks in when this tag family is detected in the template.
It classifies emitted output as follows:
- `</think>` marks preceding text as `reasoning_content`.
- `<tool_call>` marks preceding text as `reasoning_content`, even if no explicit `<think>` tag was emitted in the completion.
- `<tool_call>...</tool_call>` is parsed as a tool call.

This makes the parser robust to both default Qwen templates that prefill `<think>` and no-prefill template variants.
Tagged parameter handling
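Tagged parameters only matter once a `<tool_call>` block has been isolated, so it helps to picture the boundary rules from the previous section first. The following is a minimal Python sketch of those rules, not the parser added by this PR. The names `Classified` and `classify_completion` are hypothetical, and treating text between `</think>` and a later `<tool_call>` as ordinary content is an assumption of this sketch.

```python
import re
from dataclasses import dataclass, field

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
TOOL_OPEN, TOOL_CLOSE = "<tool_call>", "</tool_call>"

# Non-greedy match so that several tool calls in one completion stay separate.
TOOL_CALL_RE = re.compile(
    re.escape(TOOL_OPEN) + r"(.*?)" + re.escape(TOOL_CLOSE), re.DOTALL
)


@dataclass
class Classified:
    """Hypothetical result type: what the completion was classified into."""
    reasoning_content: str = ""
    content: str = ""
    tool_calls: list = field(default_factory=list)  # raw tool-call bodies


def classify_completion(text: str) -> Classified:
    out = Classified()
    body = text.lstrip()
    # An explicit <think> is optional: the template may have prefilled it,
    # or a no-prefill variant may have dropped it entirely.
    if body.startswith(THINK_OPEN):
        body = body[len(THINK_OPEN):]

    close_idx = body.find(THINK_CLOSE)
    tool_idx = body.find(TOOL_OPEN)

    if close_idx != -1 and (tool_idx == -1 or close_idx < tool_idx):
        # Rule 1: </think> marks the preceding text as reasoning.
        out.reasoning_content = body[:close_idx].strip()
        rest = body[close_idx + len(THINK_CLOSE):]
    elif tool_idx != -1:
        # Rule 2: <tool_call> marks the preceding text as reasoning,
        # even though no </think> was emitted.
        out.reasoning_content = body[:tool_idx].strip()
        rest = body[tool_idx:]
    else:
        # Neither boundary present: treat everything as a plain final answer.
        rest = body

    # Rule 3: every <tool_call>...</tool_call> block is a tool call.
    out.tool_calls = [m.strip() for m in TOOL_CALL_RE.findall(rest)]
    out.content = TOOL_CALL_RE.sub("", rest).strip()
    return out
```

Because the decision depends only on the emitted text, the same function gives the same result whether or not the template prefilled `<think>`.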
For tagged tool calls, function argument names are already present in-band:
This PR parses tagged parameters by their emitted names rather than requiring schema/property iteration order.
It also allows string parameters to contain raw multiline content/code until the closing `</parameter>` tag.
This fixes Qwen-style tool calls such as edit/write calls where multiline code appears inside a string parameter.
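To make the name-based parameter handling concrete, here is a minimal Python sketch rather than the PR's actual implementation. It assumes the tool-call body wraps its parameters in a `<function=NAME>...</function>` element, which is not shown in this description. `parse_tagged_tool_call` is a hypothetical name, and in the usage example the `edit` tool and its `path` and `new_string` parameters are likewise made up for illustration. All values are kept as raw strings, and schema-driven type coercion is left out.

```python
import re

# Assumed wrapper around the parameters; only <parameter=...> is quoted above.
FUNCTION_RE = re.compile(r"<function=([^>\n]+)>(.*?)</function>", re.DOTALL)
# DOTALL lets a value span multiple lines up to the closing </parameter>.
PARAM_RE = re.compile(r"<parameter=([^>\n]+)>(.*?)</parameter>", re.DOTALL)


def parse_tagged_tool_call(body: str) -> dict:
    """Parse one tool-call body into {"name": ..., "arguments": {...}}."""
    fn = FUNCTION_RE.search(body)
    if fn is None:
        raise ValueError("no <function=...> block found")
    name, inner = fn.group(1).strip(), fn.group(2)

    arguments = {}
    for match in PARAM_RE.finditer(inner):
        # The key comes from the tag itself, so emission order is irrelevant.
        key = match.group(1).strip()
        value = match.group(2)
        # Drop only the single newline that frames the value on each side,
        # so indentation inside multiline code is preserved.
        if value.startswith("\n"):
            value = value[1:]
        if value.endswith("\n"):
            value = value[:-1]
        arguments[key] = value
    return {"name": name, "arguments": arguments}
```

A usage example with parameters emitted in an arbitrary order and multiline code in a string value:

```python
body = (
    "<function=edit>\n"
    "<parameter=new_string>\ndef f():\n    return 2\n</parameter>\n"
    "<parameter=path>\nsrc/app.py\n</parameter>\n"
    "<parameter=old_string>\ndef f():\n    return 1\n</parameter>\n"
    "</function>"
)
call = parse_tagged_tool_call(body)
print(call["name"], sorted(call["arguments"]))
```

One limitation of this sketch is that a string value containing the literal text `</parameter>` would be cut short at that point.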
Additional information
This was tested downstream with Qwen3.5-9B using reasoning enabled and XML-style tool calls.
A patched CUDA image used for downstream validation is available here:
Digest:
Validated locally with:
Related issues / prior art:
- `peg-native` chat format parser fails when model outputs text before `<tool_call>` (thinking model + tool calling) #20260
- `<tool_call>` can act as a reasoning boundary.

Requirements