Add specialized tagged thinking tool parser by bartdeboer · Pull Request #24202 · ggml-org/llama.cpp

bartdeboer · 2026-06-05T18:01:14Z

Overview

This PR adds a specialized parser for tagged thinking/tool-call outputs.

It was motivated by testing Qwen3.5-9B with reasoning enabled and XML-style tool calls in an agent/tool-loop setup.

Two parser issues showed up:

A valid <tool_call> emitted while the parser was inside a <think>...</think> reasoning block was surfaced as reasoning_content instead of being parsed as a tool call.
Tagged function parameters such as <parameter=old_string> were required to appear in schema/property iteration order, even though the parameter name is already carried in-band by the tag.

Observed Qwen behavior

Qwen3.5 appears to use three practical response modes:

Pure reasoning messages

<think>
reasoning...
</think>

Tool-call messages, which may include reasoning before the tool call

reasoning before tool call...
<tool_call>
<function=...>
<parameter=...>...</parameter>
</function>
</tool_call>

Final-answer messages

Final answer...

The important detail is that Qwen does not always emit a complete <think>...</think> block for every response mode.

For pure reasoning messages, Qwen reliably closes its reasoning with </think>; the opening <think> is either emitted by the model or prefilled by the template.

For tool-call messages, Qwen may emit reasoning-like text before <tool_call>, but it may not emit a matching </think> before the tool call.

For final-answer messages, Qwen may emit plain assistant content without closing a template-prefilled <think> block.

The standard Qwen Jinja template helps by prefilling assistant generation with:

<think>

That works well for pure reasoning messages, because Qwen then emits:

reasoning...
</think>

However, for tool-call and final-answer messages, Qwen may effectively ignore that prefilled reasoning state. Some downstream template variants, such as no-prefill template fixes, remove this <think> prefill entirely.

So the parser should not rely only on whether the Jinja template prefilled <think>. It needs to classify the emitted completion itself.

Parser behavior added by this PR

This PR adds a specialized parser for the tagged thinking/tool-call protocol family used by Qwen-style templates:

<think>...</think>
<tool_call>
<function=name>
<parameter=arg>...</parameter>
</function>
</tool_call>

The parser is selected when this tag family is detected in the template.

It classifies emitted output as follows:

</think> marks preceding text as reasoning_content.
<tool_call> marks preceding text as reasoning_content, even if no explicit <think> tag was emitted in the completion.
<tool_call>...</tool_call> is parsed as a tool call.
Text after a parsed tool call is parsed as assistant content.
If no reasoning/tool tags are emitted, the output is parsed as final assistant content.

This makes the parser robust to both default Qwen templates that prefill <think> and no-prefill template variants.
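
The classification rules above can be sketched in a few lines of Python. This is only an illustration of the rules as listed, not the C++ implementation in this PR: the names `Parsed` and `classify` are hypothetical, the whole completion is assumed to be available at once (no streaming), and text between a </think> and a later <tool_call> is treated as content here, which is an assumption. Edge cases such as a stray </think> after a tool call are not handled.

```python
from dataclasses import dataclass, field

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
TOOL_OPEN, TOOL_CLOSE = "<tool_call>", "</tool_call>"


@dataclass
class Parsed:  # hypothetical result type, for illustration only
    reasoning_content: str = ""
    content: str = ""
    tool_calls: list = field(default_factory=list)  # raw <tool_call> bodies


def classify(completion: str) -> Parsed:
    """Classify a completion by the tags it actually contains."""
    out = Parsed()
    text = completion.lstrip()
    # The opening <think> may or may not be part of the completion
    # (it can be prefilled by the template), so it is optional here.
    if text.startswith(THINK_OPEN):
        text = text[len(THINK_OPEN):]

    close_at = text.find(THINK_CLOSE)
    tool_at = text.find(TOOL_OPEN)

    if close_at != -1 and (tool_at == -1 or close_at < tool_at):
        # </think> marks the preceding text as reasoning.
        out.reasoning_content = text[:close_at].strip()
        text = text[close_at + len(THINK_CLOSE):]
    elif tool_at != -1:
        # <tool_call> marks the preceding text as reasoning,
        # even without an explicit <think> in the completion.
        out.reasoning_content = text[:tool_at].strip()
        text = text[tool_at:]
    # Otherwise no reasoning/tool tags were found: everything is content.

    # Pull out complete <tool_call>...</tool_call> blocks; the rest is content.
    kept = []
    while True:
        start = text.find(TOOL_OPEN)
        if start == -1:
            break
        end = text.find(TOOL_CLOSE, start)
        if end == -1:
            break  # incomplete tool call: left in content in this sketch
        kept.append(text[:start])
        out.tool_calls.append(text[start + len(TOOL_OPEN):end].strip())
        text = text[end + len(TOOL_CLOSE):]
    kept.append(text)
    out.content = "".join(kept).strip()
    return out
```

Because the function only looks at the emitted text, it gives the same result whether or not the template prefilled <think>, which is the property the rules are meant to provide.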

Tagged parameter handling

For tagged tool calls, function argument names are already present in-band:

<parameter=old_string>...</parameter>
<parameter=new_string>...</parameter>

This PR parses tagged parameters by their emitted names rather than requiring schema/property iteration order.

It also allows string parameters to contain raw multiline content/code until the closing </parameter> tag.

This fixes Qwen-style tool calls such as edit/write calls where multiline code appears inside a string parameter.
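
As a rough illustration of name-based parameter parsing, the sketch below reads each <parameter=name> tag and stores its body under that name, so the order in which parameters appear does not matter. It is not the PR's C++ code: `parse_tool_call` and `strip_one_newline` are hypothetical helpers, every value is kept as a raw string (no schema-based type conversion), trimming one newline after the opening tag and one before the closing tag is an assumption, and a parameter value containing a literal </parameter> or </function> would break these simple regular expressions.

```python
import re

# Non-greedy matches with DOTALL so parameter bodies may span multiple lines.
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def strip_one_newline(value: str) -> str:
    """Drop at most one newline at each end (an assumption of this sketch)."""
    if value.startswith("\n"):
        value = value[1:]
    if value.endswith("\n"):
        value = value[:-1]
    return value


def parse_tool_call(body: str) -> dict:
    """Parse the inside of a <tool_call> block into a name and arguments."""
    match = FUNCTION_RE.search(body)
    if match is None:
        raise ValueError("no <function=...> block found")
    name, inner = match.group(1), match.group(2)
    arguments = {}
    for param_name, raw in PARAM_RE.findall(inner):
        # Keyed by the emitted name, not by schema/property order.
        arguments[param_name] = strip_one_newline(raw)
    return {"name": name, "arguments": arguments}
```

For example, a call that emits new_string before old_string, with multiline code inside new_string, yields the same arguments dictionary as one that emits them in the opposite order.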

Additional information

This was tested downstream with Qwen3.5-9B using reasoning enabled and XML-style tool calls.

A patched CUDA image used for downstream validation is available here:

ghcr.io/bartdeboer/llama-cpp:server-cuda-tagged-thinking-tools-168643697

Digest:

sha256:0b43e14146cb715ff8fa6960083ac9c18aae065eed197f87bc88d3c96f7e0240

Validated locally with:

./build/bin/test-chat --suppress-debug --template Qwen3.5
./build/bin/test-chat --suppress-debug

Related issues / prior art:

Related to "Improve tagged tool parsing with reasoning" (#23773)
Fixes "Eval bug: Qwen3.5 9B often prints tool calls in XML and stops when thinking is enabled - tool calls inside thinking block" (#20837)
Fixes "Eval bug: Qwen3.5 and 3.6 tool call emitted in reasoning_content instead of delta.tool_calls (GitHub CoPilot client)" (#22684)
Related to "Eval bug: unsloth/Qwen3.5-35B-A3B-GGUF peg-native chat format parser fails when model outputs text before <tool_call> (thinking model + tool calling)" (#20260)
Related to "Eval bug: Answer in think tags. Qwen 3.6 27B" (#22398)
Related to vLLM's Qwen3 reasoning parser behavior, where <tool_call> can act as a reasoning boundary.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES - AI assistance was used for investigation and prototyping. I have a software engineering background, but I am not primarily a C++/llama.cpp contributor. I reviewed the changes, tested them locally and downstream, and am responsible for the submitted code.

ggml-gh-bot · 2026-06-05T18:05:51Z

Hi @bartdeboer, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 2 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

bartdeboer requested review from a team and pwilkin as code owners June 5, 2026 18:01

github-actions bot added the "testing" label (Everything test related) Jun 5, 2026

bartdeboer mentioned this pull request Jun 5, 2026

Improve tagged tool parsing with reasoning #23773

Closed

bartdeboer mentioned this pull request Jun 6, 2026

common/reasoning-budget: force tool call immediately after budget ends, prevent tool call token in reasoning section #23478

Draft

Add tagged thinking tool parser

872f2ba

bartdeboer force-pushed the qwen-style-tagged-parser-clean branch from 6e20360 to 872f2ba June 10, 2026 21:47

bartdeboer wants to merge 1 commit into ggml-org:master from bartdeboer:qwen-style-tagged-parser-clean
