Add specialized tagged thinking tool parser #24202

Open
bartdeboer wants to merge 1 commit into ggml-org:master from bartdeboer:qwen-style-tagged-parser-clean

@bartdeboer

Overview

This PR adds a specialized parser for tagged thinking/tool-call outputs.

It was motivated by testing Qwen3.5-9B with reasoning enabled and XML-style tool calls in an agent/tool-loop setup.

Two parser issues showed up:

  1. A valid <tool_call> emitted while the parser was inside a <think>...</think> reasoning block was surfaced as reasoning_content instead of being parsed as a tool call.
  2. Tagged function parameters such as <parameter=old_string> were required to appear in schema/property iteration order, even though the parameter name is already carried in-band by the tag.

Observed Qwen behavior

Qwen3.5 appears to use three practical response modes:

  1. Pure reasoning messages
<think>
reasoning...
</think>
  2. Tool-call messages, which may include reasoning before the tool call
reasoning before tool call...
<tool_call>
<function=...>
<parameter=...>...</parameter>
</function>
</tool_call>
  3. Final-answer messages
Final answer...

The important detail is that Qwen does not always emit a complete <think>...</think> block for every response mode.

For pure reasoning messages, Qwen reliably uses the reasoning tags and closes the block with </think>.

For tool-call messages, Qwen may emit reasoning-like text before <tool_call>, but it may not emit a matching </think> before the tool call.

For final-answer messages, Qwen may emit plain assistant content without closing a template-prefilled <think> block.

The standard Qwen Jinja template helps by prefilling assistant generation with:

<think>

That works well for pure reasoning messages, because Qwen then emits:

reasoning...
</think>

However, for tool-call and final-answer messages, Qwen may effectively ignore that prefilled reasoning state. Some downstream template variants, such as no-prefill template fixes, remove this <think> prefill entirely.

So the parser should not rely only on whether the Jinja template prefilled <think>. It needs to classify the emitted completion itself.

Parser behavior added by this PR

This PR adds a specialized parser for the tagged thinking/tool-call protocol family used by Qwen-style templates:

<think>...</think>
<tool_call>
<function=name>
<parameter=arg>...</parameter>
</function>
</tool_call>

The parser is used when this tag family is detected in the template.

It classifies emitted output as follows:

  • </think> marks preceding text as reasoning_content.
  • <tool_call> marks preceding text as reasoning_content, even if no explicit <think> tag was emitted in the completion.
  • <tool_call>...</tool_call> is parsed as a tool call.
  • Text after a parsed tool call is parsed as assistant content.
  • If no reasoning/tool tags are emitted, the output is parsed as final assistant content.

This makes the parser robust to both default Qwen templates that prefill <think> and no-prefill template variants.
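
The classification rules above can be sketched in a few lines of Python. This is only an illustration of the rules as listed, not the C++ implementation in this PR: the names `ParsedMessage` and `classify_completion` are hypothetical, tool-call bodies are kept as raw strings, and the sketch assumes that when both `</think>` and `<tool_call>` appear, everything before the first `<tool_call>` counts as reasoning.

```python
from dataclasses import dataclass, field

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"
TOOL_OPEN = "<tool_call>"
TOOL_CLOSE = "</tool_call>"


@dataclass
class ParsedMessage:
    # Hypothetical container; field names mirror the terms used in the PR text.
    reasoning_content: str = ""
    content: str = ""
    tool_calls: list = field(default_factory=list)  # raw <tool_call> bodies


def classify_completion(completion: str) -> ParsedMessage:
    msg = ParsedMessage()
    rest = completion.lstrip()

    # With a no-prefill template the model may emit <think> itself;
    # with a prefilled template it will not. Accept both.
    if rest.startswith(THINK_OPEN):
        rest = rest[len(THINK_OPEN):]

    tool_start = rest.find(TOOL_OPEN)
    think_end = rest.find(THINK_CLOSE)

    if tool_start != -1:
        # <tool_call> marks preceding text as reasoning, with or without </think>.
        # Assumption of this sketch: a </think> before the tool call is dropped.
        msg.reasoning_content = rest[:tool_start].replace(THINK_CLOSE, "").strip()
        rest = rest[tool_start:]
    elif think_end != -1:
        # </think> marks preceding text as reasoning.
        msg.reasoning_content = rest[:think_end].strip()
        rest = rest[think_end + len(THINK_CLOSE):]

    # Collect tool calls; any other remaining text is assistant content.
    content_parts = []
    while True:
        start = rest.find(TOOL_OPEN)
        end = rest.find(TOOL_CLOSE, start) if start != -1 else -1
        if start == -1 or end == -1:
            # No (complete) tool call left: keep the remainder as content.
            content_parts.append(rest)
            break
        content_parts.append(rest[:start])
        msg.tool_calls.append(rest[start + len(TOOL_OPEN):end].strip())
        rest = rest[end + len(TOOL_CLOSE):]

    msg.content = "".join(content_parts).strip()
    return msg
```

Because the decision is made from the emitted text alone, a completion that starts with `<think>` and one that relies on a template prefill are classified the same way. Edge cases such as an unterminated `<tool_call>` or a stray `</think>` after a tool call are left as content in this sketch.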

Tagged parameter handling

For tagged tool calls, function argument names are already present in-band:

<parameter=old_string>...</parameter>
<parameter=new_string>...</parameter>

This PR parses tagged parameters by their emitted names rather than requiring schema/property iteration order.

It also allows string parameters to contain raw multiline content/code until the closing </parameter> tag.

This fixes Qwen-style tool calls such as edit/write calls where multiline code appears inside a string parameter.
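
As a rough illustration of name-based parameter parsing, the Python sketch below reads each `<parameter=name>` tag into a dictionary keyed by the emitted name, so the order of the tags does not matter, and it keeps multiline values verbatim up to the closing `</parameter>`. The function name `parse_tagged_function` and the regular expressions are assumptions made for this example; the PR's C++ parser is not regex-based as far as this description states, and schema-driven type conversion is left out.

```python
import re

# Hypothetical patterns for the tag family shown above.
# DOTALL lets parameter values span multiple lines.
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tagged_function(body: str):
    """Return (function_name, {param_name: raw_string}) or None if no match."""
    match = FUNCTION_RE.search(body)
    if match is None:
        return None
    name, inner = match.group(1), match.group(2)

    args = {}
    for param in PARAM_RE.finditer(inner):
        value = param.group(2)
        # Drop the single newline that conventionally follows the opening tag
        # and precedes the closing tag; keep all other whitespace intact.
        if value.startswith("\n"):
            value = value[1:]
        if value.endswith("\n"):
            value = value[:-1]
        # Keyed by the in-band name, so emission order is irrelevant.
        args[param.group(1)] = value
    return name, args


example = (
    "<function=edit>\n"
    "<parameter=new_string>\ndef f():\n    return 1\n</parameter>\n"
    "<parameter=old_string>\nx\n</parameter>\n"
    "</function>"
)
result = parse_tagged_function(example)
```

Here `new_string` is emitted before `old_string`, which an order-dependent parser would reject if the schema lists them the other way round. A value that itself contains the literal text `</parameter>` or `</function>` would end the match early in this sketch.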

Additional information

This was tested downstream with Qwen3.5-9B using reasoning enabled and XML-style tool calls.

A patched CUDA image used for downstream validation is available here:

ghcr.io/bartdeboer/llama-cpp:server-cuda-tagged-thinking-tools-168643697

Digest:

sha256:0b43e14146cb715ff8fa6960083ac9c18aae065eed197f87bc88d3c96f7e0240

Validated locally with:

./build/bin/test-chat --suppress-debug --template Qwen3.5
./build/bin/test-chat --suppress-debug

Related issues / prior art:

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - AI assistance was used for investigation and prototyping. I have a software engineering background, but I am not primarily a C++/llama.cpp contributor. I reviewed the changes, tested them locally and downstream, and am responsible for the submitted code.

bartdeboer requested review from a team and pwilkin as code owners June 5, 2026 18:01
github-actions Bot added the testing (Everything test related) label Jun 5, 2026
@ggml-gh-bot

ggml-gh-bot Bot commented Jun 5, 2026

Hi @bartdeboer, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 2 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

bartdeboer force-pushed the qwen-style-tagged-parser-clean branch from 6e20360 to 872f2ba June 10, 2026 21:47