Improve tagged tool parsing with reasoning by bartdeboer · Pull Request #23773 · ggml-org/llama.cpp

bartdeboer · 2026-05-27T09:08:18Z

Overview

This PR fixes two tagged-tool parsing issues found while testing Qwen3.5 with reasoning enabled and XML-style tool calls.

Commit 1: Fix tagged tool calls inside reasoning

The current llama.cpp parser splits content parsing and reasoning parsing into separate paths. This means a <tool_call> emitted inside a <think>...</think> block can be consumed as reasoning_content instead of being parsed as a tool call.

This commit combines those paths for the TAG_WITH_TAGGED parser. Reasoning tags now act as text-mode switches:

text inside reasoning tags is emitted as reasoning_content
text outside reasoning tags is emitted as normal content
tagged tool calls are parsed as tool_calls in either mode

The lazy tagged-tool grammar was also adjusted so a triggered <tool_call>...</tool_call> does not have to be the final suffix of the completion. This allows the full parser to handle following segments such as </think> or later content.

Commit 2: Accept required tagged tool args in any order

Tagged parameters include their names in-band, for example <parameter=old_string>. Requiring these parameters to appear in schema/property iteration order is stricter than necessary and caused valid Qwen-style tool calls to fail parsing.

This commit keeps the existing grammar-based parser structure, but generates permutations for required tagged arguments so they can be accepted in any order. Optional arguments can still appear flexibly around them.

This is intended as a narrow, low-impact fix. A more scalable long-term design would be to parse tagged parameters as an unordered list, collect them by name, reject duplicates, check required parameters after parsing, and emit normalized arguments. That would avoid permutation growth, but would require a larger parser/mapper refactor.

To avoid pathological grammar growth, this commit caps the permutation path at six required parameters. Schemas with more than six required parameters keep the previous fixed-order behavior.

This fixes Qwen-style outputs such as:

<think>
Reasoning...

<tool_call>
<function=edit>
<parameter=file>...</parameter>
<parameter=old_string>...</parameter>
<parameter=new_string>...</parameter>
</function>
</tool_call>
</think>

Additional information

This was tested downstream with ctgbot and Qwen3.5 9B using reasoning enabled and XML-style tool calls.

A patched CUDA image used for downstream validation is available here:

ghcr.io/bartdeboer/llama-cpp:server-cuda-ctgbot-patches-a03afef

Digest:

sha256:a952fea7e739130000e9bca0f4f886a76ba0bb95a932fe23d56b929ef8f1255e

Tested locally with:

./build/bin/test-chat --suppress-debug

Related issues:

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES - AI assistance was used for investigation and prototyping. I have a software engineering background, but I am not primarily a C++/llama.cpp contributor. I designed the code shape together with Codex, reviewed the changes, tested them locally, and am responsible for the submitted code.

Atomic-Germ · 2026-06-04T18:27:41Z

Wouldn't a tool call in the middle of a reasoning block be a bad thing? The model isn't expecting it to actually do anything.

bartowski1182 · 2026-06-04T18:33:14Z

Some models do this, Kimi K2 I believe for example will reason, tool call, reason more

I do agree it feels weird, but since they do it we should support it

bartdeboer · 2026-06-04T19:05:55Z

Wouldn't a tool call in the middle of a reasoning block be a bad thing? The model isn't expecting it to actually do anything.

For Qwen3.5-style outputs, the important case is not that tools should execute “inside” reasoning. Rather, the model may start emitting a valid <tool_call> before it emits the closing </think> tag.

vLLM handles this in its Qwen3 reasoning parser by treating <tool_call> as an implicit end of reasoning:

https://docs.vllm.ai/en/stable/api/vllm/reasoning/qwen3_reasoning_parser/

This PR accepts tagged <tool_call> blocks as valid in either reasoning blocks or normal content blocks.

aldehir · 2026-06-04T19:59:42Z

Since this behavior is specific to Qwen, at least for now, I would rather implement a specialized parser than baking this into the autoparser. That said, we have discussed masking the <tool_call> token within reasoning as an alternative.

cc @pwilkin for discussion.

bartdeboer · 2026-06-04T20:50:36Z

I can open a second PR that implements this as a Qwen-specific parser. That could also address the tool args order issue, although that seems like a more generic tagged-parameter parsing issue.

The main thing I am unsure about is the detection strategy. The existing specialized parsers seem to mostly detect distinctive template signatures, but the Qwen markers (<think>, <tool_call>, <function=, <parameter=) are somewhat generic XML-style markers.

Would you prefer template-signature detection, or would it be acceptable to also use model metadata such as general.architecture == qwen35 when available?

pwilkin · 2026-06-04T22:25:48Z

The approach @aldehir mentioned is in #23478, I'll rebase it on current master.

bartdeboer · 2026-06-05T09:48:45Z

I tested PR #23478 with Qwen3.5-9B in an agent/toolloop setup.

Test image:

ghcr.io/bartdeboer/llama-cpp:server-cuda-pr23478-44a3568
digest: sha256:d520c1773f665e47053c65bece1d2d243da6b10b5b5b8f2e0e307cfbfdfceb31

I enabled the new option with:

LLAMA_ARG_THINK_PREVENT_TOOL_CALL=1

The option was active: the server started cleanly and logs showed reasoning-budget activation on requests.

Results were mixed.

In one test, the tool loop completed 40 iterations with parsed tool calls successfully.

However, in a later wordcount/test follow-up, it failed. The model appeared to be blocked from emitting the opening <tool_call> while still in reasoning, but then leaked dangling closing fragments into user-visible progress/reasoning text:

The file already has the correct import. Let me run the tests again.

</parameter>
</function>
</tool_call>

So this PR appears to help, and can work for some toolloop paths, but in my testing it did not recover when Qwen had already decided to emit a tagged tool call inside reasoning. Blocking the opening token can leave the rest of the tagged structure to leak as text.

bartdeboer · 2026-06-05T18:05:06Z

Opened #24202 that implements the specialized tagged thinking/tool parser approach.

bartdeboer · 2026-06-05T18:23:38Z

Closing this PR in favor of #24202 since I can have only one open PR.

github-actions Bot added the testing Everything test related label May 27, 2026

bartdeboer mentioned this pull request May 27, 2026

Qwen35 think tool mode switch #23678

Closed

bartdeboer force-pushed the qwen-tools-reasoning-fixes branch from d0fdb87 to ed0384d Compare May 29, 2026 21:03

bartdeboer added 2 commits June 2, 2026 08:31

Fix tagged tool calls inside reasoning

f78ec03

Accept required tagged tool args in any order

ce979f0

bartdeboer force-pushed the qwen-tools-reasoning-fixes branch from 2784e8b to ce979f0 Compare June 2, 2026 08:42

bartdeboer marked this pull request as ready for review June 2, 2026 08:55

bartdeboer requested review from a team and pwilkin as code owners June 2, 2026 08:55

bartdeboer mentioned this pull request Jun 5, 2026

Add specialized tagged thinking tool parser #24202

Open

bartdeboer closed this Jun 5, 2026

bartdeboer mentioned this pull request Jun 6, 2026

common/reasoning-budget: force tool call immediately after budget ends, prevent tool call token in reasoning section #23478

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve tagged tool parsing with reasoning#23773

Improve tagged tool parsing with reasoning#23773
bartdeboer wants to merge 2 commits into
ggml-org:masterfrom
bartdeboer:qwen-tools-reasoning-fixes

bartdeboer commented May 27, 2026 •

edited

Loading

Uh oh!

Atomic-Germ commented Jun 4, 2026

Uh oh!

bartowski1182 commented Jun 4, 2026

Uh oh!

bartdeboer commented Jun 4, 2026

Uh oh!

aldehir commented Jun 4, 2026

Uh oh!

bartdeboer commented Jun 4, 2026

Uh oh!

pwilkin commented Jun 4, 2026

Uh oh!

bartdeboer commented Jun 5, 2026

Uh oh!

bartdeboer commented Jun 5, 2026

Uh oh!

bartdeboer commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

bartdeboer commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Commit 1: Fix tagged tool calls inside reasoning

Commit 2: Accept required tagged tool args in any order

Additional information

Requirements

Uh oh!

Atomic-Germ commented Jun 4, 2026

Uh oh!

bartowski1182 commented Jun 4, 2026

Uh oh!

bartdeboer commented Jun 4, 2026

Uh oh!

aldehir commented Jun 4, 2026

Uh oh!

bartdeboer commented Jun 4, 2026

Uh oh!

pwilkin commented Jun 4, 2026

Uh oh!

bartdeboer commented Jun 5, 2026

Uh oh!

bartdeboer commented Jun 5, 2026

Uh oh!

bartdeboer commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

bartdeboer commented May 27, 2026 •

edited

Loading