Improve tagged tool parsing with reasoning #23773
|
Wouldn't a tool call in the middle of a reasoning block be a bad thing? The model isn't expecting it to actually do anything.
|
Some models do this. Kimi K2, I believe, will reason, make a tool call, then reason more. I do agree it feels weird, but since they do it, we should support it.
For Qwen3.5-style outputs, the important case is not that tools should execute “inside” reasoning. Rather, the model may start emitting a valid […] vLLM handles this in its Qwen3 reasoning parser by treating […] https://docs.vllm.ai/en/stable/api/vllm/reasoning/qwen3_reasoning_parser/ This PR accepts tagged […]
|
Since this behavior is specific to Qwen, at least for now, I would rather implement a specialized parser than bake this into the autoparser. That said, we have discussed masking the […] cc @pwilkin for discussion.
|
I can open a second PR that implements this as a Qwen-specific parser. That could also address the tool args order issue, although that seems like a more generic tagged-parameter parsing issue.
The main thing I am unsure about is the detection strategy. The existing specialized parsers seem to mostly detect distinctive template signatures, but the Qwen markers ([…]
Would you prefer template-signature detection, or would it be acceptable to also use model metadata such as […]
|
I tested PR #23478 with Qwen3.5-9B in an agent/toolloop setup.
Test image:
I enabled the new option with:
The option was active: the server started cleanly and logs showed reasoning-budget activation on requests.
Results were mixed. In one test, the tool loop completed 40 iterations with tool calls parsed successfully. However, in a later wordcount/test follow-up, it failed. The model appeared to be blocked from emitting the opening […]
So this PR appears to help, and can work for some toolloop paths, but in my testing it did not recover when Qwen had already decided to emit a tagged tool call inside reasoning. Blocking the opening token can leave the rest of the tagged structure to leak as text.
|
Opened #24202, which implements the specialized tagged thinking/tool parser approach.
|
Closing this PR in favor of #24202 since I can have only one open PR. |
Overview
This PR fixes two tagged-tool parsing issues found while testing Qwen3.5 with reasoning enabled and XML-style tool calls.
Commit 1: Fix tagged tool calls inside reasoning
The current llama.cpp parser splits content parsing and reasoning parsing into separate paths. This means a `<tool_call>` emitted inside a `<think>...</think>` block can be consumed as `reasoning_content` instead of being parsed as a tool call.
This commit combines those paths for the `TAG_WITH_TAGGED` parser. Reasoning tags now act as text-mode switches between `reasoning_content` and `content`, and `tool_calls` are parsed in either mode.
The lazy tagged-tool grammar was also adjusted so a triggered `<tool_call>...</tool_call>` does not have to be the final suffix of the completion. This allows the full parser to handle following segments such as `</think>` or later content.
Commit 2: Accept required tagged tool args in any order
Tagged parameters include their names in-band, for example `<parameter=old_string>`. Requiring these parameters to appear in schema/property iteration order is stricter than necessary and caused valid Qwen-style tool calls to fail parsing.
This commit keeps the existing grammar-based parser structure, but generates permutations for required tagged arguments so they can be accepted in any order. Optional arguments can still appear flexibly around them.
This is intended as a narrow, low-impact fix. A more scalable long-term design would be to parse tagged parameters as an unordered list, collect them by name, reject duplicates, check required parameters after parsing, and emit normalized arguments. That would avoid permutation growth, but would require a larger parser/mapper refactor.
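For illustration only, the longer-term design described above could look roughly like the following Python sketch. It is not code from this PR or from llama.cpp. The function name `collect_tagged_args`, the `</parameter>` closing tag, the `new_string` parameter used when calling it, and the choice to reject unknown parameter names are all assumptions made for the example.

```python
import re

# Assumed tag shape: <parameter=NAME>VALUE</parameter>, with optional
# newlines directly inside the tags. This is an illustration, not the
# grammar that llama.cpp actually generates.
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>\n?(.*?)\n?</parameter>", re.DOTALL)


def collect_tagged_args(body: str, properties: list[str], required: set[str]) -> dict[str, str]:
    """Parse tagged parameters as an unordered list and normalize them.

    `properties` is the schema's property order (hypothetical input),
    `required` is the set of required property names.
    """
    seen: dict[str, str] = {}
    for match in PARAM_RE.finditer(body):
        name, value = match.group(1), match.group(2)
        if name in seen:
            # Reject duplicates instead of silently overwriting.
            raise ValueError(f"duplicate parameter: {name}")
        if name not in properties:
            # Assumption for this sketch: unknown names are an error.
            raise ValueError(f"unknown parameter: {name}")
        seen[name] = value

    # Required parameters are checked only after everything is collected.
    missing = required - set(seen)
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")

    # Emit arguments in schema order, regardless of the order they arrived in.
    return {name: seen[name] for name in properties if name in seen}
```

Because parameters are collected by name, the amount of work grows with the number of parameters rather than with the number of possible orderings.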
To avoid pathological grammar growth, this commit caps the permutation path at six required parameters. Schemas with more than six required parameters keep the previous fixed-order behavior.
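The permutation approach and its cap can be sketched in Python as follows. This is a regex-based illustration, not the grammar generation in this PR: the names `MAX_PERMUTED_REQUIRED`, `required_orderings`, and `required_args_pattern` are hypothetical, the `</parameter>` closing tag is an assumed tag shape, and optional arguments are left out to keep the example short. Only the cap value of six comes from the description above.

```python
import itertools
import re

# Hypothetical constant name; the value six is the cap described in the text.
MAX_PERMUTED_REQUIRED = 6


def tagged_param(name: str) -> str:
    """Regex fragment for one tagged parameter (assumed tag shape)."""
    return rf"<parameter={re.escape(name)}>.*?</parameter>\s*"


def required_orderings(required: list[str], cap: int = MAX_PERMUTED_REQUIRED) -> list[tuple[str, ...]]:
    """All orderings up to the cap, otherwise only the fixed schema order."""
    if len(required) > cap:
        return [tuple(required)]
    return list(itertools.permutations(required))


def required_args_pattern(required: list[str], cap: int = MAX_PERMUTED_REQUIRED) -> re.Pattern:
    """Alternation with one branch per accepted ordering of required params."""
    branches = [
        "".join(tagged_param(name) for name in order)
        for order in required_orderings(required, cap)
    ]
    return re.compile(r"\s*(?:" + "|".join(branches) + r")", re.DOTALL)
```

The number of branches is the factorial of the number of required parameters: six required parameters give 720 orderings, and a seventh would give 5040, which is why the sketch falls back to a single fixed order above the cap.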
This fixes Qwen-style outputs such as:
Additional information
This was tested downstream with ctgbot and Qwen3.5 9B using reasoning enabled and XML-style tool calls.
A patched CUDA image used for downstream validation is available here:
Digest:
Tested locally with:
Related issues:
`peg-native` chat format parser fails when model outputs text before `<tool_call>` (thinking model + tool calling) #20260
Requirements