[Frontend] Support strict mode for tool calling#45003
Conversation
|
this PR supercedes #43678 |
bb47907 to
d14e537
Compare
|
|
|
This pull request has merge conflicts that must be resolved before it can be |
fcc6eae to
0185fc7
Compare
|
Documentation preview: https://vllm--45003.org.readthedocs.build/en/45003/ |
3777089 to
946f4a0
Compare
|
I tested this with GPT-OSS 120B using: VLLM_ENFORCE_STRICT_TOOL_CALLING=true
--enable-auto-tool-choice
--tool-call-parser openai
--reasoning-parser openai_gptossI noticed the structural tag is not applied in the live GPT-OSS chat render path for tool_choice="required". The request for gpt goes through the _make_request_with_harmony() path which bypasses the generic preprocess_chat(... parser=self.parser ...) path where the structural_tag gets applied And, if I force the generated Harmony structural tag, it leaks raw Harmony markers into assistant content (reproducible when using ambiguous prompts). Also, the generated Harmony tag for tool_choice="required" contains: "at_least_one": false so even if wired into the live path, it does not appear to enforce required-tool semantics. Could you look into this? Personally, I think using json constraints for gpt oss is a lot simpler since it avoids the overall harmony leak but feel free to do your own deep-dive @chaunceyjiang @cjackal |
|
@ankrovv Yeah, GPT-OSS isn't supported yet. As you mentioned, it currently goes through a separate code path. We're in the process of unifying the Harmony and non-Harmony paths, and once that's done, we'll integrate it with structural_tags harmony tags as well. Overall, using structural tags has been quite effective at improving tool-calling accuracy and reliability. |
… calling Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
This pull request has merge conflicts that must be resolved before it can be |
|
Can we resolve the merge conflict so that we can get this one in? Thanks for the great work from everyone! |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
… calling Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
… calling Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
@yzong-rh Thanks for catching this so carefully. Regarding the MiniMax issue, I believe the root cause is in xgrammar's built-in As for the OpenAI/Harmony issue, the previous Harmony path bypassed |
… calling Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Great, thank you. I went through the structural tags in A remaining risk is vllm/vllm/reasoning/kimi_k2_reasoning_parser.py Lines 76 to 83 in 7021be6 When that triggers, the start of a tool section is already generated.
Sg. There might be some challenges due to the harmony channel format. Happy to look into it if you don't have the bandwidth. |
… calling Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
The qwen3_xml parser was deleted upstream in vllm-project#45003. The old_xml pairing now resolves to the engine parser, making those tests redundant duplicates of the engine pairing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
Since this merged I'm seeing Qwen 3.6 models getting stuck in infinite whitespace generation loops regularly in relatively simple tool calling scenarios with our out of the box setup. It's easily triggered with things like BFCL multi_turn_base, resulting in extremely long generation times and timeouts while the model generates thousands of newline tokens per turn. We've flip this on by default for a lot of models now in tool calling scenarios, and at least in some cases this has made things worse. We may need a more selective testing and case-by-case enabling of this. |
Co-authored-by: @cjackal 44624812+cjackal@users.noreply.github.com
Purpose
[Frontend] Support strict mode for tool calling
Test Plan
I tested it locally with Minimax 2.5,Qwen2.5, Qwen3.5, Qwen3.6, Qwen3, and DeepSeek V3.2.
I also tested the tool_choice modes required, auto, and named tool selection, and all of them worked correctly.
Test Result
main:
this pr
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.