Motivation.
- When using vLLM’s OpenAI-compatible API, setting tool_choice="required" to force the model to call a tool produces output that does not match expectations.
  Specific concern: when tool_choice="required" is enabled, vLLM converts the tool schema into a JSON Schema and constrains the model output via structured_outputs. Since the schema is defined as {"type": "array", ...}, does this mean the model’s first token is forcibly constrained to be "["? If so, generation would skip the model’s native tool_calls_start_token (e.g. <|tool_calls_section_begin|>), completely bypassing the model’s native tool-calling format and its thinking output.
- Tool parsing logic: the current function-calling parsing logic is fragmented and ad hoc. Most of it relies on regular expressions to extract the tool-calling portion, followed by JSON parsing to extract the tool name and arguments. In practice, the model’s tool-calling output does not always conform strictly to the expected JSON format, which can lead to parsing failures or incorrect results. Introducing a structural_tag mechanism would allow the start and end of tool calls to be marked more reliably, making parsing more robust and accurate.
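To make the first concern concrete, here is a minimal sketch of the kind of array-typed schema described above. The helper name `required_tools_schema` and the exact schema layout are illustrative assumptions, not vLLM's actual implementation. The only point is that any JSON text valid against a top-level {"type": "array"} schema has to open with "[" (ignoring any leading whitespace a grammar backend might allow), which leaves no room for a native tool-call start token.

```python
import json


def required_tools_schema(tools):
    """Hypothetical helper: fold OpenAI-style tool definitions into one
    array-typed JSON Schema (an approximation, not vLLM's real code)."""
    per_tool = [
        {
            "type": "object",
            "properties": {
                "name": {"type": "string", "enum": [t["function"]["name"]]},
                "parameters": t["function"].get("parameters", {"type": "object"}),
            },
            "required": ["name", "parameters"],
        }
        for t in tools
    ]
    return {"type": "array", "minItems": 1, "items": {"anyOf": per_tool}}


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]

schema = required_tools_schema(tools)
# One output that satisfies the schema: a JSON array of call objects.
conforming_output = json.dumps(
    [{"name": "get_weather", "parameters": {"location": "Paris"}}]
)
print(schema["type"], repr(conforming_output[0]))
```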
Proposed Change.
xgrammar has recently introduced a series of new structural tags.
We can use a trigger mechanism: detect the start tag of a tool call; once detected, immediately initiate guided generation until the closing tag is encountered.
This approach also supports multiple types of tool_call formats.
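The control flow of the trigger mechanism can be sketched as a small two-state scanner. This is a toy model only: the class name `TriggerScanner` is hypothetical, and xgrammar itself works on token masks during decoding rather than on decoded text. The sketch shows the mode switch (free text until a trigger, constrained until the closing tag), including the case where a tag is split across chunks.

```python
class TriggerScanner:
    """Toy model of triggered guidance (hypothetical, not xgrammar's API):
    free text until the trigger appears, constrained until the end tag."""

    def __init__(self, trigger, end):
        self.trigger = trigger
        self.end = end
        self.constrained = False
        self.buffer = ""  # unconsumed tail that may hold a partial tag

    def feed(self, chunk):
        """Consume one chunk and return the mode after processing it."""
        self.buffer += chunk
        while True:
            marker = self.end if self.constrained else self.trigger
            pos = self.buffer.find(marker)
            if pos < 0:
                # Keep only a tail long enough to complete a split marker.
                keep = len(marker) - 1
                self.buffer = self.buffer[-keep:] if keep > 0 else ""
                return self.constrained
            # Marker found: drop everything through it and flip the mode.
            self.buffer = self.buffer[pos + len(marker):]
            self.constrained = not self.constrained


scanner = TriggerScanner("<tool_call>", "</tool_call>")
chunks = ["Let me check. <tool", '_call>\n{"name": "f"', "}\n</tool_", "call> done"]
modes = [scanner.feed(c) for c in chunks]
print(modes)  # [False, True, True, False]
```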
Llama JSON-based tool calling, Gemma:
{"name": "function_name", "parameters": params}
Corresponding structural tag:
{
  "type": "structural_tag",
  "format": {
    "type": "triggered_tags",
    "triggers": ["{\"name\":"],
    "tags": [
      {
        "begin": "{\"name\": \"func1\", \"parameters\": ",
        "content": {"type": "json_schema", "json_schema": ...},
        "end": "}"
      },
      {
        "begin": "{\"name\": \"func2\", \"parameters\": ",
        "content": {"type": "json_schema", "json_schema": ...},
        "end": "}"
      }
    ]
  }
}
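A structural tag like the one above can be generated from the request's tool list instead of being written by hand. The sketch below assumes OpenAI-style tool definitions (`function.name`, `function.parameters`). The helper `build_triggered_tags` is a hypothetical name, not an existing vLLM or xgrammar function.

```python
import json


def build_triggered_tags(tools, begin_template, end, triggers):
    """Hypothetical helper: emit one tag per tool, mirroring the layout above.

    begin_template is a str.format template with a {name} placeholder;
    literal braces in it must be doubled.
    """
    tags = [
        {
            "begin": begin_template.format(name=t["function"]["name"]),
            "content": {
                "type": "json_schema",
                "json_schema": t["function"]["parameters"],
            },
            "end": end,
        }
        for t in tools
    ]
    return {
        "type": "structural_tag",
        "format": {"type": "triggered_tags", "triggers": triggers, "tags": tags},
    }


tools = [
    {"type": "function", "function": {"name": "func1", "parameters": {"type": "object"}}},
    {"type": "function", "function": {"name": "func2", "parameters": {"type": "object"}}},
]

llama_tag = build_triggered_tags(
    tools,
    begin_template='{{"name": "{name}", "parameters": ',
    end="}",
    triggers=['{"name":'],
)
print(json.dumps(llama_tag, indent=2))
```

The same helper would cover the other single-level formats by swapping the template, closing string, and trigger.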
Qwen 2.5/3, Hermes:
<tool_call>
{"name": "get_current_temperature", "arguments": {"location": "San Francisco, CA, USA"}}
</tool_call>
Corresponding structural tag:
{
  "type": "structural_tag",
  "format": {
    "type": "triggered_tags",
    "triggers": ["<tool_call>"],
    "tags": [
      {
        "begin": "<tool_call>\n{\"name\": \"func1\", \"arguments\": ",
        "content": {"type": "json_schema", "json_schema": ...},
        "end": "}\n</tool_call>"
      },
      {
        "begin": "<tool_call>\n{\"name\": \"func2\", \"arguments\": ",
        "content": {"type": "json_schema", "json_schema": ...},
        "end": "}\n</tool_call>"
      }
    ]
  }
}
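Once generation is constrained to exact begin and end strings, the parser can match them literally instead of relying on regular expressions. The following is a minimal sketch for the Hermes/Qwen layout above. `parse_hermes_calls` is a hypothetical name, and the sketch assumes the output follows the tag exactly, which is the guarantee guided generation is meant to provide.

```python
import json

BEGIN = "<tool_call>\n"
END = "\n</tool_call>"


def parse_hermes_calls(text):
    """Extract (name, arguments) pairs from Hermes/Qwen-style output."""
    decoder = json.JSONDecoder()
    calls, pos = [], 0
    while True:
        start = text.find(BEGIN, pos)
        if start < 0:
            return calls
        # raw_decode parses exactly one JSON value and reports where it
        # ended, so braces nested inside the arguments cannot confuse it.
        obj, stop = decoder.raw_decode(text, start + len(BEGIN))
        if not text.startswith(END, stop):
            raise ValueError(f"missing closing tag at offset {stop}")
        calls.append((obj["name"], obj["arguments"]))
        pos = stop + len(END)


output = (
    "Checking the weather.\n"
    "<tool_call>\n"
    '{"name": "get_current_temperature", '
    '"arguments": {"location": "San Francisco, CA, USA"}}\n'
    "</tool_call>"
)
print(parse_hermes_calls(output))
```

Malformed JSON surfaces as `json.JSONDecodeError` (a `ValueError` subclass), so a caller can handle both failure modes with one `except ValueError`.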
DeepSeek:
A special tag pair, <|tool▁calls▁begin|> ... <|tool▁calls▁end|>, wraps the entire tool-calling section.
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>function_name_1
```jsonc
{params}
```<|tool▁call▁end|>
```jsonc
{params}
```<|tool▁call▁end|><|tool▁calls▁end|>
Corresponding structural tag:
{
  "type": "structural_tag",
  "format": {
    "type": "triggered_tags",
    "triggers": ["<|tool▁calls▁begin|>"],
    "tags": [
      {
        "begin": "<|tool▁calls▁begin|>",
        "end": "<|tool▁calls▁end|>",
        "content": {
          "type": "tags_with_separator",
          "separator": "\n",
          "tags": [
            {
              "begin": "<|tool▁call▁begin|>function<|tool▁sep|>function_name_1\n```jsonc\n",
              "content": {"type": "json_schema", "json_schema": ...},
              "end": "\n```<|tool▁call▁end|>"
            },
            {
              "begin": "<|tool▁call▁begin|>function<|tool▁sep|>function_name_2\n```jsonc\n",
              "content": {"type": "json_schema", "json_schema": ...},
              "end": "\n```<|tool▁call▁end|>"
            }
          ]
        }
      }
    ],
    "stop_after_first": true
  }
}
https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html#examples
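The DeepSeek layout differs from the other two in that the per-tool tags are nested inside one outer section. A rough sketch of generating that nested structure from a tool list follows. `build_deepseek_tag` is a hypothetical helper, and the field names are copied from the example above rather than checked against a specific xgrammar release.

```python
FENCE = "`" * 3  # built indirectly so the example holds no literal fence


def build_deepseek_tag(tools):
    """Hypothetical helper: wrap per-tool tags in the outer calls section."""
    per_tool = [
        {
            "begin": (
                "<|tool▁call▁begin|>function<|tool▁sep|>"
                f"{t['function']['name']}\n{FENCE}jsonc\n"
            ),
            "content": {
                "type": "json_schema",
                "json_schema": t["function"]["parameters"],
            },
            "end": f"\n{FENCE}<|tool▁call▁end|>",
        }
        for t in tools
    ]
    outer = {
        "begin": "<|tool▁calls▁begin|>",
        "content": {
            "type": "tags_with_separator",
            "separator": "\n",
            "tags": per_tool,
        },
        "end": "<|tool▁calls▁end|>",
    }
    return {
        "type": "structural_tag",
        "format": {
            "type": "triggered_tags",
            "triggers": ["<|tool▁calls▁begin|>"],
            "tags": [outer],
            "stop_after_first": True,
        },
    }


tools = [
    {"type": "function", "function": {"name": "function_name_1", "parameters": {"type": "object"}}},
    {"type": "function", "function": {"name": "function_name_2", "parameters": {"type": "object"}}},
]
deepseek_tag = build_deepseek_tag(tools)
print(len(deepseek_tag["format"]["tags"][0]["content"]["tags"]))
```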
Known issue:
Currently, only xgrammar supports structural_tag. outlines_core, lmformatenforcer, and llguidance do not support it.
Other references:
#30797
#25515
#22918
https://docs.google.com/document/d/1AHlzU2eP3JG_KXsHefzfhwpC3aUvqwQWAY3QZ-Mj-vQ/edit?tab=t.0
Feedback Period.
No response
CC List.
No response
Any Other Things.
No response