fix: build schema grammar in Qwen3-Coder XML template when no tools#1652
Closed
markaalonzo wants to merge 1 commit into
Closed
fix: build schema grammar in Qwen3-Coder XML template when no tools#1652markaalonzo wants to merge 1 commit into
markaalonzo wants to merge 1 commit into
Conversation
common_chat_params_init_qwen3_coder_xml previously only built a grammar when params.tools was non-empty. Requests that sent response_format.json_schema (or top-level json_schema) with no tools reached the sampler with an empty grammar string, leaving output completely unconstrained regardless of strict=true. Now: when tools are absent and json_schema is provided, build a strict GBNF from the schema. If the template has kept <think>... open (thinking-forced), make the grammar lazy with a trigger on </think> so the thinking block stays free and schema enforcement starts at the first content token. Otherwise enforce from token 0. Also: if json_schema and grammar are both set, throw instead of silently overwriting user-supplied grammar. Mirrors the precedent in common_chat_params_init_mistral_nemo and the deepseek_r1 / deepseek_v3_1 / kimi_k2 schema-enforcement paths.
ikawrakow
reviewed
Apr 18, 2026
| })(); | ||
| build_grammar_xml_tool_call(data, params.tools, form); | ||
| } else if (!params.json_schema.is_null()) { | ||
| if (!params.grammar.empty()) { |
Owner
There was a problem hiding this comment.
Do we really want to throw here? If json schema is the preferred way, why not just ignore the grammar if present?
f6a7d1a to
14ca68f
Compare
Collaborator
|
I think we can just wait for #1376 to be merged. |
Owner
|
This PR is no longer relevant after merging #1376, correct? |
Contributor
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
common_chat_params_init_qwen3_coder_xmlpreviously only built a grammar whenparams.toolswas non-empty. Requests that sentresponse_format.json_schema(or top-leveljson_schema) with no tools reached the sampler with an empty grammar string, so output was completely unconstrained regardless ofstrict=true.After the fix, three branches are explicit:
json_schemapresent → strict GBNF built from the schema. If the template kept<think>...</think>open (Step-3.5-Flash / Nemotron 3 Nano), the grammar is lazy with a</think>pattern trigger, mirroring thedeepseek_r1/deepseek_v3_1/kimi_k2thinking-enforced paths. Otherwise enforcement starts at token 0.Also adds a throw when both
json_schemaandgrammarare passed, matching thecommon_chat_params_init_mistral_nemoprecedent ("Either \"json_schema\" or \"grammar\" can be specified, but not both"). Previously the initializer would silently overwrite user-supplied grammar.Why at the initializer level
The Qwen3-Coder XML dispatcher still routes Qwen3-Coder, Step-3.5-Flash, and Nemotron-3 Nano here when
inputs.use_peg=false. Those users currently lose schema enforcement entirely. The autoparser refactor in #1376 centralizes this concern, at which point this fix dissolves into the new architecture; until it lands, the initializer-level fix is the minimal correct change.Testing
Validated on Terra (RTX 3080 Ti, Qwen3.5-35B-A3B Q4_K_M, turbo K4V3 KV,
-ger -ser 3,0 -c 512 -fa 1):/v1/chat/completionswithresponse_format={type: "json_schema", ...}and notoolsnow produces strictly schema-conformant JSON.tools=[...]) unchanged — XML tool-call grammar still built and triggered.json_schemaandgrammarnow raisesruntime_errorinstead of silently dropping the caller's grammar.Note on test coverage:
tests/test-chat.cppis currently commented out of the default build (unrelated pre-existing compile errors in legacy scaffolding,common_chat_parser_paramssymbol renamed tocommon_chat_syntax). Happy to add cases insidetest_template_output_parsersin a follow-up if the maintainer wants test-chat revived as part of this change.