fix: fix the behavior of XML-style JSON schema conversion.#634
Conversation
Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request updates the JSON schema converter to support XML-based tool calling by using an ObjectSpec for schemas that accept any value, alongside updates to several test cases to reflect the new tagged parameter format. Feedback indicates that the current XML mode implementation would incorrectly reject parameters because allow_additional_properties is false by default in ObjectSpec. A refactor was suggested to explicitly enable this property and reduce code duplication.
| if (config_.json_format == JSONFormat::kJSON) { | ||
| auto spec = SchemaSpec::Make(AnySpec{}, cache_key, rule_name_hint); | ||
| schema_cache_[cache_key] = spec; | ||
| return ResultOk(spec); | ||
| } else { | ||
| // Otherwise, we are in XML mode. In XML mode, we need to follow the general XML tool-calling | ||
| // format. For example, we in QwenXML mode, we cannot accept the tool calling's like | ||
| // "<function=A>\nstring\n</function>", The correct format should be | ||
| // "<function=A>\n<parameter=name>\nstring\n</parameter>\n</function>". | ||
| auto spec = SchemaSpec::Make(ObjectSpec{}, cache_key, rule_name_hint); | ||
| schema_cache_[cache_key] = spec; | ||
| return ResultOk(spec); | ||
| } |
There was a problem hiding this comment.
In XML mode, when the schema is true (indicating it should accept any value), the parameters must still follow the tagged format (e.g., <parameter=name>value</parameter>). However, the current implementation returns a default ObjectSpec{}, which has allow_additional_properties set to false by default. This will cause the grammar to reject any parameters passed to the tool, which contradicts the meaning of schema=true and will cause the updated tests (where strict=False is used) to fail.
Additionally, the logic can be refactored to avoid duplicating the cache update and return statements, and the comment has a minor typo ("we in QwenXML mode").
| if (config_.json_format == JSONFormat::kJSON) { | |
| auto spec = SchemaSpec::Make(AnySpec{}, cache_key, rule_name_hint); | |
| schema_cache_[cache_key] = spec; | |
| return ResultOk(spec); | |
| } else { | |
| // Otherwise, we are in XML mode. In XML mode, we need to follow the general XML tool-calling | |
| // format. For example, we in QwenXML mode, we cannot accept the tool calling's like | |
| // "<function=A>\nstring\n</function>", The correct format should be | |
| // "<function=A>\n<parameter=name>\nstring\n</parameter>\n</function>". | |
| auto spec = SchemaSpec::Make(ObjectSpec{}, cache_key, rule_name_hint); | |
| schema_cache_[cache_key] = spec; | |
| return ResultOk(spec); | |
| } | |
| SchemaSpecPtr spec; | |
| if (config_.json_format == JSONFormat::kJSON) { | |
| spec = SchemaSpec::Make(AnySpec{}, cache_key, rule_name_hint); | |
| } else { | |
| // In XML mode, tool parameters must follow the tagged format (e.g., <parameter=name>value</parameter>). | |
| // Thus, a schema that accepts "any" value (like 'true') must be represented as an | |
| // ObjectSpec that allows any additional properties. | |
| ObjectSpec obj_spec; | |
| obj_spec.allow_additional_properties = true; | |
| spec = SchemaSpec::Make(std::move(obj_spec), cache_key, rule_name_hint); | |
| } | |
| schema_cache_[cache_key] = spec; | |
| return ResultOk(spec); |
There was a problem hiding this comment.
Pull request overview
This PR adjusts how JSON Schema boolean true (unconstrained schema) is converted when generating XML-style tool-calling grammars, and updates structural-tag tests to reflect the corrected XML parameter-tag formats for affected model families.
Changes:
- Update C++ JSON schema parsing so
trueschemas in XML formats no longer map to the same unconstrained behavior as pure JSON output. - Update Python tests to use XML
<parameter ...>tags (instead of<q>...</q>) for DeepSeek/MiniMax/Qwen XML-style tool-call formats.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/python/test_builtin_structural_tag.py | Updates strict/missing-parameters acceptance fixtures to match XML parameter-tag formats. |
| cpp/json_schema_converter.cc | Changes handling of boolean true schema for XML-mode conversion. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (config_.json_format == JSONFormat::kJSON) { | ||
| auto spec = SchemaSpec::Make(AnySpec{}, cache_key, rule_name_hint); | ||
| schema_cache_[cache_key] = spec; | ||
| return ResultOk(spec); | ||
| } else { | ||
| // Otherwise, we are in XML mode. In XML mode, we need to follow the general XML tool-calling | ||
| // format. For example, we in QwenXML mode, we cannot accept the tool calling's like | ||
| // "<function=A>\nstring\n</function>", The correct format should be | ||
| // "<function=A>\n<parameter=name>\nstring\n</parameter>\n</function>". | ||
| auto spec = SchemaSpec::Make(ObjectSpec{}, cache_key, rule_name_hint); | ||
| schema_cache_[cache_key] = spec; | ||
| return ResultOk(spec); | ||
| } |
| // Otherwise, we are in XML mode. In XML mode, we need to follow the general XML tool-calling | ||
| // format. For example, we in QwenXML mode, we cannot accept the tool calling's like | ||
| // "<function=A>\nstring\n</function>", The correct format should be |
Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces TrueSpec to represent the boolean true schema, allowing XMLToolCallingConverter to handle root-level schemas by generating an object structure. Feedback indicates that because true and {} are equivalent in JSON Schema, AnySpec should be updated to match this behavior at the root level to ensure consistency. Additionally, the reviewer suggests adding TrueSpec to the basic rules cache to optimize grammar generation for nested structures.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refines the XMLToolCallingConverter to handle different nesting levels (root, XML parameters, and JSON content) and updates test cases to reflect a more structured XML parameter format. The review feedback identifies a high-severity issue where resetting the nesting level to 0 before defining XML object rules causes a cache collision, which may lead to incorrect grammar generation for nested JSON objects. Additionally, a suggestion was made to reuse the predefined kXMLObject rule in GenerateAny for the root level to improve consistency and efficiency.
This PR fixes the behavior of XML-style JSON schema's conversion. This PR also fixes the incorrect tests for
strict=False.