Skip to content

fix: fix the behavior of XML-style JSON schema conversion.#634

Merged
Seven-Streams merged 5 commits into
mlc-ai:mainfrom
Seven-Streams:fix/2026-05-07/xml_schema_true
May 8, 2026
Merged

fix: fix the behavior of XML-style JSON schema conversion.#634
Seven-Streams merged 5 commits into
mlc-ai:mainfrom
Seven-Streams:fix/2026-05-07/xml_schema_true

Conversation

@Seven-Streams

Copy link
Copy Markdown
Collaborator

This PR fixes the behavior of XML-style JSON schema's conversion. This PR also fixes the incorrect tests for strict=False.

Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
Copilot AI review requested due to automatic review settings May 7, 2026 14:05

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the JSON schema converter to support XML-based tool calling by using an ObjectSpec for schemas that accept any value, alongside updates to several test cases to reflect the new tagged parameter format. Feedback indicates that the current XML mode implementation would incorrectly reject parameters because allow_additional_properties is false by default in ObjectSpec. A refactor was suggested to explicitly enable this property and reduce code duplication.

Comment thread cpp/json_schema_converter.cc Outdated
Comment on lines +275 to +287
if (config_.json_format == JSONFormat::kJSON) {
auto spec = SchemaSpec::Make(AnySpec{}, cache_key, rule_name_hint);
schema_cache_[cache_key] = spec;
return ResultOk(spec);
} else {
// Otherwise, we are in XML mode. In XML mode, we need to follow the general XML tool-calling
// format. For example, we in QwenXML mode, we cannot accept the tool calling's like
// "<function=A>\nstring\n</function>", The correct format should be
// "<function=A>\n<parameter=name>\nstring\n</parameter>\n</function>".
auto spec = SchemaSpec::Make(ObjectSpec{}, cache_key, rule_name_hint);
schema_cache_[cache_key] = spec;
return ResultOk(spec);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In XML mode, when the schema is true (indicating it should accept any value), the parameters must still follow the tagged format (e.g., <parameter=name>value</parameter>). However, the current implementation returns a default ObjectSpec{}, which has allow_additional_properties set to false by default. This will cause the grammar to reject any parameters passed to the tool, which contradicts the meaning of schema=true and will cause the updated tests (where strict=False is used) to fail.

Additionally, the logic can be refactored to avoid duplicating the cache update and return statements, and the comment has a minor typo ("we in QwenXML mode").

Suggested change
if (config_.json_format == JSONFormat::kJSON) {
auto spec = SchemaSpec::Make(AnySpec{}, cache_key, rule_name_hint);
schema_cache_[cache_key] = spec;
return ResultOk(spec);
} else {
// Otherwise, we are in XML mode. In XML mode, we need to follow the general XML tool-calling
// format. For example, we in QwenXML mode, we cannot accept the tool calling's like
// "<function=A>\nstring\n</function>", The correct format should be
// "<function=A>\n<parameter=name>\nstring\n</parameter>\n</function>".
auto spec = SchemaSpec::Make(ObjectSpec{}, cache_key, rule_name_hint);
schema_cache_[cache_key] = spec;
return ResultOk(spec);
}
SchemaSpecPtr spec;
if (config_.json_format == JSONFormat::kJSON) {
spec = SchemaSpec::Make(AnySpec{}, cache_key, rule_name_hint);
} else {
// In XML mode, tool parameters must follow the tagged format (e.g., <parameter=name>value</parameter>).
// Thus, a schema that accepts "any" value (like 'true') must be represented as an
// ObjectSpec that allows any additional properties.
ObjectSpec obj_spec;
obj_spec.allow_additional_properties = true;
spec = SchemaSpec::Make(std::move(obj_spec), cache_key, rule_name_hint);
}
schema_cache_[cache_key] = spec;
return ResultOk(spec);

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts how JSON Schema boolean true (unconstrained schema) is converted when generating XML-style tool-calling grammars, and updates structural-tag tests to reflect the corrected XML parameter-tag formats for affected model families.

Changes:

  • Update C++ JSON schema parsing so true schemas in XML formats no longer map to the same unconstrained behavior as pure JSON output.
  • Update Python tests to use XML <parameter ...> tags (instead of <q>...</q>) for DeepSeek/MiniMax/Qwen XML-style tool-call formats.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
tests/python/test_builtin_structural_tag.py Updates strict/missing-parameters acceptance fixtures to match XML parameter-tag formats.
cpp/json_schema_converter.cc Changes handling of boolean true schema for XML-mode conversion.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cpp/json_schema_converter.cc Outdated
Comment on lines +275 to +287
if (config_.json_format == JSONFormat::kJSON) {
auto spec = SchemaSpec::Make(AnySpec{}, cache_key, rule_name_hint);
schema_cache_[cache_key] = spec;
return ResultOk(spec);
} else {
// Otherwise, we are in XML mode. In XML mode, we need to follow the general XML tool-calling
// format. For example, we in QwenXML mode, we cannot accept the tool calling's like
// "<function=A>\nstring\n</function>", The correct format should be
// "<function=A>\n<parameter=name>\nstring\n</parameter>\n</function>".
auto spec = SchemaSpec::Make(ObjectSpec{}, cache_key, rule_name_hint);
schema_cache_[cache_key] = spec;
return ResultOk(spec);
}
Comment thread cpp/json_schema_converter.cc Outdated
Comment on lines +280 to +282
// Otherwise, we are in XML mode. In XML mode, we need to follow the general XML tool-calling
// format. For example, we in QwenXML mode, we cannot accept the tool calling's like
// "<function=A>\nstring\n</function>", The correct format should be
Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
@Seven-Streams

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces TrueSpec to represent the boolean true schema, allowing XMLToolCallingConverter to handle root-level schemas by generating an object structure. Feedback indicates that because true and {} are equivalent in JSON Schema, AnySpec should be updated to match this behavior at the root level to ensure consistency. Additionally, the reviewer suggests adding TrueSpec to the basic rules cache to optimize grammar generation for nested structures.

Comment thread cpp/json_schema_converter.cc Outdated
Comment thread cpp/json_schema_converter.cc Outdated
Comment thread cpp/json_schema_converter_ext.cc Outdated
Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
@Seven-Streams

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refines the XMLToolCallingConverter to handle different nesting levels (root, XML parameters, and JSON content) and updates test cases to reflect a more structured XML parameter format. The review feedback identifies a high-severity issue where resetting the nesting level to 0 before defining XML object rules causes a cache collision, which may lead to incorrect grammar generation for nested JSON objects. Additionally, a suggestion was made to reuse the predefined kXMLObject rule in GenerateAny for the root level to improve consistency and efficiency.

Comment thread cpp/json_schema_converter_ext.cc
Comment thread cpp/json_schema_converter_ext.cc
Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
@Seven-Streams Seven-Streams merged commit 41dbbb1 into mlc-ai:main May 8, 2026
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants