feat: validate-then-repair layer for semantic tool argument issues#19652
feat: validate-then-repair layer for semantic tool argument issues#19652NikolayGusev-astra wants to merge 1 commit into
Conversation
Adds a validate-then-repair layer in handle_function_call that catches and fixes 5 classes of semantic tool argument malformations before they reach tool handlers: 1. null_on_optional - removes None values from optional fields 2. string_as_array - parses JSON-encoded strings as arrays 3. bare_as_array - wraps single scalars in single-element arrays 4. markdown_link - strips markdown link syntax from path fields 5. missing_with_offset / missing_with_limit - defaults for relational pairs Design: validate-then-repair (not preprocess) - only schema-violating fields are touched. Repair notes are prepended to tool results so the model learns from its mistakes. INFO-level logging for telemetry. Reference: https://telegra.ph/Cursor-vsyo-slomal-no-chinit-nado-ne-Cursor-validate-then-repair-protiv-razryva-svyazej-v-AI-agentah-05-04
|
Note: This PR adds a standalone model_tools.py file rather than integrating into the existing _repair_tool_call_arguments flow in run_agent.py. Consider whether this should be a method in the existing repair pipeline or a separate module. |
…args
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`. The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.
`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`. Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`. `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.
Salvaged from #19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools). The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.
Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
|
Thanks for the contribution and the detailed analysis in your blog post — the bare-scalar-as-array pattern is real and worth fixing. Salvaged the genuinely new capability (wrapping bare scalars into single-element lists when the schema declares The broader validate-then-repair layer had a few issues that kept us from merging the full PR:
The surviving capability lives directly in the existing coercion path — ~24 LOC + 7 tests — and you're the commit author. Thanks again! |
…args
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`. The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.
`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`. Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`. `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.
Salvaged from NousResearch#19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools). The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.
Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
…args
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`. The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.
`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`. Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`. `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.
Salvaged from NousResearch#19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools). The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.
Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
…args
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`. The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.
`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`. Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`. `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.
Salvaged from NousResearch#19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools). The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.
Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
…args
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`. The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.
`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`. Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`. `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.
Salvaged from NousResearch#19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools). The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.
Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
…args
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`. The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.
`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`. Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`. `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.
Salvaged from NousResearch#19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools). The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.
Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
…args
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`. The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.
`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`. Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`. `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.
Salvaged from NousResearch#19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools). The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.
Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
Problem
LLMs (especially open-weight models like DeepSeek, Qwen, GLM) systematically produce tool call arguments with 5 classes of semantic malformations that pass JSON syntax validation but violate the tool argument schema:
nullon optional fields — instead of omitting the field[notes.md](http://notes.md)instead ofnotes.mdAll 5 patterns currently pass through without feedback to the model, causing silent data loss, false successes, or confusing error messages.
Solution
A validate-then-repair layer in
handle_function_call()that:[validate-then-repair: ...]note to the tool result so the model learns from its mistakes(tool_name, repair_type)Key design choices
_repair_tool_call_argumentsin run_agent.py (JSON syntax: trailing commas, unclosed brackets, control chars) is complementary — this layer handles semantic issuesTesting
ast.parseRelated
Blog post detailing the problem analysis: https://telegra.ph/Cursor-vsyo-slomal-no-chinit-nado-ne-Cursor-validate-then-repair-protiv-razryva-svyazej-v-AI-agentah-05-04