fix(patch-tool): per-mode required-param hints + per-turn file-mutation verifier footer#22149
Closed
teknium1 wants to merge 2 commits into
Closed
fix(patch-tool): per-mode required-param hints + per-turn file-mutation verifier footer#22149teknium1 wants to merge 2 commits into
teknium1 wants to merge 2 commits into
Conversation
…tions Models that enforce required-only constraints (e.g. kimi-k2.x) were omitting old_string/new_string for replace mode and patch for patch mode because the schema only declared required: ["mode"]. Add explicit "REQUIRED when mode='X'" markers to each conditionally-required property description and a top-level "REQUIRED PARAMETERS: ..." summary for each mode. Avoids anyOf/oneOf which break Anthropic, Fireworks, and Kimi/Moonshot providers. Add TestPatchSchemaShape to lock the shape. Fixes #15524 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Detect when write_file / patch calls fail during a turn and are never superseded by a successful write to the same path. When the final text response is delivered, append an advisory footer listing the files that did NOT change — so models that over-claim 'patched 5 files' after 4 silent failures can't hide the lie. Catches the failure mode reported in Ben Eng's llm-wiki session: grok-4.1-fast issued batches of parallel patches, half failed with 'Could not find old_string', and the agent summarised the turn claiming every file was edited. The user had to manually run 'git status' each turn to catch it. The verifier is a pure post-hoc check on tool results — no new LLM calls, no synthetic messages injected into history (prompt cache preserved), no changes to tool argument dispatch. Per-turn state is keyed by path; a later successful write to the same path clears the failure entry so single-file retry recovery is not flagged. Wired into both _execute_tool_calls_concurrent and _execute_tool_calls_sequential, so batched parallel patches and one-at- a-time edits are both covered. Footer emission happens after the agent loop exits, before transform_llm_output / post_llm_call plugin hooks run, so plugins still see (and can modify) the augmented text. Config: display.file_mutation_verifier (bool, default true) + HERMES_FILE_MUTATION_VERIFIER env override. 31 unit tests in tests/run_agent/test_file_mutation_verifier.py cover target extraction (write_file, patch-replace, patch-v4a single and multi-file), error-preview extraction (JSON .error field and plain string), per-turn state transitions (first-error-wins on repeated failure, success supersedes failure), footer rendering (truncation at 10 entries, user-actionable hint), and env/config precedence. Companion docs updated: user-guide/configuration.md + reference/environment-variables.md.
Contributor
Author
|
Superseded by #24498 — same Layer 2 (verifier) commit, rebuilt onto current main with the docs conflict resolved. Layer 1 of this PR (the patch-tool schema-description fix cherry-picked from #15673) was independently salvaged and merged as 3adcc64 on Apr 25, which is why this branch went stale. #24498 carries only the substantive verifier commit. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Models no longer silently over-claim file edits. Two layers, one PR:
Layer 1 — salvaged from #15673 (@briandevans): Advertises per-mode required params in the patch schema's descriptions so strict tool callers (kimi-k2.x in #15524, grok-4.1-fast reported separately) stop omitting
old_string/new_string/patch. JSON Schema'srequiredarray stays["mode"]becauseanyOf/oneOfat parameters level breaks Anthropic, Fireworks, and Kimi — description-levelREQUIRED when mode='…'hints are the only cross-provider-compatible mechanism.Layer 2 (new): Per-turn verifier that catches over-claiming after the fact. Every
write_file/patchcall's outcome is tracked per resolved path; failures that are never superseded by a later successful write to the same path are surfaced in a footer appended to the assistant's final response:Regression target: Ben Eng's llm-wiki session where grok-4.1-fast batched parallel patches, half failed with 'Could not find old_string', and the model summarised the turn claiming every file was edited — forcing the user to manually run
git statusevery turn.Changes
tools/file_tools.py—PATCH_SCHEMAdescription + per-propertyREQUIRED when mode='…'hints (cherry-pick from fix(patch-tool): advertise per-mode required params in schema descriptions #15673)run_agent.py—_FILE_MUTATING_TOOLSconst,_extract_file_mutation_targets(handles write_file, patch-replace, and patch-v4a single + multi-file),_extract_error_preview,AIAgent._record_file_mutation_result,AIAgent._file_mutation_verifier_enabled,AIAgent._format_file_mutation_failure_footer. Recording wired into both_execute_tool_calls_concurrentand_execute_tool_calls_sequentialso parallel and sequential batches are both covered. Footer emission runs at turn end, beforetransform_llm_output/post_llm_callplugin hooks, so plugins still see (and can modify) the augmented text.hermes_cli/config.py—display.file_mutation_verifierdefaultTruetests/run_agent/test_file_mutation_verifier.py— 31 unit tests (target extraction, error-preview extraction, state transitions, footer rendering, env/config precedence)tests/tools/test_file_tools.py— 9TestPatchSchemaShapeassertions from fix(patch-tool): advertise per-mode required params in schema descriptions #15673website/docs/user-guide/configuration.md+website/docs/reference/environment-variables.md—display.file_mutation_verifier+HERMES_FILE_MUTATION_VERIFIERValidation
old_stringrequired: ["mode"]), tool returnsold_string required, model loopsREQUIRED when mode='replace', model includes the fieldgit statusto find the lietests/tools/test_file_tools.py+tests/run_agent/test_file_mutation_verifier.py+test_steer.py+test_tool_call_args_sanitizer.pyNotes
display.file_mutation_verifier: falseinconfig.yamlorHERMES_FILE_MUTATION_VERIFIER=0.*** Update File:/*** Add File:/*** Delete File:headers so each file in a multi-file patch is tracked separately.Closes #15524
Cherry-picks and credits @briandevans (PR #15673).