fix(tools): unescape common sequences in new_string when escape_normalized matches#33737
Closed
liuhao1024 wants to merge 1 commit into
Closed
fix(tools): unescape common sequences in new_string when escape_normalized matches#33737liuhao1024 wants to merge 1 commit into
liuhao1024 wants to merge 1 commit into
Conversation
…lized matches When the patch tool matches via the escape_normalized strategy, old_string contains literal \t, \n, \r sequences that get unescaped to match real control characters in the file. However, new_string was written as-is, leaving literal backslash sequences in the output. Add _unescape_common_sequences() helper and apply it to new_string when the matching strategy is escape_normalized. This ensures LLM-generated tab/newline sequences become real bytes in the patched file. Fixes NousResearch#33733
Contributor
|
Merged via PR #33799. Your commit was cherry-picked onto current main with your authorship preserved in git log (commit e9f3f2b). The follow-up commit widens the fix so it also catches the issue's headline reproduction — when Thanks for the contribution! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When the patch tool's
escape_normalizedmatching strategy succeeds,old_stringcontained literal\\t,\\n,\\rsequences that were unescaped to match real control characters in the file. However,new_stringwas written as-is, leaving literal backslash sequences (two characters:\+t) in the patched file instead of real tab/newline bytes.Root cause: The
_strategy_escape_normalizedfunction unescapesold_stringfor matching butnew_stringpasses through to_apply_replacementswithout the same treatment.Fix: Add
_unescape_common_sequences()helper and apply it tonew_stringonly when the matching strategy isescape_normalized. Exact matches are unaffected — literal backslashes are preserved when the file genuinely contains them.Fixes #33733
Changes
tools/fuzzy_match.py: Added_unescape_common_sequences()helper; modified replacement block to unescapenew_stringwhenstrategy_name == "escape_normalized"tests/tools/test_fuzzy_match.py: AddedTestEscapeNormalizedNewStringclass with 5 regression tests (tab, newline, CR, mixed sequences, exact-match preservation)Testing
All existing tests continue to pass. 5 new regression tests verify:
\\t) innew_stringis unescaped to real tab byte\\n) innew_stringis unescaped to real newline\\r) innew_stringis unescapedCode Intelligence
tools/fuzzy_match.py:fuzzy_find_and_replace(callers:tools/patch_tool.py,tests/tools/test_fuzzy_match.py)new_stringprocessing whenescape_normalizedstrategy matches; exact matches and all other strategies are unaffected_detect_escape_drift(guards\\'/\\"drift),_strategy_escape_normalized(unescapesold_stringfor matching)