fix: unescape \t and \r in new_string before writing to file (#33733)#33745
Closed
hoobnn wants to merge 1 commit into
Closed
fix: unescape \t and \r in new_string before writing to file (#33733)#33745hoobnn wants to merge 1 commit into
hoobnn wants to merge 1 commit into
Conversation
When the LLM produces literal backslash-t or backslash-r (two-character sequences) in new_string instead of real tab/carriage-return bytes, the patch tool now converts them before writing to disk. This is the symmetrical counterpart to _strategy_escape_normalized which already does the same for old_string during matching. \n is excluded because newlines serialize correctly in JSON. Fixes NousResearch#33733
Collaborator
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #33733 — the patch tool now unescapes common escape sequences (
\tand\r) innew_stringbefore writing to disk.Root Cause
The LLM's training data overwhelmingly represents tabs as the two-character sequence
\t(e.g. in Python string literals, regex, diffs). When generating JSON tool call arguments, the model produces these literal characters instead of real tab bytes. The tool then faithfully writes those literal characters into the file.Fix
Added unescaping of
\t-> tab and\r-> carriage return innew_stringat the top offuzzy_find_and_replace(), before the matching loop. This is the symmetrical counterpart to_strategy_escape_normalizedwhich already does the same forold_stringduring matching.\nis intentionally excluded because newlines serialize correctly in JSON.Changes
tools/fuzzy_match.py: Added 2-line unescape fornew_string(\tand\r)tests/tools/test_fuzzy_match.py: Added 5 new tests covering: