fix(patch): unescape \t/\r in new_string when file region uses real control bytes (#33733) by teknium1 · Pull Request #33799 · NousResearch/hermes-agent

teknium1 · 2026-05-28T10:15:35Z

Summary

Patch tool no longer corrupts tab-indented files when the LLM serializes tabs as the two-character sequence \t in new_string. Salvages @liuhao1024's #33737 and widens it to cover the issue's headline reproduction (matched via exact strategy, not escape_normalized).

Fixes #33733.

Root cause

LLMs frequently produce \t (backslash + t, two characters) in JSON tool-call arguments where they meant a real tab byte. The patch tool wrote those literal characters straight to disk. _strategy_escape_normalized already unescaped old_string for matching; new_string had no equivalent treatment.

What changed vs #33737

#33737 gated the unescape on strategy_name == "escape_normalized". The issue's repro has a real tab in old_string, so the match fires under exact — the gate never opens. This PR decides per-sequence by inspecting the matched region of the file: only convert \t -> tab (or \r -> CR) when the file region actually contains the corresponding control byte. Mirrors _detect_escape_drift's region-based heuristic and avoids touching legitimate writes of the literal two-character string "\t" (e.g. sep = "\t" in Python source). \n is intentionally excluded — newlines serialize correctly through JSON and unescaping would corrupt source escape sequences far more often than help.

Changes

tools/fuzzy_match.py: replace _unescape_common_sequences with _maybe_unescape_new_string(new_string, content, matches); call it unconditionally from fuzzy_find_and_replace (the helper itself bails out cheaply when there's nothing to do).
tests/tools/test_fuzzy_match.py: rewrite TestEscapeNormalizedNewString to cover both exact and escape_normalized strategies, the legitimate-literal preservation case, and the \n-not-unescaped invariant.

Validation

E2E against the real patch tool, fixed code on disk:

variant	file has	old_string	new_string	result
1 — issue headline	real tab	real tab	literal `\t`	real tab written ✓
2 — both literal	real tab	literal `\t`	literal `\t`	real tab written ✓
3 — legit `"\t"` literal	`\` + t	`\` + t	`\` + t (new word)	`\` + t preserved ✓
4 — passthrough	none	none	none	no-op ✓
5 — `\'`/`\"` drift guard	real `'`	`\'`-drifted	`\'`-drifted	drift error still fires ✓

Targeted tests: 220 passing across test_fuzzy_match.py, test_patch_failure_tracking.py, test_patch_parser.py, test_file_operations*.py, test_file_tools.py.

Credit

Cherry-picked @liuhao1024's commit from #33737 with authorship preserved; follow-up commit widens the fix. Closes #33737 (superseded) and #33745 (closed by author).

Infographic

…lized matches When the patch tool matches via the escape_normalized strategy, old_string contains literal \t, \n, \r sequences that get unescaped to match real control characters in the file. However, new_string was written as-is, leaving literal backslash sequences in the output. Add _unescape_common_sequences() helper and apply it to new_string when the matching strategy is escape_normalized. This ensures LLM-generated tab/newline sequences become real bytes in the patched file. Fixes #33733

@liuhao1024

…33733) Extends @liuhao1024's escape-normalized fix so the patch tool also recovers when old_string carries a real tab byte and matches via the `exact` strategy — which is the headline reproduction in the issue and the most common case in practice (LLMs frequently get old_string right because they re-read the file, but still serialize new_string's tabs as two-character `\t`). Instead of gating on the match strategy, decide per-sequence by looking at the *matched region of the file*: only convert `\t` -> tab and `\r` -> CR when the file region we're replacing actually contains the corresponding control byte. That mirrors the region-based heuristic in `_detect_escape_drift` and keeps legitimate writes of the literal two-character string `"\t"` (e.g. patching `sep = "\t"` in Python source) untouched — those files have a backslash+t in the matched region, not a real tab, so new_string passes through verbatim. `\n` is still excluded because newlines serialize correctly through JSON and unescaping would corrupt source escape sequences far more often than help. E2E verified against the live `patch` tool: tab-indented file + literal `\t` in new_string under both `exact` (Variant 1) and `escape_normalized` (Variant 2) strategies now produces real tab bytes; a Python source line containing `sep = "\t"` (legitimate literal backslash-t) survives a patch unchanged. Tests updated to cover both strategies and the legitimate-literal case, and to assert that `\n` is intentionally preserved. Refs #33733

github-actions · 2026-05-28T10:16:34Z

🔎 Lint report: `hermes/hermes-6ef29853` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9543 on HEAD, 9543 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5028 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

liuhao1024 and others added 2 commits May 28, 2026 02:58

teknium1 merged commit 78be458 into main May 28, 2026
25 checks passed

teknium1 deleted the hermes/hermes-6ef29853 branch May 28, 2026 10:27

This was referenced May 28, 2026

fix(tools): unescape common sequences in new_string when escape_normalized matches #33737

Closed

patch tool: new_string escape sequences (\t) get written literally #33733

Closed

alt-glitch added P2 Medium — degraded but workaround exists type/bug Something isn't working comp/tools Tool registry, model_tools, toolsets tool/file File tools (read, write, patch, search) labels May 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(patch): unescape \t/\r in new_string when file region uses real control bytes (#33733)#33799

fix(patch): unescape \t/\r in new_string when file region uses real control bytes (#33733)#33799
teknium1 merged 2 commits into
mainfrom
hermes/hermes-6ef29853

teknium1 commented May 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

teknium1 commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

What changed vs #33737

Changes

Validation

Credit

Infographic

Uh oh!

github-actions Bot commented May 28, 2026

🔎 Lint report: hermes/hermes-6ef29853 vs origin/main

ruff

ty (type checker)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

teknium1 commented May 28, 2026 •

edited

Loading

🔎 Lint report: `hermes/hermes-6ef29853` vs `origin/main`