---
title: "Bug Report: read_file tool dedup cache pollution corrupts file content"
created: 2026-04-20
updated: 2026-04-20
type: query
tags: [bug, tool, web]
sources: []
---
# GitHub Issue Draft for Hermes Agent

## Issue Title

Bug: read_file tool's dedup mechanism pollutes file content with cache hint text

## Issue Body
## Describe the bug
The `read_file` tool's deduplication mechanism (in `tools/file_tools.py`) returns a cache hint in the `content` field when a file is re-read with the same parameters. This hint text then gets appended to the actual file content, corrupting the file with pollution like:
```
1|---
2|title: "..."
1|File unchanged since last read. The content from the earlier read_file result...
```
After multiple re-reads, the file accumulates multiple copies of this hint text, making it unusable.
## Root Cause Analysis
### Normal read return format:
```json
{
  "content": " 1|---\n 2|title: \"...\"\n 3|...",
  "total_lines": 50,
  "file_size": 1234
}
```
### Dedup hit return format (❌ broken):
```json
{
  "content": "File unchanged since last read. The content from the earlier read_file result in this conversation is still current — refer to that instead of re-reading.",
  "path": "...",
  "dedup": true
}
```
Problem: The `content` field contains plain text instead of the line-numbered format. When downstream logic (e.g., `subdir_hints` in `run_agent.py:7624-7625`) appends to this content, the cache hint becomes part of the file content and gets line numbers added on subsequent reads, creating nested pollution:
```
1|---
1| 1|---
2| 2|title: "..."
```
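The nesting effect can be reproduced in isolation. The sketch below uses a hypothetical `add_line_numbers` helper that mimics the `N|` prefix seen in `read_file` output (the real formatter's width and details may differ), and it models the bug in simplified form: the numbered read output plus the dedup hint ends up stored as file text, so the next read numbers already-numbered lines again.

```python
def add_line_numbers(text: str) -> str:
    """Hypothetical formatter: prefix each line with a right-aligned number and '|'."""
    return "\n".join(
        f"{i:>6}|{line}" for i, line in enumerate(text.splitlines(), start=1)
    )


original = '---\ntitle: "..."'
hint = "File unchanged since last read."

# First read: numbered output of the real file.
first_read = add_line_numbers(original)
# Model the bug: numbered output plus the dedup hint is stored as file text.
polluted_file = first_read + "\n" + hint
# The next read numbers the already-numbered lines again, nesting the prefixes.
second_read = add_line_numbers(polluted_file)
print(second_read)
```

Each further write-and-read cycle adds another layer of prefixes, which matches the accumulation described above.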
## Affected Code

- File: `tools/file_tools.py`
- Lines: 347-355
```python
if current_mtime == cached_mtime:
    return json.dumps({
        "content": (
            "File unchanged since last read. The content from "
            "the earlier read_file result in this conversation is "
            "still current — refer to that instead of re-reading."
        ),
        "path": path,
        "dedup": True,
    }, ensure_ascii=False)
```
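For context, the excerpt above sits inside a check that compares the file's current modification time with a cached one. The following is a simplified, self-contained model of that mechanism, not the Hermes source: the `_read_cache` dict, its `(path, offset, limit)` key, the default `limit`, and the `read_file_sketch` name are all assumptions for illustration.

```python
import json
import os
import tempfile

# Hypothetical dedup cache: (path, offset, limit) -> mtime seen at the last full read.
_read_cache: dict[tuple[str, int, int], float] = {}


def read_file_sketch(path: str, offset: int = 1, limit: int = 500) -> str:
    """Simplified model of read_file's dedup check (pre-fix behaviour)."""
    key = (path, offset, limit)
    current_mtime = os.path.getmtime(path)
    cached_mtime = _read_cache.get(key)
    if current_mtime == cached_mtime:
        # Dedup hit: the hint is returned under "content", the source of the bug.
        # Hint text abbreviated to its first sentence for this sketch.
        return json.dumps({
            "content": "File unchanged since last read.",
            "path": path,
            "dedup": True,
        }, ensure_ascii=False)
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    window = lines[offset - 1 : offset - 1 + limit]
    numbered = "\n".join(f"{n:>6}|{line}" for n, line in enumerate(window, start=offset))
    _read_cache[key] = current_mtime
    return json.dumps({"content": numbered, "total_lines": len(lines)}, ensure_ascii=False)


# Usage: two identical reads of an unchanged file.
with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False, encoding="utf-8") as tmp:
    tmp.write('---\ntitle: "..."\n')
first = json.loads(read_file_sketch(tmp.name))
second = json.loads(read_file_sketch(tmp.name))
os.unlink(tmp.name)
print(second["dedup"], second["content"])
```

The second call never touches the file contents; the only thing that reaches the caller is the hint string in the same `content` slot that normally carries line-numbered text.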
## Proposed Fix

Change the dedup return to use the `error` field instead of `content`, making it clear that this is a skip signal rather than file content:
```python
if current_mtime == cached_mtime:
    # Use 'error' field instead of 'content' to avoid pollution.
    # This signals to the LLM that this is a skip/warning, not file content.
    return json.dumps({
        "error": (
            "SKIP: File unchanged since last read. You already have this content "
            "from the earlier read_file call in this conversation. "
            "STOP re-reading and proceed with your task."
        ),
        "path": path,
        "dedup": True,
    }, ensure_ascii=False)
```
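With the fix, a caller can tell the two cases apart by key instead of by inspecting text. The sketch below shows one way a consumer might handle the result; `extract_file_text` is a hypothetical helper, not part of Hermes, and it assumes the response shapes shown in this report.

```python
import json
from typing import Optional


def extract_file_text(raw: str) -> Optional[str]:
    """Return numbered file text, or None when the read was skipped or failed.

    Hypothetical helper: an "error" payload (including dedup skips) is never
    treated as file content, so nothing from it can be appended to a file.
    """
    result = json.loads(raw)
    if "error" in result:
        return None
    return result.get("content")


normal = json.dumps({"content": "     1|---", "total_lines": 1, "file_size": 4})
skipped = json.dumps({
    "error": "SKIP: File unchanged since last read.",
    "path": "notes.md",
    "dedup": True,
})

print(extract_file_text(normal))   # prints the numbered content line
print(extract_file_text(skipped))  # None
```

Checking for `error` first, rather than for `dedup`, also covers ordinary failures through the same path.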
## Benefits of This Fix

- Clear semantics: the `error` field indicates this is not file content
- No pollution: The hint won't be appended to file content
- LLM-friendly: Explicitly tells the LLM to stop re-reading
- Consistent: Matches the pattern used elsewhere in `file_tools.py` for error returns
## Testing

I've verified this fix works correctly:

- First read → ✅ Returns normal content (with line numbers)
- Second read → ✅ Returns `error` field (`dedup: true`)
- Third read → ✅ Still returns `error`, no accumulation
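The three-read check above could be expressed as a regression test. This is a sketch against a simplified in-memory model of the fixed dedup logic: `DedupReader` and its explicit `mtime` argument are hypothetical stand-ins, not Hermes code, and a real test would exercise `read_file` from `tools/file_tools.py` instead.

```python
import json


class DedupReader:
    """Hypothetical in-memory model of read_file with the proposed fix applied."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files
        self.cache: dict[str, float] = {}  # path -> mtime at last full read

    def read(self, path: str, mtime: float) -> str:
        if self.cache.get(path) == mtime:
            # Fixed behaviour: the skip signal goes in "error", never in "content".
            return json.dumps({
                "error": "SKIP: File unchanged since last read.",
                "path": path,
                "dedup": True,
            })
        self.cache[path] = mtime
        lines = self.files[path].splitlines()
        numbered = "\n".join(f"{i:>6}|{line}" for i, line in enumerate(lines, start=1))
        return json.dumps({"content": numbered, "total_lines": len(lines)})


def test_dedup_does_not_pollute_content() -> None:
    reader = DedupReader({"page.md": '---\ntitle: "..."'})

    first = json.loads(reader.read("page.md", mtime=1.0))
    assert first["content"].splitlines()[0] == "     1|---"

    second = json.loads(reader.read("page.md", mtime=1.0))
    third = json.loads(reader.read("page.md", mtime=1.0))
    for skipped in (second, third):
        assert skipped["dedup"] is True
        assert "content" not in skipped
    # No accumulation: repeated hits return the identical skip payload.
    assert second == third

    # A changed mtime means a normal read again (the backward-compatible path).
    changed = json.loads(reader.read("page.md", mtime=2.0))
    assert "content" in changed
```

Under pytest the `test_` function is collected automatically; it can also be called directly.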
## Impact

- Affected scenarios: Re-reading the same file region within a session
- Behavior change: Dedup hits return `error` instead of `content`, so the LLM receives an explicit "skip" signal
- Backward compatibility: No impact on first reads or reads after file changes
## Additional Context
This bug was discovered during a Wiki knowledge base audit where 10 files were corrupted with 395 lines of cache pollution text. The files have been cleaned, and the fix has been applied locally with successful verification.
## Environment
- Hermes Agent version: Latest (as of 2026-04-20)
- Python version: 3.x
- OS: Linux/Ubuntu
Labels: bug, tools, high-priority
## Reference
- Fix documentation: `/root/wiki/plans/fix-read-file-dedup.md`
- Audit report: `/root/wiki/logs/lint/audit-2026-04-20.md`