
Bug: read_file tool's dedup mechanism pollutes file content with cache hint text #13079

@0oLiHano0


title: "Bug Report: read_file tool dedup cache pollution corrupts file content"
created: 2026-04-20
updated: 2026-04-20
type: query
tags: [bug, tool, web]
sources: []

GitHub Issue Draft for Hermes Agent

Issue Title

Bug: read_file tool's dedup mechanism pollutes file content with cache hint text

Issue Body

## Describe the bug

The `read_file` tool's deduplication mechanism (in `tools/file_tools.py`) returns a cache hint message in the `content` field when a file is re-read with the same parameters. Because that message sits where file content is expected, it gets appended to the actual file content and corrupts the file, for example:
 1|---
 2|title: "..."
 1|File unchanged since last read. The content from the earlier read_file result...

After multiple re-reads, the file accumulates multiple copies of this hint text, making it unusable.

## Root Cause Analysis

### Normal read return format:
```json
{
  "content": "     1|---\n     2|title: \"...\"\n     3|...",
  "total_lines": 50,
  "file_size": 1234
}
```

### Dedup hit return format (❌ broken):
```json
{
  "content": "File unchanged since last read. The content from the earlier read_file result in this conversation is still current — refer to that instead of re-reading.",
  "path": "...",
  "dedup": true
}
```

Problem: The `content` field contains plain text instead of the line-numbered format. When downstream logic (e.g., `subdir_hints` in `run_agent.py:7624-7625`) appends to this content, the cache hint becomes part of the file content and gets line numbers added on subsequent reads, creating nested pollution:

     1|---
     1|     1|---
     2|     2|title: "..."
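
The nesting can be reproduced in isolation. The following is a minimal sketch, assuming the line-number prefix is a right-aligned six-character number followed by `|` (inferred from the examples in this issue). `add_line_numbers` is a hypothetical helper, not the actual function in `tools/file_tools.py`.

```python
def add_line_numbers(text: str) -> str:
    """Prefix each line with a right-aligned line number and '|'.

    Hypothetical stand-in for the formatting read_file applies; the
    width of 6 is an assumption taken from the examples in this issue.
    """
    lines = text.split("\n")
    return "\n".join(f"{i:>6}|{line}" for i, line in enumerate(lines, start=1))


original = '---\ntitle: "..."'
first_read = add_line_numbers(original)

# Assumption: once numbered output has been written back into the file,
# the next read numbers it again and the prefixes nest.
second_read = add_line_numbers(first_read)
print(second_read)
```

Each round trip adds one more prefix layer, which matches the `     1|     1|---` pattern shown above.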

## Affected Code

File: `tools/file_tools.py`
Lines: 347-355

```python
if current_mtime == cached_mtime:
    return json.dumps({
        "content": (
            "File unchanged since last read. The content from "
            "the earlier read_file result in this conversation is "
            "still current — refer to that instead of re-reading."
        ),
        "path": path,
        "dedup": True,
    }, ensure_ascii=False)
```

## Proposed Fix

Change the dedup return to use the `error` field instead of `content`, making it clear this is a skip signal rather than file content:

```python
if current_mtime == cached_mtime:
    # Use 'error' field instead of 'content' to avoid pollution.
    # This signals to the LLM that this is a skip/warning, not file content.
    return json.dumps({
        "error": (
            "SKIP: File unchanged since last read. You already have this content "
            "from the earlier read_file call in this conversation. "
            "STOP re-reading and proceed with your task."
        ),
        "path": path,
        "dedup": True,
    }, ensure_ascii=False)
```
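
On the consuming side, a guard could keep any non-content payload away from the file body, regardless of which field carries the hint. This is a sketch under assumptions: `extract_file_content` is a hypothetical helper that does not exist in Hermes Agent, and it relies only on the `content`, `error`, and `dedup` keys shown in this issue.

```python
import json
from typing import Optional


def extract_file_content(raw_result: str) -> Optional[str]:
    """Return file content from a read_file result, or None for skip/error results.

    Hypothetical helper for illustration; not part of Hermes Agent.
    """
    result = json.loads(raw_result)
    # A dedup hit or an error carries a message for the LLM, never file content.
    if result.get("dedup") or "error" in result:
        return None
    return result.get("content")


normal = json.dumps({"content": "     1|---", "total_lines": 1, "file_size": 3})
skip = json.dumps({"error": "SKIP: File unchanged since last read.",
                   "path": "a.md", "dedup": True})

assert extract_file_content(normal) == "     1|---"
assert extract_file_content(skip) is None
```

Checking `dedup` as well as `error` means the guard would also reject the current (broken) return shape, where the hint still arrives in `content`.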

### Benefits of This Fix

1. Clear semantics: the `error` field indicates this is not file content
2. No pollution: the hint won't be appended to file content
3. LLM-friendly: explicitly tells the LLM to stop re-reading
4. Consistent: matches the pattern used elsewhere in `file_tools.py` for error returns

## Testing

I've verified this fix works correctly:

First read  → ✅ Returns normal content (with line numbers)
Second read → ✅ Returns error field (dedup: true)
Third read  → ✅ Still returns error, no accumulation
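
The three-read check can be approximated with a self-contained script. This is a sketch, not the test that was actually run: `read_file` here is a simplified stand-in with hypothetical `offset` and `limit` parameters and a module-level cache, written only to mirror the proposed return shape.

```python
import json
import os
import tempfile

_read_cache = {}  # (path, offset, limit) -> mtime recorded at the last read


def read_file(path: str, offset: int = 1, limit: int = 500) -> str:
    """Simplified stand-in for the read_file tool with the proposed dedup return."""
    key = (path, offset, limit)
    current_mtime = os.path.getmtime(path)
    if _read_cache.get(key) == current_mtime:
        # Dedup hit: signal a skip through 'error', never through 'content'.
        return json.dumps({
            "error": "SKIP: File unchanged since last read.",
            "path": path,
            "dedup": True,
        }, ensure_ascii=False)
    _read_cache[key] = current_mtime
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    selected = lines[offset - 1:offset - 1 + limit]
    content = "\n".join(
        f"{n:>6}|{line}" for n, line in enumerate(selected, start=offset)
    )
    return json.dumps({"content": content, "total_lines": len(lines)},
                      ensure_ascii=False)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.md")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('---\ntitle: "..."\n')

    first = json.loads(read_file(path))
    second = json.loads(read_file(path))
    third = json.loads(read_file(path))

assert "content" in first and "dedup" not in first
assert second.get("dedup") is True and "content" not in second
assert third == second  # repeated dedup hits do not accumulate text
```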

## Impact

- Affected scenarios: Re-reading the same file region within a session
- Behavior change: Dedup hits return an `error` field instead of `content`, so the LLM receives an explicit "skip" signal
- Backward compatibility: No impact on first reads or reads after file changes

## Additional Context

This bug was discovered during a Wiki knowledge base audit where 10 files were corrupted with 395 lines of cache pollution text. The files have been cleaned, and the fix has been applied locally with successful verification.

## Environment

- Hermes Agent version: Latest (as of 2026-04-20)
- Python version: 3.x
- OS: Linux/Ubuntu

Labels: bug, tools, high-priority


## Reference

- Fix documentation: `/root/wiki/plans/fix-read-file-dedup.md`
- Audit report: `/root/wiki/logs/lint/audit-2026-04-20.md`

    Labels

    P1 (High: major feature broken, no workaround); tool/file (File tools: read, write, patch, search); type/bug (Something isn't working)
