file_read default path materializes the whole file (OOM risk) and never redacts secrets #1301

@Aaronontheweb

What happens

file_read's default path (no offset/limit) reads the entire file into memory and only then truncates it.

That's the same allocate-then-cap pattern #1293 removed from the shell path: File.ReadAllTextAsync materializes the whole file, then TruncateFileOutput throws most of it away. The bounded, line-by-line path right above it in the source (ReadLinesAsync, used when offset/limit are supplied) does the right thing; the default path does not.
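
To make the difference concrete, here is a minimal sketch of the two patterns side by side. It is not the FileReadTool code: the class name, method names, and the maxChars parameter are hypothetical, and plain Substring stands in for TruncateFileOutput, whose real behavior may differ.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical names throughout; this only illustrates the two read patterns.
public static class ReadPatterns
{
    // Allocate-then-cap: peak memory is O(file size), whatever maxChars is.
    public static async Task<string> ReadThenTruncateAsync(string path, int maxChars)
    {
        string all = await File.ReadAllTextAsync(path); // whole file lands on the heap
        return all.Length <= maxChars ? all : all.Substring(0, maxChars);
    }

    // Bounded: stop reading once maxChars characters have been collected,
    // so peak memory is O(maxChars) plus the reader's internal buffer.
    public static async Task<string> ReadBoundedAsync(string path, int maxChars)
    {
        using var reader = new StreamReader(path);
        var buffer = new char[maxChars];
        int filled = 0;
        while (filled < maxChars)
        {
            int n = await reader.ReadAsync(buffer, filled, maxChars - filled);
            if (n == 0) break; // end of file reached before the cap
            filled += n;
        }
        return new string(buffer, 0, filled);
    }
}
```

Both methods return the same prefix for the same file, but only the second keeps the allocation independent of file size.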

Why it matters

Reading a large authorized file (a multi-hundred-MB log or data dump) spikes the heap by O(file size) before truncation runs, which can OOM the same memory-limited daemon. Since file_read recently gained the ability to pull arbitrary on-disk files into context, the surface for this is wider than it used to be.

Second, related problem: no redaction on read

FileReadTool runs no SecretOutputRedactor at all — reading a file that contains an API key, connection string, or private key hands those bytes straight to the model in cleartext. Every other "return external content to the LLM" path (shell output, background-job output) redacts; this one doesn't.

Suggested direction

  • Bound the read. For the default path, either read a bounded head+tail window instead of the whole file, or — since we know the file size up front — refuse an unbounded read of a large file and steer the model to offset/limit or grep rather than silently materializing it. (Industry harnesses do the latter: cap inline, point at the file, tell the model to read ranges / grep.)
  • Redact on read. Run the secret redactor over whatever file_read returns. The content is already bounded, so this is a cheap pass over a small string. This complements redact-on-write for files we emit ourselves (shell/job spill): we can scrub files we write, but only redact-on-read covers files someone else put on disk.
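
A rough sketch of how the two suggestions could combine, under stated assumptions: the 256 KiB threshold, the GuardedRead class, and both regex patterns are made up for illustration. The patterns are not SecretOutputRedactor's rules, and the real fix would presumably call the existing redactor rather than define new ones.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper; names and limits are illustrative only.
public static class GuardedRead
{
    // Assumed threshold for an unbounded read; the real value is a design choice.
    public const long MaxUnboundedBytes = 256 * 1024;

    // Illustrative patterns, deliberately crude (they can over-redact).
    private static readonly Regex PrivateKey = new Regex(
        @"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----");
    private static readonly Regex KeyValue = new Regex(
        @"(?i)(api[_-]?key|password|secret)(\s*[=:]\s*)\S+");

    public static string Redact(string text)
    {
        text = PrivateKey.Replace(text, "[REDACTED PRIVATE KEY]");
        // Keep the key name and separator, replace only the value.
        return KeyValue.Replace(text, "$1$2[REDACTED]");
    }

    public static string ReadDefault(string path)
    {
        long size = new FileInfo(path).Length; // metadata lookup, no content read
        if (size > MaxUnboundedBytes)
        {
            // Refuse instead of materializing; steer the caller to ranged reads.
            return $"File is {size} bytes, above the {MaxUnboundedBytes}-byte limit " +
                   "for an unbounded read. Use offset/limit or grep instead.";
        }
        // The size check bounds this allocation, so redaction runs on a small string.
        return Redact(File.ReadAllText(path));
    }
}
```

For example, `GuardedRead.Redact("api_key=abc123")` yields `api_key=[REDACTED]`, and a 300 KiB file gets the refusal message without its contents being read.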

Related

#1293 (same allocate-then-cap pattern, shell path).
