What happens
file_read's default path (no offset/limit) reads the entire file into memory and only then truncates it:
    var content = await File.ReadAllTextAsync(authorizedPath, encoding, ct);
    RecordSkillReadIfApplicable(authorizedPath);
    return TruncateFileOutput(content, _config.MaxOutputChars);
That's the same allocate-then-cap pattern #1293 removed from the shell path: File.ReadAllTextAsync materializes the whole file as a single string, then TruncateFileOutput discards everything past _config.MaxOutputChars. The bounded, line-by-line path right above it (ReadLinesAsync, used when offset/limit are supplied) does the right thing; the default path does not.
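For contrast, here is a minimal sketch of what a bounded default read could look like. It is not the code in FileReadTool: `BoundedRead` and `ReadHeadAsync` are hypothetical names, and the only real APIs used are `StreamReader` and its `ReadBlockAsync` overload. The point is that the buffer is sized by the output cap, not by the file.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedRead
{
    // Hypothetical helper: returns at most maxChars characters from the start
    // of the file, plus a flag saying whether the file had more content.
    // Peak memory is O(maxChars), independent of the file size.
    public static async Task<(string Text, bool Truncated)> ReadHeadAsync(
        string path, int maxChars, Encoding encoding, CancellationToken ct = default)
    {
        using var reader = new StreamReader(path, encoding);

        // One extra slot lets us detect "there was more" without a second read.
        var buffer = new char[maxChars + 1];

        // ReadBlockAsync keeps reading until the buffer is full or the stream ends.
        int read = await reader.ReadBlockAsync(buffer.AsMemory(), ct);

        bool truncated = read > maxChars;
        int kept = Math.Min(read, maxChars);
        return (new string(buffer, 0, kept), truncated);
    }
}
```

A caller would pass the existing cap (for example `_config.MaxOutputChars`) as `maxChars` and append a truncation notice when `Truncated` is true. Note that a hard character cut can split a surrogate pair at the boundary; a real implementation may want to back off by one character in that case.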
Why it matters
Reading a large authorized file (a multi-hundred-MB log or data dump) spikes the heap by O(file size) before truncation runs, which can OOM the same memory-limited daemon. Since file_read recently gained the ability to pull arbitrary on-disk files into context, the surface for this is wider than it used to be.
Second, related problem: no redaction on read
FileReadTool runs no SecretOutputRedactor at all — reading a file that contains an API key, connection string, or private key hands those bytes straight to the model in cleartext. Every other "return external content to the LLM" path (shell output, background-job output) redacts; this one doesn't.
Suggested direction
- Bound the read. For the default path, either read a bounded head+tail window instead of the whole file, or (since we know the file size up front) refuse an unbounded read of a large file and steer the model to offset/limit or grep rather than silently materializing it. (Industry harnesses do the latter: cap inline, point at the file, tell the model to read ranges / grep.)
- Redact on read. Run the secret redactor over whatever file_read returns. The content is already bounded, so this is a cheap pass over a small string. This complements redact-on-write for files we emit ourselves (shell/job spill): we can scrub files we write, but only redact-on-read covers files someone else put on disk.
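The two bullets above could be sketched roughly as follows. Everything here is illustrative: `FileReadGuards`, `MaxUnboundedReadBytes`, `TryRejectUnboundedRead`, and `Redact` are hypothetical names, the 256 KiB threshold is an arbitrary placeholder, and the regex patterns are a stand-in for SecretOutputRedactor, whose actual API is not shown in this issue.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileReadGuards
{
    // Hypothetical threshold; the real value would come from configuration.
    public const long MaxUnboundedReadBytes = 256 * 1024;

    // Returns true (and a message for the model) when a read without
    // offset/limit should be refused because the file is too large.
    public static bool TryRejectUnboundedRead(long fileLength, out string message)
    {
        if (fileLength <= MaxUnboundedReadBytes)
        {
            message = string.Empty;
            return false;
        }

        message = $"File is {fileLength} bytes, above the {MaxUnboundedReadBytes}-byte " +
                  "limit for unbounded reads. Use offset/limit or grep instead.";
        return true;
    }

    // Illustrative patterns only, standing in for the project's real redactor.
    private static readonly (Regex Pattern, string Replacement)[] SecretPatterns =
    {
        // PEM-style private key blocks.
        (new Regex(@"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"),
         "[REDACTED PRIVATE KEY]"),
        // key=value or key: value pairs with a secret-looking key name.
        (new Regex(@"(?i)\b(api[_-]?key|password|secret|token)\b(\s*[:=]\s*)\S+"),
         "$1$2[REDACTED]"),
    };

    // Runs every pattern over the already-bounded output string.
    public static string Redact(string text)
    {
        foreach (var (pattern, replacement) in SecretPatterns)
        {
            text = pattern.Replace(text, replacement);
        }
        return text;
    }
}
```

In the tool, the size check would run before any read (for example with `new FileInfo(authorizedPath).Length`), and `Redact` would be applied to the final string just before it is returned, so the redaction cost stays proportional to the output cap rather than the file size.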
Related
#1293 (same allocate-then-cap pattern, shell path).
Source of the snippet quoted under "What happens": netclaw/src/Netclaw.Actors/Tools/FileReadTool.cs, lines 99 to 101 in 60601c6.