
shell_execute buffers entire child output into memory before truncating (LOH/OOM risk) #1293

@Aaronontheweb


What happens

shell_execute reads the entire child process stdout and stderr into memory before any size limit is applied. A command that emits a few hundred MB — kubectl logs, journalctl, curl of a large API response, cat of a big file — allocates that whole payload (and then some) on the Large Object Heap in one shot. In a memory-limited container that is an instant OOM, even though the model never sees more than a few KB of the output.

This is the most likely trigger for OOM kills on an autonomous agent that runs shell commands to pull logs.

Why

The drain reads both streams to end with ReadToEndAsync, with no cap.

Then the result is assembled and copied again, redacted (another copy), and only then truncated:

    outputBuilder.Append(await stdoutTask);
    errorBuilder.Append(await stderrTask);

    var result = new StringBuilder();
    if (outputBuilder.Length > 0)
        result.Append(outputBuilder);
    if (errorBuilder.Length > 0)
    {
        if (result.Length > 0)
            result.AppendLine();
        result.Append(errorBuilder);
    }

    var sanitized = SecretOutputRedactor.Redact(result.ToString());
    var output = TruncateOutput(sanitized, _config.MaxOutputChars);

So for an N-byte output you transiently hold several multiples of N live at once: the ReadToEndAsync string, the StringBuilder append, the result.ToString(), and the redactor's output, all before TruncateOutput cuts it down. Strings are UTF-16, so an N-byte ASCII log is a 2N-byte .NET string, and any object of 85,000 bytes or more lands on the LOH (which is not compacted by default, so the segments linger). A single ~300MB log can momentarily need over 1GB.
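To make the arithmetic concrete, here is a rough upper-bound estimate. It assumes the four copies listed above are all live at the same moment, which the GC may or may not allow in practice. `PeakEstimate` is a hypothetical helper used only for illustration and is not part of the codebase.

```csharp
using System;

public static class PeakEstimate
{
    // Hypothetical helper: upper bound on transient string memory.
    // asciiBytes: size of the child's output in bytes (assumed pure ASCII).
    // liveCopies: number of full-size copies assumed to be alive at once.
    public static long UpperBoundBytes(long asciiBytes, int liveCopies)
    {
        const int bytesPerChar = 2; // .NET strings are UTF-16
        return asciiBytes * bytesPerChar * liveCopies;
    }

    public static void Main()
    {
        const long mb = 1024 * 1024;
        long peak = UpperBoundBytes(300 * mb, liveCopies: 4);
        Console.WriteLine($"{peak / mb} MB"); // prints "2400 MB"
    }
}
```

Even if only two of those copies overlap, the estimate is already 1200 MB, which is consistent with the "over 1GB" figure.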

The MaxOutputChars cap (default 32000) protects the model's context, not the process. It runs last, after the full output has already been read, assembled, and redacted.

There's even a second cap downstream — ClampToolResult clamps again before the result enters session history — so the model is doubly protected while the process is not protected at all:

    public static string ClampToolResult(string resultText, int maxInlineToolResultChars)
    {
        if (maxInlineToolResultChars <= 0 || resultText.Length <= maxInlineToolResultChars)
            return resultText;

        var omittedChars = resultText.Length - maxInlineToolResultChars;
        return resultText[..maxInlineToolResultChars]
            + $"\n[tool result truncated: omitted {omittedChars} chars to protect context window]";
    }

Suggested direction

  • Bound the read instead of reading to end. Drain into a fixed-size buffer (something like MaxOutputChars plus a margin) and stop once it's full — kill the process or keep discarding so the pipe doesn't deadlock. We already kill on timeout, so the kill path exists.
  • Once the read is bounded, collapse the copies: truncate first, then redact the small result, then build the final string. No reason to redact or ToString a 300MB buffer we're about to throw away.
  • Consider capturing a head+tail window rather than head-only, since the exit summary often matters — but that's secondary to bounding the allocation.
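A minimal sketch of the first bullet, assuming the "keep discarding" variant rather than killing the process. `BoundedOutputReader` and `ReadBoundedAsync` are hypothetical names, not existing code; the sketch keeps at most `maxChars` characters and continues to drain the stream so the child never blocks on a full pipe.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedOutputReader
{
    // Hypothetical helper, not part of the existing codebase.
    // Keeps the first maxChars characters and counts (but does not store) the rest.
    public static async Task<(string Text, long DiscardedChars)> ReadBoundedAsync(
        TextReader reader, int maxChars, CancellationToken ct = default)
    {
        if (maxChars < 0)
            throw new ArgumentOutOfRangeException(nameof(maxChars));

        var kept = new StringBuilder(Math.Min(maxChars, 4096));
        var buffer = new char[4096]; // small, reused, well under the LOH threshold
        long discarded = 0;

        while (true)
        {
            int read = await reader.ReadAsync(buffer.AsMemory(), ct);
            if (read == 0)
                break; // end of stream

            // Copy only what still fits; everything else is dropped immediately.
            int take = Math.Min(maxChars - kept.Length, read);
            if (take > 0)
                kept.Append(buffer, 0, take);
            discarded += read - take;
        }

        return (kept.ToString(), discarded);
    }
}
```

Called once each on the process's StandardOutput and StandardError with something like `_config.MaxOutputChars` plus a margin, this holds memory to roughly the cap regardless of how much the child emits. Redaction and the final TruncateOutput would then run on the small result. One caveat worth checking: a secret cut off at the read boundary may no longer match the redactor's patterns, which is one reason to read a margin beyond the final cap.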

Notes

This pairs with the GC default issue (Server GC in a small container amplifies the spike and is slow to give the memory back). Bounding the read is the real fix; the GC change just widens the margin.

Labels: bug, performance, reliability, shell
