Stale stream + partial-stream-stub creates unrecoverable retry loop on large tool calls #31998

@alt-glitch

Summary

When a streaming API call goes stale mid-tool-call (e.g. a large write_file), the partial-stream-stub recovery path sets finish_reason="stop" with tool_calls=None. This causes the conversation loop to treat the turn as a completed text response, returning only the ~116-char warning message. When the user says "continue", the model retries the same large tool call, hits the same stale stream, and the cycle repeats indefinitely.

Reproduction

  1. Ask the agent (via gateway) to produce a large write_file output (e.g. 15-30K+ tokens of HTML)
  2. The inference stream goes stale (no chunks for 180s)
  3. Stale-stream detector kills the connection
  4. Agent returns: "⚠ Stream stalled mid tool-call (write_file); the action was not executed. Ask me to retry if you want to continue."
  5. User says "continue" → same result, loops forever
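
For context, steps 2 and 3 can be pictured with a minimal idle-time detector. This is only a sketch: `StaleStreamDetector`, its methods, and the injectable `clock` are hypothetical names, not the project's actual implementation. Only the 180s threshold comes from the report above.

```python
import time
from typing import Callable


class StaleStreamDetector:
    """Hypothetical idle-time tracker; not the project's real detector."""

    def __init__(
        self,
        timeout_s: float = 180.0,  # threshold taken from the report above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_chunk_at = clock()

    def on_chunk(self) -> None:
        # Any received chunk resets the idle timer.
        self._last_chunk_at = self._clock()

    def is_stale(self) -> bool:
        # Stale once no chunk has arrived for at least timeout_s seconds.
        return self._clock() - self._last_chunk_at >= self.timeout_s
```

A caller would poll `is_stale()` while waiting on the stream and close the connection once it returns True, which is the point where the partial-stream-stub gets built.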

Root Cause

In agent/chat_completion_helpers.py (around lines 2186-2207):

  • When a partial stream has pending tool call names, _stub_finish_reason is set to "stop" (not "length")
  • The stub has tool_calls=None

In agent/conversation_loop.py:

  • finish_reason="stop" skips the finish_reason == "length" continuation/retry branch entirely
  • assistant_message.tool_calls is None, so it falls through to the text-response path
  • The 116-char warning becomes final_response and the turn ends
  • On retry, the model attempts the identical large tool call → same timeout → same result

Design Issue

The finish_reason="stop" choice is intentional (comment says "the agent should hand control back rather than auto-retry a tool call that may have side-effects"). But this assumes the user's next message will change the approach. In practice, users say "continue" and the model retries the same oversized tool call.

Suggested Fixes

  1. Detect repeated stale-stream failures on the same tool call pattern. After 2+ consecutive partial-stream-stubs dropping the same tool name, inject a system message telling the model to break the output into smaller chunks (e.g. "Your previous write_file was too large and the stream timed out. Break the content into smaller pieces or use multiple patch calls.")

  2. Consider using finish_reason="length" for partial-stream-stubs with dropped tool calls so the existing continuation machinery can handle them, with appropriate guards against re-executing side-effectful tools.

  3. Upstream: Investigate why long-running streams (~60-130s generation time for large tool call arguments) go stale through the inference proxy. The 180s stale-stream timeout may be shorter than the proxy's own idle timeout, or TCP keepalives may not be reaching through.
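
A rough sketch of fix 1, assuming a small per-conversation tracker. `StaleStubTracker` and its methods are hypothetical names, and where exactly it would hook into the conversation loop is left open. The hint text is the one proposed in fix 1.

```python
from typing import Optional


class StaleStubTracker:
    """Hypothetical helper: counts consecutive partial-stream-stubs
    that dropped the same tool name."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self._last_tool: Optional[str] = None
        self._count = 0

    def record_stub(self, tool_name: str) -> Optional[str]:
        """Record a dropped tool call; return a hint once the threshold is hit."""
        if tool_name == self._last_tool:
            self._count += 1
        else:
            self._last_tool = tool_name
            self._count = 1
        if self._count >= self.threshold:
            return (
                f"Your previous {tool_name} was too large and the stream timed out. "
                "Break the content into smaller pieces or use multiple patch calls."
            )
        return None

    def record_success(self) -> None:
        """Reset after any turn that completes without a stale stream."""
        self._last_tool = None
        self._count = 0
```

The conversation loop could call `record_stub()` whenever it builds a partial-stream-stub and, if a hint comes back, inject it as a system message before the next model call. Resetting on success keeps an unrelated stall much later in the session from triggering the hint.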

Affected Code

  • agent/chat_completion_helpers.py: interruptible_streaming_api_call(), partial-stream-stub construction
  • agent/conversation_loop.py: finish_reason branching, no detection of repeated stale-stream patterns

Labels: P1 (High: major feature broken, no workaround), comp/agent (core agent loop, run_agent.py, prompt builder), type/bug (something isn't working)
