Summary
When a streaming API call goes stale mid-tool-call (e.g. a large write_file), the partial-stream-stub recovery path sets finish_reason="stop" with tool_calls=None. This causes the conversation loop to treat the turn as a completed text response, returning only the ~116-char warning message. When the user says "continue", the model retries the same large tool call, hits the same stale stream, and the cycle repeats indefinitely.
Reproduction
- Ask the agent (via gateway) to produce a large write_file output (e.g. 15-30K+ tokens of HTML)
- The inference stream goes stale (no chunks for 180s)
- Stale-stream detector kills the connection
- Agent returns:
"⚠ Stream stalled mid tool-call (write_file); the action was not executed. Ask me to retry if you want to continue."
- User says "continue" → same result, loops forever
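For readers unfamiliar with the stale-stream step, here is a minimal sketch of how an idle timeout over a chunk stream can be detected. It is not the project's detector: `iter_with_stale_timeout` and `StreamStalled` are hypothetical names, and the real implementation in `interruptible_streaming_api_call()` may work quite differently.

```python
import queue
import threading
from typing import Iterable, Iterator


class StreamStalled(Exception):
    """Raised when no chunk arrives within the stale window (hypothetical)."""


_DONE = object()  # sentinel marking normal end of stream


def iter_with_stale_timeout(chunks: Iterable[str], stale_after: float) -> Iterator[str]:
    """Yield chunks, raising StreamStalled if none arrives for `stale_after` seconds."""
    q: queue.Queue = queue.Queue()

    def pump() -> None:
        # Read the upstream iterator on a worker thread so the consumer
        # can time out even while the upstream read is blocked.
        for chunk in chunks:
            q.put(chunk)
        q.put(_DONE)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = q.get(timeout=stale_after)
        except queue.Empty:
            raise StreamStalled(f"no chunks for {stale_after}s") from None
        if item is _DONE:
            return
        yield item
```

With `stale_after=180`, a stream that stops producing chunks for three minutes would raise `StreamStalled`, which is the point at which the partial-stream-stub described below is built.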
Root Cause
In agent/chat_completion_helpers.py (~line 2186-2207):
- When a partial stream has pending tool call names, _stub_finish_reason is set to "stop" (not "length")
- The stub has tool_calls=None
In agent/conversation_loop.py:
- finish_reason="stop" skips the finish_reason == "length" continuation/retry branch entirely
- assistant_message.tool_calls is None, so it falls through to the text-response path
- The 116-char warning becomes final_response and the turn ends
- On retry, the model attempts the identical large tool call → same timeout → same result
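The fall-through can be illustrated with a simplified model of the branching. This is only a sketch: `AssistantMessage` and `classify_turn` are hypothetical stand-ins, and the branch order is an assumption based on the description above, not a copy of agent/conversation_loop.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssistantMessage:
    """Hypothetical stand-in for the assistant message object."""
    content: str
    tool_calls: Optional[list] = None


def classify_turn(finish_reason: str, message: AssistantMessage) -> str:
    """Return which branch a simplified loop would take for this response."""
    if finish_reason == "length":
        return "continue"          # continuation/retry branch
    if message.tool_calls:
        return "execute_tools"     # run the requested tools, then loop again
    return "final_response"        # text-response path: the turn ends


WARNING = (
    "⚠ Stream stalled mid tool-call (write_file); the action was not "
    "executed. Ask me to retry if you want to continue."
)

# The partial-stream-stub as described: finish_reason="stop", tool_calls=None.
stub = AssistantMessage(content=WARNING, tool_calls=None)
print(classify_turn("stop", stub))  # prints "final_response"
```

Under these assumptions the stub can only reach the text-response path, so the warning is returned as the whole turn.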
Design Issue
The finish_reason="stop" choice is intentional: a code comment says "the agent should hand control back rather than auto-retry a tool call that may have side-effects". But this assumes the user will intervene in a way that changes the outcome. In practice, users say "continue" and the model retries the same tool call.
Suggested Fixes
- Detect repeated stale-stream failures on the same tool call pattern. After 2+ consecutive partial-stream-stubs dropping the same tool name, inject a system message telling the model to break the output into smaller chunks (e.g. "Your previous write_file was too large and the stream timed out. Break the content into smaller pieces or use multiple patch calls.")
- Consider using finish_reason="length" for partial-stream-stubs with dropped tool calls so the existing continuation machinery can handle them, with appropriate guards against re-executing side-effectful tools.
- Upstream: Investigate why long-running streams (~60-130s generation time for large tool call arguments) go stale through the inference proxy. The 180s stale-stream timeout may be shorter than the proxy's own idle timeout, or TCP keepalives may not be reaching through.
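The first suggestion (detecting repeated stubs for the same tool) could look roughly like the sketch below. `StaleStreamTracker`, `record_stub`, `record_success`, and `NUDGE_TEMPLATE` are hypothetical names introduced for illustration; where the tracker would be called from in the conversation loop is left open.

```python
from typing import Optional

# Wording follows the example message in the suggestion, parameterized by tool name.
NUDGE_TEMPLATE = (
    "Your previous {tool} was too large and the stream timed out. "
    "Break the content into smaller pieces or use multiple patch calls."
)


class StaleStreamTracker:
    """Counts consecutive partial-stream-stubs that dropped the same tool call."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self._last_tool: Optional[str] = None
        self._count = 0

    def record_stub(self, dropped_tool: str) -> Optional[str]:
        """Record a stub; return a system-message nudge once the threshold is hit."""
        if dropped_tool == self._last_tool:
            self._count += 1
        else:
            # A different tool starts a new streak.
            self._last_tool, self._count = dropped_tool, 1
        if self._count >= self.threshold:
            return NUDGE_TEMPLATE.format(tool=dropped_tool)
        return None

    def record_success(self) -> None:
        """Reset the streak after any turn that completes normally."""
        self._last_tool, self._count = None, 0
```

A caller would invoke `record_stub("write_file")` each time a partial-stream-stub is produced and, when it returns a string, inject that string as a system message before the next model call. Resetting on success keeps unrelated, widely spaced stalls from triggering the nudge.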
Affected Code
agent/chat_completion_helpers.py — interruptible_streaming_api_call(), partial-stream-stub construction
agent/conversation_loop.py — finish_reason branching, no detection of repeated stale-stream patterns
Related