Stale stream + partial-stream-stub creates unrecoverable retry loop on large tool calls #31998

## Summary

When a streaming API call goes stale mid-tool-call (e.g. a large `write_file`), the partial-stream-stub recovery path sets `finish_reason="stop"` with `tool_calls=None`. This causes the conversation loop to treat the turn as a completed text response, returning only the ~116-char warning message. When the user says "continue", the model retries the same large tool call, hits the same stale stream, and the cycle repeats indefinitely.

## Reproduction

1. Ask the agent (via gateway) to produce a large `write_file` output (e.g. 15-30K+ tokens of HTML)
2. The inference stream goes stale (no chunks for 180s)
3. Stale-stream detector kills the connection
4. Agent returns: `"⚠ Stream stalled mid tool-call (write_file); the action was not executed. Ask me to retry if you want to continue."`
5. User says "continue" → same result, loops forever
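Steps 2 and 3 describe a per-chunk inactivity watchdog rather than an overall request timeout: a slow but steady stream is never killed, only a silent one. The sketch below is a minimal illustration of that idea under the assumption that the stream is an async iterator of text chunks. `read_with_stale_detection`, `StaleStreamError` and `fake_stream` are hypothetical names, not the agent's actual code.

```python
import asyncio
from typing import AsyncIterator, List


class StaleStreamError(Exception):
    """Hypothetical error: no chunk arrived within the stale-stream window."""

    def __init__(self, message: str, partial: List[str]) -> None:
        super().__init__(message)
        self.partial = partial  # chunks received before the stall


async def read_with_stale_detection(
    chunks: AsyncIterator[str], stale_timeout: float
) -> List[str]:
    """Collect chunks, aborting if the gap between two chunks exceeds stale_timeout."""
    received: List[str] = []
    iterator = chunks.__aiter__()
    while True:
        try:
            # The timeout applies to each chunk, not to the whole stream.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=stale_timeout)
        except StopAsyncIteration:
            return received
        except asyncio.TimeoutError:
            # Everything received so far is only a partial response; in the agent,
            # this is the point where a partial-stream-stub gets built.
            raise StaleStreamError(
                f"no chunk for {stale_timeout}s after {len(received)} chunks", received
            )
        received.append(chunk)


async def fake_stream(delays: List[float]) -> AsyncIterator[str]:
    """Simulated stream for demonstration: waits delays[i] seconds before chunk i."""
    for i, delay in enumerate(delays):
        await asyncio.sleep(delay)
        yield f"chunk{i}"
```

With `fake_stream([0, 0, 1.0])` and a 0.1 s window, the watchdog fires before the third chunk and the caller is left holding only the first two, which mirrors a large `write_file` argument being cut off mid-generation.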

## Root Cause

In `agent/chat_completion_helpers.py` (around lines 2186-2207):

- When a partial stream has pending tool call names, `_stub_finish_reason` is set to `"stop"` (not `"length"`)
- The stub has `tool_calls=None`
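A simplified model of the stub that results when tool-call names were pending, assuming it can be represented as a plain message with content, tool calls and a finish reason. `StubMessage` and `build_tool_stall_stub` are hypothetical names; only the two field values (`finish_reason="stop"`, `tool_calls=None`) and the warning text come from this report.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubMessage:
    content: str
    tool_calls: Optional[list]
    finish_reason: str


def build_tool_stall_stub(pending_tool_names: List[str]) -> StubMessage:
    """Hypothetical model of the stub built when a stream stalls mid tool-call."""
    names = ", ".join(pending_tool_names)
    warning = (
        f"⚠ Stream stalled mid tool-call ({names}); the action was not executed. "
        "Ask me to retry if you want to continue."
    )
    # The partially streamed tool call is dropped (tool_calls=None) and the turn
    # is marked "stop" so control returns to the user instead of auto-retrying.
    return StubMessage(content=warning, tool_calls=None, finish_reason="stop")
```

The only thing the conversation loop can see afterwards is a short text message with a normal-looking finish reason; nothing records which tool was dropped or how large its arguments were.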

In `agent/conversation_loop.py`:

- `finish_reason="stop"` skips the `finish_reason == "length"` continuation/retry branch entirely
- `assistant_message.tool_calls` is `None`, so it falls through to the text-response path
- The 116-char warning becomes `final_response` and the turn ends
- On retry, the model attempts the identical large tool call → same timeout → same result
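The fall-through can be reduced to a three-way branch. The sketch below is a hypothetical simplification of the loop's dispatch (`AssistantMessage` and `next_action` are illustrative names, and the real branch order in `agent/conversation_loop.py` may differ); it shows why a stub with `finish_reason="stop"` and no tool calls can only land on the final-response path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssistantMessage:
    content: str
    tool_calls: Optional[list]


def next_action(message: AssistantMessage, finish_reason: str) -> str:
    """Return which branch a simplified conversation loop takes for one turn."""
    if finish_reason == "length":
        # Truncated output: the continuation/retry machinery takes over.
        return "continue"
    if message.tool_calls:
        # Complete tool calls: execute them and loop back to the model.
        return "execute_tools"
    # Neither truncated nor carrying tool calls: treated as the finished answer.
    return "final_response"
```

A stub such as `AssistantMessage("⚠ Stream stalled ...", None)` with `"stop"` maps to `"final_response"`, so the warning ends the turn; the same message with `"length"` would instead enter the continuation path.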

## Design Issue

The `finish_reason="stop"` choice is intentional (the code comment says "the agent should hand control back rather than auto-retry a tool call that may have side-effects"). But this assumes the user will intervene in a way that changes the outcome, for example by asking for a smaller output. In practice, users reply "continue" and the model retries the identical tool call.

## Suggested Fixes

1. **Detect repeated stale-stream failures on the same tool call pattern.** After 2+ consecutive partial-stream-stubs dropping the same tool name, inject a system message telling the model to break the output into smaller chunks (e.g. "Your previous write_file was too large and the stream timed out. Break the content into smaller pieces or use multiple patch calls.")
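Fix 1 could be a small counter that persists across user turns for the session, since each "continue" starts a new turn. A minimal sketch, assuming the loop can report which tool name a partial-stream-stub dropped and when a turn completes normally; `StaleToolCallTracker`, `record_stub`, `record_success` and the threshold of 2 are hypothetical, and the hint text is the one proposed above.

```python
from typing import Optional

SPLIT_HINT = (
    "Your previous {tool} was too large and the stream timed out. "
    "Break the content into smaller pieces or use multiple patch calls."
)


class StaleToolCallTracker:
    """Hypothetical tracker for consecutive stubs that dropped the same tool."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.last_tool: Optional[str] = None
        self.count = 0

    def record_stub(self, dropped_tool: str) -> Optional[str]:
        """Record a partial-stream-stub; return a system hint once the threshold is hit."""
        if dropped_tool == self.last_tool:
            self.count += 1
        else:
            # A different tool starts a new streak.
            self.last_tool = dropped_tool
            self.count = 1
        if self.count >= self.threshold:
            return SPLIT_HINT.format(tool=dropped_tool)
        return None

    def record_success(self) -> None:
        """A turn that completes without a stub breaks the streak."""
        self.last_tool = None
        self.count = 0
```

The returned string would be injected as a system message before the next model call; keying on the tool name alone is deliberately coarse, and comparing argument sizes would be a possible refinement.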

2. **Consider using `finish_reason="length"` for partial-stream-stubs with dropped tool calls** so the existing continuation machinery can handle them, with appropriate guards against re-executing side-effectful tools.
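One possible shape for fix 2's guard, assuming the stub builder knows which tools were dropped and how many continuations have already been attempted for this stall. The allowlist contents, the `run_shell` tool name, `stub_finish_reason` and `max_attempts` are all hypothetical. Because a dropped call was never executed, the guard mainly needs to stop tools with side effects from being auto-re-requested and to cap attempts so "length" cannot loop forever.

```python
from typing import Iterable

# Hypothetical allowlist: tools whose dropped call can be re-requested
# without risking a duplicated side effect.
RETRY_SAFE_TOOLS = frozenset({"write_file", "read_file"})


def stub_finish_reason(
    dropped_tools: Iterable[str], continuation_attempts: int, max_attempts: int = 2
) -> str:
    """Choose a finish_reason for a stub whose tool calls were dropped."""
    dropped = list(dropped_tools)
    if not dropped:
        raise ValueError("only meaningful when at least one tool call was dropped")
    all_safe = all(name in RETRY_SAFE_TOOLS for name in dropped)
    if all_safe and continuation_attempts < max_attempts:
        # Let the existing "length" continuation machinery handle it.
        return "length"
    # Unsafe tool or budget exhausted: hand control back, as the current code does.
    return "stop"
```

On its own this only bounds the loop; pairing it with fix 1's split hint is what would change the model's next attempt.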

3. **Upstream:** Investigate why long-running streams (~60-130s generation time for large tool call arguments) go stale through the inference proxy. The 180s stale-stream timeout may be shorter than the proxy's own idle timeout, or TCP keepalives may not be making it through the proxy.
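For the keepalive angle, the standard-library calls below enable TCP keepalive on a client socket; how the agent's HTTP client exposes its sockets is not covered here, and `enable_tcp_keepalive` and its default values are illustrative assumptions. The per-socket tuning constants differ by platform, so each one is set only if the running Python exposes it.

```python
import socket


def enable_tcp_keepalive(
    sock: socket.socket, idle: int = 30, interval: int = 10, probes: int = 5
) -> None:
    """Turn on TCP keepalive and tune it where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (Linux: TCP_KEEPIDLE,
    # macOS: TCP_KEEPALIVE).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)
    # Seconds between probes and number of unanswered probes before giving up.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

Keepalive probes carry no application data, so they can keep NAT or connection-tracking state alive but typically do not reset an HTTP-level idle timer in a proxy, and they do not feed the client's own stale-stream detector. If the proxy sends no stream bytes while the model generates tool-call arguments, the timeout ordering is the more likely culprit.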

## Affected Code

- `agent/chat_completion_helpers.py` — `interruptible_streaming_api_call()`, partial-stream-stub construction
- `agent/conversation_loop.py` — finish_reason branching, no detection of repeated stale-stream patterns

## Related

- #25689 (stale stream timeout does not trigger fallback chain)
- #31128 (stale-stream handler tries to rebuild OpenAI client when provider is Anthropic)
- #28161 (Anthropic streaming: stale/retry paths cause hangs)
