Auto-retry doesn't cover "Anthropic stream ended before message_stop" #4433

@mustafa01ali

Summary

After the fix for #3936, premature Anthropic stream terminations correctly surface as errors with the message "Anthropic stream ended before message_stop" (thrown from the Anthropic SDK's MessageStream). However, this error string is not matched by AgentSession._isRetryableError, so pi does not auto-retry it: the error is shown to the user and the turn ends.

This appears to be the same pattern as #2892 ("request ended without sending any chunks") and #3594 ("http2 request did not get a response"): a new "premature stream end" error string was introduced, but the retry regex was not updated to cover it.

Steps to Reproduce

  1. Run a long pi session against an Anthropic model (e.g. claude-sonnet-4-5) over a flaky/proxied connection.
  2. Eventually a stream terminates before message_stop (display sleep, VPN proxy timeout, transient TCP drop, etc.).
  3. pi shows "Anthropic stream ended before message_stop" and stops. Auto-retry does not kick in, even with retry enabled.

Expected Behaviour

The error is treated as retryable, the same as other transient transport errors ("socket hang up", "terminated", "ended without", etc.).

Actual Behaviour

The error is surfaced to the user and the turn ends. The user has to manually re-prompt to continue.

Root Cause

_isRetryableError in packages/coding-agent/src/core/agent-session.ts (around dist/core/agent-session.js:1922 in v0.73.0) tests the error message against a regex that covers many transport errors but does not include the phrase produced by the SDK when the stream closes before message_stop:

return /overloaded|provider.?returned.?error|rate.?limit|too many requests|429|500|502|503|504|service.?unavailable|server.?error|internal.?error|network.?error|connection.?error|connection.?refused|connection.?lost|websocket.?closed|websocket.?error|other side closed|fetch failed|upstream.?connect|reset before headers|socket hang up|ended without|http2 request did not get a response|timed? out|timeout|terminated|retry delay/i.test(err);

The Anthropic SDK throws errors with messages like:

  • stream ended without producing a Message with role=assistant
  • stream ended without producing a content block with type=text
  • Anthropic stream ended before message_stop

The first two happen to match the "ended without" alternative. The third matches none of the alternatives.
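The mismatch can be checked in isolation. The sketch below copies the regex quoted above and runs it against the three messages; the names RETRYABLE_V0_73, sdkMessages, and classifyMessages are made up for illustration and are not part of pi.

```typescript
// Retry pattern copied verbatim from the snippet quoted in this issue (v0.73.0).
const RETRYABLE_V0_73 =
  /overloaded|provider.?returned.?error|rate.?limit|too many requests|429|500|502|503|504|service.?unavailable|server.?error|internal.?error|network.?error|connection.?error|connection.?refused|connection.?lost|websocket.?closed|websocket.?error|other side closed|fetch failed|upstream.?connect|reset before headers|socket hang up|ended without|http2 request did not get a response|timed? out|timeout|terminated|retry delay/i;

// The three premature-termination messages listed in this issue.
const sdkMessages: string[] = [
  "stream ended without producing a Message with role=assistant",
  "stream ended without producing a content block with type=text",
  "Anthropic stream ended before message_stop",
];

interface Classification {
  message: string;
  retryable: boolean;
}

// Hypothetical helper: reports which messages the pattern would treat as retryable.
function classifyMessages(pattern: RegExp, messages: string[]): Classification[] {
  return messages.map((message) => ({ message, retryable: pattern.test(message) }));
}

for (const entry of classifyMessages(RETRYABLE_V0_73, sdkMessages)) {
  console.log(`${entry.retryable ? "retry   " : "no retry"}  ${entry.message}`);
}
```

Only the last message is reported as "no retry", which is consistent with the behaviour described under Actual Behaviour.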

Suggested Fix

Add message_stop|stream ended (or similar) to the regex so all three SDK premature-termination messages are covered:

|ended without|message_stop|stream ended|http2 request did not get a response|
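A rough regression check for the suggested change, assuming the two alternatives are simply inserted into the existing pattern. RETRYABLE_PATCHED, isRetryableMessage, and the sample non-transient message are hypothetical and only serve the illustration; the real fix may look different.

```typescript
// Sketch only: the v0.73.0 pattern with "message_stop" and "stream ended" added
// after "ended without", as proposed in this issue.
const RETRYABLE_PATCHED =
  /overloaded|provider.?returned.?error|rate.?limit|too many requests|429|500|502|503|504|service.?unavailable|server.?error|internal.?error|network.?error|connection.?error|connection.?refused|connection.?lost|websocket.?closed|websocket.?error|other side closed|fetch failed|upstream.?connect|reset before headers|socket hang up|ended without|message_stop|stream ended|http2 request did not get a response|timed? out|timeout|terminated|retry delay/i;

// Hypothetical stand-in for the string test done in _isRetryableError.
function isRetryableMessage(message: string): boolean {
  return RETRYABLE_PATCHED.test(message);
}

// All three premature-termination messages from this issue should now match.
const prematureEndMessages: string[] = [
  "stream ended without producing a Message with role=assistant",
  "stream ended without producing a content block with type=text",
  "Anthropic stream ended before message_stop",
];

// Made-up example of an error that should stay non-retryable.
const nonTransientSample = "authentication failed: invalid API key";

const allCovered = prematureEndMessages.every(isRetryableMessage);
console.log("all premature-end messages retryable:", allCovered);
console.log("non-transient sample retryable:", isRetryableMessage(nonTransientSample));
```

Either addition alone is enough for the third message, since it contains both "stream ended" and "message_stop". Adding both is the broader choice and would also catch future messages that use only one of the two phrases.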

Environment

  • pi v0.73.0
  • macOS, Anthropic provider via OAuth

Labels: closed-because-refactor (Closed while the project refactor is in progress), inprogress (Issue is being worked on)
