fix(provider): replay an SSE stream cut before any token #3161
Merged
A local proxy (v2rayN/sing-box) idle-closes the long-lived streaming connection during a reasoner's first-token gap, surfacing as "read stream: ... wsarecv: An existing connection was forcibly closed". The turn failed even though nothing had been emitted yet. The OpenAI provider now replays the request when the connection resets before any model output has been forwarded — safe and idempotent, and a cache hit on the resent prefix. Once a token has streamed, the error is surfaced instead, since a replay would duplicate visible output. Capped at a few reconnects. IsConnReset gates strictly on connection-level errors so decode/4xx failures still fail fast. Closes #3148
Problem
With a local proxy in front of the API (v2rayN / sing-box core), a streaming turn dies mid-flight:
The dominant trigger: a reasoner has a long time-to-first-token (prefill + thinking) during which no bytes flow, so the proxy treats the long-lived SSE connection as idle and forcibly closes (RST) it — before any token reaches us.
SendWithRetry only covers the connect + header phase; once the body streams, a drop was surfaced as a hard error, failing the whole turn.
Fix
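The OpenAI provider (which DeepSeek uses) now drives the stream through streamWithReconnect:
If the connection resets before any model output has been forwarded, the request is replayed; this is safe and idempotent, and the resent prefix is a cache hit.
Once a token has streamed, the error is surfaced instead, since a replay would duplicate visible output.
Replays are capped at maxStreamReconnects (3); each replay still gets SendWithRetry's header-phase backoff.
provider.IsConnReset gates the decision strictly on connection-level errors (peer reset, truncated EOF, closed socket, detected via net.Error / ECONNRESET / io.ErrUnexpectedEOF), so decode errors and 4xx responses still fail fast, with no silent masking.
readStream was refactored to return (emitted bool, err error) rather than emitting the terminal error itself, so the reconnect wrapper owns the replay-or-surface decision and the channel lifecycle.
Tests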
TestIsConnReset: only connection-level drops count; ctx-cancel and protocol errors don't.
TestStreamReconnectsOnEarlyConnReset: a mock force-RSTs the first SSE connection (zero tokens), then serves a full stream; the caller sees one clean stream and the server takes 2 requests.
TestStreamDoesNotReplayAfterOutput: a reset after a token surfaces a ChunkError with no replay (the server takes 1 request).
End-to-end verification
Built the real reasonix CLI and ran reasonix run against a mock DeepSeek endpoint that force-RSTs the first stream connection mid-flight (before any token), then serves the answer on retry. Full stack (boot → controller → provider):
The answer is only served on the second (reconnected) attempt: the turn recovered transparently with no error, exit 0.
Scope: this lands on the OpenAI provider (the one DeepSeek uses, and the one in the report). The Anthropic provider has the same readStream shape and can take the same treatment in a follow-up.
Closes #3148