fix(provider): replay an SSE stream cut before any token by esengine · Pull Request #3161 · esengine/DeepSeek-Reasonix

esengine · 2026-06-05T01:44:26Z

Problem

With a local proxy in front of the API (v2rayN / sing-box core), a streaming turn dies mid-flight:

deepseek-flash: read stream: read tcp 127.0.0.1:7442->127.0.0.1:10808:
wsarecv: An existing connection was forcibly closed by the remote host.

The dominant trigger: a reasoner has a long time-to-first-token (prefill + thinking) during which no bytes flow, so the proxy treats the long-lived SSE connection as idle and forcibly closes (RST) it — before any token reaches us. SendWithRetry only covers the connect + header phase; once the body streams, a drop was surfaced as a hard error, failing the whole turn.

Fix

The OpenAI provider (which DeepSeek uses) now drives the stream through streamWithReconnect:

If the connection resets before any model output has been forwarded, the request is replayed from scratch. That window is exactly the idle-prefill case the proxy kills, and a replay is idempotent — under prompt caching the resent prefix is a cache hit, so recovery is cheap.
Once a token (reasoning / text / tool-call) has streamed, a reset is surfaced as an error instead — replaying would duplicate already-visible output.
Bounded at maxStreamReconnects (3); each replay still gets SendWithRetry's header-phase backoff.

provider.IsConnReset gates the decision strictly on connection-level errors (peer reset, truncated EOF, closed socket via net.Error / ECONNRESET / io.ErrUnexpectedEOF), so decode errors and 4xx still fail fast — no silent masking.

readStream was refactored to return (emitted bool, err error) rather than emitting the terminal error itself, so the reconnect wrapper owns the replay-or-surface decision and channel lifecycle.

Tests

TestIsConnReset — only connection-level drops count; ctx-cancel and protocol errors don't.
TestStreamReconnectsOnEarlyConnReset — a mock that force-RSTs the first SSE connection (zero tokens) then serves a full stream; the caller sees one clean stream, server takes 2 requests.
TestStreamDoesNotReplayAfterOutput — a reset after a token surfaces a ChunkError, no replay (server takes 1 request).

End-to-end verification

Built the real reasonix CLI and ran reasonix run against a mock DeepSeek endpoint that force-RSTs the first stream connection mid-flight (before any token), then serves the answer on retry. Full stack (boot → controller → provider):

$ reasonix run "Reply with the recovery token"
E2E_OK_RECOVERED
  · 10 tok · in 7 (0 cached / 7 new) · out 3
exit 0

The answer is only served on the second (reconnected) attempt — the turn recovered transparently with no error, exit 0.

Scope: this lands on the OpenAI provider (the one DeepSeek uses, and the one in the report). The Anthropic provider has the same readStream shape and can take the same treatment in a follow-up.

Closes #3148

A local proxy (v2rayN/sing-box) idle-closes the long-lived streaming connection during a reasoner's first-token gap, surfacing as "read stream: ... wsarecv: An existing connection was forcibly closed". The turn failed even though nothing had been emitted yet. The OpenAI provider now replays the request when the connection resets before any model output has been forwarded — safe and idempotent, and a cache hit on the resent prefix. Once a token has streamed, the error is surfaced instead, since a replay would duplicate visible output. Capped at a few reconnects. IsConnReset gates strictly on connection-level errors so decode/4xx failures still fail fast. Closes #3148

esengine requested a review from SivanCola as a code owner June 5, 2026 01:44

github-actions bot added the "v2" label (Go rewrite (1.x), main-v2 branch, active development) Jun 5, 2026

esengine merged commit d5df0e5 into main-v2 Jun 5, 2026
9 checks passed

esengine deleted the fix/3148-stream-reconnect branch June 5, 2026 01:50

Bernardxu123 mentioned this pull request Jun 5, 2026

[Meta] Issue batch review thread: summary of issues pending closure & reply templates #3246

Closed

6 tasks
