
fix(provider): time out a stalled SSE stream instead of hanging (#3374) #3745

Merged
esengine merged 2 commits into main-v2 from fix/3374-stream-idle-timeout on Jun 10, 2026

esengine (Owner)

Problem (#3374)

When an API stream stalls on a half-open connection (for example, a proxy switched mid-stream, so the TCP connection is dead but no RST ever arrives), scanner.Scan() blocks forever. The UI sits on "thinking", and because the read goroutine never returns, the session can only be killed with kill -9. The reporter's capture showed every thread parked in futex_wait and no active sockets.

Fix

Both stream readers (openai-compatible and anthropic) now run an idle watchdog. If a stream that has already started goes silent for longer than streamIdleTimeout (120s), the watchdog closes the response body, which unblocks scanner.Scan(), and a clear "stream stalled — connection likely dropped" error surfaces, returning the session to the input prompt instead of hanging.

  • The watchdog owns the timer; the read loop only does a non-blocking send on a buffered channel, so there's no Timer.Reset race.
  • 120s is deliberately generous: live streams emit tokens or keepalives far more often, so a slow but healthy reasoning model is never cut off. Time-to-first-token is still covered separately by ResponseHeaderTimeout.
  • The openai idempotent-replay path (#3148, "[Feature]: Reasonix uses long-lived connections (SSE streaming requests), and sing-box may not handle such connections well.") is untouched: a stall surfaces as an ordinary (non-IsConnReset) error, so it does not trigger a replay loop on a persistently bad connection.

Note: the other half of the report, in-flight Ctrl+C (SIGINT should cancel the turn, not exit), is already handled on main-v2: the chat TUI maps Ctrl+C during a running turn to ctrl.Cancel(). This PR closes the remaining half, where the session "hangs forever with no key working".

Tests

TestStreamStallTimesOut in each provider: a mock server sends the SSE head and then stalls. With the watchdog, the stream surfaces a stall error in ~150ms (using a test override of the timeout); before this change it hung until the test deadline. Existing reconnect and streaming tests still pass.

Closes #3374

fix(provider): time out a stalled SSE stream instead of hanging

A half-open connection (a proxy switched mid-stream, no RST) left scanner.Scan()
blocked forever: the UI sat on "thinking", Ctrl+C/Esc were unresponsive, and the
only way out was kill -9. Add an idle watchdog to both the openai-compatible and
anthropic stream readers — if a started stream goes silent past streamIdleTimeout
the body is closed and a clear "stream stalled" error surfaces, returning the
session to the input prompt. The watchdog owns the timer; the read loop pings a
buffered channel, so there is no Timer.Reset race, and the openai replay path
(#3148) is untouched. Closes #3374.
esengine requested a review from SivanCola as a code owner on June 10, 2026 00:40
github-actions (bot) added the v2 and provider labels on Jun 10, 2026
fix(provider): make the SSE idle timeout per-client, not a global var
The first cut used a package-level var so tests could shorten it, but a test writing it raced with another test's stream watchdog reading it under go test -race. Store the window on the client instead (defaultStreamIdleTimeout via New); tests set the field on their own client before calling Stream, so the go statement orders the write before the watchdog's read. A zero-value client falls back to the default.
esengine merged commit 9976e6b into main-v2 on Jun 10, 2026
20 of 21 checks passed
esengine deleted the fix/3374-stream-idle-timeout branch on June 10, 2026 00:59
SuMuxi66 pushed a commit to SuMuxi66/DeepSeek-Reasonix that referenced this pull request Jun 10, 2026
…gine#3374) (esengine#3745)

* fix(provider): time out a stalled SSE stream instead of hanging

A half-open connection (a proxy switched mid-stream, no RST) left scanner.Scan()
blocked forever: the UI sat on "thinking", Ctrl+C/Esc were unresponsive, and the
only way out was kill -9. Add an idle watchdog to both the openai-compatible and
anthropic stream readers — if a started stream goes silent past streamIdleTimeout
the body is closed and a clear "stream stalled" error surfaces, returning the
session to the input prompt. The watchdog owns the timer; the read loop pings a
buffered channel, so there is no Timer.Reset race, and the openai replay path
(esengine#3148) is untouched. Closes esengine#3374.

* fix(provider): make the SSE idle timeout per-client, not a global var

The first cut used a package-level var so tests could shorten it, but a test writing it raced with another test's stream watchdog reading it under go test -race. Store the window on the client instead (defaultStreamIdleTimeout via New); tests set the field on their own client before calling Stream, so the go statement orders the write before the watchdog's read. A zero-value client falls back to the default.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
dorokuma pushed a commit to dorokuma/DeepSeek-Reasonix that referenced this pull request Jun 10, 2026

Labels

provider Model providers & selection (internal/provider) v2 Go rewrite (1.x) — main-v2 branch, active development

Development

Successfully merging this pull request may close these issues.

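[Bug]: In the CLI, Ctrl+C does not respond while an API request is hanging, and when things are normal SIGINT exits outright instead of interrupting the request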
