fix(provider): time out a stalled SSE stream instead of hanging (#3374) #3745
Merged
fix(provider): time out a stalled SSE stream instead of hanging

A half-open connection (a proxy switched mid-stream, no RST) left scanner.Scan() blocked forever: the UI sat on "thinking", Ctrl+C/Esc were unresponsive, and the only way out was kill -9. Add an idle watchdog to both the openai-compatible and anthropic stream readers: if a started stream goes silent past streamIdleTimeout, the body is closed and a clear "stream stalled" error surfaces, returning the session to the input prompt. The watchdog owns the timer; the read loop pings a buffered channel, so there is no Timer.Reset race, and the openai replay path (#3148) is untouched.

Closes #3374.
The first cut used a package-level var so tests could shorten it, but a test writing it raced another test's stream watchdog reading it under go test -race. Store the window on the client (defaultStreamIdleTimeout via New); tests set the field on their own client before Stream, so the go statement orders the write before the watchdog read. A zero-value client falls back to the default.
SuMuxi66 pushed a commit to SuMuxi66/DeepSeek-Reasonix that referenced this pull request Jun 10, 2026
…gine#3374) (esengine#3745)

* fix(provider): time out a stalled SSE stream instead of hanging

  A half-open connection (a proxy switched mid-stream, no RST) left scanner.Scan() blocked forever: the UI sat on "thinking", Ctrl+C/Esc were unresponsive, and the only way out was kill -9. Add an idle watchdog to both the openai-compatible and anthropic stream readers: if a started stream goes silent past streamIdleTimeout, the body is closed and a clear "stream stalled" error surfaces, returning the session to the input prompt. The watchdog owns the timer; the read loop pings a buffered channel, so there is no Timer.Reset race, and the openai replay path (esengine#3148) is untouched. Closes esengine#3374.

* fix(provider): make the SSE idle timeout per-client, not a global var

  The first cut used a package-level var so tests could shorten it, but a test writing it raced another test's stream watchdog reading it under go test -race. Store the window on the client (defaultStreamIdleTimeout via New); tests set the field on their own client before Stream, so the go statement orders the write before the watchdog read. A zero-value client falls back to the default.

Co-authored-by: reasonix <reasonix@deepseek.com>
dorokuma pushed a commit to dorokuma/DeepSeek-Reasonix that referenced this pull request Jun 10, 2026
Problem (#3374)
When an API stream stalls on a half-open connection (a proxy switched mid-stream, so the TCP connection is dead but sends no RST), scanner.Scan() blocks forever. The UI sits on "thinking", and because the read goroutine never returns, the session can only be killed with kill -9. The reporter captured all threads parked in futex_wait with no active sockets.

Fix
Both stream readers (openai-compatible and anthropic) now run an idle watchdog: if a started stream goes silent past streamIdleTimeout (120s), the response body is closed, scanner.Scan() unblocks, and a clear "stream stalled — connection likely dropped" error surfaces, returning the session to the input prompt instead of hanging.

* The watchdog owns the timer and the read loop pings a buffered channel, so there is no Timer.Reset race.
* […] ResponseHeaderTimeout.
* The stall error is not a connection-reset (IsConnReset) error, so it doesn't trigger a replay loop on a persistently bad connection.

Note: the in-flight Ctrl+C interrupt half of the report (SIGINT should cancel the turn, not exit) is already handled on main-v2: the chat TUI maps Ctrl+C while running to ctrl.Cancel(). This PR closes the remaining "hangs forever with no key working" half.

Tests
TestStreamStallTimesOut in each provider: a mock server sends the SSE head, then stalls. With the watchdog the stream surfaces a stall error in ~150ms (test override); before, it hung until the test deadline. Existing reconnect/streaming tests still pass.

Closes #3374