What happened?
PawWork's 30-second "first provider progress" watchdog (CONNECT_STREAM_TIMEOUT_MS in packages/opencode/src/session/llm.ts:31) aborts in-flight reasoning-model streams when the model spends more than 30 seconds on internal reasoning before emitting any event that isProviderProgressEvent (L546-561) whitelists. The whitelist only counts text-* / reasoning-* / tool-input-* / tool-call / tool-result / tool-error as progress, so connection establishment, the synthetic start envelope, and start-step do not reset the timer. With OpenAI gpt-5.5 (reasoning-capable, default reasoningEffort: "medium" from provider/transform.ts:1185), 14 active tools, and a long session, first-chunk latency reproducibly exceeds 30s and the watchdog kills an otherwise-healthy stream.
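To make the whitelist behaviour concrete, here is a minimal sketch of the check as described above. It is not the code in llm.ts: the StreamEvent shape, the isProgress and watchdogFires names, and the simplified "first progress event must arrive within the timeout" model are all assumptions for illustration. Only the event-type families come from this report.

```typescript
// Hypothetical event shape; the real stream part type in llm.ts may differ.
type StreamEvent = { type: string };

// Event families this report says count as provider progress.
const PROGRESS_PREFIXES = ["text-", "reasoning-", "tool-input-"];
const PROGRESS_TYPES = new Set(["tool-call", "tool-result", "tool-error"]);

// True only for whitelisted events; "start" and "start-step" fall through.
function isProgress(event: StreamEvent): boolean {
  return (
    PROGRESS_TYPES.has(event.type) ||
    PROGRESS_PREFIXES.some((prefix) => event.type.startsWith(prefix))
  );
}

// Simplified model (assumption): the watchdog fires when no whitelisted
// event arrives within timeoutMs of the request being sent. atMs is the
// event's offset from that moment.
function watchdogFires(
  events: Array<{ atMs: number; type: string }>,
  timeoutMs: number,
): boolean {
  const firstProgress = events.find((e) => isProgress(e));
  return firstProgress === undefined || firstProgress.atMs > timeoutMs;
}
```

Under this model, a stream that has only delivered start by the 30s mark is aborted even though the connection is alive, which matches the failing trace in this report.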
The error reaches the user as UnknownError: LLM stream connection timed out after 30000ms without provider progress. SessionRetry.policy.retryable() (packages/opencode/src/session/retry.ts:55-105) does not classify this local error as retryable, so there is no automatic retry. This is distinct from #728 / PR #729: that PR fixed the timer starting before the HTTP request was actually sent, and the build in this report already includes that fix. The residual issue is the 30s ceiling itself, which #729's body explicitly deferred.
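One possible shape for the missing classification, as a sketch only: the SessionError type and isConnectTimeout helper are hypothetical and are not part of retry.ts. The message pattern is taken from the error text quoted above.

```typescript
// Hypothetical error shape; the real error object may carry more fields.
type SessionError = { name: string; message: string };

// Matches the local watchdog message quoted in this report, for any timeout value.
const CONNECT_TIMEOUT_PATTERN =
  /LLM stream connection timed out after \d+ms without provider progress/;

// True when the error is the local connect-timeout abort rather than a provider error.
function isConnectTimeout(error: SessionError): boolean {
  return CONNECT_TIMEOUT_PATTERN.test(error.message);
}
```

Matching on the message string is brittle. A dedicated error name or code raised by the watchdog would be a sturdier signal for the retry policy, if the maintainers prefer that route.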
Which area seems affected?
Model harness, prompts, tools, or session mechanics
How much does this affect you?
Breaks an important workflow
Steps to reproduce
1. Open a long-running build-agent session with OpenAI gpt-5.5 (or a comparable reasoning-capable model at reasoningEffort: "medium" or higher).
2. Let the model run several tool-call rounds so the session accumulates meaningful context (this report: 269 messages / 1063 parts).
3. Issue a follow-up turn whose first model action requires non-trivial reasoning before any text or tool-input chunk.
4. Occasionally observe the assistant message fail with the 30000ms timeout error before any provider chunk is received.
What did you expect to happen?
The assistant message completes, or, if the stream must be aborted, the retry policy attempts it again automatically and only surfaces a hard error after repeated failures, rather than failing on the first occurrence with no provider chunk ever received.
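The expected behaviour could be expressed as a small decision function. This is a sketch under assumptions: the Decision type, the decide name, the limit of 3 attempts, and the 1000ms exponential backoff are illustrative values, not PawWork's actual retry settings.

```typescript
// Outcome of one failed attempt: try again after a delay, or surface the error.
type Decision = { action: "retry"; delayMs: number } | { action: "fail" };

// attempt is the 1-based count of attempts already made.
// maxAttempts and baseDelayMs are hypothetical defaults.
function decide(
  attempt: number,
  retryable: boolean,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Decision {
  if (!retryable || attempt >= maxAttempts) return { action: "fail" };
  // Exponential backoff: 1x, 2x, 4x, ... of the base delay.
  return { action: "retry", delayMs: baseDelayMs * 2 ** (attempt - 1) };
}
```

With these values, the first and second connect timeouts would be retried silently and only a third consecutive failure would reach the user.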
Trace counters from the session export: dur_ms: 30204, stream_events.start: 1, all other counters (start_step, text_*, reasoning_*, tool_input_*, tool_call, tool_result, tool_error, error, finish_step, finish) 0, tokens.input/output/reasoning: 0, flags.stream_error: true, flags.empty_completion: false. Provider emitted no error event; PawWork's watchdog aborted the stream.
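For triage, the counter pattern above can be checked mechanically. The sketch below assumes a trace object whose fields mirror the names in the export (dur_ms, stream_events, tokens, flags); the exact JSON nesting and the looksLikeWatchdogAbort helper are assumptions.

```typescript
// Assumed trace shape, named after the counters quoted in this report.
type Trace = {
  dur_ms: number;
  stream_events: Record<string, number>;
  tokens: { input: number; output: number; reasoning: number };
  flags: { stream_error: boolean; empty_completion: boolean };
};

// True when the trace matches the signature described above: a stream error
// at or after the timeout, with nothing received except the synthetic start.
function looksLikeWatchdogAbort(trace: Trace, timeoutMs: number): boolean {
  const silent = Object.entries(trace.stream_events)
    .filter(([name]) => name !== "start")
    .every(([, count]) => count === 0);
  const { input, output, reasoning } = trace.tokens;
  const noTokens = input + output + reasoning === 0;
  return trace.flags.stream_error && silent && noTokens && trace.dur_ms >= timeoutMs;
}
```

Because the error counter is among the events required to be zero, a trace where the provider itself reported an error would not match.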
The preceding trace msg_e3e96c8030015laTiPT5gmzjpD finished cleanly with finish_reason: tool-calls 17 seconds earlier, so this is not a stale connection. A user retry 16 seconds after the failure (msg_e3e97d832001lbotXyQzAOF05y) succeeded in 16.6s with 26 text deltas, confirming the model and account were healthy.
Investigation chain confirming this is the #729 residual: grep for the error literal points to session/llm.ts:466. git log -- packages/opencode/src/session/llm.ts shows 610241905 fix: defer LLM stream connect timeout to after HTTP request is sent (#729) as the most recent change to that file. PR #729's body explicitly defers two follow-ups: (1) SessionRetry.policy not treating connect timeouts as retryable, and (2) connectTimeoutMs not being configurable end-to-end. The build identifier in this report (0.0.0-prod-202605181651, built 2026-05-18 16:51) postdates the PR #729 merge (2026-05-18 08:06 UTC), so the timer-start fix is present and the residual 30s ceiling is what fired here.
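For the second deferred follow-up, a sketch of how a configured value might be resolved against the current default. The resolveConnectTimeoutMs helper and the idea that the value arrives as an optional number are assumptions; only the 30000ms default and the connectTimeoutMs name come from this report.

```typescript
// Current ceiling named in this report (CONNECT_STREAM_TIMEOUT_MS).
const DEFAULT_CONNECT_TIMEOUT_MS = 30_000;

// Hypothetical resolver: use the configured connectTimeoutMs when it is a
// positive finite number, otherwise fall back to the default.
function resolveConnectTimeoutMs(configured?: number): number {
  if (configured === undefined) return DEFAULT_CONNECT_TIMEOUT_MS;
  if (!Number.isFinite(configured) || configured <= 0) return DEFAULT_CONNECT_TIMEOUT_MS;
  return configured;
}
```

A reasoning-heavy setup could then raise the ceiling, for example to 120000ms, without changing the default for other users.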
Full session export (pawwork-session-neon-orchid-2026-05-19-04-58-27-...json, ~5.1MB) available locally on request.
PawWork version
0.0.0-prod-202605181651
OS version
macOS 26 (Darwin 25.4.0)
Can you reproduce it again?
Sometimes
Diagnostics
Session: ses_1c1b6ccdbffes5qfwa7ovaOcLH. Failing assistant message: msg_e3e9723a10015WNXnu81BTQeXD.