What task are you trying to do?
I want PawWork to recover faster when GPT/reasoning model connections stall before the provider produces any content.
Recent production session exports showed repeated failures where OpenAI gpt-5.5 never reached first provider progress. Each time, PawWork waited out the full reasoning-model connect watchdog ceiling (120000ms) before deciding the connection was interrupted. Once #914 lands, these early no-output/no-tool failures can be retried safely, but the first attempt can still leave the user waiting for roughly two minutes before recovery begins.
Which area would this change affect?
Model harness, prompts, tools, or session mechanics
What do you do today?
Today, when a reasoning model connection stalls before first provider progress, PawWork waits up to 120 seconds before it can retry or show recovery. This is especially painful for high-frequency GPT usage because one provider/network wobble can freeze the visible workflow for two minutes even though the attempt produced no output and ran no tools.
The 120-second ceiling was introduced by #758 for a real reason: some reasoning-model runs can take more than 30 seconds before first observable provider progress. That means simply reverting to the default 30-second connect watchdog would risk false timeouts on legitimate slow starts.
What would a good result look like?
PawWork should fail faster on the first stalled connection while still giving legitimate slow reasoning-model starts enough room to complete.
A likely direction to evaluate after #914:
Let the automatic retry keep the longer 120s ceiling, so a legitimate slow first-progress run is not killed repeatedly.
Keep conservative behavior for any attempt that reaches final text, tool input, tool call materialization, tool execution, provider-executed capability, external boundary, user cancel, lifecycle close, quota, context overflow, or another non-retryable error.
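To make the direction above concrete for discussion, here is a minimal TypeScript sketch. Everything in it is hypothetical: the type names, the function names, and especially the 45000ms first-attempt value are placeholders and not PawWork's actual API or an agreed number. Only the 120000ms retry ceiling comes from the existing behavior described in this issue.

```typescript
// Hypothetical sketch: none of these names exist in PawWork as far as this issue states.

// What an attempt reached before it failed (all false = nothing observable happened).
type AttemptProgress = {
  finalText: boolean;
  toolInput: boolean;
  toolCallMaterialized: boolean;
  toolExecuted: boolean;
  providerExecutedCapability: boolean;
  externalBoundary: boolean;
};

// Illustrative failure categories, mirroring the list in this issue.
type FailureKind =
  | "connect_stall"
  | "transport_disconnect"
  | "user_cancel"
  | "lifecycle_close"
  | "quota"
  | "context_overflow"
  | "other_non_retryable";

// Placeholder value only; the real first-attempt ceiling is still to be agreed.
const FIRST_ATTEMPT_CONNECT_CEILING_MS = 45000;
// Existing reasoning-model ceiling (introduced by #758), kept for the automatic retry.
const RETRY_CONNECT_CEILING_MS = 120000;

// attempt is 0 for the first try, 1 or more for automatic retries.
function connectCeilingMs(attempt: number): number {
  return attempt === 0
    ? FIRST_ATTEMPT_CONNECT_CEILING_MS
    : RETRY_CONNECT_CEILING_MS;
}

// Conservative rule: retry only early failures that produced nothing at all.
function isSafeToRetry(kind: FailureKind, p: AttemptProgress): boolean {
  const retryableKind =
    kind === "connect_stall" || kind === "transport_disconnect";
  const reachedAnything =
    p.finalText ||
    p.toolInput ||
    p.toolCallMaterialized ||
    p.toolExecuted ||
    p.providerExecutedCapability ||
    p.externalBoundary;
  return retryableKind && !reachedAnything;
}
```

With placeholder values like these, a stalled first attempt followed by one retry would take at most 45s + 120s before the run is declared failed, while a legitimately slow start still gets the full 120s on its second try. The trade-off is that a healthy but slow first attempt between 45s and 120s would be cut off and restarted.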
The exact timeout values and user-visible behavior should be discussed before implementation because this affects the trade-off between faster recovery and false-positive timeouts.
What would count as done?
The issue has an agreed timeout strategy documented in a comment before implementation.
Two May 26 production exports (listed under Extra context) showed five failures clustered within about nine minutes, all before first provider progress, with no output and no tool activity.
What should stay out of scope?
Which audience does this matter to most?
Both
Extra context
Related work:
gpt-5.5 could legitimately exceed the previous 30s ceiling.
Production session exports (May 26):
pawwork-session-proud-comet-2026-05-26-02-33-44.json
pawwork-session-sunny-meadow-2026-05-26-02-31-46.json
Observed failure shapes:
watchdog_timeout/connect, after 120000ms without provider progress.
provider_transport_disconnect before first provider progress, with UND_ERR_SOCKET / other side closed.
This should be treated as a follow-up design/implementation slice after #914 rather than being folded into #914.
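As a starting point for the follow-up slice, the two observed failure shapes could be recognized with something like the TypeScript sketch below. The record shape is an assumption: the field names `code`, `phase`, `causeCode`, and `sawProviderProgress` are invented for illustration and may not match what the session exports actually contain. Only the string values `watchdog_timeout`, `connect`, `provider_transport_disconnect`, and `UND_ERR_SOCKET` are taken from this issue.

```typescript
// Hypothetical record shape; the real export schema may differ.
type FailureRecord = {
  code: string; // e.g. "watchdog_timeout" or "provider_transport_disconnect"
  phase?: string; // e.g. "connect"
  causeCode?: string; // e.g. "UND_ERR_SOCKET"
  sawProviderProgress: boolean; // true once any provider progress was observed
};

type EarlyFailureShape =
  | "connect_watchdog_timeout"
  | "early_transport_disconnect"
  | "other";

// Classify a failure into one of the two early shapes seen in the exports.
function classifyEarlyFailure(r: FailureRecord): EarlyFailureShape {
  // Anything after first provider progress is out of scope for this issue.
  if (r.sawProviderProgress) return "other";
  if (r.code === "watchdog_timeout" && r.phase === "connect") {
    return "connect_watchdog_timeout";
  }
  if (r.code === "provider_transport_disconnect") {
    return "early_transport_disconnect";
  }
  return "other";
}
```

Keeping the classification separate from the timeout policy would let the timeout values change after discussion without touching how failures are identified.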