Goal
When a model stream is interrupted after it has started, PawWork should not leave the session feeling dead or mysterious. It should explain what happened in plain language and offer the safest next step.
This follows #794 and complements #803:
Target user-facing behavior:
The connection broke while PawWork was preparing the next step. No tool was executed. You can safely continue.
Scope
In scope:
- Add a recovery policy for interrupted LLM runs based on already-recorded run facts:
  - whether provider progress was seen,
  - whether visible assistant output was shown,
  - whether a tool call fully materialized,
  - whether tool execution started,
  - whether unsafe side effects may have started,
  - whether side-effect facts are complete.
- Avoid treating every interruption as a terminal session failure.
- Show a calm, user-readable interruption state in the session UI.
- Provide the right next action:
  - auto-retry only when nothing visible and no tool side effects happened,
  - offer a one-click Continue/Resume when visible text exists but no tool executed,
  - require explicit user confirmation when tool execution or unsafe side effects may have started,
  - explain when recovery is unsafe or unknown.
- Prevent duplicate user-visible text or duplicate tool execution after retry/resume.
Out of scope:
Relevant files or context
Observed failure case:
- GPT-5.5 stream started normally.
- The assistant showed user-visible text.
- The model began producing an `enter-worktree` tool call.
- Tool input did not finish.
- The tool call did not materialize.
- Tool execution did not start.
- The stream failed with `TypeError: terminated`, caused by `SocketError: other side closed`, `UND_ERR_SOCKET`.
- Current run observability says `do_not_auto_retry` because visible output was seen. That is cautious, but the UI should offer a safe resume path instead of leaving the session stuck.
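The error in the observed case is nested: a `TypeError: terminated` whose cause carries the `UND_ERR_SOCKET` code. A minimal sketch of how such an interruption could be recognized by walking the `cause` chain is shown here. The helper names `collectErrorCodes` and `isStreamSocketClose` are hypothetical and do not refer to existing PawWork code; the error object is rebuilt by hand only to mirror the shape reported above.

```typescript
// Hypothetical helpers for illustration; not PawWork's actual API.

/** Walk an error's `cause` chain and collect any string `code` fields. */
function collectErrorCodes(err: unknown, maxDepth = 5): string[] {
  const codes: string[] = [];
  let current: unknown = err;
  for (let depth = 0; depth < maxDepth && current != null; depth++) {
    if (typeof current !== "object") break;
    const code = (current as { code?: unknown }).code;
    if (typeof code === "string") codes.push(code);
    current = (current as { cause?: unknown }).cause;
  }
  return codes;
}

/** True when the failure looks like the remote side closing the socket mid-stream. */
function isStreamSocketClose(err: unknown): boolean {
  return collectErrorCodes(err).includes("UND_ERR_SOCKET");
}

// Hand-built stand-in for the observed error shape (assumption: the code
// lives on the cause, as the reported message suggests).
const socketError = Object.assign(new Error("other side closed"), {
  name: "SocketError",
  code: "UND_ERR_SOCKET",
});
const observedError = Object.assign(new TypeError("terminated"), {
  cause: socketError,
});
```

Classifying by code rather than by the top-level message keeps the raw `terminated` string out of the UI path, so the session can show plain-language copy instead.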
Related issues/PRs:
Proposed recovery matrix
- No visible output, no tool call, no tool execution:
  - Safe to auto-retry once with backoff.
- Visible output, partial tool input, no tool execution:
  - Do not silently replay the same visible text.
  - Offer Continue/Resume and preserve the existing transcript.
- Tool call completed, tool execution did not start:
  - Usually safe to re-run the assistant turn, but the UI should say no tool ran.
- Read-only tool execution started/completed:
  - May auto-resume if side-effect facts are complete and the tool is known read-only.
- Unsafe or unknown side-effect tool started:
  - Do not auto-retry.
  - Ask the user before continuing.
- Side-effect facts incomplete:
  - Prefer confirmation over automation.
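The matrix above could be expressed as a small pure function, which also makes it straightforward to test row by row. This is only a sketch: the `RunFacts` field names, the `RecoveryAction` values, and `decideRecovery` itself are hypothetical and are not taken from PawWork's recorded run facts. It also assumes a conservative reading of the "tool call completed, tool execution did not start" row, mapping it to a user-initiated Continue rather than an automatic retry.

```typescript
// Hypothetical shapes for illustration; not PawWork's actual schema.
interface RunFacts {
  visibleOutputShown: boolean;
  toolCallMaterialized: boolean;
  toolExecutionStarted: boolean;
  /** Only meaningful when toolExecutionStarted is true. */
  startedToolsKnownReadOnly: boolean;
  sideEffectFactsComplete: boolean;
}

type RecoveryAction =
  | "auto_retry"            // retry once with backoff, no user action
  | "offer_continue"        // one-click Continue/Resume, transcript preserved
  | "auto_resume"           // resume after a known read-only tool
  | "require_confirmation"; // ask the user before doing anything

function decideRecovery(facts: RunFacts): RecoveryAction {
  // Side-effect facts incomplete: prefer confirmation over automation.
  if (!facts.sideEffectFactsComplete) return "require_confirmation";

  if (facts.toolExecutionStarted) {
    // Read-only tools may auto-resume; unsafe or unknown tools never do.
    return facts.startedToolsKnownReadOnly
      ? "auto_resume"
      : "require_confirmation";
  }

  // From here on, no tool ran.
  // Visible text exists: never silently replay it, offer Continue instead.
  if (facts.visibleOutputShown) return "offer_continue";

  // Assumption: a fully materialized but unexecuted tool call is handled
  // conservatively, so the UI can state that no tool ran.
  if (facts.toolCallMaterialized) return "offer_continue";

  // Nothing visible, no tool call, no tool execution.
  return "auto_retry";
}

// The observed case: visible output, partial tool input, no execution.
const observedFacts: RunFacts = {
  visibleOutputShown: true,
  toolCallMaterialized: false,
  toolExecutionStarted: false,
  startedToolsKnownReadOnly: false,
  sideEffectFactsComplete: true,
};
const observedAction = decideRecovery(observedFacts);
```

Checking incomplete side-effect facts first means that missing information can only make the outcome more cautious, never less.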
Verification
- Add tests for the recovery-policy matrix above.
- Add a fixture matching the observed GPT-5.5 `UND_ERR_SOCKET` case: visible output seen, partial tool input, no tool execution.
- Confirm that case presents a Continue/Resume path rather than only a terminal error.
- Confirm auto-retry is limited to safe cases and capped, with backoff.
- Confirm unsafe/unknown side-effect cases never auto-repeat tools.
- Confirm the UI copy is plain-language and does not expose raw `terminated` as the primary message.
- Manually verify the session page still behaves correctly after an interrupted run and after using Continue/Resume.
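For the "capped, with backoff" check, one way to keep the behavior testable is to compute the retry delay in a pure function and let the caller do the waiting. The sketch below assumes a cap of one automatic retry, following the "auto-retry once" row of the matrix; the constant values and the name `nextRetryDelayMs` are illustrative assumptions, not existing PawWork settings.

```typescript
// Hypothetical constants for illustration; actual values are to be decided.
const MAX_AUTO_RETRIES = 1; // the matrix allows a single automatic retry
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8000;

/**
 * Delay before automatic retry number `attempt` (1-based),
 * or null when the attempt is outside the allowed range.
 */
function nextRetryDelayMs(
  attempt: number,
  maxRetries: number = MAX_AUTO_RETRIES,
): number | null {
  if (attempt < 1 || attempt > maxRetries) return null;
  // Exponential backoff: base, 2x base, 4x base, ... capped at MAX_DELAY_MS.
  const delay = BASE_DELAY_MS * Math.pow(2, attempt - 1);
  return Math.min(delay, MAX_DELAY_MS);
}
```

A test can then assert that a second automatic attempt returns `null` under the default cap, without needing timers or a real stream.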
Execution mode
Investigate and propose a plan first — the agent must post the plan as an issue comment and wait for an explicit "approved" comment before writing code or opening a PR.