Goal
Define a unified Run Incident Framework for assistant runs that end unexpectedly. This is the umbrella direction for #802, #803, and #804: do not fix each observed interruption as an isolated patch; build one shared language for facts, cause, phase, recovery policy, and user-facing presentation.
Why this exists
Recent terminated-session exports showed that PawWork can collect many useful low-level signals, but the signals are still too ad hoc:
The common problem is not one missing field; it is that PawWork lacks a unified run-incident model.
Direction
A run incident should be split into five layers:
- Facts — what actually happened.
- Cause — why the run ended.
- Phase — where the run was interrupted.
- Policy — what recovery action is safe.
- Presentation — what export reviewers and users should see.
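As a rough illustration of how the five layers could stay separate in code, here is a minimal sketch. Every name in it (`RunIncident`, `Cause`, `Phase`, `Policy`, and all enum values) is hypothetical and chosen only for this example; the actual taxonomy is for the design draft to decide.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    # Hypothetical values, not a proposed taxonomy.
    PROVIDER_ERROR = "provider_error"
    USER_CANCELLED = "user_cancelled"
    UNKNOWN = "unknown"


class Phase(Enum):
    # Hypothetical values, not a proposed taxonomy.
    STREAMING = "streaming"
    TOOL_CALL = "tool_call"
    UNKNOWN = "unknown"


class Policy(Enum):
    # Hypothetical recovery actions.
    RETRY = "retry"
    ASK_USER = "ask_user"
    NONE = "none"


@dataclass(frozen=True)
class RunIncident:
    # Facts: raw observed signals, kept uninterpreted.
    facts: dict
    # Cause, Phase, Policy: interpretations layered on top of the facts.
    cause: Cause = Cause.UNKNOWN
    phase: Phase = Phase.UNKNOWN
    policy: Policy = Policy.NONE

    def presentation(self) -> str:
        """Presentation: derived from the other layers rather than stored."""
        return (
            f"Run ended during {self.phase.value} ({self.cause.value}); "
            f"suggested action: {self.policy.value}."
        )
```

In this sketch the presentation text is computed from cause, phase, and policy, so reviewers and users see a message that cannot drift from the underlying classification. Unclassified incidents default to `UNKNOWN` / `NONE` instead of guessing.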
The first design draft is recorded in the first comment on this issue.
Related work
Direct implementation issues:
Related active or residual reliability work:
Foundational completed work:
Series index:
Scope
In scope:
Out of scope:
- One giant implementation PR.
- Provider SDK/network fixes.
- Telemetry sink integration.
- Broad UI redesign.
Execution mode
Investigate first and get the design plan approved before implementing. Here, "plan" means the issue-level design / scope proposal, not a PR-level implementation checklist. Once the design is approved, agents may proceed with implementation plans inside the agreed scope. Post a new issue comment and wait for an explicit "approved" only when the implementation would change that design scope.