feat(session): add run incident diagnostics by Astro-Han · Pull Request #812 · Astro-Han/pawwork

Astro-Han · 2026-05-21T04:14:13Z

Summary

Adds PR1 for #808's Run Incident Framework: a structured run-incident model, append-only evidence derivation, and sanitized incident export beside existing run observability diagnostics.

Why

#803 showed a concrete misclassification: a provider transport disconnect during partial tool-input generation could be summarized as tool_failure.tool_execution_failed even though no tool execution started. This PR separates tool-call generation from execution and derives the terminal cause from ordered evidence so later cleanup/finalizer records cannot overwrite the initiating failure.

Related Issue

Part of #808 Phase 1 / PR1 and directly addresses #803. Builds on #788 and #794. Does not implement #802's full lifecycle causal spine or #804 recovery UX.

Human Review Status

Pending

Review Focus

Whether RunIncident fields stay derived from append-only evidence rather than mutable summary overwrites.
Whether partial tool-input transport failures classify as provider_transport_disconnect and not tool execution failure.
Whether exported evidence is bounded and sanitized, with raw tool input/private paths/secrets omitted.
Whether PR scope stays diagnostic/export-only with no recovery UI or retry behavior change.

Risk Notes

Diagnostic/export-only change: no automatic retry, no Continue/Resume API, no recovery card, and no provider SDK behavior change.
Legacy classification, summary_key, and retry_safety remain for compatibility; new export surfaces run_incidents as the authoritative structured path when present.
Lifecycle origin remains the feat(session): trace lifecycle close provenance #794 level in this PR; immutable lifecycle snapshots and full [Feature] Add lifecycle causality diagnostics for interrupted runs #802 causal chain are follow-ups.
Visible UI/copy check skipped: no visible UI changed.
Platform/packaging check skipped: no macOS/Windows packaging, updater, signing, shell, or permission surface changed.

How To Verify

Red test first: bun test test/session/run-observability.test.ts test/session/export.test.ts --timeout 30000 — failed on missing RunIncident fields/API and missing run_incidents export
Run observability + export tests: bun test test/session/run-observability.test.ts test/session/export.test.ts --timeout 30000 — 54 pass, 0 fail
Typecheck: bun run typecheck from packages/opencode — passed
Whitespace check: git diff --check — passed

Screenshots or Recordings

Not applicable — no visible UI changes.

Checklist

How to use this checklist:

Tick a box by replacing [ ] with [x]. Do not edit, add, or remove items.

The bot-applied label items can only be honestly ticked AFTER the PR is opened and the labeler / priority-triage bots have run — return to the PR description and tick them then.

Most items are required. The few that are conditional are explicitly marked (conditional); for those, leave unticked if they truly do not apply and explain why in Risk Notes. All other items must be ticked before requesting human review.

Summary by CodeRabbit

Release Notes

New Features
- Comprehensive incident diagnostics with detailed evidence tracking, cause analysis, and recovery recommendations to help understand and address execution failures
- Enhanced tool execution observability with improved phase tracking and state transition visibility
- More actionable diagnostic information providing guidance on failure recovery and safety considerations

coderabbitai · 2026-05-21T04:14:20Z

Warning

Rate limit exceeded

@Astro-Han has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 27 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ab85ebe6-c0d2-4283-bf59-721f82611242

📥 Commits

Reviewing files that changed from the base of the PR and between ab51942 and 01d3734.

📒 Files selected for processing (4)

packages/opencode/src/session/run-incident/derive.ts
packages/opencode/src/session/run-incident/sanitize.ts
packages/opencode/src/session/run-observability/recorder.ts
packages/opencode/test/session/run-observability.test.ts

📝 Walkthrough

Walkthrough

This PR introduces a comprehensive run-incident framework that captures, analyzes, and reports terminal conditions during session execution. It derives incidents from collected observability evidence, computes recovery recommendations and user-facing summaries, sanitizes evidence with redaction rules, integrates incident collection into the run-observability recorder during execution, exports incidents as top-level snapshot diagnostics, and refines observability instrumentation in the event processor.

Changes

Run Incident Framework and Observability Integration

Layer / File(s)	Summary
Run Incident Data Contract and Public API `packages/opencode/src/session/run-incident/types.ts`, `packages/opencode/src/session/run-incident/index.ts`	Defines the complete incident data contract (evidence events, terminal causes, phases, facts, provenance, recovery decisions, and top-level `RunIncident` shape). Exposes a `RunIncident` namespace facade with schema version, derivation, policy, presentation, and sanitization functions alongside incident-related types.
Incident Derivation from Evidence `packages/opencode/src/session/run-incident/derive.ts`	Converts `IncidentEvidenceEvent` streams into materialized `RunIncident` objects. Selects terminal evidence by cause presence and timestamp ordering, derives facts with optional attempt scoping, computes recovery/summaries, builds provenance/lifecycle information, and sanitizes payloads. Includes helpers for event ordering, fact extraction (with conditional unsafe-side-effect exposure), tool boundary summarization, phase mapping, and provider transport-disconnect cause construction.
Incident Recovery Policy and Presentation `packages/opencode/src/session/run-incident/policy.ts`, `packages/opencode/src/session/run-incident/presentation.ts`	`recoveryFor` computes `RecoveryDecision` from terminal cause and facts by branching on cause categories and lifecycle/side-effect/tool-execution progress. `userSummary` builds i18n-keyed presentation with action and severity mapping; `plainSummary` returns English cause-specific strings. Includes internal helpers for action-key and severity mapping.
Incident Evidence Sanitization and Bounding `packages/opencode/src/session/run-incident/sanitize.ts`	`sanitizeIncident` clones and rebinds evidence via `boundEvidence`, selecting orders from early/late slices and terminal/cleanup windows, truncating to export limit, and appending `evidence_omitted` markers. `sanitizeEvidence` redacts `redactions` array items via rules that trim, normalize empty values, redact URLs/paths and credentials, and normalize disallowed characters.
Run-Observability Recorder: Evidence Collection and Incident Derivation `packages/opencode/src/session/run-observability/recorder.ts`, `packages/opencode/src/session/run-observability/types.ts`	Recorder extends state tracking tool-input/tool-call phases and interruptions. Adds `appendEvidence` helper to accumulate `RunIncident.EvidenceEvent`s with monotonic ordering. Updates lifecycle handlers (`beginAttempt`, `recordProviderProgress`, `recordVisibleOutput`, `recordToolInputStarted/Completed/Materialized`, tool execution/failure/interrupt flows, `recordScopeClosed`) to emit evidence. Rewrites `finalize` to derive `incident` via `RunIncident.derive`, compute `classification` and `summary_key` from terminal cause via helper functions, and include incident in finalized `Summary`. Type contracts updated: `AttemptSummary` gains tool-phase and incident fields; `Recorder` interface gains new methods and `kind` parameter for visible output.
Processor Observability Instrumentation Updates `packages/opencode/src/session/processor.ts`	`handleEvent` now records visible output with distinct `kind` values (text vs reasoning) and replaces consolidated tool-call tracking with separate tool-input start/end records (including `providerExecuted`) and tool-call materialization. Cleanup interruption handling branches on tool part `state.status` to emit either `recordToolInterrupted` or `recordPendingToolPartInterrupted`.
Snapshot Export: Incident Collection and Sanitization `packages/opencode/src/session/export.ts`	Extends `Export.Snapshot["diagnostics"]` with optional `run_incident_schema_version` and `run_incidents`. Updates `Export.deriveSnapshotDiagnostics` to collect incidents from run-observability diagnostics and emit them with schema version. During sanitization, `sanitizeDiagnostics` populates incident schema/version and sanitizes payloads via `RunIncident.sanitize`; `sanitizeRunObservability` also sanitizes nested `incident` field.
Comprehensive Test Coverage `packages/opencode/test/session/export.test.ts`, `packages/opencode/test/session/run-observability.test.ts`	`export.test.ts` updates fixtures with tool-phase flags and adds `sanitizeSnapshot` test validating incident schema, terminal cause, evidence details, and sensitive-data redaction. `run-observability.test.ts` significantly expands coverage: provider-transport incident derivation, evidence bounding/omission, terminal-cause ordering, provenance precedence, terminal-vs-run-level facts separation, recovery behaviors for materialized tool calls (safe/unsafe/unknown with confirmation), boundary persistence across attempts, and tool-name sanitization in evidence.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Astro-Han/pawwork#802: Directly implements the lifecycle-causality diagnostics and incident-chain export framework requested by this issue.
Astro-Han/pawwork#808: Implements the concrete RunIncident framework (types, derive, sanitize, recorder integration, export changes) that realizes the design goals described in the issue.

Possibly related PRs

Astro-Han/pawwork#563: Both PRs modify export.ts's deriveSnapshotDiagnostics/sanitizeDiagnostics to extend Export.Snapshot["diagnostics"] with new nested fields (run_incidents vs aborts/title_generations), so changes are directly connected at the export-schema and sanitization code level.
Astro-Han/pawwork#264: Both PRs populate Snapshot.diagnostics by traversing the session tree; main PR adds run_incidents extraction from run-observability, while retrieved PR adds loop.last derivation from loop-gate metadata, making them directly connected in the same export/diagnostics derivation pathway.

Poem

🐰 Behold the incident framework hops into place,
Evidence trails guide recovery's embrace,
With facts and phases, each terminal cause found,
The recorder leaps onward, diagnostics abound!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(session): add run incident diagnostics' clearly and concisely describes the main change—introducing run incident diagnostics to the session module. It follows Conventional Commits format and directly reflects the primary objective of this PR.
Description check	✅ Passed	The description is comprehensive and well-structured. It includes all required sections: Summary, Why, Related Issue, Human Review Status, Review Focus, Risk Notes, How To Verify, and a completed Checklist with most items ticked. All essential information is present and properly formatted.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch pawwork/issue-808-phase1-pr1

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

gemini-code-assist

Code Review

This pull request introduces a comprehensive 'Run Incident' diagnostic system to track and report failures during session execution. It adds granular tracking for tool input generation and materialization, a new RunIncident module for deriving and sanitizing incident data, and logic for recovery recommendations. Feedback includes restoring state-downgrade guards in the recorder, removing redundant methods, preserving sanitized tool names in evidence for better diagnostics, and ensuring all defined stream phases are handled in the derivation logic.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

packages/opencode/src/session/run-observability/recorder.ts (1)
44-44: 💤 Low value

Unused variable: toolExecutionCompleted is set but never read.

This variable is assigned in recordToolCompleted (line 256) but is never used in finalize() or elsewhere. The completion state is already captured in evidence events, so this flag may be dead code.

Consider removing if unneeded, or add a TODO if it's intended for future use.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/run-observability/recorder.ts` at line 44, The
variable toolExecutionCompleted is assigned but never read—remove the dead
variable declaration and its assignments (references in recordToolCompleted)
since completion is already captured by evidence events, or if you intend to
keep it for future use add a clear TODO comment where toolExecutionCompleted is
declared and where it’s set in recordToolCompleted and ensure finalize() or
other consumers will use it; update/remove any related tests or code paths that
reference toolExecutionCompleted to keep the module consistent.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/session/run-incident/derive.ts`:
- Around line 140-150: streamPhase mapping mislabels provider progress and omits
"reasoning_generation"; update the ternary in phaseFor so
input.facts.provider_progress_seen maps to "reasoning_generation" (replace the
current "before_first_provider_progress" value) and keep the other branches the
same; reference the streamPhase variable and input.facts.provider_progress_seen
when making this single-value change so incident stream diagnostics include
"reasoning_generation".

In `@packages/opencode/src/session/run-incident/sanitize.ts`:
- Around line 27-33: selectedEvents is naively sliced from the front which can
drop explicitly-selected terminal/cleanup anchors; change the capping logic so
that when selectedEvents.length > MAX_EXPORTED_EVIDENCE you always preserve
events whose anchor indicates terminal/cleanup: compute anchors =
selectedEvents.filter(e => e.anchor === 'terminal' || e.anchor === 'cleanup'),
then build the final selectedEvents by taking the newest non-anchor items (from
the end) up to cap - anchors.length and append the anchors (dedupe by order),
update selectedOrders/omitted accordingly so omitted only contains truly
unselected events; use the existing symbols selectedEvents,
MAX_EXPORTED_EVIDENCE, selectedOrders, omitted and evidence to implement this
replacement.

In `@packages/opencode/test/session/run-observability.test.ts`:
- Around line 479-487: The assertion for summary.incident?.phase is
inconsistent: since the terminal attempt (second.attemptID) has provider
progress recorded earlier, change the expected stream_phase from
"before_first_provider_progress" to a phase that reflects progress during the
terminal attempt (e.g., "during_provider_progress") so the phase assertion
matches the recorded terminal-attempt events; keep terminal_attempt_id:
second.attemptID unchanged and ensure the new value aligns with the event names
used elsewhere in the test.

---

Nitpick comments:
In `@packages/opencode/src/session/run-observability/recorder.ts`:
- Line 44: The variable toolExecutionCompleted is assigned but never read—remove
the dead variable declaration and its assignments (references in
recordToolCompleted) since completion is already captured by evidence events, or
if you intend to keep it for future use add a clear TODO comment where
toolExecutionCompleted is declared and where it’s set in recordToolCompleted and
ensure finalize() or other consumers will use it; update/remove any related
tests or code paths that reference toolExecutionCompleted to keep the module
consistent.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 28cde99c-d83a-4c26-bb1d-9b248b5c754c

📥 Commits

Reviewing files that changed from the base of the PR and between 220659f and ab51942.

📒 Files selected for processing (12)

packages/opencode/src/session/export.ts
packages/opencode/src/session/processor.ts
packages/opencode/src/session/run-incident/derive.ts
packages/opencode/src/session/run-incident/index.ts
packages/opencode/src/session/run-incident/policy.ts
packages/opencode/src/session/run-incident/presentation.ts
packages/opencode/src/session/run-incident/sanitize.ts
packages/opencode/src/session/run-incident/types.ts
packages/opencode/src/session/run-observability/recorder.ts
packages/opencode/src/session/run-observability/types.ts
packages/opencode/test/session/export.test.ts
packages/opencode/test/session/run-observability.test.ts

feat(session): add run incident diagnostics

1d982e2

github-actions Bot added harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority labels May 21, 2026

github-actions Bot reviewed May 21, 2026

View reviewed changes

Astro-Han added enhancement New feature or request P1 High priority and removed P2 Medium priority labels May 21, 2026

gemini-code-assist Bot reviewed May 21, 2026

View reviewed changes

Astro-Han added 5 commits May 21, 2026 12:53

fix(session): preserve run incident evidence

70b3865

fix(session): preserve incident tool names

21710f9

fix(session): classify materialized tool recovery

99ce72a

fix(session): aggregate materialized tool risk

ac1f230

fix(session): include run materialized recovery

ab51942

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Comment thread packages/opencode/src/session/run-incident/derive.ts Outdated

Comment thread packages/opencode/src/session/run-incident/sanitize.ts Outdated

Comment thread packages/opencode/test/session/run-observability.test.ts

Astro-Han added 2 commits May 21, 2026 14:47

fix(session): harden incident stream phases

bac5cde

fix(session): preserve capped incident anchors

01d3734

Astro-Han merged commit fda7f5f into dev May 21, 2026
24 checks passed

Astro-Han deleted the pawwork/issue-808-phase1-pr1 branch May 21, 2026 07:19

Astro-Han mentioned this pull request May 21, 2026

fix(session): guard run incident tool cleanup #822

Merged

13 tasks

coderabbitai Bot mentioned this pull request May 21, 2026

feat(session): add lifecycle causality diagnostics #825

Merged

13 tasks

Astro-Han mentioned this pull request May 23, 2026

[Feature] Define Run Incident Framework for interrupted runs #808

Closed

This was referenced May 23, 2026

fix: gate interrupted stream retries on recovery facts #863

Merged

fix: allow safe retry before provider progress #914

Merged

refactor(session): extract safe recovery gate #929

Merged

coderabbitai Bot mentioned this pull request Jun 10, 2026

fix: drain tool results after stream disconnect #1252

Open

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(session): add run incident diagnostics#812

feat(session): add run incident diagnostics#812
Astro-Han merged 8 commits into
devfrom
pawwork/issue-808-phase1-pr1

Astro-Han commented May 21, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 21, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Astro-Han commented May 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Related Issue

Human Review Status

Review Focus

Risk Notes

How To Verify

Screenshots or Recordings

Checklist

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Astro-Han commented May 21, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading