Skip to content

feat(session): add run incident diagnostics#812

Merged
Astro-Han merged 8 commits into
devfrom
pawwork/issue-808-phase1-pr1
May 21, 2026
Merged

feat(session): add run incident diagnostics#812
Astro-Han merged 8 commits into
devfrom
pawwork/issue-808-phase1-pr1

Conversation

@Astro-Han

@Astro-Han Astro-Han commented May 21, 2026

Copy link
Copy Markdown
Owner

Summary

Adds PR1 for #808's Run Incident Framework: a structured run-incident model, append-only evidence derivation, and sanitized incident export beside existing run observability diagnostics.

Why

#803 showed a concrete misclassification: a provider transport disconnect during partial tool-input generation could be summarized as tool_failure.tool_execution_failed even though no tool execution started. This PR separates tool-call generation from execution and derives the terminal cause from ordered evidence so later cleanup/finalizer records cannot overwrite the initiating failure.

Related Issue

Part of #808 Phase 1 / PR1 and directly addresses #803. Builds on #788 and #794. Does not implement #802's full lifecycle causal spine or #804 recovery UX.

Human Review Status

Pending

Review Focus

  • Whether RunIncident fields stay derived from append-only evidence rather than mutable summary overwrites.
  • Whether partial tool-input transport failures classify as provider_transport_disconnect and not tool execution failure.
  • Whether exported evidence is bounded and sanitized, with raw tool input/private paths/secrets omitted.
  • Whether PR scope stays diagnostic/export-only with no recovery UI or retry behavior change.

Risk Notes

  • Diagnostic/export-only change: no automatic retry, no Continue/Resume API, no recovery card, and no provider SDK behavior change.
  • Legacy classification, summary_key, and retry_safety remain for compatibility; new export surfaces run_incidents as the authoritative structured path when present.
  • Lifecycle origin remains the feat(session): trace lifecycle close provenance #794 level in this PR; immutable lifecycle snapshots and full [Feature] Add lifecycle causality diagnostics for interrupted runs #802 causal chain are follow-ups.
  • Visible UI/copy check skipped: no visible UI changed.
  • Platform/packaging check skipped: no macOS/Windows packaging, updater, signing, shell, or permission surface changed.

How To Verify

Red test first: bun test test/session/run-observability.test.ts test/session/export.test.ts --timeout 30000 — failed on missing RunIncident fields/API and missing run_incidents export
Run observability + export tests: bun test test/session/run-observability.test.ts test/session/export.test.ts --timeout 30000 — 54 pass, 0 fail
Typecheck: bun run typecheck from packages/opencode — passed
Whitespace check: git diff --check — passed

Screenshots or Recordings

Not applicable — no visible UI changes.

Checklist

How to use this checklist:

  • Tick a box by replacing [ ] with [x]. Do not edit, add, or remove items.
  • The bot-applied label items can only be honestly ticked AFTER the PR is opened and the labeler / priority-triage bots have run — return to the PR description and tick them then.
  • Most items are required. The few that are conditional are explicitly marked (conditional); for those, leave unticked if they truly do not apply and explain why in Risk Notes. All other items must be ticked before requesting human review.
  • Type label — this PR carries exactly one of bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
  • Routing labels — this PR carries at least one of app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
  • Priority label — this PR carries exactly one of P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
  • Human Review Status above is set to Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
  • I linked the related issue, or stated in Summary why there is no issue.
  • I described the review focus and any meaningful risks.
  • I replaced the example block in How To Verify with the real verification steps and the key result for each.
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope.
  • (conditional) I manually checked visible UI or copy changes when needed, with screenshots or recordings. Leave unticked only if no visible UI or copy changed.
  • (conditional) I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes. Leave unticked only if no platform/packaging surface was touched.
  • (conditional) I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant. Leave unticked only if none of those surfaces was touched.
  • I reviewed the final diff for unrelated changes and suspicious dependency changes.
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English.

Summary by CodeRabbit

Release Notes

  • New Features
    • Comprehensive incident diagnostics with detailed evidence tracking, cause analysis, and recovery recommendations to help understand and address execution failures
    • Enhanced tool execution observability with improved phase tracking and state transition visibility
    • More actionable diagnostic information providing guidance on failure recovery and safety considerations

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Warning

Rate limit exceeded

@Astro-Han has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 27 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ab85ebe6-c0d2-4283-bf59-721f82611242

📥 Commits

Reviewing files that changed from the base of the PR and between ab51942 and 01d3734.

📒 Files selected for processing (4)
  • packages/opencode/src/session/run-incident/derive.ts
  • packages/opencode/src/session/run-incident/sanitize.ts
  • packages/opencode/src/session/run-observability/recorder.ts
  • packages/opencode/test/session/run-observability.test.ts
📝 Walkthrough

Walkthrough

This PR introduces a comprehensive run-incident framework that captures, analyzes, and reports terminal conditions during session execution. It derives incidents from collected observability evidence, computes recovery recommendations and user-facing summaries, sanitizes evidence with redaction rules, integrates incident collection into the run-observability recorder during execution, exports incidents as top-level snapshot diagnostics, and refines observability instrumentation in the event processor.

Changes

Run Incident Framework and Observability Integration

Layer / File(s) Summary
Run Incident Data Contract and Public API
packages/opencode/src/session/run-incident/types.ts, packages/opencode/src/session/run-incident/index.ts
Defines the complete incident data contract (evidence events, terminal causes, phases, facts, provenance, recovery decisions, and top-level RunIncident shape). Exposes a RunIncident namespace facade with schema version, derivation, policy, presentation, and sanitization functions alongside incident-related types.
Incident Derivation from Evidence
packages/opencode/src/session/run-incident/derive.ts
Converts IncidentEvidenceEvent streams into materialized RunIncident objects. Selects terminal evidence by cause presence and timestamp ordering, derives facts with optional attempt scoping, computes recovery/summaries, builds provenance/lifecycle information, and sanitizes payloads. Includes helpers for event ordering, fact extraction (with conditional unsafe-side-effect exposure), tool boundary summarization, phase mapping, and provider transport-disconnect cause construction.
Incident Recovery Policy and Presentation
packages/opencode/src/session/run-incident/policy.ts, packages/opencode/src/session/run-incident/presentation.ts
recoveryFor computes RecoveryDecision from terminal cause and facts by branching on cause categories and lifecycle/side-effect/tool-execution progress. userSummary builds i18n-keyed presentation with action and severity mapping; plainSummary returns English cause-specific strings. Includes internal helpers for action-key and severity mapping.
Incident Evidence Sanitization and Bounding
packages/opencode/src/session/run-incident/sanitize.ts
sanitizeIncident clones and rebinds evidence via boundEvidence, selecting orders from early/late slices and terminal/cleanup windows, truncating to export limit, and appending evidence_omitted markers. sanitizeEvidence redacts redactions array items via rules that trim, normalize empty values, redact URLs/paths and credentials, and normalize disallowed characters.
Run-Observability Recorder: Evidence Collection and Incident Derivation
packages/opencode/src/session/run-observability/recorder.ts, packages/opencode/src/session/run-observability/types.ts
Recorder extends state tracking tool-input/tool-call phases and interruptions. Adds appendEvidence helper to accumulate RunIncident.EvidenceEvents with monotonic ordering. Updates lifecycle handlers (beginAttempt, recordProviderProgress, recordVisibleOutput, recordToolInputStarted/Completed/Materialized, tool execution/failure/interrupt flows, recordScopeClosed) to emit evidence. Rewrites finalize to derive incident via RunIncident.derive, compute classification and summary_key from terminal cause via helper functions, and include incident in finalized Summary. Type contracts updated: AttemptSummary gains tool-phase and incident fields; Recorder interface gains new methods and kind parameter for visible output.
Processor Observability Instrumentation Updates
packages/opencode/src/session/processor.ts
handleEvent now records visible output with distinct kind values (text vs reasoning) and replaces consolidated tool-call tracking with separate tool-input start/end records (including providerExecuted) and tool-call materialization. Cleanup interruption handling branches on tool part state.status to emit either recordToolInterrupted or recordPendingToolPartInterrupted.
Snapshot Export: Incident Collection and Sanitization
packages/opencode/src/session/export.ts
Extends Export.Snapshot["diagnostics"] with optional run_incident_schema_version and run_incidents. Updates Export.deriveSnapshotDiagnostics to collect incidents from run-observability diagnostics and emit them with schema version. During sanitization, sanitizeDiagnostics populates incident schema/version and sanitizes payloads via RunIncident.sanitize; sanitizeRunObservability also sanitizes nested incident field.
Comprehensive Test Coverage
packages/opencode/test/session/export.test.ts, packages/opencode/test/session/run-observability.test.ts
export.test.ts updates fixtures with tool-phase flags and adds sanitizeSnapshot test validating incident schema, terminal cause, evidence details, and sensitive-data redaction. run-observability.test.ts significantly expands coverage: provider-transport incident derivation, evidence bounding/omission, terminal-cause ordering, provenance precedence, terminal-vs-run-level facts separation, recovery behaviors for materialized tool calls (safe/unsafe/unknown with confirmation), boundary persistence across attempts, and tool-name sanitization in evidence.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • Astro-Han/pawwork#802: Directly implements the lifecycle-causality diagnostics and incident-chain export framework requested by this issue.
  • Astro-Han/pawwork#808: Implements the concrete RunIncident framework (types, derive, sanitize, recorder integration, export changes) that realizes the design goals described in the issue.

Possibly related PRs

  • Astro-Han/pawwork#563: Both PRs modify export.ts's deriveSnapshotDiagnostics/sanitizeDiagnostics to extend Export.Snapshot["diagnostics"] with new nested fields (run_incidents vs aborts/title_generations), so changes are directly connected at the export-schema and sanitization code level.
  • Astro-Han/pawwork#264: Both PRs populate Snapshot.diagnostics by traversing the session tree; main PR adds run_incidents extraction from run-observability, while retrieved PR adds loop.last derivation from loop-gate metadata, making them directly connected in the same export/diagnostics derivation pathway.

Poem

🐰 Behold the incident framework hops into place,
Evidence trails guide recovery's embrace,
With facts and phases, each terminal cause found,
The recorder leaps onward, diagnostics abound!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(session): add run incident diagnostics' clearly and concisely describes the main change—introducing run incident diagnostics to the session module. It follows Conventional Commits format and directly reflects the primary objective of this PR.
Description check ✅ Passed The description is comprehensive and well-structured. It includes all required sections: Summary, Why, Related Issue, Human Review Status, Review Focus, Risk Notes, How To Verify, and a completed Checklist with most items ticked. All essential information is present and properly formatted.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pawwork/issue-808-phase1-pr1

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority labels May 21, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@Astro-Han Astro-Han added enhancement New feature or request P1 High priority and removed P2 Medium priority labels May 21, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive 'Run Incident' diagnostic system to track and report failures during session execution. It adds granular tracking for tool input generation and materialization, a new RunIncident module for deriving and sanitizing incident data, and logic for recovery recommendations. Feedback includes restoring state-downgrade guards in the recorder, removing redundant methods, preserving sanitized tool names in evidence for better diagnostics, and ensuring all defined stream phases are handled in the derivation logic.

Comment thread packages/opencode/src/session/run-observability/recorder.ts
Comment thread packages/opencode/src/session/run-observability/recorder.ts Outdated
Comment thread packages/opencode/src/session/run-incident/derive.ts Outdated
Comment thread packages/opencode/src/session/run-incident/derive.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
packages/opencode/src/session/run-observability/recorder.ts (1)

44-44: 💤 Low value

Unused variable: toolExecutionCompleted is set but never read.

This variable is assigned in recordToolCompleted (line 256) but is never used in finalize() or elsewhere. The completion state is already captured in evidence events, so this flag may be dead code.

Consider removing if unneeded, or add a TODO if it's intended for future use.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/run-observability/recorder.ts` at line 44, The
variable toolExecutionCompleted is assigned but never read—remove the dead
variable declaration and its assignments (references in recordToolCompleted)
since completion is already captured by evidence events, or if you intend to
keep it for future use add a clear TODO comment where toolExecutionCompleted is
declared and where it’s set in recordToolCompleted and ensure finalize() or
other consumers will use it; update/remove any related tests or code paths that
reference toolExecutionCompleted to keep the module consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/session/run-incident/derive.ts`:
- Around line 140-150: streamPhase mapping mislabels provider progress and omits
"reasoning_generation"; update the ternary in phaseFor so
input.facts.provider_progress_seen maps to "reasoning_generation" (replace the
current "before_first_provider_progress" value) and keep the other branches the
same; reference the streamPhase variable and input.facts.provider_progress_seen
when making this single-value change so incident stream diagnostics include
"reasoning_generation".

In `@packages/opencode/src/session/run-incident/sanitize.ts`:
- Around line 27-33: selectedEvents is naively sliced from the front which can
drop explicitly-selected terminal/cleanup anchors; change the capping logic so
that when selectedEvents.length > MAX_EXPORTED_EVIDENCE you always preserve
events whose anchor indicates terminal/cleanup: compute anchors =
selectedEvents.filter(e => e.anchor === 'terminal' || e.anchor === 'cleanup'),
then build the final selectedEvents by taking the newest non-anchor items (from
the end) up to cap - anchors.length and append the anchors (dedupe by order),
update selectedOrders/omitted accordingly so omitted only contains truly
unselected events; use the existing symbols selectedEvents,
MAX_EXPORTED_EVIDENCE, selectedOrders, omitted and evidence to implement this
replacement.

In `@packages/opencode/test/session/run-observability.test.ts`:
- Around line 479-487: The assertion for summary.incident?.phase is
inconsistent: since the terminal attempt (second.attemptID) has provider
progress recorded earlier, change the expected stream_phase from
"before_first_provider_progress" to a phase that reflects progress during the
terminal attempt (e.g., "during_provider_progress") so the phase assertion
matches the recorded terminal-attempt events; keep terminal_attempt_id:
second.attemptID unchanged and ensure the new value aligns with the event names
used elsewhere in the test.

---

Nitpick comments:
In `@packages/opencode/src/session/run-observability/recorder.ts`:
- Line 44: The variable toolExecutionCompleted is assigned but never read—remove
the dead variable declaration and its assignments (references in
recordToolCompleted) since completion is already captured by evidence events, or
if you intend to keep it for future use add a clear TODO comment where
toolExecutionCompleted is declared and where it’s set in recordToolCompleted and
ensure finalize() or other consumers will use it; update/remove any related
tests or code paths that reference toolExecutionCompleted to keep the module
consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 28cde99c-d83a-4c26-bb1d-9b248b5c754c

📥 Commits

Reviewing files that changed from the base of the PR and between 220659f and ab51942.

📒 Files selected for processing (12)
  • packages/opencode/src/session/export.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/run-incident/derive.ts
  • packages/opencode/src/session/run-incident/index.ts
  • packages/opencode/src/session/run-incident/policy.ts
  • packages/opencode/src/session/run-incident/presentation.ts
  • packages/opencode/src/session/run-incident/sanitize.ts
  • packages/opencode/src/session/run-incident/types.ts
  • packages/opencode/src/session/run-observability/recorder.ts
  • packages/opencode/src/session/run-observability/types.ts
  • packages/opencode/test/session/export.test.ts
  • packages/opencode/test/session/run-observability.test.ts

Comment thread packages/opencode/src/session/run-incident/derive.ts Outdated
Comment thread packages/opencode/src/session/run-incident/sanitize.ts Outdated
Comment thread packages/opencode/test/session/run-observability.test.ts
@Astro-Han Astro-Han merged commit fda7f5f into dev May 21, 2026
24 checks passed
@Astro-Han Astro-Han deleted the pawwork/issue-808-phase1-pr1 branch May 21, 2026 07:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request harness Model harness, prompts, tool descriptions, and session mechanics P1 High priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant