feat(session): add run observability diagnostics by Astro-Han · Pull Request #788 · Astro-Han/pawwork

Astro-Han · 2026-05-20T12:58:11Z

Summary

Add a run-level observability summary beside existing llm_trace diagnostics for session exports. This PR introduces a small diagnostic spine that records provider progress, visible output, tool-call/tool-execution facts, terminal failure classification, and retry-safety facts without changing retry behavior or user-facing UI.

Why

#783 needs recurring terminated / UND_ERR_SOCKET and local 已中断 ("interrupted") failures to be debuggable from exports instead of appearing as opaque failures. This PR is the first diagnostic foundation: it distinguishes success, external stream disconnects, unknown local scope closes, setup failures, and tool failures, while keeping exported diagnostics bounded and safe.
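
To make the distinction above concrete, here is a minimal sketch of how a few observed run facts could map to a classification and a conservative retry-safety value. Every name in it (`RunFacts`, `classifyRun`, `retrySafety`, the classification strings, the retry-safety values) is an assumption for illustration, and the precedence of the checks is a guess. It is not the recorder this PR ships.

```typescript
// Hypothetical sketch: names and rules are illustrative, not the PR's actual API.
type Classification =
  | "success"
  | "external_stream_disconnect"
  | "unknown_local_scope_close"
  | "setup_failure"
  | "tool_failure";

type RetrySafety = "safe" | "unsafe" | "not_applicable";

interface RunFacts {
  finished: boolean;             // the run reached a normal terminal state
  providerProgress: boolean;     // at least one provider event arrived
  visibleOutput: boolean;        // user-visible output was emitted
  toolExecutionStarted: boolean; // a tool began executing, so side effects are possible
  toolFailed: boolean;           // a tool reported a failure
  errorCode?: string;            // safe fingerprint such as "UND_ERR_SOCKET"
}

function classifyRun(facts: RunFacts): Classification {
  if (facts.finished) return "success";
  if (facts.toolFailed) return "tool_failure";
  if (facts.errorCode === "UND_ERR_SOCKET") return "external_stream_disconnect";
  if (!facts.providerProgress) return "setup_failure";
  // Nothing identifies the cause, so stay with the least specific label.
  return "unknown_local_scope_close";
}

function retrySafety(facts: RunFacts): RetrySafety {
  if (facts.finished) return "not_applicable";
  // Conservative assumption: once output was shown or a tool started,
  // replaying the run could duplicate output or side effects.
  if (facts.toolExecutionStarted || facts.visibleOutput) return "unsafe";
  return "safe";
}
```

In this sketch a socket disconnect that happens before any visible output or tool execution is the only failure treated as safe to retry. That matches the diagnostic-only intent: the value is recorded for export, and nothing acts on it.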

Related Issue

Addresses part of #783. It does not close #783; follow-ups remain for watchdog/setup taxonomy wiring, lifecycle action provenance, and broader deterministic harness coverage.

Human Review Status

Pending

Review Focus

Whether run_observability captures only safe control-flow/failure facts and no prompt/tool/body/path content.
Whether success, stream disconnect, scope-close, setup, and tool classifications have conservative retry-safety semantics.
Whether export projection and sanitizeSnapshot preserve useful safe evidence like UND_ERR_SOCKET without leaking sensitive data.
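
The third review point, keeping a fingerprint like UND_ERR_SOCKET while dropping anything sensitive, can be pictured as allowlist-style sanitization. The sketch below is an assumption about one way to do that: `sanitizeErrorFacts`, `RawError`, `SafeErrorFacts`, and both patterns are hypothetical names and rules, and it does not reproduce the PR's sanitizeSnapshot.

```typescript
// Hypothetical sketch of allowlist-style sanitization; not the PR's sanitizeSnapshot.
// Only short, identifier-shaped values are exported. Free text is never copied.
const SAFE_CODE = /^[A-Z][A-Z0-9_]{1,63}$/; // e.g. UND_ERR_SOCKET, ECONNRESET
const SAFE_NAME = /^[A-Za-z]{1,64}$/;       // e.g. SocketError, TypeError

interface RawError {
  code?: unknown;
  name?: unknown;
  message?: unknown; // may contain prompts, paths, or bodies, so it is dropped
}

interface SafeErrorFacts {
  code?: string;
  name?: string;
}

function sanitizeErrorFacts(raw: RawError): SafeErrorFacts {
  const out: SafeErrorFacts = {};
  if (typeof raw.code === "string" && SAFE_CODE.test(raw.code)) out.code = raw.code;
  if (typeof raw.name === "string" && SAFE_NAME.test(raw.name)) out.name = raw.name;
  return out;
}
```

The design choice here is to allow only what is known to be safe. A denylist of sensitive fields would leak any field nobody thought to list. The strict patterns also bound the size of each exported value.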

Risk Notes

Diagnostic-only change: no automatic retry, UI copy, provider credentials, or runtime replacement.
Generated SDK type changed because the assistant diagnostics schema gained run_observability?: unknown.
Visible UI/copy check skipped: no visible UI or user-facing copy changed.
Platform/packaging check skipped: no macOS/Windows packaging, updater, signing, shell, or permissions surface changed.

How To Verify

Run observability + export tests: bun test test/session/run-observability.test.ts test/session/export.test.ts --timeout 30000 — 50 pass, 0 fail
Typecheck: bun run typecheck from packages/opencode — passed
Whitespace check: git diff --check — passed
SDK schema generation: bun run --cwd ../../packages/sdk/js build from packages/opencode — passed after replacing z.custom with schema-safe z.any()

Screenshots or Recordings

Not applicable — no visible UI changes.

Checklist

How to use this checklist:

Tick a box by replacing [ ] with [x]. Do not edit, add, or remove items.

The bot-applied label items can only be honestly ticked AFTER the PR is opened and the labeler / priority-triage bots have run — return to the PR description and tick them then.

Most items are required. The few that are conditional are explicitly marked (conditional); for those, leave unticked if they truly do not apply and explain why in Risk Notes. All other items must be ticked before requesting human review.

coderabbitai · 2026-05-20T12:58:19Z

Warning

Rate limit exceeded

@Astro-Han has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 54 minutes and 45 seconds before requesting another review.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 4a092be0-de80-4903-b758-082db5acf5e7

📥 Commits

Reviewing files that changed from the base of the PR and between aac7f10 and 3a00bf0.

⛔ Files ignored due to path filters (1)

packages/sdk/js/src/v2/gen/types.gen.ts is excluded by !**/gen/**

📒 Files selected for processing (10)

packages/opencode/src/session/export.ts
packages/opencode/src/session/message-v2.ts
packages/opencode/src/session/processor.ts
packages/opencode/src/session/prompt.ts
packages/opencode/src/session/run-observability/index.ts
packages/opencode/src/session/run-observability/recorder.ts
packages/opencode/src/session/run-observability/sanitize.ts
packages/opencode/src/session/run-observability/types.ts
packages/opencode/test/session/export.test.ts
packages/opencode/test/session/run-observability.test.ts


github-actions

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

gemini-code-assist

Code Review

This pull request introduces a comprehensive run observability system designed to track, classify, and sanitize the execution of LLM runs and tool calls. It implements a RunObservability module that records provider progress, visible output, and tool lifecycle events to provide automated retry safety recommendations. Feedback from the review highlights the need to add a "success" state to the Classification enum and recorder logic to prevent successful runs from being mislabeled as failures. Additionally, the reviewer recommended replacing hardcoded tool names with centralized constants to adhere to naming conventions.

feat(session): trace lifecycle close provenance

Adds the second #783 run-observability slice after PR #788: local instance lifecycle closes now carry bounded parent provenance instead of stopping at generic scope-close diagnostics.

Change boundary:
- add local_instance_reload and local_instance_dispose run-observability classifications
- record bounded lifecycle action IDs for InstanceStore.reload, dispose, disposeDirectory, and disposeAll
- propagate lifecycle action metadata through SessionRunState interrupts and processor run diagnostics
- keep disposeAll fan-out under one parent action across affected in-flight runs
- harden review feedback by using a per-directory action stack for overlapping lifecycle operations and by capturing processor directory from InstanceState.context instead of static Instance.directory

Verification:
- bun test test/session/run-observability.test.ts test/session/run-state.test.ts test/session/export.test.ts --timeout 30000 (54 pass, 0 fail)
- bun run typecheck from packages/opencode
- git diff --check
- PR CI green, including unit-opencode, typecheck, CodeQL, desktop-smoke, and e2e-artifacts
- review threads resolved: 0 unresolved

Notes:
- Diagnostic-only change. No retry policy, provider credential, user-facing copy, UI, packaging, or release behavior changed.
- #721, #754, and #755 remain separate behavior/follow-up investigations; this PR only improves causal exports for local lifecycle closes.
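
The per-directory action stack mentioned in that commit message could look roughly like the sketch below. It is a guess at the idea only: `LifecycleActionStacks`, its methods, and the `action-N` ID format are hypothetical and are not taken from the #794 code.

```typescript
// Hypothetical sketch; class, method, and ID names are illustrative only.
type LifecycleKind = "reload" | "dispose" | "disposeDirectory" | "disposeAll";

interface LifecycleAction {
  id: string;
  kind: LifecycleKind;
}

class LifecycleActionStacks {
  private stacks = new Map<string, LifecycleAction[]>();
  private counter = 0;

  // Push a new in-flight action for the directory and return it.
  begin(directory: string, kind: LifecycleKind): LifecycleAction {
    const action: LifecycleAction = { id: `action-${++this.counter}`, kind };
    const stack = this.stacks.get(directory) ?? [];
    stack.push(action);
    this.stacks.set(directory, stack);
    return action;
  }

  // The most recently started action still in flight for the directory.
  current(directory: string): LifecycleAction | undefined {
    const stack = this.stacks.get(directory);
    return stack ? stack[stack.length - 1] : undefined;
  }

  // Remove by ID so overlapping actions may finish in any order.
  end(directory: string, id: string): void {
    const stack = this.stacks.get(directory);
    if (!stack) return;
    const index = stack.findIndex((a) => a.id === id);
    if (index !== -1) stack.splice(index, 1);
    if (stack.length === 0) this.stacks.delete(directory);
  }
}
```

Compared with a single "current action" slot per directory, a stack keeps the earlier action's provenance intact when a second lifecycle operation starts before the first one ends.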

Astro-Han · 2026-05-21T03:14:05Z

Back-reference from #808.

This merged run-observability PR is the immediate foundation for the Run Incident Framework. #808 builds on the run facts added here and turns them into structured incident cause/phase/policy/export semantics.

Add the first #808 RunIncident diagnostic/export layer so provider transport disconnects, partial tool input interruptions, cleanup/finalizer evidence, and materialized-but-not-executed tool boundaries are derived from ordered append-only evidence instead of mutable summary overwrites.

This keeps the PR diagnostic-only: it adds structured sanitized run_incidents export while preserving legacy classification, summary_key, and retry_safety compatibility, without adding recovery UI, retry behavior, or provider SDK changes.

Verification:
- bun test test/session/run-observability.test.ts test/session/export.test.ts --timeout 30000: passed
- bun run typecheck: passed
- git diff --check: passed
- PR CI for #812: all checks passed

Review follow-ups:
- Preserved newest terminal/cleanup anchors when bounded evidence exceeds the export cap.
- Aligned stream phase derivation with provider progress and terminal cause.
- Resolved all Gemini and CodeRabbit review threads.

Related: #808, #803, #788, #794
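
The follow-up about keeping the newest terminal/cleanup anchors under an export cap can be sketched as below. This is one possible reading, not the #812 implementation: `Evidence`, `EvidenceKind`, `boundEvidence`, and the choice to fill the remaining slots with the earliest entries are all assumptions.

```typescript
// Hypothetical sketch; entry kinds, names, and the fill policy are illustrative.
type EvidenceKind = "progress" | "tool_input" | "terminal" | "cleanup";

interface Evidence {
  seq: number; // position in the append-only log
  kind: EvidenceKind;
}

const ANCHOR_KINDS: readonly EvidenceKind[] = ["terminal", "cleanup"];

function boundEvidence(log: readonly Evidence[], cap: number): Evidence[] {
  if (log.length <= cap) return [...log];

  // Find the newest entry of each anchor kind.
  const anchors: number[] = [];
  for (const kind of ANCHOR_KINDS) {
    for (let i = log.length - 1; i >= 0; i--) {
      if (log[i]?.kind === kind) {
        anchors.push(i);
        break;
      }
    }
  }

  // Anchors claim slots first, newest first, in case the cap is very small.
  anchors.sort((a, b) => b - a);
  const keep = new Set<number>(anchors.slice(0, Math.max(cap, 0)));

  // Assumed policy: fill the remaining slots with the earliest entries.
  for (let i = 0; i < log.length && keep.size < cap; i++) keep.add(i);

  // Filtering the original log preserves append order.
  return log.filter((_, i) => keep.has(i));
}
```

Because the log is only read and filtered, the stored evidence is never rewritten. Truncation applies to the exported projection alone, which is consistent with deriving incidents from ordered append-only evidence.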

Astro-Han added the enhancement (New feature or request), P2 (Medium priority), harness (Model harness, prompts, tool descriptions, and session mechanics), and tech-debt (Supplemental cleanup, maintainability, architecture, test, or quality debt context) labels May 20, 2026

github-actions Bot reviewed May 20, 2026


Astro-Han removed the tech-debt (Supplemental cleanup, maintainability, architecture, test, or quality debt context) label May 20, 2026

gemini-code-assist Bot reviewed May 20, 2026


Comment thread packages/opencode/src/session/run-observability/types.ts

Comment thread packages/opencode/src/session/run-observability/recorder.ts

Comment thread packages/opencode/src/session/run-observability/recorder.ts

Comment thread packages/opencode/src/session/run-observability/sanitize.ts

Astro-Han added 3 commits May 20, 2026 21:32

feat(session): add run observability recorder

834c552

feat(session): persist run observability summaries

ac22914

feat(session): export run observability diagnostics

8dab8b8

Astro-Han force-pushed the pawwork/issue-783-run-observability branch from 54069e4 to 8dab8b8 Compare May 20, 2026 13:33

Astro-Han added 4 commits May 20, 2026 21:44

fix(session): classify successful run diagnostics

4e1027b

fix(session): preserve safe run error fingerprints

914a045

refactor(session): name run observability tool constants

466fde0

fix(session): tighten run observability event facts

3a00bf0

Astro-Han merged commit 120fea0 into dev May 20, 2026
27 checks passed

Astro-Han deleted the pawwork/issue-783-run-observability branch May 20, 2026 15:36

Astro-Han mentioned this pull request May 20, 2026

feat(session): trace lifecycle close provenance #794

Merged

13 tasks

This was referenced May 21, 2026

[Feature] Add lifecycle causality diagnostics for interrupted runs #802

Closed

[Feature] Define Run Incident Framework for interrupted runs #808

Closed

This was referenced May 21, 2026

[Task] Track harness improvement series #195

Closed

feat(session): add run incident diagnostics #812

Merged

Astro-Han merged 7 commits into dev from pawwork/issue-783-run-observability