feat(session): add run observability diagnostics#788
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (10)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive run observability system designed to track, classify, and sanitize the execution of LLM runs and tool calls. It implements a RunObservability module that records provider progress, visible output, and tool lifecycle events to provide automated retry safety recommendations. Feedback from the review highlights the need to add a "success" state to the Classification enum and recorder logic to prevent successful runs from being mislabeled as failures. Additionally, the reviewer recommended replacing hardcoded tool names with centralized constants to adhere to naming conventions.
54069e4 to
8dab8b8
Compare
feat(session): trace lifecycle close provenance Adds the second #783 run-observability slice after PR #788: local instance lifecycle closes now carry bounded parent provenance instead of stopping at generic scope-close diagnostics. Change boundary: - add local_instance_reload and local_instance_dispose run-observability classifications - record bounded lifecycle action IDs for InstanceStore.reload, dispose, disposeDirectory, and disposeAll - propagate lifecycle action metadata through SessionRunState interrupts and processor run diagnostics - keep disposeAll fan-out under one parent action across affected in-flight runs - harden review feedback by using a per-directory action stack for overlapping lifecycle operations and by capturing processor directory from InstanceState.context instead of static Instance.directory Verification: - bun test test/session/run-observability.test.ts test/session/run-state.test.ts test/session/export.test.ts --timeout 30000 (54 pass, 0 fail) - bun run typecheck from packages/opencode - git diff --check - PR CI green, including unit-opencode, typecheck, CodeQL, desktop-smoke, and e2e-artifacts - review threads resolved: 0 unresolved Notes: - Diagnostic-only change. No retry policy, provider credential, user-facing copy, UI, packaging, or release behavior changed. - #721, #754, and #755 remain separate behavior/follow-up investigations; this PR only improves causal exports for local lifecycle closes.
Add the first #808 RunIncident diagnostic/export layer so provider transport disconnects, partial tool input interruptions, cleanup/finalizer evidence, and materialized-but-not-executed tool boundaries are derived from ordered append-only evidence instead of mutable summary overwrites. This keeps the PR diagnostic-only: it adds structured sanitized run_incidents export while preserving legacy classification, summary_key, and retry_safety compatibility, without adding recovery UI, retry behavior, or provider SDK changes. Verification: - bun test test/session/run-observability.test.ts test/session/export.test.ts --timeout 30000 — passed - bun run typecheck — passed - git diff --check — passed - PR CI for #812 — all checks passed Review follow-ups: - Preserved newest terminal/cleanup anchors when bounded evidence exceeds the export cap. - Aligned stream phase derivation with provider progress and terminal cause. - Resolved all Gemini and CodeRabbit review threads. Related: #808, #803, #788, #794
Summary
Add a run-level observability summary beside existing
llm_tracediagnostics for session exports. This PR introduces a small diagnostic spine that records provider progress, visible output, tool-call/tool-execution facts, terminal failure classification, and retry-safety facts without changing retry behavior or user-facing UI.Why
#783 needs recurring
terminated/UND_ERR_SOCKETand local已中断failures to be debuggable from exports instead of appearing as opaque failures. This PR is the first diagnostic foundation: it distinguishes success, external stream disconnects, unknown local scope closes, setup failures, and tool failures, while keeping exported diagnostics bounded and safe.Related Issue
Addresses part of #783. It does not close #783; follow-ups remain for watchdog/setup taxonomy wiring, lifecycle action provenance, and broader deterministic harness coverage.
Human Review Status
Pending
Review Focus
run_observabilitycaptures only safe control-flow/failure facts and no prompt/tool/body/path content.sanitizeSnapshotpreserve useful safe evidence likeUND_ERR_SOCKETwithout leaking sensitive data.Risk Notes
run_observability?: unknown.How To Verify
Screenshots or Recordings
Not applicable — no visible UI changes.
Checklist
bug,enhancement,task,documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.app,ui,platform,harness,ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.P0,P1,P2,P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.Pending,Approved by @<reviewer>, orNot required: <reason>(default isPending; "not required" is restricted to bot-authored low-risk PRs).dev, and my PR title and commit messages use Conventional Commits in English.