
fix(session): refine transport interruption phases#838

Merged
Astro-Han merged 8 commits into dev from pawwork/issue-803-run-diagnostics-2 on May 21, 2026

Conversation

@Astro-Han Astro-Han commented May 21, 2026

Owner

Summary

  • Refines RunIncident provider transport diagnostics when streaming interrupts after tool execution has started.
  • Tracks attempt-level tool execution completion so that a transport disconnect after a tool result derives after_tool_result and post_tool instead of a pre-execution phase.
  • Adds regression coverage for post-materialization transport failures without changing recovery UX/API behavior.
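
As a rough illustration of the classification described above, the following is a minimal sketch, not the code in derive.ts. The names after_tool_call_before_execution, after_tool_result, unknown_stream_phase, tool_execution_started, and tool_execution_completed come from this PR; the phase name during_tool_execution, the tool_call_materialized flag, and the function name are assumptions made up for the example. The companion post_tool value is left out because this description does not say which field carries it.

```typescript
// Phase names are taken from the PR text, except "during_tool_execution",
// which is a hypothetical placeholder: the PR does not name that phase.
type ProviderPhase =
  | "unknown_stream_phase"
  | "after_tool_call_before_execution"
  | "during_tool_execution"
  | "after_tool_result";

// Attempt-scoped flags. tool_call_materialized is a hypothetical name.
interface AttemptToolState {
  tool_call_materialized: boolean;
  tool_execution_started: boolean;
  tool_execution_completed: boolean;
}

// Checks the most advanced evidence first, so a completed tool can never
// be reported as a pre-execution phase.
function deriveTransportPhase(state: AttemptToolState): ProviderPhase {
  if (state.tool_execution_completed) return "after_tool_result";
  if (state.tool_execution_started) return "during_tool_execution";
  if (state.tool_call_materialized) return "after_tool_call_before_execution";
  return "unknown_stream_phase";
}
```

Checking the most advanced flag first is the design choice that matters here: the earlier flags stay true once set, so testing them in the opposite order would reproduce the misclassification this PR fixes.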

Why

Issue #803 requires provider/transport streaming interruptions to distinguish tool-call generation from tool execution. The landed Run Incident Framework already fixed partial tool-input disconnects; this follow-up closes the remaining phase gap where a transport failure after tool execution could still look like after_tool_call_before_execution.

Related Issue

Closes #803.
Part of #808.

Human Review Status

Pending

Review Focus

  • Whether transport failures after tool_execution_started no longer report a pre-execution provider phase.
  • Whether tool_execution_completed is tracked at attempt scope and only upgrades completed-tool transport failures to after_tool_result / post_tool.
  • Whether the change remains diagnostic-only and does not alter the recovery UX/API/idempotency behavior tracked in #804 ([Feature] Add safe recovery for interrupted streaming runs).

Risk Notes

Latest review fix: terminal phase derivation now ignores same-attempt evidence recorded after the terminal event, and processor stream/tool execution diagnostics stay bound to the stream event or tool-call attempt instead of a later current attempt.

Latest follow-up review fix: halt/interrupt transport failures now record against the process-local failing attempt instead of mutable currentAttemptID, and same-monotonic phase truncation uses evidence order as a tie-breaker.
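
The attempt-binding part of this fix can be pictured with a small sketch. It is only an illustration under assumptions: currentAttemptID is the identifier named above, while AttemptTracker, bindFailureReporter, and the TransportFailure shape are hypothetical and do not come from the processor or recorder code.

```typescript
// Hypothetical record shape for a transport failure.
interface TransportFailure {
  attempt_id: string;
  cause: string;
}

// Hypothetical tracker whose currentAttemptID changes as retries begin.
class AttemptTracker {
  currentAttemptID = "attempt-1";
  readonly failures: TransportFailure[] = [];

  // Returns a reporter bound to the attempt that is current when the
  // stream starts, not the one that is current when the failure fires.
  bindFailureReporter(): (cause: string) => void {
    const failingAttemptID = this.currentAttemptID; // captured once
    return (cause: string) => {
      this.failures.push({ attempt_id: failingAttemptID, cause });
    };
  }
}

const tracker = new AttemptTracker();
const report = tracker.bindFailureReporter();
tracker.currentAttemptID = "attempt-2"; // a later attempt has started
report("transport_disconnect");
// The failure is still attributed to "attempt-1".
```

Reading currentAttemptID inside the callback would attribute the failure to whichever attempt happens to be current later, which is the drift this fix removes.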

Latest tie-break review fix: terminal phase ordering now treats monotonic time as primary, uses evidence order only when monotonic times are equal, and transport cause facts are derived at the failure timestamp.

Latest lifecycle terminal review fix: terminal phase derivation now applies terminal-time evidence truncation even when the terminal event has no attempt_id, so lifecycle closes cannot be upgraded by later tool completion evidence.
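
The truncation and tie-break rules from the notes above can be sketched as follows. This is a hedged illustration, not the derive.ts implementation: attempt_id is the field named in this PR, while the Evidence shape, monotonic_ms, order, and both function names are assumptions introduced for the example.

```typescript
// Hypothetical evidence record. `order` is the position in the evidence log.
interface Evidence {
  kind: string;
  monotonic_ms: number;
  order: number;
  attempt_id?: string;
}

// Monotonic time is primary; evidence order only breaks exact ties.
function isAtOrBeforeTerminal(e: Evidence, terminal: Evidence): boolean {
  if (e.monotonic_ms !== terminal.monotonic_ms) {
    return e.monotonic_ms < terminal.monotonic_ms;
  }
  return e.order <= terminal.order;
}

// Keeps only evidence that may influence the terminal phase.
function evidenceForTerminalPhase(all: Evidence[], terminal: Evidence): Evidence[] {
  return all.filter((e) => {
    // Scope to one attempt only when the terminal event names an attempt.
    if (terminal.attempt_id !== undefined && e.attempt_id !== terminal.attempt_id) {
      return false;
    }
    // The time cut applies even when the terminal event has no attempt_id.
    return isAtOrBeforeTerminal(e, terminal);
  });
}
```

For example, a tool_execution_completed record that shares the terminal event's monotonic time but sits later in the log is dropped, so it cannot upgrade the phase to after_tool_result.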

  • Diagnostic-only change: no recovery card, Continue/Resume API, auto-retry, provider SDK behavior, or user-visible recovery copy changed.
  • Privacy posture is unchanged: no raw prompt, raw tool input, raw provider payload, local paths, or secrets are added to incidents or exports.
  • No visible UI changed, so no snap/manual UI check was run.
  • Platform impact is limited to diagnostic classification data in opencode session observability.

How To Verify

Red/green TDD:
- The new regression test for a transport failure after tool execution start initially failed because the incident was classified as after_tool_call_before_execution; it passed after the fix.
- The new regression test for a transport failure after tool execution completion initially failed because the incident was classified as unknown_stream_phase; it passed after the fix.

Final targeted checks:
- bun test test/session/run-observability.test.ts test/session/processor-effect.test.ts --timeout 30000 → 68 pass, 0 fail
- bun run typecheck from packages/opencode → passed
- git diff --check → passed

Screenshots or Recordings

Not applicable — no visible UI change.

Checklist

How to use this checklist:

  • Tick a box by replacing [ ] with [x]. Do not edit, add, or remove items.
  • The bot-applied label items can only be honestly ticked AFTER the PR is opened and the labeler / priority-triage bots have run — return to the PR description and tick them then.
  • Most items are required. The few that are conditional are explicitly marked (conditional); for those, leave unticked if they truly do not apply and explain why in Risk Notes. All other items must be ticked before requesting human review.
  • Type label — this PR carries exactly one of bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
  • Routing labels — this PR carries at least one of app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
  • Priority label — this PR carries exactly one of P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
  • Human Review Status above is set to Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
  • I linked the related issue, or stated in Summary why there is no issue.
  • I described the review focus and any meaningful risks.
  • I replaced the example block in How To Verify with the real verification steps and the key result for each.
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope.
  • (conditional) I manually checked visible UI or copy changes when needed, with screenshots or recordings. Leave unticked only if no visible UI or copy changed.
  • (conditional) I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes. Leave unticked only if no platform/packaging surface was touched.
  • (conditional) I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant. Leave unticked only if none of those surfaces was touched.
  • I reviewed the final diff for unrelated changes and suspicious dependency changes.
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English.

coderabbitai Bot commented May 21, 2026

Contributor

Warning

Rate limit exceeded

@Astro-Han has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 57 minutes and 54 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d0a79dc2-9816-4f67-9e3b-f46c4a10ca8f

📥 Commits

Reviewing files that changed from the base of the PR and between 7742021 and d5c859c.

📒 Files selected for processing (6)
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/run-incident/derive.ts
  • packages/opencode/src/session/run-observability/recorder.ts
  • packages/opencode/src/session/run-observability/types.ts
  • packages/opencode/test/session/processor-effect.test.ts
  • packages/opencode/test/session/run-observability.test.ts

@github-actions github-actions Bot added the harness label (Model harness, prompts, tool descriptions, and session mechanics) on May 21, 2026
@Astro-Han Astro-Han added the bug (Something isn't working) and P2 (Medium priority) labels on May 21, 2026

@github-actions github-actions Bot left a comment

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces tracking for the completion of tool execution within the observability and incident derivation modules. Key changes include adding a tool_execution_completed flag to the attempt state and summary types, updating the recorder to capture this state, and refining the incident phase derivation logic to correctly categorize events occurring after tool execution. Additionally, new test cases verify that transport failures are accurately classified based on whether tool execution has started or completed. I have no feedback to provide.

@Astro-Han

Owner Author

@coderabbitai review

coderabbitai Bot commented May 21, 2026

Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Astro-Han Astro-Han merged commit b320787 into dev May 21, 2026
27 checks passed
@Astro-Han Astro-Han deleted the pawwork/issue-803-run-diagnostics-2 branch May 21, 2026 23:17

Labels

  • bug (Something isn't working)
  • harness (Model harness, prompts, tool descriptions, and session mechanics)
  • P2 (Medium priority)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Refine run diagnostics for interrupted streaming tool calls

1 participant