fix: allow safe retry before provider progress#914
Conversation
|
Warning Review limit reached
More reviews will be available in 53 minutes and 2 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR fixes a production bug where safe retries were blocked by unknown exposed-tool boundaries even when no tools were called. It adds an early eligibility path in the incident retry policy that allows automatic retry for retryable transport failures occurring before first provider progress when no output or tool activity has been observed, regardless of incomplete boundary metadata. ChangesBefore-progress auto-retry eligibility
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces logic to allow automatic retries before the first provider progress occurs, provided there has been no visible output or tool activity. It adds the helper function canAutoRetryBeforeFirstProviderProgress along with several supporting checks in policy.ts, and updates the recovery policy evaluation order. Additionally, comprehensive unit tests have been added in run-observability.test.ts to validate these new retry conditions and ensure they fail closed when output, tool activity, or external boundaries are detected. There are no review comments, so I have no further feedback to provide.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/opencode/test/session/run-observability.test.ts (1)
1067-1105: ⚡ Quick winBoundary fail-closed coverage is currently confounded by incomplete-facts fallback.
In Line 1068 case setup, provider/external-boundary scenarios inherit
side_effect_facts_complete: false, so these assertions can pass without proving boundary evidence itself blocks auto-retry. Add variants withside_effect_facts_complete: trueto lock this behavior directly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/test/session/run-observability.test.ts` around lines 1067 - 1105, The provider/external-boundary cases in the test rely on beforeProgressFacts() defaulting to side_effect_facts_complete: false, so the assertions don't prove boundary evidence itself blocks auto-retry; update the test by adding variants for the two boundary cases that set side_effect_facts_complete: true inside the overrides (i.e., add entries where the override includes side_effect_facts_complete: true alongside side_effect_boundary_snapshot with provider_executed_capability_present or external_boundary_present and their proof_reason) and then call recoveryForBeforeProgress(overrides) and assert the decision does not match { recommendation: "auto_retry_once" } to lock the behavior when facts are complete.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/src/session/run-incident/policy.ts`:
- Around line 27-47: The before-progress causes were excluded only from the
reasoning-only branch but not from the generic transport auto-retry branch,
allowing auto_retry_once to be returned incorrectly; update the transport
auto-retry condition (the branch that currently checks retryableTransport &&
noToolActivity && !isBeforeFirstProviderProgressCause(input.cause) &&
terminalFacts.reasoning_output_started — or if that exact check is in a
different branch, add the missing guard) to explicitly check for and exclude
before-first-provider-progress causes using
isBeforeFirstProviderProgressCause(input.cause) (or reuse
canAutoRetryBeforeFirstProviderProgress) so that any branch that returns
auto_retry_once also requires the before-progress gate to have passed.
---
Nitpick comments:
In `@packages/opencode/test/session/run-observability.test.ts`:
- Around line 1067-1105: The provider/external-boundary cases in the test rely
on beforeProgressFacts() defaulting to side_effect_facts_complete: false, so the
assertions don't prove boundary evidence itself blocks auto-retry; update the
test by adding variants for the two boundary cases that set
side_effect_facts_complete: true inside the overrides (i.e., add entries where
the override includes side_effect_facts_complete: true alongside
side_effect_boundary_snapshot with provider_executed_capability_present or
external_boundary_present and their proof_reason) and then call
recoveryForBeforeProgress(overrides) and assert the decision does not match {
recommendation: "auto_retry_once" } to lock the behavior when facts are
complete.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b027bff3-d429-4c58-8165-d75f5e3b6404
📒 Files selected for processing (2)
packages/opencode/src/session/run-incident/policy.tspackages/opencode/test/session/run-observability.test.ts
Summary
Why
Issue #912 showed a production session where the provider returned
Service Unavailablebefore any provider progress, assistant output, tool input, tool call, or tool execution. PawWork still asked the user before retrying because unknown exposed tool metadata made the static side-effect boundary proof incomplete. Retry safety should be based on what the terminal attempt actually reached; unknown metadata for uncalled tools should not block this early safe-retry case.Related Issue
Closes #912
Human Review Status
Pending— waiting for a human reviewer to approve.Review Focus
canAutoRetryBeforeFirstProviderProgresspredicate inpackages/opencode/src/session/run-incident/policy.ts, especially the fail-closed guards for missing snapshots, global lifecycle/user cancel, provider/external boundaries, and contradictory output/tool facts.packages/opencode/test/session/run-observability.test.tsto ensure the fast path only applies to truly early no-output/no-tool failures.Risk Notes
The behavior change is intentionally narrow, but it affects retry policy for model transport failures. The main risk is over-retrying if evidence is incomplete; the fast path requires a recorded boundary snapshot and explicit absence of provider/external boundaries to mitigate that.
Skipped conditional checklist items:
How To Verify
Screenshots or Recordings
Not applicable; no visible UI changes.
Checklist
bug,enhancement,task,documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.app,ui,platform,harness,ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.P0,P1,P2,P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.Pending,Approved by @<reviewer>, orNot required: <reason>(default isPending; "not required" is restricted to bot-authored low-risk PRs).dev, and my PR title and commit messages use Conventional Commits in English.Summary by CodeRabbit
Bug Fixes
Tests