Skip to content

fix: scope reasoning safe retry timeouts by attempt#922

Merged
Astro-Han merged 7 commits into
devfrom
codex/i918-reasoning-safe-retry-timeouts
May 26, 2026
Merged

fix: scope reasoning safe retry timeouts by attempt#922
Astro-Han merged 7 commits into
devfrom
codex/i918-reasoning-safe-retry-timeouts

Conversation

@Astro-Han

@Astro-Han Astro-Han commented May 26, 2026

Copy link
Copy Markdown
Owner

Summary

Scope reasoning-model safe-retry connect timeouts by processor attempt instead of changing the global reasoning stream timeout.

This PR also updates the safe-recovery retry/failure copy to the agreed short English and Chinese wording.

Why

Reasoning-model streams currently keep the global 120s connect watchdog everywhere. That protects legitimate slow starts, but it also makes users wait two minutes before PawWork can safely recover from a no-output/no-tool stalled connection.

#914 made the before-progress no-output/no-tool boundary safe to retry. This follow-up applies the #918 two-attempt strategy: fail fast on the first stalled reasoning-model main response, then protect the one automatic safe retry with the previous 120s ceiling.

Related Issue

Closes #918

Human Review Status

Pending

Review Focus

Please review the processor-level timeout merge order and the safe-recovery presentation predicate. The important invariant is that only the session processor main-response recovery flow gets 60s -> 120s; title generation and other non-recovery llm.stream() callers still use the global reasoning timeout.

Risk Notes

The intentional behavior change is that retryable no-output/no-tool failures that exhaust the one automatic recovery retry now render the safe_retry_failed notice instead of a generic assistant error.

Platform checklist left unticked: no platform, packaging, updater, signing, shell, or native path behavior changed.

How To Verify

Dependency setup: bun install --frozen-lockfile completed in this worktree.
RED check: processor attempt-timeout test first failed with [120000, 120000] before implementation.
Processor tests: bun test --timeout 30000 test/session/processor-effect.test.ts -> 30 passed.
Run observability/provider tests: bun test --timeout 30000 test/session/run-observability.test.ts test/provider/transform.test.ts -t "RunObservability|ProviderTransform.streamTimeouts" -> 62 passed.
UI contract: bun test --timeout 30000 src/components/session-safe-retry-contract.test.ts -> 3 passed.
Typecheck: bun run typecheck in packages/opencode -> passed.
Typecheck: bun run typecheck in packages/ui -> passed.
Lint: bun run lint -> passed.
Visual snap: bun run snap safe-retry -> 1 passed; grid written to docs/design/preview/screenshots/safe-retry.png.
Diff check: git diff --check -> passed.

Screenshots or Recordings

Safe-retry snap target was checked with bun run snap safe-retry.

Checklist

How to use this checklist:

  • Tick a box by replacing [ ] with [x]. Do not edit, add, or remove items.
  • The bot-applied label items can only be honestly ticked AFTER the PR is opened and the labeler / priority-triage bots have run — return to the PR description and tick them then.
  • Most items are required. The few that are conditional are explicitly marked (conditional); for those, leave unticked if they truly do not apply and explain why in Risk Notes. All other items must be ticked before requesting human review.
  • Type label — this PR carries exactly one of bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
  • Routing labels — this PR carries at least one of app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
  • Priority label — this PR carries exactly one of P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
  • Human Review Status above is set to Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
  • I linked the related issue, or stated in Summary why there is no issue.
  • I described the review focus and any meaningful risks.
  • I replaced the example block in How To Verify with the real verification steps and the key result for each.
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope.
  • (conditional) I manually checked visible UI or copy changes when needed, with screenshots or recordings. Leave unticked only if no visible UI or copy changed.
  • (conditional) I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes. Leave unticked only if no platform/packaging surface was touched.
  • (conditional) I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant. Leave unticked only if none of those surfaces was touched.
  • I reviewed the final diff for unrelated changes and suspicious dependency changes.
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English.

Summary by CodeRabbit

  • Bug Fixes

    • Updated safe-retry error messaging to clarify when models aren't responding across multiple languages.
    • Improved timeout handling for reasoning-capable models during recovery attempts.
    • Enhanced safe recovery mechanism to handle additional failure scenarios.
  • Tests

    • Extended test coverage for model retry and timeout behaviors.
    • Added localization verification for retry messaging.

Review Change Stack

@Astro-Han Astro-Han added bug Something isn't working P2 Medium priority ui Design system and user interface harness Model harness, prompts, tool descriptions, and session mechanics labels May 26, 2026
@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

@Astro-Han, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 8 minutes and 22 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f16de5e4-1c49-433b-b91a-4cbe5cb5a5df

📥 Commits

Reviewing files that changed from the base of the PR and between 1768085 and 7b4c99c.

📒 Files selected for processing (6)
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/run-incident/policy.ts
  • packages/opencode/src/session/run-observability/boundary.ts
  • packages/opencode/src/session/run-observability/index.ts
  • packages/opencode/test/session/processor-effect.test.ts
  • packages/ui/src/components/session-safe-retry-contract.test.ts
📝 Walkthrough

Walkthrough

This PR implements faster recovery from stalled reasoning-model connections by introducing per-attempt connection timeout tuning. Reasoning model first attempts use a shorter 60-second timeout to fail fast, while retry attempts use the longer 120-second timeout to protect legitimate slow starts. It broadens safe recovery conditions and updates observability to record per-attempt timeout values, with corresponding i18n messaging updates.

Changes

Reasoning model connect timeout recovery

Layer / File(s) Summary
Timeout constants and per-attempt selection logic
packages/opencode/src/provider/transform.ts, packages/opencode/src/session/processor.ts
Exports REASONING_GLOBAL_CONNECT_TIMEOUT_MS (120s), REASONING_FIRST_ATTEMPT_CONNECT_TIMEOUT_MS (60s), and REASONING_SAFE_RETRY_CONNECT_TIMEOUT_MS (120s). Implements attemptStreamTimeouts() to select per-attempt timeout based on automaticStreamRetriesUsed.
Observability types for per-attempt timeout capture
packages/opencode/src/session/run-observability/types.ts, packages/opencode/src/session/run-observability/recorder.ts
Extends AttemptSummary and BeginAttemptInput with optional connect_timeout_ms / connectTimeoutMs fields and updates recorder to persist the timeout value for each attempt.
Per-attempt timeout wiring into stream execution
packages/opencode/src/session/processor.ts
Computes sessionTimeouts during each attempt and wires the resolved connectTimeoutMs to both ctx.runTrace.beginAttempt() and llm.stream() for enforcement and observability.
Safe recovery condition broadening
packages/opencode/src/session/processor.ts
Introduces beforeProgressSafeRetry condition and combines it with reasoningOnlySafeRetry into safeRecoveryRetry, then updates retry status messaging and safe-retry-failed notice gating to depend on the broader condition.
Processor effect test coverage
packages/opencode/test/session/processor-effect.test.ts, packages/opencode/test/provider/transform.test.ts
Adds reasoning-model test fixture, refactors environment construction for per-test LLM injection, replaces reasoning watchdog timeout assertions with per-attempt timeout scoping verification, and rewrites two retry tests to expect safe_retry_failed notices instead of error propagation.
i18n and e2e validation
packages/ui/src/i18n/en.ts, packages/ui/src/i18n/zh.ts, packages/ui/src/i18n/zht.ts, packages/ui/src/components/session-safe-retry-contract.test.ts, packages/app/e2e/snap/safe-retry.snap.ts
Updates safe-retry messaging from "network connection dropped" framing to "model temporarily not responding" framing across all three languages, adds i18n content verification test, and updates e2e snapshot expectations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Astro-Han/pawwork#758: Both PRs modify reasoning-model streaming connect-timeout behavior and introduce per-attempt timeout tuning via ProviderTransform.streamTimeouts() and connectTimeoutMs values in provider and processor modules.
  • Astro-Han/pawwork#729: Both PRs modify LLM stream connection-timeout handling; this PR propagates per-attempt connectTimeoutMs into llm.stream() and observability, while the related PR changes when the connect-timeout timer is armed.
  • Astro-Han/pawwork#558: Both PRs extend LLM stream connectTimeoutMs handling; the related PR adds connect-timeout enforcement in session/llm.ts, while this PR updates retry logic to pass per-attempt connectTimeoutMs and record it.

Poem

🐇 Fast connections for the swift and wise,
Shorter waits on first attempt rise,
Then longer timeouts for safe retry's grace—
Models' breath returns at a measured pace.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: scoping reasoning safe retry timeouts by attempt rather than globally.
Description check ✅ Passed The description is comprehensive with all required sections: Summary, Why, Related Issue, Human Review Status, Review Focus, Risk Notes, How To Verify, and a complete checklist with most items ticked.
Linked Issues check ✅ Passed The PR successfully implements the core requirements from #918: scopes reasoning-model connect timeouts by attempt (60s first attempt, 120s retry), preserves legitimate slow starts, limits scope to #914 recovery boundary, updates UI copy, and provides comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the stated objective: timeout constants, attempt-scoped timeout logic, observability recording, and agreed UI/copy updates. No unrelated refactors or dependencies introduced.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/i918-reasoning-safe-retry-timeouts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the app Application behavior and product flows label May 26, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the connection timeout logic for reasoning-capable models, introducing a fast-fail timeout of 60 seconds on the first attempt and a longer 120-second timeout on the safe retry. It also updates the copy for safe retries and failures to be less technical across English, Simplified Chinese, and Traditional Chinese. The reviewer feedback suggests extending the contract tests to also assert the Traditional Chinese (zht) translation changes, ensuring they do not regress.

Comment thread packages/ui/src/components/session-safe-retry-contract.test.ts
Comment thread packages/ui/src/components/session-safe-retry-contract.test.ts
@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown

Perf delta summary

Comparator: pass

Profile / Scenario interaction median interaction worst long task max tbt frame gap p95 frame gap max jank count cls status
default / homepage-cold 40 -> 24 (-16) 40 -> 56 (+16) 68 -> 71 (+3) 18 -> 21 (+3) 33.3 -> 16.8 (-16.5) 150.1 -> 150 (-0.1) 3 -> 3 (0) 0 -> 0 (0) pass
default / long-session-input-lag 48 -> 48 (0) 48 -> 56 (+8) 0 -> 0 (0) 0 -> 0 (0) 16.7 -> 16.7 (0) 16.7 -> 16.8 (+0.1) 0 -> 0 (0) 0 -> 0 (0) pass
default / session-streaming-long 48 -> 48 (0) 64 -> 64 (0) 0 -> 0 (0) 0 -> 0 (0) 16.8 -> 16.7 (-0.1) 33.4 -> 33.3 (-0.1) 0 -> 0 (0) 0 -> 0 (0) pass
default / tool-call-expand 40 -> 40 (0) 40 -> 40 (0) 0 -> 0 (0) 0 -> 0 (0) 16.8 -> 16.7 (-0.1) 16.8 -> 16.7 (-0.1) 0 -> 0 (0) 0 -> 0 (0) pass
default / tool-default-open-heavy-bash 24 -> 16 (-8) 24 -> 16 (-8) 84 -> 102 (+18) 47 -> 68 (+21) 50 -> 50.1 (+0.1) 116.7 -> 133.3 (+16.6) 3 -> 4 (+1) 0.004 -> 0.004 (0) pass
default / terminal-side-panel-open 48 -> 48 (0) 48 -> 48 (0) 0 -> 0 (0) 0 -> 0 (0) 33.3 -> 33.4 (+0.1) 33.4 -> 33.4 (0) 0 -> 0 (0) 0 -> 0 (0) pass
default / session-scroll-reading 16 -> 16 (0) 32 -> 16 (-16) 0 -> 0 (0) 0 -> 0 (0) 16.8 -> 16.7 (-0.1) 16.8 -> 16.7 (-0.1) 0 -> 0 (0) 0 -> 0 (0) pass

@Astro-Han Astro-Han merged commit 9d1b7a8 into dev May 26, 2026
27 checks passed
Astro-Han added a commit that referenced this pull request May 26, 2026
## Summary

- Added a `retry-decision` metadata layer for model execution retry decisions.
- Kept technical retryability, safety-gate output, stream attempts, safe-recovery replay budget, timeout policy, and presentation as separate facts.
- Lightly wired `session.processor` to consume the new decision metadata without moving the existing retry loop or changing #922 behavior.
- Addressed review feedback by using `modelStreamAttempt` instead of provider-budget wording and by representing blocked-boundary reasoning attempts as `reasoning_global_protected`.

## Verification

- `bun test test/session/retry-decision.test.ts` -> 6 passed.
- `bun test test/session/retry.test.ts` -> 32 passed.
- `bun test test/session/processor-effect.test.ts` -> 33 passed.
- `bun run typecheck` in `packages/opencode` -> passed.
- PR CI passed: typecheck, lint, unit-app, unit-ui, unit-opencode, unit-desktop, codeql, dependency-review, desktop-smoke, e2e-artifacts, commit/title/triage checks.

## Related

Part of #925.
Astro-Han added a commit that referenced this pull request May 26, 2026
## Summary
- Extract safe-recovery replay safety into RunIncident.evaluateReplaySafety so the gate owns only replay permission, recovery mode, and blocked reason.
- Move side-effect boundary snapshot construction into RunObservability and keep processor retry scheduling unchanged.
- Keep retry attempt metadata and presentation mapping in retry-decision, preserving #922 safe-retry behavior.

## Verification
- bun test test/session/run-incident-safety-gate.test.ts test/session/retry-decision.test.ts: 10 passed
- bun test test/session/run-observability-boundary.test.ts test/session/run-incident-safety-gate.test.ts test/session/retry-decision.test.ts: 11 passed
- bun test test/session/run-observability.test.ts: 63 passed
- bun test test/session/processor-effect.test.ts: 33 passed
- bun run typecheck: ok
- PR CI: green, including unit-opencode and smoke-macos-arm64

## Review Follow-up
- Addressed Gemini import/test coverage comments.
- Addressed P3 layer-boundary feedback by keeping presentation and attempt kind outside the safety gate.

Closes part of #925.
Astro-Han added a commit that referenced this pull request May 26, 2026
Summary:\n- Add a dedicated safe-recovery retry policy with separate one-replay budget metadata and lightweight retry presentation.\n- Route session.processor safe-recovery status/backoff scheduling through that policy instead of local retry status and sleep mechanics.\n- Preserve #922 reasoning timeout behavior, lifecycle-close checks, safe_retry_failed notice behavior, and visible-output replay guards.\n\nReview follow-up:\n- Removed the remaining processor-local sleep so the retry policy step owns the backoff delay.\n- Added policy-level coverage for the safe-recovery budget exhaustion branch and tightened it to assert schedule completion rather than generic failure.\n- Verified Cause.done exists in the current Effect runtime; the review concern was a false positive on API existence, but it exposed a useful test-strengthening gap.\n\nCI recovery note:\n- The initial empty retrigger commit did not create pull_request workflow runs. Two temporary marker commits were used to trigger the missing required checks; they leave no final PR diff and are intentionally collapsed by this squash merge.\n\nVerification:\n- TDD RED: safeRecoveryPolicy initially missing; focused test failed for that reason.\n- bun test test/session/retry.test.ts test/session/retry-decision.test.ts test/session/run-incident-safety-gate.test.ts: 44 passed.\n- bun test test/session/processor-effect.test.ts -t "reasoning connect watchdog is attempt-scoped for before-progress safe retry": passed.\n- bun test test/session/processor-effect.test.ts -t "reasoning-only retry writes a notice after the one safe retry is exhausted": passed.\n- bun test test/session/processor-effect.test.ts -t "connect timeout auto retry stops if lifecycle closes during backoff": passed.\n- bun test test/session/processor-effect.test.ts -t "retryable stream error after visible output does not replay the assistant message": passed before the final review follow-up.\n- bun run typecheck: passed.\n- git diff --check: passed.\n- PR CI required checks: passed after rerun.\n\nRelated:\n- Closes part of #925.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

app Application behavior and product flows bug Something isn't working harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority ui Design system and user interface

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Recover faster from stalled reasoning-model connections before safe retry

1 participant