fix(agents): retry same model across short rate-limit windows #91911
Codex review: needs real behavior proof before merge. Reviewed June 10, 2026, 10:30 AM ET / 14:30 UTC.

Summary
PR surface: Source +106, Tests +229. Total +335 across 5 files.
Reproducibility: yes. from source: pass a rate-limit assistant error containing
Review metrics: none identified.

Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Retry the same model only when structured provider/runtime data or narrowly validated message text establishes a short throttle window, honor an available retry interval within a maintainer-approved bound, emit a distinct same-model-rate-limit observation, and retain immediate profile/model escalation for long-window or ambiguous quota failures.
Do we have a high-confidence way to reproduce the issue? Yes, from source: pass a rate-limit assistant error containing
Is this the best way to solve the issue? No, not in its current form. Same-model retry is a reasonable repair direction, but generic
Overall correctness: patch is incorrect
AGENTS.md: found and applied where relevant.
Codex review notes: reasoning high; reviewed against 4ecec2f9e2f8.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Port the rate-limit same-model retry subset from internal commit b954516f29a. Provider RPM caps are commonly minute-scale, so wait out the current provider/model/profile window before spending a profile rotation or model fallback. The retry budget is bounded and the existing profile/fallback path runs once it is exhausted. (cherry picked from commit b954516f29ac39ac178c7bf0e0818443e8f183ca)
Port the same-model rate-limit backoff tuning from internal commit 8590a90bde0. Use three deterministic linear retries (10s, 20s, 30s) so the bounded retry budget spans roughly one RPM window while giving early recovery a faster first attempt. (cherry picked from commit 8590a90bde0586ceb50dfa4a3b19f5055329e98d)
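The two commit messages above describe a bounded same-model retry with a linear 10s/20s/30s schedule, after which the existing profile/fallback path runs. A minimal sketch of that shape follows. Every name in it (`sameModelRetryDelayMs`, `runWithSameModelRetry`, `AttemptResult`, and the constants) is hypothetical and chosen for illustration; none is taken from the PR's source.

```typescript
// Hypothetical constants mirroring the schedule in the commit message:
// three retries, waiting 10s, 20s, then 30s.
const SAME_MODEL_RETRY_STEP_MS = 10_000;
const SAME_MODEL_MAX_RETRIES = 3;

/** Delay before the given 1-based retry, or undefined once the budget is spent. */
function sameModelRetryDelayMs(retry: number): number | undefined {
  if (!Number.isInteger(retry) || retry < 1 || retry > SAME_MODEL_MAX_RETRIES) {
    return undefined;
  }
  return retry * SAME_MODEL_RETRY_STEP_MS;
}

// Hypothetical result shape for one attempt against the current model.
type AttemptResult<T> =
  | { ok: true; value: T }
  | { ok: false; rateLimited: boolean };

/**
 * Retries the same model while the failure is a rate limit and the budget
 * allows it; otherwise hands off to the caller-supplied fallback path.
 */
async function runWithSameModelRetry<T>(
  attemptOnce: () => Promise<AttemptResult<T>>,
  fallback: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let retriesUsed = 0; ; retriesUsed++) {
    const result = await attemptOnce();
    if (result.ok) return result.value;

    // Only rate-limit failures are eligible for a same-model retry.
    const delayMs = result.rateLimited
      ? sameModelRetryDelayMs(retriesUsed + 1)
      : undefined;
    if (delayMs === undefined) return fallback();

    await sleep(delayMs);
  }
}
```

With these assumed constants the sketch makes at most four attempts and sleeps at most 10 + 20 + 30 = 60 seconds in total, which is the "roughly one RPM window" the commit message refers to. Injecting `sleep` keeps the loop testable without real timers.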
1908d7f to 0841b69
@clawsweeper re-review
🦞🧹 I asked ClawSweeper to review this item again.
Maintainer fix pushed to this PR branch.
GitHub reports this PR is mergeable and has no status checks attached to the current head.
…aw#91911)

Bound same-model rate-limit retries to explicit short-window signals or parsed short Retry-After values, honor Retry-After in the retry sleep, preserve zero-rotation fallback behavior, and record same-model rate-limit retries separately from profile rotations.

Verification:
- node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/assistant-failover.test.ts src/agents/embedded-agent-runner/run/helpers.test.ts
- Azure Crabbox cbx_bdb5a7807a1f / coral-shrimp: OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed
- .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
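The commit message above gates same-model retries on an explicit short-window signal or a parsed short Retry-After value, and uses that value as the sleep. The sketch below shows one way such a gate could look. The 60-second bound, the `shortWindowSignal` flag, and all function and type names are assumptions made for illustration, not the PR's actual helpers. The header parsing follows the two Retry-After forms HTTP defines (delay-seconds or an HTTP-date).

```typescript
// Assumed upper bound for a "short" window; the real bound is a maintainer decision.
const MAX_SAME_MODEL_WAIT_MS = 60_000;

/**
 * Parses a Retry-After value into milliseconds from `nowMs`.
 * Accepts delay-seconds ("12") or an HTTP-date; returns undefined otherwise.
 * Note: Date.parse is lenient about non-standard strings, so a stricter
 * date parser may be preferable in production code.
 */
function parseRetryAfterMs(
  value: string | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (value === undefined) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs); // dates in the past mean "retry now"
}

type RateLimitDecision =
  | { kind: "retry-same-model"; waitMs: number }
  | { kind: "escalate" };

// Hypothetical input shape; `shortWindowSignal` stands in for whatever
// structured provider/runtime data marks a short throttle window.
interface RateLimitInput {
  retryAfter?: string;
  shortWindowSignal?: boolean;
  defaultWaitMs: number;
  nowMs?: number;
}

function decideRateLimitAction(input: RateLimitInput): RateLimitDecision {
  const retryAfterMs = parseRetryAfterMs(input.retryAfter, input.nowMs);
  if (retryAfterMs !== undefined) {
    // A parsed Retry-After wins: honor it when short, escalate when long.
    return retryAfterMs <= MAX_SAME_MODEL_WAIT_MS
      ? { kind: "retry-same-model", waitMs: retryAfterMs }
      : { kind: "escalate" };
  }
  if (input.shortWindowSignal === true) {
    return { kind: "retry-same-model", waitMs: input.defaultWaitMs };
  }
  // Ambiguous quota failures keep the immediate profile/model escalation.
  return { kind: "escalate" };
}
```

In this sketch a long Retry-After overrides a short-window flag, so an hour-long quota window escalates immediately instead of consuming the retry budget. Whether the real code gives the header that precedence is not stated in the source.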
Verification
- node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/helpers.test.ts src/agents/embedded-agent-runner/run/assistant-failover.test.ts
- node scripts/run-oxlint.mjs src/agents/embedded-agent-runner/run.ts src/agents/embedded-agent-runner/run/assistant-failover.ts src/agents/embedded-agent-runner/run/assistant-failover.test.ts src/agents/embedded-agent-runner/run/helpers.ts src/agents/embedded-agent-runner/run/helpers.test.ts
- git diff --check upstream/main..HEAD
- .agents/skills/autoreview/scripts/autoreview --mode branch --base upstream/main: clean before the final conflict-free rebase; no review rerun after the final upstream rebase per maintainer instruction.