fix: cap retry-after delay to prevent silent multi-hour hangs#10384
fix: cap retry-after delay to prevent silent multi-hour hangs#10384NgoQuocViet2001 wants to merge 2 commits into
Conversation
When a server returns a large retry-after value (e.g. 3 hours from Gemini quota exhaustion), the retry decorator would silently wait for the entire duration. Add a maxRetryAfter option (default 60s) that throws immediately when the retry-after delay exceeds the cap, letting callers surface the error to users instead of hanging. Fixes cline#10139
|
Greptile SummaryThis PR adds a Confidence Score: 5/5Safe to merge; all remaining feedback is P2 and does not affect production behaviour. The production change in retry.ts is minimal, correct, and covered by tests. The only finding is a pre-existing test design issue (Unix-timestamp path not actually exercised) that was made more visible by this PR but does not affect the new feature. src/core/api/retry.test.ts – the Unix-timestamp test case should be updated to use a future timestamp or mock Date.now so it covers the intended code path.
|
| Filename | Overview |
|---|---|
| src/core/api/retry.ts | Adds maxRetryAfter cap (default 60 s) that throws immediately when a retry-after header would delay beyond the limit; logic is correct, parseInt → Number.parseInt and redundant type annotation cleaned up. |
| src/core/api/retry.test.ts | Two new test cases for the cap feature are correct; the existing Unix-timestamp test was patched with maxRetryAfter: Infinity but now silently exercises the delta-seconds code path instead of the Unix-timestamp one. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[429 / RetriableError caught] --> B{isLastAttempt?}
B -- Yes --> Z[throw error]
B -- No --> C{retry-after header present?}
C -- No --> D[Exponential backoff delay\ncapped at maxDelay]
C -- Yes --> E[Parse retry-after value]
E --> F{Is it a Unix timestamp?}
F -- Yes --> G[delay = timestamp × 1000 − Date.now]
F -- No --> H[delay = value × 1000]
G --> I{delay > maxRetryAfter?}
H --> I
I -- Yes --> Z
I -- No --> J[Call onRetryAttempt callback]
D --> J
J --> K[await setTimeout delay]
K --> L[Retry attempt]
Comments Outside Diff (1)
-
src/core/api/retry.test.ts, line 112-146 (link)Unix timestamp test no longer tests Unix timestamp path
The
fixedDateis2010-01-01, givingretryTimestamp ≈ 1,262,304,000. In 2026,Date.now() / 1000 ≈ 1,745,000,000, so the conditionretryValue > Date.now() / 1000evaluates to false — the code takes the delta-seconds branch, not the Unix-timestamp branch. The test passes numerically (delta-seconds delay of1262304000 * 1000 === fixedDate.getTime()), but it isn't covering the intended path anymore.Adding
maxRetryAfter: Number.POSITIVE_INFINITYwas necessary to avoid the new cap kicking in, but it also highlights that the test silently stopped testing Unix-timestamp behaviour some time ago. Consider using a future timestamp (or mockingDate.now) so the Unix path is actually exercised.Prompt To Fix With AI
This is a comment left during a code review. Path: src/core/api/retry.test.ts Line: 112-146 Comment: **Unix timestamp test no longer tests Unix timestamp path** The `fixedDate` is `2010-01-01`, giving `retryTimestamp ≈ 1,262,304,000`. In 2026, `Date.now() / 1000 ≈ 1,745,000,000`, so the condition `retryValue > Date.now() / 1000` evaluates to **false** — the code takes the delta-seconds branch, not the Unix-timestamp branch. The test passes numerically (delta-seconds delay of `1262304000 * 1000 === fixedDate.getTime()`), but it isn't covering the intended path anymore. Adding `maxRetryAfter: Number.POSITIVE_INFINITY` was necessary to avoid the new cap kicking in, but it also highlights that the test silently stopped testing Unix-timestamp behaviour some time ago. Consider using a future timestamp (or mocking `Date.now`) so the Unix path is actually exercised. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/core/api/retry.test.ts
Line: 112-146
Comment:
**Unix timestamp test no longer tests Unix timestamp path**
The `fixedDate` is `2010-01-01`, giving `retryTimestamp ≈ 1,262,304,000`. In 2026, `Date.now() / 1000 ≈ 1,745,000,000`, so the condition `retryValue > Date.now() / 1000` evaluates to **false** — the code takes the delta-seconds branch, not the Unix-timestamp branch. The test passes numerically (delta-seconds delay of `1262304000 * 1000 === fixedDate.getTime()`), but it isn't covering the intended path anymore.
Adding `maxRetryAfter: Number.POSITIVE_INFINITY` was necessary to avoid the new cap kicking in, but it also highlights that the test silently stopped testing Unix-timestamp behaviour some time ago. Consider using a future timestamp (or mocking `Date.now`) so the Unix path is actually exercised.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix: cap retry-after delay to prevent si..." | Re-trigger Greptile
Use a stubbed Date.now so retryValue > Date.now()/1000 is true, ensuring the test actually covers the Unix-timestamp branch instead of silently falling into the delta-seconds branch. Addresses greptile P2 review feedback.
|
No changeset needed — per project guidelines, contributors don't create changelog entries. Maintainers handle versioning and changelog curation during the release process. |
|
Thanks for the thorough review! The P2 feedback about the Unix timestamp test is spot on — addressed in 65b1c5e. The test now stubs |
- OpenHands/OpenHands#14101 merge-after-nits (pending message dual task-id encoding) - charmbracelet/crush#2702 needs-discussion (super yollo: split yolo into two modes) - cline/cline#10384 merge-after-nits (cap retry-after delay at 60s default) INDEX: +8 entries under All-Hands-AI/OpenHands, anomalyco/opencode (sst), BerriAI/litellm, charmbracelet/crush, cline/cline, continuedev/continue, openai/codex. Verdict mix: 4 merge-as-is, 2 merge-after-nits, 2 needs-discussion, 1 request-changes.
- aaif-goose/goose#8842: lifecycle hooks system (needs-discussion: trust model + timeout + error taxonomy) - cline/cline#10384: cap retry-after delay to prevent silent multi-hour hangs - INDEX.md: drip-72 entry, count bumped to drip-72
- QwenLM/qwen-code#3642: BackgroundShellRegistry + /bashes (merge-after-nits) - cline/cline#10384: maxRetryAfter cap with test-cleanup hazard (request-changes) - aaif-goose/goose#8836: drop -i flag SIGTTOU fix (merge-as-is)
Summary
When a server returns a large
retry-afterheader value (e.g., 3 hours from Gemini quota exhaustion), thewithRetrydecorator silently waits for the full duration instead of surfacing the error. This causes multi-hour hangs with no user feedback.This PR adds a
maxRetryAfteroption (default: 60 seconds) to the retry decorator. When the computed retry delay from aretry-afterheader exceeds this cap, the error is thrown immediately instead of waiting, letting callers surface it to the user.Fixes #10139
Changes
src/core/api/retry.ts: AddedmaxRetryAftertoRetryOptionsinterface and defaults (60s). After computing delay fromretry-afterheader, throws immediately ifdelay > maxRetryAfter.src/core/api/retry.test.ts: Added two test cases:maxRetryAfter: Infinitysince its synthetic timestamp produces a delay that exceeds the default capTest plan
npx mocha --no-config src/core/api/retry.test.ts)