Skip to content

fix: cap retry-after delay to prevent silent multi-hour hangs#10384

Open
NgoQuocViet2001 wants to merge 2 commits into
cline:mainfrom
NgoQuocViet2001:ai/fix-retry-after-cap
Open

fix: cap retry-after delay to prevent silent multi-hour hangs#10384
NgoQuocViet2001 wants to merge 2 commits into
cline:mainfrom
NgoQuocViet2001:ai/fix-retry-after-cap

Conversation

@NgoQuocViet2001

Copy link
Copy Markdown

Summary

When a server returns a large retry-after header value (e.g., 3 hours from Gemini quota exhaustion), the withRetry decorator silently waits for the full duration instead of surfacing the error. This causes multi-hour hangs with no user feedback.

This PR adds a maxRetryAfter option (default: 60 seconds) to the retry decorator. When the computed retry delay from a retry-after header exceeds this cap, the error is thrown immediately instead of waiting, letting callers surface it to the user.

Fixes #10139

Changes

  • src/core/api/retry.ts: Added maxRetryAfter to RetryOptions interface and defaults (60s). After computing delay from retry-after header, throws immediately if delay > maxRetryAfter.
  • src/core/api/retry.test.ts: Added two test cases:
    • Verifies immediate throw when retry-after (3 hours) exceeds the cap
    • Verifies normal retry when retry-after (10ms) is within the cap
    • Updated existing Unix timestamp test to pass maxRetryAfter: Infinity since its synthetic timestamp produces a delay that exceeds the default cap

Test plan

  • All 10 retry decorator tests pass (npx mocha --no-config src/core/api/retry.test.ts)
  • Pre-commit hooks (biome lint) pass
  • Existing behavior preserved: exponential backoff, maxDelay, maxRetries, non-429 errors all unchanged
  • New guard only activates when retry-after header produces a delay > maxRetryAfter

When a server returns a large retry-after value (e.g. 3 hours from
Gemini quota exhaustion), the retry decorator would silently wait
for the entire duration. Add a maxRetryAfter option (default 60s)
that throws immediately when the retry-after delay exceeds the cap,
letting callers surface the error to users instead of hanging.

Fixes cline#10139
@changeset-bot

changeset-bot Bot commented Apr 24, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: 65b1c5e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@greptile-apps

greptile-apps Bot commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a maxRetryAfter guard (default 60 s) to the withRetry decorator: when the retry-after response header specifies a delay longer than the cap, the error is re-thrown immediately rather than sleeping for hours. The core implementation in retry.ts is correct and well-tested.

Confidence Score: 5/5

Safe to merge; all remaining feedback is P2 and does not affect production behaviour.

The production change in retry.ts is minimal, correct, and covered by tests. The only finding is a pre-existing test design issue (Unix-timestamp path not actually exercised) that was made more visible by this PR but does not affect the new feature.

src/core/api/retry.test.ts – the Unix-timestamp test case should be updated to use a future timestamp or mock Date.now so it covers the intended code path.

Important Files Changed

Filename Overview
src/core/api/retry.ts Adds maxRetryAfter cap (default 60 s) that throws immediately when a retry-after header would delay beyond the limit; logic is correct, parseInt → Number.parseInt and redundant type annotation cleaned up.
src/core/api/retry.test.ts Two new test cases for the cap feature are correct; the existing Unix-timestamp test was patched with maxRetryAfter: Infinity but now silently exercises the delta-seconds code path instead of the Unix-timestamp one.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[429 / RetriableError caught] --> B{isLastAttempt?}
    B -- Yes --> Z[throw error]
    B -- No --> C{retry-after header present?}
    C -- No --> D[Exponential backoff delay\ncapped at maxDelay]
    C -- Yes --> E[Parse retry-after value]
    E --> F{Is it a Unix timestamp?}
    F -- Yes --> G[delay = timestamp × 1000 − Date.now]
    F -- No --> H[delay = value × 1000]
    G --> I{delay > maxRetryAfter?}
    H --> I
    I -- Yes --> Z
    I -- No --> J[Call onRetryAttempt callback]
    D --> J
    J --> K[await setTimeout delay]
    K --> L[Retry attempt]
Loading

Comments Outside Diff (1)

  1. src/core/api/retry.test.ts, line 112-146 (link)

    P2 Unix timestamp test no longer tests Unix timestamp path

    The fixedDate is 2010-01-01, giving retryTimestamp ≈ 1,262,304,000. In 2026, Date.now() / 1000 ≈ 1,745,000,000, so the condition retryValue > Date.now() / 1000 evaluates to false — the code takes the delta-seconds branch, not the Unix-timestamp branch. The test passes numerically (delta-seconds delay of 1262304000 * 1000 === fixedDate.getTime()), but it isn't covering the intended path anymore.

    Adding maxRetryAfter: Number.POSITIVE_INFINITY was necessary to avoid the new cap kicking in, but it also highlights that the test silently stopped testing Unix-timestamp behaviour some time ago. Consider using a future timestamp (or mocking Date.now) so the Unix path is actually exercised.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/core/api/retry.test.ts
    Line: 112-146
    
    Comment:
    **Unix timestamp test no longer tests Unix timestamp path**
    
    The `fixedDate` is `2010-01-01`, giving `retryTimestamp ≈ 1,262,304,000`. In 2026, `Date.now() / 1000 ≈ 1,745,000,000`, so the condition `retryValue > Date.now() / 1000` evaluates to **false** — the code takes the delta-seconds branch, not the Unix-timestamp branch. The test passes numerically (delta-seconds delay of `1262304000 * 1000 === fixedDate.getTime()`), but it isn't covering the intended path anymore.
    
    Adding `maxRetryAfter: Number.POSITIVE_INFINITY` was necessary to avoid the new cap kicking in, but it also highlights that the test silently stopped testing Unix-timestamp behaviour some time ago. Consider using a future timestamp (or mocking `Date.now`) so the Unix path is actually exercised.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/core/api/retry.test.ts
Line: 112-146

Comment:
**Unix timestamp test no longer tests Unix timestamp path**

The `fixedDate` is `2010-01-01`, giving `retryTimestamp ≈ 1,262,304,000`. In 2026, `Date.now() / 1000 ≈ 1,745,000,000`, so the condition `retryValue > Date.now() / 1000` evaluates to **false** — the code takes the delta-seconds branch, not the Unix-timestamp branch. The test passes numerically (delta-seconds delay of `1262304000 * 1000 === fixedDate.getTime()`), but it isn't covering the intended path anymore.

Adding `maxRetryAfter: Number.POSITIVE_INFINITY` was necessary to avoid the new cap kicking in, but it also highlights that the test silently stopped testing Unix-timestamp behaviour some time ago. Consider using a future timestamp (or mocking `Date.now`) so the Unix path is actually exercised.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: cap retry-after delay to prevent si..." | Re-trigger Greptile

Use a stubbed Date.now so retryValue > Date.now()/1000 is true,
ensuring the test actually covers the Unix-timestamp branch instead
of silently falling into the delta-seconds branch.

Addresses greptile P2 review feedback.
@NgoQuocViet2001

Copy link
Copy Markdown
Author

No changeset needed — per project guidelines, contributors don't create changelog entries. Maintainers handle versioning and changelog curation during the release process.

@NgoQuocViet2001

Copy link
Copy Markdown
Author

Thanks for the thorough review! The P2 feedback about the Unix timestamp test is spot on — addressed in 65b1c5e.

The test now stubs Date.now to a fixed value (1700000000000) and sets retryTimestamp = fakeNow / 1000 + 1 (1 second in the future), ensuring retryValue > Date.now() / 1000 is true and the Unix-timestamp branch is actually exercised. The maxRetryAfter: Infinity workaround is no longer needed since the computed delay is just 1000ms, well within the default cap.

Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request Apr 25, 2026
- OpenHands/OpenHands#14101 merge-after-nits (pending message dual task-id encoding)
- charmbracelet/crush#2702 needs-discussion (super yollo: split yolo into two modes)
- cline/cline#10384 merge-after-nits (cap retry-after delay at 60s default)

INDEX: +8 entries under All-Hands-AI/OpenHands, anomalyco/opencode (sst),
BerriAI/litellm, charmbracelet/crush, cline/cline, continuedev/continue,
openai/codex. Verdict mix: 4 merge-as-is, 2 merge-after-nits, 2 needs-discussion,
1 request-changes.
Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request Apr 26, 2026
- aaif-goose/goose#8842: lifecycle hooks system (needs-discussion: trust model + timeout + error taxonomy)
- cline/cline#10384: cap retry-after delay to prevent silent multi-hour hangs
- INDEX.md: drip-72 entry, count bumped to drip-72
Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request Apr 26, 2026
- QwenLM/qwen-code#3642: BackgroundShellRegistry + /bashes (merge-after-nits)
- cline/cline#10384: maxRetryAfter cap with test-cleanup hazard (request-changes)
- aaif-goose/goose#8836: drop -i flag SIGTTOU fix (merge-as-is)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

429 with long retry-after retried instead of surfaced — WAIT vs STOP not distinguished

1 participant