transport: retry HTTP 429 (Too Many Requests)#2301
Merged
Conversation
Fixes google#2111. As filed by knqyf263, 429 responses are never retried by the default retry transport even though TooManyRequestsErrorCode is already classified temporary in transport/error.go. The retry transport in pkg/v1/remote/transport/retry.go short-circuits on its codes list before consulting Temporary(); since 429 isn't in defaultRetryStatusCodes, the conversion to a temporary transport.Error never happens and retry.IsTemporary never gets called. The fix is symmetric across the two parallel classifications: * defaultRetryStatusCodes in pkg/v1/remote/options.go now includes http.StatusTooManyRequests, so the retry transport wraps a 429 response into a temporary *transport.Error and retries it through the existing backoff (Duration 1s, Factor 3, Steps 3). * temporaryStatusCodes in pkg/v1/remote/transport/error.go also gains http.StatusTooManyRequests, bringing the raw-status fallback path into alignment with temporaryErrorCodes which has already been classifying TooManyRequestsErrorCode as temporary. The exhausted *transport.Error still carries StatusCode == 429, so pkg/gcrane/copy.go's hasStatusCode helper continues to route 429s through GCRBackoff at the application layer. The two retry layers compose: ~1.3s of transport-level retry per request, then ~6s+ of copy-level retry per image. No existing test expectations change. Out of scope here: honoring the Retry-After response header. The retry backoff is currently static; honoring server-dictated delays would need plumbing through the retry/wait package. Worth a follow-up. Tests: * pkg/v1/remote/transport/retry_test.go gains TestRetryTransport_TooManyRequests, which exercises three 429s through the retry transport and asserts both that all three attempts are made and that the final response surfaces with StatusCode == 429 (so gcrane's hasStatusCode keeps matching). It also pins the new entry in temporaryStatusCodes. * pkg/v1/remote/options_test.go (new file) pins 429 in the default retry status code list. A future refactor that drops 429 will fail loudly before the regression ships. Verified locally: go test ./... (full suite green) gofmt + goimports clean golangci-lint v2.11: 0 issues woke clean on touched files reviewdog/action-misspell US: clean on the diff
Subserial
approved these changes
May 18, 2026
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2301 +/- ##
=======================================
Coverage 56.89% 56.89%
=======================================
Files 165 165
Lines 11331 11331
=======================================
Hits 6447 6447
Misses 4120 4120
Partials 764 764 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #2111.
As filed by @knqyf263, the retry transport never retries HTTP 429 responses even though
TooManyRequestsErrorCodeis already classified temporary inpkg/v1/remote/transport/error.go. The retry transport short-circuits on itscodeslist before consultingTemporary(), and since 429 isn't indefaultRetryStatusCodes, the conversion to a temporary*transport.Errornever happens andretry.IsTemporarynever gets called.The fix is symmetric across the two parallel "is this transient" classifications:
defaultRetryStatusCodesinpkg/v1/remote/options.gohttp.StatusTooManyRequeststemporaryStatusCodesinpkg/v1/remote/transport/error.gohttp.StatusTooManyRequestsfor parity withtemporaryErrorCodes[TooManyRequestsErrorCode]429 is now retried by the default transport (
Duration: 1s, Factor: 3, Steps: 3, ~1.3s total).Interaction with
pkg/gcrane/copy.gogcrane's copy path has its own outer retry loop that explicitly handles 429:After this change the exhausted
*transport.Errorstill carriesStatusCode == 429, sohasStatusCodekeeps matching andGCRBackoffcontinues to drive longer-term retry at the copy-operation layer. The two layers compose cleanly:A test asserts the final response still surfaces with
StatusCode == 429, so this composition is verified.Out of scope
Honoring the
Retry-Afterresponse header. The retry backoff is currently static — honoring a server-dictated delay would need plumbing throughinternal/retry/wait. Worth a follow-up; not included here to keep the change focused on the reported bug.Test plan
TestRetryTransport_TooManyRequestsinpkg/v1/remote/transport/retry_test.go: three 429s through the transport, asserts all three attempts made and the final response surfaces withStatusCode == 429.TestTemporaryStatusCodes_Includes429in the same file: pins 429 into thetemporaryStatusCodesmap so the raw-status fallback stays in sync withtemporaryErrorCodes[TooManyRequestsErrorCode].TestDefaultRetryStatusCodes_Includes429in newpkg/v1/remote/options_test.go: pins 429 into the default list itself.go test ./... -count=1full suite passes locally.gofmt,goimports,golangci-lint v2.11clean.wokeand US-localemisspellclean on the diff.