cloud/gcp: add custom retryer for gcs storage, retry on stream INTERNAL_ERROR#85024
cloud/gcp: add custom retryer for gcs storage, retry on stream INTERNAL_ERROR#85024craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
14d7dcf to
f4f6450
Compare
adityamaru
left a comment
There was a problem hiding this comment.
Nice, this'll solve a bunch of failing roachtests 🤞 LGTM, it's a bummer they haven't exposed ShouldRetry yet but keeping the logic in a different file is a good idea.
pkg/cloud/gcp/gcs_storage.go
Outdated
There was a problem hiding this comment.
This looks correct but is it more intuitive to write if ok := e.(errors.Wrapper); ok { ... }? @knz is there a more canonical way to check if an error is wrapped?
There was a problem hiding this comment.
FYI I had to write it like this because of a checker that we have. This pattern is the suggested one:
https://github.com/cockroachdb/cockroach/blob/master/pkg/testutils/lint/passes/errcmp/errcmp.go#L97
There was a problem hiding this comment.
OH! never mind me then 👍
|
Before merging can you also add all the failing roachtests on master as |
pkg/cloud/gcp/gcs_storage.go
Outdated
There was a problem hiding this comment.
maybe add a small comment on why these specific cases are outside google's default retries, and why we retry them? Linking the google sdk issues is probably sufficient.
52cca37 to
4b84443
Compare
…AL_ERROR Currently, errors like `stream error: stream ID <x>; INTERNAL_ERROR; received from peer` are not being retried. Create a custom retryer to retry these errors as suggested by: googleapis/google-cloud-go#3735 googleapis/google-cloud-go#784 Fixes: cockroachdb#85217, cockroachdb#85216, cockroachdb#85204, cockroachdb#84162 Release note: None
|
bors r+ |
|
Build succeeded: |
|
blathers backport 22.1 |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from cb92673 to blathers/backport-release-22.1-85024: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
…o v1.21.0 This commit bumps the `cloud.google.com/go/storage` vendor to include the ability to inject custom retry functions when reading and writing from the underlying SDK - https://pkg.go.dev/cloud.google.com/go/storage#Client.SetRetry. The motivation for this change is to combat the high rate of restores we are seeing fail due to an internal http2 stream error that is being surfaced by the SDK in our roachtests. As seen in cockroachdb#85024 we would like to wrap the default retry logic with our custom retry handling for this particular error. This is the recommended solution as per: googleapis/google-cloud-go#3735 googleapis/google-cloud-go#784 Note, the dependencies have been bumped to the version that we have been running on master since the 22.1 branch was cut. Release note (general change): bump cloud.google.com/go/storage from v18.2.0 to v1.21.0 to allow for injection of custom retry logic in the SDK
…o v1.21.0 This commit bumps the `cloud.google.com/go/storage` vendor to include the ability to inject custom retry functions when reading and writing from the underlying SDK - https://pkg.go.dev/cloud.google.com/go/storage#Client.SetRetry. The motivation for this change is to combat the high rate of restores we are seeing fail due to an internal http2 stream error that is being surfaced by the SDK in our roachtests. As seen in cockroachdb#85024 we would like to wrap the default retry logic with our custom retry handling for this particular error. This is the recommended solution as per: googleapis/google-cloud-go#3735 googleapis/google-cloud-go#784 Note, the dependencies have been bumped to the version that we have been running on master since the 22.1 branch was cut. Release note (general change): bump cloud.google.com/go/storage from v18.2.0 to v1.21.0 to allow for injection of custom retry logic in the SDK
Currently, errors like
stream error: stream ID <x>; INTERNAL_ERROR; received from peerare not being retried. Create a custom retryer to retry these errors as
suggested by:
googleapis/google-cloud-go#3735
googleapis/google-cloud-go#784
Fixes: #85217, #85216, #85204, #84162
Release note: None