Skip to content

feat(storage): implement RetryChunkDeadline for grpc writes#11476

Merged
BrennaEpp merged 8 commits into
googleapis:mainfrom
BrennaEpp:feat-chunkretrydeadline
Jan 29, 2025
Merged

feat(storage): implement RetryChunkDeadline for grpc writes#11476
BrennaEpp merged 8 commits into
googleapis:mainfrom
BrennaEpp:feat-chunkretrydeadline

Conversation

@BrennaEpp

@BrennaEpp BrennaEpp commented Jan 21, 2025

Copy link
Copy Markdown
Contributor

Branched from #11475

Adds a timer to the upload buffer. This is equivalent to the http implementation.

Changed from implementing directly in the upload code to implementing in invoke.go. This makes it extensible (if we ever want to add a similar deadline to other calls), but also more prone to affecting other ops (just to be clear, I don't think it will given testing/how it is written). Reason for changing it is because we need to be able to stop the retries while still wrapping the error. If we do it a level above, if we return a "deadline done, here is last error" wrapping the error, it will continue to retry if the last error is retriable.

@product-auto-label product-auto-label Bot added the api: storage Issues related to the Cloud Storage API. label Jan 21, 2025
@BrennaEpp BrennaEpp marked this pull request as ready for review January 28, 2025 07:43
@BrennaEpp BrennaEpp requested review from a team January 28, 2025 07:43

@tritone tritone left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, a few minor things

Comment thread storage/storage.go Outdated
policy RetryPolicy
shouldRetry func(err error) bool
maxAttempts *int
maxDuration time.Duration

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like this isn't passed in by anything except writer currently. Maybe add a comment to note that?

Comment thread storage/storage.go Outdated
policy RetryPolicy
shouldRetry func(err error) bool
maxAttempts *int
maxDuration time.Duration

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also let's call it maxRetryDuration since it doesn't actually limit the total duration of calls.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
It stutters a bit (retryConfig.maxRetryDuration) but that was my initial instinct as well. It's clearer anyway since retryConfig can have any name.

return (errors.As(err, &httpErr) && httpErr.Code == httpStatusCode) ||
(errors.As(err, &grpcErr) && grpcErr.GRPCStatus().Code() == grpcStatusCode)
(errors.As(err, &grpcErr) && grpcErr.GRPCStatus().Code() == grpcStatusCode) ||
status.Code(err) == grpcStatusCode

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need 3 cases?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testbench errors are Status errors but not apierror.APIErrors, so the added test fails when checking the status code without this change

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That seems like an issue with wrapping at some layer. But in any case L6974 should work for all grpc errors (both real service and testbench)

Comment thread storage/invoke.go Outdated
if lastErr == nil {
return true, errors.New("storage: request not sent, choose a larger value for the retry deadline")
}
return true, fmt.Errorf("storage: retry timed out after %v attempts; last error: %w", attempts, lastErr)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This err should include the value of the timeout as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: storage Issues related to the Cloud Storage API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants