Add general request retry mechanism#2194
Add general request retry mechanism#2194jtopjian merged 5 commits intogophercloud:masterfrom andrewbaxter:general-retries
Conversation
|
This is passing tests (at least the ones I tried locally) but I haven't played around with it much yet. I wanted initial feedback if you thought this needed to be redesigned. Notes on the implementation:
|
|
I've tested this myself with |
|
Build failed.
|
|
@andrewbaxter Thanks for working on this. This is a really good start. The first area that stuck out was with the RetryBackoff implementation. To make things better named, I think it's best to rename the existing After that, we can rename Looking at the implementation, instead of creating a ...
// existing code
case http.StatusServiceUnavailable:
err = ErrDefault503{respErr}
if error503er, ok := errType.(Err503er); ok {
err = error503er.Error503(respErr)
}
}
if err == nil {
err = respErr
}
// add retry mechanism here
if err != nil && client.RetryFunc != nil {
var e error
state.retries = state.retries + 1
e = client.RetryFunc(client.Context, method, url, options, err, state.retries)
if e != nil {
return resp, e
}
return client.doRequest(method, url, options, state)
}
// resume existing code
return resp, err
}
...I tested this out and the unit tests pass, but maybe I am missing something. Let me know your thoughts on the above. Once we discuss that, we can talk about extra tests, default functions, etc. I'm not concerned about the failing acceptance test yet. OpenLab can be quirky sometimes and exhibit transient behaviour. However, if they continue to fail, it might be this RetryFunc somehow causing request issues. |
|
Thanks! I'll get on those changes. For the single Okay yeah, I didn't look at the acceptance tests quite yet (wasn't sure where to find the failing output exactly) 😅 I hope they do pass but I'll dig deeper after resolving the above. Thanks for the review! |
Correct. We're already doing reauth retries and backoff retries in the single
If you click on the link for the failed test, you can access |
|
recheck |
|
Build succeeded.
|
|
I made the above changes, but there are at least 4 separate error return statements in |
|
@andrewbaxter Thanks for continuing to work on this. I think the first two additions of calling RetryFunc are all that's needed (Lines 428 and 573). On Line 593, the error would be an IO-based error, where the content is being discarded in order to trigger an EOF so that the connection can safely close. I don't think this is something that should be retried if it fails since it's not part of the HTTP/network request (other than ensuring a connection is closed, and if we retry, we might end up accumulating open connections). And on Line 606, the error would be from unmarshalling a JSON response. At this point, the HTTP connection was successful, data was successfully retrieved, the HTTP connection was closed, but there was some kind of problem unmarshalling the data. IMO, this error should be handled within the application. Thoughts? |
|
Okay, makes sense for 593. Retrying would be a strange response to some internal connection management issue here, and it looks like the response is returned (just along with the error). For 606, I've seen servers partially write responses (with matching content lengths) randomly, and that's a situation where I'd want to retry - similarly to other temporary upstream issues. |
|
Build succeeded.
|
|
Build succeeded.
|
That's fair and sounds good to me. From what I can tell, this is looking good. @kayrus @ozerovandrei do either of you have any thoughts on this one? |
|
@andrewbaxter Thank you for working on this! |
|
Awesome, thanks for the reviews and merge! |
|
This change must be reflected in gophercloud/utils as well: |
Prior to starting a PR, please make sure you have read our
contributor tutorial.
Prior to a PR being reviewed, there needs to be a Github issue that the PR
addresses. Replace the brackets and text below with that issue number.
For #2183
Links to the line numbers/files in the OpenStack source code that support the
code in this PR:
None - this is purely client-side.