-
Notifications
You must be signed in to change notification settings - Fork 17
Retry handler design spec. update #18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
middleware/RetryHandler.md
Outdated
| - The client code can specify a custom value for the maximum delay. | ||
| - Retries will be based on the cumulative retry delay compared against the maximum delay specified by the client code. | ||
| - Upon expiry of the maximum delay, when the cumulative retry delay is greater than the specified maximum delay, the `Task` should be cancelled and an `exception` thrown with a relevant message. | ||
| - In the case whereby the client code specifies a maximum delay that is less than the received `retry-after` header value from the server, the cumulative retry delay will run for the duration of the `retry-after` value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the client code specifies a maximum delay less than the retry-after, then we should respect the client's choice and fail despite the service indicating a retry-after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MIchaelMainer by failing, do you propose we fail fast and do not even attempt to delay the retry request as specified by the retry-after or exponential back-off values? Or do we enter the Delay() Task and retry at least once then fail? (see my comment on Peter's review).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I need to be more specific.
- If the retry-after exceeds the max delay, we should fail fast. The service is telling us to wait longer than value max delay submitted by the dev's code. No need to wait until we reach the max delay time as we don't expect the server to be able to handle the request.
- If the exponential back-off values exceeds the max delay value, I think we could wait to retry once more at the max delay time, and then fail if that fails.
No retry should occur after reaching the max delay time. Customer specified should always be conservatively respected.
middleware/RetryHandler.md
Outdated
| - Retries MUST respect the `retry-after` response header if provided. | ||
| - Where no `retry-after` header is provided by the server, an exponential backoff with random offset hueristic should be used to determine the retry delay. | ||
| - Retries should be limited to a maximum value. | ||
| - Retries should be limited to a maximum delay value. The default value for this is set at 180 seconds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also have a minimum max delay value to avoid customers specifying max delay values such as 0 or 0.5, which would essentially result in no retries. This minimum allowed value can be the current DEFAULT_MAX_RETRY value of 3.
cc/ @MIchaelMainer @darrelmiller I'm not sure whether we want this behavior since we already have a ShouldRetry delegate that customers should use to disable retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peombwa I believe your concern is similar to the case whereby a customer specifies a MaxRetryTime (max delay) value less than the retry-after value, in that, in both cases the Delay Task will be delayed with the server-specified retry-after or exponential back-off delay values at least once. See the implementation:
cumulativeDelay = 0.0;
while (cumulativeDelay < RetryOption.MaxRetryTime)
{
Task Delay(response, retryCount, RetryOption.Delay, cancellationToken)
}
The cumulativeDelay value will be incremented by the retry-after or exponential back-off values in the call to Task Delay() and then fail in the subsequent retry attempt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should have a minimum delay value. We should always respect a customer specified MaxRetryTime. If the customer doesn't want to wait more than 500ms, then we should always respect that even if that means not retrying, or retrying once more at their specified MaxRetryTime (for exponential back off only - see my comment above for retry-after). Short circuiting retry before MaxRetryTime should only occur in ShouldRetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @MIchaelMainer . I don't think we should add the complexity of a minimum MaxRetryTime. I also wonder if RetriesTimeLimit is a clearer name for this property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this statement should change from
Retries should be limited to a maximum delay value. The default value for this is set at 180 seconds.
to
Retries MAY be limited to a maximum delay value. The default value for this is set at 180 seconds.
We should leave our current behavior as the default behavior and make this behavior a new option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we implement a default then this isn't an option unless we have a new overload that says add a timelimit and use the default value. I'd rather just make the time limit only have an effect if one is explicitly provided. And we shouldn't use the word "delay" to describe the TimeLimit, it is confusing.
middleware/RetryHandler.md
Outdated
| - Retries will be based on the cumulative retry delay against the maximum retry time specified by the client code. | ||
| - Upon expiry of the maximum retry time, the Task should be cancelled and an exception thrown. | ||
| - Retries should be limited to a maximum delay value. The default value for this is set at 180 seconds. | ||
| - The client code can specify a custom value for the maximum delay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@darrelmiller, @peombwa, @MIchaelMainer Open question: Are we going to require the max delay time to be explicitly set by customers or do we leave it optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave it as optional.
middleware/RetryHandler.md
Outdated
| - Retries should be limited to a maximum delay value. The default value for this is set at 180 seconds. | ||
| - The client code can specify a custom value for the maximum delay. | ||
| - Retries will be based on the cumulative retry delays compared against the maximum delay specified by the client code, cumulatively added across all retries within the context of a single call. This is applicable to both 429 and 503/504. | ||
| - Upon expiry of the maximum delay, when the cumulative retry delay is greater than the specified maximum delay, the `Task` should be cancelled and an `exception` thrown with a relevant message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be throwing exception because we exceeded the RetriesTimeLimit. We should simply not retry again. We could try and cancel a currently running task, but I'm not sure this is wise. This would could cause the server to do a bunch of work that is then wasted.
The goal of this property should be to set a limit, that if exceeded prevents additional retries. We should also account for exceeding the limit while waiting to retry, which is covered in the next point.
We are not trying to create a hard limit that guarantees that the call will only take X number of seconds. I think that would be challenging to achieve reliably.
Content update Co-Authored-By: Darrel <darrmi@microsoft.com>
Content update Co-Authored-By: Darrel <darrmi@microsoft.com>
|
@darrelmiller, @peombwa, @MIchaelMainer I've edited this doc. to address both scenarios for time-based and count-based retries. From Darrel's point of view - the time-based retry option wasn't supposed to replace count-based retry that was already in place, but to be added as a new feature. Are we all in agreement with this? |
middleware/RetryHandler.md
Outdated
| - Retries MUST respect the `retry-after` response header if provided. | ||
| - Where no `retry-after` header is provided by the server, an exponential backoff with random offset hueristic should be used to determine the retry delay. | ||
| - Retries should be limited to a maximum value. | ||
| - Retries should be limited to a maximum delay value. The default value for this is set at 180 seconds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this statement should change from
Retries should be limited to a maximum delay value. The default value for this is set at 180 seconds.
to
Retries MAY be limited to a maximum delay value. The default value for this is set at 180 seconds.
We should leave our current behavior as the default behavior and make this behavior a new option.
|
We are in agreement. |
middleware/RetryHandler.md
Outdated
| - Upon expiry of the maximum delay, when the cumulative retry delay is greater than the specified maximum delay, the `Task` should be cancelled and an `exception` thrown with a relevant message. | ||
| - In the case where the client receives a `retry-after` value that is greater than the remaining `RetriesTimeLimit` the client should return the failed response immediately. | ||
| - Only requests with payloads that are buffered/rewindable are supported. Payloads with forward only streams will be have the responses returned without any retry attempt. | ||
| - Customers can specify a custom value for the `RetriesTimeLimit` greater than 0 to introduce time-based evaluated request retries alongside the default count-based request retry. The default is set at 0 second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to say anything about the default. There is effectively no default.
Co-Authored-By: Darrel <darrmi@microsoft.com>
This PR updates the design spec. for the retry handler middleware proposing to change the count-based request retries to time-based request retries.
From task: microsoftgraph/msgraph-sdk-dotnet/issues/441