Allow remote retry max delay to be user configurable#16058
Closed
joeljeske wants to merge 1 commit intobazelbuild:masterfrom
Closed
Allow remote retry max delay to be user configurable#16058joeljeske wants to merge 1 commit intobazelbuild:masterfrom
joeljeske wants to merge 1 commit intobazelbuild:masterfrom
Conversation
71a4cbe to
b14af12
Compare
Author
|
@tjgq could I trouble you for a quick review of this configuration option? 🙇 🙏 |
tjgq
requested changes
Dec 13, 2022
Contributor
tjgq
left a comment
There was a problem hiding this comment.
Sorry about the delay. This PR seems reasonable to me, I have just a small comment.
| */ | ||
| ExponentialBackoff(Duration initial, Duration max, double multiplier, double jitter, | ||
| int maxAttempts) { | ||
| Preconditions.checkArgument(max.compareTo(initial) > 0, "max must be > initial"); |
Contributor
There was a problem hiding this comment.
As a rule, Bazel should not crash due to malformed input. To avoid additional validation logic, how about we use max(initial, remoteRetryMaxDelay) as the argument to ExponentialBackoff below?
b14af12 to
10886cf
Compare
Author
|
Thanks for checking this out, @tjgq! I made your suggested change, preventing Bazel from crashing on an invalid value here. |
Author
|
Friendly ping, @tjgq |
Contributor
|
Sorry for the delay, importing it now. |
Contributor
|
@bazel-io flag |
Member
|
@bazel-io fork 6.2.0 |
ShreeM01
pushed a commit
to ShreeM01/bazel
that referenced
this pull request
Apr 12, 2023
This introduces a new option `--remote_retry_max_delay` can be used to change the existing maximum exponential backoff interval used when retrying remote requests. Before this change, there was a hardcoded value controlling this maximum exponential backoff interval, set to `5s`. Rational `remote_retries` is useful in masking over temporary disruptions to a remote cluster. If a cluster experiences temporary downtime, it is useful to allow bazel clients to wait for a period of time for the cluster to recover before bailing and giving up. If users cannot configure the maximum exponential backoff delay, one must set a large number for `remote_retries`, each retry eventually waiting for up to 5s. This allows the bazel client to wait for a reasonable amount of time for the cluster to recover. The problem here is that under certain cluster failure modes, requests may not be handled and failed quickly, rather they may wait until `remote_timeout` before failing. A large `remote_timeout` combined with a large `remote_retries` could lead to waiting for a very long time before finally bailing on a given action. If a user can bump the `remote_retry_max_delay`, they can control the retry waiting semantics to their own needs. Closes bazelbuild#16058. PiperOrigin-RevId: 523680725 Change-Id: I21daba78b91d3157362ca85bb7b1cbbef8a94bb3
keertk
pushed a commit
that referenced
this pull request
Apr 21, 2023
* Allow remote retry max delay to be user configurable This introduces a new option `--remote_retry_max_delay` can be used to change the existing maximum exponential backoff interval used when retrying remote requests. Before this change, there was a hardcoded value controlling this maximum exponential backoff interval, set to `5s`. Rational `remote_retries` is useful in masking over temporary disruptions to a remote cluster. If a cluster experiences temporary downtime, it is useful to allow bazel clients to wait for a period of time for the cluster to recover before bailing and giving up. If users cannot configure the maximum exponential backoff delay, one must set a large number for `remote_retries`, each retry eventually waiting for up to 5s. This allows the bazel client to wait for a reasonable amount of time for the cluster to recover. The problem here is that under certain cluster failure modes, requests may not be handled and failed quickly, rather they may wait until `remote_timeout` before failing. A large `remote_timeout` combined with a large `remote_retries` could lead to waiting for a very long time before finally bailing on a given action. If a user can bump the `remote_retry_max_delay`, they can control the retry waiting semantics to their own needs. Closes #16058. PiperOrigin-RevId: 523680725 Change-Id: I21daba78b91d3157362ca85bb7b1cbbef8a94bb3 * Replace RemoteDurationConverter with RemoteTimeoutConverter --------- Co-authored-by: Joel Jeske <joel.jeske@robinhood.com>
fweikert
pushed a commit
to fweikert/bazel
that referenced
this pull request
May 25, 2023
This introduces a new option `--remote_retry_max_delay` can be used to change the existing maximum exponential backoff interval used when retrying remote requests. Before this change, there was a hardcoded value controlling this maximum exponential backoff interval, set to `5s`. Rational `remote_retries` is useful in masking over temporary disruptions to a remote cluster. If a cluster experiences temporary downtime, it is useful to allow bazel clients to wait for a period of time for the cluster to recover before bailing and giving up. If users cannot configure the maximum exponential backoff delay, one must set a large number for `remote_retries`, each retry eventually waiting for up to 5s. This allows the bazel client to wait for a reasonable amount of time for the cluster to recover. The problem here is that under certain cluster failure modes, requests may not be handled and failed quickly, rather they may wait until `remote_timeout` before failing. A large `remote_timeout` combined with a large `remote_retries` could lead to waiting for a very long time before finally bailing on a given action. If a user can bump the `remote_retry_max_delay`, they can control the retry waiting semantics to their own needs. Closes bazelbuild#16058. PiperOrigin-RevId: 523680725 Change-Id: I21daba78b91d3157362ca85bb7b1cbbef8a94bb3
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This introduces a new option
--remote_retry_max_delaycan be used to change the existing maximum exponential backoff interval used when retrying remote requests. Before this change, there was a hardcoded value controlling this maximum exponential backoff interval, set to5s.Rational
remote_retriesis useful in masking over temporary disruptions to a remote cluster. If a cluster experiences temporary downtime, it is useful to allow bazel clients to wait for a period of time for the cluster to recover before bailing and giving up. If users cannot configure the maximum exponential backoff delay, one must set a large number forremote_retries, each retry eventually waiting for up to 5s. This allows the bazel client to wait for a reasonable amount of time for the cluster to recover.The problem here is that under certain cluster failure modes, requests may not be handled and failed quickly, rather they may wait until
remote_timeoutbefore failing. A largeremote_timeoutcombined with a largeremote_retriescould lead to waiting for a very long time before finally bailing on a given action.If a user can bump the
remote_retry_max_delay, they can control the retry waiting semantics to their own needs.