-
Notifications
You must be signed in to change notification settings - Fork 161
fix: BulkWriter: add backoff on retries #1447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1447 +/- ##
==========================================
+ Coverage 98.20% 98.22% +0.01%
==========================================
Files 32 32
Lines 19509 19565 +56
Branches 1276 1283 +7
==========================================
+ Hits 19159 19217 +58
+ Misses 346 344 -2
Partials 4 4
Continue to review full report at Codecov.
|
dev/src/bulk-writer.ts
Outdated
| // An array of pending write operations. Only contains writes that have not | ||
| // been resolved. | ||
| private pendingOps: Array<BulkWriterOperation> = []; | ||
| private _pendingOps: Array<BulkWriterOperation> = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be renamed to pendingOps and made readonly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
dev/src/bulk-writer.ts
Outdated
| if (failureCount === 0) return 0; | ||
| if (useMaxDelay) return DEFAULT_BACKOFF_MAX_DELAY_MS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is correct for our use case, this reads a bit strange. I would flip these two statements. That means that we would end up using useMaxDelay no matter what the failure count is, which I think is easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| return Math.min( | ||
| DEFAULT_BACKOFF_MAX_DELAY_MS, | ||
| backoffMs + Math.random() * 0.3 * backoffMs | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applying the jitter to the entire duration of the backoff creates a pretty strong variance, and is different from how we compute the backoff jitter elsewhere. Can we use the last backoff and then only apply the jitter to the next section? This would either be:
DEFAULT_BACKOFF_INITIAL_DELAY_MS *
Math.pow(DEFAULT_BACKOFF_FACTOR, failureCount-1) + (DEFAULT_BACKOFF_FACTOR * DEFAULT_BACKOFF_INITIAL_DELAY_MS * jitter); (Maybe?)
or a simple backoff time in Delayed operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the ExponentialBackoff class has even more variance. We use currentBaseMs + (Math.random-0.5)*currentBaseMs, where currentBaseMs = initialDelay * pow(1.5, numAttempts).
I modeled the BulkWriter approach after ExponentialBackoff, and used 0.3 instead of 0.5 to ensure that the backoff increases each time.
This is bit more of a nit (and it probably won't affect performance), but I feel that adding the delay with jitter rather than multiplying it goes against the spirit of exponential backoff, because 100x^k != 100x^(k-1)+x
| get failedAttempts(): number { | ||
| return this._failedAttempts; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts/ideas:
-
Could each operation calculate its own backoff? We can then use a simpler calculation where each backoff depends on the previous backoff time (including jitter). You would just keep a simple "last_backoff_duration" in BulkWriterOperation and multiple it per failed attempt.
-
Would it be possible to use the
Backoffclass? We should just need to expose a method to return the backoff duration. -
I think it would be cleaner if we stored a
lastErrorinstead of the more specificresourceExhausted. It is cleaner and more versatile for later modifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could each operation calculate its own backoff?
Done.
Would it be possible to use the Backoff class?
See my below comment on how the backoff used here is different than ExponentialBackoff. I don't think there is much to reuse in terms of logic here.
I think it would be cleaner if we stored a lastError instead of the more specific resourceExhausted.
Done.
| }); | ||
| }); | ||
|
|
||
| it('applies maximum backoff on retries for RESOURCE_EXHAUSTED', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add a test that uses multiple different backoffs in a batch and a test that verifies that we send a second batch before the first batch if the first batch has a backoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| /*! | ||
| * The default jitter to apply. For example, a factor of 0.3 means a 30% jitter | ||
| * is applied. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should mention that this relates to backoff. It needs some context in this module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
dev/src/bulk-writer.ts
Outdated
| private deferred = new Deferred<WriteResult>(); | ||
| private failedAttempts = 0; | ||
| private _failedAttempts = 0; | ||
| private _lastError?: Status; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastStatus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
dev/src/bulk-writer.ts
Outdated
| return this.deferred.promise; | ||
| } | ||
|
|
||
| get lastError(): Status | undefined { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised this need to be exported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too, removed.
dev/src/bulk-writer.ts
Outdated
| const delayMs = this._rateLimiter.getNextRequestDelayMs( | ||
| pendingBatch._opCount | ||
| ); | ||
| const finalDelayMs = Math.max(backoffMsWithJitter, delayMs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move out of the if statement and then use it in the if(...) check itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
dev/src/bulk-writer.ts
Outdated
| const jitter = | ||
| Math.random() * | ||
| DEFAULT_JITTER_FACTOR * | ||
| (Math.round(Math.random()) ? 1 : -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Math.random() is not cheap. You can remove one call: Math.random() * 2 * DEFAULT_JITTER_FACTOR - DEFAULT_JITTER_FACTOR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh that's some nice mathing! Refactored it slightly to make it more understandable (at least to me).
| (1 - DEFAULT_JITTER_FACTOR) * expected[timeoutHandlerCounter], | ||
| (1 + DEFAULT_JITTER_FACTOR) * expected[timeoutHandlerCounter] | ||
| ); | ||
| timeoutHandlerCounter++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
dev/src/bulk-writer.ts
Outdated
| class BulkWriterOperation { | ||
| private deferred = new Deferred<WriteResult>(); | ||
| private failedAttempts = 0; | ||
| private _failedAttempts = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can rename this back to its old name since there is no getter anymore.
No description provided.