Limit batch buffer capacity #348
Conversation
Summary of Changes: Hello @QuentinI, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request introduces significant refinements to the batch processing mechanism.
Code Review
This pull request introduces a capacity limit for the batch buffer, which is a crucial step to prevent unbounded memory usage. The changes also include a more robust handling of undecided batches by placing them in the main buffer for later re-evaluation, which helps in reducing non-determinism. The introduction of headBatch and skipPos in the streamer provides a graceful mechanism to manage a full buffer without getting stuck. Overall, the logic is sound and addresses the intended problem effectively. I have a few suggestions to enhance robustness and code clarity.
Force-pushed from 4808a80 to 12afb24
```go
// Note: we use a custom comparison function that returns 0 only if the batches are actually
// the same to ensure that newer batches with the same number are stored later in the buffer
```
I was wondering about the case where a new batch has the same hash as an existing batch but a different number. Can we simplify the second if-clause to check the hash only?
(And would a test be helpful for this case?)
> a new batch has the same hash as an existing batch but a different number

That would be a hash collision; the block number is part of the hash.

> Can we simplify the second if-clause to check the hash only?

In practice yes: absent collisions, equal hashes imply equal numbers. But I feel it would read worse, because the second if clause would then share none of the operands of the first.
espresso/streamer.go (Outdated)

```diff
-	s.processEspressoTransaction(ctx, txn.Payload)
+	err := s.processEspressoTransaction(ctx, txn.Payload)
+	if errors.Is(err, ErrAtCapacity) {
+		s.skipPos = start
```
Can we exit here with the error without continuing the iteration?

No, because we may still be looking for a missing head batch.

Makes sense! A follow-up question: should we set skipPos conditionally, i.e., compare it with the current value before using the new start value?

Indeed, thank you for the catch! Fixed and added a regression test.
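The behavior under discussion (record the skip position but keep iterating so a missing head batch can still be found) can be sketched as follows. `skipPos`, `start`, and `ErrAtCapacity` are the PR's names; the `streamer` fields and the toy `process` function are purely illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

var ErrAtCapacity = errors.New("batch buffer at capacity")

type streamer struct {
	skipPos   uint64
	headFound bool
}

// process is a stand-in: payload 2 plays the missing head batch,
// payloads >= 3 pretend to overflow the buffer.
func (s *streamer) process(payload int) error {
	switch {
	case payload == 2:
		s.headFound = true
		return nil
	case payload >= 3:
		return ErrAtCapacity
	}
	return nil
}

// processRange does not abort on ErrAtCapacity: it records where the
// dropped batches start and keeps scanning for the head batch.
func (s *streamer) processRange(payloads []int, start uint64) {
	for _, p := range payloads {
		if err := s.process(p); errors.Is(err, ErrAtCapacity) {
			s.skipPos = start
		}
	}
}

func main() {
	s := &streamer{}
	s.processRange([]int{3, 2}, 100)
	fmt.Println(s.headFound, s.skipPos) // true 100
}
```

Note that this sketch still assigns `skipPos` unconditionally, which is exactly the overwrite problem the follow-up question raises.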
jjeangal
left a comment
Overall looks good to me!

One remark I have is about the way the testing is handled. A lot of new tests were added, which is a very good thing, but are there ways to optimize or deduplicate the test logic?
espresso/batch_buffer.go (Outdated)

```go
// the same to ensure that newer batches with the same number are stored later in the buffer
if a.Number() > t.Number() {
	return 1
} else if a.Number() == t.Number() && a.Hash() == t.Hash() {
```
With your comment above in mind, we would only need to compare the hash, right? Not that big of a deal though, I guess.

Yes, I deliberately left the Numbers there because I thought it would read better (see my response to a similar question raised by Keyao), but it seems like 2/2 reviewers find it more confusing... Removed.
skipPos was unconditionally assigned to start on each ErrAtCapacity, causing later fetch ranges to overwrite the value set by earlier ones. After a rewind, batches dropped in the earlier ranges were permanently skipped. Use min(skipPos, start) to preserve the earliest position. Co-authored-by: OpenCode <noreply@opencode.ai>
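A minimal sketch of the fix that commit describes. `skipPos` and `start` are the PR's names; the max-uint64 "unset" sentinel and the method shape are assumptions for illustration:

```go
package main

import "fmt"

// streamer keeps the earliest position at which batches were dropped,
// so a later rewind can re-fetch everything from that point.
type streamer struct {
	skipPos uint64
}

// recordSkip keeps the earliest range start instead of overwriting
// with whatever range hit capacity last. The built-in min requires
// Go 1.21+; on older versions write the comparison by hand.
func (s *streamer) recordSkip(start uint64) {
	s.skipPos = min(s.skipPos, start)
}

func main() {
	s := &streamer{skipPos: ^uint64(0)} // "unset" sentinel: max uint64
	s.recordSkip(200)                   // first range at capacity
	s.recordSkip(500)                   // later range must not overwrite
	fmt.Println(s.skipPos)              // 200
}
```

With the old unconditional assignment, the second call would have left `skipPos` at 500 and the batches dropped at 200 would never be re-fetched.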
@jjeangal re: tests - I must admit I haven't checked very thoroughly that I'm not adding any duplicates; I removed a couple of tests that were already covered. Everything that remains should be purely new coverage.
Return ErrDuplicateBatch from BatchBuffer.Insert so the streamer can log a warning when a duplicate hash is encountered. Update tests accordingly. Co-authored-by: OpenCode <noreply@opencode.ai>
jjeangal
left a comment
Thanks for addressing the comments. LGTM!
Merged 9b83f58 into celo-integration-rebase-14.2
* Add cap to batch buffer
* Gemini's suggestions
* No recursion
* Fix typo
* Fmt
* Rewrite test
* Add tests
* Remove unused function
* Fix skipPos overwrite across multiple fetch ranges

  skipPos was unconditionally assigned to start on each ErrAtCapacity, causing later fetch ranges to overwrite the value set by earlier ones. After a rewind, batches dropped in the earlier ranges were permanently skipped. Use min(skipPos, start) to preserve the earliest position.

  Co-authored-by: OpenCode <noreply@opencode.ai>
* Comments
* Log warning on duplicate batch insertion

  Return ErrDuplicateBatch from BatchBuffer.Insert so the streamer can log a warning when a duplicate hash is encountered. Update tests accordingly.

  Co-authored-by: OpenCode <noreply@opencode.ai>

Co-authored-by: OpenCode <noreply@opencode.ai>