feat: Support spill write batch size limit#13862
wForget wants to merge 5 commits into facebookincubator:main from
Thank you, updated.
@xiaoxmeng @majetideepak Could you please take a look?
Gentle ping @xiaoxmeng @majetideepak. Sorry to bother you again; could you please review it when you have time?
majetideepak left a comment:
@wForget can you share the motivation behind this feature?
How would users configure both the buffer size and batch size in general?
My motivation is to make the size of the RowVector after deserializing SpillFile closer to this batch size. As observed in #13861, excessively large row sizes may lead to performance regression.
The spill buffer will be flushed when either the total serialized size reaches
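The flush rule described above can be sketched roughly as follows. This is a minimal illustration, not the Velox spill writer: `FlushLimits`, `SpillBufferSketch`, and `maxBufferBytes` are hypothetical names, and the row-count condition is an assumption, since the sentence above is cut off before the second condition. Only `writeBatchSize` is taken from the diff in this PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical limits; only the name writeBatchSize comes from the PR diff.
struct FlushLimits {
  uint64_t maxBufferBytes;  // assumed byte threshold for buffered data
  uint64_t writeBatchSize;  // assumed row threshold for buffered data
};

// Illustrative buffer that tracks row and byte totals and records a flush
// as soon as either limit is reached. No real I/O is performed.
class SpillBufferSketch {
 public:
  explicit SpillBufferSketch(FlushLimits limits) : limits_(limits) {}

  // Accounts for one serialized batch. Returns true if this call
  // triggered a flush.
  bool append(uint64_t numRows, uint64_t numBytes) {
    bufferedRows_ += numRows;
    bufferedBytes_ += numBytes;
    if (bufferedBytes_ >= limits_.maxBufferBytes ||
        bufferedRows_ >= limits_.writeBatchSize) {
      flush();
      return true;
    }
    return false;
  }

  // Row count of each flush so far, in order.
  const std::vector<uint64_t>& flushedRowCounts() const {
    return flushedRowCounts_;
  }

 private:
  // Stands in for writing the buffer to storage: records the row count
  // and resets the counters.
  void flush() {
    flushedRowCounts_.push_back(bufferedRows_);
    bufferedRows_ = 0;
    bufferedBytes_ = 0;
  }

  FlushLimits limits_;
  uint64_t bufferedRows_ = 0;
  uint64_t bufferedBytes_ = 0;
  std::vector<uint64_t> flushedRowCounts_;
};
```

Under these assumptions, a row limit caps how many rows each flushed unit contains even when the rows are small in bytes, which is the effect the motivation above is after: the unit read back from the spill file stays near the configured batch size.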
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!
/// Specifies the batch rows size to buffer the serialized spill data before
/// write to storage system
uint64_t writeBatchSize;
closes #13861