
feat: Support spill write batch size limit #13862

Closed

wForget wants to merge 5 commits into facebookincubator:main from wForget:VELOX-13861

Conversation

@wForget
Contributor

wForget commented Jun 24, 2025

closes #13861

@netlify

netlify Bot commented Jun 24, 2025

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 5c6b138
🔍 Latest deploy log: https://app.netlify.com/projects/meta-velox/deploys/685cc2e90c93cd00080f6c3a

@facebook-github-bot
Contributor

Hi @wForget!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

facebook-github-bot added the CLA Signed label Jun 24, 2025
wForget marked this pull request as ready for review June 24, 2025 11:26
wForget requested a review from majetideepak as a code owner June 24, 2025 11:26
@jinchengchenghh
Contributor

Thanks!
Please update the configs.rst

wForget changed the title from "Support spill write batch size limit" to "feat: Support spill write batch size limit" Jun 25, 2025
@wForget
Contributor Author

wForget commented Jun 26, 2025

> Thanks! Please update the configs.rst

Thank you, updated

@wForget
Contributor Author

wForget commented Jun 30, 2025

@xiaoxmeng @majetideepak Could you please take a look?

@wForget
Contributor Author

wForget commented Jul 8, 2025

Gentle ping @xiaoxmeng @majetideepak. Sorry to bother you again; could you please review this when you have time?

@majetideepak
Collaborator

majetideepak left a comment


@wForget can you share the motivation behind this feature?
How would users configure both the buffer size and batch size in general?

@wForget
Contributor Author

wForget commented Jul 8, 2025

> @wForget can you share the motivation behind this feature?

My motivation is to keep the number of rows in each RowVector produced by deserializing a SpillFile close to this batch size. As observed in #13861, excessively large batches may lead to a performance regression.

> How would users configure both the buffer size and batch size in general?

The spill buffer is flushed when either the total serialized size reaches spill_write_buffer_size or the number of buffered rows reaches spill_write_batch_size. The default value for spill_write_buffer_size is 4MB, and the default value for spill_write_batch_size is 4096 (or should it be closer to preferred_output_batch_rows?).

@stale

stale Bot commented Oct 7, 2025

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

stale Bot added the stale label Oct 7, 2025

/// Specifies the maximum number of rows to buffer as serialized spill data
/// before writing to the storage system.
uint64_t writeBatchSize;
Contributor


uint32_t?

stale Bot removed the stale label Oct 7, 2025
@stale

stale Bot commented Jan 5, 2026

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!


Labels

CLA Signed
stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support spill write batch size limit

4 participants