Skip to content

Conversation

@duanmeng
Copy link
Collaborator

@duanmeng duanmeng commented Oct 14, 2025

Add a NonMaterializedSortBuffer, which materializes only the sort key columns
and additional index columns into a RowContainer for sorting. After the sort, it
uses these indices to gather and copy the corresponding rows from the original
input vectors into the output vector. This reduces the overhead of materializing
payload columns into the RowContainer, which is especially beneficial for wide
table scenarios. Support for spilling will be added in a follow-up PR.

@netlify
Copy link

netlify bot commented Oct 14, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 543a18f
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/691fc8a813fa2100085456bf

@duanmeng duanmeng marked this pull request as draft October 14, 2025 12:27
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 14, 2025
@duanmeng duanmeng changed the title init feat: Hybrid SortBuffer Oct 14, 2025
@duanmeng duanmeng requested a review from xiaoxmeng October 14, 2025 15:26
@duanmeng duanmeng force-pushed the refactor branch 17 times, most recently from 27dc9b0 to ef59320 Compare October 24, 2025 08:33
@duanmeng duanmeng force-pushed the refactor branch 3 times, most recently from 16f988c to 606779e Compare November 16, 2025 01:28
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng LGTM % comments. Thanks!


int32_t outputRow = 0;
int32_t outputSize = 0;
bool isEndOfBatch = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/isEndOfBatch/endOfBatch/

int32_t outputRow = 0;
int32_t outputSize = 0;
bool isEndOfBatch = false;
while (outputRow + outputSize < output_->size()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/outputSize/bufferedOutputRows/


void HybridSortBuffer::prepareOutputWithSpill() {
VELOX_CHECK(hasSpilled());
if (inputSpiller_ != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if () {
...
return;
}

...

outputSpillPartitionSet_.begin()->second->createUnorderedReader(
spillConfig_->readBufferSize, pool(), spillStats_);
}
outputSpillPartitionSet_.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this belong to the else condition?

@duanmeng duanmeng changed the title feat: Hybrid SortBuffer feat: Add NonMaterializedSortBuffer Nov 20, 2025
@duanmeng duanmeng marked this pull request as ready for review November 20, 2025 04:10
@duanmeng duanmeng removed the request for review from majetideepak November 20, 2025 04:43
@duanmeng duanmeng force-pushed the refactor branch 3 times, most recently from 468ca20 to 027a584 Compare November 20, 2025 07:49
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng LGTM % minors. Thanks!

@duanmeng duanmeng force-pushed the refactor branch 7 times, most recently from a2704bc to 0b7b165 Compare November 20, 2025 13:59
@aditi-pandit
Copy link
Collaborator

@duanmeng : Thanks for this PR. Please can you explain the motivation for this change.

@duanmeng
Copy link
Collaborator Author

@duanmeng : Thanks for this PR. Please can you explain the motivation for this change.

@aditi-pandit Hi Aditi, the NonMaterializedSortBuffer materializes only the sort key columns and additional index columns into a RowContainer for sorting. After the sort, it uses these indices to gather and copy the corresponding rows from the original input vectors into the output vector. This reduces the overhead of materializing payload columns into the RowContainer, which is especially beneficial for wide table scenarios. Support for spilling will be added in a follow-up PR.

@aditi-pandit
Copy link
Collaborator

@duanmeng : Thanks for this PR. Please can you explain the motivation for this change.

@aditi-pandit Hi Aditi, the NonMaterializedSortBuffer materializes only the sort key columns and additional index columns into a RowContainer for sorting. After the sort, it uses these indices to gather and copy the corresponding rows from the original input vectors into the output vector. This reduces the overhead of materializing payload columns into the RowContainer, which is especially beneficial for wide table scenarios. Support for spilling will be added in a follow-up PR.

Thanks for your explanation.

I got the infrastructure pieces. Which operators are we planning to enhance with this ? Do you have some early results to share ?

@duanmeng duanmeng marked this pull request as draft November 25, 2025 02:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants