Conversation

@SimonHeybrock
Member

@SimonHeybrock SimonHeybrock commented Jan 28, 2025

Another part of #3637.

image

The improvements for small row counts should be quite substantial for streaming applications.

SimonHeybrock and others added 6 commits January 28, 2025 11:32
Previously, the mappers computed (and allocated variables containing) the
output bin sizes. These were then used to compute (and allocate
variables for) output bin indices. As the sizes themselves are not
needed, we can skip this step and set up the indices directly. This saves
around 20% when grouping few rows into O(1M) bins.
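
For illustration only, the idea in this commit message can be sketched in plain Python. This is not the scipp implementation: the function names, the `bin_of_row` input, and the "count one slot to the right, then prefix-sum in place" trick are assumptions chosen to show how begin indices can be built without a separate sizes buffer.

```python
def begin_indices_via_sizes(bin_of_row, nbins):
    """Two-step variant: allocate sizes first, then derive begin indices."""
    sizes = [0] * nbins          # first O(nbins) allocation
    for b in bin_of_row:
        sizes[b] += 1
    begin = [0] * nbins          # second O(nbins) allocation
    running = 0
    for i, size in enumerate(sizes):
        begin[i] = running       # exclusive prefix sum of the sizes
        running += size
    return begin


def begin_indices_direct(bin_of_row, nbins):
    """Single-buffer variant: no intermediate sizes list is allocated."""
    begin = [0] * (nbins + 1)    # only O(nbins) allocation
    for b in bin_of_row:
        begin[b + 1] += 1        # count each row one slot to the right
    for i in range(1, nbins + 1):
        begin[i] += begin[i - 1] # in-place inclusive prefix sum
    begin.pop()                  # drop the trailing total, keep begin offsets
    return begin


# Hypothetical input: four rows assigned to bins 2, 0, 2, 1 out of 4 bins.
rows = [2, 0, 2, 1]
assert begin_indices_via_sizes(rows, 4) == begin_indices_direct(rows, 4)
```

With few rows and very many bins, the per-bin allocations dominate the cost, which is why dropping one of them can matter in that regime.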
@SimonHeybrock
Member Author

Note the failing builds are "fixed" in #3643 by removing the offending code.

Further reduce overhead in bin and group
@jl-wynen
Member

Am I overlooking something or is this the same as #3643 plus removal of an unused function?

@SimonHeybrock
Member Author

> Am I overlooking something or is this the same as #3643 plus removal of an unused function?

#3643 was merged into this (thus: exclude the last commit from review). But no, it does not just remove an unused function.

@SimonHeybrock
Member Author

SimonHeybrock commented Jan 30, 2025

Well, actually what I said above is wrong: This PR was optimizing a function that was removed by the follow-up PR. You are therefore correct that nothing happens here.

@SimonHeybrock SimonHeybrock merged commit a8ae6b1 into main Jan 30, 2025
4 checks passed
@SimonHeybrock SimonHeybrock deleted the speedup-bin-setup branch January 30, 2025 10:35
Projects

Status: Done

3 participants