@SimonHeybrock SimonHeybrock commented Jan 28, 2025

Previously the mappers computed (and allocated variables containing) the output bin sizes. These were then used to compute (and allocate variables for) output bin indices. As the sizes themselves are not needed, we can skip this and directly set up the indices. This saves around 20% when grouping few rows into O(1M) bins.
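To illustrate the idea in isolation, here is a minimal pure-Python sketch. It is not the actual mapper code: the function names and the `(begin, end)` index-pair layout are assumptions made for illustration. The first function materializes the sizes and then derives indices from them; the second keeps a running offset and writes the indices directly, so no intermediate sizes container is allocated.

```python
from itertools import accumulate


def begin_end_via_sizes(counts):
    """Two-step variant: store the sizes, then derive (begin, end) pairs."""
    sizes = list(counts)            # intermediate allocation (hypothetical)
    ends = list(accumulate(sizes))  # inclusive prefix sum of the sizes
    begins = [end - size for end, size in zip(ends, sizes)]
    return list(zip(begins, ends))


def begin_end_direct(counts):
    """Direct variant: keep a running offset and never store the sizes."""
    indices = []
    offset = 0
    for n in counts:
        indices.append((offset, offset + n))
        offset += n
    return indices


counts = [3, 0, 2, 4]
# Both variants must produce identical index pairs.
assert begin_end_via_sizes(counts) == begin_end_direct(counts)
```

Both variants do the same arithmetic; the difference is only that the direct one avoids allocating and filling a second container, which is where savings would be expected when the number of output bins is very large.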

We are now entering the territory of noise for my benchmark setup, but I think this improvement is still significant:
[image: benchmark results]

Fixes #3637 (in a sense; there are probably more gains to be made).

SimonHeybrock and others added 2 commits January 28, 2025 13:08
Previously the mappers computed (and allocated variables containing) the
output bin sizes. These were then used to compute (and allocate
variables for) output bin indices. As the sizes themselves are not
needed, we can skip this and directly set up the indices. This saves
around 20% when grouping few rows into O(1M) bins.
@SimonHeybrock SimonHeybrock merged commit e519aa5 into speedup-bin-setup Jan 30, 2025
4 checks passed
@SimonHeybrock SimonHeybrock deleted the setup-bin-indices-directly branch January 30, 2025 08:44