-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-14183: [C++] Improve select_k_unstable performance #12582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ursabot please benchmark |
|
Benchmark runs are scheduled for baseline = 63e1acc and contender = 39002af84e35b725ba48f71f160bcffbbe2cefda. Results will be available as each benchmark for each run completes. |
|
Looks this PR is still WIP? |
39002af to
e12db21
Compare
|
Yes @cyb70289 I have different runs on my local. master branch with master branch with |
|
Should you close the older PR for the same issue? |
aocsa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks an interesting change in favor of better performance. I put some general comments, however just for clarity I would suggest to show better the benchmark results as well the improvement in numbers (%) and put some comments in which cases you are getting better/worse results. Moreover if these cases are localized I would suggest also to add new unit tests with comments around this.
| checked_cast<const ArrayType*>(chunks_[loc.chunk_index]), loc.index_in_chunk); | ||
| } | ||
|
|
||
| template <typename ArrayType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to add some documentation to these internal but exposes functions (Resolve, ResolveChunkLocation).
| DCHECK_EQ(sorted[i].overall_end(), indices_begin + end_offset); | ||
| DCHECK_EQ(sorted[i].non_null_count() + sorted[i].null_count(), batch.num_rows()); | ||
| begin_offset = end_offset; | ||
| // XXX this is an upper bound on the true null count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove XXX
|
@AlvinJ15 , are you still working on this PR? |
|
Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍 |










Improve select_k_unstable performance, for tables using radix sort for sort individual batches, and a merge sort with K first elements.