Skip to content

Conversation

@AlvinJ15
Copy link
Contributor

@AlvinJ15 AlvinJ15 commented Mar 8, 2022

Improve select_k_unstable performance, for tables using radix sort for sort individual batches, and a merge sort with K first elements.

@github-actions
Copy link

github-actions bot commented Mar 8, 2022

@AlvinJ15
Copy link
Contributor Author

AlvinJ15 commented Mar 8, 2022

@ursabot please benchmark

@ursabot
Copy link

ursabot commented Mar 8, 2022

Benchmark runs are scheduled for baseline = 63e1acc and contender = 39002af84e35b725ba48f71f160bcffbbe2cefda. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] test-mac-arm
[Failed ⬇️1.07% ⬆️0.36%] ursa-i9-9960x
[Finished ⬇️0.81% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@cyb70289
Copy link
Contributor

cyb70289 commented Mar 9, 2022

Looks this PR is still WIP?
Do you have benchmark data on your local machine? Looks there's no obvious change from conbench result. Maybe we should add more benchmarks first?

@AlvinJ15 AlvinJ15 force-pushed the ARROW-14183_radix_merge_k branch from 39002af to e12db21 Compare March 9, 2022 05:46
@AlvinJ15
Copy link
Contributor Author

AlvinJ15 commented Mar 9, 2022

Yes @cyb70289 I have different runs on my local.
master branch with K=10
image
RadixBatch + mergeK
image

master branch with K=20
image
RadixBatch + mergeK
image

master branch with K=1000
image
RadixBatch + mergeK
image

master branch with K=10000
image
RadixBatch + mergeK
image

master branch with K=N/8
image
RadixBatch + mergeK
image

@pitrou
Copy link
Member

pitrou commented Mar 10, 2022

Should you close the older PR for the same issue?

Copy link
Contributor

@aocsa aocsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks an interesting change in favor of better performance. I put some general comments, however just for clarity I would suggest to show better the benchmark results as well the improvement in numbers (%) and put some comments in which cases you are getting better/worse results. Moreover if these cases are localized I would suggest also to add new unit tests with comments around this.

checked_cast<const ArrayType*>(chunks_[loc.chunk_index]), loc.index_in_chunk);
}

template <typename ArrayType>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to add some documentation to these internal but exposes functions (Resolve, ResolveChunkLocation).

DCHECK_EQ(sorted[i].overall_end(), indices_begin + end_offset);
DCHECK_EQ(sorted[i].non_null_count() + sorted[i].null_count(), batch.num_rows());
begin_offset = end_offset;
// XXX this is an upper bound on the true null count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove XXX

@cyb70289
Copy link
Contributor

cyb70289 commented Jul 4, 2022

@AlvinJ15 , are you still working on this PR?

@AlvinJ15
Copy link
Contributor Author

AlvinJ15 commented Jul 6, 2022

@AlvinJ15 , are you still working on this PR?

@cyb70289 I will rebase and finish this PR for this week, thanks for the remainder

@amol-
Copy link
Member

amol- commented Mar 30, 2023

Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍

@amol- amol- closed this Mar 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants