ARROW-14183: [C++] Improve select_k_unstable performance #12582

AlvinJ15 · 2022-03-08T07:16:01Z

Improve select_k_unstable performance for tables by using radix sort to sort individual batches, followed by a merge step that keeps only the first K elements.
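A minimal sketch of the two-phase approach described above, under simplifying assumptions: each batch is a plain `std::vector<uint64_t>` of non-null keys, the goal is the K smallest values, and `RadixSortU64` / `SelectKSmallest` are hypothetical helpers, not code from this PR. The actual kernel also has to handle typed Arrow arrays, nulls, sort order and multiple sort keys.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical helper: LSD radix sort on 64-bit unsigned keys,
// one byte per pass (8 stable counting-sort passes).
void RadixSortU64(std::vector<uint64_t>& keys) {
  std::vector<uint64_t> buffer(keys.size());
  for (int shift = 0; shift < 64; shift += 8) {
    // counts[b + 1] holds the number of keys whose current byte is b.
    std::size_t counts[257] = {0};
    for (uint64_t k : keys) ++counts[((k >> shift) & 0xFF) + 1];
    // Prefix sum: counts[b] becomes the output offset of byte value b.
    for (int b = 0; b < 256; ++b) counts[b + 1] += counts[b];
    for (uint64_t k : keys) buffer[counts[(k >> shift) & 0xFF]++] = k;
    keys.swap(buffer);  // keys now holds the output of this pass
  }
}

// Hypothetical helper: radix-sort each batch, then k-way merge the sorted
// batches with a min-heap, stopping as soon as K values have been emitted.
std::vector<uint64_t> SelectKSmallest(std::vector<std::vector<uint64_t>> batches,
                                      std::size_t k) {
  for (auto& batch : batches) RadixSortU64(batch);

  // Heap entry: (value, batch index, position within batch).
  using Entry = std::tuple<uint64_t, std::size_t, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (std::size_t b = 0; b < batches.size(); ++b) {
    if (!batches[b].empty()) heap.emplace(batches[b][0], b, std::size_t{0});
  }

  std::vector<uint64_t> out;
  out.reserve(k);
  while (out.size() < k && !heap.empty()) {
    auto [value, b, pos] = heap.top();
    heap.pop();
    out.push_back(value);
    // Advance the cursor of the batch we just consumed from.
    if (pos + 1 < batches[b].size()) heap.emplace(batches[b][pos + 1], b, pos + 1);
  }
  return out;
}
```

Because each batch is already sorted, the merge keeps only one cursor per batch in the heap, so for B batches it costs roughly O(B log B + K log B) on top of the per-batch sorts, instead of sorting all rows together.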

github-actions · 2022-03-08T07:16:27Z

https://issues.apache.org/jira/browse/ARROW-14183

AlvinJ15 · 2022-03-08T07:17:38Z

@ursabot please benchmark

ursabot · 2022-03-08T07:17:42Z

Benchmark runs are scheduled for baseline = 63e1acc and contender = 39002af84e35b725ba48f71f160bcffbbe2cefda. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] test-mac-arm
[Failed ⬇️1.07% ⬆️0.36%] ursa-i9-9960x
[Finished ⬇️0.81% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

cyb70289 · 2022-03-09T03:07:57Z

Looks like this PR is still WIP?
Do you have benchmark data from your local machine? There is no obvious change in the conbench results. Maybe we should add more benchmarks first?

AlvinJ15 · 2022-03-09T07:44:33Z

Yes @cyb70289, I have several runs on my local machine.
[Benchmark screenshots not preserved: master branch vs. RadixBatch + mergeK for K=10, K=20, K=1000, K=10000 and K=N/8]

pitrou · 2022-03-10T10:49:03Z

Should you close the older PR for the same issue?

aocsa

This PR looks like an interesting change in favor of better performance. I left some general comments. For clarity, however, I would suggest presenting the benchmark results more clearly, with the improvement expressed in numbers (%), and commenting on the cases where you get better or worse results. Moreover, if these cases are localized, I would also suggest adding new unit tests with comments explaining them.

aocsa · 2022-03-14T16:21:51Z

cpp/src/arrow/compute/kernels/chunked_internal.h

+        checked_cast<const ArrayType*>(chunks_[loc.chunk_index]), loc.index_in_chunk);
+  }
+
+  template <typename ArrayType>


I would suggest adding some documentation to these internal but exposed functions (Resolve, ResolveChunkLocation).
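For context on what such documentation would describe: the diff above turns a logical row into `loc.chunk_index` and `loc.index_in_chunk`. A minimal sketch of that kind of lookup, assuming cumulative chunk offsets and a binary search; `ChunkLoc` and `SimpleChunkResolver` are hypothetical names, not Arrow's actual `Resolve` / `ResolveChunkLocation` implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical location type: the chunk a logical row lives in,
// plus the row's index inside that chunk.
struct ChunkLoc {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

// Hypothetical resolver. offsets_[i] is the logical index of the first row
// of chunk i, with a trailing entry equal to the total length.
class SimpleChunkResolver {
 public:
  explicit SimpleChunkResolver(const std::vector<int64_t>& chunk_lengths) {
    offsets_.reserve(chunk_lengths.size() + 1);
    int64_t total = 0;
    offsets_.push_back(0);
    for (int64_t len : chunk_lengths) {
      total += len;
      offsets_.push_back(total);
    }
  }

  // Precondition: 0 <= index < total length.
  // upper_bound finds the first offset strictly greater than index; the chunk
  // containing index is the one just before it. Empty chunks share an offset
  // with their successor, so they are skipped automatically.
  ChunkLoc Resolve(int64_t index) const {
    auto it = std::upper_bound(offsets_.begin(), offsets_.end(), index);
    int64_t chunk = static_cast<int64_t>(it - offsets_.begin()) - 1;
    return {chunk, index - offsets_[chunk]};
  }

 private:
  std::vector<int64_t> offsets_;
};
```

Each lookup is O(log C) for C chunks, which is why sort kernels over chunked data benefit from resolving locations once rather than rescanning chunk lengths per row.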

aocsa · 2022-03-14T16:23:50Z

cpp/src/arrow/compute/kernels/vector_sort.cc

+      DCHECK_EQ(sorted[i].overall_end(), indices_begin + end_offset);
+      DCHECK_EQ(sorted[i].non_null_count() + sorted[i].null_count(), batch.num_rows());
+      begin_offset = end_offset;
+      // XXX this is an upper bound on the true null count


cyb70289 · 2022-07-04T04:18:18Z

@AlvinJ15, are you still working on this PR?

AlvinJ15 · 2022-07-06T05:22:46Z

> @AlvinJ15, are you still working on this PR?

@cyb70289 I will rebase and finish this PR this week, thanks for the reminder.

amol- · 2023-03-30T17:16:34Z

Closing because it has been untouched for a while. In case it's still relevant, feel free to reopen and move it forward 👍

github-actions bot added the Component: C++ label Mar 8, 2022

ARROW-14183: [C++] Improve select_k_unstable performance

e12db21

AlvinJ15 force-pushed the ARROW-14183_radix_merge_k branch from 39002af to e12db21 on March 9, 2022 05:46

AlvinJ15 mentioned this pull request Mar 14, 2022

ARROW-14183: [C++] Improve select_k_unstable performance #12164

Closed

aocsa suggested changes Mar 14, 2022


asfimport mentioned this pull request Oct 4, 2022

[C++] Improve select_k_unstable performance #29769

Open

amol- closed this Mar 30, 2023
