-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-8199: [C++] Add support for multi-column sort indices on Table #8612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cpp/src/arrow/compute/api_vector.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hunk is for SortToIndices() -> SortIndices() rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is for order support on array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is for order support on array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array prefix is added to distinguish array and table related codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array prefix is added to distinguish array and table related codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is for order support on array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is for order support on array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array prefix is added to distinguish array and table related codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For multi-column sorting on table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hunk is for renaming SortToIndices() -> SortIndices(). No logic change.
6b58fbb to
904ee1f
Compare
589ad3d to
2da0ea7
Compare
|
Also update kernel document? |
|
Done. |
|
@kou I'm okay with this patch. Some random thoughts:
|
|
Thanks for your comments!
Each column in table may have the different number of chunks. For example,
It's interesting! I've added the idea to follow-up tasks in the pull request description. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised by the approach adopted here.
I would expect the following approach:
- sort(chunked array) sorts each chunk individually, then merges all of them (see
std::inplace_merge) - sort(table) sorts each sort column in reverse order
21fb667 to
a500146
Compare
I've implemented.
I wanted this a follow-up task but I've implemented this in this pull request. Because both of them mentions this approach. My implementation Summary:
Description: N records: 1048576 Narrow range: Almost elements may be same Wide range: Almost elements will be different N chunks: 4 Narrow range:
Wide range:
N chunks: 32 Narrow range:
Wide range:
Can we mark the radix sort approach improvement a follow-up task? Or should we optimize the radix sort approach in this pull request? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can anyone suggest better name...?
2a1a396 to
86ce536
Compare
|
@pitrou Could you review this again? |
|
Thanks for the numbers. I didn't expect "first sort key" to be better, but I'm convinced now. |
Summary:
* Deprecate SortToIndices()
* Add SortIndices() because we use "sort_indices" as kernel name in
apache#7240
* Add support for sort indices in descending order on Array
* Rename existing "sort_indices" kernel to "array_sort_indices" and
introduce "sort_indices" meta function to support RecordBatch and
Table
Benchmark:
Summary:
* No performance regression in existing sort on array
* Same performance as sort on array when the number of sort keys is 1
* 1.5x-100x slower than sort on array when the number of sort keys is 2
Before:
----------------------------------------------------------------------------------
Benchmark Time CPU
----------------------------------------------------------------------------------
SortToIndicesInt64Count/32768/10000/min_time:1.000 15685 ns 15682 ns
SortToIndicesInt64Count/32768/100/min_time:1.000 15961 ns 15957 ns
SortToIndicesInt64Count/32768/10/min_time:1.000 16473 ns 16469 ns
SortToIndicesInt64Count/32768/2/min_time:1.000 27993 ns 27987 ns
SortToIndicesInt64Count/32768/1/min_time:1.000 5609 ns 5608 ns
SortToIndicesInt64Count/32768/0/min_time:1.000 13143 ns 13141 ns
SortToIndicesInt64Count/1048576/1/min_time:1.000 134695 ns 134670 ns
SortToIndicesInt64Count/8388608/1/min_time:1.000 1243992 ns 1243260 ns
SortToIndicesInt64Compare/32768/10000/min_time:1.000 193632 ns 193595 ns
SortToIndicesInt64Compare/32768/100/min_time:1.000 194876 ns 194837 ns
SortToIndicesInt64Compare/32768/10/min_time:1.000 182362 ns 182324 ns
SortToIndicesInt64Compare/32768/2/min_time:1.000 111607 ns 111584 ns
SortToIndicesInt64Compare/32768/1/min_time:1.000 5642 ns 5641 ns
SortToIndicesInt64Compare/32768/0/min_time:1.000 190302 ns 190268 ns
SortToIndicesInt64Compare/1048576/1/min_time:1.000 134743 ns 134718 ns
SortToIndicesInt64Compare/8388608/1/min_time:1.000 1261404 ns 1249362 ns
After:
-------------------------------------------------------------------------------------
Benchmark Time CPU
-------------------------------------------------------------------------------------
ArraySortIndicesInt64Count/32768/10000/min_time:1.000 14769 ns 14766 ns
ArraySortIndicesInt64Count/32768/100/min_time:1.000 15207 ns 15204 ns
ArraySortIndicesInt64Count/32768/10/min_time:1.000 15892 ns 15889 ns
ArraySortIndicesInt64Count/32768/2/min_time:1.000 30107 ns 30100 ns
ArraySortIndicesInt64Count/32768/1/min_time:1.000 5509 ns 5508 ns
ArraySortIndicesInt64Count/32768/0/min_time:1.000 12699 ns 12696 ns
ArraySortIndicesInt64Count/1048576/1/min_time:1.000 132585 ns 132557 ns
ArraySortIndicesInt64Count/8388608/1/min_time:1.000 1236596 ns 1235842 ns
ArraySortIndicesInt64Compare/32768/10000/min_time:1.000 193259 ns 193217 ns
ArraySortIndicesInt64Compare/32768/100/min_time:1.000 190010 ns 189973 ns
ArraySortIndicesInt64Compare/32768/10/min_time:1.000 179923 ns 179879 ns
ArraySortIndicesInt64Compare/32768/2/min_time:1.000 111100 ns 111074 ns
ArraySortIndicesInt64Compare/32768/1/min_time:1.000 5660 ns 5659 ns
ArraySortIndicesInt64Compare/32768/0/min_time:1.000 186521 ns 186476 ns
ArraySortIndicesInt64Compare/1048576/1/min_time:1.000 132863 ns 132832 ns
ArraySortIndicesInt64Compare/8388608/1/min_time:1.000 1266383 ns 1265765 ns
TableSortIndicesInt64Count/32768/10000/min_time:1.000 21812 ns 21807 ns
TableSortIndicesInt64Count/32768/100/min_time:1.000 22494 ns 22490 ns
TableSortIndicesInt64Count/32768/10/min_time:1.000 17300 ns 17296 ns
TableSortIndicesInt64Count/32768/2/min_time:1.000 29927 ns 29921 ns
TableSortIndicesInt64Count/32768/1/min_time:1.000 5877 ns 5875 ns
TableSortIndicesInt64Count/32768/0/min_time:1.000 20394 ns 20390 ns
TableSortIndicesInt64Count/1048576/1/min_time:1.000 132904 ns 132871 ns
TableSortIndicesInt64Count/8388608/1/min_time:1.000 1342693 ns 1341943 ns
TableSortIndicesInt64Compare/32768/10000/min_time:1.000 203163 ns 203106 ns
TableSortIndicesInt64Compare/32768/100/min_time:1.000 199531 ns 199477 ns
TableSortIndicesInt64Compare/32768/10/min_time:1.000 185968 ns 185916 ns
TableSortIndicesInt64Compare/32768/2/min_time:1.000 113571 ns 113540 ns
TableSortIndicesInt64Compare/32768/1/min_time:1.000 6251 ns 6249 ns
TableSortIndicesInt64Compare/32768/0/min_time:1.000 183650 ns 183609 ns
TableSortIndicesInt64Compare/1048576/1/min_time:1.000 131701 ns 131674 ns
TableSortIndicesInt64Compare/8388608/1/min_time:1.000 1264413 ns 1263622 ns
TableSortIndicesInt64Int64/32768/10000/min_time:1.000 313368 ns 313310 ns
TableSortIndicesInt64Int64/32768/100/min_time:1.000 313425 ns 313361 ns
TableSortIndicesInt64Int64/32768/10/min_time:1.000 337051 ns 336987 ns
TableSortIndicesInt64Int64/32768/2/min_time:1.000 402063 ns 401973 ns
TableSortIndicesInt64Int64/32768/1/min_time:1.000 254660 ns 254612 ns
TableSortIndicesInt64Int64/32768/0/min_time:1.000 305887 ns 305815 ns
TableSortIndicesInt64Int64/1048576/1/min_time:1.000 11157920 ns 11155020 ns
TableSortIndicesInt64Int64/8388608/1/min_time:1.000 106529839 ns 106501576 ns
Follow-up tasks:
* Improve performance when the number of sort keys is 2 or greater
* Idea1: Use counting sort for small range data like on array
* Idea2: Use threading and merge sort
* Add support multi-column partition Nth indices on Table
2020-11-18T15:27:12+09:00
Running release/arrow-compute-vector-sort-benchmark
Run on (24 X 3800 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x12)
L1 Instruction 32 KiB (x12)
L2 Unified 512 KiB (x12)
L3 Unified 16384 KiB (x4)
Load Average: 0.48, 0.50, 0.53
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------
ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000 15722 ns 15721 ns 82919 bytes_per_second=1.94115G/s items_per_second=260.537M/s null_percent=0.01 size=32.768k
ArraySortIndicesInt64Narrow/32768/100/min_time:1.000 16007 ns 16006 ns 89081 bytes_per_second=1.9066G/s items_per_second=255.899M/s null_percent=1 size=32.768k
ArraySortIndicesInt64Narrow/32768/10/min_time:1.000 15178 ns 15177 ns 91254 bytes_per_second=2.01071G/s items_per_second=269.873M/s null_percent=10 size=32.768k
ArraySortIndicesInt64Narrow/32768/2/min_time:1.000 29886 ns 29885 ns 47570 bytes_per_second=1045.67M/s items_per_second=137.058M/s null_percent=50 size=32.768k
ArraySortIndicesInt64Narrow/32768/1/min_time:1.000 5968 ns 5968 ns 235596 bytes_per_second=5.11372G/s items_per_second=686.352M/s null_percent=100 size=32.768k
ArraySortIndicesInt64Narrow/32768/0/min_time:1.000 12681 ns 12681 ns 108821 bytes_per_second=2.40665G/s items_per_second=323.015M/s null_percent=0 size=32.768k
ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000 813376 ns 813331 ns 1777 bytes_per_second=1.2007G/s items_per_second=161.155M/s null_percent=1 size=1048.58k
ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000 6543874 ns 6543621 ns 207 bytes_per_second=1.19391G/s items_per_second=160.244M/s null_percent=1 size=8.38861M
ArraySortIndicesInt64Wide/32768/10000/min_time:1.000 189957 ns 189956 ns 7344 bytes_per_second=164.511M/s items_per_second=21.5628M/s null_percent=0.01 size=32.768k
ArraySortIndicesInt64Wide/32768/100/min_time:1.000 190269 ns 190267 ns 7346 bytes_per_second=164.243M/s items_per_second=21.5277M/s null_percent=1 size=32.768k
ArraySortIndicesInt64Wide/32768/10/min_time:1.000 175425 ns 175422 ns 7952 bytes_per_second=178.142M/s items_per_second=23.3494M/s null_percent=10 size=32.768k
ArraySortIndicesInt64Wide/32768/2/min_time:1.000 107976 ns 107975 ns 12837 bytes_per_second=289.418M/s items_per_second=37.9346M/s null_percent=50 size=32.768k
ArraySortIndicesInt64Wide/32768/1/min_time:1.000 5941 ns 5941 ns 235078 bytes_per_second=5.13666G/s items_per_second=689.431M/s null_percent=100 size=32.768k
ArraySortIndicesInt64Wide/32768/0/min_time:1.000 184858 ns 184857 ns 7542 bytes_per_second=169.049M/s items_per_second=22.1576M/s null_percent=0 size=32.768k
ArraySortIndicesInt64Wide/1048576/100/min_time:1.000 9355194 ns 9354942 ns 150 bytes_per_second=106.895M/s items_per_second=14.011M/s null_percent=1 size=1048.58k
ArraySortIndicesInt64Wide/8388608/100/min_time:1.000 94101796 ns 94099551 ns 15 bytes_per_second=85.0163M/s items_per_second=11.1433M/s null_percent=1 size=8.38861M
TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 1386231938 ns 1386176541 ns 1 items_per_second=756.452k/s
TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000 1350031141 ns 1349982623 ns 1 items_per_second=776.733k/s
TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000 1401741018 ns 1401632081 ns 1 items_per_second=748.111k/s
TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000 1373174328 ns 1373145968 ns 1 items_per_second=763.63k/s
TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000 1035377805 ns 1035362806 ns 1 items_per_second=1012.76k/s
TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000 1035218085 ns 1035201824 ns 1 items_per_second=1012.92k/s
TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000 6859319 ns 6859251 ns 201 items_per_second=152.87M/s
TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000 6847314 ns 6847048 ns 209 items_per_second=153.143M/s
TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000 626909516 ns 626904696 ns 2 items_per_second=1.67262M/s
TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000 591632035 ns 591602144 ns 2 items_per_second=1.77243M/s
TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000 613197155 ns 613182727 ns 2 items_per_second=1.71005M/s
TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000 595568690 ns 595554829 ns 2 items_per_second=1.76067M/s
TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000 424472984 ns 424453915 ns 3 items_per_second=2.47041M/s
TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000 397339109 ns 397335604 ns 4 items_per_second=2.63902M/s
TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000 7027310 ns 7027241 ns 195 items_per_second=149.216M/s
TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000 6891364 ns 6891272 ns 207 items_per_second=152.16M/s
TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000 516832749 ns 516823293 ns 3 items_per_second=2.02889M/s
TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000 495273237 ns 495269975 ns 3 items_per_second=2.11718M/s
TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000 515550360 ns 515531501 ns 3 items_per_second=2.03397M/s
TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000 496670125 ns 496663084 ns 3 items_per_second=2.11124M/s
TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000 340060863 ns 340053172 ns 4 items_per_second=3.08356M/s
TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000 331281499 ns 331277004 ns 4 items_per_second=3.16525M/s
TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000 6691062 ns 6690924 ns 212 items_per_second=156.716M/s
TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000 6384382 ns 6384323 ns 221 items_per_second=164.242M/s
TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000 622544467 ns 622531557 ns 2 items_per_second=1.68437M/s
TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000 619193843 ns 619155597 ns 2 items_per_second=1.69356M/s
TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000 615885889 ns 615884764 ns 2 items_per_second=1.70255M/s
TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000 589917731 ns 589912005 ns 2 items_per_second=1.77751M/s
TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000 600951149 ns 600947256 ns 2 items_per_second=1.74487M/s
TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000 592304437 ns 592286953 ns 2 items_per_second=1.77039M/s
TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000 98781239 ns 98777123 ns 14 items_per_second=10.6156M/s
TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000 94136230 ns 94085863 ns 15 items_per_second=11.1449M/s
TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000 232323308 ns 232319347 ns 6 items_per_second=4.51351M/s
TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000 223708791 ns 223700834 ns 6 items_per_second=4.6874M/s
TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000 221975360 ns 221968035 ns 6 items_per_second=4.724M/s
TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000 218214843 ns 218210306 ns 6 items_per_second=4.80535M/s
TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000 224756430 ns 224745751 ns 6 items_per_second=4.66561M/s
TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000 220809969 ns 220809044 ns 6 items_per_second=4.74879M/s
TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000 96159427 ns 96156899 ns 14 items_per_second=10.9048M/s
TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000 92307749 ns 92304526 ns 15 items_per_second=11.36M/s
TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000 166390841 ns 166382427 ns 8 items_per_second=6.3022M/s
TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000 164576492 ns 164570581 ns 9 items_per_second=6.37159M/s
TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000 165724919 ns 165718584 ns 8 items_per_second=6.32745M/s
TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000 164048003 ns 164046222 ns 8 items_per_second=6.39195M/s
TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000 170131788 ns 170124950 ns 8 items_per_second=6.16356M/s
TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000 170874129 ns 170871245 ns 8 items_per_second=6.13664M/s
TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000 92314674 ns 92312953 ns 15 items_per_second=11.3589M/s
TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000 92023019 ns 92022643 ns 15 items_per_second=11.3948M/s
Chunked array sorter implementation is naive.
Radix sort based table sorter is slower than the existing table
sorter. So this is disabled for now.
Benchmark Time CPU
------------------------------------------------------------------------------------------
ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000 16481 ns 16481 ns
ArraySortIndicesInt64Narrow/32768/100/min_time:1.000 16782 ns 16782 ns
ArraySortIndicesInt64Narrow/32768/10/min_time:1.000 16604 ns 16604 ns
ArraySortIndicesInt64Narrow/32768/2/min_time:1.000 28438 ns 28438 ns
ArraySortIndicesInt64Narrow/32768/1/min_time:1.000 6064 ns 6064 ns
ArraySortIndicesInt64Narrow/32768/0/min_time:1.000 13949 ns 13949 ns
ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000 739131 ns 739116 ns
ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000 7270177 ns 7270007 ns
ArraySortIndicesInt64Wide/32768/10000/min_time:1.000 197500 ns 197496 ns
ArraySortIndicesInt64Wide/32768/100/min_time:1.000 198485 ns 198481 ns
ArraySortIndicesInt64Wide/32768/10/min_time:1.000 184462 ns 184460 ns
ArraySortIndicesInt64Wide/32768/2/min_time:1.000 114283 ns 114282 ns
ArraySortIndicesInt64Wide/32768/1/min_time:1.000 6117 ns 6116 ns
ArraySortIndicesInt64Wide/32768/0/min_time:1.000 194275 ns 194267 ns
ArraySortIndicesInt64Wide/1048576/100/min_time:1.000 9688010 ns 9687762 ns
ArraySortIndicesInt64Wide/8388608/100/min_time:1.000 98482097 ns 98480482 ns
TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 13674075901 ns 13673756372
TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000 13562068073 ns 13561526021
TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000 6816877843 ns 6816737232 ns
TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000 6513599980 ns 6513470119 ns
TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000 1297496469 ns 1297401147 ns
TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000 1198835057 ns 1198795471 ns
TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000 512438066 ns 512427916 ns
TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000 406987001 ns 406983020 ns
TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000 4782555568 ns 4782501462 ns
TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000 4627874931 ns 4627776462 ns
TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000 2347109491 ns 2347030723 ns
TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000 2353793768 ns 2353762906 ns
TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000 329895435 ns 329878554 ns
TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000 318651531 ns 318643264 ns
TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000 27022014 ns 27020904 ns
TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000 21620126 ns 21619471 ns
TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000 2082161247 ns 2082123210 ns
TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000 2125744136 ns 2125688346 ns
TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000 1054223608 ns 1054187702 ns
TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000 1039334371 ns 1039300540 ns
TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000 226713390 ns 226708185 ns
TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000 232277655 ns 232265803 ns
TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000 7317963 ns 7317641 ns
TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000 6936323 ns 6935778 ns
TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000 15478483751 ns 15477778551 ns
TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000 15323045709 ns 15322716370 ns
TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000 7600326817 ns 7600060885 ns
TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000 7305830182 ns 7305653322 ns
TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000 1603439764 ns 1603419271 ns
TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000 1585765386 ns 1585720827 ns
TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000 848231188 ns 848213371 ns
TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000 599883956 ns 599827161 ns
TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000 6105580749 ns 6105405315 ns
TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000 6101922415 ns 6092946808 ns
TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000 3016793628 ns 3013968137 ns
TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000 2989364923 ns 2989345424 ns
TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000 595413434 ns 595398793 ns
TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000 593904971 ns 593872005 ns
TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000 126583175 ns 126578734 ns
TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000 115004382 ns 115003229 ns
TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000 3013364645 ns 3013319384 ns
TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000 3032564737 ns 3032430757 ns
TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000 1457310735 ns 1457286492 ns
TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000 1465821557 ns 1465809354 ns
TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000 335816200 ns 335812614 ns
TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000 330472127 ns 330467667 ns
TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000 103267823 ns 103262409 ns
TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000 102048116 ns 102044125 ns
Diff:
Range/N records/Inverse null probability/N sort keys/N chunks
Narrow/1048576/100/16/32/min_time:1.000
FromTheBeginning: 1386231938 ns
Radix: 13674075901 ns
Narrow/1048576/0/16/32/min_time:1.000
FromTheBeginning: 1350031141 ns
Radix: 13562068073 ns
Narrow/1048576/100/8/32/min_time:1.000
FromTheBeginning: 1401741018 ns
Radix: 6816877843 ns
Narrow/1048576/0/8/32/min_time:1.000
FromTheBeginning: 1373174328 ns
Radix: 6513599980 ns
Narrow/1048576/100/2/32/min_time:1.000
FromTheBeginning: 1035377805 ns
Radix: 1297496469 ns
Narrow/1048576/0/2/32/min_time:1.000
FromTheBeginning: 1035218085 ns
Radix: 1198835057 ns
Narrow/1048576/100/1/32/min_time:1.000
FromTheBeginning: 6859319 ns
Radix: 512438066 ns
Narrow/1048576/0/1/32/min_time:1.000
FromTheBeginning: 6847314 ns
Radix: 406987001 ns
Narrow/1048576/100/16/4/min_time:1.000
FromTheBeginning: 626909516 ns
Radix: 4782555568 ns
Narrow/1048576/0/16/4/min_time:1.000
FromTheBeginning: 591632035 ns
Radix: 4627874931 ns
Narrow/1048576/100/8/4/min_time:1.000
FromTheBeginning: 613197155 ns
Radix: 2347109491 ns
Narrow/1048576/0/8/4/min_time:1.000
FromTheBeginning: 595568690 ns
Radix: 2353793768 ns
Narrow/1048576/100/2/4/min_time:1.000
FromTheBeginning: 424472984 ns
Radix: 329895435 ns
Narrow/1048576/0/2/4/min_time:1.000
FromTheBeginning: 397339109 ns
Radix: 318651531 ns
Narrow/1048576/100/1/4/min_time:1.000
FromTheBeginning: 7027310 ns
Radix: 27022014 ns
Narrow/1048576/0/1/4/min_time:1.000
FromTheBeginning: 6891364 ns
Radix: 21620126 ns
Narrow/1048576/100/16/1/min_time:1.000
FromTheBeginning: 516832749 ns
Radix: 2082161247 ns
Narrow/1048576/0/16/1/min_time:1.000
FromTheBeginning: 495273237 ns
Radix: 2125744136 ns
Narrow/1048576/100/8/1/min_time:1.000
FromTheBeginning: 515550360 ns
Radix: 1054223608 ns
Narrow/1048576/0/8/1/min_time:1.000
FromTheBeginning: 496670125 ns
Radix: 1039334371 ns
Narrow/1048576/100/2/1/min_time:1.000
FromTheBeginning: 340060863 ns
Radix: 226713390 ns
Narrow/1048576/0/2/1/min_time:1.000
FromTheBeginning: 331281499 ns
Radix: 232277655 ns
Narrow/1048576/100/1/1/min_time:1.000
FromTheBeginning: 6691062 ns
Radix: 7317963 ns
Narrow/1048576/0/1/1/min_time:1.000
FromTheBeginning: 6384382 ns
Radix: 6936323 ns
Wide/1048576/100/16/32/min_time:1.000
FromTheBeginning: 622544467 ns
Radix: 15478483751 ns
Wide/1048576/0/16/32/min_time:1.000
FromTheBeginning: 619193843 ns
Radix: 15323045709 ns
Wide/1048576/100/8/32/min_time:1.000
FromTheBeginning: 615885889 ns
Radix: 7600326817 ns
Wide/1048576/0/8/32/min_time:1.000
FromTheBeginning: 589917731 ns
Radix: 7305830182 ns
Wide/1048576/100/2/32/min_time:1.000
FromTheBeginning: 600951149 ns
Radix: 1603439764 ns
Wide/1048576/0/2/32/min_time:1.000
FromTheBeginning: 592304437 ns
Radix: 1585765386 ns
Wide/1048576/100/1/32/min_time:1.000
FromTheBeginning: 98781239 ns
Radix: 848231188 ns
Wide/1048576/0/1/32/min_time:1.000
FromTheBeginning: 94136230 ns
Radix: 599883956 ns
Wide/1048576/100/16/4/min_time:1.000
FromTheBeginning: 232323308 ns
Radix: 6105580749 ns
Wide/1048576/0/16/4/min_time:1.000
FromTheBeginning: 223708791 ns
Radix: 6101922415 ns
Wide/1048576/100/8/4/min_time:1.000
FromTheBeginning: 221975360 ns
Radix: 3016793628 ns
Wide/1048576/0/8/4/min_time:1.000
FromTheBeginning: 218214843 ns
Radix: 2989364923 ns
Wide/1048576/100/2/4/min_time:1.000
FromTheBeginning: 224756430 ns
Radix: 595413434 ns
Wide/1048576/0/2/4/min_time:1.000
FromTheBeginning: 220809969 ns
Radix: 593904971 ns
Wide/1048576/100/1/4/min_time:1.000
FromTheBeginning: 96159427 ns
Radix: 126583175 ns
Wide/1048576/0/1/4/min_time:1.000
FromTheBeginning: 92307749 ns
Radix: 115004382 ns
Wide/1048576/100/16/1/min_time:1.000
FromTheBeginning: 166390841 ns
Radix: 3013364645 ns
Wide/1048576/0/16/1/min_time:1.000
FromTheBeginning: 164576492 ns
Radix: 3032564737 ns
Wide/1048576/100/8/1/min_time:1.000
FromTheBeginning: 165724919 ns
Radix: 1457310735 ns
Wide/1048576/0/8/1/min_time:1.000
FromTheBeginning: 164048003 ns
Radix: 1465821557 ns
Wide/1048576/100/2/1/min_time:1.000
FromTheBeginning: 170131788 ns
Radix: 335816200 ns
Wide/1048576/0/2/1/min_time:1.000
FromTheBeginning: 170874129 ns
Radix: 330472127 ns
Wide/1048576/100/1/1/min_time:1.000
FromTheBeginning: 92314674 ns
Radix: 103267823 ns
Wide/1048576/0/1/1/min_time:1.000
FromTheBeginning: 92023019 ns
Radix: 102048116 ns
Add some chunked array sorting benchmarks
e50e471 to
fecd557
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Thanks a lot @kou, very nice work. I've pushed some updates and will merge.
Summary:
Deprecate SortToIndices()
Add SortIndices() because we use "sort_indices" as kernel name in
ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework #7240
Add support for sort indices in descending order on Array
Rename existing "sort_indices" kernel to "array_sort_indices" and
introduce "sort_indices" meta function to support ChunkedArray,
RecordBatch and Table
Require benchmark 1.5.2 or later
Benchmark:
Summary:
No performance regression in existing sort on array
Same performance as sort on array when the number of sort keys is 1
1.5x-100x slower than sort on array when the number of sort keys is 2
Before:
After:
Follow-up tasks:
Improve performance when the number of sort keys is 2 or greater
Idea1: Use counting sort for small range data like on array
Idea2: Use threading for radix sort
Add support for multi-column partition Nth indices on Table