Skip to content

Conversation

@kou
Copy link
Member

@kou kou commented Nov 9, 2020

Summary:

Benchmark:

Summary:

  • No performance regression in existing sort on array

  • Same performance as sort on array when the number of sort keys is 1

  • 1.5x-100x slower than sort on array when the number of sort keys is 2

Before:

----------------------------------------------------------------------------------
Benchmark                                                     Time             CPU
----------------------------------------------------------------------------------
SortToIndicesInt64Count/32768/10000/min_time:1.000        16057 ns        16054 ns
SortToIndicesInt64Count/32768/100/min_time:1.000          16592 ns        16589 ns
SortToIndicesInt64Count/32768/10/min_time:1.000           15979 ns        15975 ns
SortToIndicesInt64Count/32768/2/min_time:1.000            28379 ns        28372 ns
SortToIndicesInt64Count/32768/1/min_time:1.000             5767 ns         5766 ns
SortToIndicesInt64Count/32768/0/min_time:1.000            12560 ns        12557 ns
SortToIndicesInt64Count/1048576/100/min_time:1.000       729683 ns       729497 ns
SortToIndicesInt64Count/8388608/100/min_time:1.000      6696415 ns      6693711 ns
SortToIndicesInt64Compare/32768/10000/min_time:1.000     190488 ns       190442 ns
SortToIndicesInt64Compare/32768/100/min_time:1.000       190879 ns       190838 ns
SortToIndicesInt64Compare/32768/10/min_time:1.000        179156 ns       179115 ns
SortToIndicesInt64Compare/32768/2/min_time:1.000         109098 ns       109074 ns
SortToIndicesInt64Compare/32768/1/min_time:1.000           5682 ns         5681 ns
SortToIndicesInt64Compare/32768/0/min_time:1.000         185022 ns       184984 ns
SortToIndicesInt64Compare/1048576/100/min_time:1.000    9275270 ns      9273414 ns
SortToIndicesInt64Compare/8388608/100/min_time:1.000   93126006 ns     93095276 ns

After:

------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU
------------------------------------------------------------------------------------------
ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000            15722 ns        15721 ns
ArraySortIndicesInt64Narrow/32768/100/min_time:1.000              16007 ns        16006 ns
ArraySortIndicesInt64Narrow/32768/10/min_time:1.000               15178 ns        15177 ns
ArraySortIndicesInt64Narrow/32768/2/min_time:1.000                29886 ns        29885 ns
ArraySortIndicesInt64Narrow/32768/1/min_time:1.000                 5968 ns         5968 ns
ArraySortIndicesInt64Narrow/32768/0/min_time:1.000                12681 ns        12681 ns
ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000           813376 ns       813331 ns
ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000          6543874 ns      6543621 ns
ArraySortIndicesInt64Wide/32768/10000/min_time:1.000             189957 ns       189956 ns
ArraySortIndicesInt64Wide/32768/100/min_time:1.000               190269 ns       190267 ns
ArraySortIndicesInt64Wide/32768/10/min_time:1.000                175425 ns       175422 ns
ArraySortIndicesInt64Wide/32768/2/min_time:1.000                 107976 ns       107975 ns
ArraySortIndicesInt64Wide/32768/1/min_time:1.000                   5941 ns         5941 ns
ArraySortIndicesInt64Wide/32768/0/min_time:1.000                 184858 ns       184857 ns
ArraySortIndicesInt64Wide/1048576/100/min_time:1.000            9355194 ns      9354942 ns
ArraySortIndicesInt64Wide/8388608/100/min_time:1.000           94101796 ns     94099551 ns
TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 1386231938 ns   1386176541 ns
TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000   1350031141 ns   1349982623 ns
TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000  1401741018 ns   1401632081 ns
TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000    1373174328 ns   1373145968 ns
TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000  1035377805 ns   1035362806 ns
TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000    1035218085 ns   1035201824 ns
TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000     6859319 ns      6859251 ns
TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000       6847314 ns      6847048 ns
TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000   626909516 ns    626904696 ns
TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000     591632035 ns    591602144 ns
TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000    613197155 ns    613182727 ns
TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000      595568690 ns    595554829 ns
TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000    424472984 ns    424453915 ns
TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000      397339109 ns    397335604 ns
TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000      7027310 ns      7027241 ns
TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000        6891364 ns      6891272 ns
TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000   516832749 ns    516823293 ns
TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000     495273237 ns    495269975 ns
TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000    515550360 ns    515531501 ns
TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000      496670125 ns    496663084 ns
TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000    340060863 ns    340053172 ns
TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000      331281499 ns    331277004 ns
TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000      6691062 ns      6690924 ns
TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000        6384382 ns      6384323 ns
TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000    622544467 ns    622531557 ns
TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000      619193843 ns    619155597 ns
TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000     615885889 ns    615884764 ns
TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000       589917731 ns    589912005 ns
TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000     600951149 ns    600947256 ns
TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000       592304437 ns    592286953 ns
TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000      98781239 ns     98777123 ns
TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000        94136230 ns     94085863 ns
TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000     232323308 ns    232319347 ns
TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000       223708791 ns    223700834 ns
TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000      221975360 ns    221968035 ns
TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000        218214843 ns    218210306 ns
TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000      224756430 ns    224745751 ns
TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000        220809969 ns    220809044 ns
TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000       96159427 ns     96156899 ns
TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000         92307749 ns     92304526 ns
TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000     166390841 ns    166382427 ns
TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000       164576492 ns    164570581 ns
TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000      165724919 ns    165718584 ns
TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000        164048003 ns    164046222 ns
TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000      170131788 ns    170124950 ns
TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000        170874129 ns    170871245 ns
TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000       92314674 ns     92312953 ns
TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000         92023019 ns     92022643 ns

Follow-up tasks:

  • Improve performance when the number of sort keys is 2 or greater

    • Idea1: Use counting sort for small range data like on array

    • Idea2: Use threading for radix sort

  • Add support for multi-column partition Nth indices on Table

@github-actions
Copy link

github-actions bot commented Nov 9, 2020

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hunk is for SortToIndices() -> SortIndices() rename.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is for order support on array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is for order support on array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array prefix is added to distinguish array and table related codes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array prefix is added to distinguish array and table related codes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is for order support on array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is for order support on array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array prefix is added to distinguish array and table related codes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multi-column sorting on table.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hunk is for renaming SortToIndices() -> SortIndices(). No logic change.

@kou kou force-pushed the cpp-compute-sort-indices-table branch from 6b58fbb to 904ee1f Compare November 9, 2020 03:52
@kou kou force-pushed the cpp-compute-sort-indices-table branch from 589ad3d to 2da0ea7 Compare November 11, 2020 00:45
@kou
Copy link
Member Author

kou commented Nov 11, 2020

@cyb70289 @pitrou Do you want to review this?

@cyb70289
Copy link
Contributor

Also update kernel document? docs/source/cpp/compute.rst

@kou
Copy link
Member Author

kou commented Nov 11, 2020

Done.

@cyb70289
Copy link
Contributor

@kou I'm okay with this patch.
As you listed in follow up tasks, sorting arrays separately and merging afterwards should be faster. And I think there are other chances to improve performance.

Some random thoughts:

  • Looks you are returning a flat index array, does it make sense to return array of tuple (chunk_index, offset_in_chunk)? Maybe easier for client code to use?
  • For multi column sorting, in one iteration, current code compares values column by column till first non-equal found. I don't know if a radix sort approach is better, e.g. sort by 2nd-order column first, then sort by 1st-order column. It may be possible to leverage existing array based sorting code(counting sort, etc).

@kou
Copy link
Member Author

kou commented Nov 11, 2020

Thanks for your comments!

Looks you are returning a flat index array, does it make sense to return array of tuple (chunk_index, offset_in_chunk)? Maybe easier for client code to use?

Each column in table may have the different number of chunks. For example, {"a": [[1, 2], [3], [4, 5, 6]], "b": [[1, 2, 3], [4, 5, 6]]}. If we want to return chunk_index and offset_in_chunk, we need to return them for each column. It may decrease performance.

For multi column sorting, in one iteration, current code compares values column by column till first non-equal found. I don't know if a radix sort approach is better, e.g. sort by 2nd-order column first, then sort by 1st-order column. It may be possible to leverage existing array based sorting code(counting sort, etc).

It's interesting! I've added the idea to follow-up tasks in the pull request description.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised by the approach adopted here.
I would expect the following approach:

  • sort(chunked array) sorts each chunk individually, then merges all of them (see std::inplace_merge)
  • sort(table) sorts each sort column in reverse order

@kou kou force-pushed the cpp-compute-sort-indices-table branch from 21fb667 to a500146 Compare November 19, 2020 05:51
@kou
Copy link
Member Author

kou commented Nov 19, 2020

* sort(chunked array) sorts each chunk individually, then merges all of them (see `std::inplace_merge`)

I've implemented.
See also: #8612 (comment)

* sort(table) sorts each sort column in reverse order

I wanted this a follow-up task but I've implemented this in this pull request. Because both of them mentions this approach.

My implementation TableRadixSorter is naive but this approach is slower than the first sort key based approach:

Summary:

  • The first sort key approach is 1.1x-27.2x faster than the radix sort approach except narrow range and 2 sort keys case.

  • The first sort key approach doesn't depend on the number of sort keys in wide range data.

  • The first sort key approach slowed down when the number of sort keys is increased in narrow range data.

  • The radix sort approach always slowed down when the number of sort keys is increased.

    • Because the radix sort approach always sorts all sort keys.

Description:

N records: 1048576

Narrow range: Almost elements may be same

Wide range: Almost elements will be different


N chunks: 4

Narrow range:

N sort keys 2 8 16
Radix 318651531ns 2353793768ns 4627874931ns
First sort key 397339109ns 595568690ns 591632035ns
Radix/First sort key 0.8 3.9 7.8

Wide range:

N sort keys 2 8 16
Radix 593904971ns 2989364923ns 6101922415ns
First sort key 220809969ns 218214843ns 223708791ns
Radix/First sort key 2.6 13.6 27.2

N chunks: 32

Narrow range:

N sort keys 2 8 16
Radix 1198835057ns 6513599980ns 13562068073ns
First sort key 1035218085ns 1373174328ns 1350031141ns
Radix/First sort key 1.1 4.7 10.0

Wide range:

N sort keys 2 8 16
Radix 1585765386ns 7305830182ns 15323045709ns
First sort key 592304437ns 589917731ns 619193843ns
Radix/First sort key 2.6 12.3 24.7

Can we mark the radix sort approach improvement a follow-up task? Or should we optimize the radix sort approach in this pull request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can anyone suggest better name...?

@kou kou force-pushed the cpp-compute-sort-indices-table branch from 2a1a396 to 86ce536 Compare November 24, 2020 04:38
@kou
Copy link
Member Author

kou commented Nov 24, 2020

@pitrou Could you review this again?

@pitrou
Copy link
Member

pitrou commented Nov 24, 2020

Thanks for the numbers. I didn't expect "first sort key" to be better, but I'm convinced now.

kou added 13 commits November 24, 2020 16:18
Summary:

  * Deprecate SortToIndices()

  * Add SortIndices() because we use "sort_indices" as kernel name in
    apache#7240

  * Add support for sort indices in descending order on Array

  * Rename existing "sort_indices" kernel to "array_sort_indices" and
    introduce "sort_indices" meta function to support RecordBatch and
    Table

Benchmark:

Summary:

  * No performance regression in existing sort on array

  * Same performance as sort on array when the number of sort keys is 1

  * 1.5x-100x slower than sort on array when the number of sort keys is 2

Before:

    ----------------------------------------------------------------------------------
    Benchmark                                                     Time             CPU
    ----------------------------------------------------------------------------------
    SortToIndicesInt64Count/32768/10000/min_time:1.000        15685 ns        15682 ns
    SortToIndicesInt64Count/32768/100/min_time:1.000          15961 ns        15957 ns
    SortToIndicesInt64Count/32768/10/min_time:1.000           16473 ns        16469 ns
    SortToIndicesInt64Count/32768/2/min_time:1.000            27993 ns        27987 ns
    SortToIndicesInt64Count/32768/1/min_time:1.000             5609 ns         5608 ns
    SortToIndicesInt64Count/32768/0/min_time:1.000            13143 ns        13141 ns
    SortToIndicesInt64Count/1048576/1/min_time:1.000         134695 ns       134670 ns
    SortToIndicesInt64Count/8388608/1/min_time:1.000        1243992 ns      1243260 ns
    SortToIndicesInt64Compare/32768/10000/min_time:1.000     193632 ns       193595 ns
    SortToIndicesInt64Compare/32768/100/min_time:1.000       194876 ns       194837 ns
    SortToIndicesInt64Compare/32768/10/min_time:1.000        182362 ns       182324 ns
    SortToIndicesInt64Compare/32768/2/min_time:1.000         111607 ns       111584 ns
    SortToIndicesInt64Compare/32768/1/min_time:1.000           5642 ns         5641 ns
    SortToIndicesInt64Compare/32768/0/min_time:1.000         190302 ns       190268 ns
    SortToIndicesInt64Compare/1048576/1/min_time:1.000       134743 ns       134718 ns
    SortToIndicesInt64Compare/8388608/1/min_time:1.000      1261404 ns      1249362 ns

After:

    -------------------------------------------------------------------------------------
    Benchmark                                                        Time             CPU
    -------------------------------------------------------------------------------------
    ArraySortIndicesInt64Count/32768/10000/min_time:1.000        14769 ns        14766 ns
    ArraySortIndicesInt64Count/32768/100/min_time:1.000          15207 ns        15204 ns
    ArraySortIndicesInt64Count/32768/10/min_time:1.000           15892 ns        15889 ns
    ArraySortIndicesInt64Count/32768/2/min_time:1.000            30107 ns        30100 ns
    ArraySortIndicesInt64Count/32768/1/min_time:1.000             5509 ns         5508 ns
    ArraySortIndicesInt64Count/32768/0/min_time:1.000            12699 ns        12696 ns
    ArraySortIndicesInt64Count/1048576/1/min_time:1.000         132585 ns       132557 ns
    ArraySortIndicesInt64Count/8388608/1/min_time:1.000        1236596 ns      1235842 ns
    ArraySortIndicesInt64Compare/32768/10000/min_time:1.000     193259 ns       193217 ns
    ArraySortIndicesInt64Compare/32768/100/min_time:1.000       190010 ns       189973 ns
    ArraySortIndicesInt64Compare/32768/10/min_time:1.000        179923 ns       179879 ns
    ArraySortIndicesInt64Compare/32768/2/min_time:1.000         111100 ns       111074 ns
    ArraySortIndicesInt64Compare/32768/1/min_time:1.000           5660 ns         5659 ns
    ArraySortIndicesInt64Compare/32768/0/min_time:1.000         186521 ns       186476 ns
    ArraySortIndicesInt64Compare/1048576/1/min_time:1.000       132863 ns       132832 ns
    ArraySortIndicesInt64Compare/8388608/1/min_time:1.000      1266383 ns      1265765 ns
    TableSortIndicesInt64Count/32768/10000/min_time:1.000        21812 ns        21807 ns
    TableSortIndicesInt64Count/32768/100/min_time:1.000          22494 ns        22490 ns
    TableSortIndicesInt64Count/32768/10/min_time:1.000           17300 ns        17296 ns
    TableSortIndicesInt64Count/32768/2/min_time:1.000            29927 ns        29921 ns
    TableSortIndicesInt64Count/32768/1/min_time:1.000             5877 ns         5875 ns
    TableSortIndicesInt64Count/32768/0/min_time:1.000            20394 ns        20390 ns
    TableSortIndicesInt64Count/1048576/1/min_time:1.000         132904 ns       132871 ns
    TableSortIndicesInt64Count/8388608/1/min_time:1.000        1342693 ns      1341943 ns
    TableSortIndicesInt64Compare/32768/10000/min_time:1.000     203163 ns       203106 ns
    TableSortIndicesInt64Compare/32768/100/min_time:1.000       199531 ns       199477 ns
    TableSortIndicesInt64Compare/32768/10/min_time:1.000        185968 ns       185916 ns
    TableSortIndicesInt64Compare/32768/2/min_time:1.000         113571 ns       113540 ns
    TableSortIndicesInt64Compare/32768/1/min_time:1.000           6251 ns         6249 ns
    TableSortIndicesInt64Compare/32768/0/min_time:1.000         183650 ns       183609 ns
    TableSortIndicesInt64Compare/1048576/1/min_time:1.000       131701 ns       131674 ns
    TableSortIndicesInt64Compare/8388608/1/min_time:1.000      1264413 ns      1263622 ns
    TableSortIndicesInt64Int64/32768/10000/min_time:1.000       313368 ns       313310 ns
    TableSortIndicesInt64Int64/32768/100/min_time:1.000         313425 ns       313361 ns
    TableSortIndicesInt64Int64/32768/10/min_time:1.000          337051 ns       336987 ns
    TableSortIndicesInt64Int64/32768/2/min_time:1.000           402063 ns       401973 ns
    TableSortIndicesInt64Int64/32768/1/min_time:1.000           254660 ns       254612 ns
    TableSortIndicesInt64Int64/32768/0/min_time:1.000           305887 ns       305815 ns
    TableSortIndicesInt64Int64/1048576/1/min_time:1.000       11157920 ns     11155020 ns
    TableSortIndicesInt64Int64/8388608/1/min_time:1.000      106529839 ns    106501576 ns

Follow-up tasks:

  * Improve performance when the number of sort keys is 2 or greater

    * Idea1: Use counting sort for small range data like on array

    * Idea2: Use threading and merge sort

  * Add support multi-column partition Nth indices on Table
kou and others added 23 commits November 24, 2020 16:18
    2020-11-18T15:27:12+09:00
    Running release/arrow-compute-vector-sort-benchmark
    Run on (24 X 3800 MHz CPU s)
    CPU Caches:
      L1 Data 32 KiB (x12)
      L1 Instruction 32 KiB (x12)
      L2 Unified 512 KiB (x12)
      L3 Unified 16384 KiB (x4)
    Load Average: 0.48, 0.50, 0.53
    ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
    -----------------------------------------------------------------------------------------------------------------------
    Benchmark                                                             Time             CPU   Iterations UserCounters...
    -----------------------------------------------------------------------------------------------------------------------
    ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000            15722 ns        15721 ns        82919 bytes_per_second=1.94115G/s items_per_second=260.537M/s null_percent=0.01 size=32.768k
    ArraySortIndicesInt64Narrow/32768/100/min_time:1.000              16007 ns        16006 ns        89081 bytes_per_second=1.9066G/s items_per_second=255.899M/s null_percent=1 size=32.768k
    ArraySortIndicesInt64Narrow/32768/10/min_time:1.000               15178 ns        15177 ns        91254 bytes_per_second=2.01071G/s items_per_second=269.873M/s null_percent=10 size=32.768k
    ArraySortIndicesInt64Narrow/32768/2/min_time:1.000                29886 ns        29885 ns        47570 bytes_per_second=1045.67M/s items_per_second=137.058M/s null_percent=50 size=32.768k
    ArraySortIndicesInt64Narrow/32768/1/min_time:1.000                 5968 ns         5968 ns       235596 bytes_per_second=5.11372G/s items_per_second=686.352M/s null_percent=100 size=32.768k
    ArraySortIndicesInt64Narrow/32768/0/min_time:1.000                12681 ns        12681 ns       108821 bytes_per_second=2.40665G/s items_per_second=323.015M/s null_percent=0 size=32.768k
    ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000           813376 ns       813331 ns         1777 bytes_per_second=1.2007G/s items_per_second=161.155M/s null_percent=1 size=1048.58k
    ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000          6543874 ns      6543621 ns          207 bytes_per_second=1.19391G/s items_per_second=160.244M/s null_percent=1 size=8.38861M
    ArraySortIndicesInt64Wide/32768/10000/min_time:1.000             189957 ns       189956 ns         7344 bytes_per_second=164.511M/s items_per_second=21.5628M/s null_percent=0.01 size=32.768k
    ArraySortIndicesInt64Wide/32768/100/min_time:1.000               190269 ns       190267 ns         7346 bytes_per_second=164.243M/s items_per_second=21.5277M/s null_percent=1 size=32.768k
    ArraySortIndicesInt64Wide/32768/10/min_time:1.000                175425 ns       175422 ns         7952 bytes_per_second=178.142M/s items_per_second=23.3494M/s null_percent=10 size=32.768k
    ArraySortIndicesInt64Wide/32768/2/min_time:1.000                 107976 ns       107975 ns        12837 bytes_per_second=289.418M/s items_per_second=37.9346M/s null_percent=50 size=32.768k
    ArraySortIndicesInt64Wide/32768/1/min_time:1.000                   5941 ns         5941 ns       235078 bytes_per_second=5.13666G/s items_per_second=689.431M/s null_percent=100 size=32.768k
    ArraySortIndicesInt64Wide/32768/0/min_time:1.000                 184858 ns       184857 ns         7542 bytes_per_second=169.049M/s items_per_second=22.1576M/s null_percent=0 size=32.768k
    ArraySortIndicesInt64Wide/1048576/100/min_time:1.000            9355194 ns      9354942 ns          150 bytes_per_second=106.895M/s items_per_second=14.011M/s null_percent=1 size=1048.58k
    ArraySortIndicesInt64Wide/8388608/100/min_time:1.000           94101796 ns     94099551 ns           15 bytes_per_second=85.0163M/s items_per_second=11.1433M/s null_percent=1 size=8.38861M
    TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 1386231938 ns   1386176541 ns            1 items_per_second=756.452k/s
    TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000   1350031141 ns   1349982623 ns            1 items_per_second=776.733k/s
    TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000  1401741018 ns   1401632081 ns            1 items_per_second=748.111k/s
    TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000    1373174328 ns   1373145968 ns            1 items_per_second=763.63k/s
    TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000  1035377805 ns   1035362806 ns            1 items_per_second=1012.76k/s
    TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000    1035218085 ns   1035201824 ns            1 items_per_second=1012.92k/s
    TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000     6859319 ns      6859251 ns          201 items_per_second=152.87M/s
    TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000       6847314 ns      6847048 ns          209 items_per_second=153.143M/s
    TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000   626909516 ns    626904696 ns            2 items_per_second=1.67262M/s
    TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000     591632035 ns    591602144 ns            2 items_per_second=1.77243M/s
    TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000    613197155 ns    613182727 ns            2 items_per_second=1.71005M/s
    TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000      595568690 ns    595554829 ns            2 items_per_second=1.76067M/s
    TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000    424472984 ns    424453915 ns            3 items_per_second=2.47041M/s
    TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000      397339109 ns    397335604 ns            4 items_per_second=2.63902M/s
    TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000      7027310 ns      7027241 ns          195 items_per_second=149.216M/s
    TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000        6891364 ns      6891272 ns          207 items_per_second=152.16M/s
    TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000   516832749 ns    516823293 ns            3 items_per_second=2.02889M/s
    TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000     495273237 ns    495269975 ns            3 items_per_second=2.11718M/s
    TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000    515550360 ns    515531501 ns            3 items_per_second=2.03397M/s
    TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000      496670125 ns    496663084 ns            3 items_per_second=2.11124M/s
    TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000    340060863 ns    340053172 ns            4 items_per_second=3.08356M/s
    TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000      331281499 ns    331277004 ns            4 items_per_second=3.16525M/s
    TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000      6691062 ns      6690924 ns          212 items_per_second=156.716M/s
    TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000        6384382 ns      6384323 ns          221 items_per_second=164.242M/s
    TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000    622544467 ns    622531557 ns            2 items_per_second=1.68437M/s
    TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000      619193843 ns    619155597 ns            2 items_per_second=1.69356M/s
    TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000     615885889 ns    615884764 ns            2 items_per_second=1.70255M/s
    TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000       589917731 ns    589912005 ns            2 items_per_second=1.77751M/s
    TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000     600951149 ns    600947256 ns            2 items_per_second=1.74487M/s
    TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000       592304437 ns    592286953 ns            2 items_per_second=1.77039M/s
    TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000      98781239 ns     98777123 ns           14 items_per_second=10.6156M/s
    TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000        94136230 ns     94085863 ns           15 items_per_second=11.1449M/s
    TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000     232323308 ns    232319347 ns            6 items_per_second=4.51351M/s
    TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000       223708791 ns    223700834 ns            6 items_per_second=4.6874M/s
    TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000      221975360 ns    221968035 ns            6 items_per_second=4.724M/s
    TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000        218214843 ns    218210306 ns            6 items_per_second=4.80535M/s
    TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000      224756430 ns    224745751 ns            6 items_per_second=4.66561M/s
    TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000        220809969 ns    220809044 ns            6 items_per_second=4.74879M/s
    TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000       96159427 ns     96156899 ns           14 items_per_second=10.9048M/s
    TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000         92307749 ns     92304526 ns           15 items_per_second=11.36M/s
    TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000     166390841 ns    166382427 ns            8 items_per_second=6.3022M/s
    TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000       164576492 ns    164570581 ns            9 items_per_second=6.37159M/s
    TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000      165724919 ns    165718584 ns            8 items_per_second=6.32745M/s
    TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000        164048003 ns    164046222 ns            8 items_per_second=6.39195M/s
    TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000      170131788 ns    170124950 ns            8 items_per_second=6.16356M/s
    TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000        170874129 ns    170871245 ns            8 items_per_second=6.13664M/s
    TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000       92314674 ns     92312953 ns           15 items_per_second=11.3589M/s
    TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000         92023019 ns     92022643 ns           15 items_per_second=11.3948M/s
Chunked array sorter implementation is naive.

Radix sort based table sorter is slower than the existing table
sorter. So this is disabled for now.

    Benchmark                                                             Time             CPU
    ------------------------------------------------------------------------------------------
    ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000            16481 ns        16481 ns
    ArraySortIndicesInt64Narrow/32768/100/min_time:1.000              16782 ns        16782 ns
    ArraySortIndicesInt64Narrow/32768/10/min_time:1.000               16604 ns        16604 ns
    ArraySortIndicesInt64Narrow/32768/2/min_time:1.000                28438 ns        28438 ns
    ArraySortIndicesInt64Narrow/32768/1/min_time:1.000                 6064 ns         6064 ns
    ArraySortIndicesInt64Narrow/32768/0/min_time:1.000                13949 ns        13949 ns
    ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000           739131 ns       739116 ns
    ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000          7270177 ns      7270007 ns
    ArraySortIndicesInt64Wide/32768/10000/min_time:1.000             197500 ns       197496 ns
    ArraySortIndicesInt64Wide/32768/100/min_time:1.000               198485 ns       198481 ns
    ArraySortIndicesInt64Wide/32768/10/min_time:1.000                184462 ns       184460 ns
    ArraySortIndicesInt64Wide/32768/2/min_time:1.000                 114283 ns       114282 ns
    ArraySortIndicesInt64Wide/32768/1/min_time:1.000                   6117 ns         6116 ns
    ArraySortIndicesInt64Wide/32768/0/min_time:1.000                 194275 ns       194267 ns
    ArraySortIndicesInt64Wide/1048576/100/min_time:1.000            9688010 ns      9687762 ns
    ArraySortIndicesInt64Wide/8388608/100/min_time:1.000           98482097 ns     98480482 ns
    TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 13674075901 ns   13673756372
    TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000   13562068073 ns   13561526021
    TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000  6816877843 ns   6816737232 ns
    TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000    6513599980 ns   6513470119 ns
    TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000  1297496469 ns   1297401147 ns
    TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000    1198835057 ns   1198795471 ns
    TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000   512438066 ns    512427916 ns
    TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000     406987001 ns    406983020 ns
    TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000  4782555568 ns   4782501462 ns
    TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000    4627874931 ns   4627776462 ns
    TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000   2347109491 ns   2347030723 ns
    TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000     2353793768 ns   2353762906 ns
    TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000    329895435 ns    329878554 ns
    TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000      318651531 ns    318643264 ns
    TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000     27022014 ns     27020904 ns
    TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000       21620126 ns     21619471 ns
    TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000  2082161247 ns   2082123210 ns
    TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000    2125744136 ns   2125688346 ns
    TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000   1054223608 ns   1054187702 ns
    TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000     1039334371 ns   1039300540 ns
    TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000    226713390 ns    226708185 ns
    TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000      232277655 ns    232265803 ns
    TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000      7317963 ns      7317641 ns
    TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000        6936323 ns      6935778 ns
    TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000   15478483751 ns   15477778551 ns
    TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000     15323045709 ns   15322716370 ns
    TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000    7600326817 ns   7600060885 ns
    TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000      7305830182 ns   7305653322 ns
    TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000    1603439764 ns   1603419271 ns
    TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000      1585765386 ns   1585720827 ns
    TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000     848231188 ns    848213371 ns
    TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000       599883956 ns    599827161 ns
    TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000    6105580749 ns   6105405315 ns
    TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000      6101922415 ns   6092946808 ns
    TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000     3016793628 ns   3013968137 ns
    TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000       2989364923 ns   2989345424 ns
    TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000      595413434 ns    595398793 ns
    TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000        593904971 ns    593872005 ns
    TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000      126583175 ns    126578734 ns
    TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000        115004382 ns    115003229 ns
    TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000    3013364645 ns   3013319384 ns
    TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000      3032564737 ns   3032430757 ns
    TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000     1457310735 ns   1457286492 ns
    TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000       1465821557 ns   1465809354 ns
    TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000      335816200 ns    335812614 ns
    TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000        330472127 ns    330467667 ns
    TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000      103267823 ns    103262409 ns
    TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000        102048116 ns    102044125 ns

Diff:

    Range/N records/Inverse null probability/N sort keys/N chunks

    Narrow/1048576/100/16/32/min_time:1.000
          FromTheBeginning:   1386231938 ns
                     Radix:  13674075901 ns
    Narrow/1048576/0/16/32/min_time:1.000
          FromTheBeginning:   1350031141 ns
                     Radix:  13562068073 ns
    Narrow/1048576/100/8/32/min_time:1.000
          FromTheBeginning:   1401741018 ns
                     Radix:   6816877843 ns
    Narrow/1048576/0/8/32/min_time:1.000
          FromTheBeginning:   1373174328 ns
                     Radix:   6513599980 ns
    Narrow/1048576/100/2/32/min_time:1.000
          FromTheBeginning:   1035377805 ns
                     Radix:   1297496469 ns
    Narrow/1048576/0/2/32/min_time:1.000
          FromTheBeginning:   1035218085 ns
                     Radix:   1198835057 ns
    Narrow/1048576/100/1/32/min_time:1.000
          FromTheBeginning:      6859319 ns
                     Radix:    512438066 ns
    Narrow/1048576/0/1/32/min_time:1.000
          FromTheBeginning:      6847314 ns
                     Radix:    406987001 ns
    Narrow/1048576/100/16/4/min_time:1.000
          FromTheBeginning:    626909516 ns
                     Radix:   4782555568 ns
    Narrow/1048576/0/16/4/min_time:1.000
          FromTheBeginning:    591632035 ns
                     Radix:   4627874931 ns
    Narrow/1048576/100/8/4/min_time:1.000
          FromTheBeginning:    613197155 ns
                     Radix:   2347109491 ns
    Narrow/1048576/0/8/4/min_time:1.000
          FromTheBeginning:    595568690 ns
                     Radix:   2353793768 ns
    Narrow/1048576/100/2/4/min_time:1.000
          FromTheBeginning:    424472984 ns
                     Radix:    329895435 ns
    Narrow/1048576/0/2/4/min_time:1.000
          FromTheBeginning:    397339109 ns
                     Radix:    318651531 ns
    Narrow/1048576/100/1/4/min_time:1.000
          FromTheBeginning:      7027310 ns
                     Radix:     27022014 ns
    Narrow/1048576/0/1/4/min_time:1.000
          FromTheBeginning:      6891364 ns
                     Radix:     21620126 ns
    Narrow/1048576/100/16/1/min_time:1.000
          FromTheBeginning:    516832749 ns
                     Radix:   2082161247 ns
    Narrow/1048576/0/16/1/min_time:1.000
          FromTheBeginning:    495273237 ns
                     Radix:   2125744136 ns
    Narrow/1048576/100/8/1/min_time:1.000
          FromTheBeginning:    515550360 ns
                     Radix:   1054223608 ns
    Narrow/1048576/0/8/1/min_time:1.000
          FromTheBeginning:    496670125 ns
                     Radix:   1039334371 ns
    Narrow/1048576/100/2/1/min_time:1.000
          FromTheBeginning:    340060863 ns
                     Radix:    226713390 ns
    Narrow/1048576/0/2/1/min_time:1.000
          FromTheBeginning:    331281499 ns
                     Radix:    232277655 ns
    Narrow/1048576/100/1/1/min_time:1.000
          FromTheBeginning:      6691062 ns
                     Radix:      7317963 ns
    Narrow/1048576/0/1/1/min_time:1.000
          FromTheBeginning:      6384382 ns
                     Radix:      6936323 ns
    Wide/1048576/100/16/32/min_time:1.000
          FromTheBeginning:    622544467 ns
                     Radix:  15478483751 ns
    Wide/1048576/0/16/32/min_time:1.000
          FromTheBeginning:    619193843 ns
                     Radix:  15323045709 ns
    Wide/1048576/100/8/32/min_time:1.000
          FromTheBeginning:    615885889 ns
                     Radix:   7600326817 ns
    Wide/1048576/0/8/32/min_time:1.000
          FromTheBeginning:    589917731 ns
                     Radix:   7305830182 ns
    Wide/1048576/100/2/32/min_time:1.000
          FromTheBeginning:    600951149 ns
                     Radix:   1603439764 ns
    Wide/1048576/0/2/32/min_time:1.000
          FromTheBeginning:    592304437 ns
                     Radix:   1585765386 ns
    Wide/1048576/100/1/32/min_time:1.000
          FromTheBeginning:     98781239 ns
                     Radix:    848231188 ns
    Wide/1048576/0/1/32/min_time:1.000
          FromTheBeginning:     94136230 ns
                     Radix:    599883956 ns
    Wide/1048576/100/16/4/min_time:1.000
          FromTheBeginning:    232323308 ns
                     Radix:   6105580749 ns
    Wide/1048576/0/16/4/min_time:1.000
          FromTheBeginning:    223708791 ns
                     Radix:   6101922415 ns
    Wide/1048576/100/8/4/min_time:1.000
          FromTheBeginning:    221975360 ns
                     Radix:   3016793628 ns
    Wide/1048576/0/8/4/min_time:1.000
          FromTheBeginning:    218214843 ns
                     Radix:   2989364923 ns
    Wide/1048576/100/2/4/min_time:1.000
          FromTheBeginning:    224756430 ns
                     Radix:    595413434 ns
    Wide/1048576/0/2/4/min_time:1.000
          FromTheBeginning:    220809969 ns
                     Radix:    593904971 ns
    Wide/1048576/100/1/4/min_time:1.000
          FromTheBeginning:     96159427 ns
                     Radix:    126583175 ns
    Wide/1048576/0/1/4/min_time:1.000
          FromTheBeginning:     92307749 ns
                     Radix:    115004382 ns
    Wide/1048576/100/16/1/min_time:1.000
          FromTheBeginning:    166390841 ns
                     Radix:   3013364645 ns
    Wide/1048576/0/16/1/min_time:1.000
          FromTheBeginning:    164576492 ns
                     Radix:   3032564737 ns
    Wide/1048576/100/8/1/min_time:1.000
          FromTheBeginning:    165724919 ns
                     Radix:   1457310735 ns
    Wide/1048576/0/8/1/min_time:1.000
          FromTheBeginning:    164048003 ns
                     Radix:   1465821557 ns
    Wide/1048576/100/2/1/min_time:1.000
          FromTheBeginning:    170131788 ns
                     Radix:    335816200 ns
    Wide/1048576/0/2/1/min_time:1.000
          FromTheBeginning:    170874129 ns
                     Radix:    330472127 ns
    Wide/1048576/100/1/1/min_time:1.000
          FromTheBeginning:     92314674 ns
                     Radix:    103267823 ns
    Wide/1048576/0/1/1/min_time:1.000
          FromTheBeginning:     92023019 ns
                     Radix:    102048116 ns
Add some chunked array sorting benchmarks
@pitrou pitrou force-pushed the cpp-compute-sort-indices-table branch from e50e471 to fecd557 Compare November 24, 2020 15:18
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Thanks a lot @kou, very nice work. I've pushed some updates and will merge.

@pitrou pitrou closed this in 5e60962 Nov 24, 2020
@kou kou deleted the cpp-compute-sort-indices-table branch November 24, 2020 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants