wesm commented Jun 28, 2020

Improve performance with a streamlined implementation that uses bulk appends for the non-null/all-selected case. Benchmarks to follow.
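To illustrate the bulk-append idea, here is a minimal Python sketch, not the code in this PR. It assumes a string column stored as an offsets list plus one contiguous byte buffer, and a boolean mask with no nulls on either side. The function name `filter_strings` and its parameters are hypothetical. Each run of consecutive selected values is copied with a single slice instead of one append per value, so an all-selected mask reduces to one copy.

```python
def filter_strings(offsets, data, mask):
    """Filter a string column given as (offsets, data) by a boolean mask.

    offsets: list of n + 1 byte offsets into `data`
    data:    bytes holding all string values back to back
    mask:    list of n booleans (assumed to contain no nulls)
    """
    out_offsets = [0]
    out_data = bytearray()
    n = len(mask)
    i = 0
    while i < n:
        if not mask[i]:
            i += 1
            continue
        # Find the end of the current run of selected values.
        j = i
        while j < n and mask[j]:
            j += 1
        # Bulk append: copy the bytes of the whole run in one slice.
        start, end = offsets[i], offsets[j]
        shift = len(out_data) - start
        out_data += data[start:end]
        # Offsets inside the run only need a constant rebase.
        out_offsets.extend(off + shift for off in offsets[i + 1 : j + 1])
        i = j
    return out_offsets, bytes(out_data)


# Column ["a", "bb", "ccc", "dddd"], dropping the third value.
offsets = [0, 1, 3, 6, 10]
data = b"abbcccdddd"
new_offsets, new_data = filter_strings(offsets, data, [True, True, False, True])
print(new_offsets, new_data)
```

With high selectivity the runs are long, so most of the work becomes a few large copies. That fits the pattern in the benchmarks, where the largest gains are in the `select%` 99.9 cases with no nulls.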

wesm commented Jun 28, 2020

Benchmarks on gcc-8

$ archery benchmark diff --cc=gcc-8 --cxx=g++-8 --benchmark-filter=FilterString
                                 benchmark         baseline        contender  change %                                                                      counters
13     FilterStringFilterNoNulls/1048576/0    1.242 GiB/sec    6.766 GiB/sec   444.926    {'iterations': 890, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
28     FilterStringFilterNoNulls/1048576/3    1.216 GiB/sec    5.055 GiB/sec   315.816    {'iterations': 877, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
8      FilterStringFilterNoNulls/1048576/6    1.087 GiB/sec    2.512 GiB/sec   130.990    {'iterations': 771, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 99.9}
4     FilterStringFilterNoNulls/1048576/12  227.501 MiB/sec  500.272 MiB/sec   119.899   {'iterations': 160, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 99.9}
23     FilterStringFilterNoNulls/1048576/9  902.994 MiB/sec    1.362 GiB/sec    54.476   {'iterations': 622, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 99.9}
24  FilterStringFilterWithNulls/1048576/12  235.786 MiB/sec  356.124 MiB/sec    51.037   {'iterations': 165, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 99.9}
5    FilterStringFilterWithNulls/1048576/3  996.664 MiB/sec    1.361 GiB/sec    39.858    {'iterations': 687, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 99.9}
22   FilterStringFilterWithNulls/1048576/6  988.599 MiB/sec    1.331 GiB/sec    37.918    {'iterations': 685, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 99.9}
15   FilterStringFilterWithNulls/1048576/9  913.433 MiB/sec    1.205 GiB/sec    35.073   {'iterations': 634, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 99.9}
7     FilterStringFilterNoNulls/1048576/13  219.235 MiB/sec  292.800 MiB/sec    33.555   {'iterations': 155, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 50.0}
14    FilterStringFilterNoNulls/1048576/10    1.169 GiB/sec    1.559 GiB/sec    33.370   {'iterations': 838, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 50.0}
25   FilterStringFilterWithNulls/1048576/0    1.062 GiB/sec    1.366 GiB/sec    28.690    {'iterations': 761, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 99.9}
2      FilterStringFilterNoNulls/1048576/1    1.398 GiB/sec    1.763 GiB/sec    26.088   {'iterations': 1007, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 50.0}
11   FilterStringFilterWithNulls/1048576/4    1.233 GiB/sec    1.521 GiB/sec    23.336    {'iterations': 894, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 50.0}
29   FilterStringFilterWithNulls/1048576/7    1.240 GiB/sec    1.519 GiB/sec    22.426    {'iterations': 881, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 50.0}
9   FilterStringFilterWithNulls/1048576/10    1.177 GiB/sec    1.370 GiB/sec    16.380   {'iterations': 753, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 50.0}
26  FilterStringFilterWithNulls/1048576/13  215.181 MiB/sec  249.411 MiB/sec    15.907   {'iterations': 151, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 50.0}
3      FilterStringFilterNoNulls/1048576/7    1.373 GiB/sec    1.580 GiB/sec    15.117    {'iterations': 983, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 50.0}
10     FilterStringFilterNoNulls/1048576/4    1.427 GiB/sec    1.607 GiB/sec    12.602    {'iterations': 996, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 50.0}
17   FilterStringFilterWithNulls/1048576/1    1.364 GiB/sec    1.532 GiB/sec    12.343    {'iterations': 950, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 50.0}
20     FilterStringFilterNoNulls/1048576/8   19.656 GiB/sec   20.549 GiB/sec     4.540   {'iterations': 14209, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 1.0}
6   FilterStringFilterWithNulls/1048576/11   12.449 GiB/sec   12.500 GiB/sec     0.407   {'iterations': 8994, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 1.0}
0      FilterStringFilterNoNulls/1048576/5   20.884 GiB/sec   20.824 GiB/sec    -0.287   {'iterations': 14923, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 1.0}
19    FilterStringFilterNoNulls/1048576/11   18.171 GiB/sec   17.865 GiB/sec    -1.687  {'iterations': 12875, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 1.0}
21  FilterStringFilterWithNulls/1048576/14    1.551 GiB/sec    1.519 GiB/sec    -2.057   {'iterations': 1090, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 1.0}
16     FilterStringFilterNoNulls/1048576/2   21.518 GiB/sec   21.028 GiB/sec    -2.281   {'iterations': 15479, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 1.0}
12    FilterStringFilterNoNulls/1048576/14    2.367 GiB/sec    2.295 GiB/sec    -3.071   {'iterations': 1690, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 1.0}
27   FilterStringFilterWithNulls/1048576/8   14.006 GiB/sec   12.802 GiB/sec    -8.601   {'iterations': 10005, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 1.0}
1    FilterStringFilterWithNulls/1048576/5   14.044 GiB/sec   12.815 GiB/sec    -8.749   {'iterations': 10036, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 1.0}
18   FilterStringFilterWithNulls/1048576/2   14.547 GiB/sec   13.088 GiB/sec   -10.029   {'iterations': 10421, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 1.0}

Benchmarks on clang-11

$ archery benchmark diff --cc=clang-11 --cxx=clang++-11 --benchmark-filter=FilterString
                                 benchmark          baseline        contender  change %                                                                      counters
15     FilterStringFilterNoNulls/1048576/0     1.528 GiB/sec    6.513 GiB/sec   326.150   {'iterations': 1092, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
12     FilterStringFilterNoNulls/1048576/3     1.483 GiB/sec    5.488 GiB/sec   269.986   {'iterations': 1059, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
10     FilterStringFilterNoNulls/1048576/6     1.269 GiB/sec    2.550 GiB/sec   100.978    {'iterations': 905, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 99.9}
25     FilterStringFilterNoNulls/1048576/9  1020.328 MiB/sec    1.333 GiB/sec    33.742   {'iterations': 718, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 99.9}
26   FilterStringFilterWithNulls/1048576/3     1.069 GiB/sec    1.405 GiB/sec    31.393    {'iterations': 763, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 99.9}
6   FilterStringFilterWithNulls/1048576/12   281.146 MiB/sec  368.375 MiB/sec    31.026   {'iterations': 195, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 99.9}
17   FilterStringFilterWithNulls/1048576/0     1.080 GiB/sec    1.401 GiB/sec    29.703    {'iterations': 783, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 99.9}
2    FilterStringFilterWithNulls/1048576/9  1007.528 MiB/sec    1.261 GiB/sec    28.170   {'iterations': 709, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 99.9}
0    FilterStringFilterWithNulls/1048576/6     1.059 GiB/sec    1.344 GiB/sec    26.923    {'iterations': 762, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 99.9}
1      FilterStringFilterNoNulls/1048576/1     1.472 GiB/sec    1.840 GiB/sec    25.021   {'iterations': 1061, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 50.0}
9   FilterStringFilterWithNulls/1048576/13   228.021 MiB/sec  277.798 MiB/sec    21.830   {'iterations': 160, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 50.0}
19   FilterStringFilterWithNulls/1048576/1     1.321 GiB/sec    1.598 GiB/sec    20.983    {'iterations': 942, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 50.0}
21    FilterStringFilterNoNulls/1048576/12   360.116 MiB/sec  433.747 MiB/sec    20.446   {'iterations': 253, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 99.9}
4    FilterStringFilterWithNulls/1048576/7     1.368 GiB/sec    1.630 GiB/sec    19.141    {'iterations': 962, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 50.0}
7      FilterStringFilterNoNulls/1048576/4     1.405 GiB/sec    1.663 GiB/sec    18.311   {'iterations': 1048, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 50.0}
23  FilterStringFilterWithNulls/1048576/10     1.265 GiB/sec    1.478 GiB/sec    16.849   {'iterations': 912, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 50.0}
20  FilterStringFilterWithNulls/1048576/14     1.277 GiB/sec    1.462 GiB/sec    14.420    {'iterations': 975, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 1.0}
13     FilterStringFilterNoNulls/1048576/7     1.430 GiB/sec    1.627 GiB/sec    13.777   {'iterations': 1028, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 50.0}
22     FilterStringFilterNoNulls/1048576/2    22.004 GiB/sec   24.994 GiB/sec    13.592   {'iterations': 15771, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 1.0}
18   FilterStringFilterWithNulls/1048576/4     1.335 GiB/sec    1.500 GiB/sec    12.345    {'iterations': 960, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 50.0}
8   FilterStringFilterWithNulls/1048576/11    10.872 GiB/sec   12.110 GiB/sec    11.392   {'iterations': 7820, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 1.0}
27    FilterStringFilterNoNulls/1048576/10     1.311 GiB/sec    1.386 GiB/sec     5.668   {'iterations': 933, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 50.0}
14     FilterStringFilterNoNulls/1048576/5    20.494 GiB/sec   21.433 GiB/sec     4.579   {'iterations': 14595, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 1.0}
5    FilterStringFilterWithNulls/1048576/2    12.081 GiB/sec   12.621 GiB/sec     4.471    {'iterations': 8706, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 1.0}
16    FilterStringFilterNoNulls/1048576/13   256.578 MiB/sec  252.991 MiB/sec    -1.398   {'iterations': 182, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 50.0}
11   FilterStringFilterWithNulls/1048576/8    12.340 GiB/sec   12.157 GiB/sec    -1.479    {'iterations': 8871, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 1.0}
3      FilterStringFilterNoNulls/1048576/8    19.332 GiB/sec   18.721 GiB/sec    -3.160   {'iterations': 13954, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 1.0}
29   FilterStringFilterWithNulls/1048576/5    12.376 GiB/sec   11.794 GiB/sec    -4.700    {'iterations': 8938, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 1.0}
28    FilterStringFilterNoNulls/1048576/11    17.402 GiB/sec   14.552 GiB/sec   -16.374  {'iterations': 12528, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 1.0}
24    FilterStringFilterNoNulls/1048576/14     2.286 GiB/sec    1.760 GiB/sec   -22.986   {'iterations': 1636, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 1.0}

The benchmarks that show a performance decrease seem inconsequential; nearly all of them are the `select%` 1.0 cases.
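For reading the `change %` column in the tables above: it appears to be the relative difference between contender and baseline throughput. The sketch below recomputes it from the printed values, assuming that formula and binary units (1 GiB = 1024 MiB). The helper names are hypothetical and not part of archery. Results can differ slightly from the table because the printed throughputs are rounded.

```python
# Conversion factors to MiB/sec, assuming binary units.
UNIT_TO_MIB = {"MiB/sec": 1.0, "GiB/sec": 1024.0}


def to_mib_per_sec(text):
    """Parse a throughput string such as '1.362 GiB/sec' into MiB/sec."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MIB[unit]


def change_percent(baseline, contender):
    """Relative change of contender over baseline, in percent."""
    base = to_mib_per_sec(baseline)
    cont = to_mib_per_sec(contender)
    return (cont - base) / base * 100.0


# gcc-8, FilterStringFilterWithNulls/1048576/2 (table reports -10.029)
print(change_percent("14.547 GiB/sec", "13.088 GiB/sec"))
# gcc-8, FilterStringFilterNoNulls/1048576/9 (table reports 54.476)
print(change_percent("902.994 MiB/sec", "1.362 GiB/sec"))
```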

wesm commented Jun 30, 2020

+1. This is a bit dry, so I would rather reviewers reserve their time for other PRs.
