ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size #7442
|
Here are benchmark runs on my machine
If you want to benchmark yourself, please use this branch for the "before": https://github.com/wesm/arrow/tree/ARROW-9075-comparison. It contains the RandomArrayGenerator::Boolean change and some other changes to the benchmarks, without which the results will not be comparable. |
|
To show some simple before-and-after numbers in Python, this example uses a high-selectivity filter (all but one value selected) and two low-selectivity filters (1/100 and 1/1000 of the values selected): before: after: EDIT: updated benchmarks for the low-selectivity optimization |
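The filter shapes described above can be sketched in plain Python. This is a minimal illustration, not the benchmark script from this thread: `make_filter`, `all_but_one`, and `apply_filter` are hypothetical helper names, and the reference filter is a simple list comprehension rather than Arrow's kernel.

```python
import random

def make_filter(n, selectivity, seed=0):
    """Boolean mask of length n where roughly `selectivity` of the slots are True."""
    rng = random.Random(seed)
    return [rng.random() < selectivity for _ in range(n)]

def all_but_one(n):
    """High-selectivity mask: every slot selected except the last one."""
    return [True] * (n - 1) + [False]

def apply_filter(values, mask):
    """Reference filter: keep the values whose mask slot is True."""
    return [v for v, keep in zip(values, mask) if keep]

n = 100_000
values = list(range(n))
high = all_but_one(n)               # all but one value selected
low_100 = make_filter(n, 1 / 100)   # about 1 in 100 selected
low_1000 = make_filter(n, 1 / 1000) # about 1 in 1000 selected
```

The two extremes matter because they stress different paths: a high-selectivity filter is dominated by copying, while a low-selectivity filter is dominated by scanning the mask.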
|
The RTools 4.0 build failure is spurious. This is ready for review. |
|
I implemented some other optimizations, especially for the case where neither values nor filter contain nulls. I'm working on updated benchmarks. Updated benchmarks: https://gist.github.com/wesm/ad07cec1613b6327926dfe1d95e7f4f0/revisions?diff=split |
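One way to picture a no-nulls fast path is to classify the filter in fixed-size blocks: a fully selected block is copied in bulk, an empty block is skipped, and only mixed blocks are tested slot by slot. The sketch below is an assumption about the general technique, not the code in this PR; `filter_no_nulls` and the `block_size` parameter are hypothetical.

```python
def filter_no_nulls(values, mask, block_size=64):
    """Filter `values` by `mask` (equal-length lists, no nulls in either)."""
    out = []
    for start in range(0, len(mask), block_size):
        block = mask[start:start + block_size]
        selected = sum(block)  # number of True slots in this block
        if selected == len(block):
            # Every slot selected: copy the whole block at once.
            out.extend(values[start:start + len(block)])
        elif selected == 0:
            # Nothing selected: skip the block entirely.
            continue
        else:
            # Mixed block: fall back to testing each slot.
            chunk = values[start:start + len(block)]
            out.extend(v for v, keep in zip(chunk, block) if keep)
    return out
```

With no nulls to track, the output needs no validity bitmap, which is what makes the bulk-copy and skip branches this simple.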
|
I found some issues in the Python benchmarks I posted before (I was including the cost of converting NumPy booleans to Arrow booleans in the prior results). Here are the updated setup and current numbers. I also added a "worst case scenario" where 50% of values are not selected. setup: before: after: |
|
@ursabot benchmark --help |
|
|
@ursabot benchmark --benchmark-filter=Filter 66df3d0 |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#112487) builder has been succeeded. Revision: 31a66630f6bcb9a3f74912da7d31ac2412e97184
======================================= =============== =============== =========
benchmark baseline contender change
======================================= =============== =============== =========
FilterInt64FilterWithNulls/262144/3 563.800 MiB/sec 576.625 MiB/sec 2.275%
- FilterStringFilterWithNulls/262144/3 498.174 MiB/sec 434.196 MiB/sec -12.842%
FilterFSLInt64FilterNoNulls/262144/9 158.897 MiB/sec 268.195 MiB/sec 68.785%
FilterInt64FilterNoNulls/262144/14 2.793 GiB/sec 6.554 GiB/sec 134.709%
FilterFSLInt64FilterNoNulls/262144/11 2.356 GiB/sec 5.386 GiB/sec 128.589%
FilterStringFilterNoNulls/262144/2 4.937 GiB/sec 10.996 GiB/sec 122.715%
FilterFSLInt64FilterWithNulls/262144/5 1.590 GiB/sec 4.193 GiB/sec 163.732%
FilterInt64FilterWithNulls/262144/12 519.932 MiB/sec 496.829 MiB/sec -4.443%
FilterInt64FilterNoNulls/262144/0 669.365 MiB/sec 7.541 GiB/sec 1053.558%
FilterFSLInt64FilterNoNulls/262144/1 268.027 MiB/sec 560.837 MiB/sec 109.246%
FilterStringFilterNoNulls/262144/6 488.692 MiB/sec 481.827 MiB/sec -1.405%
FilterInt64FilterNoNulls/262144/8 2.735 GiB/sec 6.313 GiB/sec 130.810%
FilterInt64FilterNoNulls/262144/5 2.809 GiB/sec 6.018 GiB/sec 114.267%
- FilterStringFilterWithNulls/262144/12 84.168 MiB/sec 70.410 MiB/sec -16.346%
FilterFSLInt64FilterNoNulls/262144/0 169.867 MiB/sec 718.594 MiB/sec 323.035%
FilterStringFilterWithNulls/262144/14 355.644 MiB/sec 878.914 MiB/sec 147.133%
FilterStringFilterWithNulls/262144/2 3.338 GiB/sec 8.903 GiB/sec 166.736%
FilterFSLInt64FilterWithNulls/262144/1 263.151 MiB/sec 512.905 MiB/sec 94.909%
FilterFSLInt64FilterNoNulls/262144/14 2.395 GiB/sec 5.212 GiB/sec 117.604%
FilterInt64FilterWithNulls/262144/11 1.729 GiB/sec 4.684 GiB/sec 170.948%
FilterInt64FilterNoNulls/262144/9 566.051 MiB/sec 3.083 GiB/sec 457.794%
- FilterStringFilterWithNulls/262144/10 619.724 MiB/sec 578.798 MiB/sec -6.604%
FilterInt64FilterWithNulls/262144/1 541.616 MiB/sec 558.958 MiB/sec 3.202%
FilterFSLInt64FilterWithNulls/262144/14 1.596 GiB/sec 4.061 GiB/sec 154.454%
FilterFSLInt64FilterWithNulls/262144/0 170.064 MiB/sec 398.738 MiB/sec 134.464%
FilterInt64FilterWithNulls/262144/2 1.739 GiB/sec 4.883 GiB/sec 180.721%
FilterInt64FilterWithNulls/262144/4 528.271 MiB/sec 555.772 MiB/sec 5.206%
FilterFSLInt64FilterNoNulls/262144/2 2.383 GiB/sec 6.074 GiB/sec 154.832%
FilterInt64FilterNoNulls/262144/4 584.370 MiB/sec 579.728 MiB/sec -0.794%
FilterInt64FilterNoNulls/262144/12 575.177 MiB/sec 3.023 GiB/sec 438.268%
- FilterStringFilterWithNulls/262144/9 459.179 MiB/sec 394.515 MiB/sec -14.083%
FilterStringFilterNoNulls/262144/5 4.936 GiB/sec 10.562 GiB/sec 113.987%
FilterInt64FilterNoNulls/262144/2 2.838 GiB/sec 7.390 GiB/sec 160.374%
FilterFSLInt64FilterNoNulls/262144/7 261.996 MiB/sec 464.922 MiB/sec 77.454%
FilterStringFilterNoNulls/262144/14 580.305 MiB/sec 1.253 GiB/sec 121.158%
FilterFSLInt64FilterWithNulls/262144/13 249.426 MiB/sec 386.982 MiB/sec 55.149%
- FilterInt64FilterWithNulls/262144/9 530.774 MiB/sec 497.368 MiB/sec -6.294%
FilterStringFilterWithNulls/262144/8 3.270 GiB/sec 8.467 GiB/sec 158.943%
FilterFSLInt64FilterNoNulls/262144/10 257.812 MiB/sec 390.196 MiB/sec 51.349%
- FilterStringFilterNoNulls/262144/13 98.039 MiB/sec 90.475 MiB/sec -7.716%
FilterInt64FilterWithNulls/262144/8 1.737 GiB/sec 4.652 GiB/sec 167.790%
FilterFSLInt64FilterWithNulls/262144/3 167.057 MiB/sec 351.817 MiB/sec 110.597%
- FilterStringFilterWithNulls/262144/6 494.580 MiB/sec 429.801 MiB/sec -13.098%
FilterFSLInt64FilterWithNulls/262144/12 165.174 MiB/sec 262.176 MiB/sec 58.728%
FilterInt64FilterWithNulls/262144/7 526.592 MiB/sec 541.187 MiB/sec 2.772%
FilterStringFilterNoNulls/262144/11 4.531 GiB/sec 9.652 GiB/sec 113.006%
FilterStringFilterWithNulls/262144/1 662.260 MiB/sec 633.359 MiB/sec -4.364%
FilterStringFilterWithNulls/262144/4 670.467 MiB/sec 644.877 MiB/sec -3.817%
FilterStringFilterNoNulls/262144/0 503.582 MiB/sec 550.304 MiB/sec 9.278%
- FilterStringFilterNoNulls/262144/9 443.066 MiB/sec 390.416 MiB/sec -11.883%
FilterFSLInt64FilterNoNulls/262144/13 251.747 MiB/sec 351.809 MiB/sec 39.747%
FilterInt64FilterNoNulls/262144/11 2.788 GiB/sec 6.687 GiB/sec 139.878%
- FilterInt64FilterWithNulls/262144/0 620.421 MiB/sec 585.692 MiB/sec -5.598%
FilterFSLInt64FilterWithNulls/262144/8 1.593 GiB/sec 4.155 GiB/sec 160.783%
- FilterStringFilterNoNulls/262144/7 692.942 MiB/sec 654.463 MiB/sec -5.553%
FilterStringFilterNoNulls/262144/8 4.900 GiB/sec 10.519 GiB/sec 114.694%
FilterInt64FilterWithNulls/262144/10 510.602 MiB/sec 527.612 MiB/sec 3.331%
FilterFSLInt64FilterNoNulls/262144/3 159.401 MiB/sec 555.494 MiB/sec 248.487%
FilterFSLInt64FilterNoNulls/262144/6 162.294 MiB/sec 399.907 MiB/sec 146.410%
- FilterStringFilterWithNulls/262144/0 517.359 MiB/sec 439.657 MiB/sec -15.019%
FilterInt64FilterWithNulls/262144/13 502.220 MiB/sec 527.971 MiB/sec 5.128%
FilterStringFilterWithNulls/262144/7 666.386 MiB/sec 638.254 MiB/sec -4.221%
FilterInt64FilterNoNulls/262144/6 603.261 MiB/sec 3.473 GiB/sec 489.518%
FilterStringFilterWithNulls/262144/11 2.994 GiB/sec 8.094 GiB/sec 170.304%
FilterFSLInt64FilterWithNulls/262144/6 165.225 MiB/sec 335.017 MiB/sec 102.765%
FilterFSLInt64FilterWithNulls/262144/7 257.333 MiB/sec 466.760 MiB/sec 81.383%
FilterInt64FilterNoNulls/262144/7 583.317 MiB/sec 564.896 MiB/sec -3.158%
FilterStringFilterNoNulls/262144/4 691.530 MiB/sec 699.221 MiB/sec 1.112%
FilterFSLInt64FilterWithNulls/262144/11 1.592 GiB/sec 4.057 GiB/sec 154.837%
- FilterStringFilterNoNulls/262144/12 88.970 MiB/sec 70.067 MiB/sec -21.246%
FilterInt64FilterNoNulls/262144/10 562.254 MiB/sec 545.802 MiB/sec -2.926%
FilterInt64FilterWithNulls/262144/14 1.738 GiB/sec 4.747 GiB/sec 173.077%
FilterFSLInt64FilterWithNulls/262144/2 1.570 GiB/sec 4.295 GiB/sec 173.597%
FilterInt64FilterNoNulls/262144/13 558.715 MiB/sec 554.622 MiB/sec -0.733%
FilterInt64FilterWithNulls/262144/6 561.253 MiB/sec 537.786 MiB/sec -4.181%
FilterStringFilterWithNulls/262144/13 91.370 MiB/sec 89.650 MiB/sec -1.882%
FilterFSLInt64FilterNoNulls/262144/12 153.042 MiB/sec 241.416 MiB/sec 57.745%
FilterFSLInt64FilterNoNulls/262144/5 2.414 GiB/sec 5.672 GiB/sec 134.917%
FilterFSLInt64FilterNoNulls/262144/8 2.377 GiB/sec 5.541 GiB/sec 133.082%
- FilterStringFilterNoNulls/262144/10 632.556 MiB/sec 572.816 MiB/sec -9.444%
FilterFSLInt64FilterWithNulls/262144/9 166.869 MiB/sec 288.049 MiB/sec 72.620%
FilterInt64FilterNoNulls/262144/1 599.855 MiB/sec 912.146 MiB/sec 52.061%
FilterStringFilterWithNulls/262144/5 3.295 GiB/sec 8.587 GiB/sec 160.574%
FilterFSLInt64FilterNoNulls/262144/4 263.896 MiB/sec 514.836 MiB/sec 95.091%
FilterFSLInt64FilterWithNulls/262144/4 258.744 MiB/sec 477.042 MiB/sec 84.369%
FilterInt64FilterWithNulls/262144/5 1.735 GiB/sec 4.728 GiB/sec 172.542%
FilterStringFilterNoNulls/262144/3 495.135 MiB/sec 539.178 MiB/sec 8.895%
FilterInt64FilterNoNulls/262144/3 611.978 MiB/sec 3.929 GiB/sec 557.402%
FilterFSLInt64FilterWithNulls/262144/10 255.156 MiB/sec 417.072 MiB/sec 63.458%
FilterStringFilterNoNulls/262144/1 714.448 MiB/sec 713.457 MiB/sec -0.139%
======================================= =============== =============== ========= |
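The "change" column is consistent with a relative difference between contender and baseline throughput, with GiB treated as 1024 MiB. The following sketch recomputes it from the printed strings; `to_bytes_per_sec` and `percent_change` are hypothetical helpers, and because the table values are rounded, recomputed figures match only approximately.

```python
# Binary unit multipliers, matching the MiB/GiB labels in the table.
UNITS = {"B/sec": 1, "KiB/sec": 1024, "MiB/sec": 1024**2, "GiB/sec": 1024**3}

def to_bytes_per_sec(text):
    """Parse a string such as '7.541 GiB/sec' into bytes per second."""
    number, unit = text.split()
    return float(number) * UNITS[unit]

def percent_change(baseline, contender):
    """Relative throughput change of contender versus baseline, in percent."""
    b = to_bytes_per_sec(baseline)
    c = to_bytes_per_sec(contender)
    return (c - b) / b * 100.0

# FilterInt64FilterWithNulls/262144/3 from the table above.
print(round(percent_change("563.800 MiB/sec", "576.625 MiB/sec"), 3))
```

A positive value means the contender is faster; rows that the bot prefixes with `-` have a negative change.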
|
The string perf regressions are mostly for the cases where 99.9% of the values are selected. I'll take a closer look at this to see what can be done. The varbinary case is so important that we might want to create a specialized implementation for it |
|
Still, a 10% decrease for string is highly tolerable in exchange for a 50-150% increase for all other types.
|
True. I think for binary-based types we need to implement bulk-block-appends. It's beyond the scope of this PR -- I will take a brief look to see if there's anything dumb (like messing up the preallocation) that I did that's making things slower |
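As a rough illustration of what bulk block appends could mean for binary-based types: when several consecutive values are selected, their bytes are contiguous in the data buffer, so the whole run can be copied with one slice instead of one append per value. This is a sketch under that assumption, using an offsets-plus-data layout; the function names are hypothetical and this is not the follow-up implementation.

```python
def filter_binary_per_value(offsets, data, mask):
    """Baseline: append each selected value's bytes one at a time."""
    out_offsets, out = [0], bytearray()
    for i, keep in enumerate(mask):
        if keep:
            out += data[offsets[i]:offsets[i + 1]]
            out_offsets.append(len(out))
    return out_offsets, bytes(out)

def filter_binary_bulk(offsets, data, mask):
    """Copy each run of consecutive selected values with a single slice."""
    out_offsets, out = [0], bytearray()
    i, n = 0, len(mask)
    while i < n:
        if not mask[i]:
            i += 1
            continue
        run_start = i
        while i < n and mask[i]:
            i += 1
        # Values run_start .. i-1 are selected and contiguous in `data`.
        base = offsets[run_start]
        shift = len(out) - base  # rebases input offsets onto the output
        out += data[base:offsets[i]]
        out_offsets.extend(offsets[j] + shift for j in range(run_start + 1, i + 1))
    return out_offsets, bytes(out)
```

The benefit grows with selectivity, which lines up with the observation that the string regressions show up mostly when 99.9% of the values are selected.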
|
I'll have to deal with the string optimization in a follow-up PR, so I'm going to leave this for review as is. It would be good to get this merged sooner rather than later. EDIT: opened https://issues.apache.org/jira/browse/ARROW-9152 |
|
Everything is much faster here, including string filtering. |
pitrou
left a comment
I haven't taken a look at everything.
fsaintjacques
left a comment
Some comments regarding testing and implementation.
cpp/src/arrow/compute/api_vector.h
Outdated
This should probably be extracted as a ScalarFunction named popcount or so (follow up)
OK
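The code under review is not visible in this thread, so the following is only a guess at what a standalone popcount function would compute: the number of set bits in a packed boolean bitmap, which for a filter without nulls equals the number of selected values. The `popcount` helper is hypothetical and assumes least-significant-bit-first ordering within each byte.

```python
def popcount(bitmap: bytes, length: int) -> int:
    """Count set bits among the first `length` bits (LSB-first within each byte)."""
    full, rem = divmod(length, 8)
    # Whole bytes: count the 1 bits in each.
    count = sum(bin(b).count("1") for b in bitmap[:full])
    if rem:
        # Trailing partial byte: mask off bits beyond `length`.
        count += bin(bitmap[full] & ((1 << rem) - 1)).count("1")
    return count
```

Knowing this count up front lets a filter preallocate its output exactly once.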
|
@ursabot benchmark --benchmark-filter=Filter c4f425768 |
|
I think I improved some of the readability problems and addressed the other comments. I'd like to merge this soon once CI is green |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#112952) builder has been succeeded. Revision: f50b39e54c50e8a53606eda486c88e6ec51d7006
======================================= =============== ================ ========
benchmark baseline contender change
======================================= =============== ================ ========
- FilterFSLInt64FilterNoNulls/262144/14 5.457 GiB/sec 4.398 GiB/sec -19.404%
FilterStringFilterWithNulls/262144/4 642.405 MiB/sec 677.920 MiB/sec 5.528%
- FilterFSLInt64FilterNoNulls/262144/7 463.992 MiB/sec 378.391 MiB/sec -18.449%
FilterFSLInt64FilterWithNulls/262144/6 333.996 MiB/sec 320.327 MiB/sec -4.093%
- FilterFSLInt64FilterWithNulls/262144/1 516.189 MiB/sec 459.926 MiB/sec -10.900%
- FilterStringFilterNoNulls/262144/4 681.504 MiB/sec 595.788 MiB/sec -12.577%
- FilterFSLInt64FilterNoNulls/262144/8 5.889 GiB/sec 4.675 GiB/sec -20.610%
- FilterInt64FilterWithNulls/262144/10 606.960 MiB/sec 547.973 MiB/sec -9.718%
- FilterInt64FilterNoNulls/262144/7 638.264 MiB/sec 568.923 MiB/sec -10.864%
FilterStringFilterWithNulls/262144/6 431.474 MiB/sec 484.077 MiB/sec 12.191%
- FilterStringFilterNoNulls/262144/14 1.245 GiB/sec 1008.386 MiB/sec -20.893%
FilterFSLInt64FilterWithNulls/262144/11 4.239 GiB/sec 4.029 GiB/sec -4.954%
- FilterStringFilterNoNulls/262144/8 10.899 GiB/sec 8.494 GiB/sec -22.064%
- FilterFSLInt64FilterNoNulls/262144/4 515.626 MiB/sec 406.426 MiB/sec -21.178%
FilterInt64FilterNoNulls/262144/6 3.697 GiB/sec 3.525 GiB/sec -4.664%
FilterInt64FilterNoNulls/262144/8 6.829 GiB/sec 6.809 GiB/sec -0.301%
- FilterFSLInt64FilterNoNulls/262144/2 6.453 GiB/sec 4.950 GiB/sec -23.289%
- FilterInt64FilterWithNulls/262144/13 606.984 MiB/sec 548.948 MiB/sec -9.561%
- FilterStringFilterNoNulls/262144/1 707.132 MiB/sec 609.027 MiB/sec -13.874%
FilterStringFilterWithNulls/262144/3 436.301 MiB/sec 488.825 MiB/sec 12.038%
FilterStringFilterWithNulls/262144/1 616.105 MiB/sec 675.493 MiB/sec 9.639%
FilterStringFilterNoNulls/262144/3 548.660 MiB/sec 533.539 MiB/sec -2.756%
- FilterFSLInt64FilterNoNulls/262144/9 268.363 MiB/sec 250.359 MiB/sec -6.709%
- FilterStringFilterNoNulls/262144/13 89.995 MiB/sec 76.326 MiB/sec -15.189%
FilterStringFilterWithNulls/262144/12 71.366 MiB/sec 82.415 MiB/sec 15.483%
FilterInt64FilterNoNulls/262144/9 3.209 GiB/sec 3.114 GiB/sec -2.971%
FilterFSLInt64FilterWithNulls/262144/9 288.819 MiB/sec 276.679 MiB/sec -4.203%
FilterStringFilterNoNulls/262144/12 66.141 MiB/sec 65.509 MiB/sec -0.956%
- FilterFSLInt64FilterWithNulls/262144/4 474.907 MiB/sec 429.013 MiB/sec -9.664%
- FilterInt64FilterWithNulls/262144/1 651.659 MiB/sec 556.258 MiB/sec -14.640%
FilterStringFilterWithNulls/262144/14 911.019 MiB/sec 871.756 MiB/sec -4.310%
- FilterInt64FilterNoNulls/262144/4 675.941 MiB/sec 569.448 MiB/sec -15.755%
- FilterFSLInt64FilterNoNulls/262144/13 352.227 MiB/sec 307.638 MiB/sec -12.659%
FilterInt64FilterWithNulls/262144/5 5.129 GiB/sec 4.921 GiB/sec -4.068%
- FilterFSLInt64FilterWithNulls/262144/14 4.168 GiB/sec 3.909 GiB/sec -6.200%
FilterStringFilterWithNulls/262144/9 396.156 MiB/sec 442.591 MiB/sec 11.721%
- FilterFSLInt64FilterNoNulls/262144/3 554.664 MiB/sec 464.787 MiB/sec -16.204%
- FilterStringFilterNoNulls/262144/2 11.394 GiB/sec 8.924 GiB/sec -21.683%
- FilterStringFilterWithNulls/262144/8 8.856 GiB/sec 8.075 GiB/sec -8.825%
- FilterFSLInt64FilterNoNulls/262144/10 389.368 MiB/sec 333.033 MiB/sec -14.468%
- FilterFSLInt64FilterNoNulls/262144/11 5.587 GiB/sec 4.507 GiB/sec -19.338%
FilterStringFilterWithNulls/262144/10 580.314 MiB/sec 612.106 MiB/sec 5.478%
- FilterFSLInt64FilterNoNulls/262144/5 6.032 GiB/sec 4.717 GiB/sec -21.802%
- FilterFSLInt64FilterNoNulls/262144/0 725.211 MiB/sec 565.535 MiB/sec -22.018%
- FilterInt64FilterNoNulls/262144/3 4.266 GiB/sec 3.855 GiB/sec -9.641%
- FilterInt64FilterWithNulls/262144/12 549.159 MiB/sec 499.761 MiB/sec -8.995%
- FilterInt64FilterWithNulls/262144/0 622.810 MiB/sec 497.075 MiB/sec -20.188%
- FilterInt64FilterNoNulls/262144/1 1.021 GiB/sec 980.686 MiB/sec -6.230%
- FilterFSLInt64FilterWithNulls/262144/0 399.890 MiB/sec 375.677 MiB/sec -6.055%
- FilterFSLInt64FilterWithNulls/262144/2 4.497 GiB/sec 4.233 GiB/sec -5.880%
- FilterFSLInt64FilterNoNulls/262144/1 564.700 MiB/sec 431.560 MiB/sec -23.577%
- FilterInt64FilterWithNulls/262144/9 549.832 MiB/sec 499.657 MiB/sec -9.125%
- FilterInt64FilterWithNulls/262144/7 625.701 MiB/sec 550.091 MiB/sec -12.084%
FilterInt64FilterNoNulls/262144/14 6.386 GiB/sec 6.901 GiB/sec 8.073%
FilterInt64FilterWithNulls/262144/8 5.034 GiB/sec 4.958 GiB/sec -1.517%
FilterInt64FilterNoNulls/262144/12 3.215 GiB/sec 3.131 GiB/sec -2.607%
FilterStringFilterNoNulls/262144/0 560.832 MiB/sec 545.275 MiB/sec -2.774%
- FilterStringFilterNoNulls/262144/7 641.313 MiB/sec 582.952 MiB/sec -9.100%
- FilterInt64FilterWithNulls/262144/3 615.558 MiB/sec 496.003 MiB/sec -19.422%
- FilterStringFilterNoNulls/262144/10 578.560 MiB/sec 506.085 MiB/sec -12.527%
FilterInt64FilterWithNulls/262144/14 4.934 GiB/sec 4.873 GiB/sec -1.228%
FilterInt64FilterNoNulls/262144/5 7.145 GiB/sec 6.863 GiB/sec -3.945%
FilterStringFilterWithNulls/262144/7 632.496 MiB/sec 669.411 MiB/sec 5.836%
FilterInt64FilterWithNulls/262144/11 4.937 GiB/sec 4.860 GiB/sec -1.544%
- FilterStringFilterWithNulls/262144/5 9.095 GiB/sec 8.275 GiB/sec -9.015%
FilterStringFilterNoNulls/262144/6 483.482 MiB/sec 470.273 MiB/sec -2.732%
- FilterFSLInt64FilterWithNulls/262144/7 464.358 MiB/sec 418.157 MiB/sec -9.949%
- FilterStringFilterNoNulls/262144/11 10.039 GiB/sec 7.873 GiB/sec -21.572%
FilterInt64FilterNoNulls/262144/11 6.389 GiB/sec 6.942 GiB/sec 8.664%
- FilterFSLInt64FilterNoNulls/262144/6 400.926 MiB/sec 355.070 MiB/sec -11.437%
- FilterStringFilterNoNulls/262144/5 10.942 GiB/sec 8.621 GiB/sec -21.211%
FilterInt64FilterNoNulls/262144/2 7.901 GiB/sec 7.942 GiB/sec 0.526%
- FilterFSLInt64FilterWithNulls/262144/13 387.523 MiB/sec 354.145 MiB/sec -8.613%
- FilterInt64FilterNoNulls/262144/10 635.634 MiB/sec 574.368 MiB/sec -9.639%
- FilterStringFilterWithNulls/262144/11 8.363 GiB/sec 7.663 GiB/sec -8.365%
- FilterInt64FilterWithNulls/262144/4 644.733 MiB/sec 554.689 MiB/sec -13.966%
- FilterInt64FilterWithNulls/262144/2 5.308 GiB/sec 4.950 GiB/sec -6.739%
- FilterInt64FilterWithNulls/262144/6 582.743 MiB/sec 494.561 MiB/sec -15.132%
FilterFSLInt64FilterWithNulls/262144/5 4.299 GiB/sec 4.094 GiB/sec -4.757%
FilterInt64FilterNoNulls/262144/0 7.685 GiB/sec 8.021 GiB/sec 4.371%
- FilterInt64FilterNoNulls/262144/13 634.999 MiB/sec 574.211 MiB/sec -9.573%
- FilterStringFilterWithNulls/262144/2 9.478 GiB/sec 8.593 GiB/sec -9.337%
FilterFSLInt64FilterWithNulls/262144/8 4.256 GiB/sec 4.060 GiB/sec -4.609%
- FilterFSLInt64FilterWithNulls/262144/10 422.316 MiB/sec 380.968 MiB/sec -9.791%
FilterStringFilterNoNulls/262144/9 383.197 MiB/sec 374.020 MiB/sec -2.395%
- FilterFSLInt64FilterNoNulls/262144/12 242.820 MiB/sec 227.762 MiB/sec -6.201%
FilterStringFilterWithNulls/262144/0 429.008 MiB/sec 493.378 MiB/sec 15.004%
- FilterFSLInt64FilterWithNulls/262144/12 267.881 MiB/sec 249.827 MiB/sec -6.739%
FilterFSLInt64FilterWithNulls/262144/3 349.988 MiB/sec 337.076 MiB/sec -3.689%
FilterStringFilterWithNulls/262144/13 90.911 MiB/sec 97.476 MiB/sec 7.222%
======================================= =============== ================ ======== |
|
Something is weird with the commit history, so I'm not sure those benchmarks are right. I'll rebase things again and rerun
Small fix
More work, start writing filter -> selection vector
Things compiling again finally
BinaryBitBlockCounter tests passing
Consolidate take/filter tests in same module, fix GetTakeIndices / GetFilterOutputSize unit tests and implementations
Finish filter implementation, tests passing again
Clean up includes
Tweak benchmark parameters
Some string streamlining
Python fixes
Python test fixes.
Add fast path for low-selectivity filters
Low selectivity path for non-primitive filtering
VisitFilter is not a dependent template
Implement some obvious non-null filter optimizations
Fix typo
…ter paths less spaghetti
Split primitive filter paths between DROP/EMIT_NULL, improve readability
|
AMD64 Ubuntu 18.04 C++ Benchmark (#112989) builder has been succeeded. Revision: 21227cc
======================================= =============== ================ ========
benchmark baseline contender change
======================================= =============== ================ ========
- FilterStringFilterNoNulls/262144/7 637.909 MiB/sec 572.355 MiB/sec -10.276%
- FilterStringFilterNoNulls/262144/8 10.897 GiB/sec 8.711 GiB/sec -20.057%
FilterStringFilterNoNulls/262144/6 485.775 MiB/sec 476.410 MiB/sec -1.928%
FilterStringFilterWithNulls/262144/4 649.558 MiB/sec 677.796 MiB/sec 4.347%
FilterInt64FilterNoNulls/262144/9 3.212 GiB/sec 3.264 GiB/sec 1.612%
- FilterFSLInt64FilterNoNulls/262144/13 351.877 MiB/sec 308.073 MiB/sec -12.449%
- FilterFSLInt64FilterNoNulls/262144/10 389.471 MiB/sec 333.418 MiB/sec -14.392%
- FilterInt64FilterNoNulls/262144/4 668.729 MiB/sec 625.199 MiB/sec -6.509%
FilterFSLInt64FilterWithNulls/262144/9 287.988 MiB/sec 276.495 MiB/sec -3.991%
- FilterStringFilterWithNulls/262144/2 9.441 GiB/sec 8.793 GiB/sec -6.865%
FilterStringFilterWithNulls/262144/12 73.855 MiB/sec 82.463 MiB/sec 11.656%
- FilterFSLInt64FilterNoNulls/262144/5 6.091 GiB/sec 4.403 GiB/sec -27.714%
- FilterFSLInt64FilterNoNulls/262144/3 550.519 MiB/sec 463.959 MiB/sec -15.723%
FilterInt64FilterNoNulls/262144/2 7.988 GiB/sec 7.976 GiB/sec -0.147%
- FilterStringFilterNoNulls/262144/4 700.795 MiB/sec 605.189 MiB/sec -13.643%
- FilterFSLInt64FilterWithNulls/262144/1 516.544 MiB/sec 460.521 MiB/sec -10.846%
- FilterStringFilterWithNulls/262144/8 8.877 GiB/sec 8.364 GiB/sec -5.779%
- FilterFSLInt64FilterWithNulls/262144/3 350.123 MiB/sec 329.103 MiB/sec -6.004%
FilterStringFilterWithNulls/262144/3 435.836 MiB/sec 494.167 MiB/sec 13.384%
FilterInt64FilterNoNulls/262144/10 630.544 MiB/sec 628.104 MiB/sec -0.387%
- FilterStringFilterNoNulls/262144/5 11.014 GiB/sec 8.788 GiB/sec -20.216%
FilterInt64FilterNoNulls/262144/3 4.263 GiB/sec 4.181 GiB/sec -1.936%
FilterInt64FilterWithNulls/262144/1 635.637 MiB/sec 615.015 MiB/sec -3.244%
FilterStringFilterWithNulls/262144/7 638.645 MiB/sec 678.465 MiB/sec 6.235%
- FilterFSLInt64FilterNoNulls/262144/2 6.506 GiB/sec 5.012 GiB/sec -22.975%
- FilterFSLInt64FilterNoNulls/262144/0 729.854 MiB/sec 569.623 MiB/sec -21.954%
FilterInt64FilterNoNulls/262144/5 6.946 GiB/sec 6.899 GiB/sec -0.674%
FilterInt64FilterWithNulls/262144/12 545.763 MiB/sec 547.657 MiB/sec 0.347%
FilterStringFilterNoNulls/262144/9 383.858 MiB/sec 377.178 MiB/sec -1.740%
- FilterFSLInt64FilterNoNulls/262144/8 5.825 GiB/sec 4.702 GiB/sec -19.289%
FilterInt64FilterNoNulls/262144/13 632.053 MiB/sec 633.157 MiB/sec 0.175%
FilterInt64FilterNoNulls/262144/1 1.020 GiB/sec 1.022 GiB/sec 0.239%
- FilterFSLInt64FilterNoNulls/262144/12 242.197 MiB/sec 228.152 MiB/sec -5.799%
FilterInt64FilterWithNulls/262144/4 640.980 MiB/sec 614.192 MiB/sec -4.179%
FilterInt64FilterWithNulls/262144/8 4.967 GiB/sec 5.071 GiB/sec 2.102%
- FilterFSLInt64FilterWithNulls/262144/0 396.373 MiB/sec 374.388 MiB/sec -5.546%
FilterInt64FilterWithNulls/262144/11 4.934 GiB/sec 4.997 GiB/sec 1.282%
- FilterFSLInt64FilterNoNulls/262144/14 5.435 GiB/sec 4.459 GiB/sec -17.946%
FilterInt64FilterNoNulls/262144/12 3.255 GiB/sec 3.185 GiB/sec -2.144%
FilterStringFilterWithNulls/262144/1 638.704 MiB/sec 690.413 MiB/sec 8.096%
- FilterStringFilterNoNulls/262144/2 11.411 GiB/sec 9.040 GiB/sec -20.778%
FilterInt64FilterWithNulls/262144/6 582.753 MiB/sec 554.462 MiB/sec -4.855%
FilterStringFilterWithNulls/262144/10 586.149 MiB/sec 616.404 MiB/sec 5.162%
FilterInt64FilterNoNulls/262144/0 7.653 GiB/sec 7.971 GiB/sec 4.146%
FilterInt64FilterWithNulls/262144/13 590.396 MiB/sec 607.816 MiB/sec 2.951%
- FilterStringFilterNoNulls/262144/14 1.254 GiB/sec 1011.778 MiB/sec -21.233%
- FilterFSLInt64FilterWithNulls/262144/4 474.573 MiB/sec 428.073 MiB/sec -9.798%
FilterInt64FilterWithNulls/262144/2 5.245 GiB/sec 5.072 GiB/sec -3.310%
- FilterStringFilterWithNulls/262144/11 8.381 GiB/sec 7.793 GiB/sec -7.006%
FilterFSLInt64FilterWithNulls/262144/14 4.065 GiB/sec 3.917 GiB/sec -3.648%
- FilterFSLInt64FilterNoNulls/262144/1 566.516 MiB/sec 432.124 MiB/sec -23.723%
FilterStringFilterWithNulls/262144/6 431.308 MiB/sec 489.475 MiB/sec 13.486%
- FilterFSLInt64FilterNoNulls/262144/9 267.636 MiB/sec 250.549 MiB/sec -6.385%
- FilterFSLInt64FilterWithNulls/262144/2 4.505 GiB/sec 4.244 GiB/sec -5.789%
- FilterStringFilterNoNulls/262144/1 699.807 MiB/sec 605.175 MiB/sec -13.523%
FilterInt64FilterWithNulls/262144/14 4.914 GiB/sec 4.970 GiB/sec 1.141%
- FilterStringFilterNoNulls/262144/11 9.990 GiB/sec 7.988 GiB/sec -20.035%
- FilterStringFilterNoNulls/262144/12 70.677 MiB/sec 65.603 MiB/sec -7.180%
FilterStringFilterWithNulls/262144/9 395.814 MiB/sec 447.434 MiB/sec 13.042%
FilterFSLInt64FilterWithNulls/262144/6 333.780 MiB/sec 319.575 MiB/sec -4.256%
FilterFSLInt64FilterWithNulls/262144/8 4.263 GiB/sec 4.091 GiB/sec -4.021%
FilterInt64FilterNoNulls/262144/14 6.414 GiB/sec 6.933 GiB/sec 8.095%
FilterStringFilterWithNulls/262144/0 441.849 MiB/sec 496.266 MiB/sec 12.316%
FilterInt64FilterNoNulls/262144/11 6.411 GiB/sec 6.874 GiB/sec 7.218%
- FilterInt64FilterNoNulls/262144/7 648.036 MiB/sec 547.011 MiB/sec -15.589%
- FilterFSLInt64FilterWithNulls/262144/10 419.063 MiB/sec 381.681 MiB/sec -8.920%
- FilterFSLInt64FilterWithNulls/262144/13 386.755 MiB/sec 353.726 MiB/sec -8.540%
FilterInt64FilterNoNulls/262144/8 6.724 GiB/sec 7.073 GiB/sec 5.190%
FilterInt64FilterWithNulls/262144/9 545.560 MiB/sec 545.449 MiB/sec -0.020%
- FilterStringFilterNoNulls/262144/10 575.809 MiB/sec 507.681 MiB/sec -11.832%
- FilterStringFilterWithNulls/262144/5 9.154 GiB/sec 8.428 GiB/sec -7.931%
FilterStringFilterNoNulls/262144/0 519.896 MiB/sec 554.802 MiB/sec 6.714%
FilterFSLInt64FilterWithNulls/262144/5 4.294 GiB/sec 4.126 GiB/sec -3.911%
- FilterFSLInt64FilterNoNulls/262144/7 463.085 MiB/sec 378.577 MiB/sec -18.249%
FilterFSLInt64FilterWithNulls/262144/11 4.245 GiB/sec 4.061 GiB/sec -4.333%
FilterStringFilterNoNulls/262144/3 544.102 MiB/sec 542.846 MiB/sec -0.231%
- FilterInt64FilterWithNulls/262144/0 617.474 MiB/sec 560.813 MiB/sec -9.176%
FilterInt64FilterWithNulls/262144/7 619.732 MiB/sec 609.068 MiB/sec -1.721%
FilterStringFilterWithNulls/262144/13 91.185 MiB/sec 97.530 MiB/sec 6.958%
- FilterStringFilterWithNulls/262144/14 929.857 MiB/sec 874.512 MiB/sec -5.952%
- FilterInt64FilterWithNulls/262144/3 604.918 MiB/sec 560.882 MiB/sec -7.280%
- FilterFSLInt64FilterNoNulls/262144/4 514.014 MiB/sec 411.713 MiB/sec -19.902%
- FilterFSLInt64FilterWithNulls/262144/7 463.921 MiB/sec 417.320 MiB/sec -10.045%
- FilterFSLInt64FilterWithNulls/262144/12 267.697 MiB/sec 247.408 MiB/sec -7.579%
- FilterFSLInt64FilterNoNulls/262144/11 5.632 GiB/sec 4.533 GiB/sec -19.515%
- FilterStringFilterNoNulls/262144/13 90.578 MiB/sec 76.367 MiB/sec -15.690%
FilterInt64FilterNoNulls/262144/6 3.709 GiB/sec 3.680 GiB/sec -0.786%
FilterInt64FilterWithNulls/262144/5 5.115 GiB/sec 4.997 GiB/sec -2.309%
FilterInt64FilterWithNulls/262144/10 604.161 MiB/sec 607.760 MiB/sec 0.596%
- FilterFSLInt64FilterNoNulls/262144/6 389.763 MiB/sec 354.969 MiB/sec -8.927%
======================================= =============== ================ ======== |
|
These "readability" improvements made performance worse, so I'll revert them |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#113048) builder has been succeeded. Revision: 54bb838
======================================= =============== =============== ========
benchmark baseline contender change
======================================= =============== =============== ========
FilterStringFilterWithNulls/262144/9 395.928 MiB/sec 397.664 MiB/sec 0.439%
FilterInt64FilterWithNulls/262144/0 621.828 MiB/sec 613.884 MiB/sec -1.277%
FilterStringFilterWithNulls/262144/10 578.179 MiB/sec 577.449 MiB/sec -0.126%
FilterFSLInt64FilterWithNulls/262144/14 4.068 GiB/sec 4.018 GiB/sec -1.247%
FilterInt64FilterWithNulls/262144/13 604.515 MiB/sec 575.481 MiB/sec -4.803%
FilterFSLInt64FilterNoNulls/262144/13 350.875 MiB/sec 355.061 MiB/sec 1.193%
FilterStringFilterWithNulls/262144/0 441.188 MiB/sec 442.379 MiB/sec 0.270%
FilterInt64FilterWithNulls/262144/7 623.569 MiB/sec 594.423 MiB/sec -4.674%
FilterStringFilterWithNulls/262144/12 73.925 MiB/sec 73.930 MiB/sec 0.007%
FilterStringFilterNoNulls/262144/3 548.889 MiB/sec 548.269 MiB/sec -0.113%
FilterInt64FilterNoNulls/262144/0 7.942 GiB/sec 8.079 GiB/sec 1.727%
FilterInt64FilterNoNulls/262144/6 3.827 GiB/sec 3.725 GiB/sec -2.665%
FilterStringFilterWithNulls/262144/2 9.138 GiB/sec 9.205 GiB/sec 0.726%
FilterFSLInt64FilterWithNulls/262144/13 385.938 MiB/sec 370.599 MiB/sec -3.975%
FilterInt64FilterWithNulls/262144/9 549.281 MiB/sec 542.112 MiB/sec -1.305%
FilterInt64FilterWithNulls/262144/2 5.253 GiB/sec 5.047 GiB/sec -3.918%
FilterFSLInt64FilterNoNulls/262144/5 5.778 GiB/sec 5.676 GiB/sec -1.761%
FilterStringFilterNoNulls/262144/1 711.705 MiB/sec 697.941 MiB/sec -1.934%
FilterStringFilterNoNulls/262144/0 560.111 MiB/sec 560.315 MiB/sec 0.036%
FilterStringFilterWithNulls/262144/5 8.773 GiB/sec 8.976 GiB/sec 2.318%
FilterInt64FilterWithNulls/262144/11 4.863 GiB/sec 4.942 GiB/sec 1.631%
FilterFSLInt64FilterWithNulls/262144/11 4.145 GiB/sec 4.089 GiB/sec -1.362%
FilterInt64FilterNoNulls/262144/2 7.854 GiB/sec 7.609 GiB/sec -3.117%
FilterStringFilterNoNulls/262144/11 9.751 GiB/sec 9.565 GiB/sec -1.904%
FilterStringFilterNoNulls/262144/7 641.570 MiB/sec 650.710 MiB/sec 1.425%
FilterStringFilterWithNulls/262144/3 435.185 MiB/sec 436.932 MiB/sec 0.401%
FilterFSLInt64FilterNoNulls/262144/14 5.202 GiB/sec 5.302 GiB/sec 1.915%
FilterInt64FilterNoNulls/262144/4 674.907 MiB/sec 654.585 MiB/sec -3.011%
FilterInt64FilterNoNulls/262144/5 7.023 GiB/sec 6.971 GiB/sec -0.741%
FilterInt64FilterWithNulls/262144/12 548.203 MiB/sec 542.909 MiB/sec -0.966%
FilterFSLInt64FilterNoNulls/262144/10 387.772 MiB/sec 390.564 MiB/sec 0.720%
FilterInt64FilterWithNulls/262144/8 4.951 GiB/sec 5.094 GiB/sec 2.880%
FilterStringFilterNoNulls/262144/13 90.750 MiB/sec 91.694 MiB/sec 1.040%
FilterFSLInt64FilterWithNulls/262144/12 230.292 MiB/sec 263.113 MiB/sec 14.252%
FilterStringFilterNoNulls/262144/12 70.772 MiB/sec 70.740 MiB/sec -0.044%
FilterStringFilterWithNulls/262144/14 927.254 MiB/sec 925.791 MiB/sec -0.158%
FilterStringFilterNoNulls/262144/5 10.587 GiB/sec 10.322 GiB/sec -2.509%
FilterFSLInt64FilterNoNulls/262144/3 551.473 MiB/sec 556.816 MiB/sec 0.969%
FilterInt64FilterNoNulls/262144/14 6.302 GiB/sec 6.848 GiB/sec 8.656%
FilterInt64FilterWithNulls/262144/14 4.804 GiB/sec 4.945 GiB/sec 2.933%
FilterStringFilterNoNulls/262144/14 1.257 GiB/sec 1.247 GiB/sec -0.814%
FilterFSLInt64FilterNoNulls/262144/6 399.266 MiB/sec 402.455 MiB/sec 0.799%
FilterInt64FilterWithNulls/262144/5 5.037 GiB/sec 4.954 GiB/sec -1.645%
FilterFSLInt64FilterNoNulls/262144/8 5.576 GiB/sec 5.576 GiB/sec -0.004%
FilterFSLInt64FilterNoNulls/262144/7 462.231 MiB/sec 456.668 MiB/sec -1.203%
FilterFSLInt64FilterNoNulls/262144/11 5.377 GiB/sec 5.381 GiB/sec 0.082%
FilterStringFilterNoNulls/262144/6 487.645 MiB/sec 487.464 MiB/sec -0.037%
FilterStringFilterNoNulls/262144/4 687.214 MiB/sec 678.019 MiB/sec -1.338%
FilterFSLInt64FilterWithNulls/262144/9 287.916 MiB/sec 285.805 MiB/sec -0.733%
FilterInt64FilterNoNulls/262144/9 3.245 GiB/sec 3.126 GiB/sec -3.683%
FilterFSLInt64FilterWithNulls/262144/1 514.149 MiB/sec 501.235 MiB/sec -2.512%
FilterInt64FilterNoNulls/262144/11 6.304 GiB/sec 6.838 GiB/sec 8.471%
FilterInt64FilterWithNulls/262144/4 642.597 MiB/sec 617.492 MiB/sec -3.907%
FilterFSLInt64FilterNoNulls/262144/0 723.263 MiB/sec 719.475 MiB/sec -0.524%
FilterFSLInt64FilterWithNulls/262144/2 4.335 GiB/sec 4.281 GiB/sec -1.228%
FilterStringFilterWithNulls/262144/8 8.635 GiB/sec 8.847 GiB/sec 2.451%
FilterFSLInt64FilterWithNulls/262144/4 473.024 MiB/sec 457.711 MiB/sec -3.237%
FilterStringFilterWithNulls/262144/4 637.237 MiB/sec 646.187 MiB/sec 1.405%
FilterStringFilterWithNulls/262144/6 430.118 MiB/sec 433.059 MiB/sec 0.684%
FilterStringFilterNoNulls/262144/10 572.254 MiB/sec 573.892 MiB/sec 0.286%
FilterStringFilterWithNulls/262144/1 644.800 MiB/sec 644.056 MiB/sec -0.115%
FilterStringFilterWithNulls/262144/7 635.644 MiB/sec 640.796 MiB/sec 0.810%
FilterInt64FilterWithNulls/262144/6 581.863 MiB/sec 575.886 MiB/sec -1.027%
FilterFSLInt64FilterNoNulls/262144/4 513.508 MiB/sec 499.319 MiB/sec -2.763%
FilterInt64FilterNoNulls/262144/13 632.203 MiB/sec 613.689 MiB/sec -2.928%
FilterStringFilterNoNulls/262144/8 10.491 GiB/sec 10.181 GiB/sec -2.953%
FilterFSLInt64FilterNoNulls/262144/1 563.147 MiB/sec 540.663 MiB/sec -3.993%
FilterFSLInt64FilterNoNulls/262144/9 267.226 MiB/sec 269.194 MiB/sec 0.736%
FilterFSLInt64FilterWithNulls/262144/10 420.329 MiB/sec 405.197 MiB/sec -3.600%
- FilterInt64FilterNoNulls/262144/1 1.022 GiB/sec 922.850 MiB/sec -11.845%
FilterInt64FilterNoNulls/262144/7 652.709 MiB/sec 631.526 MiB/sec -3.245%
FilterStringFilterNoNulls/262144/2 11.144 GiB/sec 10.843 GiB/sec -2.698%
FilterStringFilterWithNulls/262144/13 91.231 MiB/sec 91.638 MiB/sec 0.446%
FilterInt64FilterNoNulls/262144/12 3.242 GiB/sec 3.112 GiB/sec -4.024%
FilterFSLInt64FilterNoNulls/262144/12 242.297 MiB/sec 242.607 MiB/sec 0.128%
FilterFSLInt64FilterNoNulls/262144/2 6.165 GiB/sec 6.062 GiB/sec -1.679%
FilterFSLInt64FilterWithNulls/262144/6 331.566 MiB/sec 332.386 MiB/sec 0.247%
FilterInt64FilterWithNulls/262144/1 648.702 MiB/sec 622.712 MiB/sec -4.006%
FilterFSLInt64FilterWithNulls/262144/5 4.123 GiB/sec 4.122 GiB/sec -0.014%
FilterFSLInt64FilterWithNulls/262144/0 399.262 MiB/sec 398.338 MiB/sec -0.231%
FilterFSLInt64FilterWithNulls/262144/3 347.643 MiB/sec 349.930 MiB/sec 0.658%
FilterInt64FilterNoNulls/262144/3 4.312 GiB/sec 4.291 GiB/sec -0.478%
FilterStringFilterWithNulls/262144/11 8.207 GiB/sec 8.348 GiB/sec 1.720%
FilterStringFilterNoNulls/262144/9 391.780 MiB/sec 391.367 MiB/sec -0.106%
FilterFSLInt64FilterWithNulls/262144/8 4.142 GiB/sec 4.103 GiB/sec -0.926%
FilterInt64FilterNoNulls/262144/8 6.703 GiB/sec 6.908 GiB/sec 3.063%
FilterInt64FilterWithNulls/262144/10 604.595 MiB/sec 575.671 MiB/sec -4.784%
FilterFSLInt64FilterWithNulls/262144/7 461.693 MiB/sec 447.411 MiB/sec -3.093%
FilterInt64FilterNoNulls/262144/10 632.128 MiB/sec 614.452 MiB/sec -2.796%
FilterInt64FilterWithNulls/262144/3 613.629 MiB/sec 607.939 MiB/sec -0.927%
======================================= =============== =============== ======== |
|
+1. Thanks all for the comments |
NOTE: the diff is artificially larger due to some code rearranging, which was necessary because some data selection code is shared between the Take and Filter implementations.
Summary:
Some incidental changes:
compute::internal::GetTakeIndices. I have also altered the implementation of filtering a record batch to use this, which should be faster (it would be good to have some benchmarks to confirm this).
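To make the filter-to-take-indices idea concrete, here is a small sketch of converting a boolean filter that may contain nulls into a list of indices, under the two null-handling modes named in the commit messages (DROP and EMIT_NULL). It models nulls as `None` and is an illustration only; the Python function `get_take_indices` below is hypothetical and does not reproduce the signature of `compute::internal::GetTakeIndices`.

```python
def get_take_indices(mask, null_selection="drop"):
    """Convert a boolean filter (True / False / None) into take indices.

    "drop":      null filter slots are skipped.
    "emit_null": null filter slots produce a None index (a null output slot).
    """
    if null_selection not in ("drop", "emit_null"):
        raise ValueError("null_selection must be 'drop' or 'emit_null'")
    indices = []
    for i, slot in enumerate(mask):
        if slot is None:
            if null_selection == "emit_null":
                indices.append(None)
        elif slot:
            indices.append(i)
    return indices

def take(values, indices):
    """Gather values at `indices`; a None index yields a None output."""
    return [None if i is None else values[i] for i in indices]
```

Once the indices exist they can be reused for every column, which is why this approach suits filtering a record batch: the filter is scanned once and each column only performs a take.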