-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-7179: [C++][Python][R] Consolidate coalesce/fill_null #10987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried using BitRunReader instead?
(er, perhaps we already had this conversation? :-S)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not helpful in case_when but might be useful here, I can give it a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may largely depend on the clustering of nulls/non-nulls, hence the suggestion to benchmark different null probabilites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case it does better, though I could imagine it doing badly at ~50% nulls.
--------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------
Before:
CoalesceScalarBench64/0 1321249 ns 1321175 ns 508 bytes_per_second=5.9133G/s items_per_second=793.67M/s length=1048.58k null%=1 num_args=2
CoalesceScalarBench64/2 5049228 ns 5049073 ns 133 bytes_per_second=1.54731G/s items_per_second=207.677M/s length=1048.58k null%=25 num_args=2
CoalesceScalarStringBench/0 87407512 ns 87404678 ns 7 bytes_per_second=5.70364G/s items_per_second=11.9968M/s length=1048.58k null%=1 num_args=2
CoalesceScalarStringBench/2 77686321 ns 77685936 ns 8 bytes_per_second=4.87622G/s items_per_second=13.4976M/s length=1048.58k null%=25 num_args=2
After:
CoalesceScalarBench64/0 988642 ns 988606 ns 664 bytes_per_second=7.90255G/s items_per_second=1060.66M/s length=1048.58k null%=1 num_args=2
CoalesceScalarBench64/2 4537487 ns 4537432 ns 154 bytes_per_second=1.72179G/s items_per_second=231.095M/s length=1048.58k null%=25 num_args=2
CoalesceScalarStringBench/0 103582667 ns 103575544 ns 6 bytes_per_second=4.81315G/s items_per_second=10.1238M/s length=1048.58k null%=1 num_args=2
CoalesceScalarStringBench/2 79813292 ns 79807705 ns 8 bytes_per_second=4.74658G/s items_per_second=13.1388M/s length=1048.58k null%=25 num_args=2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for completeness, I tested 50% and 99% nulls too. BitRunReader is still faster overall, maybe a bit slower with 50% nulls with strings.
--------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------
BitBlockCounter:
CoalesceScalarBench64/0 1488163 ns 1488088 ns 457 bytes_per_second=5.25003G/s items_per_second=704.647M/s length=1048.58k null%=1 num_args=2
CoalesceScalarBench64/2 5307719 ns 5307520 ns 133 bytes_per_second=1.47197G/s items_per_second=197.564M/s length=1048.58k null%=25 num_args=2
CoalesceScalarBench64/4 8338070 ns 8337713 ns 83 bytes_per_second=959.496M/s items_per_second=125.763M/s length=1048.58k null%=50 num_args=2
CoalesceScalarBench64/6 2187502 ns 2187409 ns 323 bytes_per_second=3.57158G/s items_per_second=479.369M/s length=1048.58k null%=99 num_args=2
CoalesceScalarStringBench/0 112844773 ns 112838638 ns 5 bytes_per_second=4.41803G/s items_per_second=9.2927M/s length=1048.58k null%=1 num_args=2
CoalesceScalarStringBench/2 94380710 ns 94376399 ns 8 bytes_per_second=4.01386G/s items_per_second=11.1106M/s length=1048.58k null%=25 num_args=2
CoalesceScalarStringBench/4 54766774 ns 54766616 ns 10 bytes_per_second=4.64332G/s items_per_second=19.1463M/s length=1048.58k null%=50 num_args=2
CoalesceScalarStringBench/6 6268787 ns 6268586 ns 108 bytes_per_second=1.41205G/s items_per_second=167.275M/s length=1048.58k null%=99 num_args=2
BitRunReader:
CoalesceScalarBench64/0 994000 ns 994007 ns 717 bytes_per_second=7.8596G/s items_per_second=1054.9M/s length=1048.58k null%=1 num_args=2
CoalesceScalarBench64/2 4624991 ns 4625012 ns 153 bytes_per_second=1.68918G/s items_per_second=226.719M/s length=1048.58k null%=25 num_args=2
CoalesceScalarBench64/4 7016190 ns 7016189 ns 102 bytes_per_second=1.1135G/s items_per_second=149.451M/s length=1048.58k null%=50 num_args=2
CoalesceScalarBench64/6 1013352 ns 1013358 ns 672 bytes_per_second=7.70952G/s items_per_second=1034.75M/s length=1048.58k null%=99 num_args=2
CoalesceScalarStringBench/0 110433694 ns 110433042 ns 5 bytes_per_second=4.51427G/s items_per_second=9.49513M/s length=1048.58k null%=1 num_args=2
CoalesceScalarStringBench/2 79904343 ns 79904642 ns 7 bytes_per_second=4.74082G/s items_per_second=13.1228M/s length=1048.58k null%=25 num_args=2
CoalesceScalarStringBench/4 61812967 ns 61813273 ns 10 bytes_per_second=4.11398G/s items_per_second=16.9636M/s length=1048.58k null%=50 num_args=2
CoalesceScalarStringBench/6 6219633 ns 6219571 ns 105 bytes_per_second=1.42317G/s items_per_second=168.593M/s length=1048.58k null%=99 num_args=2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the numbers!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be moved up before // Array with nulls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for left_valid to be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to also parametrize the null_probability used in the coalesce benchmarks? I think we have some logic to do this (see e.g. RegressionArgs and the Take benchmarks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for left_valid to be null? If there are no nulls in the left argument, I would expect this to shortcut.
f46a228 to
2dc5e0d
Compare
|
I've submitted a PR to your branch to improve performance: I think the same approach may be used for other conditional kernels, but that would be a separate JIRA. |
Before: ``` -------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------------- CoalesceBench64/0 3366683 ns 3364669 ns 213 bytes_per_second=2.32192G/s items_per_second=311.643M/s length=1048.58k null%=1 num_args=2 CoalesceBench64/1 2851663 ns 2849846 ns 245 bytes_per_second=2.74138G/s items_per_second=367.941M/s length=1048.58k null%=1 num_args=4 CoalesceBench64/2 7499865 ns 7495813 ns 95 bytes_per_second=1067.26M/s items_per_second=139.888M/s length=1048.58k null%=25 num_args=2 CoalesceBench64/3 11773437 ns 11766272 ns 60 bytes_per_second=679.909M/s items_per_second=89.1171M/s length=1048.58k null%=25 num_args=4 CoalesceBench64/4 9636207 ns 9631169 ns 73 bytes_per_second=830.636M/s items_per_second=108.873M/s length=1048.58k null%=50 num_args=2 CoalesceBench64/5 19456855 ns 19445858 ns 36 bytes_per_second=411.399M/s items_per_second=53.9228M/s length=1048.58k null%=50 num_args=4 CoalesceBench64/6 3288217 ns 3286426 ns 214 bytes_per_second=2.3772G/s items_per_second=319.063M/s length=1048.58k null%=99 num_args=2 CoalesceBench64/7 7603232 ns 7599720 ns 92 bytes_per_second=1052.67M/s items_per_second=137.976M/s length=1048.58k null%=99 num_args=4 CoalesceScalarBench64/0 775260 ns 774797 ns 904 bytes_per_second=10.0833G/s items_per_second=1.35336G/s length=1048.58k null%=1 num_args=2 CoalesceScalarBench64/2 3500267 ns 3498388 ns 201 bytes_per_second=2.23317G/s items_per_second=299.731M/s length=1048.58k null%=25 num_args=2 CoalesceScalarBench64/4 4815186 ns 4812821 ns 146 bytes_per_second=1.62327G/s items_per_second=217.871M/s length=1048.58k null%=50 num_args=2 CoalesceScalarBench64/6 446897 ns 446783 ns 1541 bytes_per_second=17.4861G/s items_per_second=2.34695G/s length=1048.58k null%=99 num_args=2 CoalesceScalarStringBench/0 74138532 ns 74089097 ns 10 bytes_per_second=6.72872G/s items_per_second=14.1529M/s length=1048.58k null%=1 num_args=2 CoalesceScalarStringBench/2 58106933 ns 58064020 ns 9 bytes_per_second=6.52407G/s items_per_second=18.059M/s length=1048.58k null%=25 num_args=2 CoalesceScalarStringBench/4 52094990 ns 52064312 ns 10 bytes_per_second=4.88432G/s items_per_second=20.14M/s length=1048.58k null%=50 num_args=2 CoalesceScalarStringBench/6 5136540 ns 5133121 ns 138 bytes_per_second=1.7244G/s items_per_second=204.276M/s length=1048.58k null%=99 num_args=2 ``` After: ``` -------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------------- CoalesceBench64/0 1047061 ns 1046399 ns 661 bytes_per_second=7.46608G/s items_per_second=1002.08M/s length=1048.58k null%=1 num_args=2 CoalesceBench64/1 1377282 ns 1376405 ns 511 bytes_per_second=5.67602G/s items_per_second=761.822M/s length=1048.58k null%=1 num_args=4 CoalesceBench64/2 2804061 ns 2802178 ns 251 bytes_per_second=2.78801G/s items_per_second=374.2M/s length=1048.58k null%=25 num_args=2 CoalesceBench64/3 5234633 ns 5230898 ns 134 bytes_per_second=1.49353G/s items_per_second=200.458M/s length=1048.58k null%=25 num_args=4 CoalesceBench64/4 3700820 ns 3698116 ns 190 bytes_per_second=2.11256G/s items_per_second=283.543M/s length=1048.58k null%=50 num_args=2 CoalesceBench64/5 7731316 ns 7726379 ns 90 bytes_per_second=1035.41M/s items_per_second=135.714M/s length=1048.58k null%=50 num_args=4 CoalesceBench64/6 1004359 ns 1003745 ns 693 bytes_per_second=7.78335G/s items_per_second=1044.66M/s length=1048.58k null%=99 num_args=2 CoalesceBench64/7 4660379 ns 4658001 ns 151 bytes_per_second=1.67722G/s items_per_second=225.113M/s length=1048.58k null%=99 num_args=4 CoalesceScalarBench64/0 656265 ns 655870 ns 1067 bytes_per_second=11.9117G/s items_per_second=1.59876G/s length=1048.58k null%=1 num_args=2 CoalesceScalarBench64/2 2889294 ns 2887898 ns 242 bytes_per_second=2.70525G/s items_per_second=363.093M/s length=1048.58k null%=25 num_args=2 CoalesceScalarBench64/4 4015990 ns 4014054 ns 175 bytes_per_second=1.94629G/s items_per_second=261.226M/s length=1048.58k null%=50 num_args=2 CoalesceScalarBench64/6 390245 ns 390138 ns 1800 bytes_per_second=20.025G/s items_per_second=2.68771G/s length=1048.58k null%=99 num_args=2 CoalesceScalarStringBench/0 82277097 ns 82223643 ns 9 bytes_per_second=6.06303G/s items_per_second=12.7527M/s length=1048.58k null%=1 num_args=2 CoalesceScalarStringBench/2 70821126 ns 70771323 ns 10 bytes_per_second=5.35265G/s items_per_second=14.8164M/s length=1048.58k null%=25 num_args=2 CoalesceScalarStringBench/4 47119447 ns 47087724 ns 13 bytes_per_second=5.40053G/s items_per_second=22.2686M/s length=1048.58k null%=50 num_args=2 CoalesceScalarStringBench/6 4579486 ns 4576728 ns 150 bytes_per_second=1.93403G/s items_per_second=229.11M/s length=1048.58k null%=99 num_args=2 ```
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
@ursabot please benchmark |
|
Benchmark runs are scheduled for baseline = 20885df and contender = 43a4450. Results will be available as each benchmark for each run completes. |
|
I think the benchmarks changed from git master, so I'm not sure there will be meaningful changes shown by conbench. |
Indeed, it seems there are no significant changes in the already existing benchmarks. |
Apparently |
|
Thanks! case_when could probably use some work, though I'm currently in the middle of digging into it to get dictionary support. select also could use some work (though the task there is harder). I'll file issues/take a look when I get a chance. |
This removes fill_null in favor of coalesce.
A few points: