Rework GroupByHash for faster performance and support grouping by nulls #808
Conversation
I am basically done with this PR. All that remains in my mind is to run some benchmarks and I'll mark it as ready for review.
On the db-benchmark aggregation queries (PR vs. master): it looks like there's a small perf hit on q2, but I think the other 4 queries greatly compensate for this 🎉
NGA-TRAN
left a comment
I have skimmed #812 and this. I do not claim to understand everything, but the code does what it is described to do. The tests and performance numbers look great.
```rust
/// scratch space used to collect indices for input rows in a
/// batch that have values to aggregate. Reset on each batch
indices: Vec<u32>,
}
```
Nice
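As an aside on that scratch space: clearing a `Vec` keeps its capacity, so reusing one vector means the per-batch index collection allocates only once in steady state. A minimal illustrative sketch (not this PR's code; the batch data is made up):

```rust
fn main() {
    // Reused scratch space: `clear()` resets the length but keeps the
    // capacity, so after the first batch there is no new allocation.
    let mut indices: Vec<u32> = Vec::new();

    let batches = vec![vec![Some(1), None, Some(3)], vec![None, Some(5)]];
    for batch in batches {
        indices.clear();
        for (row, value) in batch.iter().enumerate() {
            if value.is_some() {
                // rows in this batch that have values to aggregate
                indices.push(row as u32);
            }
        }
        println!("rows to aggregate: {:?}", indices);
    }
}
```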
```rust
eq_array_primitive!(array, index, IntervalDayTimeArray, val)
}
}
}
```
Nice
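As background on the macro in that diff: an eq_array-style comparison downcasts the dynamically typed array and compares one slot against an optional scalar. A minimal sketch of what such a macro could look like (assumes the `arrow` crate; illustrative, not necessarily DataFusion's exact implementation):

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, Int64Array};

// Hypothetical sketch of an eq_array-style comparison macro: downcast the
// dynamically typed array and compare one slot against an optional scalar,
// treating NULL as equal to NULL for the purpose of this check.
macro_rules! eq_array_primitive {
    ($array:expr, $index:expr, $ARRAYTYPE:ident, $VALUE:expr) => {{
        let array = $array.as_any().downcast_ref::<$ARRAYTYPE>().unwrap();
        match $VALUE {
            Some(v) => array.is_valid($index) && array.value($index) == *v,
            None => array.is_null($index),
        }
    }};
}

fn main() {
    let array: ArrayRef = Arc::new(Int64Array::from(vec![Some(1), None]));
    let val = Some(1i64);
    assert!(eq_array_primitive!(array, 0, Int64Array, &val));
    assert!(eq_array_primitive!(array, 1, Int64Array, &None::<i64>));
    println!("comparisons passed");
}
```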
```rust
let u32_vals = make_typed_vec!(u8_vals, u32);
let u64_vals = make_typed_vec!(u8_vals, u64);

let str_vals = vec![Some("foo"), None, Some("bar")];
```
I wonder why the second value is always NULL? Would it be more general to have it vary (first or third)?
The NULL is present to test null handling (which found a bug in my dictionary implementation, actually)
It is always the second entry because:
- I basically copy/pasted the tests
- I figured putting the validity bit in the middle (rather than at the ends) would be more likely to catch latent bugs (though your suggestion of varying its location is probably better). In theory, all the null edge cases should be handled by the underlying arrow code
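For illustration, varying the null position across cases as suggested could look like this sketch (hypothetical test data, not the PR's tests):

```rust
fn main() {
    // Cover null-first, null-middle, and null-last rather than always
    // placing the null second, to shake out position-dependent bugs.
    let cases: Vec<Vec<Option<&str>>> = vec![
        vec![None, Some("foo"), Some("bar")],
        vec![Some("foo"), None, Some("bar")],
        vec![Some("foo"), Some("bar"), None],
    ];
    for case in &cases {
        assert_eq!(case.iter().filter(|v| v.is_none()).count(), 1);
    }
}
```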
I had a good look and I think it all looks GREAT!
jorgecarleitao
left a comment
Amazing work and results 🥇. Thanks a lot for this, @alamb !
Thanks @Dandandan and @jorgecarleitao -- I plan to merge #812 in first and leave this one open for another few days in case anyone else wants to comment.
Rebased now that #812 has been merged.
A TPC-H query that got quite a bit faster is q13: on Parquet at SF=100, it went from 37.8s to 29.5s.
Thanks @alamb ! 🎉 🎉 🎉 🎉
Thanks everyone for all the help. This was a very cool experience of collaborative development for me.
That is an interesting article. Looking at the summary:

> The common implementation of the function using hashing techniques suffers lower throughput rate due to the collision of the insert keys in the hashing techniques...
I actually found it very hard to test the correctness of the group-by collision handling, because the hashing technique in `create_hashes` was so good that I could not find any example data that hashed to the same value in a reasonable amount of time -- LOL
However, the technique to search several slots at once might indeed be relevant: https://www.researchgate.net/figure/SIMD-accelerated-cuckoo-hashing-extended-from-Ross-et-al-14_fig1_326669722
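For what it's worth, one way to exercise collision handling deterministically in tests is to plug in a hasher that collides on purpose. This is a hypothetical sketch using only the standard library, not something this PR does:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

// A hasher that maps every key to the same value, so every insert
// collides and the table must fall back to comparing the keys.
struct ConstHasher;

impl Hasher for ConstHasher {
    fn finish(&self) -> u64 {
        42 // every key "hashes" to the same bucket
    }
    fn write(&mut self, _bytes: &[u8]) {
        // deliberately ignore the input
    }
}

#[derive(Default)]
struct ConstBuildHasher;

impl BuildHasher for ConstBuildHasher {
    type Hasher = ConstHasher;
    fn build_hasher(&self) -> ConstHasher {
        ConstHasher
    }
}

fn main() {
    let mut map: HashMap<&str, i32, ConstBuildHasher> = HashMap::default();
    map.insert("a", 1);
    map.insert("b", 2); // collides with "a"; correctness now depends on key equality
    assert_eq!(map.get("a"), Some(&1));
    assert_eq!(map.get("b"), Some(&2));
}
```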
On Sat, Aug 14, 2021 at 4:58 AM, Jorge Leitao wrote:

> Potentially relevant:
> https://www.researchgate.net/publication/326669722_SIMD_Vectorized_Hashing_for_Grouped_Aggregation_22nd_European_Conference_ADBIS_2018_Budapest_Hungary_September_2-5_2018_Proceedings
Hashbrown already implements many tricks like this, I believe; it's one of the fastest hash table implementations. There is also a nightly RawTable API to retrieve multiple values at once. So far, in profiling results, I haven't seen the probing/hashmap itself being a very expensive part. AFAIK it's mostly other parts that could currently be optimized: updating the states/values, collision checks, converting to arrays, creating hash values, and the actual …
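For reference, the lookup-or-insert pattern on hashbrown's `RawTable` looks roughly like this sketch (assumes hashbrown built with the `raw` feature and the `fxhash` crate for hashing; storing the hash next to a group index is the general idea, not this PR's exact code):

```rust
use hashbrown::raw::RawTable; // requires hashbrown's "raw" feature

fn main() {
    // The table stores (hash, group_index); group keys live out-of-line in
    // `group_values`, and full key comparison only happens on a hash match.
    let mut table: RawTable<(u64, usize)> = RawTable::new();
    let mut group_values: Vec<&str> = Vec::new();

    for key in ["a", "b", "a"] {
        let hash = fxhash::hash64(key); // any hash function; fxhash assumed here
        match table.get(hash, |(_, idx)| group_values[*idx] == key) {
            Some((_, idx)) => println!("{} -> existing group {}", key, idx),
            None => {
                let idx = group_values.len();
                group_values.push(key);
                // on resize, entries are re-bucketed using the stored hash
                table.insert(hash, (hash, idx), |(h, _)| *h);
                println!("{} -> new group {}", key, idx);
            }
        }
    }
}
```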
Which issue does this PR close?
Closes #790 by implementing a new design for group by hash. Built on #812, so it may be easier to review that one first.
This PR is an amazing collaborative effort and includes ideas from @Dandandan @jhorstmann @rdettai @jorgecarleitao and likely others I forgot.
Rationale for this change
What changes are included in this PR?
Potential Follow On Work
Performance Summary
db-benchmark
The numbers @Dandandan measured are at #808 (comment)
DataFusion Synthetic Aggregate Benchmark (newly created for this work):
Full results: https://github.com/alamb/datafusion_aggregate_bench/blob/main/RESULTS.md. Summary (gby_null/master; less than 1 is better):
- 100 Groups; 100M rows, int64_keys (10% nulls), f64 values (1% nulls)
- 100 Groups; 100M rows, utf8_keys (10% nulls), f64 values (1% nulls)
- 100 Groups; 100M rows, dictionary(utf8, int32) keys (10% nulls), f64 values (1% nulls)

aggregate micro benchmarks
Tested via
Results
Performance Source
This approach avoids the following operations, which should improve its speed:
- … `group_values` instead …

Are there any user-facing changes?
Faster performance
Notes:
I tried to keep the same names and structure as the existing hash algorithm (which I found easy to follow -- nice work @Dandandan and @andygrove), and I think that will make this easier to review.
Items completed
- Add ascii art diagrams (the design got simpler using the RawTable API)
- … ScalarValue and ArrayRef …
- Add ScalarValue::eq_array optimized comparison function #844