ARROW-13451: [C++] WIP: POC: benchmark using hash aggregate kernels for scalar aggregation #10813
Conversation
This is just to see how the hash aggregate kernels perform compared to the dedicated scalar aggregation kernels when there is only one group. I only tested MinMax and Count; I didn't test Sum/Mean, since the scalar kernels use pairwise summation while the hash aggregate kernels use naive summation. Unfortunately, the results are rather terrible.

For count: at two orders of magnitude slower, the hash aggregate kernel isn't anywhere near the dedicated scalar one. The scalar kernel essentially just calls CountSetBits, while the hash aggregate kernel must use VisitSetBitRuns and index into a length-1 array of counts. Also, a good amount of time (~10% of the runtime according to perf) is spent just allocating and filling an array of group IDs at the start.

For min_max the story is not so clear. The hash aggregate kernel actually wins for floats, but loses badly (though not as badly as with Count) for integers. For integers, the scalar kernel has a SIMD variant that gets leveraged. For doubles, it's unclear to me why the scalar kernel is slower; both boil down to min/max calls, though the scalar kernel uses fmin/fmax and the hash kernel uses std::min/std::max.
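The structural overhead described above can be illustrated with a minimal sketch (the function names are illustrative, not Arrow's actual kernel code): the scalar-style count reduces the validity bitmap a word at a time, while the grouped-style count must route every value through a group ID, even when there is only one group.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar-style count: reduce the validity bitmap directly, one popcount
// per 64 values (roughly what a CountSetBits-based kernel does).
int64_t ScalarStyleCount(const std::vector<uint64_t>& validity_words) {
  int64_t count = 0;
  for (uint64_t word : validity_words) {
    count += static_cast<int64_t>(std::bitset<64>(word).count());
  }
  return count;
}

// Grouped-style count: every valid value does an indexed update into a
// per-group counts array, which here has length 1 (the single group).
int64_t GroupedStyleCount(const std::vector<uint64_t>& validity_words,
                          const std::vector<uint32_t>& group_ids) {
  std::vector<int64_t> counts(1, 0);
  for (size_t i = 0; i < validity_words.size() * 64; ++i) {
    bool valid = (validity_words[i / 64] >> (i % 64)) & 1;
    if (valid) counts[group_ids[i]] += 1;  // per-value indexed update
  }
  return counts[0];
}
```

Note that the grouped version also requires the caller to materialize the `group_ids` array up front, which matches the ~10% allocation/fill overhead seen in perf.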
Ah, and for min_max, the scalar kernel becomes much faster if it calls std::max instead of std::fmax. So in all cases the hash aggregate kernel is considerably slower.
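The gap between the two is plausibly down to NaN handling: std::fmax must check for NaN operands and return the other one, while std::max is a bare comparison that lets a NaN in the first argument propagate. A minimal sketch of the semantic difference (helper names are hypothetical):

```cpp
#include <algorithm>
#include <cmath>

// std::fmax treats NaN as "missing": if one argument is NaN it returns
// the other, which costs extra checks per element.
double FmaxStyle(double a, double b) { return std::fmax(a, b); }

// std::max is just (a < b) ? b : a. With a NaN first argument the
// comparison is false, so the NaN is returned unchanged.
double MaxStyle(double a, double b) { return std::max(a, b); }
```

For NaN-free inputs the two agree on every pair, so a kernel that can assume no NaNs (or handles them separately) gets the cheaper comparison.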
Ah, indeed. Testing it with std::fmax in both kernels, the hash aggregate kernel is slower in all cases.
Well, that's disappointing. I'll rewrite the JIRA some; we could still consolidate to a single compute::Function and decide between scalar and grouped aggregation in Dispatch*.
Thanks for looking into this, @lidavidm
It would have been nice to maintain half as many aggregate kernels. As you say, we could consolidate the implementations into a single logical function (we'd have to consolidate HashAggregateFunction and ScalarAggregateFunction, though?). |
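The consolidation idea could be sketched roughly as follows (all names here are hypothetical, not Arrow's actual API): one logical function whose dispatch falls through to the dedicated scalar kernel when no grouping key is supplied, and to the grouped kernel otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Dedicated scalar kernel: aggregate over the whole input at once.
int64_t CountScalar(const std::vector<int32_t>& values) {
  return static_cast<int64_t>(values.size());
}

// Grouped kernel: one accumulator per group, indexed by group ID.
std::vector<int64_t> CountGrouped(const std::vector<int32_t>& values,
                                  const std::vector<uint32_t>& group_ids,
                                  uint32_t num_groups) {
  std::vector<int64_t> counts(num_groups, 0);
  for (size_t i = 0; i < values.size(); ++i) counts[group_ids[i]] += 1;
  return counts;
}

// A single logical entry point: with no group IDs, dispatch to the faster
// scalar kernel instead of running grouped aggregation over one group.
std::vector<int64_t> Count(
    const std::vector<int32_t>& values,
    const std::optional<std::vector<uint32_t>>& group_ids = std::nullopt,
    uint32_t num_groups = 1) {
  if (!group_ids) return {CountScalar(values)};
  return CountGrouped(values, *group_ids, num_groups);
}
```

This keeps one user-visible function while preserving the scalar fast path, though as noted it would require reconciling HashAggregateFunction and ScalarAggregateFunction under one function class.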