-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-11541: [C++][Compute] Implement tdigest kernel #9435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious whether this is best exposed as an interpolation kind for the "quantile" function, or a separate function altogether. Are there precedents in other libraries or database engines?
It seems R uses a separate function:
https://www.rdocumentation.org/packages/tdigest/versions/0.3.0/topics/tdigest
cc @nealrichardson @michalursa for opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matlab quantile has a method parameter, which can be exact (linear interpolation) or approximate (t-digest).
https://www.mathworks.com/help/stats/quantile.html#mw_06fb6551-ec1a-48f6-a690-f790dfaa9849
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW that tdigest R function is not part of base R, it is a contributed package.
R's quantile function also supports multiple methods, 9 in fact, via the type parameter: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
None of these are approximate in the same way as this, but it's a further argument that it could be a function parameter rather than a separate function. TBH I don't know how much it matters, as a compute API consumer I can make either work. It's marginally easier if they're separate functions rather than having to mess with FunctionOptions to select them--are maybe there are downsides to that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the t-digest method also has its own parameters (compression and delta). If we ever want to expose them, it would probably be less confusing as a separate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like separate function makes the most sense unless there's a cost to having an additional function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the t-digest method also has its own parameters (compression and delta). If we ever want to expose them, it would probably be less confusing as a separate function.
Good point. I will separate it into another function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added separated TDigest kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou I've separated TDigest kernel, please review, thanks.
d4608a9 to
356c300
Compare
|
appveyor complains protobuf namespace collision, should be fixed by pr #9542 |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Some comments below.
Also, it seems there's a performance problem here where TDigest is much slower than Quantile:
QuantileKernelDouble/1048576/10000 1104 us 1104 us 628 bytes_per_second=905.728M/s null_percent=0.01 size=1048.58k
QuantileKernelDouble/1048576/100 1242 us 1242 us 562 bytes_per_second=805.287M/s null_percent=1 size=1048.58k
QuantileKernelDouble/1048576/10 898 us 897 us 780 bytes_per_second=1114.26M/s null_percent=10 size=1048.58k
QuantileKernelDouble/1048576/2 864 us 863 us 810 bytes_per_second=1.13112G/s null_percent=50 size=1048.58k
QuantileKernelDouble/1048576/1 0.920 us 0.920 us 767094 bytes_per_second=1062.03G/s null_percent=100 size=1048.58k
QuantileKernelDouble/1048576/0 1222 us 1221 us 570 bytes_per_second=818.707M/s null_percent=0 size=1048.58k
TDigestKernel/1048576/10000 5933 us 5932 us 118 bytes_per_second=168.565M/s null_percent=0.01 size=1048.58k
TDigestKernel/1048576/100 5899 us 5898 us 119 bytes_per_second=169.563M/s null_percent=1 size=1048.58k
TDigestKernel/1048576/10 5518 us 5517 us 127 bytes_per_second=181.242M/s null_percent=10 size=1048.58k
TDigestKernel/1048576/2 3341 us 3341 us 210 bytes_per_second=299.355M/s null_percent=50 size=1048.58k
TDigestKernel/1048576/1 1.58 us 1.58 us 440133 bytes_per_second=617.774G/s null_percent=100 size=1048.58k
TDigestKernel/1048576/0 5942 us 5941 us 117 bytes_per_second=168.328M/s null_percent=0 size=1048.58k
(Ubuntu 20.04, clang 10)
cpp/src/arrow/util/tdigest.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it be better to partition at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No solid benchmark.
The idea is to drop the isnan check in Add function. When input buffer is full, partition NaNs to the buffer end and ignore them in sorting and merging. Guess the performance depends heavily on inputs (maybe worse for common cases where NaN is rare and isnan branch is always predicted correctly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this comment. Will do benchmark first.
|
For the record, 93% of the consumed CPU time seems to be in
|
|
Here are the new benchmark results. It seems TDigest is only really useful for centiles (!): |
Thanks for careful review. As you found, the bottleneck of TDigest is sorting. I once talked with the author about possibility of improving TDigest performance by removing sorting: tdunning/t-digest#153. There are some ideas to mitigate the penalty of sorting. I also plan to evaluate faster sorting algorithm like radix sort for floating numbers (folly has such an implementation).
I think |
|
Other thoughts about O(1) space quantile kernel performance: |
|
CI error not related |
|
Rebasing from master should allow fixing the CI failure. |
Implement approximate quantile kernel using t-digest util.
|
@cyb70289 Should we merge this or do you want to keep experimenting? |
I would like to merge this patch. Take tdigest optimization as followup task. Guess only code in util lib needs change, not kernel related code. |
Implement approximate quantile kernel using t-digest util.