ARROW-11541: [C++][Compute] Implement tdigest kernel #9435

cyb70289 · 2021-02-07T05:26:50Z

Implement approximate quantile kernel using t-digest util.

github-actions · 2021-02-07T05:27:14Z

https://issues.apache.org/jira/browse/ARROW-11541

pitrou · 2021-02-08T13:23:24Z

cpp/src/arrow/compute/api_aggregate.h

I'm curious whether this is best exposed as an interpolation kind for the "quantile" function, or a separate function altogether. Are there precedents in other libraries or database engines?

It seems R uses a separate function:
https://www.rdocumentation.org/packages/tdigest/versions/0.3.0/topics/tdigest

cc @nealrichardson @michalursa for opinion.

Matlab quantile has a method parameter, which can be exact (linear interpolation) or approximate (t-digest).
https://www.mathworks.com/help/stats/quantile.html#mw_06fb6551-ec1a-48f6-a690-f790dfaa9849

FWIW that tdigest R function is not part of base R, it is a contributed package.

R's quantile function also supports multiple methods, 9 in fact, via the type parameter: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html

None of these are approximate in the same way as this, but it's a further argument that it could be a function parameter rather than a separate function. TBH I don't know how much it matters, as a compute API consumer I can make either work. It's marginally easier if they're separate functions rather than having to mess with FunctionOptions to select them--are maybe there are downsides to that way?

Note that the t-digest method also has its own parameters (compression and delta). If we ever want to expose them, it would probably be less confusing as a separate function.

Sounds like separate function makes the most sense unless there's a cost to having an additional function.

Note that the t-digest method also has its own parameters (compression and delta). If we ever want to expose them, it would probably be less confusing as a separate function.

Good point. I will separate it into another function.

Added separated TDigest kernel.

@pitrou I've separated TDigest kernel, please review, thanks.

cyb70289 · 2021-02-22T08:56:59Z

appveyor complains protobuf namespace collision, should be fixed by pr #9542
mingw32 arrow-python-test timeout, still checking if related to this patch

pitrou

Thanks for the update. Some comments below.

Also, it seems there's a performance problem here where TDigest is much slower than Quantile:

QuantileKernelDouble/1048576/10000       1104 us         1104 us          628 bytes_per_second=905.728M/s null_percent=0.01 size=1048.58k
QuantileKernelDouble/1048576/100         1242 us         1242 us          562 bytes_per_second=805.287M/s null_percent=1 size=1048.58k
QuantileKernelDouble/1048576/10           898 us          897 us          780 bytes_per_second=1114.26M/s null_percent=10 size=1048.58k
QuantileKernelDouble/1048576/2            864 us          863 us          810 bytes_per_second=1.13112G/s null_percent=50 size=1048.58k
QuantileKernelDouble/1048576/1          0.920 us        0.920 us       767094 bytes_per_second=1062.03G/s null_percent=100 size=1048.58k
QuantileKernelDouble/1048576/0           1222 us         1221 us          570 bytes_per_second=818.707M/s null_percent=0 size=1048.58k

TDigestKernel/1048576/10000              5933 us         5932 us          118 bytes_per_second=168.565M/s null_percent=0.01 size=1048.58k
TDigestKernel/1048576/100                5899 us         5898 us          119 bytes_per_second=169.563M/s null_percent=1 size=1048.58k
TDigestKernel/1048576/10                 5518 us         5517 us          127 bytes_per_second=181.242M/s null_percent=10 size=1048.58k
TDigestKernel/1048576/2                  3341 us         3341 us          210 bytes_per_second=299.355M/s null_percent=50 size=1048.58k
TDigestKernel/1048576/1                  1.58 us         1.58 us       440133 bytes_per_second=617.774G/s null_percent=100 size=1048.58k
TDigestKernel/1048576/0                  5942 us         5941 us          117 bytes_per_second=168.328M/s null_percent=0 size=1048.58k

(Ubuntu 20.04, clang 10)

cpp/src/arrow/compute/kernels/aggregate_test.cc

cpp/src/arrow/compute/kernels/aggregate_tdigest.cc

cpp/src/arrow/compute/kernels/aggregate_test.cc

pitrou · 2021-02-22T15:13:02Z

cpp/src/arrow/util/tdigest.h

Why would it be better to partition at the end?

No solid benchmark.
The idea is to drop the isnan check in Add function. When input buffer is full, partition NaNs to the buffer end and ignore them in sorting and merging. Guess the performance depends heavily on inputs (maybe worse for common cases where NaN is rare and isnan branch is always predicted correctly).

Removed this comment. Will do benchmark first.

cpp/src/arrow/util/tdigest.h

cpp/src/arrow/compute/kernels/aggregate_tdigest.cc

cpp/src/arrow/compute/kernels/aggregate_benchmark.cc

pitrou · 2021-02-22T16:03:12Z

For the record, 93% of the consumed CPU time seems to be in TDigest::MergeInput, of which:

23% in TDigestMerger::Add
68% in std::sort

pitrou · 2021-02-22T16:06:51Z

Here are the new benchmark results. It seems TDigest is only really useful for centiles (!):

QuantileKernelMedianNarrow<Int32Type>/1048576/0          611 us          610 us         1170 bytes_per_second=1.59974G/s items_per_second=429.428M/s null_percent=0 size=1048.58k
QuantileKernelMedianWide<Int32Type>/1048576/0           1602 us         1595 us          438 bytes_per_second=627.15M/s items_per_second=164.404M/s null_percent=0 size=1048.58k
QuantileKernelMedianNarrow<Int64Type>/1048576/0          332 us          331 us         2111 bytes_per_second=2.95009G/s items_per_second=395.955M/s null_percent=0 size=1048.58k
QuantileKernelMedianWide<Int64Type>/1048576/0            553 us          550 us         1272 bytes_per_second=1.77461G/s items_per_second=238.185M/s null_percent=0 size=1048.58k
QuantileKernelMedianWide<DoubleType>/1048576/0          1148 us         1145 us          609 bytes_per_second=873.302M/s items_per_second=114.465M/s null_percent=0 size=1048.58k

QuantileKernelDecilesNarrow<Int32Type>/1048576/0         620 us          617 us         1158 bytes_per_second=1.5826G/s items_per_second=424.825M/s null_percent=0 size=1048.58k
QuantileKernelDecilesWide<Int32Type>/1048576/0          4685 us         4666 us          151 bytes_per_second=214.324M/s items_per_second=56.1838M/s null_percent=0 size=1048.58k
QuantileKernelDecilesWide<DoubleType>/1048576/0         2841 us         2829 us          247 bytes_per_second=353.477M/s items_per_second=46.3309M/s null_percent=0 size=1048.58k

QuantileKernelCentilesNarrow<Int32Type>/1048576/0        621 us          619 us         1137 bytes_per_second=1.57778G/s items_per_second=423.531M/s null_percent=0 size=1048.58k
QuantileKernelCentilesWide<Int32Type>/1048576/0        14151 us        14120 us           50 bytes_per_second=70.8198M/s items_per_second=18.565M/s null_percent=0 size=1048.58k
QuantileKernelCentilesWide<DoubleType>/1048576/0        7354 us         7328 us           97 bytes_per_second=136.46M/s items_per_second=17.8861M/s null_percent=0 size=1048.58k

TDigestKernelDoubleMedian/1048576/0                     5655 us         5628 us          125 bytes_per_second=177.681M/s items_per_second=23.289M/s null_percent=0 size=1048.58k
TDigestKernelDoubleDeciles/1048576/0                    5617 us         5598 us          125 bytes_per_second=178.645M/s items_per_second=23.4154M/s null_percent=0 size=1048.58k
TDigestKernelDoubleCentiles/1048576/0                   5802 us         5784 us          125 bytes_per_second=172.883M/s items_per_second=22.6601M/s null_percent=0 size=1048.58k

cyb70289 · 2021-02-23T02:49:08Z

Also, it seems there's a performance problem here where TDigest is much slower than Quantile:

Thanks for careful review. As you found, the bottleneck of TDigest is sorting. I once talked with the author about possibility of improving TDigest performance by removing sorting: tdunning/t-digest#153. There are some ideas to mitigate the penalty of sorting. I also plan to evaluate faster sorting algorithm like radix sort for floating numbers (folly has such an implementation).

Quantile kernel is expensive in calculating quantiles (call std::nth_element). So centils() performances the worst, simply because it calculates more quantiles.
TDigest kernel is trivial in calculating quantiles, so its performance is almost constant regardless number of quantiles to calculate. But TDigest is expensive in adding data points (TDigestMerger::Add), when buffer is full, it sorts (call std::sort) the input buffer which is much slower than std::nth_element.

I think Quantile and TDigest are for different purpose. You will use TDigest if you have large volume of inputs (or streaming data), and copying all of them leads to huge memory footprint. Another benefit of TDigest is that chunks can be processed in parallel, then merge into one tdigest, though current aggregate kernel framework doesn't support it.

cyb70289 · 2021-02-23T03:02:37Z

Other thoughts about O(1) space quantile kernel performance:
TDigest is good at estimate skewed quantiles, like 0.01, 0.99. For normal quantiles between 0.1 ~ 0.9, the old school online quantile algorithm P-Square [1] is good enough, it should be faster than TDigest. Maybe we can add a p-square kernel for additional cases, or auto select tdigest/p-square per input options.
[1] https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf

cyb70289 · 2021-02-23T08:10:59Z

CI error not related

pitrou · 2021-02-23T10:17:34Z

Rebasing from master should allow fixing the CI failure.

Implement approximate quantile kernel using t-digest util.

pitrou · 2021-02-23T12:52:10Z

@cyb70289 Should we merge this or do you want to keep experimenting?

cyb70289 · 2021-02-23T15:19:47Z

@cyb70289 Should we merge this or do you want to keep experimenting?

I would like to merge this patch. Take tdigest optimization as followup task. Guess only code in util lib needs change, not kernel related code.

github-actions bot added Component: C++ Component: Python labels Feb 7, 2021

pitrou reviewed Feb 8, 2021

View reviewed changes

jorgecarleitao force-pushed the master branch from d4608a9 to 356c300 Compare February 14, 2021 12:09

cyb70289 changed the title ~~ARROW-11541: [C++][Compute] Implement approximate quantile kernel~~ ARROW-11541: [C++][Compute] Implement tdigest kernel Feb 22, 2021

pitrou requested changes Feb 22, 2021

View reviewed changes

cyb70289 and others added 4 commits February 23, 2021 10:24

ARROW-11541: [C++][Compute] Implement approximate quantile kernel

c6883d3

Implement approximate quantile kernel using t-digest util.

declare tdigest::is_empty() as const

2d2630a

Overhaul benchmarks, add decile and centile benchmarks

9e34f38

address review comments

608d3d8

pitrou approved these changes Feb 23, 2021

View reviewed changes

pitrou closed this in 922fb98 Feb 23, 2021

cyb70289 deleted the quantile-approx branch February 24, 2021 00:03

asfimport mentioned this pull request Aug 1, 2021

[C++][Compute] Implement approximate quantile kernel #27415

Closed

ARROW-11541: [C++][Compute] Implement tdigest kernel #9435

ARROW-11541: [C++][Compute] Implement tdigest kernel #9435

Uh oh!

Conversation

cyb70289 commented Feb 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Feb 7, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyb70289 Feb 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyb70289 commented Feb 22, 2021

Uh oh!

pitrou left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pitrou commented Feb 22, 2021

Uh oh!

pitrou commented Feb 22, 2021

Uh oh!

cyb70289 commented Feb 23, 2021

Uh oh!

cyb70289 commented Feb 23, 2021

Uh oh!

cyb70289 commented Feb 23, 2021

Uh oh!

pitrou commented Feb 23, 2021

Uh oh!

pitrou commented Feb 23, 2021

Uh oh!

cyb70289 commented Feb 23, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cyb70289 commented Feb 7, 2021 •

edited

Loading

cyb70289 Feb 20, 2021 •

edited

Loading