Description
ARROW-13772 binds quantile() to tdigest(), which returns approximate quantiles, and binds median() to approximate_median(), which returns an approximate median. Both bindings issue a warning saying that the result is approximate. Once ARROW-13309 is implemented, modify the bindings to call Arrow functions that return exact quantiles and medians, and remove the warnings.
We should keep the approximate quantile and median bindings but rename them.
When doing this, we should also modify the bindings to accept type and interpolation arguments like we do in the quantile.ArrowDatum method:
Lines 156 to 187 in 170a24f
    quantile.ArrowDatum <- function(x,
                                    probs = seq(0, 1, 0.25),
                                    na.rm = FALSE,
                                    type = 7,
                                    interpolation = c("linear", "lower", "higher", "nearest", "midpoint"),
                                    ...) {
      if (inherits(x, "Scalar")) x <- Array$create(x)
      assert_is(probs, c("numeric", "integer"))
      assert_that(length(probs) > 0)
      assert_that(all(probs >= 0 & probs <= 1))
      if (!na.rm && x$null_count > 0) {
        stop("Missing values not allowed if 'na.rm' is FALSE", call. = FALSE)
      }
      if (type != 7) {
        stop(
          "Argument `type` not supported in Arrow. To control the quantile ",
          "interpolation algorithm, set argument `interpolation` to one of: ",
          "\"linear\" (the default), \"lower\", \"higher\", \"nearest\", or ",
          "\"midpoint\".",
          call. = FALSE
        )
      }
      interpolation <- QuantileInterpolation[[toupper(match.arg(interpolation))]]
      out <- call_function("quantile", x, options = list(q = probs, interpolation = interpolation))
      if (length(out) == 0) {
        # When there are no non-missing values in the data, the Arrow quantile
        # function returns an empty Array, but for consistency with the R quantile
        # function, we want an Array of NA_real_ with the same length as probs
        out <- Array$create(rep(NA_real_, length(probs)))
      }
      out
    }
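To illustrate what the five `interpolation` choices mean for an exact quantile, here is a minimal base-R sketch. It is not Arrow's implementation: the function name `exact_quantile_sketch` is hypothetical, and sending ties to the higher neighbour for "nearest" is an assumption made for this sketch that may differ from the Arrow kernel. The "linear" mode is intended to agree with R's `type = 7`.

```r
# Hypothetical helper for illustration only; not part of the arrow package.
exact_quantile_sketch <- function(x,
                                  probs = seq(0, 1, 0.25),
                                  na.rm = FALSE,
                                  interpolation = c("linear", "lower", "higher",
                                                    "nearest", "midpoint")) {
  interpolation <- match.arg(interpolation)
  stopifnot(is.numeric(probs), length(probs) > 0, all(probs >= 0 & probs <= 1))
  if (!na.rm && anyNA(x)) {
    stop("Missing values not allowed if 'na.rm' is FALSE", call. = FALSE)
  }
  x <- sort(x[!is.na(x)])
  n <- length(x)
  # Mirror the method above: with no non-missing values, return NA per prob
  if (n == 0) {
    return(rep(NA_real_, length(probs)))
  }
  h <- (n - 1) * probs      # zero-based fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  frac <- h - lo
  x_lo <- x[lo + 1]         # R vectors are one-based
  x_hi <- x[hi + 1]
  switch(interpolation,
    linear   = x_lo + frac * (x_hi - x_lo),
    lower    = x_lo,
    higher   = x_hi,
    # Assumption: ties (frac == 0.5) go to the higher neighbour
    nearest  = ifelse(frac < 0.5, x_lo, x_hi),
    midpoint = (x_lo + x_hi) / 2
  )
}

x <- c(1, 2, 3, 4)
sapply(
  c("linear", "lower", "higher", "nearest", "midpoint"),
  function(m) exact_quantile_sketch(x, probs = 0.5, interpolation = m)
)
```

With the arrow package loaded, the method quoted above would presumably be called the same way, for example `quantile(Array$create(x), probs = 0.5, interpolation = "lower")`; the updated bindings proposed in this issue would need to pass the same `interpolation` choice through.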
Reporter: Ian Cook / @ianmcook
Related issues:
- [R] Binding for median() and quantile() aggregation functions (relates to)
- [C++] Implement hash_aggregate exact quantile kernel (depends upon)
Note: This issue was originally created as ARROW-14021. Please see the migration documentation for further details.