@thisisnic
Member

@thisisnic thisisnic commented Aug 27, 2021

This adds a binding for median() in dplyr::summarise(). It also adds a binding for quantile(), but only for a single probability per function call (length(probs) == 1). With both bindings, the results are approximate, calculated using the t-digest algorithm. The user is warned once per session that the results are approximate.
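
For illustration, a minimal sketch of how these bindings might be used, assuming an installed arrow version that includes this change. The table contents and the names `tab`, `res`, `med`, and `q90` are hypothetical and chosen only for this example. Because the results are approximate, no exact output values are shown.

```r
library(arrow)
library(dplyr)

# Hypothetical example data: a grouping column and a numeric column.
tab <- Table$create(
  g = c("a", "a", "a", "b", "b"),
  x = c(1, 2, 3, 10, 20)
)

# median() and quantile() are evaluated by Arrow inside summarise().
# quantile() is given a single probability (length(probs) == 1),
# which is the only case this binding supports.
res <- tab %>%
  group_by(g) %>%
  summarise(
    med = median(x),
    q90 = quantile(x, probs = 0.9)
  ) %>%
  collect()

print(res)
```

The first call in a session should emit the warning that the median or quantile is approximate. Subsequent calls in the same session should not repeat it.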

@thisisnic thisisnic marked this pull request as draft August 27, 2021 11:02
@thisisnic
Member Author

hash_digest returns a list, and currently we can't combine expressions on aggregation kernels. Look at this again after #10992 has been merged, as that will enable this behaviour.

@ianmcook ianmcook force-pushed the ARROW-13772_median_hash branch from ee8c6a2 to c4db4f1 Compare September 17, 2021 00:52
@ianmcook
Member

This is blocked until #11159 merges.

@ianmcook ianmcook force-pushed the ARROW-13772_median_hash branch from 9515eeb to c44d54b Compare September 20, 2021 20:48
@ianmcook ianmcook marked this pull request as ready for review September 21, 2021 02:20
@ianmcook
Member

ianmcook commented Sep 21, 2021

I will rebase this after #11204 merges. Until then, tests will fail.

@ianmcook ianmcook force-pushed the ARROW-13772_median_hash branch from 421abf6 to a837785 Compare September 24, 2021 15:01
@ianmcook
Member

I addressed all comments and rebased after the approximate_median kernel PR was merged. I will merge when the CI is all green.

@ianmcook ianmcook marked this pull request as ready for review September 24, 2021 15:41
@ianmcook ianmcook closed this in 2178905 Sep 25, 2021
ianmcook added a commit that referenced this pull request Sep 25, 2021
This is a trivially small fix to resolve a test failure on older versions of R following ARROW-13772 (#11018).

Closes #11235 from ianmcook/ARROW-13772-fix

Authored-by: Ian Cook <ianmcook@gmail.com>
Signed-off-by: Ian Cook <ianmcook@gmail.com>
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
This adds a binding for `median()` in `dplyr::summarise()`. This also adds a binding for `quantile()` but only for a single probability per function call (`length(probs) == 1`). With both bindings, the results are approximate, calculated using the tdigest algorithm. The user is warned once per session that the results are approximate.

Closes apache#11018 from thisisnic/ARROW-13772_median_hash

Lead-authored-by: Ian Cook <ianmcook@gmail.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Ian Cook <ianmcook@gmail.com>