Skip to content

Conversation

@nealrichardson
Copy link
Member

@nealrichardson nealrichardson commented Jul 15, 2021

This PR adds support for both scalar and group-by aggregation via dplyr::summarize(). Only the functions sum, any, and all are wired up. Followup issues (both bugs and features):

  • [C++] Aggregation nodes seem not to respect FunctionOptions, or else I'm not passing them in correctly (ARROW-13497)
  • [C++] ScanNode takes filter but doesn't filter (ARROW-13498)
  • [R] Aggregation on expression doesn't NSE correctly (ARROW-13499)
  • [R] Bindings for mean, var, sd aggregation (ARROW-13528)
  • [R] Bindings for count aggregation (ARROW-13501)
  • [R] Bindings for min/max aggregation (ARROW-13502)
  • [R] Handle summarize() with 0 arguments or no aggregate functions (ARROW-13543)
  • [R] Support .groups argument to summarize() (ARROW-13550)
  • [C++] MakeScalarAggregateNode and MakeGroupByNode have quite different function signatures, which makes working with the API confusing; GroupBy doesn't let you specify the names of the output columns (ARROW-13482)
  • [C++] Grouped aggregation functions all have to be invoked with a hash_ prefix to the name, which seems unnecessary because you can't call a non-hash-aggregation function in GroupBy and you can't call a hash_ function in ScalarAggregate (ARROW-13451)

@nealrichardson nealrichardson requested a review from bkietz July 15, 2021 11:16
@nealrichardson nealrichardson changed the title [WIP][R] ExecPlan/Node and ScalarAggregate [WIP] ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode and ScalarAggregateNode Jul 15, 2021
@nealrichardson nealrichardson force-pushed the scalar-aggregate-node branch from b8d4db6 to d6e0344 Compare July 15, 2021 14:09
@nealrichardson nealrichardson force-pushed the scalar-aggregate-node branch from 9110b65 to 0ca08b2 Compare July 19, 2021 19:31
@ianmcook ianmcook self-requested a review July 23, 2021 15:54
@bkietz bkietz force-pushed the scalar-aggregate-node branch from 0ca08b2 to 6cd68cc Compare July 26, 2021 15:52
@nealrichardson nealrichardson changed the title [WIP] ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode and ScalarAggregateNode ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode Jul 26, 2021
@apache apache deleted a comment from github-actions bot Jul 26, 2021
@github-actions
Copy link

@nealrichardson nealrichardson marked this pull request as ready for review July 27, 2021 18:40
@nealrichardson nealrichardson force-pushed the scalar-aggregate-node branch from 133f328 to a7f5cad Compare July 27, 2021 18:43
@nealrichardson nealrichardson force-pushed the scalar-aggregate-node branch 2 times, most recently from d0731f3 to 43463b3 Compare July 30, 2021 16:36
@bkietz
Copy link
Member

bkietz commented Jul 30, 2021

  • MakeScalarAggregateNode and MakeGroupByNode have quite different function signatures, which makes working with the API confusing
  • GroupBy doesn't let you specify the names of the output columns

ARROW-13482 (#10793) unifies the factories for scalar and grouped aggregation. I'll ensure that PR also provides configurable output column names for both

  • Grouped aggregation functions all have to be invoked with a hash_ prefix to the name, which seems unnecessary because you can't call a non-hash-aggregation function in GroupBy and you can't call a hash_ function in ScalarAggregate

ARROW-13451 would allow grouping of grouped and scalar aggregation kernels under the same function (rather than requiring distinct sum and hash_sum functions), which I think would address these

@nealrichardson nealrichardson force-pushed the scalar-aggregate-node branch 2 times, most recently from 8e2dcd2 to a7ec453 Compare August 3, 2021 13:32
@nealrichardson nealrichardson force-pushed the scalar-aggregate-node branch from a7ec453 to eab89e8 Compare August 3, 2021 15:23
@ianmcook ianmcook self-requested a review August 4, 2021 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants