[C++][Compute] Support merging GroupByState for multithreaded aggregation #27687

@asfimport

Description

ARROW-11591 adds support for grouped aggregation but defers merging, which is non-trivial and unnecessary for single-threaded aggregation. Merging will eventually be required, however: when aggregating in a multithreaded dataset scan, each thread's results will need to be combined after the scan completes.
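To make the merge step concrete, here is a minimal sketch of combining per-thread grouped state after a scan. It assumes a grouped "sum" keyed by strings; `GroupedSumState`, `Consume`, `MergeFrom`, and `MergeAll` are hypothetical names for illustration and are not part of Arrow's API or the eventual GroupByState design.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-thread state for a grouped "sum" aggregation.
// None of these names come from Arrow.
struct GroupedSumState {
  std::unordered_map<std::string, int64_t> sums;

  // Fold one batch of (key, value) rows into this thread-local state.
  void Consume(const std::vector<std::pair<std::string, int64_t>>& batch) {
    for (const auto& row : batch) {
      sums[row.first] += row.second;  // operator[] zero-initializes new groups
    }
  }

  // Combine another thread's state into this one. Groups seen by both
  // threads are added together; groups seen only by `other` are inserted.
  void MergeFrom(const GroupedSumState& other) {
    for (const auto& entry : other.sums) {
      sums[entry.first] += entry.second;
    }
  }
};

// Merge all per-thread states once, after the scan has finished.
GroupedSumState MergeAll(const std::vector<GroupedSumState>& per_thread) {
  GroupedSumState merged;
  for (const auto& state : per_thread) {
    merged.MergeFrom(state);
  }
  return merged;
}
```

The cost of `MergeFrom` here grows with the number of groups in `other`, which is why a merge of this kind is better done once per thread than once per batch.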

Note that ScalarAggExecutor::Consume currently assumes that merging aggregation states is cheap (true for small aggregation state, as with "mean", but false for "group_by") and invokes ScalarAggregateKernel::merge for each input batch. ARROW-11591 introduces "group_by" as a special case that is not merged for each input batch, but ideally this assumption would not be made for any kernel. When removing it, be sure that merging of other aggregates continues to be tested.
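The two consumption strategies can be contrasted with a small sketch. It is an illustration under assumptions, not the actual ScalarAggExecutor code: `MeanState`, `ConsumeWithMergePerBatch`, and `ConsumeDirectly` are hypothetical names, and returning 0.0 for an empty input is an arbitrary choice made only to keep the example short.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical small aggregation state for "mean"; not Arrow's actual type.
struct MeanState {
  double sum = 0.0;
  int64_t count = 0;

  // Fold one batch of values into this state.
  void Consume(const std::vector<double>& batch) {
    for (double v : batch) {
      sum += v;
      ++count;
    }
  }

  // Constant-time merge: cheap enough to run once per batch.
  void MergeFrom(const MeanState& other) {
    sum += other.sum;
    count += other.count;
  }

  // Arbitrary choice for this sketch: an empty input yields 0.0.
  double Finalize() const { return count == 0 ? 0.0 : sum / count; }
};

// Strategy A: build a fresh state per batch and merge it into the running
// state every time. Fine when merging is cheap, costly when it is not.
MeanState ConsumeWithMergePerBatch(
    const std::vector<std::vector<double>>& batches) {
  MeanState running;
  for (const auto& batch : batches) {
    MeanState batch_state;
    batch_state.Consume(batch);
    running.MergeFrom(batch_state);
  }
  return running;
}

// Strategy B: one long-lived state consumes every batch directly, so no
// merge happens until separate states (e.g. one per thread) are combined.
MeanState ConsumeDirectly(const std::vector<std::vector<double>>& batches) {
  MeanState state;
  for (const auto& batch : batches) {
    state.Consume(batch);
  }
  return state;
}
```

Both strategies should produce the same result, so a test that compares them is one way to keep merge covered for aggregates that no longer merge on every batch.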

Reporter: Ben Kietzman / @bkietz

Note: This issue was originally created as ARROW-11840. Please see the migration documentation for further details.
