[C++][Compute] GroupBy: improve performance by encoding keys in row format only when they are inserted into hash table #28467

@asfimport

Description

The previous implementation of hash group-by converts each input ExecBatch to a row-oriented format,
then hashes and compares rows as if they were a single column.
It is more efficient (especially for a small number of key columns) to skip this relatively costly
encoding. Instead, hashes can be computed per column in column-oriented format and mixed together, and column-oriented input can be compared directly against the row-oriented keys stored in the hash table, without converting it first.
Encoding then happens only for the subset of input rows that are inserted into the hash table, that is, the rows that introduce new groups.
Keys in the hash table remain stored in row-oriented format.
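
A minimal sketch of the proposed flow, for illustration only. It assumes fixed-width 64-bit integer key columns and uses `std::unordered_multimap` as a stand-in hash table. All names here (`Column`, `MixHash`, `Grouper`, `Consume`, `rows_encoded`) are hypothetical and are not taken from the Arrow code base; the actual implementation may differ substantially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a key column (fixed-width int64 values only).
using Column = std::vector<int64_t>;

// Mix one column value into a running per-row hash (hash_combine-style).
inline uint64_t MixHash(uint64_t seed, uint64_t v) {
  v *= 0x9E3779B97F4A7C15ULL;
  v ^= v >> 32;
  return seed ^ (v + 0x9E3779B97F4A7C15ULL + (seed << 6) + (seed >> 2));
}

struct Grouper {
  std::size_t num_cols;
  // Row-oriented key storage: group g occupies rows[g * num_cols .. + num_cols).
  std::vector<int64_t> rows;
  // Stand-in hash table: hash -> group id (hash collisions resolved by comparing).
  std::unordered_multimap<uint64_t, uint32_t> table;
  // Counts how many input rows were actually encoded into row format.
  std::size_t rows_encoded = 0;

  explicit Grouper(std::size_t n) : num_cols(n) {}

  // Compare column-oriented input row i with row-oriented stored group g,
  // without converting the input row first.
  bool Equal(const std::vector<Column>& cols, std::size_t i, uint32_t g) const {
    for (std::size_t c = 0; c < num_cols; ++c) {
      if (cols[c][i] != rows[g * num_cols + c]) return false;
    }
    return true;
  }

  // Returns a group id for every input row.
  std::vector<uint32_t> Consume(const std::vector<Column>& cols) {
    assert(cols.size() == num_cols);
    const std::size_t n = cols.empty() ? 0 : cols[0].size();

    // Step 1: hash one column at a time, mixing into per-row hashes.
    std::vector<uint64_t> hashes(n, 0);
    for (const Column& col : cols) {
      assert(col.size() == n);
      for (std::size_t i = 0; i < n; ++i) {
        hashes[i] = MixHash(hashes[i], static_cast<uint64_t>(col[i]));
      }
    }

    // Step 2: probe; encode a row only when it introduces a new group.
    std::vector<uint32_t> ids(n);
    for (std::size_t i = 0; i < n; ++i) {
      bool found = false;
      auto range = table.equal_range(hashes[i]);
      for (auto it = range.first; it != range.second; ++it) {
        if (Equal(cols, i, it->second)) {
          ids[i] = it->second;
          found = true;
          break;
        }
      }
      if (!found) {
        const uint32_t g = static_cast<uint32_t>(rows.size() / num_cols);
        for (std::size_t c = 0; c < num_cols; ++c) rows.push_back(cols[c][i]);
        table.emplace(hashes[i], g);
        ++rows_encoded;
        ids[i] = g;
      }
    }
    return ids;
  }
};
```

In this sketch, rows whose keys already exist in the table never touch the row-oriented buffer; only rows that create a new group are appended to `rows`. A real implementation would presumably batch the hashing, comparison and encoding steps and handle nulls and variable-length keys, none of which is shown here.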

Reporter: Michal Nowakiewicz / @michalursa
Assignee: Michal Nowakiewicz / @michalursa

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-12725. Please see the migration documentation for further details.
