[C++][Compute] Implement count_distinct/distinct hash aggregate kernels  #28470

@asfimport

Description

Implement a count distinct aggregate that internally reuses the hash table from hash group by.

This adds support for SQL queries such as:
select a, count(distinct b), count(distinct c) from t group by a

For instance, to compute count(distinct b), the first group id mapping assigns a group id based on the value of column a. A second group id mapping is then done inside the count(distinct b) aggregate, using the key (groupid(a), b). The count(distinct c) aggregate works the same way with the key (groupid(a), c).
After all input rows are consumed, the final processing step scans the hash table keyed on (groupid(a), b) and updates an array of counts indexed by groupid(a).
The resulting array of counts is the output of the count distinct aggregate.
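
The two-level mapping described above can be sketched roughly as follows. This is only an illustration using standard library containers, not the Arrow hash table or kernel API: the function name `CountDistinctPerGroup`, the `PairHash` helper, the restriction to `int64_t` columns, and the absence of null handling are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical hash for the composite key (groupid(a), b).
struct PairHash {
  std::size_t operator()(const std::pair<uint32_t, int64_t>& p) const {
    std::size_t h1 = std::hash<uint32_t>{}(p.first);
    std::size_t h2 = std::hash<int64_t>{}(p.second);
    // Simple hash combine; any reasonable mixing function would do here.
    return h1 ^ (h2 + 0x9e3779b97f4a7c15ULL + (h1 << 6) + (h1 >> 2));
  }
};

// Sketch of count(distinct b) ... group by a.
// Assumes a and b have the same length and contain no nulls.
// Returns counts indexed by groupid(a), where group ids are assigned
// in order of first appearance of each value of a.
std::vector<int64_t> CountDistinctPerGroup(const std::vector<int64_t>& a,
                                           const std::vector<int64_t>& b) {
  // First mapping: value of a -> dense group id.
  std::unordered_map<int64_t, uint32_t> group_ids;
  // Second mapping, kept inside the aggregate: set of (groupid(a), b) keys.
  std::unordered_set<std::pair<uint32_t, int64_t>, PairHash> seen;

  for (std::size_t i = 0; i < a.size(); ++i) {
    const uint32_t next_id = static_cast<uint32_t>(group_ids.size());
    // emplace leaves an existing entry untouched, so known groups keep their id.
    const uint32_t gid = group_ids.emplace(a[i], next_id).first->second;
    seen.emplace(gid, b[i]);  // duplicates of (gid, b) are ignored
  }

  // Final step: scan the (groupid(a), b) table and bump the per-group counts.
  std::vector<int64_t> counts(group_ids.size(), 0);
  for (const auto& key : seen) {
    ++counts[key.first];
  }
  return counts;
}
```

With a = {1, 1, 2, 2, 1, 2} and b = {10, 10, 20, 30, 11, 40}, the value a = 1 gets group id 0 and a = 2 gets group id 1, and the function returns {2, 3}. A second aggregate for count(distinct c) would reuse the same first mapping and keep its own set keyed on (groupid(a), c).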

Reporter: Michal Nowakiewicz / @michalursa
Assignee: David Li / @lidavidm
Watchers: Rok Mihevc / @rok

Note: This issue was originally created as ARROW-12728. Please see the migration documentation for further details.
