[C++][Compute] Hash Join microbenchmarks

Implement a series of microbenchmarks giving a good picture of the performance of hash join implemented in Arrow across different set of dimensions.
Compare the performance against some other product(s).
Add scripts for generating useful visual reports giving a good picture of the costs of hash join.

Examples of dimensions to explore in microbenchmarks:
- number of duplicate keys on build side
- relative size of build side to probe side
- selectivity of the join
- number of key columns
- number of payload columns
- filtering performance for semi- and anti- joins
- dense integer key vs sparse integer key vs string key
- build size
- scaling of build, filtering, probe
- inner vs left outer, inner vs right outer
- left semi vs right semi, left anti vs right anti, left outer vs right outer
- non-uniform key distribution
- monotonic key values in input, partitioned key values in input (with and without per batch min-max metadata)
- chain of multiple hash joins
- overhead of Bloom filter for non-selective Bloom filter

**Reporter**: [Michal Nowakiewicz](https://issues.apache.org/jira/browse/ARROW-14479) / @michalursa
**Assignee**: [Sasha Krassovsky](https://issues.apache.org/jira/browse/ARROW-14479) / @save-buffer
#### Related issues:
- [[C++][Compute] Hash Join performance improvement](https://github.com/apache/arrow/issues/29768) (is a child of)
#### PRs and other links:
- [GitHub Pull Request #11876](https://github.com/apache/arrow/pull/11876)

<sub>**Note**: *This issue was originally created as [ARROW-14479](https://issues.apache.org/jira/browse/ARROW-14479). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[C++][Compute] Hash Join microbenchmarks #30038

Related issues:

PRs and other links:

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[C++][Compute] Hash Join microbenchmarks #30038

Description

Related issues:

PRs and other links:

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions