
Conversation

@save-buffer (Contributor)

Added microbenchmarks for Hash Join (using OpenMP for parallelism for now). Also added a Python script that analyzes the benchmark file and makes a lot of pretty graphs!

westonpace self-requested a review on December 7, 2021 21:25
save-buffer force-pushed the sasha_benchmark branch 2 times, most recently from bdaae1c to 224a047 on December 9, 2021 06:04
@westonpace (Member) left a comment

This is a great benchmark. I think you've captured several interesting aspects of the hash/join performance.

That being said, this benchmark is too slow; it takes close to an hour to run. Maybe compile flags could be used to distinguish between "benchmark" (a small space of parameters to help protect against regressions) and "experiment" (a wider set of parameters that validates the hash/join implementation and is useful to people looking to improve the hash/join code)?
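
For illustration, a minimal sketch of such a compile-time gate (the macro, function, and parameter names are hypothetical, not the PR's actual code):

```cpp
// Hypothetical sketch: compile the wide "experiment" sweep only when a
// dedicated flag is defined; otherwise register a small regression-guard set.
#include <benchmark/benchmark.h>

static void BM_HashJoin(benchmark::State& state) {
  const auto selectivity_pct = state.range(0);
  for (auto _ : state) {
    benchmark::DoNotOptimize(selectivity_pct);  // stand-in for one join run
  }
}

#ifdef HASH_JOIN_EXPERIMENT  // hypothetical flag, e.g. set via the build system
// "experiment": wide parameter sweep for people investigating the join code
BENCHMARK(BM_HashJoin)->DenseRange(1, 99, 7);
#else
// "benchmark": a couple of points that protect against regressions
BENCHMARK(BM_HashJoin)->Arg(10)->Arg(90);
#endif

BENCHMARK_MAIN();
```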

@save-buffer (Contributor, Author)

Regarding the benchmark being too slow: after making the number of selectivity test cases smaller and removing a lot of the slow utf8 runs, it now runs in 16.64 minutes on my machine. Is that more reasonable?

@westonpace (Member)

> Regarding the benchmark being too slow: after making the number of selectivity test cases smaller and removing a lot of the slow utf8 runs, it now runs in 16.64 minutes on my machine. Is that more reasonable?

It would certainly be an outlier. Right now a full run of 52 benchmarks (including compilation) takes ~100 minutes, so we are averaging ~2 minutes per benchmark (some are quite a bit faster and some are quite a bit slower). I don't have the distribution for all of them.

Benchmarking tools like conbench run the full suite of C++ benchmarks against every Arrow commit. So, at the very least, I think there would need to be a compelling case for why we can't get away with something similar here (again, it may be we want a reduced "for-automation" set which is the default and a more complete "for-investigation" set).

CC @jonkeane @pitrou for a second opinion

@pitrou (Member)

pitrou commented Dec 15, 2021

I agree that 16 minutes sounds too long. I assume this is because many combinations are being tested. I'm also surprised that the benchmarks are using OpenMP; is that actually required?
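
For reference, this is the kind of OpenMP construct in question (a hypothetical shape, not the PR's actual driver code): input batches are fanned out across threads with a parallel-for.

```cpp
// Hypothetical sketch of an OpenMP-driven benchmark loop (compile with -fopenmp).
#include <omp.h>

#include <cstdint>
#include <vector>

void PushBatchesInParallel(int64_t num_batches) {
  std::vector<int64_t> batches_per_thread(omp_get_max_threads(), 0);
#pragma omp parallel for
  for (int64_t i = 0; i < num_batches; ++i) {
    // ... hand batch i to the join node from thread omp_get_thread_num() ...
    batches_per_thread[omp_get_thread_num()]++;  // each thread writes its own slot
  }
}
```

The same fan-out could presumably be expressed with Arrow's own thread pool instead, which is what the question is driving at.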

@pitrou (Member)

pitrou commented Dec 15, 2021

For the record, our entire set of sorting benchmarks takes 2 minutes here.

(Member)

Do you have to test with all these tuples of int64 types? Or is it sufficient for performance analysis to test only {int64} and {int64,int64,int64,int64} for example?

(Contributor, Author)

The main use case is that we want to see whether there's any huge performance hit to having multiple fields versus one field (i.e. 2 x int64 vs fixed_size_binary(16)).
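
To make the comparison concrete, a hedged Arrow C++ sketch of the two key layouts (both occupy 16 bytes per row, so any gap between them isolates the cost of multi-column key handling):

```cpp
// Two schemas with the same key width: two int64 columns vs. one 16-byte
// fixed-size binary column.
#include <arrow/api.h>

auto two_int64_key = arrow::schema(
    {arrow::field("k0", arrow::int64()), arrow::field("k1", arrow::int64())});
auto one_binary16_key =
    arrow::schema({arrow::field("k", arrow::fixed_size_binary(16))});
```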

(Member)

I'll admit my ignorance here, but is it useful to benchmark all join types, or are some just going to yield the same performance as another (because of symmetry)?

(Contributor, Author)

Yes, because the build side is always on the right and the probe side is always on the left, so it's not actually symmetrical.
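
For readers unfamiliar with the asymmetry, here is an illustrative toy hash join (not Arrow's implementation): the right side is hashed into a table and the left side is streamed past it, so swapping the inputs changes which relation pays the build cost.

```cpp
// Toy inner hash join: build on the right input, probe with the left.
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

std::vector<std::pair<int64_t, int64_t>> ToyInnerHashJoin(
    const std::vector<int64_t>& probe, const std::vector<int64_t>& build) {
  // Build phase: hash every row of the right (build) side.
  std::unordered_multimap<int64_t, int64_t> table;
  for (int64_t i = 0; i < static_cast<int64_t>(build.size()); ++i) {
    table.emplace(build[i], i);
  }
  // Probe phase: stream the left (probe) side through the table.
  std::vector<std::pair<int64_t, int64_t>> matches;  // (probe row, build row)
  for (int64_t i = 0; i < static_cast<int64_t>(probe.size()); ++i) {
    auto range = table.equal_range(probe[i]);
    for (auto it = range.first; it != range.second; ++it) {
      matches.emplace_back(i, it->second);
    }
  }
  return matches;
}
```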

@pitrou (Member)

pitrou commented Dec 15, 2021

> it may be we want a reduced "for-automation" set which is the default and a more complete "for-investigation" set

Agreed that we should think more about what we're expecting from this. Does the fine-grained selection of benchmark parameters really help dive into performance issues? Or is the coverage just excessive?

We could have pretty much the same fine-grained approach for many other benchmarks (I gave the sorting example above, which definitely encourages a combinatorial explosion in benchmark numbers as well), but it would multiply the total time for running benchmarks by a non-trivial factor.

Besides continuous benchmarking, I'll point out that interactive work with benchmarks is less pleasant and more tedious when individual benchmark suites are too long (again that's my experience with the current sorting benchmarks, yet they're 8x faster than this).

@jonkeane (Member)

> again, it may be we want a reduced "for-automation" set which is the default and a more complete "for-investigation" set

We've run into this a few times in arrowbench (and actually already have this implemented in arrowbench for exactly this reason). I don't know of an example of this in the C++ benchmarks; it might be as simple as making it so that they don't run when archery benchmark run is called, and instead we need an additional --full flag or something to run the full set?

@save-buffer (Contributor, Author)

What would be the best way to provide a light version of the benchmark? Should I create a hash_join_benchmark_lite.cc, or should this be done with compile-time arguments?
An alternative would be to have the automation tooling specify which benchmarks to run using a benchmark_filter regex (a sketch of that approach is below).
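
A hedged sketch of that filter-based route, with hypothetical benchmark names: give the exhaustive cases a recognizable prefix so the tooling can select or exclude them with Google Benchmark's --benchmark_filter regex flag (recent versions also accept a leading '-' to negate the match).

```cpp
// Hypothetical naming scheme: the long parameter sweeps share a "Detailed"
// prefix so a regex filter can skip them without any compile-time flag.
#include <benchmark/benchmark.h>

static void HashJoinBasic(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(state.iterations());  // stand-in for one join run
  }
}
BENCHMARK(HashJoinBasic);

static void DetailedHashJoinSelectivity(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(state.range(0));  // stand-in for a parameter sweep
  }
}
BENCHMARK(DetailedHashJoinSelectivity)->DenseRange(1, 99, 7);

BENCHMARK_MAIN();
```

Automation would then invoke the binary with something like --benchmark_filter=-Detailed.* to run only the quick set.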

@jonkeane (Member)

@ElenaHenderson might have some ideas about benchmark filtering and how hard/easy that would be to implement. Currently, the automation runs whatever is run when we do archery benchmark run, so it would be sufficient to make the extraneous benchmarks not run without some other flag (or a separate command, or...).

@pitrou (Member)

pitrou commented Dec 16, 2021

I'm not opposed to adding flags, but I think it would be more widely beneficial to try and reduce the matrix of parameter values.

@jonkeane (Member)

Yes, I didn't mean to imply that the flag / gating should be the only thing we do; we should do both (or only the parameter reduction, if that gets the space small enough to just run them all).

save-buffer force-pushed the sasha_benchmark branch 2 times, most recently from 5a8cdc2 to 0250ce0 on January 7, 2022 22:13
westonpace self-requested a review on January 7, 2022 22:18
@westonpace (Member) left a comment

Thanks again for doing this. The new flag works great. Just a few more notes. I think you're just about there but there is a glitch in the X axis for UTF8 now.

@jonkeane we will need to make sure to set ARROW_BUILD_DETAILED_BENCHMARKS=OFF on any kind of regularly run benchmarks (conbench?).

@pitrou / @kou Any last thoughts?

(Member)

Suggested change:
- DCHECK_OK(join_->InputReceived(tid, 1 /* side */, *it));
+ DCHECK_OK(join_->InputReceived(tid, /*side=*/1, *it));

For consistency: foo(/*param_name=*/0) is more common in the code base than foo(0 /* param_name */).

(Member)

Why wrap this with a struct?

(Contributor, Author)

I originally had a few more stats and figured it would be clearer to have them in the stats_ struct. When I switched to having only one stat, I decided to keep the struct in case we add more again in the future (e.g. if we add cache miss counting and such).

(Member)

Since these are read as string values, the ordering gets messed up and the UTF8 plot ends up with its own unique X-axis values. On the other hand, if you read them in numerically, you'll need to configure the X axis to be logarithmic or else the smaller values will get washed out by the higher ones.

[Screenshots attached: badxaxis, goodxaxis]

@pitrou (Member)

pitrou commented Jan 11, 2022

No particular concern from me.

@jonkeane (Member)

> @jonkeane we will need to make sure to set ARROW_BUILD_DETAILED_BENCHMARKS=OFF on any kind of regularly run benchmarks (conbench?).

Is it possible to make the default/unset option be that they don't run? Then we won't need to make a PR against conbench (and folks running the benchmarks locally won't need to remember to set it if they don't want these).

@kou (Member) left a comment

I have no concern.

@save-buffer (Contributor, Author)

@jonkeane @westonpace I did set ARROW_BUILD_DETAILED_BENCHMARKS to OFF in CMakePresets.json, so I think it should default to OFF?

@westonpace (Member) left a comment

@save-buffer is right, my mistake, it looks like the default is OFF. Thanks for the updates. Once CI passes I will merge.

@ursabot

ursabot commented Jan 13, 2022

Benchmark runs are scheduled for baseline = 2ec4e99 and contender = ab86daf. ab86daf is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
- [Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
- [Failed ⬇️1.79% ⬆️0.0%] ursa-i9-9960x
- [Finished ⬇️0.22% ⬆️0.04%] ursa-thinkcentre-m75q

Supported benchmarks:
- ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
- ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
- ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
