ARROW-16852: [C++] Migrate remaining kernels to use ExecSpan, remove ExecBatchIterator #13630

wesm · 2022-07-17T20:16:37Z

This completes the porting to use ExecSpan everywhere. I also changed the ExecBatchIterator benchmarks to use ExecSpan to show the performance improvement in input splitting that we've talked about in the past:

Splitting inputs into small ExecSpan:

------------------------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------
BM_ExecSpanIterator/1024      205671 ns       205667 ns         3395 items_per_second=4.86223k/s
BM_ExecSpanIterator/4096       54749 ns        54750 ns        13121 items_per_second=18.265k/s
BM_ExecSpanIterator/16384      15979 ns        15979 ns        42628 items_per_second=62.5824k/s
BM_ExecSpanIterator/65536       5597 ns         5597 ns       125099 items_per_second=178.668k/s

Splitting inputs into small ExecBatch:

-------------------------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------
BM_ExecBatchIterator/1024    17163432 ns     17163171 ns           41 items_per_second=58.2643/s
BM_ExecBatchIterator/4096     4243467 ns      4243316 ns          163 items_per_second=235.665/s
BM_ExecBatchIterator/16384    1093680 ns      1093638 ns          620 items_per_second=914.38/s
BM_ExecBatchIterator/65536     272451 ns       272435 ns         2584 items_per_second=3.6706k/s

Because the input in this benchmark has 1M elements, this shows that splitting into 1024 chunks of size 1024 adds only 0.2ms of overhead with ExecSpanIterator versus 17.16ms of overhead with ExecBatchIterator (> 80x improvement).

This won't by itself do much to impact performance in Acero but things for the community to explore in the future are the following (this work that I've been doing has been a precondition to consider this):

A leaner ExecuteScalarExpression implementation that reuses temporary allocations (ARROW-16758)
Parallel expression evaluation
Better defining morsel (~1M elements) versus task (~1K elements) granularity in execution
Work stealing so that we don't "hog" the thread pools, and we keep the work pinned to a particular CPU core if there are other things going on at the same time

github-actions · 2022-07-17T20:16:59Z

https://issues.apache.org/jira/browse/ARROW-16852

wesm · 2022-07-17T21:33:01Z

The only CI failures are the linkage issues related to protobuf/otel

pitrou

Very nice, just a couple minor comments.

cpp/src/arrow/compute/exec/aggregate.cc

cpp/src/arrow/compute/exec_internal.h

cpp/src/arrow/compute/kernel.h

cpp/src/arrow/compute/kernels/aggregate_basic_internal.h

cpp/src/arrow/compute/kernels/aggregate_quantile.cc

cpp/src/arrow/compute/kernels/hash_aggregate_test.cc

cpp/src/arrow/compute/row/grouper.cc

Fix some more stuff Remove unused scalar path More refactoring Get everything compiling again Fix some more bugs Fix another bug Remove dead code Fix ArraySpan add/set offset logic Cleaner ArraySpan slicing logic Revert some files

wesm · 2022-07-19T13:43:42Z

Thanks for the review. Will merge when CI green

wesm · 2022-07-19T21:26:12Z

Merging. The CI issues look like assorted flakiness to me

ursabot · 2022-07-20T20:21:44Z

Benchmark runs are scheduled for baseline = c445243 and contender = 4d931ff. 4d931ff is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️2.19% ⬆️0.21%] test-mac-arm
[Failed] ursa-i9-9960x
[Finished ⬇️1.03% ⬆️0.43%] ursa-thinkcentre-m75q
Buildkite builds:
[Failed] 4d931ff1 ec2-t3-xlarge-us-east-2
[Failed] 4d931ff1 test-mac-arm
[Failed] 4d931ff1 ursa-i9-9960x
[Finished] 4d931ff1 ursa-thinkcentre-m75q
[Failed] c445243a ec2-t3-xlarge-us-east-2
[Failed] c445243a test-mac-arm
[Failed] c445243a ursa-i9-9960x
[Finished] c445243a ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added the Component: C++ label Jul 17, 2022

pitrou reviewed Jul 19, 2022

View reviewed changes

wesm added 4 commits July 19, 2022 08:17

Start refactoring

b4e3712

Fix some more stuff Remove unused scalar path More refactoring Get everything compiling again Fix some more bugs Fix another bug Remove dead code Fix ArraySpan add/set offset logic Cleaner ArraySpan slicing logic Revert some files

Reapply changes

36ec390

Fix some compiler warnings in release mode

09a6d42

Review comments

df1a92e

wesm force-pushed the execspan-other-kernels branch from b182b8c to df1a92e Compare July 19, 2022 13:40

lidavidm approved these changes Jul 19, 2022

View reviewed changes

wesm merged commit 4d931ff into apache:master Jul 19, 2022

asfimport mentioned this pull request Jul 20, 2022

[C++] Migrate SCALAR_AGGREGATE, HASH_AGGREGATE functions to use ExecSpan #32178

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ARROW-16852: [C++] Migrate remaining kernels to use ExecSpan, remove ExecBatchIterator #13630

ARROW-16852: [C++] Migrate remaining kernels to use ExecSpan, remove ExecBatchIterator #13630

Uh oh!

wesm commented Jul 17, 2022

Uh oh!

github-actions bot commented Jul 17, 2022

Uh oh!

wesm commented Jul 17, 2022

Uh oh!

pitrou left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wesm commented Jul 19, 2022

Uh oh!

wesm commented Jul 19, 2022

Uh oh!

ursabot commented Jul 20, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ARROW-16852: [C++] Migrate remaining kernels to use ExecSpan, remove ExecBatchIterator #13630

ARROW-16852: [C++] Migrate remaining kernels to use ExecSpan, remove ExecBatchIterator #13630

Uh oh!

Conversation

wesm commented Jul 17, 2022

Uh oh!

github-actions bot commented Jul 17, 2022

Uh oh!

wesm commented Jul 17, 2022

Uh oh!

pitrou left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wesm commented Jul 19, 2022

Uh oh!

wesm commented Jul 19, 2022

Uh oh!

ursabot commented Jul 20, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants