Skip to content

Conversation

@felipecrv
Copy link
Contributor

@felipecrv felipecrv commented Feb 23, 2024

Rationale for this change

take concatenates chunks when it's applied to a chunked values array, but when the indices arrays is also chunked it concatenates values more than once -- one Concatenate call with values.chunks() for every chunk in indices. This PR doesn't remove the concatenation, but ensures it's done only once instead of indices.size() times.

What changes are included in this PR?

Are these changes tested?

By existing tests.

Are there any user-facing changes?

A faster compute kernel.

@felipecrv felipecrv changed the title Take ccc GH-40207: [C++] TakeCC: Concatenate only once and delegate to TakeAA instead of TakeCA Feb 23, 2024
@apache apache deleted a comment from github-actions bot Feb 23, 2024
@github-actions
Copy link

⚠️ GitHub issue #40207 has been automatically assigned in GitHub to PR creator.

@felipecrv
Copy link
Contributor Author

felipecrv commented Feb 23, 2024

Items per second improved a lot. Higher is better.

EDIT: these numbers come from the initial version of the benchmarks (10 chunks per array and a 10x bigger indices array relative to values).

chart

@felipecrv felipecrv requested review from js8544 and pitrou February 23, 2024 15:39
@felipecrv
Copy link
Contributor Author

Making the only difference between the benchmarks, the fact that the array is chunked (in a 100 chunks) makes the differences more evident.

Screenshot 2024-02-23 at 12 57 16

@js8544
Copy link
Contributor

js8544 commented Feb 27, 2024

+1. Thanks!

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 27, 2024
@felipecrv
Copy link
Contributor Author

@js8544 thank you for the review. I pushed a revert for the last commit because it isn't necessary for this PR and might conflict with a PR someone else is working on.

If not one objects, I will probably merge this right before I'm ready to send more Take-related PRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add at least "ChunkedChunked" variations for each of these types as well? (FSB, FSL, String). I'm assuming the performance characteristics might be different?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

("ChunkedFlat" is less interesting arguably)

@pitrou

This comment was marked as outdated.

@pitrou
Copy link
Member

pitrou commented Feb 27, 2024

@felipecrv Would you like to rebase if not already done?

@felipecrv
Copy link
Contributor Author

@felipecrv Would you like to rebase if not already done?

Soon. I'm currently working on top of this branch locally 😅

…eCAC

...which would concatenate values on every loop iteration.
 - Use args.size correctly (no division by sizeof(int64_t)
 - More chunks: 100 instead of just 10
 - Keep the number of indices in the chunked version equal to the number
   of items (just like the non-chunked benchmarks)
 - Variations without chunking of the indices
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 2c13a19.

There were 2 benchmark results with an error:

There were 13 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants