-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-5760: [C++] New compute::Take implementation for better performance, faster dispatch, smaller code size / faster compilation #7382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I'll fix the unit tests tomorrow. There's a lot of benchmarks but here are the take-int64 ones: before after A branch to use for comparison benchmarks is https://github.com/wesm/arrow/tree/ARROW-5760-comparison since I had to make some benchmark related fixes |
|
I'm also moving the index boundschecking to util/int_util.h, fixing bugs in it and adding unit tests |
|
I've got a lot of changes locally so recommend holding off on reviewing for a bit until the CI is looking more green |
|
OK, I have added more unit tests for the boundchecking code and added more rigorous unit testing for string/fixed-size-binary types in vector_take_test.cc. I'm going to get the test suite passing again but this can be reviewed Latest benchmark run: |
|
I added a bunch more unit tests, I think I'm done with this aside from CI fixes and will move on to working on Filter |
|
Appveyor is failing because of b058cf0 |
|
noticed a few small nits. Somebody more familiar with Take should review this. |
|
I noticed some bugs while reviewing, I'll push some changes here in a few minutes |
1bb6759 to
4de6b70
Compare
|
Done. Please review |
|
I am preparing another large patch that uses this branch as a base so this patch will need to be reviewed and merged before the next patch (providing a streamlined Filter implementation) can be reviewed. CI is passing -- the only failure is an out-of-space error on the 390x box |
… fixed size binary, large string
…rks in items/s rather than bytes for comparability across data types
|
I can confirm a performance increase here (2x to 5x faster) as well as a decrease in code size ( |
|
I'm going to push changes on this PR, I think. Please hold on :-). |
|
Please go ahead |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me. This is a very nice improvement.
|
@pitrou thanks for the fixes + improvements! |
|
Here's an appveyor build https://ci.appveyor.com/project/wesm/arrow/builds/33463261. Will merge this shortly |
|
BTW I ran benchmarks for this with MSVC 2017 with and without mimalloc and mimalloc has a pretty big impact, we should definitely endeavor to ship mimalloc in all of our Windows binaries in the next release https://gist.github.com/wesm/5b0841e583a27a8dc71f951b30475015/revisions?diff=split |
|
Does mimalloc have the same special build constraints like jemalloc such that we have to build it bundled? (Also, are we going to get that jemalloc static lib symbol gluing done for 1.0?) |
|
Same issue as jemalloc I think. I really hope that the static lib issue will be addressed for 1.0. I can try to do it if no one else can work on it but I see that as a last resort as there are other high value things I'm trying to get done. |
I was going to work on Filter in this PR but it got to be too much to review so I'll tackle that in a separate PR.
Summary of what's in this PR:
Note: support for doing Take with unions has been temporarily disabled. Since there is so little code in the codebase that deals with unions at the moment I felt it would be better to address this in follow up work. Here are the currently available kernels:
There's some code duplication that can be improved so I think things can be improved in follow up PRs using the benchmarks and code size metrics as a strict guideline for making changes.