ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework #7240
Conversation
For code reviewers:
I am aware that there are some obviously messy things that will need to be cleaned up a bit, but we have to start somewhere.
Force-pushed from d1dca6d to eb7f776.
Aside from fixing the builds, I'm going to do a few things to help with this:
wesm
left a comment
leaving a handful of cleaning TODOs for myself
pitrou
left a comment
Some questions that came up while reading through the API.
cpp/src/arrow/compute/options.h
Outdated
It's weird to have this as part of the cast options.
Yes, I'm aware, but I wasn't able to figure out another way to communicate the output type to the standard kernel APIs.
The R failure: I'm happy to help debug, but that sounds pretty straightforward.
@nealrichardson yeah, I will fix (I wasn't thinking about timezones when adding the dispatch rules for comparisons).
Can I start working on this branch?
@kou yes, please go ahead. I'm sure you'll be careful about rebasing, since I'm going to push some commits to the branch later today, probably in response to the comments above.
OK!
Force-pushed from eb7f776 to b3286d3.
I rebased without pushing any new changes.
CC @brills -- I'm not sure if this impacts TFX at all.
@emkornfield @brills no Python APIs are changed (not sure how much of the C++ API TFX uses).
cpp/src/arrow/compute/function.h
Outdated
```c++
/// \brief Contains the number of required arguments for the function
struct ARROW_EXPORT FunctionArity {
  static FunctionArity Nullary() { return FunctionArity(0, false); }
```
nit: Void or Null for name?
cpp/src/arrow/compute/function.h
Outdated
```c++
  static FunctionArity Unary() { return FunctionArity(1, false); }
  static FunctionArity Binary() { return FunctionArity(2, false); }
  static FunctionArity Ternary() { return FunctionArity(3, false); }
  static FunctionArity Varargs(int min_args = 1) { return FunctionArity(min_args, true); }
```
nit: VarArgs
It also seems like it might be more readable for callers to always have to specify min_args (especially because it isn't clear to me why 1 is the default and not 0 or 2).
You need at least one argument spec for type checking.
I'm changing the name, but will leave the default-value issue for follow-up work.
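The arity scheme being discussed can be sketched outside C++. Below is a hypothetical Python analogue of `FunctionArity` (class and method names here are illustrative stand-ins, not the actual Arrow class), showing why a minimum-argument count plus a varargs flag is enough to validate a call:

```python
class Arity:
    """Hypothetical Python analogue of Arrow's FunctionArity: the number
    of required arguments, optionally variadic."""

    def __init__(self, num_args, is_varargs=False):
        self.num_args = num_args
        self.is_varargs = is_varargs

    @classmethod
    def unary(cls):
        return cls(1)

    @classmethod
    def varargs(cls, min_args=1):
        # At least one argument spec is needed for type checking,
        # hence a default of 1 rather than 0 (per the discussion above).
        return cls(min_args, is_varargs=True)

    def accepts(self, n):
        # Varargs functions accept any count >= the minimum;
        # fixed-arity functions require an exact match.
        return n >= self.num_args if self.is_varargs else n == self.num_args

print(Arity.unary().accepts(1))              # True
print(Arity.varargs(min_args=2).accepts(5))  # True
```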
I got sucked into some other things today; I'll focus on getting the CI passing tomorrow so we can hopefully go into the weekend with a green build.
… bindings for FilterOptions for Python, support to call_function
Force-pushed from 1f9b758 to 877618f.
@jorisvandenbossche I fixed the segfault you hit and have changed the implementation. I also rebased -- I am going to merge this when the build is passing so we do not continue to stack changes that need to be reviewed on this PR. I welcome further code reviews on this PR or follow-up JIRAs.
+1
The PR has been merged, but the code review feature is still available. Please continue to leave comments and I will address them.
ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

This patch is a major reworking of our development strategy for implementing array-valued functions and applying them in a query processing setting.

The design was partly inspired by my previous work designing Ibis (https://github.com/ibis-project/ibis -- the "expr" subsystem and the way that operators validate input types and resolve output types). Using only function names and input types, you can determine the output type of each function and resolve the "execute" function that performs a unit of work processing a batch of data. This will allow us to build deferred column expressions and then (eventually) do parallel execution.

There are a ton of details, but one nice thing is that there is now a single API entry point for invoking any function by its name:

```c++
Result<Datum> CallFunction(ExecContext* ctx, const std::string& func_name,
                           const std::vector<Datum>& args,
                           const FunctionOptions* options = NULLPTR);
```

What occurs when you do this:

* A `Function` instance is looked up in the global `FunctionRegistry`
* Given the descriptors of `args` (their types and shapes -- array or scalar), the Function searches for a `Kernel` that is able to process those types and shapes. A kernel might be able to do `array[T0], array[T1]` or only `scalar[T0], scalar[T1]`, for example. This permits kernel specialization to treat different type and shape combinations
* The kernel is executed iteratively against `args` based on what `args` contains -- if there are ChunkedArrays, they will be split into contiguous pieces. Kernels never see ChunkedArray, only Array or Scalar
* The Executor implementation is able to split contiguous Array inputs into smaller chunks, which is important for parallel execution. See `ExecContext::set_exec_chunksize`

To summarize: the REGISTRY contains FUNCTIONS. A FUNCTION contains KERNELS. A KERNEL is a specific implementation of a function that services a particular type combination.
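The registry → function → kernel dispatch summarized above can be illustrated with a toy Python sketch. All names here (`Kernel`, `Function`, `call_function`, and the string type tags) are hypothetical stand-ins for exposition, not the Arrow C++ or pyarrow APIs:

```python
# Toy model of the dispatch pipeline: the REGISTRY contains FUNCTIONS,
# a FUNCTION contains KERNELS, and a KERNEL services one type combination.

class Kernel:
    def __init__(self, in_types, exec_fn):
        self.in_types = in_types  # e.g. ("int64", "int64")
        self.exec_fn = exec_fn

class Function:
    def __init__(self, name):
        self.name = name
        self.kernels = []

    def add_kernel(self, in_types, exec_fn):
        self.kernels.append(Kernel(tuple(in_types), exec_fn))

    def dispatch(self, arg_types):
        # Pick the kernel whose input-type signature matches the arguments.
        for k in self.kernels:
            if k.in_types == tuple(arg_types):
                return k
        raise TypeError(f"no kernel for {self.name}{tuple(arg_types)}")

registry = {}  # the global function registry

def register(fn):
    registry[fn.name] = fn

def call_function(name, args, arg_types):
    # Single entry point: look up the function, dispatch on input types,
    # then execute the selected kernel.
    return registry[name].dispatch(arg_types).exec_fn(*args)

add = Function("add")
add.add_kernel(("int64", "int64"),
               lambda x, y: [a + b for a, b in zip(x, y)])
register(add)

print(call_function("add", ([1, 2, 3], [10, 20, 30]), ("int64", "int64")))
# [11, 22, 33]
```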
An additional effort in this patch is to radically simplify the process of creating kernels that are based on a scalar function. To do this, there is a growing collection of template-based kernel generation classes in compute/kernels/codegen_internal.h that will surely be the topic of much debate. I want to make it a lot easier for people to add new kernels.

There are some other incidental changes in the PR, such as changing the convenience APIs like `Cast` to return `Result`. I'm afraid we may have to live with the API breakage unless someone else wants to add backward compatibility code for the old APIs.

I have to apologize for making such a large PR. I've been working long hours on this for nearly a month, and the process of porting all of our existing functionality and making the unit tests pass caused much iteration in the "framework" part of the code, such that it would have been a huge time drain to review incomplete iterations of the framework that had not been proven to capture the functionality that previously existed in the project.

Given the size of this PR and the fact that it completely blocks any work on src/arrow/compute, I don't think we should let this sit unmerged for more than 4 or 5 days, tops. I'm committed to responding to all of your questions and working to address your feedback about the design and improving the documentation and code comments. I tried to leave copious comments to explain my thought process in various places. Feel free to make any and all comments in this PR or in whatever form you like. I don't think that merging should be blocked on stylistic issues.

Closes apache#7240 from wesm/ARROW-8792-kernels-revamp

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
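The chunk-wise execution rule from the commit message (kernels only ever see contiguous pieces, and contiguous inputs may be re-split for parallelism, cf. `ExecContext::set_exec_chunksize`) can be sketched as follows. This is an illustrative Python model under assumed names, not Arrow code:

```python
# Model a ChunkedArray as a list of contiguous chunks (plain lists here).
# The executor walks the chunks so the kernel never sees the whole
# chunked input, only contiguous pieces.

def execute_chunked(kernel, chunked_input, exec_chunksize=None):
    out_chunks = []
    for chunk in chunked_input:
        if exec_chunksize is None:
            out_chunks.append(kernel(chunk))
        else:
            # Optionally split contiguous chunks into smaller pieces,
            # as set_exec_chunksize allows for parallel execution.
            for start in range(0, len(chunk), exec_chunksize):
                out_chunks.append(kernel(chunk[start:start + exec_chunksize]))
    return out_chunks

double = lambda xs: [2 * x for x in xs]
chunked = [[1, 2, 3, 4], [5, 6]]
print(execute_chunked(double, chunked, exec_chunksize=2))
# [[2, 4], [6, 8], [10, 12]]
```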
…nd outdated language

I also fixed the `Arity` ctor to be explicit, which I thought I'd done in #7240, but it is taken care of here.

Closes #7264 from wesm/ARROW-8926

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
I'll leave this PR open until sometime next week to allow more time for comments.
Closing the PR again. Please open JIRA issues or review other PRs to provide more feedback.
Summary:
* Deprecate SortToIndices()
* Add SortIndices() because we use "sort_indices" as the kernel name in
apache#7240
* Add support for sort indices in descending order on Array
* Rename existing "sort_indices" kernel to "array_sort_indices" and
introduce "sort_indices" meta function to support RecordBatch and
Table
Benchmark:
Summary:
* No performance regression in existing sort on array
* Same performance as sort on array when the number of sort keys is 1
* 1.5x-100x slower than sort on array when the number of sort keys is 2
Before:
----------------------------------------------------------------------------------
Benchmark Time CPU
----------------------------------------------------------------------------------
SortToIndicesInt64Count/32768/10000/min_time:1.000 15685 ns 15682 ns
SortToIndicesInt64Count/32768/100/min_time:1.000 15961 ns 15957 ns
SortToIndicesInt64Count/32768/10/min_time:1.000 16473 ns 16469 ns
SortToIndicesInt64Count/32768/2/min_time:1.000 27993 ns 27987 ns
SortToIndicesInt64Count/32768/1/min_time:1.000 5609 ns 5608 ns
SortToIndicesInt64Count/32768/0/min_time:1.000 13143 ns 13141 ns
SortToIndicesInt64Count/1048576/1/min_time:1.000 134695 ns 134670 ns
SortToIndicesInt64Count/8388608/1/min_time:1.000 1243992 ns 1243260 ns
SortToIndicesInt64Compare/32768/10000/min_time:1.000 193632 ns 193595 ns
SortToIndicesInt64Compare/32768/100/min_time:1.000 194876 ns 194837 ns
SortToIndicesInt64Compare/32768/10/min_time:1.000 182362 ns 182324 ns
SortToIndicesInt64Compare/32768/2/min_time:1.000 111607 ns 111584 ns
SortToIndicesInt64Compare/32768/1/min_time:1.000 5642 ns 5641 ns
SortToIndicesInt64Compare/32768/0/min_time:1.000 190302 ns 190268 ns
SortToIndicesInt64Compare/1048576/1/min_time:1.000 134743 ns 134718 ns
SortToIndicesInt64Compare/8388608/1/min_time:1.000 1261404 ns 1249362 ns
After:
-------------------------------------------------------------------------------------
Benchmark Time CPU
-------------------------------------------------------------------------------------
ArraySortIndicesInt64Count/32768/10000/min_time:1.000 14769 ns 14766 ns
ArraySortIndicesInt64Count/32768/100/min_time:1.000 15207 ns 15204 ns
ArraySortIndicesInt64Count/32768/10/min_time:1.000 15892 ns 15889 ns
ArraySortIndicesInt64Count/32768/2/min_time:1.000 30107 ns 30100 ns
ArraySortIndicesInt64Count/32768/1/min_time:1.000 5509 ns 5508 ns
ArraySortIndicesInt64Count/32768/0/min_time:1.000 12699 ns 12696 ns
ArraySortIndicesInt64Count/1048576/1/min_time:1.000 132585 ns 132557 ns
ArraySortIndicesInt64Count/8388608/1/min_time:1.000 1236596 ns 1235842 ns
ArraySortIndicesInt64Compare/32768/10000/min_time:1.000 193259 ns 193217 ns
ArraySortIndicesInt64Compare/32768/100/min_time:1.000 190010 ns 189973 ns
ArraySortIndicesInt64Compare/32768/10/min_time:1.000 179923 ns 179879 ns
ArraySortIndicesInt64Compare/32768/2/min_time:1.000 111100 ns 111074 ns
ArraySortIndicesInt64Compare/32768/1/min_time:1.000 5660 ns 5659 ns
ArraySortIndicesInt64Compare/32768/0/min_time:1.000 186521 ns 186476 ns
ArraySortIndicesInt64Compare/1048576/1/min_time:1.000 132863 ns 132832 ns
ArraySortIndicesInt64Compare/8388608/1/min_time:1.000 1266383 ns 1265765 ns
TableSortIndicesInt64Count/32768/10000/min_time:1.000 21812 ns 21807 ns
TableSortIndicesInt64Count/32768/100/min_time:1.000 22494 ns 22490 ns
TableSortIndicesInt64Count/32768/10/min_time:1.000 17300 ns 17296 ns
TableSortIndicesInt64Count/32768/2/min_time:1.000 29927 ns 29921 ns
TableSortIndicesInt64Count/32768/1/min_time:1.000 5877 ns 5875 ns
TableSortIndicesInt64Count/32768/0/min_time:1.000 20394 ns 20390 ns
TableSortIndicesInt64Count/1048576/1/min_time:1.000 132904 ns 132871 ns
TableSortIndicesInt64Count/8388608/1/min_time:1.000 1342693 ns 1341943 ns
TableSortIndicesInt64Compare/32768/10000/min_time:1.000 203163 ns 203106 ns
TableSortIndicesInt64Compare/32768/100/min_time:1.000 199531 ns 199477 ns
TableSortIndicesInt64Compare/32768/10/min_time:1.000 185968 ns 185916 ns
TableSortIndicesInt64Compare/32768/2/min_time:1.000 113571 ns 113540 ns
TableSortIndicesInt64Compare/32768/1/min_time:1.000 6251 ns 6249 ns
TableSortIndicesInt64Compare/32768/0/min_time:1.000 183650 ns 183609 ns
TableSortIndicesInt64Compare/1048576/1/min_time:1.000 131701 ns 131674 ns
TableSortIndicesInt64Compare/8388608/1/min_time:1.000 1264413 ns 1263622 ns
TableSortIndicesInt64Int64/32768/10000/min_time:1.000 313368 ns 313310 ns
TableSortIndicesInt64Int64/32768/100/min_time:1.000 313425 ns 313361 ns
TableSortIndicesInt64Int64/32768/10/min_time:1.000 337051 ns 336987 ns
TableSortIndicesInt64Int64/32768/2/min_time:1.000 402063 ns 401973 ns
TableSortIndicesInt64Int64/32768/1/min_time:1.000 254660 ns 254612 ns
TableSortIndicesInt64Int64/32768/0/min_time:1.000 305887 ns 305815 ns
TableSortIndicesInt64Int64/1048576/1/min_time:1.000 11157920 ns 11155020 ns
TableSortIndicesInt64Int64/8388608/1/min_time:1.000 106529839 ns 106501576 ns
Follow-up tasks:
* Improve performance when the number of sort keys is 2 or greater
* Idea 1: Use counting sort for small-range data like on array
* Idea 2: Use threading and merge sort
* Add support for multi-column partition Nth indices on Table
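The multi-key "sort_indices" semantics described in this commit message amount to a stable argsort over tuples of key values. A pure-Python illustration (not the Arrow kernel; the function name and signature here are only a sketch):

```python
def sort_indices(*sort_keys, descending=False):
    """Return the indices that would sort rows by the given key columns.

    Pure-Python sketch of a multi-key "sort_indices": each row is ordered
    by the tuple of its values across the key columns. Python's sorted()
    is stable, so ties on the first key are broken by later keys and,
    failing that, by original position.
    """
    n = len(sort_keys[0])
    return sorted(range(n),
                  key=lambda i: tuple(k[i] for k in sort_keys),
                  reverse=descending)

ages  = [33, 25, 33, 41]
names = ["b", "a", "a", "c"]
# The tie between rows 0 and 2 on age is broken by name.
print(sort_indices(ages, names))
# [1, 2, 0, 3]
```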
Summary:
* Deprecate SortToIndices()
* Add SortIndices() because we use "sort_indices" as the kernel name in
#7240
* Add support for sort indices in descending order on Array
* Rename existing "sort_indices" kernel to "array_sort_indices" and
introduce "sort_indices" meta function to support ChunkedArray,
RecordBatch and Table
* Require benchmark 1.5.2 or later
Benchmark:
Summary:
* No performance regression in existing sort on array
* Same performance as sort on array when the number of sort keys is 1
* 1.5x-100x slower than sort on array when the number of sort keys is 2
Before:
----------------------------------------------------------------------------------
Benchmark Time CPU
----------------------------------------------------------------------------------
SortToIndicesInt64Count/32768/10000/min_time:1.000 16057 ns 16054 ns
SortToIndicesInt64Count/32768/100/min_time:1.000 16592 ns 16589 ns
SortToIndicesInt64Count/32768/10/min_time:1.000 15979 ns 15975 ns
SortToIndicesInt64Count/32768/2/min_time:1.000 28379 ns 28372 ns
SortToIndicesInt64Count/32768/1/min_time:1.000 5767 ns 5766 ns
SortToIndicesInt64Count/32768/0/min_time:1.000 12560 ns 12557 ns
SortToIndicesInt64Count/1048576/100/min_time:1.000 729683 ns 729497 ns
SortToIndicesInt64Count/8388608/100/min_time:1.000 6696415 ns 6693711 ns
SortToIndicesInt64Compare/32768/10000/min_time:1.000 190488 ns 190442 ns
SortToIndicesInt64Compare/32768/100/min_time:1.000 190879 ns 190838 ns
SortToIndicesInt64Compare/32768/10/min_time:1.000 179156 ns 179115 ns
SortToIndicesInt64Compare/32768/2/min_time:1.000 109098 ns 109074 ns
SortToIndicesInt64Compare/32768/1/min_time:1.000 5682 ns 5681 ns
SortToIndicesInt64Compare/32768/0/min_time:1.000 185022 ns 184984 ns
SortToIndicesInt64Compare/1048576/100/min_time:1.000 9275270 ns 9273414 ns
SortToIndicesInt64Compare/8388608/100/min_time:1.000 93126006 ns 93095276 ns
After:
------------------------------------------------------------------------------------------
Benchmark Time CPU
------------------------------------------------------------------------------------------
ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000 15722 ns 15721 ns
ArraySortIndicesInt64Narrow/32768/100/min_time:1.000 16007 ns 16006 ns
ArraySortIndicesInt64Narrow/32768/10/min_time:1.000 15178 ns 15177 ns
ArraySortIndicesInt64Narrow/32768/2/min_time:1.000 29886 ns 29885 ns
ArraySortIndicesInt64Narrow/32768/1/min_time:1.000 5968 ns 5968 ns
ArraySortIndicesInt64Narrow/32768/0/min_time:1.000 12681 ns 12681 ns
ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000 813376 ns 813331 ns
ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000 6543874 ns 6543621 ns
ArraySortIndicesInt64Wide/32768/10000/min_time:1.000 189957 ns 189956 ns
ArraySortIndicesInt64Wide/32768/100/min_time:1.000 190269 ns 190267 ns
ArraySortIndicesInt64Wide/32768/10/min_time:1.000 175425 ns 175422 ns
ArraySortIndicesInt64Wide/32768/2/min_time:1.000 107976 ns 107975 ns
ArraySortIndicesInt64Wide/32768/1/min_time:1.000 5941 ns 5941 ns
ArraySortIndicesInt64Wide/32768/0/min_time:1.000 184858 ns 184857 ns
ArraySortIndicesInt64Wide/1048576/100/min_time:1.000 9355194 ns 9354942 ns
ArraySortIndicesInt64Wide/8388608/100/min_time:1.000 94101796 ns 94099551 ns
TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 1386231938 ns 1386176541 ns
TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000 1350031141 ns 1349982623 ns
TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000 1401741018 ns 1401632081 ns
TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000 1373174328 ns 1373145968 ns
TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000 1035377805 ns 1035362806 ns
TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000 1035218085 ns 1035201824 ns
TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000 6859319 ns 6859251 ns
TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000 6847314 ns 6847048 ns
TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000 626909516 ns 626904696 ns
TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000 591632035 ns 591602144 ns
TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000 613197155 ns 613182727 ns
TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000 595568690 ns 595554829 ns
TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000 424472984 ns 424453915 ns
TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000 397339109 ns 397335604 ns
TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000 7027310 ns 7027241 ns
TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000 6891364 ns 6891272 ns
TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000 516832749 ns 516823293 ns
TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000 495273237 ns 495269975 ns
TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000 515550360 ns 515531501 ns
TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000 496670125 ns 496663084 ns
TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000 340060863 ns 340053172 ns
TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000 331281499 ns 331277004 ns
TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000 6691062 ns 6690924 ns
TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000 6384382 ns 6384323 ns
TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000 622544467 ns 622531557 ns
TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000 619193843 ns 619155597 ns
TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000 615885889 ns 615884764 ns
TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000 589917731 ns 589912005 ns
TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000 600951149 ns 600947256 ns
TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000 592304437 ns 592286953 ns
TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000 98781239 ns 98777123 ns
TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000 94136230 ns 94085863 ns
TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000 232323308 ns 232319347 ns
TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000 223708791 ns 223700834 ns
TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000 221975360 ns 221968035 ns
TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000 218214843 ns 218210306 ns
TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000 224756430 ns 224745751 ns
TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000 220809969 ns 220809044 ns
TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000 96159427 ns 96156899 ns
TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000 92307749 ns 92304526 ns
TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000 166390841 ns 166382427 ns
TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000 164576492 ns 164570581 ns
TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000 165724919 ns 165718584 ns
TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000 164048003 ns 164046222 ns
TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000 170131788 ns 170124950 ns
TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000 170874129 ns 170871245 ns
TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000 92314674 ns 92312953 ns
TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000 92023019 ns 92022643 ns
Follow-up tasks:
* Improve performance when the number of sort keys is 2 or greater
* Idea 1: Use counting sort for small-range data like on array
* Idea 2: Use threading for radix sort
* Add support for multi-column partition Nth indices on Table
Closes #8612 from kou/cpp-compute-sort-indices-table
Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>