ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework #7240
Conversation
For code reviewers:
I am aware that there are some obviously messy things that will need to be cleaned up a bit, but we have to start somewhere.
Force-pushed from d1dca6d to eb7f776.
Aside from fixing the builds, I'm going to do a few things to help with this:
wesm
left a comment
leaving a handful of cleaning TODOs for myself
pitrou
left a comment
Some questions that came up while reading through the API.
cpp/src/arrow/compute/options.h
Outdated
It's weird to have this as part of the cast options.
Yes, I'm aware, but I wasn't able to figure out another way to communicate the output type to the standard kernel APIs.
The R failure: I'm happy to help debug, but that sounds pretty straightforward.
@nealrichardson yeah, I will fix (I wasn't thinking about timezones when adding the dispatch rules for comparisons).
Can I start working on this branch?
@kou yes, please go ahead. I'm sure you'll be careful about rebasing, since I'm going to push some commits to the branch later today, probably in response to the comments above.
OK!
Force-pushed from eb7f776 to b3286d3.
I rebased without pushing any new changes.
CC @brills -- I'm not sure if this impacts TFX at all.
@emkornfield @brills no Python APIs are changed (not sure how much of the C++ API TFX uses).
cpp/src/arrow/compute/function.h
Outdated
```c++
/// \brief Contains the number of required arguments for the function
struct ARROW_EXPORT FunctionArity {
  static FunctionArity Nullary() { return FunctionArity(0, false); }
```
nit: Void or Null for name?
cpp/src/arrow/compute/function.h
Outdated
```c++
  static FunctionArity Unary() { return FunctionArity(1, false); }
  static FunctionArity Binary() { return FunctionArity(2, false); }
  static FunctionArity Ternary() { return FunctionArity(3, false); }
  static FunctionArity Varargs(int min_args = 1) { return FunctionArity(min_args, true); }
```
nit: VarArgs
It also seems like it might be more readable for callers to always have to specify min_args (especially because it isn't clear to me why 1 is the default and not 0 or 2).
You need at least one argument spec for type checking.
I'm changing the name, but will leave the default-value issue for follow-up work.
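The arity scheme being discussed can be sketched outside C++. Below is a hypothetical Python analogue of `FunctionArity` (class and method names here are illustrative stand-ins, not the actual Arrow class), showing why a minimum-argument count plus a varargs flag is enough to validate a call:

```python
class Arity:
    """Hypothetical Python analogue of Arrow's FunctionArity: the number
    of required arguments, optionally variadic."""

    def __init__(self, num_args, is_varargs=False):
        self.num_args = num_args
        self.is_varargs = is_varargs

    @classmethod
    def unary(cls):
        return cls(1)

    @classmethod
    def varargs(cls, min_args=1):
        # At least one argument spec is needed for type checking,
        # hence a default of 1 rather than 0 (per the discussion above).
        return cls(min_args, is_varargs=True)

    def accepts(self, n):
        # Varargs functions accept any count >= the minimum;
        # fixed-arity functions require an exact match.
        return n >= self.num_args if self.is_varargs else n == self.num_args

print(Arity.unary().accepts(1))              # True
print(Arity.varargs(min_args=2).accepts(5))  # True
```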
I got sucked into some other things today; I'll focus on getting the CI passing tomorrow so we can hopefully go into the weekend with a green build.
… bindings for FilterOptions for Python, support to call_function
Force-pushed from 1f9b758 to 877618f.
@jorisvandenbossche I fixed the segfault you hit and have changed the implementation. I also rebased -- I am going to merge this when the build is passing so we do not continue to stack changes that need to be reviewed on this PR. I welcome further code reviews on this PR or follow-up JIRAs.
+1
The PR has been merged, but the code review feature is still available. Please continue to leave comments and I will address them.
ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework

This patch is a major reworking of our development strategy for implementing array-valued functions and applying them in a query processing setting.

The design was partly inspired by my previous work designing Ibis (https://github.com/ibis-project/ibis -- the "expr" subsystem and the way that operators validate input types and resolve output types). Using only function names and input types, you can determine the output type of each function and resolve the "execute" function that performs a unit of work processing a batch of data. This will allow us to build deferred column expressions and then (eventually) do parallel execution.

There are a ton of details, but one nice thing is that there is now a single API entry point for invoking any function by its name:

```c++
Result<Datum> CallFunction(ExecContext* ctx, const std::string& func_name,
                           const std::vector<Datum>& args,
                           const FunctionOptions* options = NULLPTR);
```

What occurs when you do this:

* A `Function` instance is looked up in the global `FunctionRegistry`
* Given the descriptors of `args` (their types and shapes -- array or scalar), the Function searches for a `Kernel` that is able to process those types and shapes. A kernel might be able to do `array[T0], array[T1]` or only `scalar[T0], scalar[T1]`, for example. This permits kernel specialization to treat different type and shape combinations
* The kernel is executed iteratively against `args` based on what `args` contains -- if there are ChunkedArrays, they will be split into contiguous pieces. Kernels never see ChunkedArray, only Array or Scalar
* The Executor implementation is able to split contiguous Array inputs into smaller chunks, which is important for parallel execution. See `ExecContext::set_exec_chunksize`

To summarize: the REGISTRY contains FUNCTIONS. A FUNCTION contains KERNELS. A KERNEL is a specific implementation of a function that services a particular type combination.
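The registry → function → kernel dispatch summarized above can be illustrated with a toy Python sketch. All names here (`Kernel`, `Function`, `call_function`, and the string type tags) are hypothetical stand-ins for exposition, not the Arrow C++ or pyarrow APIs:

```python
# Toy model of the dispatch pipeline: the REGISTRY contains FUNCTIONS,
# a FUNCTION contains KERNELS, and a KERNEL services one type combination.

class Kernel:
    def __init__(self, in_types, exec_fn):
        self.in_types = in_types  # e.g. ("int64", "int64")
        self.exec_fn = exec_fn

class Function:
    def __init__(self, name):
        self.name = name
        self.kernels = []

    def add_kernel(self, in_types, exec_fn):
        self.kernels.append(Kernel(tuple(in_types), exec_fn))

    def dispatch(self, arg_types):
        # Pick the kernel whose input-type signature matches the arguments.
        for k in self.kernels:
            if k.in_types == tuple(arg_types):
                return k
        raise TypeError(f"no kernel for {self.name}{tuple(arg_types)}")

registry = {}  # the global function registry

def register(fn):
    registry[fn.name] = fn

def call_function(name, args, arg_types):
    # Single entry point: look up the function, dispatch on input types,
    # then execute the selected kernel.
    return registry[name].dispatch(arg_types).exec_fn(*args)

add = Function("add")
add.add_kernel(("int64", "int64"),
               lambda x, y: [a + b for a, b in zip(x, y)])
register(add)

print(call_function("add", ([1, 2, 3], [10, 20, 30]), ("int64", "int64")))
# [11, 22, 33]
```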
An additional effort in this patch is to radically simplify the process of creating kernels that are based on a scalar function. To do this, there is a growing collection of template-based kernel generation classes in compute/kernels/codegen_internal.h that will surely be the topic of much debate. I want to make it a lot easier for people to add new kernels.

There are some other incidental changes in the PR, such as changing the convenience APIs like `Cast` to return `Result`. I'm afraid we may have to live with the API breakage unless someone else wants to add backward compatibility code for the old APIs.

I have to apologize for making such a large PR. I've been working long hours on this for nearly a month, and the process of porting all of our existing functionality and making the unit tests pass caused much iteration in the "framework" part of the code, such that it would have been a huge time drain to review incomplete iterations of the framework that had not been proven to capture the functionality that previously existed in the project.

Given the size of this PR and the fact that it completely blocks any work on src/arrow/compute, I don't think we should let this sit unmerged for more than 4 or 5 days, tops. I'm committed to responding to all of your questions and working to address your feedback about the design and improving the documentation and code comments. I tried to leave copious comments to explain my thought process in various places. Feel free to make any and all comments in this PR or in whatever form you like. I don't think that merging should be blocked on stylistic issues.

Closes apache#7240 from wesm/ARROW-8792-kernels-revamp

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
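The chunk-wise execution rule from the commit message (kernels only ever see contiguous pieces, and contiguous inputs may be re-split for parallelism, cf. `ExecContext::set_exec_chunksize`) can be sketched as follows. This is an illustrative Python model under assumed names, not Arrow code:

```python
# Model a ChunkedArray as a list of contiguous chunks (plain lists here).
# The executor walks the chunks so the kernel never sees the whole
# chunked input, only contiguous pieces.

def execute_chunked(kernel, chunked_input, exec_chunksize=None):
    out_chunks = []
    for chunk in chunked_input:
        if exec_chunksize is None:
            out_chunks.append(kernel(chunk))
        else:
            # Optionally split contiguous chunks into smaller pieces,
            # as set_exec_chunksize allows for parallel execution.
            for start in range(0, len(chunk), exec_chunksize):
                out_chunks.append(kernel(chunk[start:start + exec_chunksize]))
    return out_chunks

double = lambda xs: [2 * x for x in xs]
chunked = [[1, 2, 3, 4], [5, 6]]
print(execute_chunked(double, chunked, exec_chunksize=2))
# [[2, 4], [6, 8], [10, 12]]
```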
…nd outdated language

I also fixed the `Arity` ctor to be explicit, which I thought I'd done in #7240, but it is taken care of here.

Closes #7264 from wesm/ARROW-8926

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
I'll leave this PR open until sometime next week to allow more time for comments.
Closing the PR again. Please open JIRA issues or review other PRs to provide more feedback.
Summary:
* Deprecate SortToIndices()
* Add SortIndices() because we use "sort_indices" as the kernel name in
apache#7240
* Add support for sort indices in descending order on Array
* Rename existing "sort_indices" kernel to "array_sort_indices" and
introduce "sort_indices" meta function to support RecordBatch and
Table
Benchmark:
Summary:
* No performance regression in existing sort on array
* Same performance as sort on array when the number of sort keys is 1
* 1.5x-100x slower than sort on array when the number of sort keys is 2
Before:
----------------------------------------------------------------------------------
Benchmark Time CPU
----------------------------------------------------------------------------------
SortToIndicesInt64Count/32768/10000/min_time:1.000 15685 ns 15682 ns
SortToIndicesInt64Count/32768/100/min_time:1.000 15961 ns 15957 ns
SortToIndicesInt64Count/32768/10/min_time:1.000 16473 ns 16469 ns
SortToIndicesInt64Count/32768/2/min_time:1.000 27993 ns 27987 ns
SortToIndicesInt64Count/32768/1/min_time:1.000 5609 ns 5608 ns
SortToIndicesInt64Count/32768/0/min_time:1.000 13143 ns 13141 ns
SortToIndicesInt64Count/1048576/1/min_time:1.000 134695 ns 134670 ns
SortToIndicesInt64Count/8388608/1/min_time:1.000 1243992 ns 1243260 ns
SortToIndicesInt64Compare/32768/10000/min_time:1.000 193632 ns 193595 ns
SortToIndicesInt64Compare/32768/100/min_time:1.000 194876 ns 194837 ns
SortToIndicesInt64Compare/32768/10/min_time:1.000 182362 ns 182324 ns
SortToIndicesInt64Compare/32768/2/min_time:1.000 111607 ns 111584 ns
SortToIndicesInt64Compare/32768/1/min_time:1.000 5642 ns 5641 ns
SortToIndicesInt64Compare/32768/0/min_time:1.000 190302 ns 190268 ns
SortToIndicesInt64Compare/1048576/1/min_time:1.000 134743 ns 134718 ns
SortToIndicesInt64Compare/8388608/1/min_time:1.000 1261404 ns 1249362 ns
After:
-------------------------------------------------------------------------------------
Benchmark Time CPU
-------------------------------------------------------------------------------------
ArraySortIndicesInt64Count/32768/10000/min_time:1.000 14769 ns 14766 ns
ArraySortIndicesInt64Count/32768/100/min_time:1.000 15207 ns 15204 ns
ArraySortIndicesInt64Count/32768/10/min_time:1.000 15892 ns 15889 ns
ArraySortIndicesInt64Count/32768/2/min_time:1.000 30107 ns 30100 ns
ArraySortIndicesInt64Count/32768/1/min_time:1.000 5509 ns 5508 ns
ArraySortIndicesInt64Count/32768/0/min_time:1.000 12699 ns 12696 ns
ArraySortIndicesInt64Count/1048576/1/min_time:1.000 132585 ns 132557 ns
ArraySortIndicesInt64Count/8388608/1/min_time:1.000 1236596 ns 1235842 ns
ArraySortIndicesInt64Compare/32768/10000/min_time:1.000 193259 ns 193217 ns
ArraySortIndicesInt64Compare/32768/100/min_time:1.000 190010 ns 189973 ns
ArraySortIndicesInt64Compare/32768/10/min_time:1.000 179923 ns 179879 ns
ArraySortIndicesInt64Compare/32768/2/min_time:1.000 111100 ns 111074 ns
ArraySortIndicesInt64Compare/32768/1/min_time:1.000 5660 ns 5659 ns
ArraySortIndicesInt64Compare/32768/0/min_time:1.000 186521 ns 186476 ns
ArraySortIndicesInt64Compare/1048576/1/min_time:1.000 132863 ns 132832 ns
ArraySortIndicesInt64Compare/8388608/1/min_time:1.000 1266383 ns 1265765 ns
TableSortIndicesInt64Count/32768/10000/min_time:1.000 21812 ns 21807 ns
TableSortIndicesInt64Count/32768/100/min_time:1.000 22494 ns 22490 ns
TableSortIndicesInt64Count/32768/10/min_time:1.000 17300 ns 17296 ns
TableSortIndicesInt64Count/32768/2/min_time:1.000 29927 ns 29921 ns
TableSortIndicesInt64Count/32768/1/min_time:1.000 5877 ns 5875 ns
TableSortIndicesInt64Count/32768/0/min_time:1.000 20394 ns 20390 ns
TableSortIndicesInt64Count/1048576/1/min_time:1.000 132904 ns 132871 ns
TableSortIndicesInt64Count/8388608/1/min_time:1.000 1342693 ns 1341943 ns
TableSortIndicesInt64Compare/32768/10000/min_time:1.000 203163 ns 203106 ns
TableSortIndicesInt64Compare/32768/100/min_time:1.000 199531 ns 199477 ns
TableSortIndicesInt64Compare/32768/10/min_time:1.000 185968 ns 185916 ns
TableSortIndicesInt64Compare/32768/2/min_time:1.000 113571 ns 113540 ns
TableSortIndicesInt64Compare/32768/1/min_time:1.000 6251 ns 6249 ns
TableSortIndicesInt64Compare/32768/0/min_time:1.000 183650 ns 183609 ns
TableSortIndicesInt64Compare/1048576/1/min_time:1.000 131701 ns 131674 ns
TableSortIndicesInt64Compare/8388608/1/min_time:1.000 1264413 ns 1263622 ns
TableSortIndicesInt64Int64/32768/10000/min_time:1.000 313368 ns 313310 ns
TableSortIndicesInt64Int64/32768/100/min_time:1.000 313425 ns 313361 ns
TableSortIndicesInt64Int64/32768/10/min_time:1.000 337051 ns 336987 ns
TableSortIndicesInt64Int64/32768/2/min_time:1.000 402063 ns 401973 ns
TableSortIndicesInt64Int64/32768/1/min_time:1.000 254660 ns 254612 ns
TableSortIndicesInt64Int64/32768/0/min_time:1.000 305887 ns 305815 ns
TableSortIndicesInt64Int64/1048576/1/min_time:1.000 11157920 ns 11155020 ns
TableSortIndicesInt64Int64/8388608/1/min_time:1.000 106529839 ns 106501576 ns
Follow-up tasks:
* Improve performance when the number of sort keys is 2 or greater
* Idea 1: Use counting sort for small-range data like on array
* Idea 2: Use threading and merge sort
* Add support for multi-column partition Nth indices on Table
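The multi-key "sort_indices" semantics described in this commit message amount to a stable argsort over tuples of key values. A pure-Python illustration (not the Arrow kernel; the function name and signature here are only a sketch):

```python
def sort_indices(*sort_keys, descending=False):
    """Return the indices that would sort rows by the given key columns.

    Pure-Python sketch of a multi-key "sort_indices": each row is ordered
    by the tuple of its values across the key columns. Python's sorted()
    is stable, so ties on the first key are broken by later keys and,
    failing that, by original position.
    """
    n = len(sort_keys[0])
    return sorted(range(n),
                  key=lambda i: tuple(k[i] for k in sort_keys),
                  reverse=descending)

ages  = [33, 25, 33, 41]
names = ["b", "a", "a", "c"]
# The tie between rows 0 and 2 on age is broken by name.
print(sort_indices(ages, names))
# [1, 2, 0, 3]
```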
Summary:
* Deprecate SortToIndices()
* Add SortIndices() because we use "sort_indices" as the kernel name in
#7240
* Add support for sort indices in descending order on Array
* Rename existing "sort_indices" kernel to "array_sort_indices" and
introduce "sort_indices" meta function to support ChunkedArray,
RecordBatch and Table
* Require benchmark 1.5.2 or later
Benchmark:
Summary:
* No performance regression in existing sort on array
* Same performance as sort on array when the number of sort keys is 1
* 1.5x-100x slower than sort on array when the number of sort keys is 2
Before:
----------------------------------------------------------------------------------
Benchmark Time CPU
----------------------------------------------------------------------------------
SortToIndicesInt64Count/32768/10000/min_time:1.000 16057 ns 16054 ns
SortToIndicesInt64Count/32768/100/min_time:1.000 16592 ns 16589 ns
SortToIndicesInt64Count/32768/10/min_time:1.000 15979 ns 15975 ns
SortToIndicesInt64Count/32768/2/min_time:1.000 28379 ns 28372 ns
SortToIndicesInt64Count/32768/1/min_time:1.000 5767 ns 5766 ns
SortToIndicesInt64Count/32768/0/min_time:1.000 12560 ns 12557 ns
SortToIndicesInt64Count/1048576/100/min_time:1.000 729683 ns 729497 ns
SortToIndicesInt64Count/8388608/100/min_time:1.000 6696415 ns 6693711 ns
SortToIndicesInt64Compare/32768/10000/min_time:1.000 190488 ns 190442 ns
SortToIndicesInt64Compare/32768/100/min_time:1.000 190879 ns 190838 ns
SortToIndicesInt64Compare/32768/10/min_time:1.000 179156 ns 179115 ns
SortToIndicesInt64Compare/32768/2/min_time:1.000 109098 ns 109074 ns
SortToIndicesInt64Compare/32768/1/min_time:1.000 5682 ns 5681 ns
SortToIndicesInt64Compare/32768/0/min_time:1.000 185022 ns 184984 ns
SortToIndicesInt64Compare/1048576/100/min_time:1.000 9275270 ns 9273414 ns
SortToIndicesInt64Compare/8388608/100/min_time:1.000 93126006 ns 93095276 ns
After:
------------------------------------------------------------------------------------------
Benchmark Time CPU
------------------------------------------------------------------------------------------
ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000 15722 ns 15721 ns
ArraySortIndicesInt64Narrow/32768/100/min_time:1.000 16007 ns 16006 ns
ArraySortIndicesInt64Narrow/32768/10/min_time:1.000 15178 ns 15177 ns
ArraySortIndicesInt64Narrow/32768/2/min_time:1.000 29886 ns 29885 ns
ArraySortIndicesInt64Narrow/32768/1/min_time:1.000 5968 ns 5968 ns
ArraySortIndicesInt64Narrow/32768/0/min_time:1.000 12681 ns 12681 ns
ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000 813376 ns 813331 ns
ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000 6543874 ns 6543621 ns
ArraySortIndicesInt64Wide/32768/10000/min_time:1.000 189957 ns 189956 ns
ArraySortIndicesInt64Wide/32768/100/min_time:1.000 190269 ns 190267 ns
ArraySortIndicesInt64Wide/32768/10/min_time:1.000 175425 ns 175422 ns
ArraySortIndicesInt64Wide/32768/2/min_time:1.000 107976 ns 107975 ns
ArraySortIndicesInt64Wide/32768/1/min_time:1.000 5941 ns 5941 ns
ArraySortIndicesInt64Wide/32768/0/min_time:1.000 184858 ns 184857 ns
ArraySortIndicesInt64Wide/1048576/100/min_time:1.000 9355194 ns 9354942 ns
ArraySortIndicesInt64Wide/8388608/100/min_time:1.000 94101796 ns 94099551 ns
TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 1386231938 ns 1386176541 ns
TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000 1350031141 ns 1349982623 ns
TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000 1401741018 ns 1401632081 ns
TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000 1373174328 ns 1373145968 ns
TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000 1035377805 ns 1035362806 ns
TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000 1035218085 ns 1035201824 ns
TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000 6859319 ns 6859251 ns
TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000 6847314 ns 6847048 ns
TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000 626909516 ns 626904696 ns
TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000 591632035 ns 591602144 ns
TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000 613197155 ns 613182727 ns
TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000 595568690 ns 595554829 ns
TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000 424472984 ns 424453915 ns
TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000 397339109 ns 397335604 ns
TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000 7027310 ns 7027241 ns
TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000 6891364 ns 6891272 ns
TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000 516832749 ns 516823293 ns
TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000 495273237 ns 495269975 ns
TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000 515550360 ns 515531501 ns
TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000 496670125 ns 496663084 ns
TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000 340060863 ns 340053172 ns
TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000 331281499 ns 331277004 ns
TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000 6691062 ns 6690924 ns
TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000 6384382 ns 6384323 ns
TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000 622544467 ns 622531557 ns
TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000 619193843 ns 619155597 ns
TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000 615885889 ns 615884764 ns
TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000 589917731 ns 589912005 ns
TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000 600951149 ns 600947256 ns
TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000 592304437 ns 592286953 ns
TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000 98781239 ns 98777123 ns
TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000 94136230 ns 94085863 ns
TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000 232323308 ns 232319347 ns
TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000 223708791 ns 223700834 ns
TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000 221975360 ns 221968035 ns
TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000 218214843 ns 218210306 ns
TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000 224756430 ns 224745751 ns
TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000 220809969 ns 220809044 ns
TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000 96159427 ns 96156899 ns
TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000 92307749 ns 92304526 ns
TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000 166390841 ns 166382427 ns
TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000 164576492 ns 164570581 ns
TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000 165724919 ns 165718584 ns
TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000 164048003 ns 164046222 ns
TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000 170131788 ns 170124950 ns
TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000 170874129 ns 170871245 ns
TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000 92314674 ns 92312953 ns
TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000 92023019 ns 92022643 ns
Follow-up tasks:
* Improve performance when the number of sort keys is 2 or greater
* Idea 1: Use counting sort for small-range data like on array
* Idea 2: Use threading for radix sort
* Add support for multi-column partition Nth indices on Table
Closes #8612 from kou/cpp-compute-sort-indices-table
Lead-authored-by: Sutou Kouhei <kou@clear-code.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>