@cyb70289 (Contributor) commented Apr 19, 2021

Arrow handles errors by returning Status/Result. But in compute kernels,
errors are instead stored in KernelContext.status. This is inconsistent,
and updating KernelContext.status is not thread-safe.

This patch removes KernelContext.status and returns kernel errors as
Status/Result.
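To make the contrast concrete, here is a simplified sketch of the two error-reporting styles. The `Status` class, `OldKernelContext`, `OldDivideKernel` and `NewDivideKernel` below are hypothetical stand-ins, not Arrow's real classes or kernel signatures. In the old style a kernel writes its error into a mutable context field that the caller checks afterwards; in the new style the error is simply the return value, so each call owns its own Status.

```cpp
// Hypothetical, simplified sketch: Status, OldKernelContext and the two
// kernels are stand-ins, NOT Arrow's real classes or signatures.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Minimal Status: OK by default, or an error carrying a message.
class Status {
 public:
  Status() = default;
  static Status OK() { return Status(); }
  static Status Invalid(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_ = true;
  std::string msg_;
};

// Old style: errors go into a mutable field of a context object that the
// caller must inspect afterwards. If several threads shared one context,
// concurrent writes to `status` would be a data race.
struct OldKernelContext {
  Status status;
  void SetStatus(Status st) { status = std::move(st); }
};

void OldDivideKernel(OldKernelContext* ctx, const std::vector<uint32_t>& values,
                     uint32_t divisor, std::vector<uint32_t>* out) {
  assert(ctx != nullptr && out != nullptr);
  if (divisor == 0) {
    ctx->SetStatus(Status::Invalid("divide by zero"));
    return;
  }
  out->resize(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) (*out)[i] = values[i] / divisor;
}

// New style: the error is the return value, so every call owns its own
// Status and no shared state is touched when reporting a failure.
Status NewDivideKernel(const std::vector<uint32_t>& values, uint32_t divisor,
                       std::vector<uint32_t>* out) {
  assert(out != nullptr);
  if (divisor == 0) return Status::Invalid("divide by zero");
  out->resize(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) (*out)[i] = values[i] / divisor;
  return Status::OK();
}
```

With the returned-Status style, callers propagate failures with an early return (Arrow's `RETURN_NOT_OK` macro serves this purpose) rather than checking a context field after every call.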

Benchmarks show a big performance improvement for arithmetic kernels,
especially the checked versions (up to 4x).

Some filter kernels also drop by ~50%. I will investigate this further
as a follow-up task.

@cyb70289 marked this pull request as draft April 19, 2021 10:15
@cyb70289 (Contributor, Author) commented Apr 19, 2021

This turned out to be a much bigger change than I expected.
It is better to merge it after the 4.0 release.

Some hints that may help reviewers:

  • Start with kernel.h, exec.cc and function.cc.
  • Review codegen_internal.h for changes to the common kernel implementation.
  • Review scalar_arithmetic.cc to see how errors are now populated.
  • Most other changes are similar and mechanical.
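The scalar_arithmetic.cc hint concerns how checked ops report overflow. The following is a hedged sketch of the general idea, assuming each op receives a pointer to a Status owned by the calling loop. The names `AddCheckedOp`, `ArrayScalarExec` and this `Status` class are hypothetical. Overflow is detected here by widening to `int64_t` for portability, which is not necessarily how Arrow does it.

```cpp
// Hypothetical sketch: Status, AddCheckedOp and ArrayScalarExec are
// illustrative stand-ins, not Arrow's actual code.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Minimal Status: OK by default, or an error carrying a message.
class Status {
 public:
  Status() = default;
  static Status Invalid(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_ = true;
  std::string msg_;
};

// Checked addition for int32_t. Overflow is detected by widening to int64_t;
// on overflow the error is written to the caller-owned *st and a
// placeholder 0 is returned for that element.
struct AddCheckedOp {
  static int32_t Call(int32_t left, int32_t right, Status* st) {
    assert(st != nullptr);
    const int64_t wide = static_cast<int64_t>(left) + static_cast<int64_t>(right);
    if (wide > std::numeric_limits<int32_t>::max() ||
        wide < std::numeric_limits<int32_t>::min()) {
      *st = Status::Invalid("overflow");
      return 0;
    }
    return static_cast<int32_t>(wide);
  }
};

// Applies Op element-wise to (array element, scalar). The Status is a local
// variable owned by this call and is returned once after the loop, instead
// of being stored in a shared context object.
template <typename Op>
Status ArrayScalarExec(const std::vector<int32_t>& values, int32_t scalar,
                       std::vector<int32_t>* out) {
  assert(out != nullptr);
  Status st;
  out->resize(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    (*out)[i] = Op::Call(values[i], scalar, &st);
  }
  return st;
}
```

The loop keeps running after an overflow and simply reports the error at the end; an implementation could equally stop early, which is a design choice this sketch does not try to settle.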


@cyb70289 (Contributor, Author)

@github-actions crossbow submit -g nightly

@github-actions

Revision: b7229010b5413113a6274f7ad6d6d9ad900ff2e8

Submitted crossbow builds: ursacomputing/crossbow @ actions-336

Task  Status (CI service)
centos-7-amd64 Github Actions
centos-8-amd64 Github Actions
centos-8-arm64 TravisCI
conda-clean Azure
conda-linux-gcc-py36-arm64 Drone
conda-linux-gcc-py36-cpu-r36 Azure
conda-linux-gcc-py36-cuda Azure
conda-linux-gcc-py37-arm64 Drone
conda-linux-gcc-py37-cpu-r40 Azure
conda-linux-gcc-py37-cuda Azure
conda-linux-gcc-py38-arm64 Drone
conda-linux-gcc-py38-cpu Azure
conda-linux-gcc-py38-cuda Azure
conda-linux-gcc-py39-arm64 Drone
conda-linux-gcc-py39-cpu Azure
conda-linux-gcc-py39-cuda Azure
conda-osx-arm64-clang-py38 Azure
conda-osx-arm64-clang-py39 Azure
conda-osx-clang-py36-r36 Azure
conda-osx-clang-py37-r40 Azure
conda-osx-clang-py38 Azure
conda-osx-clang-py39 Azure
conda-win-vs2017-py36-r36 Azure
conda-win-vs2017-py37-r40 Azure
conda-win-vs2017-py38 Azure
conda-win-vs2017-py39 Azure
debian-bullseye-amd64 Github Actions
debian-bullseye-arm64 TravisCI
debian-buster-amd64 Github Actions
debian-buster-arm64 TravisCI
example-cpp-minimal-build-static Github Actions
example-cpp-minimal-build-static-system-dependency Github Actions
gandiva-jar-osx Github Actions
gandiva-jar-ubuntu Github Actions
homebrew-cpp Github Actions
homebrew-r-autobrew Github Actions
nuget Github Actions
python-sdist Github Actions
test-build-vcpkg-win Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Github Actions
test-conda-python-3.6 Github Actions
test-conda-python-3.6-pandas-0.23 Github Actions
test-conda-python-3.7 Github Actions
test-conda-python-3.7-dask-latest Github Actions
test-conda-python-3.7-hdfs-3.2 Github Actions
test-conda-python-3.7-kartothek-latest Github Actions
test-conda-python-3.7-kartothek-master Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-pandas-master Github Actions
test-conda-python-3.7-spark-branch-3.0 Github Actions
test-conda-python-3.7-turbodbc-latest Github Actions
test-conda-python-3.7-turbodbc-master Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-dask-master Github Actions
test-conda-python-3.8-hypothesis Github Actions
test-conda-python-3.8-jpype Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-pandas-nightly Github Actions
test-conda-python-3.8-spark-master Github Actions
test-conda-python-3.9 Github Actions
test-debian-10-cpp Github Actions
test-debian-10-go-1.15 Azure
test-debian-10-python-3 Azure
test-debian-c-glib Github Actions
test-debian-ruby Github Actions
test-fedora-33-cpp Github Actions
test-fedora-33-python-3 Azure
test-r-devdocs Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-minimal-build Azure
test-r-rhub-ubuntu-gcc-release Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-3.6-centos8 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-r-version-compatibility Github Actions
test-r-versions Github Actions
test-r-without-arrow Azure
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-18.04-python-3 Azure
test-ubuntu-18.04-r-sanitizer Azure
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-20.10-docs Azure
test-ubuntu-c-glib Github Actions
test-ubuntu-ruby Github Actions
ubuntu-bionic-amd64 Github Actions
ubuntu-bionic-arm64 TravisCI
ubuntu-focal-amd64 Github Actions
ubuntu-focal-arm64 TravisCI
ubuntu-groovy-amd64 Github Actions
ubuntu-groovy-arm64 TravisCI
wheel-manylinux2010-cp36-amd64 Github Actions
wheel-manylinux2010-cp37-amd64 Github Actions
wheel-manylinux2010-cp38-amd64 Github Actions
wheel-manylinux2010-cp39-amd64 Github Actions
wheel-manylinux2014-cp36-amd64 Github Actions
wheel-manylinux2014-cp36-arm64 TravisCI
wheel-manylinux2014-cp37-amd64 Github Actions
wheel-manylinux2014-cp37-arm64 TravisCI
wheel-manylinux2014-cp38-amd64 Github Actions
wheel-manylinux2014-cp38-arm64 TravisCI
wheel-manylinux2014-cp39-amd64 Github Actions
wheel-manylinux2014-cp39-arm64 TravisCI
wheel-osx-high-sierra-cp36 Github Actions
wheel-osx-high-sierra-cp37 Github Actions
wheel-osx-high-sierra-cp38 Github Actions
wheel-osx-high-sierra-cp39 Github Actions
wheel-osx-mavericks-cp36 Github Actions
wheel-osx-mavericks-cp37 Github Actions
wheel-osx-mavericks-cp38 Github Actions
wheel-osx-mavericks-cp39 Github Actions
wheel-windows-cp36 Github Actions
wheel-windows-cp37 Github Actions
wheel-windows-cp38 Github Actions
wheel-windows-cp39 Github Actions

@cyb70289 (Contributor, Author) commented Apr 20, 2021

Benchmark diff of all compute kernels (Skylake, clang-9). Tests with less than 10% deviation are removed.
Full log attached: bench.log

$ archery benchmark diff --suite-filter="arrow-compute.*" --cc=clang-9 --cxx=clang++-9

-------------------------------------------------------------------------------------------------------------
Non-regressions: (1022)
-------------------------------------------------------------------------------------------------------------
                                                  benchmark            baseline           contender  change %
// Big improvement for scalar arithmetic kernels
         ArrayScalarKernel<AddChecked, UInt8Type>/1048576/0     349.774 MiB/sec       1.747 GiB/sec   411.453
         ArrayScalarKernel<AddChecked, Int32Type>/1048576/0       1.376 GiB/sec       6.769 GiB/sec   391.759
        ArrayScalarKernel<AddChecked, UInt32Type>/1048576/0       1.394 GiB/sec       6.826 GiB/sec   389.795
   ArrayScalarKernel<SubtractChecked, UInt32Type>/1048576/0       1.398 GiB/sec       6.777 GiB/sec   384.917
    ArrayScalarKernel<SubtractChecked, Int32Type>/1048576/0       1.418 GiB/sec       6.772 GiB/sec   377.701
   ArrayScalarKernel<MultiplyChecked, UInt64Type>/1048576/0       2.007 GiB/sec       8.868 GiB/sec   341.724
     ArrayScalarKernel<SubtractChecked, Int8Type>/1048576/0     366.740 MiB/sec       1.505 GiB/sec   320.282
         ArrayScalarKernel<AddChecked, Int64Type>/1048576/0       2.661 GiB/sec      11.028 GiB/sec   314.509
        ArrayScalarKernel<AddChecked, UInt64Type>/1048576/0       2.664 GiB/sec      11.040 GiB/sec   314.463
   ArrayScalarKernel<SubtractChecked, UInt64Type>/1048576/0       2.664 GiB/sec      11.029 GiB/sec   314.009
    ArrayScalarKernel<MultiplyChecked, Int64Type>/1048576/0       2.664 GiB/sec      11.003 GiB/sec   313.000
    ArrayScalarKernel<SubtractChecked, Int64Type>/1048576/0       2.669 GiB/sec      10.977 GiB/sec   311.283
    ArrayScalarKernel<MultiplyChecked, UInt8Type>/1048576/0     337.150 MiB/sec       1.178 GiB/sec   257.666
     ArrayScalarKernel<MultiplyChecked, Int8Type>/1048576/0     337.496 MiB/sec       1.177 GiB/sec   257.252
   ArrayScalarKernel<SubtractChecked, UInt16Type>/1048576/0     700.270 MiB/sec       2.340 GiB/sec   242.181
    ArrayScalarKernel<SubtractChecked, Int16Type>/1048576/0     704.317 MiB/sec       2.341 GiB/sec   240.410
    ArrayScalarKernel<SubtractChecked, UInt8Type>/1048576/0     354.861 MiB/sec       1.179 GiB/sec   240.318
         ArrayScalarKernel<AddChecked, Int16Type>/1048576/0     708.315 MiB/sec       2.345 GiB/sec   238.971
    ArrayScalarKernel<MultiplyChecked, Int16Type>/1048576/0     709.473 MiB/sec       2.345 GiB/sec   238.434
    ArrayScalarKernel<MultiplyChecked, Int32Type>/1048576/0       1.378 GiB/sec       4.626 GiB/sec   235.769
        ArrayScalarKernel<AddChecked, UInt16Type>/1048576/0     716.639 MiB/sec       2.347 GiB/sec   235.301
          ArrayScalarKernel<AddChecked, Int8Type>/1048576/0     360.752 MiB/sec       1.178 GiB/sec   234.405
   ArrayScalarKernel<MultiplyChecked, UInt32Type>/1048576/0       1.394 GiB/sec       4.615 GiB/sec   231.130
       ArrayScalarKernel<AddChecked, UInt8Type>/1048576/100     282.095 MiB/sec     833.895 MiB/sec   195.608
 ArrayScalarKernel<SubtractChecked, UInt32Type>/1048576/100       1.257 GiB/sec       3.402 GiB/sec   170.611
      ArrayScalarKernel<AddChecked, UInt32Type>/1048576/100       1.250 GiB/sec       3.334 GiB/sec   166.715
 ArrayScalarKernel<MultiplyChecked, UInt64Type>/1048576/100       1.940 GiB/sec       5.131 GiB/sec   164.530
       ArrayScalarKernel<AddChecked, Int64Type>/1048576/100       2.490 GiB/sec       6.560 GiB/sec   163.506
  ArrayScalarKernel<SubtractChecked, Int32Type>/1048576/100       1.242 GiB/sec       3.265 GiB/sec   162.862
 ArrayScalarKernel<SubtractChecked, UInt64Type>/1048576/100       2.500 GiB/sec       6.570 GiB/sec   162.842
       ArrayScalarKernel<AddChecked, Int32Type>/1048576/100       1.300 GiB/sec       3.374 GiB/sec   159.518
      ArrayScalarKernel<AddChecked, UInt64Type>/1048576/100       2.487 GiB/sec       6.454 GiB/sec   159.469
      ArrayScalarKernel<AddChecked, UInt16Type>/1048576/100     613.857 MiB/sec       1.538 GiB/sec   156.520
  ArrayScalarKernel<MultiplyChecked, Int64Type>/1048576/100       2.487 GiB/sec       6.377 GiB/sec   156.476
  ArrayScalarKernel<SubtractChecked, Int64Type>/1048576/100       2.485 GiB/sec       6.348 GiB/sec   155.494
  ArrayScalarKernel<MultiplyChecked, Int32Type>/1048576/100       1.186 GiB/sec       3.005 GiB/sec   153.449
   ArrayScalarKernel<MultiplyChecked, UInt16Type>/1048576/0     719.088 MiB/sec       1.768 GiB/sec   151.795
  ArrayScalarKernel<SubtractChecked, Int16Type>/1048576/100     616.207 MiB/sec       1.504 GiB/sec   149.924
  ArrayScalarKernel<SubtractChecked, UInt8Type>/1048576/100     313.662 MiB/sec     781.336 MiB/sec   149.101
 ArrayScalarKernel<SubtractChecked, UInt16Type>/1048576/100     634.606 MiB/sec       1.535 GiB/sec   147.738
       ArrayScalarKernel<AddChecked, Int16Type>/1048576/100     635.095 MiB/sec       1.503 GiB/sec   142.399
   ArrayScalarKernel<SubtractChecked, Int8Type>/1048576/100     326.252 MiB/sec     788.643 MiB/sec   141.728
  ArrayScalarKernel<MultiplyChecked, UInt8Type>/1048576/100     296.782 MiB/sec     712.569 MiB/sec   140.099
  ArrayScalarKernel<MultiplyChecked, Int16Type>/1048576/100     647.690 MiB/sec       1.499 GiB/sec   136.932
        ArrayScalarKernel<AddChecked, Int8Type>/1048576/100     327.534 MiB/sec     775.919 MiB/sec   136.897
              ArrayScalarKernel<Divide, Int8Type>/1048576/0     208.195 MiB/sec     487.420 MiB/sec   134.117
   ArrayScalarKernel<MultiplyChecked, Int8Type>/1048576/100     310.126 MiB/sec     718.038 MiB/sec   131.531
            ArrayScalarKernel<Divide, UInt32Type>/1048576/0     970.831 MiB/sec       2.174 GiB/sec   129.285
            ArrayScalarKernel<Divide, UInt16Type>/1048576/0     488.664 MiB/sec       1.087 GiB/sec   127.734
 ArrayScalarKernel<MultiplyChecked, UInt32Type>/1048576/100       1.161 GiB/sec       2.622 GiB/sec   125.901
             ArrayScalarKernel<Divide, Int32Type>/1048576/0     861.831 MiB/sec       1.891 GiB/sec   124.636
                            FilterRecordBatchNoNulls/100/10       2.638 GiB/sec       5.790 GiB/sec   119.457
 ArrayScalarKernel<MultiplyChecked, UInt16Type>/1048576/100     588.209 MiB/sec       1.160 GiB/sec   101.878
          ArrayScalarKernel<Divide, UInt32Type>/1048576/100     818.218 MiB/sec       1.570 GiB/sec    96.480
            ArrayScalarKernel<Divide, Int8Type>/1048576/100     197.748 MiB/sec     373.719 MiB/sec    88.987
             ArrayScalarKernel<Divide, Int16Type>/1048576/0     522.489 MiB/sec     973.035 MiB/sec    86.231
             ArrayScalarKernel<Divide, UInt8Type>/1048576/0     303.411 MiB/sec     561.742 MiB/sec    85.142
     ArrayScalarKernel<DivideChecked, UInt16Type>/1048576/0     611.330 MiB/sec       1.096 GiB/sec    83.519
           ArrayScalarKernel<Divide, Int32Type>/1048576/100     824.261 MiB/sec       1.474 GiB/sec    83.157
     ArrayScalarKernel<DivideChecked, UInt32Type>/1048576/0       1.149 GiB/sec       2.022 GiB/sec    75.941
          ArrayScalarKernel<Divide, UInt16Type>/1048576/100     438.420 MiB/sec     762.859 MiB/sec    74.002
           ArrayScalarKernel<Divide, UInt8Type>/1048576/100     242.966 MiB/sec     413.023 MiB/sec    69.992
           ArrayScalarKernel<Divide, Int16Type>/1048576/100     465.203 MiB/sec     757.562 MiB/sec    62.845
   ArrayScalarKernel<DivideChecked, UInt32Type>/1048576/100     948.407 MiB/sec       1.441 GiB/sec    55.582
      ArrayScalarKernel<DivideChecked, UInt8Type>/1048576/0     332.761 MiB/sec     514.054 MiB/sec    54.482
                             FilterRecordBatchNoNulls/100/4       2.512 GiB/sec       3.879 GiB/sec    54.422
                                MinMaxKernelInt16/1048576/1      39.612 GiB/sec      60.771 GiB/sec    53.417
                           FilterRecordBatchWithNulls/100/9       4.477 GiB/sec       6.698 GiB/sec    49.612
   ArrayScalarKernel<DivideChecked, UInt16Type>/1048576/100     521.495 MiB/sec     761.777 MiB/sec    46.075
            ArrayScalarKernel<Divide, UInt64Type>/1048576/0     937.700 MiB/sec       1.327 GiB/sec    44.893
    ArrayArrayKernel<MultiplyChecked, UInt32Type>/1048576/0     674.480 MiB/sec     958.659 MiB/sec    42.133
     ArrayArrayKernel<SubtractChecked, Int16Type>/1048576/0     338.890 MiB/sec     479.176 MiB/sec    41.396
          ArrayArrayKernel<AddChecked, Int16Type>/1048576/0     339.763 MiB/sec     479.439 MiB/sec    41.109
          ArrayArrayKernel<AddChecked, Int32Type>/1048576/0     676.750 MiB/sec     954.291 MiB/sec    41.011
     ArrayArrayKernel<SubtractChecked, Int32Type>/1048576/0     676.800 MiB/sec     954.080 MiB/sec    40.969
     ArrayArrayKernel<MultiplyChecked, Int32Type>/1048576/0     678.729 MiB/sec     954.374 MiB/sec    40.612
     ArrayArrayKernel<SubtractChecked, Int64Type>/1048576/0       1.322 GiB/sec       1.851 GiB/sec    40.030
          ArrayArrayKernel<AddChecked, Int64Type>/1048576/0       1.322 GiB/sec       1.851 GiB/sec    40.007
     ArrayArrayKernel<SubtractChecked, UInt8Type>/1048576/0     169.620 MiB/sec     236.246 MiB/sec    39.280
         ArrayArrayKernel<AddChecked, UInt16Type>/1048576/0     339.027 MiB/sec     471.381 MiB/sec    39.040
    ArrayArrayKernel<SubtractChecked, UInt32Type>/1048576/0     675.001 MiB/sec     938.000 MiB/sec    38.963
      ArrayArrayKernel<MultiplyChecked, Int8Type>/1048576/0     166.941 MiB/sec     231.713 MiB/sec    38.800
    ArrayArrayKernel<SubtractChecked, UInt16Type>/1048576/0     339.110 MiB/sec     470.670 MiB/sec    38.795
      ArrayArrayKernel<SubtractChecked, Int8Type>/1048576/0     169.938 MiB/sec     235.311 MiB/sec    38.469
          ArrayArrayKernel<AddChecked, UInt8Type>/1048576/0     170.228 MiB/sec     235.584 MiB/sec    38.393
         ArrayArrayKernel<AddChecked, UInt32Type>/1048576/0     678.413 MiB/sec     938.478 MiB/sec    38.334
     ArrayArrayKernel<MultiplyChecked, Int16Type>/1048576/0     340.105 MiB/sec     470.034 MiB/sec    38.203
    ArrayArrayKernel<MultiplyChecked, UInt16Type>/1048576/0     339.876 MiB/sec     469.177 MiB/sec    38.043
    ArrayArrayKernel<MultiplyChecked, UInt64Type>/1048576/0       1.301 GiB/sec       1.795 GiB/sec    37.992
    ArrayArrayKernel<SubtractChecked, UInt64Type>/1048576/0       1.320 GiB/sec       1.820 GiB/sec    37.871
         ArrayArrayKernel<AddChecked, UInt64Type>/1048576/0       1.321 GiB/sec       1.818 GiB/sec    37.610
     ArrayArrayKernel<MultiplyChecked, Int64Type>/1048576/0       1.318 GiB/sec       1.812 GiB/sec    37.487
           ArrayArrayKernel<AddChecked, Int8Type>/1048576/0     171.415 MiB/sec     234.567 MiB/sec    36.841
             ArrayArrayKernel<Divide, UInt16Type>/1048576/0     342.233 MiB/sec     467.884 MiB/sec    36.715
     ArrayArrayKernel<MultiplyChecked, UInt8Type>/1048576/0     170.257 MiB/sec     232.121 MiB/sec    36.336
          ArrayScalarKernel<Divide, UInt64Type>/1048576/100     855.306 MiB/sec       1.130 GiB/sec    35.229
             ArrayArrayKernel<Divide, UInt32Type>/1048576/0     693.181 MiB/sec     926.563 MiB/sec    33.668
              ArrayArrayKernel<Divide, Int16Type>/1048576/0     345.203 MiB/sec     454.299 MiB/sec    31.604
              ArrayArrayKernel<Divide, Int32Type>/1048576/0     686.166 MiB/sec     902.149 MiB/sec    31.477
              ArrayArrayKernel<Divide, UInt8Type>/1048576/0     178.432 MiB/sec     233.546 MiB/sec    30.888
               ArrayArrayKernel<Divide, Int8Type>/1048576/0     174.181 MiB/sec     227.417 MiB/sec    30.564
    ArrayScalarKernel<DivideChecked, UInt8Type>/1048576/100     294.850 MiB/sec     379.794 MiB/sec    28.810
                        FilterStringFilterNoNulls/1048576/3       3.229 GiB/sec       4.156 GiB/sec    28.713
              ArrayScalarKernel<Add, UInt8Type>/1048576/100       1.738 GiB/sec       2.221 GiB/sec    27.803
                ArrayScalarKernel<Add, UInt8Type>/1048576/0       1.750 GiB/sec       2.229 GiB/sec    27.325
      ArrayScalarKernel<DivideChecked, Int16Type>/1048576/0     488.198 MiB/sec     610.351 MiB/sec    25.021
             ArrayScalarKernel<Divide, Int64Type>/1048576/0     896.636 MiB/sec       1.092 GiB/sec    24.768
   ArrayScalarKernel<DivideChecked, UInt64Type>/1048576/100     906.612 MiB/sec       1.079 GiB/sec    21.887
     ArrayScalarKernel<DivideChecked, UInt64Type>/1048576/0    1008.802 MiB/sec       1.190 GiB/sec    20.807
           ArrayScalarKernel<Divide, Int64Type>/1048576/100     831.512 MiB/sec     997.169 MiB/sec    19.922
            ArrayArrayKernel<Divide, Int32Type>/1048576/100     702.645 MiB/sec     842.239 MiB/sec    19.867
                                              UniqueUInt8/4     186.220 MiB/sec     222.854 MiB/sec    19.672
                 TakeStringRandomIndicesWithNulls/1048576/0   22.605M items/sec   26.930M items/sec    19.135
                                              UniqueUInt8/3     390.698 MiB/sec     464.741 MiB/sec    18.952
                            FilterRecordBatchNoNulls/100/14       1.940 GiB/sec       2.277 GiB/sec    17.412
                            FilterRecordBatchNoNulls/100/11       1.942 GiB/sec       2.276 GiB/sec    17.215
                  NthToIndicesInt64/32768/10/min_time:1.000       1.491 GiB/sec       1.744 GiB/sec    16.962
           ArrayScalarKernel<Divide, FloatType>/1048576/100       4.005 GiB/sec       4.674 GiB/sec    16.705
            ArrayArrayKernel<Divide, Int16Type>/1048576/100     363.674 MiB/sec     422.799 MiB/sec    16.258
                             FilterRecordBatchNoNulls/50/14       2.677 GiB/sec       3.099 GiB/sec    15.755
                         FilterInt64FilterNoNulls/1048576/5       9.471 GiB/sec      10.962 GiB/sec    15.742
    ArrayScalarKernel<DivideChecked, Int32Type>/1048576/100     795.621 MiB/sec     920.398 MiB/sec    15.683
      ArrayArrayKernel<DivideChecked, Int8Type>/1048576/100     180.839 MiB/sec     209.135 MiB/sec    15.647
                             FilterRecordBatchNoNulls/50/11       2.670 GiB/sec       3.077 GiB/sec    15.266
           ArrayArrayKernel<Divide, UInt64Type>/1048576/100     815.618 MiB/sec     938.747 MiB/sec    15.096
                         FilterInt64FilterNoNulls/1048576/8       9.134 GiB/sec      10.510 GiB/sec    15.063
                  TakeStringRandomIndicesNoNulls/1048576/10   20.446M items/sec   23.519M items/sec    15.030
                             FilterRecordBatchNoNulls/100/5       1.968 GiB/sec       2.262 GiB/sec    14.965
    ArrayScalarKernel<DivideChecked, Int16Type>/1048576/100     443.189 MiB/sec     508.942 MiB/sec    14.836
                             FilterRecordBatchNoNulls/100/8       1.973 GiB/sec       2.265 GiB/sec    14.770
                              FilterRecordBatchNoNulls/50/2       2.685 GiB/sec       3.081 GiB/sec    14.742
                        FilterInt64FilterNoNulls/1048576/11       9.234 GiB/sec      10.584 GiB/sec    14.620
                        FilterInt64FilterNoNulls/1048576/14       9.234 GiB/sec      10.572 GiB/sec    14.493
                             FilterRecordBatchNoNulls/100/2       1.988 GiB/sec       2.272 GiB/sec    14.321
                TakeStringRandomIndicesNoNulls/1048576/1000   19.784M items/sec   22.521M items/sec    13.835
                              FilterRecordBatchNoNulls/50/5       2.713 GiB/sec       3.088 GiB/sec    13.803
                              FilterRecordBatchNoNulls/50/8       2.716 GiB/sec       3.087 GiB/sec    13.639
                                MinMaxKernelInt32/1048576/0      41.909 GiB/sec      47.490 GiB/sec    13.318
             ArrayArrayKernel<Divide, UInt64Type>/1048576/0     943.436 MiB/sec       1.040 GiB/sec    12.839
                   NthToIndicesInt64/32768/2/min_time:1.000       1.686 GiB/sec       1.899 GiB/sec    12.654
     ArrayArrayKernel<DivideChecked, Int16Type>/1048576/100     364.192 MiB/sec     410.070 MiB/sec    12.597
              ArrayArrayKernel<Divide, Int64Type>/1048576/0     760.960 MiB/sec     856.098 MiB/sec    12.502
                        ModeKernelWide<Int32Type>/1048576/0      59.513 MiB/sec      66.945 MiB/sec    12.487
                          FilterRecordBatchWithNulls/100/11       1.563 GiB/sec       1.755 GiB/sec    12.316
               NthToIndicesInt64/32768/10000/min_time:1.000       1.051 GiB/sec       1.179 GiB/sec    12.216
                           FilterRecordBatchWithNulls/100/5       1.574 GiB/sec       1.765 GiB/sec    12.115
                      ModeKernelWide<Int32Type>/1048576/100      59.339 MiB/sec      66.525 MiB/sec    12.110
    ArrayArrayKernel<DivideChecked, UInt64Type>/1048576/100     872.643 MiB/sec     978.159 MiB/sec    12.092
                    ModeKernelWide<Int32Type>/1048576/10000      59.369 MiB/sec      66.402 MiB/sec    11.846
                           FilterRecordBatchWithNulls/100/4       2.433 GiB/sec       2.720 GiB/sec    11.803
                           FilterRecordBatchWithNulls/100/8       1.581 GiB/sec       1.766 GiB/sec    11.726
       ArrayArrayKernel<DivideChecked, Int16Type>/1048576/0     420.097 MiB/sec     468.641 MiB/sec    11.555
                          FilterRecordBatchWithNulls/100/14       1.581 GiB/sec       1.754 GiB/sec    10.963
                       ModeKernelWide<Int32Type>/1048576/10      64.052 MiB/sec      70.965 MiB/sec    10.792
                      FilterFSLInt64FilterNoNulls/1048576/2       8.509 GiB/sec       9.396 GiB/sec    10.424
                           FilterRecordBatchWithNulls/100/2       1.588 GiB/sec       1.752 GiB/sec    10.315
// ......
// Removed tests within 10% deviation
// ......

-------------------------------------------------------------------------------------------------------------
Regressions: (69)
-------------------------------------------------------------------------------------------------------------
                                               benchmark            baseline           contender  change %
// ......
// Removed tests within 10% deviation
// ......
                     ArraySortIndicesInt64Narrow/32768/1       3.471 GiB/sec       3.120 GiB/sec   -10.114
                     FilterStringFilterNoNulls/1048576/8      16.955 GiB/sec      15.227 GiB/sec   -10.194
                  FilterStringFilterWithNulls/1048576/10       1.110 GiB/sec    1020.087 MiB/sec   -10.275
                             MinMaxKernelInt64/1048576/2       1.848 GiB/sec       1.654 GiB/sec   -10.495
                             MinMaxKernelInt16/1048576/0      49.493 GiB/sec      44.193 GiB/sec   -10.708
                   FilterStringFilterWithNulls/1048576/9     895.776 MiB/sec     798.304 MiB/sec   -10.881
                                SumKernelInt32/1048576/0      47.880 GiB/sec      42.591 GiB/sec   -11.046
                          FilterRecordBatchNoNulls/100/3       6.567 GiB/sec       5.826 GiB/sec   -11.272
                         MinMaxKernelInt64/1048576/10000      32.208 GiB/sec      28.189 GiB/sec   -12.478
                                   IsAlphaNumericUnicode     815.958 MiB/sec     713.851 MiB/sec   -12.514
                              MinMaxKernelInt8/1048576/0      48.010 GiB/sec      41.938 GiB/sec   -12.646
    ArrayArrayKernel<DivideChecked, Int64Type>/1048576/0     832.842 MiB/sec     721.932 MiB/sec   -13.317
                          FilterRecordBatchNoNulls/100/0       6.589 GiB/sec       5.628 GiB/sec   -14.597
                 FilterFSLInt64FilterWithNulls/1048576/8       5.738 GiB/sec       4.605 GiB/sec   -19.749
                 FilterFSLInt64FilterWithNulls/1048576/5       5.763 GiB/sec       4.618 GiB/sec   -19.864
                FilterFSLInt64FilterWithNulls/1048576/11       5.658 GiB/sec       4.521 GiB/sec   -20.102
                 FilterFSLInt64FilterWithNulls/1048576/2       5.859 GiB/sec       4.657 GiB/sec   -20.507
                FilterFSLInt64FilterWithNulls/1048576/14       5.515 GiB/sec       4.377 GiB/sec   -20.636
            ArrayScalarKernel<Add, Int8Type>/1048576/100       2.199 GiB/sec       1.737 GiB/sec   -21.005
              ArrayScalarKernel<Add, Int8Type>/1048576/0       2.227 GiB/sec       1.754 GiB/sec   -21.251
                    FilterStringFilterNoNulls/1048576/11      14.952 GiB/sec      11.585 GiB/sec   -22.518
                        CastDoubleToInt32Safe/1048576/10  524.464M items/sec  384.396M items/sec   -26.707
                    FilterStringFilterNoNulls/1048576/14       1.888 GiB/sec       1.381 GiB/sec   -26.848
                         CastDoubleToInt32Safe/1048576/2  524.420M items/sec  383.502M items/sec   -26.871
// Regression for MinMax 100% null case, not very useful
                            MinMaxKernelDouble/1048576/1     199.396 GiB/sec     138.520 GiB/sec   -30.530
                             MinMaxKernelInt64/1048576/1     216.727 GiB/sec     148.518 GiB/sec   -31.473
// Big regression for some filter kernels, needs further investigation
                        FilterRecordBatchWithNulls/100/7       4.803 GiB/sec       3.138 GiB/sec   -34.672
                       FilterRecordBatchWithNulls/100/13       5.190 GiB/sec       3.322 GiB/sec   -35.993
                       FilterRecordBatchWithNulls/100/10       4.961 GiB/sec       2.861 GiB/sec   -42.334
                          FilterRecordBatchNoNulls/100/6       6.517 GiB/sec       3.660 GiB/sec   -43.833
                          FilterRecordBatchNoNulls/100/1       4.680 GiB/sec       2.441 GiB/sec   -47.857
                        FilterRecordBatchWithNulls/100/3       6.429 GiB/sec       3.267 GiB/sec   -49.184
                         FilterRecordBatchNoNulls/100/12       6.543 GiB/sec       3.002 GiB/sec   -54.117
                        FilterRecordBatchWithNulls/100/1       5.236 GiB/sec       2.375 GiB/sec   -54.651
                       FilterRecordBatchWithNulls/100/12       6.436 GiB/sec       2.914 GiB/sec   -54.726
                         FilterRecordBatchNoNulls/100/13       5.180 GiB/sec       2.339 GiB/sec   -54.850
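The "change %" column can be reproduced from the two throughput columns. This is a small sketch assuming change % = (contender - baseline) / baseline * 100 and binary units (1 GiB/sec = 1024 MiB/sec); the helper names are hypothetical and not part of archery.

```cpp
// Illustrative helpers (hypothetical, not archery code) for reading the
// benchmark diff: convert throughputs to MiB/sec and compute change %.
#include <cassert>
#include <string>

// Converts a throughput printed as "<value> MiB/sec" or "<value> GiB/sec"
// to MiB/sec, assuming binary units. Only "MiB" and "GiB" are handled.
double ToMiBPerSec(double value, const std::string& unit) {
  return unit == "GiB" ? value * 1024.0 : value;
}

// Relative change of contender vs. baseline, in percent.
double ChangePercent(double baseline_mib, double contender_mib) {
  assert(baseline_mib > 0.0);
  return (contender_mib - baseline_mib) / baseline_mib * 100.0;
}

// Throughput ratio; equals 1 + ChangePercent / 100.
double SpeedupFactor(double baseline_mib, double contender_mib) {
  assert(baseline_mib > 0.0);
  return contender_mib / baseline_mib;
}
```

Applied to the first non-regression row (AddChecked, UInt8Type, /0: 349.774 MiB/sec vs. 1.747 GiB/sec), this gives about +411.45%, matching the table. Equivalently, the contender's throughput is about 5.1x the baseline, so a change of +X% corresponds to a speedup factor of 1 + X/100.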

@cyb70289 (Contributor, Author)

The macOS CI error is due to the LLVM 12 update: https://issues.apache.org/jira/browse/ARROW-12467

@cyb70289 marked this pull request as ready for review April 20, 2021 07:25
@cyb70289 requested review from bkietz, pitrou and wesm April 20, 2021 07:25
@pitrou (Member) commented Apr 20, 2021

Here are the changes (only those with abs(diff) > 10%) on AMD Zen 2, clang 10.0. They are consistent with @cyb70289's results.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (1044)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                                benchmark           baseline          contender  change %                                                                                                                                                                                                                             counters
    ArrayScalarKernel<MultiplyChecked, Int8Type>/524288/0    406.448 MiB/sec      1.884 GiB/sec   374.532                                                                 {'run_name': 'ArrayScalarKernel<MultiplyChecked, Int8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 570, 'null_percent': 0.0}
  ArrayScalarKernel<SubtractChecked, UInt16Type>/524288/0    849.706 MiB/sec      3.622 GiB/sec   336.471                                                              {'run_name': 'ArrayScalarKernel<SubtractChecked, UInt16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1197, 'null_percent': 0.0}
   ArrayScalarKernel<SubtractChecked, UInt8Type>/524288/0    426.079 MiB/sec      1.808 GiB/sec   334.443                                                                {'run_name': 'ArrayScalarKernel<SubtractChecked, UInt8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 605, 'null_percent': 0.0}
   ArrayScalarKernel<SubtractChecked, Int16Type>/524288/0    853.553 MiB/sec      3.545 GiB/sec   325.287                                                               {'run_name': 'ArrayScalarKernel<SubtractChecked, Int16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1190, 'null_percent': 0.0}
         ArrayScalarKernel<AddChecked, Int8Type>/524288/0    423.877 MiB/sec      1.760 GiB/sec   325.275                                                                      {'run_name': 'ArrayScalarKernel<AddChecked, Int8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 597, 'null_percent': 0.0}
        ArrayScalarKernel<AddChecked, UInt8Type>/524288/0    427.306 MiB/sec      1.752 GiB/sec   319.902                                                                     {'run_name': 'ArrayScalarKernel<AddChecked, UInt8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 605, 'null_percent': 0.0}
        ArrayScalarKernel<AddChecked, Int16Type>/524288/0    852.901 MiB/sec      3.491 GiB/sec   319.077                                                                    {'run_name': 'ArrayScalarKernel<AddChecked, Int16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1202, 'null_percent': 0.0}
   ArrayScalarKernel<MultiplyChecked, Int16Type>/524288/0    828.433 MiB/sec      3.389 GiB/sec   318.862                                                               {'run_name': 'ArrayScalarKernel<MultiplyChecked, Int16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1136, 'null_percent': 0.0}
    ArrayScalarKernel<SubtractChecked, Int8Type>/524288/0    440.739 MiB/sec      1.784 GiB/sec   314.522                                                                 {'run_name': 'ArrayScalarKernel<SubtractChecked, Int8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 613, 'null_percent': 0.0}
   ArrayScalarKernel<SubtractChecked, Int32Type>/524288/0      1.697 GiB/sec      6.982 GiB/sec   311.485                                                               {'run_name': 'ArrayScalarKernel<SubtractChecked, Int32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2441, 'null_percent': 0.0}
   ArrayScalarKernel<MultiplyChecked, Int32Type>/524288/0      1.652 GiB/sec      6.644 GiB/sec   302.146                                                               {'run_name': 'ArrayScalarKernel<MultiplyChecked, Int32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2370, 'null_percent': 0.0}
        ArrayScalarKernel<AddChecked, Int64Type>/524288/0      3.289 GiB/sec     13.082 GiB/sec   297.695                                                                    {'run_name': 'ArrayScalarKernel<AddChecked, Int64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4740, 'null_percent': 0.0}
        ArrayScalarKernel<AddChecked, Int32Type>/524288/0      1.718 GiB/sec      6.771 GiB/sec   294.099                                                                    {'run_name': 'ArrayScalarKernel<AddChecked, Int32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2335, 'null_percent': 0.0}
   ArrayScalarKernel<MultiplyChecked, Int64Type>/524288/0      3.278 GiB/sec     12.797 GiB/sec   290.434                                                               {'run_name': 'ArrayScalarKernel<MultiplyChecked, Int64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4645, 'null_percent': 0.0}
   ArrayScalarKernel<SubtractChecked, Int64Type>/524288/0      3.453 GiB/sec     13.468 GiB/sec   290.010                                                               {'run_name': 'ArrayScalarKernel<SubtractChecked, Int64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4938, 'null_percent': 0.0}
       ArrayScalarKernel<AddChecked, UInt16Type>/524288/0    865.478 MiB/sec      3.282 GiB/sec   288.356                                                                   {'run_name': 'ArrayScalarKernel<AddChecked, UInt16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1212, 'null_percent': 0.0}
  ArrayScalarKernel<SubtractChecked, UInt32Type>/524288/0      1.719 GiB/sec      6.648 GiB/sec   286.825                                                              {'run_name': 'ArrayScalarKernel<SubtractChecked, UInt32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2467, 'null_percent': 0.0}
  ArrayScalarKernel<SubtractChecked, UInt64Type>/524288/0      3.551 GiB/sec     13.622 GiB/sec   283.562                                                              {'run_name': 'ArrayScalarKernel<SubtractChecked, UInt64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 5092, 'null_percent': 0.0}
       ArrayScalarKernel<AddChecked, UInt32Type>/524288/0      1.735 GiB/sec      6.603 GiB/sec   280.560                                                                   {'run_name': 'ArrayScalarKernel<AddChecked, UInt32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2495, 'null_percent': 0.0}
       ArrayScalarKernel<AddChecked, UInt64Type>/524288/0      3.430 GiB/sec     13.048 GiB/sec   280.470                                                                   {'run_name': 'ArrayScalarKernel<AddChecked, UInt64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4907, 'null_percent': 0.0}
  ArrayScalarKernel<MultiplyChecked, UInt32Type>/524288/0      1.608 GiB/sec      5.874 GiB/sec   265.317                                                              {'run_name': 'ArrayScalarKernel<MultiplyChecked, UInt32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2307, 'null_percent': 0.0}
   ArrayScalarKernel<MultiplyChecked, UInt8Type>/524288/0    405.349 MiB/sec      1.375 GiB/sec   247.399                                                                {'run_name': 'ArrayScalarKernel<MultiplyChecked, UInt8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 570, 'null_percent': 0.0}
  ArrayScalarKernel<MultiplyChecked, UInt64Type>/524288/0      3.228 GiB/sec     11.185 GiB/sec   246.529                                                              {'run_name': 'ArrayScalarKernel<MultiplyChecked, UInt64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4657, 'null_percent': 0.0}
  ArrayScalarKernel<MultiplyChecked, UInt16Type>/524288/0    788.099 MiB/sec      2.641 GiB/sec   243.118                                                              {'run_name': 'ArrayScalarKernel<MultiplyChecked, UInt16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1113, 'null_percent': 0.0}
  ArrayScalarKernel<MultiplyChecked, Int8Type>/524288/100    353.649 MiB/sec      1.062 GiB/sec   207.453                                                               {'run_name': 'ArrayScalarKernel<MultiplyChecked, Int8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 492, 'null_percent': 1.0}
 ArrayScalarKernel<SubtractChecked, Int32Type>/524288/100      1.458 GiB/sec      4.401 GiB/sec   201.765                                                             {'run_name': 'ArrayScalarKernel<SubtractChecked, Int32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2084, 'null_percent': 1.0}
 ArrayScalarKernel<SubtractChecked, Int16Type>/524288/100    739.167 MiB/sec      2.169 GiB/sec   200.525                                                             {'run_name': 'ArrayScalarKernel<SubtractChecked, Int16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1029, 'null_percent': 1.0}
 ArrayScalarKernel<SubtractChecked, UInt8Type>/524288/100    375.488 MiB/sec      1.089 GiB/sec   196.889                                                              {'run_name': 'ArrayScalarKernel<SubtractChecked, UInt8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 525, 'null_percent': 1.0}
       ArrayScalarKernel<AddChecked, Int8Type>/524288/100    371.736 MiB/sec      1.072 GiB/sec   195.411                                                                    {'run_name': 'ArrayScalarKernel<AddChecked, Int8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 528, 'null_percent': 1.0}
 ArrayScalarKernel<MultiplyChecked, Int16Type>/524288/100    727.630 MiB/sec      2.051 GiB/sec   188.590                                                             {'run_name': 'ArrayScalarKernel<MultiplyChecked, Int16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1018, 'null_percent': 1.0}
  ArrayScalarKernel<SubtractChecked, Int8Type>/524288/100    380.429 MiB/sec      1.062 GiB/sec   185.971                                                               {'run_name': 'ArrayScalarKernel<SubtractChecked, Int8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 536, 'null_percent': 1.0}
 ArrayScalarKernel<SubtractChecked, Int64Type>/524288/100      3.054 GiB/sec      8.629 GiB/sec   182.548                                                             {'run_name': 'ArrayScalarKernel<SubtractChecked, Int64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4402, 'null_percent': 1.0}
 ArrayScalarKernel<MultiplyChecked, Int64Type>/524288/100      2.947 GiB/sec      8.274 GiB/sec   180.794                                                             {'run_name': 'ArrayScalarKernel<MultiplyChecked, Int64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4228, 'null_percent': 1.0}
     ArrayScalarKernel<AddChecked, UInt32Type>/524288/100      1.524 GiB/sec      4.265 GiB/sec   179.848                                                                 {'run_name': 'ArrayScalarKernel<AddChecked, UInt32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2184, 'null_percent': 1.0}
     ArrayScalarKernel<AddChecked, UInt16Type>/524288/100    762.545 MiB/sec      2.084 GiB/sec   179.826                                                                 {'run_name': 'ArrayScalarKernel<AddChecked, UInt16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1075, 'null_percent': 1.0}
ArrayScalarKernel<SubtractChecked, UInt16Type>/524288/100    733.937 MiB/sec      1.998 GiB/sec   178.697                                                            {'run_name': 'ArrayScalarKernel<SubtractChecked, UInt16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1026, 'null_percent': 1.0}
 ArrayScalarKernel<MultiplyChecked, UInt8Type>/524288/100    354.426 MiB/sec    983.661 MiB/sec   177.537                                                              {'run_name': 'ArrayScalarKernel<MultiplyChecked, UInt8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 500, 'null_percent': 1.0}
ArrayScalarKernel<SubtractChecked, UInt32Type>/524288/100      1.472 GiB/sec      4.078 GiB/sec   177.086                                                            {'run_name': 'ArrayScalarKernel<SubtractChecked, UInt32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2105, 'null_percent': 1.0}
 ArrayScalarKernel<MultiplyChecked, Int32Type>/524288/100      1.462 GiB/sec      4.033 GiB/sec   175.824                                                             {'run_name': 'ArrayScalarKernel<MultiplyChecked, Int32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2086, 'null_percent': 1.0}
      ArrayScalarKernel<AddChecked, Int32Type>/524288/100      1.473 GiB/sec      4.014 GiB/sec   172.503                                                                  {'run_name': 'ArrayScalarKernel<AddChecked, Int32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2105, 'null_percent': 1.0}
ArrayScalarKernel<MultiplyChecked, UInt32Type>/524288/100      1.378 GiB/sec      3.744 GiB/sec   171.786                                                            {'run_name': 'ArrayScalarKernel<MultiplyChecked, UInt32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1980, 'null_percent': 1.0}
      ArrayScalarKernel<AddChecked, Int16Type>/524288/100    757.682 MiB/sec      2.010 GiB/sec   171.599                                                                  {'run_name': 'ArrayScalarKernel<AddChecked, Int16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1065, 'null_percent': 1.0}
     ArrayScalarKernel<AddChecked, UInt64Type>/524288/100      3.068 GiB/sec      8.289 GiB/sec   170.150                                                                 {'run_name': 'ArrayScalarKernel<AddChecked, UInt64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4395, 'null_percent': 1.0}
      ArrayScalarKernel<AddChecked, UInt8Type>/524288/100    377.371 MiB/sec   1012.139 MiB/sec   168.208                                                                   {'run_name': 'ArrayScalarKernel<AddChecked, UInt8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 530, 'null_percent': 1.0}
ArrayScalarKernel<MultiplyChecked, UInt64Type>/524288/100      2.851 GiB/sec      7.634 GiB/sec   167.782                                                            {'run_name': 'ArrayScalarKernel<MultiplyChecked, UInt64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4124, 'null_percent': 1.0}
      ArrayScalarKernel<AddChecked, Int64Type>/524288/100      2.963 GiB/sec      7.836 GiB/sec   164.490                                                                  {'run_name': 'ArrayScalarKernel<AddChecked, Int64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4266, 'null_percent': 1.0}
ArrayScalarKernel<SubtractChecked, UInt64Type>/524288/100      3.103 GiB/sec      8.062 GiB/sec   159.837                                                            {'run_name': 'ArrayScalarKernel<SubtractChecked, UInt64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 4446, 'null_percent': 1.0}
ArrayScalarKernel<MultiplyChecked, UInt16Type>/524288/100    667.309 MiB/sec      1.509 GiB/sec   131.593                                                             {'run_name': 'ArrayScalarKernel<MultiplyChecked, UInt16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 934, 'null_percent': 1.0}
          ArrayArrayKernel<AddChecked, Int8Type>/524288/0    192.396 MiB/sec    442.457 MiB/sec   129.972                                                                       {'run_name': 'ArrayArrayKernel<AddChecked, Int8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 270, 'null_percent': 0.0}
         ArrayArrayKernel<AddChecked, Int64Type>/524288/0      1.636 GiB/sec      3.618 GiB/sec   121.137                                                                     {'run_name': 'ArrayArrayKernel<AddChecked, Int64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3771, 'null_percent': 0.0}
   ArrayArrayKernel<MultiplyChecked, UInt64Type>/524288/0      1.735 GiB/sec      3.711 GiB/sec   113.955                                                               {'run_name': 'ArrayArrayKernel<MultiplyChecked, UInt64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3659, 'null_percent': 0.0}
    ArrayArrayKernel<MultiplyChecked, Int64Type>/524288/0      1.911 GiB/sec      3.802 GiB/sec    98.993                                                                {'run_name': 'ArrayArrayKernel<MultiplyChecked, Int64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2366, 'null_percent': 0.0}
    ArrayArrayKernel<MultiplyChecked, Int16Type>/524288/0    537.150 MiB/sec    938.091 MiB/sec    74.642                                                                 {'run_name': 'ArrayArrayKernel<MultiplyChecked, Int16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 540, 'null_percent': 0.0}
                           FilterRecordBatchNoNulls/100/4      5.088 GiB/sec      8.664 GiB/sec    70.281      {'run_name': 'FilterRecordBatchNoNulls/100/4', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 84, 'data null%': 0.1, 'extracted_size': 40106400.0, 'mask null%': 0.0, 'num_cols': 100.0, 'select%': 50.0}
                          FilterRecordBatchNoNulls/100/12      5.693 GiB/sec      9.598 GiB/sec    68.594    {'run_name': 'FilterRecordBatchNoNulls/100/12', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 49, 'data null%': 90.0, 'extracted_size': 79928800.0, 'mask null%': 0.0, 'num_cols': 100.0, 'select%': 99.9}
                           FilterRecordBatchNoNulls/100/1      5.346 GiB/sec      8.746 GiB/sec    63.590      {'run_name': 'FilterRecordBatchNoNulls/100/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 94, 'data null%': 0.0, 'extracted_size': 40106400.0, 'mask null%': 0.0, 'num_cols': 100.0, 'select%': 50.0}
                         FilterRecordBatchWithNulls/100/3      4.579 GiB/sec      7.480 GiB/sec    63.346    {'run_name': 'FilterRecordBatchWithNulls/100/3', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 50, 'data null%': 0.1, 'extracted_size': 75973600.0, 'mask null%': 5.0, 'num_cols': 100.0, 'select%': 99.9}
             ArrayArrayKernel<Divide, UInt8Type>/524288/0    196.040 MiB/sec    305.072 MiB/sec    55.618                                                                          {'run_name': 'ArrayArrayKernel<Divide, UInt8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 279, 'null_percent': 0.0}
    ArrayArrayKernel<MultiplyChecked, Int32Type>/524288/0      1.267 GiB/sec      1.928 GiB/sec    52.097                                                                {'run_name': 'ArrayArrayKernel<MultiplyChecked, Int32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1838, 'null_percent': 0.0}
                  TakeStringRandomIndicesNoNulls/524288/1 215.736M items/sec 324.406M items/sec    50.372                                                                             {'run_name': 'TakeStringRandomIndicesNoNulls/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 280, 'null_percent': 100.0}
           ArrayScalarKernel<Subtract, Int8Type>/524288/0      1.896 GiB/sec      2.835 GiB/sec    49.520                                                                       {'run_name': 'ArrayScalarKernel<Subtract, Int8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2810, 'null_percent': 0.0}
         ArrayScalarKernel<Subtract, Int8Type>/524288/100      1.957 GiB/sec      2.866 GiB/sec    46.420                                                                     {'run_name': 'ArrayScalarKernel<Subtract, Int8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2800, 'null_percent': 1.0}
            ArrayArrayKernel<Divide, UInt16Type>/524288/0    388.912 MiB/sec    568.989 MiB/sec    46.303                                                                         {'run_name': 'ArrayArrayKernel<Divide, UInt16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 564, 'null_percent': 0.0}
         ArrayArrayKernel<AddChecked, Int32Type>/524288/0      1.261 GiB/sec      1.843 GiB/sec    46.148                                                                     {'run_name': 'ArrayArrayKernel<AddChecked, Int32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1894, 'null_percent': 0.0}
     ArrayArrayKernel<MultiplyChecked, Int8Type>/524288/0    316.416 MiB/sec    460.802 MiB/sec    45.632                                                                  {'run_name': 'ArrayArrayKernel<MultiplyChecked, Int8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 444, 'null_percent': 0.0}
                         FilterRecordBatchWithNulls/100/0      4.489 GiB/sec      6.522 GiB/sec    45.288    {'run_name': 'FilterRecordBatchWithNulls/100/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 49, 'data null%': 0.0, 'extracted_size': 75973600.0, 'mask null%': 5.0, 'num_cols': 100.0, 'select%': 99.9}
                        FilterRecordBatchWithNulls/100/12      6.365 GiB/sec      9.243 GiB/sec    45.214  {'run_name': 'FilterRecordBatchWithNulls/100/12', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 53, 'data null%': 90.0, 'extracted_size': 75973600.0, 'mask null%': 5.0, 'num_cols': 100.0, 'select%': 99.9}
        ArrayArrayKernel<AddChecked, UInt32Type>/524288/0      1.279 GiB/sec      1.836 GiB/sec    43.490                                                                    {'run_name': 'ArrayArrayKernel<AddChecked, UInt32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1888, 'null_percent': 0.0}
                       FilterStringFilterNoNulls/524288/3      6.298 GiB/sec      9.001 GiB/sec    42.928                                                {'run_name': 'FilterStringFilterNoNulls/524288/3', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 10045, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
          QuantileKernelMedianNarrow<Int64Type>/1048576/0      2.870 GiB/sec      4.098 GiB/sec    42.763                                                                      {'run_name': 'QuantileKernelMedianNarrow<Int64Type>/1048576/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2060, 'null_percent': 0.0}
              ArrayArrayKernel<Divide, Int8Type>/524288/0    201.039 MiB/sec    284.003 MiB/sec    41.267                                                                           {'run_name': 'ArrayArrayKernel<Divide, Int8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 285, 'null_percent': 0.0}
   ArrayArrayKernel<SubtractChecked, UInt64Type>/524288/0      2.584 GiB/sec      3.649 GiB/sec    41.195                                                               {'run_name': 'ArrayArrayKernel<SubtractChecked, UInt64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3877, 'null_percent': 0.0}
        ArrayArrayKernel<AddChecked, UInt64Type>/524288/0      2.575 GiB/sec      3.625 GiB/sec    40.787                                                                    {'run_name': 'ArrayArrayKernel<AddChecked, UInt64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3733, 'null_percent': 0.0}
             ArrayArrayKernel<Divide, Int16Type>/524288/0    406.353 MiB/sec    567.497 MiB/sec    39.656                                                                          {'run_name': 'ArrayArrayKernel<Divide, Int16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 805, 'null_percent': 0.0}
            ArrayArrayKernel<Divide, UInt64Type>/524288/0      1.585 GiB/sec      2.200 GiB/sec    38.821                                                                        {'run_name': 'ArrayArrayKernel<Divide, UInt64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2276, 'null_percent': 0.0}
      ArrayArrayKernel<AddChecked, UInt32Type>/524288/100      1.152 GiB/sec      1.595 GiB/sec    38.532                                                                  {'run_name': 'ArrayArrayKernel<AddChecked, UInt32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1652, 'null_percent': 1.0}
         ArrayArrayKernel<AddChecked, Int16Type>/524288/0    637.334 MiB/sec    882.760 MiB/sec    38.508                                                                      {'run_name': 'ArrayArrayKernel<AddChecked, Int16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 897, 'null_percent': 0.0}
         ArrayArrayKernel<AddChecked, UInt8Type>/524288/0    320.724 MiB/sec    442.507 MiB/sec    37.971                                                                      {'run_name': 'ArrayArrayKernel<AddChecked, UInt8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 448, 'null_percent': 0.0}
        ArrayArrayKernel<AddChecked, UInt16Type>/524288/0    638.633 MiB/sec    879.473 MiB/sec    37.712                                                                     {'run_name': 'ArrayArrayKernel<AddChecked, UInt16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 900, 'null_percent': 0.0}
 ArrayArrayKernel<MultiplyChecked, UInt64Type>/524288/100      2.339 GiB/sec      3.213 GiB/sec    37.360                                                             {'run_name': 'ArrayArrayKernel<MultiplyChecked, UInt64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3495, 'null_percent': 1.0}
  ArrayArrayKernel<MultiplyChecked, Int32Type>/524288/100      1.212 GiB/sec      1.662 GiB/sec    37.136                                                              {'run_name': 'ArrayArrayKernel<MultiplyChecked, Int32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1721, 'null_percent': 1.0}
        ArrayArrayKernel<AddChecked, Int8Type>/524288/100    282.031 MiB/sec    386.574 MiB/sec    37.068                                                                     {'run_name': 'ArrayArrayKernel<AddChecked, Int8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 400, 'null_percent': 1.0}
  ArrayArrayKernel<MultiplyChecked, Int64Type>/524288/100      2.430 GiB/sec      3.293 GiB/sec    35.496                                                              {'run_name': 'ArrayArrayKernel<MultiplyChecked, Int64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3568, 'null_percent': 1.0}
            ArrayArrayKernel<Divide, UInt32Type>/524288/0    838.088 MiB/sec      1.108 GiB/sec    35.356                                                                        {'run_name': 'ArrayArrayKernel<Divide, UInt32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1000, 'null_percent': 0.0}
  ArrayArrayKernel<MultiplyChecked, Int16Type>/524288/100    602.099 MiB/sec    811.312 MiB/sec    34.747                                                               {'run_name': 'ArrayArrayKernel<MultiplyChecked, Int16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 813, 'null_percent': 1.0}
    ArrayArrayKernel<MultiplyChecked, UInt8Type>/524288/0    196.036 MiB/sec    263.115 MiB/sec    34.218                                                                 {'run_name': 'ArrayArrayKernel<MultiplyChecked, UInt8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 275, 'null_percent': 0.0}
 ArrayArrayKernel<SubtractChecked, UInt32Type>/524288/100      1.238 GiB/sec      1.652 GiB/sec    33.462                                                             {'run_name': 'ArrayArrayKernel<SubtractChecked, UInt32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1775, 'null_percent': 1.0}
  ArrayArrayKernel<SubtractChecked, Int32Type>/524288/100      1.206 GiB/sec      1.594 GiB/sec    32.114                                                              {'run_name': 'ArrayArrayKernel<SubtractChecked, Int32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1908, 'null_percent': 1.0}
    ArrayArrayKernel<DivideChecked, Int16Type>/524288/100    437.518 MiB/sec    573.490 MiB/sec    31.078                                                                 {'run_name': 'ArrayArrayKernel<DivideChecked, Int16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 679, 'null_percent': 1.0}
    ArrayArrayKernel<SubtractChecked, UInt8Type>/524288/0    203.233 MiB/sec    263.283 MiB/sec    29.547                                                                 {'run_name': 'ArrayArrayKernel<SubtractChecked, UInt8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 283, 'null_percent': 0.0}
 ArrayArrayKernel<MultiplyChecked, UInt32Type>/524288/100      1.252 GiB/sec      1.618 GiB/sec    29.262                                                             {'run_name': 'ArrayArrayKernel<MultiplyChecked, UInt32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1790, 'null_percent': 1.0}
       ArrayArrayKernel<AddChecked, Int32Type>/524288/100      1.221 GiB/sec      1.574 GiB/sec    28.925                                                                   {'run_name': 'ArrayArrayKernel<AddChecked, Int32Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1730, 'null_percent': 1.0}
   ArrayArrayKernel<MultiplyChecked, Int8Type>/524288/100    313.009 MiB/sec    403.317 MiB/sec    28.851                                                                {'run_name': 'ArrayArrayKernel<MultiplyChecked, Int8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 438, 'null_percent': 1.0}
 ArrayArrayKernel<MultiplyChecked, UInt16Type>/524288/100    581.596 MiB/sec    746.956 MiB/sec    28.432                                                              {'run_name': 'ArrayArrayKernel<MultiplyChecked, UInt16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 827, 'null_percent': 1.0}
       ArrayArrayKernel<AddChecked, Int64Type>/524288/100      2.460 GiB/sec      3.157 GiB/sec    28.318                                                                   {'run_name': 'ArrayArrayKernel<AddChecked, Int64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3556, 'null_percent': 1.0}
  ArrayArrayKernel<SubtractChecked, Int64Type>/524288/100      2.450 GiB/sec      3.112 GiB/sec    27.025                                                              {'run_name': 'ArrayArrayKernel<SubtractChecked, Int64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3592, 'null_percent': 1.0}
      ArrayArrayKernel<AddChecked, UInt64Type>/524288/100      2.486 GiB/sec      3.154 GiB/sec    26.871                                                                  {'run_name': 'ArrayArrayKernel<AddChecked, UInt64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3532, 'null_percent': 1.0}
 ArrayArrayKernel<SubtractChecked, UInt16Type>/524288/100    634.568 MiB/sec    797.697 MiB/sec    25.707                                                              {'run_name': 'ArrayArrayKernel<SubtractChecked, UInt16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 841, 'null_percent': 1.0}
        QuantileKernelMedianNarrow<Int64Type>/1048576/100      2.627 GiB/sec      3.299 GiB/sec    25.559                                                                    {'run_name': 'QuantileKernelMedianNarrow<Int64Type>/1048576/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1883, 'null_percent': 1.0}
    ArrayArrayKernel<SubtractChecked, Int16Type>/524288/0    402.186 MiB/sec    504.329 MiB/sec    25.397                                                                 {'run_name': 'ArrayArrayKernel<SubtractChecked, Int16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 568, 'null_percent': 0.0}
   ArrayArrayKernel<DivideChecked, UInt16Type>/524288/100    450.096 MiB/sec    562.292 MiB/sec    24.927                                                                {'run_name': 'ArrayArrayKernel<DivideChecked, UInt16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 644, 'null_percent': 1.0}
                                   UniqueString10bytes/12      4.336 GiB/sec      5.408 GiB/sec    24.703                                                                        {'run_name': 'UniqueString10bytes/12', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 87, 'null_percent': 99.0, 'num_unique': 100000.0}
                    ModeKernelNarrow<Int32Type>/1048576/0      1.637 GiB/sec      2.011 GiB/sec    22.845                                                                                {'run_name': 'ModeKernelNarrow<Int32Type>/1048576/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1173, 'null_percent': 0.0}
 ArrayArrayKernel<SubtractChecked, UInt64Type>/524288/100      2.672 GiB/sec      3.266 GiB/sec    22.199                                                             {'run_name': 'ArrayArrayKernel<SubtractChecked, UInt64Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3842, 'null_percent': 1.0}
      ArrayArrayKernel<DivideChecked, Int16Type>/524288/0    489.545 MiB/sec    590.599 MiB/sec    20.642                                                                   {'run_name': 'ArrayArrayKernel<DivideChecked, Int16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 684, 'null_percent': 0.0}
                                       IsInInt8SmallSet/2    470.457 MiB/sec    567.360 MiB/sec    20.598                                                                                                                        {'run_name': 'IsInInt8SmallSet/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1314}
       ArrayArrayKernel<AddChecked, UInt8Type>/524288/100    317.058 MiB/sec    382.313 MiB/sec    20.581                                                                    {'run_name': 'ArrayArrayKernel<AddChecked, UInt8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 445, 'null_percent': 1.0}
                                       IsInInt8SmallSet/4    531.478 MiB/sec    631.166 MiB/sec    18.757                                                                                                                        {'run_name': 'IsInInt8SmallSet/4', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1482}
       ArrayArrayKernel<AddChecked, Int16Type>/524288/100    651.381 MiB/sec    772.780 MiB/sec    18.637                                                                    {'run_name': 'ArrayArrayKernel<AddChecked, Int16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 909, 'null_percent': 1.0}
                         FilterRecordBatchWithNulls/100/6      4.600 GiB/sec      5.427 GiB/sec    17.984    {'run_name': 'FilterRecordBatchWithNulls/100/6', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 45, 'data null%': 1.0, 'extracted_size': 75973600.0, 'mask null%': 5.0, 'num_cols': 100.0, 'select%': 99.9}
  ArrayArrayKernel<SubtractChecked, UInt8Type>/524288/100    293.908 MiB/sec    343.493 MiB/sec    16.871                                                               {'run_name': 'ArrayArrayKernel<SubtractChecked, UInt8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 409, 'null_percent': 1.0}
    ArrayArrayKernel<SubtractChecked, Int64Type>/524288/0      2.686 GiB/sec      3.139 GiB/sec    16.837                                                                {'run_name': 'ArrayArrayKernel<SubtractChecked, Int64Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 3338, 'null_percent': 0.0}
                    ModeKernelNarrow<Int64Type>/1048576/0      2.994 GiB/sec      3.497 GiB/sec    16.806                                                                                {'run_name': 'ModeKernelNarrow<Int64Type>/1048576/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2146, 'null_percent': 0.0}
      ArrayArrayKernel<AddChecked, UInt16Type>/524288/100    650.588 MiB/sec    753.678 MiB/sec    15.846                                                                   {'run_name': 'ArrayArrayKernel<AddChecked, UInt16Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 904, 'null_percent': 1.0}
      ArrayArrayKernel<DivideChecked, Int32Type>/524288/0      1.003 GiB/sec      1.152 GiB/sec    14.821                                                                  {'run_name': 'ArrayArrayKernel<DivideChecked, Int32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1451, 'null_percent': 0.0}
                   FilterFSLInt64FilterWithNulls/524288/2      8.179 GiB/sec      9.332 GiB/sec    14.092                                             {'run_name': 'FilterFSLInt64FilterWithNulls/524288/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 11611, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 1.0}
                                      IsInInt16SmallSet/2    555.483 MiB/sec    633.531 MiB/sec    14.050                                                                                                                        {'run_name': 'IsInInt16SmallSet/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 777}
  ArrayArrayKernel<MultiplyChecked, UInt8Type>/524288/100    312.479 MiB/sec    352.267 MiB/sec    12.733                                                               {'run_name': 'ArrayArrayKernel<MultiplyChecked, UInt8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 438, 'null_percent': 1.0}
                                       IsInInt8SmallSet/8    535.551 MiB/sec    603.303 MiB/sec    12.651                                                                                                                        {'run_name': 'IsInInt8SmallSet/8', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1496}
                                SumKernelDouble/1048576/2      1.762 GiB/sec      1.982 GiB/sec    12.485                                                                                           {'run_name': 'SumKernelDouble/1048576/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1268, 'null_percent': 50.0}
                     ModeKernelNarrow<Int8Type>/1048576/1    535.523 GiB/sec    602.049 GiB/sec    12.423                                                                             {'run_name': 'ModeKernelNarrow<Int8Type>/1048576/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 369515, 'null_percent': 100.0}
          ArrayScalarKernel<Multiply, UInt8Type>/524288/0      2.021 GiB/sec      2.263 GiB/sec    11.965                                                                      {'run_name': 'ArrayScalarKernel<Multiply, UInt8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2883, 'null_percent': 0.0}
                                      IsInInt16SmallSet/4    544.441 MiB/sec    608.172 MiB/sec    11.706                                                                                                                        {'run_name': 'IsInInt16SmallSet/4', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 760}
        ArrayScalarKernel<Multiply, UInt8Type>/524288/100      2.021 GiB/sec      2.253 GiB/sec    11.470                                                                    {'run_name': 'ArrayScalarKernel<Multiply, UInt8Type>/524288/100', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 2893, 'null_percent': 1.0}
                                            UniqueUInt8/2      1.056 GiB/sec      1.170 GiB/sec    10.825                                                                                    {'run_name': 'UniqueUInt8/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 190, 'null_percent': 1.0, 'num_unique': 200.0}
                             SumKernelInt32/1048576/10000     51.477 GiB/sec     56.872 GiB/sec    10.481                                                                                       {'run_name': 'SumKernelInt32/1048576/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 37082, 'null_percent': 0.01}
                             SumKernelInt16/1048576/10000     33.084 GiB/sec     36.505 GiB/sec    10.340                                                                                       {'run_name': 'SumKernelInt16/1048576/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 23877, 'null_percent': 0.01}
                                     IsInInt32SmallSet/64      1.134 GiB/sec      1.251 GiB/sec    10.337                                                                                                                       {'run_name': 'IsInInt32SmallSet/64', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 816}
              TakeFSLInt64RandomIndicesWithNulls/524288/1 108.967M items/sec 120.123M items/sec    10.238                                                                         {'run_name': 'TakeFSLInt64RandomIndicesWithNulls/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 145, 'null_percent': 100.0}

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regressions: (47)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                             benchmark           baseline          contender  change %                                                                                                                                                                                                                            counters
             TakeFSLInt64RandomIndicesNoNulls/524288/0 274.990M items/sec 246.829M items/sec   -10.241                                                                            {'run_name': 'TakeFSLInt64RandomIndicesNoNulls/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 366, 'null_percent': 0.0}
                                IndexInInt16SmallSet/2    452.872 MiB/sec    404.165 MiB/sec   -10.755                                                                                                                    {'run_name': 'IndexInInt16SmallSet/2', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 644}
                   TakeStringMonotonicIndices/524288/0 336.818M items/sec 300.204M items/sec   -10.871                                                                                  {'run_name': 'TakeStringMonotonicIndices/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 448, 'null_percent': 0.0}
                                 IsAlphaNumericUnicode   1015.210 MiB/sec    901.727 MiB/sec   -11.178                                                                                                                      {'run_name': 'IsAlphaNumericUnicode', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 45}
             TakeStringRandomIndicesWithNulls/524288/0  54.467M items/sec  48.000M items/sec   -11.873                                                                             {'run_name': 'TakeStringRandomIndicesWithNulls/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 65, 'null_percent': 0.0}
                   FilterInt64FilterWithNulls/524288/5     10.581 GiB/sec      9.314 GiB/sec   -11.971                                               {'run_name': 'FilterInt64FilterWithNulls/524288/5', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 15212, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 1.0}
             TakeStringRandomIndicesWithNulls/524288/1   1.089G items/sec 954.385M items/sec   -12.332                                                                         {'run_name': 'TakeStringRandomIndicesWithNulls/524288/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1279, 'null_percent': 100.0}
ArrayArrayKernel<MultiplyChecked, UInt32Type>/524288/0      1.302 GiB/sec      1.136 GiB/sec   -12.749                                                              {'run_name': 'ArrayArrayKernel<MultiplyChecked, UInt32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1873, 'null_percent': 0.0}
                            MinMaxKernelInt8/1048576/1     36.024 GiB/sec     30.625 GiB/sec   -14.986                                                                                       {'run_name': 'MinMaxKernelInt8/1048576/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 25401, 'null_percent': 100.0}
ArrayArrayKernel<MultiplyChecked, UInt16Type>/524288/0    611.156 MiB/sec    519.439 MiB/sec   -15.007                                                               {'run_name': 'ArrayArrayKernel<MultiplyChecked, UInt16Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 879, 'null_percent': 0.0}
                      FilterRecordBatchWithNulls/100/1      5.488 GiB/sec      4.587 GiB/sec   -16.422  {'run_name': 'FilterRecordBatchWithNulls/100/1', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 132, 'data null%': 0.0, 'extracted_size': 38082400.0, 'mask null%': 5.0, 'num_cols': 100.0, 'select%': 50.0}
ArrayArrayKernel<SubtractChecked, UInt32Type>/524288/0      1.353 GiB/sec      1.129 GiB/sec   -16.592                                                              {'run_name': 'ArrayArrayKernel<SubtractChecked, UInt32Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 1977, 'null_percent': 0.0}
                                             Utf8Lower    801.890 MiB/sec    664.757 MiB/sec   -17.101                                                                                                                                  {'run_name': 'Utf8Lower', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 35}
                     FilterRecordBatchWithNulls/100/13      5.360 GiB/sec      4.333 GiB/sec   -19.158 {'run_name': 'FilterRecordBatchWithNulls/100/13', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 85, 'data null%': 90.0, 'extracted_size': 38082400.0, 'mask null%': 5.0, 'num_cols': 100.0, 'select%': 50.0}
   ArrayArrayKernel<DivideChecked, UInt8Type>/524288/0    302.659 MiB/sec    238.884 MiB/sec   -21.072                                                                  {'run_name': 'ArrayArrayKernel<DivideChecked, UInt8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 421, 'null_percent': 0.0}
  ArrayArrayKernel<SubtractChecked, Int8Type>/524288/0    332.802 MiB/sec    261.330 MiB/sec   -21.476                                                                 {'run_name': 'ArrayArrayKernel<SubtractChecked, Int8Type>/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 467, 'null_percent': 0.0}
                      FilterRecordBatchWithNulls/100/7      6.880 GiB/sec      5.118 GiB/sec   -25.613  {'run_name': 'FilterRecordBatchWithNulls/100/7', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 100, 'data null%': 1.0, 'extracted_size': 38082400.0, 'mask null%': 5.0, 'num_cols': 100.0, 'select%': 50.0}
                        MinMaxKernelInt8/1048576/10000     24.846 GiB/sec     18.183 GiB/sec   -26.818                                                                                    {'run_name': 'MinMaxKernelInt8/1048576/10000', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 17854, 'null_percent': 0.01}
                        FilterRecordBatchNoNulls/100/6      7.682 GiB/sec      5.566 GiB/sec   -27.550     {'run_name': 'FilterRecordBatchNoNulls/100/6', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 52, 'data null%': 1.0, 'extracted_size': 79928800.0, 'mask null%': 0.0, 'num_cols': 100.0, 'select%': 99.9}
                        FilterRecordBatchNoNulls/100/9      9.150 GiB/sec      5.740 GiB/sec   -37.268    {'run_name': 'FilterRecordBatchNoNulls/100/9', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 64, 'data null%': 10.0, 'extracted_size': 79928800.0, 'mask null%': 0.0, 'num_cols': 100.0, 'select%': 99.9}
                       FilterRecordBatchNoNulls/100/10      8.753 GiB/sec      5.257 GiB/sec   -39.940  {'run_name': 'FilterRecordBatchNoNulls/100/10', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 162, 'data null%': 10.0, 'extracted_size': 40106400.0, 'mask null%': 0.0, 'num_cols': 100.0, 'select%': 50.0}
                    FilterStringFilterNoNulls/524288/0     14.141 GiB/sec      6.955 GiB/sec   -50.819                                               {'run_name': 'FilterStringFilterNoNulls/524288/0', 'repetitions': 0, 'repetition_index': 0, 'threads': 1, 'iterations': 20271, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
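
The `change %` column matches the relative throughput difference of the contender against the baseline. This is inferred from the rows themselves, not from archery's documentation. For example, the first improvement row in this excerpt (`ArrayArrayKernel<SubtractChecked, UInt16Type>/524288/100`) gives:

```latex
\text{change \%} = 100 \times \frac{\text{contender} - \text{baseline}}{\text{baseline}}
  = 100 \times \frac{797.697 - 634.568}{634.568} \approx 25.707
```

The displayed throughputs are rounded, so recomputing from them can be off in the last digit. `FilterStringFilterNoNulls/524288/0` gives about -50.817, while the reported value is -50.819.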

@pitrou
Member

pitrou commented Apr 20, 2021

I'm in support of this. Performance improvements notwithstanding, the main benefits IMHO are:

  • consistent error handling throughout the codebase, which significantly improves overall usability and approachability
  • the return value of a function (Status vs. void, or Result<T> vs. T) advertises whether it can raise an error or not, which wasn't the case with the KernelContext-bound error status
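
A minimal sketch of the second point, contrasting the two kernel styles. `Status`, `KernelContext`, `AddCheckedOld` and `AddCheckedNew` are simplified, hypothetical stand-ins written for illustration. They are not Arrow's real classes or kernel signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for arrow::Status (hypothetical, NOT Arrow's class).
class Status {
 public:
  Status() = default;  // a default-constructed Status means success
  static Status OK() { return Status(); }
  static Status Invalid(std::string msg) {
    Status st;
    st.ok_ = false;
    st.msg_ = std::move(msg);
    return st;
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  bool ok_ = true;
  std::string msg_;
};

// Old style (hypothetical): the error is stored on a shared context object.
struct KernelContext {
  Status status;
  void SetStatus(Status st) { status = std::move(st); }
};

// Writes x + y to *out and returns true if the int32 sum overflowed.
inline bool AddOverflows(int32_t x, int32_t y, int32_t* out) {
  const int64_t wide = static_cast<int64_t>(x) + static_cast<int64_t>(y);
  if (wide < std::numeric_limits<int32_t>::min() ||
      wide > std::numeric_limits<int32_t>::max()) {
    return true;
  }
  *out = static_cast<int32_t>(wide);
  return false;
}

// Returns void: nothing in the signature says this call can fail, so the
// caller has to remember to inspect ctx->status afterwards.
void AddCheckedOld(KernelContext* ctx, const std::vector<int32_t>& a,
                   const std::vector<int32_t>& b, std::vector<int32_t>* out) {
  assert(a.size() == b.size());
  out->resize(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (AddOverflows(a[i], b[i], &(*out)[i])) {
      ctx->SetStatus(Status::Invalid("overflow"));
      return;
    }
  }
}

// Returns Status: the signature itself advertises that the call can fail.
Status AddCheckedNew(const std::vector<int32_t>& a,
                     const std::vector<int32_t>& b, std::vector<int32_t>* out) {
  assert(a.size() == b.size());
  out->resize(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (AddOverflows(a[i], b[i], &(*out)[i])) {
      return Status::Invalid("overflow");
    }
  }
  return Status::OK();
}
```

With the second form, a caller that ignores the returned `Status` is visibly dropping an error. With the first form, the failure sits silently on the context until someone thinks to look.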

@nealrichardson
Member

Out of curiosity, why does this improve performance?

@pitrou
Member

pitrou commented Apr 20, 2021

@nealrichardson Not sure, probably it makes things easier for the compiler.

@pitrou
Member

pitrou commented Apr 20, 2021

@bkietz @wesm It would be nice if you could give an opinion soon. This will inevitably conflict with other PRs and it would be nice to minimize the work required when rebasing/merging.

@westonpace
Member

@nealrichardson In the realm of wild guesses that I would investigate before putting much stock in, a change like this...

https://github.com/apache/arrow/pull/10098/files#diff-3eafd7246f6a8c699f10d46e3276852fe44b6853b5517ef10396e561730c09f4L88

...changes from setting a variable on a class (that variable could then potentially be read in a lot of different places) to an out parameter (which can only be seen by the caller). By reducing the visible scope of the variable you are writing to, you increase the chance that the compiler decides no one else needs to see the change and keeps it in a register instead of writing it out to RAM.
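
A minimal sketch of the two shapes being contrasted, assuming a kernel that loops over raw buffers and records an overflow flag. `Context`, `AddViaContext` and `AddViaLocal` are hypothetical names, not Arrow code. Whether a given compiler actually keeps the flag in a register depends on inlining and aliasing analysis. The sketch only shows the difference in visibility and makes no claim about generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical context: its error flag is visible to anyone holding a
// pointer to the context, not only to this function's caller.
struct Context {
  bool overflow = false;
};

// Writes x + y to *out (0 on overflow) and returns true if the sum overflowed.
inline bool Overflows(int32_t x, int32_t y, int32_t* out) {
  const int64_t wide = static_cast<int64_t>(x) + static_cast<int64_t>(y);
  const bool overflow = wide < std::numeric_limits<int32_t>::min() ||
                        wide > std::numeric_limits<int32_t>::max();
  *out = overflow ? 0 : static_cast<int32_t>(wide);
  return overflow;
}

// (a) The flag lives in memory reachable through ctx. Keeping it in a
// register across iterations requires the compiler to reason that nothing
// else reads or writes it during the loop.
void AddViaContext(Context* ctx, const int32_t* a, const int32_t* b,
                   int32_t* out, std::size_t n) {
  assert(ctx != nullptr);
  for (std::size_t i = 0; i < n; ++i) {
    ctx->overflow |= Overflows(a[i], b[i], &out[i]);
  }
}

// (b) The flag is a local variable that only the return value exposes,
// which makes it an easy candidate for a register.
bool AddViaLocal(const int32_t* a, const int32_t* b, int32_t* out,
                 std::size_t n) {
  bool overflow = false;
  for (std::size_t i = 0; i < n; ++i) {
    overflow |= Overflows(a[i], b[i], &out[i]);
  }
  return overflow;
}
```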

@cyb70289
Contributor Author

Rebased and fixed merge conflicts.

@cyb70289
Contributor Author

The MinGW32 Python test timeout looks like a spurious issue that happens occasionally.

Member

@bkietz bkietz left a comment

Thanks a lot for this cleanup, looks great!

Two minor nits:

@cyb70289
Contributor Author

The RTools CI failure looks unrelated. @nealrichardson

@pitrou
Member

pitrou commented Apr 27, 2021

Really a nice improvement, thank you!

@pitrou
Member

pitrou commented Apr 27, 2021

I do think the RTools failures are unrelated (I see them on other PRs), so will merge.

@pitrou pitrou closed this in 29130ca Apr 27, 2021
@cyb70289 cyb70289 deleted the 11990-kernel-error-handling branch April 27, 2021 09:30
@nealrichardson
Member

FTR, the R failure is:

  Insufficient package version (submitted: 3.0.0.9000, existing: 4.0.0)
  Version contains large components (3.0.0.9000)

This is because the 4.0.0 release just hit CRAN, and apparently we have not bumped the version numbers post-release on master yet. @kszucs?
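
The first message comes down to version ordering. The sketch below assumes versions compare numerically, one component at a time, which is the ordering R applies to package versions. Under that rule, `3.0.0.9000` starts with 3 and therefore sorts below the already published `4.0.0`, whatever its `.9000` suffix. `ParseVersion` and `VersionLess` are hypothetical helpers written for illustration. They are not CRAN's actual check code.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a dotted version such as "3.0.0.9000"
// into its numeric components {3, 0, 0, 9000}.
std::vector<long> ParseVersion(const std::string& version) {
  std::vector<long> parts;
  std::stringstream stream(version);
  std::string item;
  while (std::getline(stream, item, '.')) {
    assert(!item.empty());
    parts.push_back(std::stol(item));
  }
  return parts;
}

// Compares component by component as integers using std::vector's
// lexicographic operator<: the first differing component decides.
bool VersionLess(const std::string& a, const std::string& b) {
  return ParseVersion(a) < ParseVersion(b);
}
```

Under the same ordering, a development version above the release (for example `4.0.0.9000`) sorts after `4.0.0`. That is consistent with bumping the version on master as the fix.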


5 participants