Skip to content

Conversation

@drin
Copy link
Contributor

@drin drin commented Jun 30, 2022

We would like to expose hashing functions via new compute functions. This PR would add a scalar compute function, FastHash32, which uses Hashing32::HashMultiColumn to compute a 32-bit hash for each row of the input array (or scalar).

This PR focuses on 32-bit hashing, but I would like to use this PR to help decide if a different design is preferable before adding 64-bit hashing. I also am not sure what should be in the unit test except for code coverage.

Potential design changes:

  • Using an Options argument.
  • Using an init function for the kernel (but I don't know the input length until run time).
  • Changing how hashing is used.

Confusion about unit tests:

  • I'm not sure how to validate the hash outputs are "as expected". Further, unit tests for the hashing functions don't seem to validate hash outputs.
  • I'm not sure validating the output data types is necessary.
  • Code coverage of the various input types seems to be the most important, but I need help to figure out the best way to approach this.

Edit (8/9/22):

This PR has been open awhile but seems to have hit a good level of functionality. This PR does not:

  • address whether the hash functions are "acceptable" statistically
  • comprehensively benchmark, or test, the various data types

I think these can be addressed in future improvements:

@github-actions
Copy link

@drin drin changed the title ARROW-8991: [C++] Add new scalar compute function for hashing ARROW-16945: [C++] Add new scalar compute function for 32-bit hashing Jun 30, 2022
@westonpace
Copy link
Member

I'm not sure how to validate the hash outputs are "as expected". Further, unit tests for the hashing functions don't seem to validate hash outputs.

For a hashing function I would expect:

  • If two values are equal then their hashes are equal
  • Given a random selection of non-equal values there should be some kind of expected false positive rate (e.g. equal hashes on unequal values). Ideally we would include, as part of this, a benchmark that measures the FPR on random values. You could then take then, pick a safe threshold (e.g. if the benchmark tends to show a 5% FPR then pick 10%) and put that into the unit test (e.g. assert the FPR is less than the safe threshold).

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@westonpace
Copy link
Member

@save-buffer would you mind reviewing this when you get a chance?

@drin
Copy link
Contributor Author

drin commented Jun 30, 2022

I renamed this PR for ARROW-16945, so that I can track just a 32-bit version for now. Not sure if this will confuse some of the automated issue tracking/associations

@pitrou
Copy link
Member

pitrou commented Jul 19, 2022

  • Given a random selection of non-equal values

I'd add that non-random but likely selections should also show a nice hash distribution. Including:

  • ranges of consecutive [0, N] numbers
  • clustered strings (for example a subset of /usr/share/dict/words, or something generated that looks like that)

Also, ideally, the bottom K bits of the hash should also keep those nice statistical properties (so that a hash table implementation can simply use a power-of-two modulo).

The implementations from arrow/util/hashing.h try to satisfy those requirements, btw, so you could reuse them.

@pitrou
Copy link
Member

pitrou commented Jul 19, 2022

Why 32-bit rather than 64-bit?

@drin
Copy link
Contributor Author

drin commented Jul 19, 2022

Why 32-bit rather than 64-bit?

I think Weston mentioned 32-bit in passing and I just made a semi-arbitrary decision to start there. I plan on following this up with a 64-bit version, though not sure if other things should be addressed before I do so (memory management, etc.).

@westonpace
Copy link
Member

I'd add that non-random but likely selections should also show a nice hash distribution. Including:
The implementations from arrow/util/hashing.h try to satisfy those requirements, btw, so you could reuse them.

It would be great, by the way, to have benchmarks of the two hashing utilities we have. I believe the utilities we have in key_hash.h trade off some distribution performance in favor of runtime performance when compared with the utilities in hashing.h. It would be nice to have some objective measures of this tradeoff.

Also, can we expand in the function doc that these hashes are not suitable for cryptographic purposes?

@pitrou
Copy link
Member

pitrou commented Jul 19, 2022

There's https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/hashing_benchmark.cc at least

@drin
Copy link
Contributor Author

drin commented Jul 19, 2022

Thanks @pitrou , I'll start with that as a base and add benchmarks that use the utilities from key_hash.h (I'll see if these also exist elsewhere).

I recently was exposed to hashing.h when looking at the count_distinct function, so I should be able to compare them. Maybe I'll skeleton another compute function, StandardHash (arbitrary, temporary name to distinguish from FastHash), and we can use it to better consider how many hash functions we should have vs what should be knobs available via an Options class.

@drin drin force-pushed the ARROW-8991-newfn-scalar-hash branch from 6af47cc to 7787bca Compare July 20, 2022 03:27
@drin
Copy link
Contributor Author

drin commented Jul 20, 2022

rebased

@drin
Copy link
Contributor Author

drin commented Jul 20, 2022

I wanted to update with initial changes to hashing_benchmark.cc. To make sure comparisons covered at least a few cases, I decided to add FastHash64, although it is currently very copy pasta.

Barring code style/organization improvements, I was hoping to get some feedback on whether these benchmarks are fundamentally what we would like to see. Some questions I have:

  • how do I ensure I am not including the time to construct an arrow::Array from a vector in the benchmark time?
  • do we want to time calling the function, or just the mechanisms that the compute function wraps?
    • do we want to do both to also capture any overhead in compute kernel invocation?

@drin
Copy link
Contributor Author

drin commented Jul 20, 2022

Also, I see now that util/hashing.h wraps xxHash, vendored from Cyan4973's github repo. Hashing32 has the following comment above it:

// Implementations are based on xxh3 32-bit algorithm description from:
// https://github.com/Cyan4973/xxHash/blob/dev/doc/xxhash_spec.md

I need to experiment a bit, but I suspect that for int primitives, Hashing32 and Hashing64 just wrap xxHash and add logic for iterating over rows and columns. Specifically, key_hash.cc#L745 looks a lot like hashing.h#L88. hashing_benchmark.cc#L70 benchmarks using 2 multipliers, and I think if it uses just the 1 multiplier then the implementation looks a lot like Hashing64.

All this to say that I think it should be accommodated in the benchmark, but I also wonder if some of the hashing code can be simplified/better organized in case they aren't as independent as they seem at first glance.

cc @michalursa

@westonpace
Copy link
Member

Here is my understanding, which may not be complete:

util/hashing.h is a more-or-less direct adapter between Arrow and xxhash.

key_hash.cc is based on xxhash but most of the algorithms have been reimplemented to some degree. The key_hash.cc utilities are, in theory, better suited to take advantage of columnar formats and/or vectorized CPUs. However, as part of this rewrite, certain sacrifices were made in hash quality, in favor of performance. This is primarily (only?) used today in Acero for the hash-join and hash-aggregate.

In theory, util/hashing.cc should have a better distribution but worse performance than key_hash.cc.

@save-buffer
Copy link
Contributor

Weston is correct, the key_hash interface is only meant for internal use of hash tables. It subtly doesn't match xxh in some cases (iirc it for example can pad with 0's instead of having special handling for arbitrary lengths). The interface is also optimized for hashing in mini batches and reusing as much memory as possible. In general I would suggest using util/hashing instead of key_hash, as key_hash is for internal use; we may want to change it at any time (e.g. we had an idea to see if we can make a really crappy hash that's very fast to compute but is "good enough" for a hash table).

I can leave a review of the current code, but I'd suggest moving away from the key_hash interface overall.

Copy link
Contributor

@save-buffer save-buffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Biggest feedback is to not allocate a large temporary std::vector and instead write directly into the buffer you'll be returning. Also you made your TempVectorStack way too big.

@drin
Copy link
Contributor Author

drin commented Jul 20, 2022

Biggest feedback is to not allocate a large temporary std::vector and instead write directly into the buffer you'll be returning. Also you made your TempVectorStack way too big.

thanks! I had troubles with smaller sizes and tried to not size it too large, but I'll revisit.

For writing directly into the buffer to return, I assume that means I need to create an ArrayData? is there a better interface via Builders?

@drin
Copy link
Contributor Author

drin commented Jul 20, 2022

never mind, you actually mentioned how to do it already. the mobile interface hid it

@pitrou
Copy link
Member

pitrou commented Jul 21, 2022

util/hashing.h is a more-or-less direct adapter between Arrow and xxhash.

That's how it ended up working in practice for non-tiny strings, but the basic objective is to have some fast though statistically good hash functions.

@ianmcook
Copy link
Member

ianmcook commented Jul 21, 2022

@drin could you please add entries for these functions in the Compute Functions docs (cpp/compute.rst)? Thanks!

@drin
Copy link
Contributor Author

drin commented Jul 22, 2022

@ursabot please benchmark lang=C++

@ursabot
Copy link

ursabot commented Jul 22, 2022

Benchmark runs are scheduled for baseline = 4d931ff and contender = 3c1fd3b03fa2b143d582244ca1bb93fbc0c84bcf. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Only ['Python'] langs are supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.69% ⬆️0.17%] test-mac-arm
[Skipped ⚠️ Only ['JavaScript', 'Python', 'R'] langs are supported on ursa-i9-9960x] ursa-i9-9960x
[Finished ⬇️0.46% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3c1fd3b0 test-mac-arm
[Finished] 3c1fd3b0 ursa-thinkcentre-m75q
[Failed] 4d931ff1 ec2-t3-xlarge-us-east-2
[Failed] 4d931ff1 test-mac-arm
[Failed] 4d931ff1 ursa-i9-9960x
[Finished] 4d931ff1 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@drin
Copy link
Contributor Author

drin commented Jul 22, 2022

before figuring out the conbench stuff, I put some of the local benchmark results in a google slide:

https://docs.google.com/presentation/d/1cUU_F3jB6LsOLbClhl34YdQiodtbz7l76l3juHTsC5k/edit#slide=id.g13e9d117f47_0_63

Just wanted to put this in as a sneak peek even though the test size is small (10,000 items) and this is only for a couple primitives.

@drin
Copy link
Contributor Author

drin commented Jul 22, 2022

@ursabot please benchmark lang=C++

@ursabot
Copy link

ursabot commented Jul 22, 2022

Commit 3c1fd3b03fa2b143d582244ca1bb93fbc0c84bcf already has scheduled benchmark runs.

Cerdore and others added 12 commits January 25, 2024 17:11
…ark (apache#39794)

### Rationale for this change

`ScanOnlyBench` use 2 kind of options to scan, but ignore the function `MakeScanNode` will return 4 extra columns.

### What changes are included in this PR?

Change the expected result of bench `ScanOnlyBench`

### Are these changes tested?

They covered by existing tests.
* Closes: apache#39765

Authored-by: xiansen.chen <xiansen.chen@openpie.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change

Fix a conversion warning that fails compiling arrow-dataset-file-benchmark on MSVC.

### Are these changes tested?

Yes, by CI.

### Are there any user-facing changes?

No.

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…ey hash avx2 (apache#39800)

### Rationale for this change

Issue apache#39778 seems caused by a careless (but hard to spot) bug in key hash avx2.

### What changes are included in this PR?

Fix the careless bug.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

No.

* Closes: apache#39778

Authored-by: Ruoxi Sun <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…to int32 (apache#39528)

Be defensive instead of writing invalid data.

### Rationale for this change

Users can provide this API pages that are large to validly write and we silently truncate lengths before writing.

### What changes are included in this PR?

Add validations and throw an exception if sizes are too large (this was previously checked only if page indexes are being build).

### Are these changes tested?

Unit tested

### Are there any user-facing changes?

This might start raising exceptions instead of writing out invalid parquet files.

Closes apache#39527 

**This PR contains a "Critical Fix".** 
* Closes: apache#39527

Lead-authored-by: emkornfield <emkornfield@gmail.com>
Co-authored-by: Micah Kornfield <micahk@google.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
…roto for building arrays (apache#39721)

This PR updates the ArrowReaderHelper to use an ArrowField object for building an Array instead of a protobuf field obj.  This removes leveraging protobuf from building out the Arrays and makes the code easier to reuse (like for the C Data Interface)
* Closes: apache#39720

Authored-by: Alva Bandy <abandy@live.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…lve() (apache#39817)

### Rationale for this change

There has been interest in improving operations on chunked-arrays and even though `ChunkResolver::Resolve()` is not a big contributor in most kernels, the fact that it can be used from tight loops warrants careful attention to branch prediction and memory effects of its implementation.

### What changes are included in this PR?

 - Documentation of invariants and behavior of functions
 - Multiple optimizations justified by microbenchmarks
 - Addition of a variation of `Resolve` that takes a hint as parameter
 - Fix of an out-of-bounds memory access that doesn't affect correctness (it can only reduce effectiveness of cache in very rare situations, but is nevertheless an issue)

### Are these changes tested?

Yes, by existing tests.

### Are there any user-facing changes?

 - The `arrow::internal::ChunkResolver::Bisect()` function was `protected` and is now `private` with a different signature
* Closes: apache#39815

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
…ntervals (apache#39795)

### Rationale for this change

The filter and take functions were not correctly supported on month_day_nano intervals.

### What changes are included in this PR?

* Expand the primitive filter implementation to handle all possible fixed-width primitive types (including fixed-size binary)
* Expand the take filter implementation to handle all well-known fixed-width primitive types (including month_day_nano, decimal128 and decimal256)
* Add benchmarks for taking and filtering fixed-size binary

These changes allow for very significant performance improvements filtering and taking fixed-size binary data:
```
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Non-regressions: (90)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                              benchmark           baseline          contender  change %                                                                                                                                                                                                                                                                   counters
          FilterFixedSizeBinaryFilterNoNulls/524288/0/8      1.716 GiB/sec     33.814 GiB/sec  1870.862      {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2462, 'byte_width': 8.0, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
   TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1/8 380.056M items/sec   7.098G items/sec  1767.491                                {'family_index': 3, 'per_family_instance_index': 6, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 505, 'byte_width': 8.0, 'null_percent': 100.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/0/9      1.916 GiB/sec     33.721 GiB/sec  1659.766      {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2750, 'byte_width': 9.0, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 99.9}
          FilterFixedSizeBinaryFilterNoNulls/524288/9/8    917.713 MiB/sec      9.193 GiB/sec   925.719    {'family_index': 0, 'per_family_instance_index': 18, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/9/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1271, 'byte_width': 8.0, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 99.9}
         FilterFixedSizeBinaryFilterNoNulls/524288/12/8      1.004 GiB/sec      9.374 GiB/sec   833.673   {'family_index': 0, 'per_family_instance_index': 24, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/12/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1440, 'byte_width': 8.0, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 99.9}
          FilterFixedSizeBinaryFilterNoNulls/524288/3/8      1.625 GiB/sec     15.009 GiB/sec   823.442      {'family_index': 0, 'per_family_instance_index': 6, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/3/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2328, 'byte_width': 8.0, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
          FilterFixedSizeBinaryFilterNoNulls/524288/9/9   1021.638 MiB/sec      9.126 GiB/sec   814.670    {'family_index': 0, 'per_family_instance_index': 19, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/9/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1428, 'byte_width': 9.0, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 99.9}
          FilterFixedSizeBinaryFilterNoNulls/524288/6/8      1.235 GiB/sec     10.814 GiB/sec   775.869     {'family_index': 0, 'per_family_instance_index': 12, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/6/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1762, 'byte_width': 8.0, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 99.9}
         FilterFixedSizeBinaryFilterNoNulls/524288/12/9      1.123 GiB/sec      9.120 GiB/sec   712.196   {'family_index': 0, 'per_family_instance_index': 25, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/12/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1598, 'byte_width': 9.0, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 99.9}
          FilterFixedSizeBinaryFilterNoNulls/524288/6/9      1.370 GiB/sec     10.499 GiB/sec   666.348     {'family_index': 0, 'per_family_instance_index': 13, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/6/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1958, 'byte_width': 9.0, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 99.9}
          FilterFixedSizeBinaryFilterNoNulls/524288/3/9      1.814 GiB/sec     13.394 GiB/sec   638.343      {'family_index': 0, 'per_family_instance_index': 7, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/3/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 2600, 'byte_width': 9.0, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 99.9}
          FilterFixedSizeBinaryFilterNoNulls/524288/2/8     12.155 GiB/sec     77.799 GiB/sec   540.051      {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17222, 'byte_width': 8.0, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 1.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/2/9     13.507 GiB/sec     84.361 GiB/sec   524.592      {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 19469, 'byte_width': 9.0, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 1.0}
         TakeFixedSizeBinaryMonotonicIndices/524288/1/8 194.493M items/sec 732.378M items/sec   276.557                                      {'family_index': 4, 'per_family_instance_index': 6, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 259, 'byte_width': 8.0, 'null_percent': 100.0}
     TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1/8 200.981M items/sec 747.628M items/sec   271.989                                  {'family_index': 2, 'per_family_instance_index': 6, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 268, 'byte_width': 8.0, 'null_percent': 100.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/0/8    947.631 MiB/sec      3.318 GiB/sec   258.565    {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1329, 'byte_width': 8.0, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 99.9}
        FilterFixedSizeBinaryFilterWithNulls/524288/3/8    911.406 MiB/sec      3.121 GiB/sec   250.677    {'family_index': 1, 'per_family_instance_index': 6, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/3/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1275, 'byte_width': 8.0, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 99.9}
          FilterFixedSizeBinaryFilterNoNulls/524288/1/8      1.045 GiB/sec      3.535 GiB/sec   238.406      {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1496, 'byte_width': 8.0, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 50.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/6/8    899.161 MiB/sec      2.915 GiB/sec   232.029   {'family_index': 1, 'per_family_instance_index': 12, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/6/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1260, 'byte_width': 8.0, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 99.9}
        FilterFixedSizeBinaryFilterWithNulls/524288/9/8    829.852 MiB/sec      2.617 GiB/sec   222.914  {'family_index': 1, 'per_family_instance_index': 18, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/9/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1157, 'byte_width': 8.0, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 99.9}
     TakeFixedSizeBinaryRandomIndicesNoNulls/524288/0/8 234.268M items/sec 752.809M items/sec   221.345                                    {'family_index': 2, 'per_family_instance_index': 8, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 312, 'byte_width': 8.0, 'null_percent': 0.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/1/9      1.171 GiB/sec      3.711 GiB/sec   216.957      {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1674, 'byte_width': 9.0, 'data null%': 0.0, 'mask null%': 0.0, 'select%': 50.0}
         TakeFixedSizeBinaryMonotonicIndices/524288/0/8 249.393M items/sec 787.274M items/sec   215.676                                        {'family_index': 4, 'per_family_instance_index': 8, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 333, 'byte_width': 8.0, 'null_percent': 0.0}
   TakeFixedSizeBinaryRandomIndicesWithNulls/524288/0/8 234.268M items/sec 736.727M items/sec   214.481                                  {'family_index': 3, 'per_family_instance_index': 8, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/0/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 313, 'byte_width': 8.0, 'null_percent': 0.0}
      TakeFixedSizeBinaryMonotonicIndices/524288/1000/8 134.852M items/sec 423.748M items/sec   214.231                                     {'family_index': 4, 'per_family_instance_index': 0, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/1000/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 202, 'byte_width': 8.0, 'null_percent': 0.1}
       FilterFixedSizeBinaryFilterWithNulls/524288/12/8    913.734 MiB/sec      2.599 GiB/sec   191.245 {'family_index': 1, 'per_family_instance_index': 24, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/12/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1292, 'byte_width': 8.0, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 99.9}
  TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1000/8 138.218M items/sec 309.307M items/sec   123.783                                 {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1000/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 184, 'byte_width': 8.0, 'null_percent': 0.1}
TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1000/8 132.755M items/sec 293.027M items/sec   120.727                               {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1000/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 179, 'byte_width': 8.0, 'null_percent': 0.1}
    TakeFixedSizeBinaryRandomIndicesNoNulls/524288/10/8 125.492M items/sec 272.996M items/sec   117.540                                  {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 174, 'byte_width': 8.0, 'null_percent': 10.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/9/9    926.938 MiB/sec      1.904 GiB/sec   110.379  {'family_index': 1, 'per_family_instance_index': 19, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/9/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1295, 'byte_width': 9.0, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 99.9}
        TakeFixedSizeBinaryMonotonicIndices/524288/10/8 158.754M items/sec 331.106M items/sec   108.565                                      {'family_index': 4, 'per_family_instance_index': 2, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 167, 'byte_width': 8.0, 'null_percent': 10.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/0/9      1.031 GiB/sec      2.129 GiB/sec   106.621    {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1477, 'byte_width': 9.0, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 99.9}
        FilterFixedSizeBinaryFilterWithNulls/524288/3/9   1020.776 MiB/sec      2.056 GiB/sec   106.293    {'family_index': 1, 'per_family_instance_index': 7, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/3/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1430, 'byte_width': 9.0, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 99.9}
        FilterFixedSizeBinaryFilterWithNulls/524288/4/8    890.785 MiB/sec      1.768 GiB/sec   103.293    {'family_index': 1, 'per_family_instance_index': 8, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/4/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1242, 'byte_width': 8.0, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 50.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/6/9   1005.839 MiB/sec      1.984 GiB/sec   102.023   {'family_index': 1, 'per_family_instance_index': 13, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/6/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1407, 'byte_width': 9.0, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 99.9}
        FilterFixedSizeBinaryFilterWithNulls/524288/1/8    916.810 MiB/sec      1.762 GiB/sec    96.757    {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/1/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1270, 'byte_width': 8.0, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 50.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/7/8    890.211 MiB/sec      1.694 GiB/sec    94.853   {'family_index': 1, 'per_family_instance_index': 14, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/7/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1235, 'byte_width': 8.0, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 50.0}
     TakeFixedSizeBinaryRandomIndicesNoNulls/524288/2/8  95.788M items/sec 184.004M items/sec    92.095                                   {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 124, 'byte_width': 8.0, 'null_percent': 50.0}
       FilterFixedSizeBinaryFilterWithNulls/524288/10/8    862.497 MiB/sec      1.616 GiB/sec    91.823 {'family_index': 1, 'per_family_instance_index': 20, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1200, 'byte_width': 8.0, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 50.0}
       FilterFixedSizeBinaryFilterWithNulls/524288/12/9      1.005 GiB/sec      1.904 GiB/sec    89.431 {'family_index': 1, 'per_family_instance_index': 25, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/12/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1442, 'byte_width': 9.0, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 99.9}
         TakeFixedSizeBinaryMonotonicIndices/524288/2/8 123.065M items/sec 228.755M items/sec    85.881                                       {'family_index': 4, 'per_family_instance_index': 4, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 164, 'byte_width': 8.0, 'null_percent': 50.0}
         FilterFixedSizeBinaryFilterNoNulls/524288/10/8    930.637 MiB/sec      1.669 GiB/sec    83.659   {'family_index': 0, 'per_family_instance_index': 20, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1293, 'byte_width': 8.0, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 50.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/4/8      1.034 GiB/sec      1.871 GiB/sec    81.019      {'family_index': 0, 'per_family_instance_index': 8, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/4/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1482, 'byte_width': 8.0, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 50.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/7/8   1004.789 MiB/sec      1.772 GiB/sec    80.538     {'family_index': 0, 'per_family_instance_index': 14, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/7/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1404, 'byte_width': 8.0, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 50.0}
       FilterFixedSizeBinaryFilterWithNulls/524288/13/8    920.819 MiB/sec      1.616 GiB/sec    79.686 {'family_index': 1, 'per_family_instance_index': 26, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/13/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1285, 'byte_width': 8.0, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 50.0}
         FilterFixedSizeBinaryFilterNoNulls/524288/13/8    974.713 MiB/sec      1.669 GiB/sec    75.388   {'family_index': 0, 'per_family_instance_index': 26, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/13/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1363, 'byte_width': 8.0, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 50.0}
  TakeFixedSizeBinaryRandomIndicesWithNulls/524288/10/8 107.165M items/sec 187.372M items/sec    74.845                                {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/10/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 143, 'byte_width': 8.0, 'null_percent': 10.0}
   TakeFixedSizeBinaryRandomIndicesWithNulls/524288/2/8  72.662M items/sec 114.781M items/sec    57.965                                  {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 96, 'byte_width': 8.0, 'null_percent': 50.0}
       FilterFixedSizeBinaryFilterWithNulls/524288/10/9    976.180 MiB/sec      1.480 GiB/sec    55.260 {'family_index': 1, 'per_family_instance_index': 21, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1358, 'byte_width': 9.0, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 50.0}
         FilterFixedSizeBinaryFilterNoNulls/524288/10/9      1.023 GiB/sec      1.581 GiB/sec    54.502   {'family_index': 0, 'per_family_instance_index': 21, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1466, 'byte_width': 9.0, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 50.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/4/9    992.477 MiB/sec      1.453 GiB/sec    49.957    {'family_index': 1, 'per_family_instance_index': 9, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/4/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1400, 'byte_width': 9.0, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 50.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/7/9    997.679 MiB/sec      1.450 GiB/sec    48.846   {'family_index': 1, 'per_family_instance_index': 15, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/7/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1389, 'byte_width': 9.0, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 50.0}
         FilterFixedSizeBinaryFilterNoNulls/524288/13/9      1.071 GiB/sec      1.581 GiB/sec    47.526   {'family_index': 0, 'per_family_instance_index': 27, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/13/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1538, 'byte_width': 9.0, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 50.0}
       FilterFixedSizeBinaryFilterWithNulls/524288/13/9      1.008 GiB/sec      1.485 GiB/sec    47.328 {'family_index': 1, 'per_family_instance_index': 27, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/13/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1446, 'byte_width': 9.0, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 50.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/1/9      1.003 GiB/sec      1.452 GiB/sec    44.708    {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1437, 'byte_width': 9.0, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 50.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/7/9      1.105 GiB/sec      1.568 GiB/sec    41.954     {'family_index': 0, 'per_family_instance_index': 15, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/7/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1587, 'byte_width': 9.0, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 50.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/4/9      1.163 GiB/sec      1.613 GiB/sec    38.639      {'family_index': 0, 'per_family_instance_index': 9, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/4/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1662, 'byte_width': 9.0, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 50.0}
       FilterFixedSizeBinaryFilterWithNulls/524288/14/9      8.884 GiB/sec     12.117 GiB/sec    36.381 {'family_index': 1, 'per_family_instance_index': 29, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/14/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12508, 'byte_width': 9.0, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 1.0}
       FilterFixedSizeBinaryFilterWithNulls/524288/11/9      8.886 GiB/sec     12.075 GiB/sec    35.892 {'family_index': 1, 'per_family_instance_index': 23, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/11/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12716, 'byte_width': 9.0, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 1.0}
      TakeFixedSizeBinaryMonotonicIndices/524288/1000/9 134.765M items/sec 182.868M items/sec    35.694                                     {'family_index': 4, 'per_family_instance_index': 1, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/1000/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 206, 'byte_width': 9.0, 'null_percent': 0.1}
          FilterFixedSizeBinaryFilterNoNulls/524288/5/8     11.393 GiB/sec     15.091 GiB/sec    32.453     {'family_index': 0, 'per_family_instance_index': 10, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/5/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 16510, 'byte_width': 8.0, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 1.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/8/8     11.573 GiB/sec     15.102 GiB/sec    30.496     {'family_index': 0, 'per_family_instance_index': 16, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/8/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 16684, 'byte_width': 8.0, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 1.0}
       FilterFixedSizeBinaryFilterWithNulls/524288/11/8      7.740 GiB/sec     10.059 GiB/sec    29.956 {'family_index': 1, 'per_family_instance_index': 22, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/11/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10972, 'byte_width': 8.0, 'data null%': 10.0, 'mask null%': 5.0, 'select%': 1.0}
       FilterFixedSizeBinaryFilterWithNulls/524288/14/8      7.733 GiB/sec      9.915 GiB/sec    28.213 {'family_index': 1, 'per_family_instance_index': 28, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/14/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10991, 'byte_width': 8.0, 'data null%': 90.0, 'mask null%': 5.0, 'select%': 1.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/5/8      7.682 GiB/sec      9.765 GiB/sec    27.109   {'family_index': 1, 'per_family_instance_index': 10, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/5/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 10991, 'byte_width': 8.0, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 1.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/8/9      8.856 GiB/sec     11.180 GiB/sec    26.241   {'family_index': 1, 'per_family_instance_index': 17, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/8/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12571, 'byte_width': 9.0, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 1.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/8/8      7.735 GiB/sec      9.710 GiB/sec    25.530   {'family_index': 1, 'per_family_instance_index': 16, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/8/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11069, 'byte_width': 8.0, 'data null%': 1.0, 'mask null%': 5.0, 'select%': 1.0}
        TakeFixedSizeBinaryMonotonicIndices/524288/10/9 128.606M items/sec 160.249M items/sec    24.604                                      {'family_index': 4, 'per_family_instance_index': 3, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 209, 'byte_width': 9.0, 'null_percent': 10.0}
         FilterFixedSizeBinaryFilterNoNulls/524288/11/8     12.033 GiB/sec     14.737 GiB/sec    22.478   {'family_index': 0, 'per_family_instance_index': 22, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/11/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17220, 'byte_width': 8.0, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 1.0}
         FilterFixedSizeBinaryFilterNoNulls/524288/14/8     12.141 GiB/sec     14.761 GiB/sec    21.579   {'family_index': 0, 'per_family_instance_index': 28, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/14/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 17343, 'byte_width': 8.0, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 1.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/5/9      8.825 GiB/sec     10.633 GiB/sec    20.489   {'family_index': 1, 'per_family_instance_index': 11, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/5/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 12543, 'byte_width': 9.0, 'data null%': 0.1, 'mask null%': 5.0, 'select%': 1.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/2/8      8.300 GiB/sec      9.969 GiB/sec    20.117    {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/2/8', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11819, 'byte_width': 8.0, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 1.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/5/9     12.954 GiB/sec     15.192 GiB/sec    17.273     {'family_index': 0, 'per_family_instance_index': 11, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/5/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 18572, 'byte_width': 9.0, 'data null%': 0.1, 'mask null%': 0.0, 'select%': 1.0}
          FilterFixedSizeBinaryFilterNoNulls/524288/8/9     13.181 GiB/sec     15.222 GiB/sec    15.490     {'family_index': 0, 'per_family_instance_index': 17, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/8/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 18904, 'byte_width': 9.0, 'data null%': 1.0, 'mask null%': 0.0, 'select%': 1.0}
        FilterFixedSizeBinaryFilterWithNulls/524288/2/9      9.344 GiB/sec     10.632 GiB/sec    13.784    {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'FilterFixedSizeBinaryFilterWithNulls/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13291, 'byte_width': 9.0, 'data null%': 0.0, 'mask null%': 5.0, 'select%': 1.0}
         FilterFixedSizeBinaryFilterNoNulls/524288/11/9     13.566 GiB/sec     14.894 GiB/sec     9.789   {'family_index': 0, 'per_family_instance_index': 23, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/11/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 19349, 'byte_width': 9.0, 'data null%': 10.0, 'mask null%': 0.0, 'select%': 1.0}
         FilterFixedSizeBinaryFilterNoNulls/524288/14/9     13.603 GiB/sec     14.863 GiB/sec     9.265   {'family_index': 0, 'per_family_instance_index': 29, 'run_name': 'FilterFixedSizeBinaryFilterNoNulls/524288/14/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 19490, 'byte_width': 9.0, 'data null%': 90.0, 'mask null%': 0.0, 'select%': 1.0}
    TakeFixedSizeBinaryRandomIndicesNoNulls/524288/10/9 124.390M items/sec 133.566M items/sec     7.377                                  {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 164, 'byte_width': 9.0, 'null_percent': 10.0}
         TakeFixedSizeBinaryMonotonicIndices/524288/2/9 116.792M items/sec 124.182M items/sec     6.328                                       {'family_index': 4, 'per_family_instance_index': 5, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 161, 'byte_width': 9.0, 'null_percent': 50.0}
  TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1000/9 135.860M items/sec 142.524M items/sec     4.905                                 {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1000/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 180, 'byte_width': 9.0, 'null_percent': 0.1}
TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1000/9 131.123M items/sec 137.400M items/sec     4.788                               {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1000/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 176, 'byte_width': 9.0, 'null_percent': 0.1}
     TakeFixedSizeBinaryRandomIndicesNoNulls/524288/0/9 220.634M items/sec 230.872M items/sec     4.640                                    {'family_index': 2, 'per_family_instance_index': 9, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 295, 'byte_width': 9.0, 'null_percent': 0.0}
     TakeFixedSizeBinaryRandomIndicesNoNulls/524288/2/9  97.425M items/sec 101.477M items/sec     4.159                                   {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 130, 'byte_width': 9.0, 'null_percent': 50.0}
  TakeFixedSizeBinaryRandomIndicesWithNulls/524288/10/9 104.830M items/sec 108.346M items/sec     3.354                                {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/10/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 100, 'byte_width': 9.0, 'null_percent': 10.0}
   TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1/9 378.858M items/sec 387.322M items/sec     2.234                                {'family_index': 3, 'per_family_instance_index': 7, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 506, 'byte_width': 9.0, 'null_percent': 100.0}
   TakeFixedSizeBinaryRandomIndicesWithNulls/524288/0/9 221.900M items/sec 226.450M items/sec     2.050                                  {'family_index': 3, 'per_family_instance_index': 9, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 295, 'byte_width': 9.0, 'null_percent': 0.0}
         TakeFixedSizeBinaryMonotonicIndices/524288/0/9 248.664M items/sec 253.037M items/sec     1.758                                        {'family_index': 4, 'per_family_instance_index': 9, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/0/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 332, 'byte_width': 9.0, 'null_percent': 0.0}
     TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1/9 197.730M items/sec 201.173M items/sec     1.741                                  {'family_index': 2, 'per_family_instance_index': 7, 'run_name': 'TakeFixedSizeBinaryRandomIndicesNoNulls/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 264, 'byte_width': 9.0, 'null_percent': 100.0}
   TakeFixedSizeBinaryRandomIndicesWithNulls/524288/2/9  73.196M items/sec  74.167M items/sec     1.327                                  {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'TakeFixedSizeBinaryRandomIndicesWithNulls/524288/2/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 96, 'byte_width': 9.0, 'null_percent': 50.0}
         TakeFixedSizeBinaryMonotonicIndices/524288/1/9 192.545M items/sec 188.138M items/sec    -2.289                                      {'family_index': 4, 'per_family_instance_index': 7, 'run_name': 'TakeFixedSizeBinaryMonotonicIndices/524288/1/9', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 257, 'byte_width': 9.0, 'null_percent': 100.0}

```

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: apache#39740

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
This commit ports the latest state of scalar_hash kernels without
pulling a long development history.

This kernel is an element-wise function that uses an xxHash-like
algorithm, prioritizing speed and not suitable for cryptographic
purposes. The function is implemented by the `FastHashScalar` struct
which is templated by the output type (which is assumed to be either
UInt32 or UInt64, but there is no validation of that at the moment).

The benchmarks in scalar_hash_benchmark.cc uses the hashing_benchmark.cc
file as a reference (in cpp/src/arrow/util/), but only covers various
input types and the key hashing functions (from key_hash.h).

The tests in scalar_hash_test.cc use a simplified version of hashing
based on what is implemented in the key_hash.cc. The idea being that the
high-level entry points for high-level types should eventually reach an
expected application of the low-level hash functions on simple data
types; the tests do this exact comparison. At the moment, the tests pass
for simple cases, but they do not work for nested types with non-trivial
row layouts (e.g. ListTypes).

Issue: ARROW-8991
Issue: apacheGH-17211
This commit pulls the latest changes to key_hash.h and implementations
in light_array without the burden of a long development history.

The only change in key_hash.h is the addition of a friend function which
is used in scalar_hash_test.cc.

Changes in light_array.[h,cc] are to accommodate two scenarios: (1) the
use of ArraySpan, which was introduced after light_array was written;
and (2) the need for a KeyColumnArray to allocate data for the purposes
of interpreting (or decoding) the structure of a nested type.

The main reason for the 2nd scenario is that a ListArray may have many
values represented in a single row which should be hashed together;
however, if the ListArray has a nested ListArray or other type, the row
may have further structure. In the simplest interpretation, only the
highest-level structure (the "outer" ListArray) needs to be preserved,
and any further nested structures must be explicitly handled by custom
kernels (or any future options, etc. that are upstreamed into Arrow).

In trying to efficiently interpret complex nested types, ArraySpan can
be useful because it is non-owning, thus the main reason for the 1st
aforementioned scenario.

Although unfinished, any tests added to light_array_test.cc should
accommodate the 2 scenarios above.
This commit includes changes to register a new compute function without
the burden of a long development history.

The change to cpp/src/arrow/CMakeLists.txt includes scalar_hash.cc in
compilation as it is used by the new Hash64 function defined in
api_scalar.[h,cc].

The change to cpp/src/arrow/compute/kernels/CMakeLists.txt includes
scalar_hash_test.cc in compilation for tests and it also adds a new
benchmark binary that is implemented by scalar_hash_benchmark.cc.

The registry files are updated to register the kernel implementations in
scalar_hash.cc with the function definitions in api_scalar.[h,cc].

Finally, docs/source/cpp/compute.rst adds documentation for the Hash64
function.

Issue: apacheGH-17211
Issue: ARROW-8991
This commit includes additions to the general hashing benchmarks that
cover the use of hashing functions in key_hash.h without carrying the
burden of a long dev history.

Some existing benchmark names were changed to distinguish between the
use of Int32 and Int64 types, new benchmarks were added that use the
functions declared in key_hash.h. The reason the new benchmarks are
added is because it is claimed they prioritize speed over cryptography
as they're primarily used for join algorithms and other processing
tasks, which the hashing benchmark can now provide observability for.

Issue: apacheGH-17211
Issue: ARROW-8991
… ARROW-8991-newfn-scalar-hash-fresh

using a merge commit to overwrite commit history while maintaining
current state of code
@drin
Copy link
Contributor Author

drin commented Jan 29, 2024

alright, I attempted to force change of commit history by using
git merge -s ort -Xtheirs ARROW-8991-newfn-scalar-hash-fresh

While this seems to be the "correct way", clearly it makes the PR look gratuitous. I'm going to close this one, and refer to it from a new PR. I'll try to update everything as appropriate.

@drin drin closed this Jan 29, 2024
@drin drin changed the title GH-17211: [C++] Add hash_64 scalar compute function GH-17211: [C++] Add hash_64 scalar compute function (old) Jan 29, 2024
@github-actions
Copy link

⚠️ GitHub issue #17211 has been automatically assigned in GitHub to PR creator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++][Compute] Add scalar_hash function