perf: add optimized zip implementation for scalars #8653
This is useful for `IF <expr> THEN <scalar> ELSE <scalar> END`
TODO:
- [ ] Need to add comments if missing
- [ ] Add benchmark
    let scalars: Vec<T::Native> = predicate
        .iter()
        .map(|b| if b { then_val } else { else_val })
        .collect();
This will probably use conditional move
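The remark about a conditional move refers to the per-element select in the snippet above: both arms are already-computed values, so the compiler can emit a select instead of a branch. A minimal std-only sketch of that shape follows, assuming a plain `&[bool]` in place of `BooleanArray`; the name `zip_scalars` is hypothetical and not taken from the PR.

```rust
/// Hypothetical helper: pick `then_val` where the predicate is true and
/// `else_val` otherwise. A plain `&[bool]` stands in for Arrow's `BooleanArray`.
fn zip_scalars<T: Copy>(predicate: &[bool], then_val: T, else_val: T) -> Vec<T> {
    predicate
        .iter()
        // Both arms are plain values with no side effects, so the compiler is
        // free to lower this to a select / conditional move rather than a branch.
        .map(|&b| if b { then_val } else { else_val })
        .collect()
}

fn main() {
    let out = zip_scalars(&[true, false, true], 1i32, -1i32);
    println!("{:?}", out);
}
```

Whether a conditional move is actually emitted depends on the target and optimization level, so this is worth confirming in the generated assembly rather than assuming.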
arrow-select/src/zip.rs
Outdated
    fn combine_nulls_and_false(predicate: &BooleanArray) -> BooleanBuffer {
        if let Some(nulls) = predicate.nulls().filter(|n| n.null_count() > 0) {
            predicate.values().bitand(
                // nulls are represented as 0 (false) in the values buffer
                nulls.inner(),
            )
        } else {
            predicate.values().clone()
        }
    }
I'm pretty sure there is already a helper function in arrow for this
I believe it is prep_null_mask_filter: https://github.com/apache/arrow-rs/blob/a0bbe7faaad6303355c5e9461f91a177e267861f/arrow-select/src/filter.rs#L122-L121
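The helper under discussion folds the validity bitmap into the values bitmap so that null rows behave like `false`. A rough std-only sketch of that idea follows, using `u64` words as bitmaps; the `Mask` struct is a hypothetical stand-in for a boolean array and is not the arrow-rs API.

```rust
/// Hypothetical stand-in for a boolean array: one bit per row in `values`,
/// plus an optional validity bitmap where a 0 bit marks a null row.
struct Mask {
    values: Vec<u64>,
    validity: Option<Vec<u64>>,
}

/// Returns a bitmap that is 1 only where the row is both valid and true,
/// so null rows end up indistinguishable from `false` rows.
fn combine_nulls_and_false(mask: &Mask) -> Vec<u64> {
    match &mask.validity {
        // AND each word of the values with the matching word of the validity.
        Some(validity) => mask
            .values
            .iter()
            .zip(validity)
            .map(|(v, n)| v & n)
            .collect(),
        // No validity bitmap means no nulls: the values are already the answer.
        None => mask.values.clone(),
    }
}

fn main() {
    let mask = Mask {
        values: vec![0b1011],
        validity: Some(vec![0b0110]),
    };
    println!("{:#06b}", combine_nulls_and_false(&mask)[0]);
}
```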
# Which issue does this PR close?
N/A
# Rationale for this change
I have a PR to improve zip perf for scalar but I don't see any benchmarks for it:
- #8653
# What changes are included in this PR?
created zip benchmarks for scalar and non scalar with different masks
# Are these changes tested?
N/A
# Are there any user-facing changes?
Nope
@alamb If you want to run the benchmarks for zip: there are no more optimizations left for this PR, only cleanups, tests and comments.
I saw major improvements for scalars, while the array and scalar case showed a regression for some reason (maybe the extra check, even though it is a simple comparison?). I ran it on bare metal to reduce noise as much as possible.
I tested it on (`neofetch` output):
- OS: Ubuntu 24.04.3 LTS x86_64
- Host: c5.metal 1.0
- Kernel: 6.14.0-1011-aws
- Uptime: 3 hours, 46 mins
- Packages: 921 (dpkg), 5 (snap)
- Shell: bash 5.2.21
- Terminal: /dev/pts/0
- CPU: Intel Xeon Platinum 8275CL (96) @ 3.900GHz
- Memory: 2144MiB / 193025MiB
this will be used in: - apache#8653
# Conflicts: # arrow-buffer/src/buffer/mutable.rs
…e-zip-for-scalars
I am struggling to find enough contiguous focus time to review these PRs. They are on my radar, I just can't review them as fast as I want to. Hopefully other people will be able to help review too.
alamb
left a comment
Thank you @rluvaton -- this is (also) great 🚀
I think the only thing really needed is additional test coverage for the fallback impl and the special cases in BytesScalarImpl
Adding an implementation for ByteView types (Utf8View and BinaryView) will likely also improve performance a lot, but we can file a follow on ticket to track doing so -- this is better than what is currently on main
    /// - either Datum is not a scalar (or has more than 1 element)
    ///
    pub fn try_new(truthy: &dyn Datum, falsy: &dyn Datum) -> Result<Self, ArrowError> {
        let (truthy, truthy_is_scalar) = truthy.get();
you could potentially avoid the redundant call to truthy.get() and falsy.get() by returning Result<Option<Self>, ArrowError> (returning None if either argument was non scalar)
Wouldn't it complicate things for no real benefit, since Datum.get is really cheap?
One benefit is that it might be easier to see that the code uses Datum::get consistently.
I agree that Datum.get is cheap and it is a value judgement about if the change is a net improvement.
No changes are needed from my perspective, I just wanted to mention it
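For readers weighing the suggestion in this thread, here is a small sketch of the `Result<Option<Self>, _>` constructor shape: `Ok(None)` means "not a scalar pair, use the general path", while `Err` is reserved for genuinely invalid input. All types here (`Input`, `ScalarPair`) and the empty-input error are hypothetical illustrations, not the PR's code.

```rust
/// Hypothetical stand-in for a `Datum`: some values plus a scalar flag.
struct Input {
    values: Vec<i64>,
    is_scalar: bool,
}

#[derive(Debug, PartialEq)]
struct ScalarPair {
    truthy: i64,
    falsy: i64,
}

impl ScalarPair {
    /// `Ok(Some(_))`: both sides are single-element scalars.
    /// `Ok(None)`: at least one side is not, so the caller should fall back.
    /// `Err(_)`: the inputs are unusable (an assumed rule for this sketch).
    fn try_new(truthy: &Input, falsy: &Input) -> Result<Option<Self>, String> {
        if truthy.values.is_empty() || falsy.values.is_empty() {
            return Err("inputs must not be empty".to_string());
        }
        let both_scalar = truthy.is_scalar
            && truthy.values.len() == 1
            && falsy.is_scalar
            && falsy.values.len() == 1;
        if !both_scalar {
            return Ok(None);
        }
        Ok(Some(ScalarPair {
            truthy: truthy.values[0],
            falsy: falsy.values[0],
        }))
    }
}

fn main() {
    let t = Input { values: vec![1], is_scalar: true };
    let f = Input { values: vec![2, 3], is_scalar: false };
    // The caller inspects the inputs once and branches on the outcome.
    match ScalarPair::try_new(&t, &f) {
        Ok(Some(pair)) => println!("fast path: {:?}", pair),
        Ok(None) => println!("general path"),
        Err(e) => println!("error: {}", e),
    }
}
```

The trade-off raised in the thread still applies: this keeps the scalar check in one place, at the cost of a slightly more involved return type.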
    let zip_impl = downcast_primitive! {
        truthy.data_type() => (primitive_size_helper),
        DataType::Utf8 => {
A natural extension of this work would be to add a special case for DataType::Utf8View and DataType::BinaryView (as a follow on PR)
That would likely be super fast for many cases as it could simply copy views around and pre-compute the value buffer.
I'll file a follow on ticket
    let true_repeat_count = end - start;
    // fill with truthy values
    mutable.repeat_slice_n_times(truthy_val, true_repeat_count);
having to copy the same iterator so many times is quite unfortunate and is what Utf8View is designed to avoid -- you can have a single copy of the string and then copy them around
This is explained in blog form here if you are not familiar with them: https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/
Yeah, I agree, but if scalars tend to be short (at most 12 bytes) then it won't be faster, and possibly even slower due to the indirections, as the bytes are inlined anyway, no?
I do think Utf8View is less efficient for some short string usecases.
In any event I think the caller can decide what they want to use
- I filed "Implement special case `zip` with scalar for Utf8View" #8724 to track
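To make the inlining point in this thread concrete, here is a std-only sketch of the 16-byte view used by string-view arrays: a 4-byte little-endian length, then either up to 12 inline bytes or a 4-byte prefix plus a buffer index and offset. The function `make_view` is a hypothetical helper written for illustration, not an arrow-rs function.

```rust
/// Build a 16-byte view for `data`.
/// Bytes 0..4 hold the length (little-endian). Values of at most 12 bytes are
/// stored inline in bytes 4..16; longer values store a 4-byte prefix, then the
/// index of the data buffer and the offset of the value inside that buffer.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    if data.len() <= 12 {
        // Short value: the bytes live in the view itself.
        view[4..4 + data.len()].copy_from_slice(data);
    } else {
        // Long value: only a prefix and a pointer-like (buffer, offset) pair.
        view[4..8].copy_from_slice(&data[0..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

fn main() {
    let short = make_view(b"yes", 0, 0);
    let long = make_view(b"a considerably longer value", 0, 0);
    // Zipping two scalars then only copies fixed-size views, never the bytes.
    let zipped: Vec<[u8; 16]> = [true, false, true]
        .iter()
        .map(|&b| if b { short } else { long })
        .collect();
    println!("{} views of 16 bytes each", zipped.len());
}
```

With this layout a scalar zip writes one 16-byte view per row regardless of the value length, which is the saving the follow-on ticket is after.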
arrow-select/src/zip.rs
Outdated
    }

    fn get_bytes_and_offset_for_all_same_value(
        predicate: &BooleanBuffer,
I found it a little confusing at first that predicate was passed in but only its length is used. Maybe passing in the len would make it clearer that the callsite doesn't need to negate the predicate as is needed in get_scalar_and_null_buffer_for_single_non_nullable
updated
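As an illustration of why only the length is needed in the "all rows take the same value" case, here is a std-only sketch that builds the value bytes and offsets for `len` copies of one value. The function name and the `Vec`-based return type are assumptions made for this sketch; the PR works on Arrow buffers.

```rust
/// Hypothetical std-only version: build the value bytes and the offsets for
/// `len` copies of `value`, the layout a variable-length byte array uses.
/// Offset overflow for very large outputs is not handled in this sketch.
fn bytes_and_offsets_for_all_same_value(len: usize, value: &[u8]) -> (Vec<u8>, Vec<i32>) {
    // `len` copies of the value, back to back.
    let bytes = value.repeat(len);
    // `len + 1` offsets: 0, n, 2n, ... where n is the value length.
    let step = value.len() as i32;
    let offsets = (0..=len as i32).map(|i| i * step).collect();
    (bytes, offsets)
}

fn main() {
    let (bytes, offsets) = bytes_and_offsets_for_all_same_value(3, b"ab");
    println!("{:?} {:?}", bytes, offsets);
}
```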
arrow-select/src/zip.rs
Outdated
        }
    }

    impl<T: ByteArrayType> BytesScalarImpl<T> {
is there a reason this is in its own impl block (not in the same as above?)
nope, updated
        fn create_output(&self, input: &BooleanArray) -> Result<ArrayRef, ArrowError>;
    }

    #[derive(Debug, PartialEq)]
Full report:
report.zip
Thank you, you found a bug
        }
    }

    fn get_scalar_and_null_buffer_for_single_non_nullable(
While adding more tests I noticed a bug: nulls in the mask are not treated as false for arrays. Creating a fix now.
@alamb I've updated the code and added a lot of tests; this PR is ready for review.
…he underlying bit value (#8711)
# Which issue does this PR close?
- closes #8721
# Rationale for this change
mask `nulls` should be treated as `false` (even if the underlying values are not 0) as described in the docs for zip
# What changes are included in this PR?
used `prep_null_mask_filter` before iterating over the mask, added tests for both scalar and non scalar (to prepare for #8653)
# Are these changes tested?
Yes
# Are there any user-facing changes?
Kinda
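The rule this fix enforces, that a null mask entry selects the falsy side, can be sketched with plain slices. This is an illustration only: `Option<bool>` models a nullable mask entry, and the function name is hypothetical rather than part of arrow-rs.

```rust
/// Hypothetical sketch of zip semantics with a nullable mask:
/// `Some(true)` takes the truthy row; `Some(false)` and `None` both take the
/// falsy row, regardless of what bit might sit under a null entry.
fn zip_with_nullable_mask<T: Copy>(mask: &[Option<bool>], truthy: &[T], falsy: &[T]) -> Vec<T> {
    assert!(mask.len() == truthy.len() && mask.len() == falsy.len());
    mask.iter()
        .zip(truthy.iter().zip(falsy.iter()))
        // A null mask entry is treated exactly like `false`.
        .map(|(&m, (&t, &f))| if m.unwrap_or(false) { t } else { f })
        .collect()
}

fn main() {
    let mask = [Some(true), None, Some(false)];
    let out = zip_with_nullable_mask(&mask, &[1, 2, 3], &[10, 20, 30]);
    println!("{:?}", out);
}
```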
alamb
left a comment
Thank you @rluvaton -- this is looking great
It looks like there are some conflicts with this PR -- once they are addressed we can merge it in.
🤖
# Conflicts: # arrow-select/src/filter.rs # arrow-select/src/zip.rs
@alamb conflicts resolved
🤖: Benchmark completed
Good, we are faster by up to 200 times
Thanks @rluvaton
…8963)
# Which issue does this PR close?
- Closes #8724
# Rationale for this change
It's explained in the issue.
# What changes are included in this PR?
This adds a special implementation for Utf8View/BinaryView scalars for zip based on the design from #8653. It also includes tests. Benchmarks are available here:
- #8988
# Are these changes tested?
Yes.
# Are there any user-facing changes?
There is a new struct `ByteViewScalarImpl`.
<details close>
<summary>Benchmarks</summary>
System: Apple M1 Max with 10 cores on macOS 26.1
```
group  branch  main
-----  ------  ----
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/10pct_true  1.00  3.5±0.04µs  ? ?/sec  37.06  128.9±1.36µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/1pct_true  1.00  3.5±0.07µs  ? ?/sec  35.76  125.1±1.76µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/50pct_nulls  1.00  3.7±0.12µs  ? ?/sec  36.91  136.8±2.17µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/50pct_true  1.00  3.5±0.06µs  ? ?/sec  40.30  139.9±2.11µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/90pct_true  1.00  3.6±0.10µs  ? ?/sec  30.57  108.5±2.62µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/99pct_true  1.00  3.5±0.05µs  ? ?/sec  28.40  99.8±2.12µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/all_false  1.00  3.5±0.02µs  ? ?/sec  36.04  127.4±3.14µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/all_true  1.00  3.5±0.08µs  ? ?/sec  27.39  97.1±1.11µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/10pct_true  1.00  28.2±0.37µs  ? ?/sec  2.70  75.9±0.61µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/1pct_true  1.00  7.2±0.24µs  ? ?/sec  9.89  71.4±12.56µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/50pct_nulls  1.00  51.0±2.97µs  ? ?/sec  1.75  89.4±2.50µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/50pct_true  1.00  62.1±1.00µs  ? ?/sec  1.61  99.7±4.68µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/90pct_true  1.00  28.8±0.64µs  ? ?/sec  2.63  75.7±1.22µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/99pct_true  1.00  7.7±0.11µs  ? ?/sec  8.98  69.0±0.74µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/all_false  1.00  3.7±0.13µs  ? ?/sec  19.06  69.8±1.55µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/all_true  1.00  3.6±0.10µs  ? ?/sec  18.90  68.0±1.12µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/10pct_true  1.00  3.8±0.07µs  ? ?/sec  28.85  108.4±3.09µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/1pct_true  1.00  3.8±0.09µs  ? ?/sec  25.83  98.7±2.71µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/50pct_nulls  1.00  3.9±0.06µs  ? ?/sec  32.25  127.3±7.41µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/50pct_true  1.00  3.7±0.06µs  ? ?/sec  37.66  139.5±3.00µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/90pct_true  1.00  3.8±0.16µs  ? ?/sec  34.52  129.5±1.53µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/99pct_true  1.00  3.7±0.05µs  ? ?/sec  33.83  124.8±1.28µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/all_false  1.00  3.8±0.09µs  ? ?/sec  26.08  98.8±2.02µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/all_true  1.00  3.8±0.08µs  ? ?/sec  32.56  123.9±1.48µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/10pct_true  1.00  3.6±0.06µs  ? ?/sec  36.09  129.8±6.06µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/1pct_true  1.00  3.6±0.35µs  ? ?/sec  34.05  122.9±5.06µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_nulls  1.00  3.7±0.12µs  ? ?/sec  36.77  137.9±5.49µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_true  1.00  3.6±0.09µs  ? ?/sec  38.23  137.4±3.35µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/90pct_true  1.00  3.6±0.06µs  ? ?/sec  29.20  104.8±1.64µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/99pct_true  1.00  3.6±0.15µs  ? ?/sec  26.94  96.9±2.73µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/all_false  1.00  3.6±0.05µs  ? ?/sec  34.97  127.5±5.81µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/all_true  1.00  3.8±1.05µs  ? ?/sec  24.98  95.0±2.14µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/10pct_true  1.00  28.9±0.46µs  ? ?/sec  2.69  77.7±1.57µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/1pct_true  1.00  7.3±0.09µs  ? ?/sec  9.81  71.6±1.96µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/50pct_nulls  1.00  50.3±1.16µs  ? ?/sec  1.74  87.7±1.14µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/50pct_true  1.00  63.5±1.44µs  ? ?/sec  1.59  100.7±1.97µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/90pct_true  1.00  29.8±0.48µs  ? ?/sec  2.64  78.6±2.85µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/99pct_true  1.00  8.2±0.12µs  ? ?/sec  8.54  69.7±0.91µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/all_false  1.00  3.8±0.07µs  ? ?/sec  18.77  71.6±1.51µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/all_true  1.00  3.8±0.11µs  ? ?/sec  18.31  68.8±1.10µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/10pct_true  1.00  3.8±0.07µs  ? ?/sec  27.36  104.3±1.35µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/1pct_true  1.00  3.8±0.07µs  ? ?/sec  24.86  94.8±1.12µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/50pct_nulls  1.00  4.0±0.04µs  ? ?/sec  29.84  117.9±1.34µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/50pct_true  1.00  3.9±0.21µs  ? ?/sec  35.19  137.1±3.87µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/90pct_true  1.00  3.8±0.06µs  ? ?/sec  32.78  125.8±1.73µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/99pct_true  1.00  3.8±0.11µs  ? ?/sec  31.87  121.5±1.47µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/all_false  1.00  3.8±0.07µs  ? ?/sec  25.36  95.5±1.89µs  ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/all_true  1.00  3.9±0.20µs  ? ?/sec  30.83  121.7±3.36µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/10pct_true  1.00  3.7±0.73µs  ? ?/sec  35.72  132.2±6.77µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/1pct_true  1.00  3.6±0.04µs  ? ?/sec  35.35  125.8±2.79µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_nulls  1.00  3.8±0.11µs  ? ?/sec  36.05  136.0±2.59µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_true  1.00  3.6±0.13µs  ? ?/sec  39.36  142.5±6.32µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/90pct_true  1.00  3.6±0.11µs  ? ?/sec  29.63  107.5±2.03µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/99pct_true  1.00  3.6±0.08µs  ? ?/sec  28.40  102.2±6.74µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/all_false  1.00  3.6±0.05µs  ? ?/sec  34.83  126.0±2.12µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/all_true  1.00  3.6±0.05µs  ? ?/sec  27.38  98.6±1.62µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/10pct_true  1.00  29.9±2.79µs  ? ?/sec  2.51  75.1±0.98µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/1pct_true  1.00  7.2±0.16µs  ? ?/sec  9.48  68.3±1.01µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/50pct_nulls  1.00  50.5±1.90µs  ? ?/sec  1.68  84.6±1.27µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/50pct_true  1.00  64.4±0.60µs  ? ?/sec  1.53  98.6±1.71µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/90pct_true  1.00  29.7±0.61µs  ? ?/sec  2.57  76.1±1.15µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/99pct_true  1.00  7.9±0.09µs  ? ?/sec  8.89  70.5±2.13µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/all_false  1.00  3.7±0.06µs  ? ?/sec  18.31  67.8±0.86µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/all_true  1.00  3.7±0.06µs  ? ?/sec  18.35  67.9±1.16µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/10pct_true  1.00  3.8±0.12µs  ? ?/sec  28.20  107.5±2.55µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/1pct_true  1.00  3.9±0.16µs  ? ?/sec  25.73  99.5±2.19µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/50pct_nulls  1.00  4.1±0.14µs  ? ?/sec  29.98  122.2±2.27µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/50pct_true  1.00  3.8±0.08µs  ? ?/sec  37.05  140.1±2.01µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/90pct_true  1.00  3.9±0.20µs  ? ?/sec  33.52  131.8±3.10µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/99pct_true  1.00  3.8±0.09µs  ? ?/sec  33.55  127.6±3.56µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/all_false  1.00  3.8±0.08µs  ? ?/sec  26.47  100.8±5.55µs  ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/all_true  1.00  3.9±0.06µs  ? ?/sec  32.05  124.6±2.16µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/10pct_true  1.00  3.6±0.40µs  ? ?/sec  35.16  126.4±1.92µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/1pct_true  1.00  3.5±0.07µs  ? ?/sec  35.43  123.6±4.98µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/50pct_nulls  1.00  3.7±0.06µs  ? ?/sec  36.06  132.4±1.80µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/50pct_true  1.00  3.6±0.06µs  ? ?/sec  38.44  136.9±2.82µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/90pct_true  1.00  3.5±0.04µs  ? ?/sec  29.82  105.2±2.25µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/99pct_true  1.00  3.5±0.08µs  ? ?/sec  27.48  96.9±1.69µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/all_false  1.00  3.6±0.12µs  ? ?/sec  33.80  123.0±2.52µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/all_true  1.00  3.6±0.14µs  ? ?/sec  26.74  95.0±1.74µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/10pct_true  1.00  27.9±0.32µs  ? ?/sec  2.65  73.9±1.31µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/1pct_true  1.00  6.9±0.09µs  ? ?/sec  9.64  67.0±0.92µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/50pct_nulls  1.00  49.0±0.60µs  ? ?/sec  1.73  84.7±2.45µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/50pct_true  1.00  62.4±2.22µs  ? ?/sec  1.56  97.1±2.37µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/90pct_true  1.00  28.7±0.37µs  ? ?/sec  2.59  74.1±1.17µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/99pct_true  1.00  7.8±0.20µs  ? ?/sec  8.69  67.7±1.34µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/all_false  1.00  3.6±0.09µs  ? ?/sec  18.78  68.2±2.16µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/all_true  1.00  3.6±0.05µs  ? ?/sec  19.10  68.4±11.77µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/10pct_true  1.00  3.8±0.21µs  ? ?/sec  27.30  104.1±1.34µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/1pct_true  1.00  3.7±0.04µs  ? ?/sec  25.76  95.8±2.00µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/50pct_nulls  1.00  4.2±0.96µs  ? ?/sec  28.05  118.0±1.17µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/50pct_true  1.00  3.9±0.13µs  ? ?/sec  35.42  136.6±3.78µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/90pct_true  1.00  3.8±0.10µs  ? ?/sec  33.31  125.5±1.89µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/99pct_true  1.00  3.8±0.04µs  ? ?/sec  32.36  121.6±1.80µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/all_false  1.00  3.7±0.04µs  ? ?/sec  25.64  95.1±0.98µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/all_true  1.00  3.9±0.07µs  ? ?/sec  31.19  121.2±2.69µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/10pct_true  1.00  3.5±0.04µs  ? ?/sec  35.69  126.5±2.89µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/1pct_true  1.00  3.6±0.05µs  ? ?/sec  33.84  120.9±1.68µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_nulls  1.00  3.7±0.10µs  ? ?/sec  35.72  133.2±3.49µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_true  1.00  3.6±0.12µs  ? ?/sec  38.28  136.0±2.11µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/90pct_true  1.00  3.5±0.06µs  ? ?/sec  29.81  104.4±1.56µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/99pct_true  1.00  3.5±0.08µs  ? ?/sec  27.69  98.1±2.86µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/all_false  1.00  3.6±0.10µs  ? ?/sec  33.58  122.3±1.77µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/all_true  1.00  3.5±0.08µs  ? ?/sec  26.79  94.7±1.02µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/10pct_true  1.00  29.0±0.51µs  ? ?/sec  2.59  75.1±1.08µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/1pct_true  1.00  7.4±0.10µs  ? ?/sec  9.41  69.2±1.76µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/50pct_nulls  1.00  50.2±0.54µs  ? ?/sec  1.70  85.2±1.17µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/50pct_true  1.00  64.1±1.59µs  ? ?/sec  1.51  96.9±1.22µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/90pct_true  1.00  29.8±0.36µs  ? ?/sec  2.55  75.9±2.47µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/99pct_true  1.00  8.2±0.17µs  ? ?/sec  8.24  67.8±1.11µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/all_false  1.00  3.8±0.07µs  ? ?/sec  17.96  68.8±1.15µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/all_true  1.00  3.8±0.12µs  ? ?/sec  17.37  66.1±0.97µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/10pct_true  1.00  3.8±0.27µs  ? ?/sec  27.57  105.2±3.06µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/1pct_true  1.00  3.7±0.08µs  ? ?/sec  25.44  94.8±0.94µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/50pct_nulls  1.00  3.9±0.07µs  ? ?/sec  30.10  118.6±2.83µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/50pct_true  1.00  3.9±0.30µs  ? ?/sec  35.20  135.6±1.67µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/90pct_true  1.00  3.9±0.55µs  ? ?/sec  32.58  125.9±2.14µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/99pct_true  1.00  3.8±0.36µs  ? ?/sec  32.47  122.9±4.15µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/all_false  1.00  3.8±0.10µs  ? ?/sec  25.24  94.9±0.97µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/all_true  1.00  3.8±0.09µs  ? ?/sec  31.58  120.3±1.65µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/10pct_true  1.00  3.5±0.04µs  ? ?/sec  37.39  131.4±4.74µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/1pct_true  1.00  3.5±0.09µs  ? ?/sec  35.84  126.8±3.56µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/50pct_nulls  1.00  3.7±0.06µs  ? ?/sec  37.15  137.8±3.16µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/50pct_true  1.00  3.5±0.06µs  ? ?/sec  39.19  138.9±4.82µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/90pct_true  1.00  3.6±0.04µs  ? ?/sec  30.30  107.9±5.71µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/99pct_true  1.00  3.6±0.05µs  ? ?/sec  27.33  97.7±2.10µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/all_false  1.00  3.6±0.06µs  ? ?/sec  34.64  124.7±2.24µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/all_true  1.00  3.7±0.19µs  ? ?/sec  26.17  96.9±1.75µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/10pct_true  1.00  28.7±0.55µs  ? ?/sec  2.66  76.2±1.45µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/1pct_true  1.00  7.2±0.12µs  ? ?/sec  9.58  69.0±0.80µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/50pct_nulls  1.00  49.5±1.15µs  ? ?/sec  1.75  86.8±2.09µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/50pct_true  1.00  62.6±0.88µs  ? ?/sec  1.65  103.4±16.82µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/90pct_true  1.00  29.1±0.49µs  ? ?/sec  2.69  78.3±2.51µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/99pct_true  1.00  7.8±0.09µs  ? ?/sec  9.01  70.2±1.72µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/all_false  1.00  3.7±0.06µs  ? ?/sec  18.77  68.7±0.73µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/all_true  1.00  3.6±0.10µs  ? ?/sec  18.73  68.2±1.44µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/10pct_true  1.00  3.9±0.11µs  ? ?/sec  27.68  106.9±2.29µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/1pct_true  1.00  3.9±0.19µs  ? ?/sec  26.12  101.9±8.79µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/50pct_nulls  1.00  4.1±0.07µs  ? ?/sec  29.91  122.7±3.28µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/50pct_true  1.00  3.8±0.14µs  ? ?/sec  36.82  141.4±3.69µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/90pct_true  1.00  3.8±0.10µs  ? ?/sec  34.15  131.4±2.99µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/99pct_true  1.00  3.8±0.06µs  ? ?/sec  32.89  125.2±3.21µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/all_false  1.00  3.8±0.06µs  ? ?/sec  26.05  99.2±2.30µs  ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/all_true  1.00  4.0±0.33µs  ? ?/sec  32.00  126.7±25.05µs  ? ?/sec
```
</details>


Waiting for the PRs below to be merged first:
- `zip` kernel benchmarks #8654 - zip benchmarks

This PR includes the following other PRs (unless merged) to make the review easier, so please make sure to review them first:
- `repeat_slice_n_times` to `MutableBuffer` #8658 - extracted from this
# Which issue does this PR close?
N/A
# Rationale for this change
Making zip really fast for scalars
This is useful for `IF <expr> THEN <literal> ELSE <literal> END`
# What changes are included in this PR?
Created a couple of implementations for zipping scalars: primitive, bytes, and a fallback
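As a rough illustration of the bytes case, the sketch below zips two byte-string scalars by walking runs of identical mask values and copying the chosen scalar once per row in each run. It is a std-only approximation under stated assumptions: the function name is hypothetical, a `&[bool]` stands in for the mask, and `Vec`s stand in for Arrow buffers and `repeat_slice_n_times`.

```rust
/// Hypothetical sketch: zip two byte-string scalars under a boolean mask,
/// producing the value bytes and offsets of a variable-length byte array.
fn zip_bytes_scalars(mask: &[bool], truthy: &[u8], falsy: &[u8]) -> (Vec<u8>, Vec<i32>) {
    let mut bytes = Vec::new();
    let mut offsets = Vec::with_capacity(mask.len() + 1);
    offsets.push(0i32);

    let mut start = 0;
    while start < mask.len() {
        // Find the end of the current run of identical mask values.
        let flag = mask[start];
        let mut end = start + 1;
        while end < mask.len() && mask[end] == flag {
            end += 1;
        }
        let value = if flag { truthy } else { falsy };
        // Append the chosen scalar once per row in the run.
        bytes.extend_from_slice(&value.repeat(end - start));
        // Record the end offset of every row in the run.
        for _ in start..end {
            let last = *offsets.last().unwrap();
            offsets.push(last + value.len() as i32);
        }
        start = end;
    }
    (bytes, offsets)
}

fn main() {
    let (bytes, offsets) = zip_bytes_scalars(&[true, true, false], b"ab", b"xyz");
    println!("{:?} {:?}", String::from_utf8_lossy(&bytes), offsets);
}
```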
# Are these changes tested?
existing tests
# Are there any user-facing changes?
new struct `ScalarZipper`
TODO: