Skip to content

Conversation

@tobixdev
Copy link
Contributor

@tobixdev tobixdev commented Dec 3, 2025

Which issue does this PR close?

Rationale for this change

Fix incorrect behavior of take kernel for FixedSizeBinary arrays.

What changes are included in this PR?

Check that the index is valid in the iterator for the result array.

Are these changes tested?

Yes, there is a new test.

Are there any user-facing changes?

Yes, the kernel behaves correctly.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 3, 2025
@tobixdev
Copy link
Contributor Author

tobixdev commented Dec 3, 2025

This might slow down the take kernel. A second approach I've thought of is that we keep the same iterator and apply the null mask at the end (only if the null count is > 0 for indices).

@alamb Could you maye be start the (EDIT: take kernel) benchmarks so we know the effect of this change? Thanks!

@alamb
Copy link
Contributor

alamb commented Dec 10, 2025

run benchmark take_kernels

@alamb-ghbot
Copy link

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing take-fsb-null-indices (10425c7) to 1c90efe diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=take-fsb-null-indices
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

group                                                                     main                                   take-fsb-null-indices
-----                                                                     ----                                   ---------------------
take bool 1024                                                            1.00   1335.4±8.42ns        ? ?/sec    1.00  1341.1±53.88ns        ? ?/sec
take bool 512                                                             1.00   735.3±31.78ns        ? ?/sec    1.00   733.0±18.79ns        ? ?/sec
take bool null indices 1024                                               1.02  1652.4±10.54ns        ? ?/sec    1.00  1621.9±55.89ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.08µs        ? ?/sec    1.00      2.6±0.01µs        ? ?/sec
take bool null values null indices 1024                                   1.00      3.1±0.13µs        ? ?/sec    1.06      3.3±0.05µs        ? ?/sec
take check bounds i32 1024                                                1.00  1623.1±37.35ns        ? ?/sec    1.00  1624.2±13.15ns        ? ?/sec
take check bounds i32 512                                                 1.00    807.1±4.49ns        ? ?/sec    1.01   811.2±17.09ns        ? ?/sec
take i32 1024                                                             1.00    711.0±2.04ns        ? ?/sec    1.01   715.8±23.29ns        ? ?/sec
take i32 512                                                              1.00    389.5±5.54ns        ? ?/sec    1.12    436.2±7.66ns        ? ?/sec
take i32 null indices 1024                                                1.01   999.9±24.26ns        ? ?/sec    1.00    992.5±3.65ns        ? ?/sec
take i32 null values 1024                                                 1.00      2.0±0.01µs        ? ?/sec    1.00      2.0±0.02µs        ? ?/sec
take i32 null values null indices 1024                                    1.02      2.6±0.10µs        ? ?/sec    1.00      2.6±0.03µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.00      8.4±0.08µs        ? ?/sec    1.29     10.9±0.36µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.00      8.5±0.10µs        ? ?/sec    1.75     14.8±0.38µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.01     20.9±0.17µs        ? ?/sec    1.00     20.8±0.80µs        ? ?/sec
take str 1024                                                             1.00     11.0±0.06µs        ? ?/sec    1.00     10.9±0.17µs        ? ?/sec
take str 512                                                              1.00      5.3±0.02µs        ? ?/sec    1.00      5.3±0.04µs        ? ?/sec
take str null indices 1024                                                1.00      7.0±0.10µs        ? ?/sec    1.02      7.1±0.13µs        ? ?/sec
take str null indices 512                                                 1.00      3.3±0.02µs        ? ?/sec    1.02      3.4±0.01µs        ? ?/sec
take str null values 1024                                                 1.01      8.7±0.10µs        ? ?/sec    1.00      8.6±0.06µs        ? ?/sec
take str null values null indices 1024                                    1.00      6.3±0.07µs        ? ?/sec    1.02      6.4±0.14µs        ? ?/sec
take stringview 1024                                                      1.00    777.2±9.62ns        ? ?/sec    1.01    783.0±7.73ns        ? ?/sec
take stringview 512                                                       1.00    478.7±5.50ns        ? ?/sec    1.00    477.2±1.94ns        ? ?/sec
take stringview null indices 1024                                         1.00  1377.1±15.56ns        ? ?/sec    1.01  1391.7±29.53ns        ? ?/sec
take stringview null indices 512                                          1.00    695.1±3.90ns        ? ?/sec    1.02   710.9±45.82ns        ? ?/sec
take stringview null values 1024                                          1.00      2.1±0.06µs        ? ?/sec    1.00      2.1±0.02µs        ? ?/sec
take stringview null values null indices 1024                             1.00      2.9±0.04µs        ? ?/sec    1.01      2.9±0.02µs        ? ?/sec

@tobixdev
Copy link
Contributor Author

take primitive fsb value len: 12, indices: 1024 1.00 8.4±0.08µs ? ?/sec 1.29 10.9±0.36µs ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024 1.00 8.5±0.10µs ? ?/sec 1.75 14.8±0.38µs ? ?/sec

Thanks! I guess this confirms the performance regression. I'll give the other approach (applying nulls at the end) a shot.

@alamb
Copy link
Contributor

alamb commented Dec 10, 2025

I am always a bit suspicious about benchmarks in the usec range as they are so suseptable to noise. I'll run some more

@alamb
Copy link
Contributor

alamb commented Dec 10, 2025

run benchmark take_kernels

@apache apache deleted a comment from alamb-ghbot Dec 10, 2025
@alamb-ghbot
Copy link

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing take-fsb-null-indices (885f31e) to 1c90efe diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=take-fsb-null-indices
Results will be posted here when complete

@tobixdev
Copy link
Contributor Author

I am always a bit suspicious about benchmarks in the usec range as they are so suseptable to noise. I'll run some more

I agree but I think it's possible that the initial change introduced a performance regression. I've pushed a new version that performed better on my PC so hopefully we can see that in the benchmarks. 🤞

Nice robotic sheep for the bot btw! 🐑

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

group                                                                     main                                   take-fsb-null-indices
-----                                                                     ----                                   ---------------------
take bool 1024                                                            1.00   1331.7±5.48ns        ? ?/sec    1.00   1333.7±7.89ns        ? ?/sec
take bool 512                                                             1.00    731.1±4.81ns        ? ?/sec    1.00    731.8±3.22ns        ? ?/sec
take bool null indices 1024                                               1.00  1635.1±23.98ns        ? ?/sec    1.05  1709.1±36.39ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.06µs        ? ?/sec    1.00      2.6±0.02µs        ? ?/sec
take bool null values null indices 1024                                   1.00      3.1±0.02µs        ? ?/sec    1.07      3.3±0.04µs        ? ?/sec
take check bounds i32 1024                                                1.00  1619.2±13.94ns        ? ?/sec    1.00   1625.9±6.57ns        ? ?/sec
take check bounds i32 512                                                 1.00    805.6±2.15ns        ? ?/sec    1.11    894.2±9.02ns        ? ?/sec
take i32 1024                                                             1.00    710.7±2.03ns        ? ?/sec    1.02    725.6±3.23ns        ? ?/sec
take i32 512                                                              1.00    389.3±6.96ns        ? ?/sec    1.11   432.2±22.62ns        ? ?/sec
take i32 null indices 1024                                                1.00  1005.7±88.58ns        ? ?/sec    1.00   1002.6±6.38ns        ? ?/sec
take i32 null values 1024                                                 1.00      2.0±0.01µs        ? ?/sec    1.00      2.0±0.02µs        ? ?/sec
take i32 null values null indices 1024                                    1.00      2.6±0.01µs        ? ?/sec    1.04      2.7±0.03µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           2.31      8.4±0.14µs        ? ?/sec    1.00      3.6±0.04µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.66      8.3±0.24µs        ? ?/sec    1.00      5.0±0.03µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.01     20.8±0.31µs        ? ?/sec    1.00     20.6±0.79µs        ? ?/sec
take str 1024                                                             1.00     11.0±0.16µs        ? ?/sec    1.00     11.0±0.23µs        ? ?/sec
take str 512                                                              1.02      5.4±0.14µs        ? ?/sec    1.00      5.3±0.06µs        ? ?/sec
take str null indices 1024                                                1.00      6.9±0.08µs        ? ?/sec    1.01      6.9±0.09µs        ? ?/sec
take str null indices 512                                                 1.00      3.4±0.11µs        ? ?/sec    1.00      3.4±0.04µs        ? ?/sec
take str null values 1024                                                 1.00      8.7±0.16µs        ? ?/sec    1.00      8.7±0.42µs        ? ?/sec
take str null values null indices 1024                                    1.00      6.5±0.04µs        ? ?/sec    1.00      6.5±0.04µs        ? ?/sec
take stringview 1024                                                      1.00   782.4±31.32ns        ? ?/sec    1.00    779.3±2.93ns        ? ?/sec
take stringview 512                                                       1.12    548.4±1.35ns        ? ?/sec    1.00   488.7±14.56ns        ? ?/sec
take stringview null indices 1024                                         1.00  1386.1±41.76ns        ? ?/sec    1.02  1413.2±18.39ns        ? ?/sec
take stringview null indices 512                                          1.00    676.0±3.94ns        ? ?/sec    1.04    706.2±8.41ns        ? ?/sec
take stringview null values 1024                                          1.00      2.0±0.02µs        ? ?/sec    1.01      2.1±0.06µs        ? ?/sec
take stringview null values null indices 1024                             1.01      2.9±0.02µs        ? ?/sec    1.00      2.8±0.05µs        ? ?/sec

@alamb
Copy link
Contributor

alamb commented Dec 10, 2025

Looking nice now! Let's run it again to make sure it is reproduceable

@alamb
Copy link
Contributor

alamb commented Dec 10, 2025

run benchmark take_kernels

@alamb-ghbot
Copy link

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing take-fsb-null-indices (885f31e) to 1c90efe diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=take-fsb-null-indices
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

group                                                                     main                                    take-fsb-null-indices
-----                                                                     ----                                    ---------------------
take bool 1024                                                            1.01  1353.0±129.03ns        ? ?/sec    1.00  1336.0±10.41ns        ? ?/sec
take bool 512                                                             1.00    730.4±1.73ns        ? ?/sec     1.01   735.8±17.52ns        ? ?/sec
take bool null indices 1024                                               1.00   1624.0±8.54ns        ? ?/sec     1.05  1701.1±12.56ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.01µs        ? ?/sec     1.00      2.6±0.08µs        ? ?/sec
take bool null values null indices 1024                                   1.00      3.1±0.04µs        ? ?/sec     1.03      3.2±0.08µs        ? ?/sec
take check bounds i32 1024                                                1.02  1648.2±45.69ns        ? ?/sec     1.00  1619.3±16.88ns        ? ?/sec
take check bounds i32 512                                                 1.00    824.1±1.97ns        ? ?/sec     1.08   893.2±27.21ns        ? ?/sec
take i32 1024                                                             1.00   723.2±17.24ns        ? ?/sec     1.00    719.8±4.72ns        ? ?/sec
take i32 512                                                              1.00    399.9±3.01ns        ? ?/sec     1.06    425.3±2.10ns        ? ?/sec
take i32 null indices 1024                                                1.00   998.9±24.13ns        ? ?/sec     1.00   997.0±15.89ns        ? ?/sec
take i32 null values 1024                                                 1.01      2.1±0.04µs        ? ?/sec     1.00      2.0±0.04µs        ? ?/sec
take i32 null values null indices 1024                                    1.00      2.6±0.12µs        ? ?/sec     1.02      2.7±0.03µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           2.31      8.4±0.15µs        ? ?/sec     1.00      3.6±0.05µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.66      8.3±0.33µs        ? ?/sec     1.00      5.0±0.02µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     20.5±0.13µs        ? ?/sec     1.01     20.6±0.30µs        ? ?/sec
take str 1024                                                             1.01     11.1±0.09µs        ? ?/sec     1.00     11.0±0.05µs        ? ?/sec
take str 512                                                              1.00      5.3±0.03µs        ? ?/sec     1.02      5.4±0.10µs        ? ?/sec
take str null indices 1024                                                1.01      6.8±0.07µs        ? ?/sec     1.00      6.8±0.15µs        ? ?/sec
take str null indices 512                                                 1.00      3.3±0.03µs        ? ?/sec     1.03      3.4±0.02µs        ? ?/sec
take str null values 1024                                                 1.01      8.7±0.23µs        ? ?/sec     1.00      8.7±0.10µs        ? ?/sec
take str null values null indices 1024                                    1.01      6.4±0.07µs        ? ?/sec     1.00      6.4±0.05µs        ? ?/sec
take stringview 1024                                                      1.08    847.7±4.66ns        ? ?/sec     1.00    782.0±9.77ns        ? ?/sec
take stringview 512                                                       1.14    546.4±0.82ns        ? ?/sec     1.00   477.9±16.14ns        ? ?/sec
take stringview null indices 1024                                         1.01  1398.4±24.26ns        ? ?/sec     1.00  1381.4±19.69ns        ? ?/sec
take stringview null indices 512                                          1.00    671.7±3.08ns        ? ?/sec     1.03    693.4±8.12ns        ? ?/sec
take stringview null values 1024                                          1.00      2.1±0.03µs        ? ?/sec     1.01      2.1±0.01µs        ? ?/sec
take stringview null values null indices 1024                             1.00      2.9±0.06µs        ? ?/sec     1.00      2.9±0.04µs        ? ?/sec

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- more correct and faster -- thank you @tobixdev 🚀

@alamb alamb closed this Dec 10, 2025
@alamb alamb force-pushed the take-fsb-null-indices branch from 2004e60 to c3226a4 Compare December 10, 2025 22:48
@alamb
Copy link
Contributor

alamb commented Dec 10, 2025

I am sorry -- I accidentally pushed to this branch. I will fix shortly

@alamb
Copy link
Contributor

alamb commented Dec 10, 2025

vv

@alamb alamb mentioned this pull request Dec 10, 2025
@alamb
Copy link
Contributor

alamb commented Dec 10, 2025

I am so sorry @tobixdev . I messed up this PR by accident. I did save the code in a new PR: #8981

I think you can revive the PR if you want to, by going to this commit: 885f31e

here is the command I used

git reset --hard 885f31e26ec

If you push that branch again, I think we'll be able to reopen this PR

Alternately, we can just go with #8981

@tobixdev
Copy link
Contributor Author

Hahaha no worries. We can just go with the new one if we can find another reviewer.

alamb added a commit that referenced this pull request Dec 11, 2025
# Which issue does this PR close?

- redo of #8948 from @tobixdev 
- closes #8947

# Rationale for this change

I messed up @tobixdev 's PR / branch by accident in
#8948

This PR contains the previous commits that were in that PR

# What changes are included in this PR?

- See #8948
# Are these changes tested?

Yes, by CI

# Are there any user-facing changes?

Less bugs, faster code

---------

Co-authored-by: Tobias Schwarzinger <tobias.schwarzinger@tuwien.ac.at>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

take_fixed_size_binary Does Not Consider NULL Indices

3 participants