bench: add dedicated Utf8View benchmarks for InList #19211

geoffreyclaude · 2025-12-08T13:43:26Z

Which issue does this PR close?

N/A - benchmark improvement

Stacked on top of #19204

Rationale for this change

We need to measure InList performance on both StringArray (Utf8) and StringViewArray (Utf8View) to compare Arrow's string representations.
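The two layouts store strings differently, which is why comparing them matters. In the Arrow format, a Utf8 array keeps an offsets buffer plus one contiguous values buffer. A Utf8View array keeps one 16-byte view per value. Each view starts with a 4-byte length. Strings of up to 12 bytes are stored inline in the remaining 12 bytes; longer strings keep a 4-byte prefix, a buffer index and an offset into a separate data buffer. The sketch below uses simplified, hypothetical stand-in types (`Utf8Like`, `ViewLike`) rather than the arrow-rs API. Under this layout, the benchmark's str=3 and str=12 cases fit inline and str=100 does not.

```rust
/// Simplified stand-in for the Utf8 layout (hypothetical, not arrow-rs):
/// value i occupies values[offsets[i]..offsets[i + 1]].
struct Utf8Like {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl Utf8Like {
    fn from_strs(strs: &[&str]) -> Self {
        let mut offsets = vec![0i32];
        let mut values: Vec<u8> = Vec::new();
        for s in strs {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len() as i32);
        }
        Self { offsets, values }
    }

    fn value(&self, i: usize) -> &str {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).expect("valid UTF-8")
    }
}

/// Longest string that fits inside a 16-byte view after the 4-byte length.
const MAX_INLINE_LEN: usize = 12;

/// Simplified stand-in for the Utf8View layout (hypothetical, not arrow-rs):
/// one 16-byte view per value; long strings live in `buffers`.
struct ViewLike {
    views: Vec<u128>,
    buffers: Vec<Vec<u8>>,
}

impl ViewLike {
    fn from_strs(strs: &[&str]) -> Self {
        let mut data: Vec<u8> = Vec::new(); // single data buffer, index 0
        let mut views = Vec::with_capacity(strs.len());
        for s in strs {
            let bytes = s.as_bytes();
            let mut view = [0u8; 16];
            view[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
            if bytes.len() <= MAX_INLINE_LEN {
                // Inline: bytes 4..16 hold the string, zero-padded.
                view[4..4 + bytes.len()].copy_from_slice(bytes);
            } else {
                // Out of line: 4-byte prefix, buffer index, then offset.
                view[4..8].copy_from_slice(&bytes[..4]);
                view[8..12].copy_from_slice(&0u32.to_le_bytes());
                view[12..16].copy_from_slice(&(data.len() as u32).to_le_bytes());
                data.extend_from_slice(bytes);
            }
            views.push(u128::from_le_bytes(view));
        }
        Self { views, buffers: vec![data] }
    }

    /// The length sits in the low 32 bits of the little-endian view.
    fn is_inline(&self, i: usize) -> bool {
        (self.views[i] as u32 as usize) <= MAX_INLINE_LEN
    }

    /// Equality check that rejects on length or prefix before
    /// touching the data buffer.
    fn eq_str(&self, i: usize, needle: &str) -> bool {
        let v = self.views[i].to_le_bytes();
        let n = needle.as_bytes();
        let len = u32::from_le_bytes([v[0], v[1], v[2], v[3]]) as usize;
        if len != n.len() {
            return false;
        }
        if len <= MAX_INLINE_LEN {
            return v[4..4 + len] == n[..];
        }
        if v[4..8] != n[..4] {
            return false;
        }
        let buf = u32::from_le_bytes([v[8], v[9], v[10], v[11]]) as usize;
        let off = u32::from_le_bytes([v[12], v[13], v[14], v[15]]) as usize;
        self.buffers[buf][off..off + len] == n[..]
    }
}
```

In this design, short values and mismatched lengths or prefixes are resolved from the fixed-width view alone. The Utf8 layout always goes through the offsets into the shared values buffer.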

What changes are included in this PR?

Add Utf8View benchmarks for InList and refactor the benchmark code with generics so that adding new array types is trivial.
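The PR doesn't show how the generic refactor is structured. One way the pattern can look is sketched below: each array type implements one small trait, and a single generic function builds the whole parameter matrix for it. `BenchArray`, `Utf8Stub`, `Utf8ViewStub`, `make_input` and `cases` are hypothetical names, not the code in `in_list.rs`. Only `ARRAY_LENGTH = 1024` and the ID shape (`in_list/<type>/list=…/nulls=…%/str=…`) come from this thread. A real harness would register each case with Criterion; that step is omitted to keep the sketch std-only.

```rust
/// Array length, taken from the benchmark file under review.
const ARRAY_LENGTH: usize = 1024;

/// Hypothetical trait: implementing it is all a new array type needs.
trait BenchArray: Sized {
    /// Label used in the benchmark ID, e.g. "Utf8" or "Utf8View".
    const NAME: &'static str;
    /// Builds the array from optional strings (None = null).
    fn from_values(values: Vec<Option<String>>) -> Self;
}

// Stand-ins for StringArray and StringViewArray (hypothetical).
struct Utf8Stub(Vec<Option<String>>);
struct Utf8ViewStub(Vec<Option<String>>);

impl BenchArray for Utf8Stub {
    const NAME: &'static str = "Utf8";
    fn from_values(values: Vec<Option<String>>) -> Self {
        Utf8Stub(values)
    }
}

impl BenchArray for Utf8ViewStub {
    const NAME: &'static str = "Utf8View";
    fn from_values(values: Vec<Option<String>>) -> Self {
        Utf8ViewStub(values)
    }
}

/// Deterministic input: every non-null value is `str_len` bytes long, and
/// rows whose index mod 100 is below `null_pct` are null.
fn make_input<A: BenchArray>(rows: usize, null_pct: usize, str_len: usize) -> A {
    let values: Vec<Option<String>> = (0..rows)
        .map(|i| {
            if i % 100 < null_pct {
                None
            } else {
                let c = (b'a' + (i % 26) as u8) as char;
                Some(std::iter::repeat(c).take(str_len).collect())
            }
        })
        .collect();
    A::from_values(values)
}

/// Generic case builder: one call per array type yields the full matrix.
/// (The list size would drive construction of the IN list, omitted here.)
fn cases<A: BenchArray>() -> Vec<(String, A)> {
    let mut out = Vec::new();
    for list in [3, 8, 100] {
        for nulls in [0, 20] {
            for str_len in [3, 12, 100] {
                let id = format!("in_list/{}/list={list}/nulls={nulls}%/str={str_len}", A::NAME);
                out.push((id, make_input::<A>(ARRAY_LENGTH, nulls, str_len)));
            }
        }
    }
    out
}
```

Adding a third type only requires another `impl BenchArray` and one more `cases::<NewType>()` call.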

Are these changes tested?

Benchmark-only change.

Are there any user-facing changes?

No.

adriangb · 2025-12-08T17:40:03Z

Thank you! I merged #19204, could you please resolve conflicts?


geoffreyclaude · 2025-12-08T19:19:40Z

> Thank you! I merged #19204, could you please resolve conflicts?

Rebased on main

alamb

Looks good to me -- thank you @geoffreyclaude

I ran the tests locally and 👍

Example output

Gnuplot not found, using plotters backend
in_list/Utf8/list=3/nulls=0%/str=3
                        time:   [5.8309 µs 5.8479 µs 5.8681 µs]
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

in_list/Utf8/list=3/nulls=0%/str=12
                        time:   [5.5662 µs 5.5731 µs 5.5819 µs]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

in_list/Utf8/list=3/nulls=0%/str=100
                        time:   [8.8208 µs 8.8323 µs 8.8469 µs]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

in_list/Utf8/list=3/nulls=20%/str=3
                        time:   [5.7071 µs 5.7169 µs 5.7277 µs]
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

in_list/Utf8/list=3/nulls=20%/str=12
                        time:   [5.0895 µs 5.1067 µs 5.1240 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

in_list/Utf8/list=3/nulls=20%/str=100
                        time:   [8.6880 µs 8.7066 µs 8.7277 µs]
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

in_list/Utf8/list=8/nulls=0%/str=3
                        time:   [5.9622 µs 5.9709 µs 5.9815 µs]
Found 16 outliers among 100 measurements (16.00%)
  8 (8.00%) high mild
  8 (8.00%) high severe

in_list/Utf8/list=8/nulls=0%/str=12
                        time:   [5.7414 µs 5.7488 µs 5.7579 µs]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

in_list/Utf8/list=8/nulls=0%/str=100
                        time:   [8.8534 µs 8.8659 µs 8.8822 µs]
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

in_list/Utf8/list=8/nulls=20%/str=3
                        time:   [5.7064 µs 5.7175 µs 5.7301 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

in_list/Utf8/list=8/nulls=20%/str=12
                        time:   [5.4608 µs 5.4692 µs 5.4804 µs]
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

in_list/Utf8/list=8/nulls=20%/str=100
                        time:   [8.7698 µs 8.7824 µs 8.7967 µs]
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

in_list/Utf8/list=100/nulls=0%/str=3
                        time:   [6.4593 µs 6.4684 µs 6.4801 µs]
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe

in_list/Utf8/list=100/nulls=0%/str=12
                        time:   [6.4264 µs 6.4352 µs 6.4460 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

in_list/Utf8/list=100/nulls=0%/str=100
                        time:   [9.6304 µs 9.6417 µs 9.6555 µs]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

in_list/Utf8/list=100/nulls=20%/str=3
                        time:   [6.5752 µs 6.5855 µs 6.5958 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

in_list/Utf8/list=100/nulls=20%/str=12
                        time:   [6.9432 µs 6.9638 µs 6.9866 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

in_list/Utf8/list=100/nulls=20%/str=100
                        time:   [9.6388 µs 9.7165 µs 9.8148 µs]
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe

in_list/Utf8View/list=3/nulls=0%/str=3
                        time:   [5.8557 µs 5.8771 µs 5.8992 µs]
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

in_list/Utf8View/list=3/nulls=0%/str=12
                        time:   [5.6449 µs 5.6718 µs 5.7081 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

in_list/Utf8View/list=3/nulls=0%/str=100
                        time:   [9.8523 µs 9.8744 µs 9.8994 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

in_list/Utf8View/list=3/nulls=20%/str=3
                        time:   [5.5093 µs 5.5290 µs 5.5511 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

in_list/Utf8View/list=3/nulls=20%/str=12
                        time:   [5.3516 µs 5.3753 µs 5.3992 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

in_list/Utf8View/list=3/nulls=20%/str=100
                        time:   [9.1348 µs 9.1597 µs 9.1879 µs]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

in_list/Utf8View/list=8/nulls=0%/str=3
                        time:   [5.9711 µs 6.0002 µs 6.0333 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

in_list/Utf8View/list=8/nulls=0%/str=12
                        time:   [5.5949 µs 5.6036 µs 5.6134 µs]
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

in_list/Utf8View/list=8/nulls=0%/str=100
                        time:   [9.8098 µs 9.8218 µs 9.8355 µs]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

in_list/Utf8View/list=8/nulls=20%/str=3
                        time:   [5.5159 µs 5.5268 µs 5.5384 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

in_list/Utf8View/list=8/nulls=20%/str=12
                        time:   [5.1920 µs 5.2102 µs 5.2338 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

in_list/Utf8View/list=8/nulls=20%/str=100
                        time:   [9.3485 µs 9.3909 µs 9.4365 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

in_list/Utf8View/list=100/nulls=0%/str=3
                        time:   [6.6794 µs 6.7021 µs 6.7288 µs]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

in_list/Utf8View/list=100/nulls=0%/str=12
                        time:   [6.2144 µs 6.2361 µs 6.2601 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

in_list/Utf8View/list=100/nulls=0%/str=100
                        time:   [10.513 µs 10.562 µs 10.621 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

in_list/Utf8View/list=100/nulls=20%/str=3
                        time:   [6.3865 µs 6.4063 µs 6.4271 µs]

in_list/Utf8View/list=100/nulls=20%/str=12
                        time:   [6.1409 µs 6.1519 µs 6.1637 µs]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

in_list/Utf8View/list=100/nulls=20%/str=100
                        time:   [10.067 µs 10.089 µs 10.115 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

in_list/Float32/list=3/nulls=0%
                        time:   [4.4602 µs 4.4717 µs 4.4863 µs]
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

in_list/Float32/list=3/nulls=20%
                        time:   [4.3086 µs 4.3168 µs 4.3257 µs]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

in_list/Float32/list=8/nulls=0%
                        time:   [4.4807 µs 4.4887 µs 4.4975 µs]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

in_list/Float32/list=8/nulls=20%
                        time:   [4.2632 µs 4.2724 µs 4.2821 µs]
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

in_list/Float32/list=100/nulls=0%
                        time:   [4.7939 µs 4.8026 µs 4.8122 µs]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

in_list/Float32/list=100/nulls=20%
                        time:   [4.7811 µs 4.8106 µs 4.8470 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

in_list/Int32/list=3/nulls=0%
                        time:   [4.3962 µs 4.4051 µs 4.4158 µs]
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

in_list/Int32/list=3/nulls=20%
                        time:   [4.2974 µs 4.3109 µs 4.3252 µs]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

in_list/Int32/list=8/nulls=0%
                        time:   [4.4708 µs 4.4813 µs 4.4929 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

in_list/Int32/list=8/nulls=20%
                        time:   [4.3094 µs 4.3239 µs 4.3392 µs]

in_list/Int32/list=100/nulls=0%
                        time:   [5.2120 µs 5.2288 µs 5.2473 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

in_list/Int32/list=100/nulls=20%
                        time:   [4.5616 µs 4.5775 µs 4.5952 µs]
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
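The nulls=0% / nulls=20% dimension above adds null rows, presumably to the probed input column. Null rows matter because SQL `IN` uses three-valued logic. A NULL input yields NULL. A match yields TRUE. No match yields NULL if the list contains a NULL and FALSE otherwise. The sketch below shows those semantics with a std `HashSet` lookup. The hashing strategy is an assumption for illustration and is not DataFusion's implementation; `in_list_row` and `in_list` are hypothetical names.

```rust
use std::collections::HashSet;

/// SQL `value IN (list)` for one row, with three-valued logic:
/// NULL input -> NULL; match -> TRUE; no match but the list
/// contains NULL -> NULL; otherwise FALSE.
fn in_list_row(value: Option<&str>, set: &HashSet<&str>, list_has_null: bool) -> Option<bool> {
    let v = value?; // NULL input short-circuits to NULL
    if set.contains(v) {
        Some(true)
    } else if list_has_null {
        None
    } else {
        Some(false)
    }
}

/// Evaluates a whole column against one list; rows are independent.
fn in_list(column: &[Option<&str>], list: &[Option<&str>]) -> Vec<Option<bool>> {
    // Non-null list entries go into the set; NULLs are tracked separately.
    let set: HashSet<&str> = list.iter().flatten().copied().collect();
    let list_has_null = list.iter().any(|x| x.is_none());
    column
        .iter()
        .map(|v| in_list_row(*v, &set, list_has_null))
        .collect()
}
```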

alamb · 2025-12-08T18:21:55Z

datafusion/physical-expr/benches/in_list.rs

+const ARRAY_LENGTH: usize = 1024;
+
+/// Returns a friendly type name for the array type.
+fn array_type_name<A: 'static>() -> &'static str {


You could potentially use https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#display-and-fromstr

So like array.data_type().to_string() 🤔
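The review only shows the signature `fn array_type_name<A: 'static>() -> &'static str`, so its body isn't known. The sketch below contrasts a plausible `TypeId`-based body, which is a hypothetical reconstruction, with the suggested alternative of formatting the array's data type through `Display`. Stand-in types (`StringArrayStub`, `StringViewArrayStub`, `DataTypeStub`, `HasDataType`) replace arrow-rs so the example stays std-only. The linked arrow-rs docs indicate that the real `DataType` implements `Display`.

```rust
use std::any::TypeId;
use std::fmt;

// Stand-ins for arrow's StringArray / StringViewArray (hypothetical).
struct StringArrayStub;
struct StringViewArrayStub;

/// Approach 1 (hypothetical body): map each Rust type to a label by TypeId.
fn array_type_name<A: 'static>() -> &'static str {
    let id = TypeId::of::<A>();
    if id == TypeId::of::<StringArrayStub>() {
        "Utf8"
    } else if id == TypeId::of::<StringViewArrayStub>() {
        "Utf8View"
    } else {
        "Unknown"
    }
}

/// Approach 2: ask the array for its logical type and format it.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataTypeStub {
    Utf8,
    Utf8View,
}

impl fmt::Display for DataTypeStub {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Prints the variant name, e.g. "Utf8View".
        write!(f, "{self:?}")
    }
}

trait HasDataType {
    fn data_type(&self) -> DataTypeStub;
}

impl HasDataType for StringArrayStub {
    fn data_type(&self) -> DataTypeStub {
        DataTypeStub::Utf8
    }
}

impl HasDataType for StringViewArrayStub {
    fn data_type(&self) -> DataTypeStub {
        DataTypeStub::Utf8View
    }
}
```

With the second approach, every new array type is named automatically and the labels match Arrow's own type names. The trade-off is that it needs an array instance, while the first approach works from the type alone.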

alamb · 2025-12-08T21:10:06Z

run benchmarks in_list

alamb-ghbot · 2025-12-08T21:10:10Z

🤖 Hi @alamb, thanks for the request (#19211 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, tpch, tpch10, tpch_mem, tpch_mem10
Criterion: aggregate_query_sql, aggregate_vectorized, case_when, sql_planner

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...
Unsupported benchmarks: in_list.

alamb · 2025-12-08T21:12:47Z

run benchmark in_list

alamb · 2025-12-08T21:13:28Z

run benchmarks in_list

(sudo make me a sandwich) 😆 (I added the benchmark to my allowed list)

adriangb · 2025-12-08T21:14:11Z

😆 did you ask AI to make you a robot version of your profile?

alamb · 2025-12-08T21:14:43Z

> 😆 did you ask AI to make you a robot version of your profile?

I absolutely did

alamb-ghbot · 2025-12-08T21:15:01Z

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat/in_list_utf8_benchmark (8a2d7d1) to 55a38d4 diff
BENCH_NAME=in_list
BENCH_COMMAND=cargo bench --all-features --bench in_list
BENCH_FILTER=
BENCH_BRANCH_NAME=feat_in_list_utf8_benchmark
Results will be posted here when complete

alamb-ghbot · 2025-12-08T21:34:16Z

🤖: Benchmark completed

Details

group                                          feat_in_list_utf8_benchmark            main
-----                                          ---------------------------            ----
in_list/Float32/list=100/nulls=0%              1.00    308.2±3.79µs        ? ?/sec  
in_list/Float32/list=100/nulls=20%             1.00    390.9±3.16µs        ? ?/sec  
in_list/Float32/list=3/nulls=0%                1.00     14.6±0.29µs        ? ?/sec  
in_list/Float32/list=3/nulls=20%               1.00     19.7±0.36µs        ? ?/sec  
in_list/Float32/list=8/nulls=0%                1.00     29.3±1.05µs        ? ?/sec  
in_list/Float32/list=8/nulls=20%               1.00     39.2±0.11µs        ? ?/sec  
in_list/Int32/list=100/nulls=0%                1.00   255.1±17.01µs        ? ?/sec  
in_list/Int32/list=100/nulls=20%               1.00   325.4±15.87µs        ? ?/sec  
in_list/Int32/list=3/nulls=0%                  1.00     13.7±0.37µs        ? ?/sec  
in_list/Int32/list=3/nulls=20%                 1.00     17.6±0.04µs        ? ?/sec  
in_list/Int32/list=8/nulls=0%                  1.00     25.1±0.48µs        ? ?/sec  
in_list/Int32/list=8/nulls=20%                 1.00     34.0±0.48µs        ? ?/sec  
in_list/Utf8/list=100/nulls=0%/str=100         1.00    648.3±5.63µs        ? ?/sec  
in_list/Utf8/list=100/nulls=0%/str=12          1.00  711.7±103.46µs        ? ?/sec  
in_list/Utf8/list=100/nulls=0%/str=3           1.00   671.2±10.03µs        ? ?/sec  
in_list/Utf8/list=100/nulls=20%/str=100        1.00   602.2±15.23µs        ? ?/sec  
in_list/Utf8/list=100/nulls=20%/str=12         1.00    617.7±4.50µs        ? ?/sec  
in_list/Utf8/list=100/nulls=20%/str=3          1.00    623.7±3.55µs        ? ?/sec  
in_list/Utf8/list=3/nulls=0%/str=100           1.00     25.6±0.25µs        ? ?/sec  
in_list/Utf8/list=3/nulls=0%/str=12            1.00     27.4±0.12µs        ? ?/sec  
in_list/Utf8/list=3/nulls=0%/str=3             1.00     26.2±0.90µs        ? ?/sec  
in_list/Utf8/list=3/nulls=20%/str=100          1.00     26.6±1.03µs        ? ?/sec  
in_list/Utf8/list=3/nulls=20%/str=12           1.00     26.6±0.08µs        ? ?/sec  
in_list/Utf8/list=3/nulls=20%/str=3            1.00     27.5±0.56µs        ? ?/sec  
in_list/Utf8/list=8/nulls=0%/str=100           1.00     54.3±0.29µs        ? ?/sec  
in_list/Utf8/list=8/nulls=0%/str=12            1.00     59.0±3.81µs        ? ?/sec  
in_list/Utf8/list=8/nulls=0%/str=3             1.00     55.8±0.46µs        ? ?/sec  
in_list/Utf8/list=8/nulls=20%/str=100          1.00     53.2±0.37µs        ? ?/sec  
in_list/Utf8/list=8/nulls=20%/str=12           1.00     55.2±0.96µs        ? ?/sec  
in_list/Utf8/list=8/nulls=20%/str=3            1.00     55.9±0.86µs        ? ?/sec  
in_list/Utf8View/list=100/nulls=0%/str=100     1.00    480.8±6.10µs        ? ?/sec  
in_list/Utf8View/list=100/nulls=0%/str=12      1.00    565.9±4.30µs        ? ?/sec  
in_list/Utf8View/list=100/nulls=0%/str=3       1.00   567.7±17.28µs        ? ?/sec  
in_list/Utf8View/list=100/nulls=20%/str=100    1.00   465.8±11.18µs        ? ?/sec  
in_list/Utf8View/list=100/nulls=20%/str=12     1.00   516.8±17.51µs        ? ?/sec  
in_list/Utf8View/list=100/nulls=20%/str=3      1.00    520.4±9.71µs        ? ?/sec  
in_list/Utf8View/list=3/nulls=0%/str=100       1.00     18.6±0.20µs        ? ?/sec  
in_list/Utf8View/list=3/nulls=0%/str=12        1.00     22.5±0.49µs        ? ?/sec  
in_list/Utf8View/list=3/nulls=0%/str=3         1.00     22.2±0.24µs        ? ?/sec  
in_list/Utf8View/list=3/nulls=20%/str=100      1.00     20.7±0.15µs        ? ?/sec  
in_list/Utf8View/list=3/nulls=20%/str=12       1.00     23.3±0.36µs        ? ?/sec  
in_list/Utf8View/list=3/nulls=20%/str=3        1.00     23.2±0.35µs        ? ?/sec  
in_list/Utf8View/list=8/nulls=0%/str=100       1.00     39.2±0.08µs        ? ?/sec  
in_list/Utf8View/list=8/nulls=0%/str=12        1.00     49.1±0.35µs        ? ?/sec  
in_list/Utf8View/list=8/nulls=0%/str=3         1.00     49.2±1.71µs        ? ?/sec  
in_list/Utf8View/list=8/nulls=20%/str=100      1.00     40.9±3.34µs        ? ?/sec  
in_list/Utf8View/list=8/nulls=20%/str=12       1.00     46.0±0.57µs        ? ?/sec  
in_list/Utf8View/list=8/nulls=20%/str=3        1.00     47.8±0.58µs        ? ?/sec  
in_list_f32 (1024, 0) IN (1, 0)                                                       1.00      8.7±0.09µs        ? ?/sec
in_list_f32 (1024, 0) IN (10, 0)                                                      1.00     34.9±1.06µs        ? ?/sec
in_list_f32 (1024, 0) IN (100, 0)                                                     1.00    304.9±1.16µs        ? ?/sec
in_list_f32 (1024, 0) IN (3, 0)                                                       1.00     14.8±0.73µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (1, 0)                                                     1.00     11.2±0.10µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (10, 0)                                                    1.00     47.3±0.61µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (100, 0)                                                   1.00    389.7±3.65µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (3, 0)                                                     1.00     19.6±0.33µs        ? ?/sec
in_list_i32 (1024, 0) IN (1, 0)                                                       1.00      8.2±0.09µs        ? ?/sec
in_list_i32 (1024, 0) IN (10, 0)                                                      1.00     29.6±0.19µs        ? ?/sec
in_list_i32 (1024, 0) IN (100, 0)                                                     1.00    251.9±2.98µs        ? ?/sec
in_list_i32 (1024, 0) IN (3, 0)                                                       1.00     13.6±0.22µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (1, 0)                                                     1.00     10.5±0.12µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (10, 0)                                                    1.00     35.1±0.63µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (100, 0)                                                   1.00    287.6±1.26µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (3, 0)                                                     1.00     16.5±0.13µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (1, 0)                                                  1.00     13.1±0.31µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (10, 0)                                                 1.00     68.6±1.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (100, 0)                                                1.00   674.7±25.94µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (3, 0)                                                  1.00     27.7±0.77µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (1, 0)                                                1.00     14.3±0.20µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (10, 0)                                               1.00     66.4±1.48µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (100, 0)                                              1.00    623.4±6.75µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (3, 0)                                                1.00     26.9±0.73µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (1, 0)                                                  1.00     13.7±0.45µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (10, 0)                                                 1.00     70.2±2.85µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (100, 0)                                                1.00    676.6±4.39µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (3, 0)                                                  1.00     27.0±1.49µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (1, 0)                                                1.00     14.0±0.14µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (10, 0)                                               1.00     64.8±0.26µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (100, 0)                                              1.00   613.7±12.89µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (3, 0)                                                1.00     26.4±0.26µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (1, 0)                                                   1.00     13.4±0.06µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (10, 0)                                                  1.00     67.8±1.42µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (100, 0)                                                 1.00   691.6±11.55µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (3, 0)                                                   1.00     26.2±0.38µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (1, 0)                                                 1.00     13.9±0.07µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (10, 0)                                                1.00     65.7±1.36µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (100, 0)                                               1.00    614.1±3.58µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (3, 0)                                                 1.00     26.7±0.46µs        ? ?/sec
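The branch column holds both the Utf8 and Utf8View rows for identical configurations, so they can be compared directly. For example, list=100/nulls=0%/str=100 gives 480.8 / 648.3 ≈ 0.74. Below is a small std-only sketch that pairs the rows and computes the Utf8View/Utf8 time ratio. It assumes the critcmp row shape shown above (name, ratio, `mean±errµs`, throughput). `parse_row` and `view_over_utf8` are hypothetical helpers. Rows that don't match this shape, such as the header or the old `in_list_f32 (…)` rows, are skipped.

```rust
use std::collections::HashMap;

/// Parses one critcmp row such as
/// "in_list/Utf8/list=3/nulls=0%/str=3   1.00   26.2±0.90µs   ? ?/sec"
/// into (benchmark name, mean time in µs). Returns None for other rows.
fn parse_row(line: &str) -> Option<(&str, f64)> {
    let mut cols = line.split_whitespace();
    let name = cols.next()?;
    let _ratio = cols.next()?;
    let time = cols.next()?; // e.g. "26.2±0.90µs"
    if !time.ends_with("µs") {
        return None;
    }
    let mean: f64 = time.split('±').next()?.parse().ok()?;
    Some((name, mean))
}

/// Returns (configuration, Utf8View time / Utf8 time) for every
/// configuration present for both types, sorted by configuration.
fn view_over_utf8(table: &str) -> Vec<(String, f64)> {
    let times: HashMap<&str, f64> = table.lines().filter_map(parse_row).collect();
    let mut out: Vec<(String, f64)> = times
        .iter()
        .filter_map(|(name, view_t)| {
            let cfg = name.strip_prefix("in_list/Utf8View/")?;
            let utf8_t = times.get(format!("in_list/Utf8/{cfg}").as_str())?;
            Some((cfg.to_string(), view_t / utf8_t))
        })
        .collect();
    out.sort_by(|a, b| a.0.cmp(&b.0));
    out
}
```

A ratio below 1.0 means the Utf8View case ran faster on this run. These are single-run numbers with the error bars shown in the table.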

github-actions bot added the physical-expr Changes to the physical-expr crates label Dec 8, 2025

geoffreyclaude force-pushed the feat/in_list_utf8_benchmark branch from 0081ba1 to f9793b1 December 8, 2025 13:44

geoffreyclaude marked this pull request as ready for review December 8, 2025 13:45

geoffreyclaude force-pushed the feat/in_list_utf8_benchmark branch 3 times, most recently from 09d3f4e to 597d727 December 8, 2025 17:13

adriangb mentioned this pull request Dec 8, 2025

Short InList Optimization pydantic/datafusion#46

Merged

bench: add Utf8View benchmarks for InList

8a2d7d1

Measure InList evaluation on both StringArray (Utf8) and StringViewArray (Utf8View) to compare performance across Arrow's string representations. Refactored with generics to make adding new array types trivial.

geoffreyclaude force-pushed the feat/in_list_utf8_benchmark branch from 597d727 to 8a2d7d1 December 8, 2025 19:16

adriangb approved these changes Dec 8, 2025

alamb approved these changes Dec 8, 2025

Dandandan approved these changes Dec 8, 2025

apache deleted a comment from alamb-ghbot Dec 8, 2025

alamb added this pull request to the merge queue Dec 8, 2025

alamb added the development-process Related to development process of DataFusion label Dec 8, 2025

Merged via the queue into apache:main with commit ab7fe0e Dec 8, 2025
31 checks passed

5 participants