bench: add dedicated Utf8View benchmarks for InList #19211
Thank you! I merged #19204. Could you please resolve the conflicts?
Measure InList evaluation on both StringArray (Utf8) and StringViewArray (Utf8View) to compare performance across Arrow's string representations. Refactored with generics to make adding new array types trivial.
Rebased on main.
alamb
left a comment
Looks good to me -- thank you @geoffreyclaude
I ran the tests locally and 👍
Example output
Gnuplot not found, using plotters backend
in_list/Utf8/list=3/nulls=0%/str=3
time: [5.8309 µs 5.8479 µs 5.8681 µs]
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
5 (5.00%) high mild
6 (6.00%) high severe
in_list/Utf8/list=3/nulls=0%/str=12
time: [5.5662 µs 5.5731 µs 5.5819 µs]
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe
in_list/Utf8/list=3/nulls=0%/str=100
time: [8.8208 µs 8.8323 µs 8.8469 µs]
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
in_list/Utf8/list=3/nulls=20%/str=3
time: [5.7071 µs 5.7169 µs 5.7277 µs]
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
in_list/Utf8/list=3/nulls=20%/str=12
time: [5.0895 µs 5.1067 µs 5.1240 µs]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
in_list/Utf8/list=3/nulls=20%/str=100
time: [8.6880 µs 8.7066 µs 8.7277 µs]
Found 14 outliers among 100 measurements (14.00%)
7 (7.00%) high mild
7 (7.00%) high severe
in_list/Utf8/list=8/nulls=0%/str=3
time: [5.9622 µs 5.9709 µs 5.9815 µs]
Found 16 outliers among 100 measurements (16.00%)
8 (8.00%) high mild
8 (8.00%) high severe
in_list/Utf8/list=8/nulls=0%/str=12
time: [5.7414 µs 5.7488 µs 5.7579 µs]
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
in_list/Utf8/list=8/nulls=0%/str=100
time: [8.8534 µs 8.8659 µs 8.8822 µs]
Found 15 outliers among 100 measurements (15.00%)
4 (4.00%) high mild
11 (11.00%) high severe
in_list/Utf8/list=8/nulls=20%/str=3
time: [5.7064 µs 5.7175 µs 5.7301 µs]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
in_list/Utf8/list=8/nulls=20%/str=12
time: [5.4608 µs 5.4692 µs 5.4804 µs]
Found 12 outliers among 100 measurements (12.00%)
7 (7.00%) high mild
5 (5.00%) high severe
in_list/Utf8/list=8/nulls=20%/str=100
time: [8.7698 µs 8.7824 µs 8.7967 µs]
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe
in_list/Utf8/list=100/nulls=0%/str=3
time: [6.4593 µs 6.4684 µs 6.4801 µs]
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high severe
in_list/Utf8/list=100/nulls=0%/str=12
time: [6.4264 µs 6.4352 µs 6.4460 µs]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
in_list/Utf8/list=100/nulls=0%/str=100
time: [9.6304 µs 9.6417 µs 9.6555 µs]
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
in_list/Utf8/list=100/nulls=20%/str=3
time: [6.5752 µs 6.5855 µs 6.5958 µs]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
in_list/Utf8/list=100/nulls=20%/str=12
time: [6.9432 µs 6.9638 µs 6.9866 µs]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
in_list/Utf8/list=100/nulls=20%/str=100
time: [9.6388 µs 9.7165 µs 9.8148 µs]
Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) high mild
8 (8.00%) high severe
in_list/Utf8View/list=3/nulls=0%/str=3
time: [5.8557 µs 5.8771 µs 5.8992 µs]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
in_list/Utf8View/list=3/nulls=0%/str=12
time: [5.6449 µs 5.6718 µs 5.7081 µs]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
in_list/Utf8View/list=3/nulls=0%/str=100
time: [9.8523 µs 9.8744 µs 9.8994 µs]
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
in_list/Utf8View/list=3/nulls=20%/str=3
time: [5.5093 µs 5.5290 µs 5.5511 µs]
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
in_list/Utf8View/list=3/nulls=20%/str=12
time: [5.3516 µs 5.3753 µs 5.3992 µs]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
in_list/Utf8View/list=3/nulls=20%/str=100
time: [9.1348 µs 9.1597 µs 9.1879 µs]
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
in_list/Utf8View/list=8/nulls=0%/str=3
time: [5.9711 µs 6.0002 µs 6.0333 µs]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
in_list/Utf8View/list=8/nulls=0%/str=12
time: [5.5949 µs 5.6036 µs 5.6134 µs]
Found 10 outliers among 100 measurements (10.00%)
8 (8.00%) high mild
2 (2.00%) high severe
in_list/Utf8View/list=8/nulls=0%/str=100
time: [9.8098 µs 9.8218 µs 9.8355 µs]
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
in_list/Utf8View/list=8/nulls=20%/str=3
time: [5.5159 µs 5.5268 µs 5.5384 µs]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
in_list/Utf8View/list=8/nulls=20%/str=12
time: [5.1920 µs 5.2102 µs 5.2338 µs]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
in_list/Utf8View/list=8/nulls=20%/str=100
time: [9.3485 µs 9.3909 µs 9.4365 µs]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
in_list/Utf8View/list=100/nulls=0%/str=3
time: [6.6794 µs 6.7021 µs 6.7288 µs]
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
in_list/Utf8View/list=100/nulls=0%/str=12
time: [6.2144 µs 6.2361 µs 6.2601 µs]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
in_list/Utf8View/list=100/nulls=0%/str=100
time: [10.513 µs 10.562 µs 10.621 µs]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
in_list/Utf8View/list=100/nulls=20%/str=3
time: [6.3865 µs 6.4063 µs 6.4271 µs]
in_list/Utf8View/list=100/nulls=20%/str=12
time: [6.1409 µs 6.1519 µs 6.1637 µs]
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
in_list/Utf8View/list=100/nulls=20%/str=100
time: [10.067 µs 10.089 µs 10.115 µs]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
in_list/Float32/list=3/nulls=0%
time: [4.4602 µs 4.4717 µs 4.4863 µs]
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
in_list/Float32/list=3/nulls=20%
time: [4.3086 µs 4.3168 µs 4.3257 µs]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
in_list/Float32/list=8/nulls=0%
time: [4.4807 µs 4.4887 µs 4.4975 µs]
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high mild
in_list/Float32/list=8/nulls=20%
time: [4.2632 µs 4.2724 µs 4.2821 µs]
Found 11 outliers among 100 measurements (11.00%)
8 (8.00%) high mild
3 (3.00%) high severe
in_list/Float32/list=100/nulls=0%
time: [4.7939 µs 4.8026 µs 4.8122 µs]
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
in_list/Float32/list=100/nulls=20%
time: [4.7811 µs 4.8106 µs 4.8470 µs]
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
in_list/Int32/list=3/nulls=0%
time: [4.3962 µs 4.4051 µs 4.4158 µs]
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild
in_list/Int32/list=3/nulls=20%
time: [4.2974 µs 4.3109 µs 4.3252 µs]
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
in_list/Int32/list=8/nulls=0%
time: [4.4708 µs 4.4813 µs 4.4929 µs]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
in_list/Int32/list=8/nulls=20%
time: [4.3094 µs 4.3239 µs 4.3392 µs]
in_list/Int32/list=100/nulls=0%
time: [5.2120 µs 5.2288 µs 5.2473 µs]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
in_list/Int32/list=100/nulls=20%
time: [4.5616 µs 4.5775 µs 4.5952 µs]
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe

const ARRAY_LENGTH: usize = 1024;

/// Returns a friendly type name for the array type.
fn array_type_name<A: 'static>() -> &'static str {
You could potentially use https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#display-and-fromstr
So like array.data_type().to_string() 🤔
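For context, here is a minimal std-only sketch of what a helper with the signature quoted above might look like. The body of the real `array_type_name` is not visible in this thread, so the `TypeId` lookup is an assumption, and `StringArray` / `StringViewArray` below are local stand-in marker types, not Arrow's. The suggestion above would instead derive the label from the array's `DataType` through its `Display` impl, which avoids keeping a hand-written mapping.

```rust
use std::any::TypeId;

// Stand-in marker types (hypothetical); the real benchmark would use
// Arrow's `StringArray` and `StringViewArray` here.
#[allow(dead_code)]
struct StringArray;
#[allow(dead_code)]
struct StringViewArray;

/// Returns a friendly name for known array types and falls back to the
/// compiler-provided type name for anything else.
fn array_type_name<A: 'static>() -> &'static str {
    let id = TypeId::of::<A>();
    if id == TypeId::of::<StringArray>() {
        "Utf8"
    } else if id == TypeId::of::<StringViewArray>() {
        "Utf8View"
    } else {
        std::any::type_name::<A>()
    }
}

fn main() {
    println!("{}", array_type_name::<StringArray>());
    println!("{}", array_type_name::<StringViewArray>());
}
```

The trade-off: a `TypeId` table needs a new branch for every array type, while a `DataType`-based label only needs an array instance (or its data type) at the point where the benchmark ID is built.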
run benchmarks in_list
🤖 Hi @alamb, thanks for the request (#19211 (comment)).
Please choose one or more of these with
run benchmark in_list
(
I absolutely did
🤖: Benchmark completed
Which issue does this PR close?
N/A - benchmark improvement
Stacked on top of #19204
Rationale for this change
We need to measure InList performance on both StringArray (Utf8) and StringViewArray (Utf8View) to compare Arrow's string representations.
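To make that comparison concrete, the sketch below parses two `time:` lines from the example output in this thread and reports the relative difference between their middle (point-estimate) values. The helper names `parse_time_line` and `relative_change_percent` are made up for illustration; this is not part of the PR or of Criterion.

```rust
/// Parses a Criterion `time: [low estimate high]` line into microseconds.
/// Returns `None` if the line does not have three value/unit pairs.
fn parse_time_line(line: &str) -> Option<[f64; 3]> {
    let start = line.find('[')? + 1;
    let end = line.find(']')?;
    if end < start {
        return None;
    }
    // Expect tokens like: "5.8309 µs 5.8479 µs 5.8681 µs".
    let tokens: Vec<&str> = line[start..end].split_whitespace().collect();
    if tokens.len() != 6 {
        return None;
    }
    let mut out = [0.0_f64; 3];
    for i in 0..3 {
        let value: f64 = tokens[2 * i].parse().ok()?;
        // Normalize every unit to microseconds.
        let scale = match tokens[2 * i + 1] {
            "ns" => 1e-3,
            "µs" | "μs" | "us" => 1.0,
            "ms" => 1e3,
            "s" => 1e6,
            _ => return None,
        };
        out[i] = value * scale;
    }
    Some(out)
}

/// Relative change of `candidate` versus `baseline`, in percent.
fn relative_change_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // in_list/Utf8/list=3/nulls=0%/str=100 and its Utf8View counterpart.
    let utf8 = parse_time_line("time: [8.8208 µs 8.8323 µs 8.8469 µs]").unwrap();
    let view = parse_time_line("time: [9.8523 µs 9.8744 µs 9.8994 µs]").unwrap();
    let delta = relative_change_percent(utf8[1], view[1]);
    println!("Utf8View vs Utf8 (list=3, nulls=0%, str=100): {delta:+.1}%");
}
```

For this one pair the Utf8View run is roughly 12% slower in the local numbers posted above. A single local run is noisy, so treat this as a way to read the output rather than a conclusion.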
What changes are included in this PR?
Add Utf8View benchmarks for InList, refactored with generics to make adding new array types trivial.
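As a rough illustration of the "generic over the array type" idea, here is a std-only sketch. It is not the PR's code: the `StringColumn` trait, `build_column`, `count_in_list`, and `run_case` are hypothetical names, and `Vec` / `VecDeque` stand in for Arrow's `StringArray` / `StringViewArray` so the example compiles without the `arrow` or `criterion` crates.

```rust
use std::collections::VecDeque;

const ARRAY_LENGTH: usize = 1024;

/// Hypothetical abstraction over a nullable string column (not an Arrow trait).
trait StringColumn: FromIterator<Option<String>> {
    const NAME: &'static str;
    fn value(&self, i: usize) -> Option<&str>;
}

impl StringColumn for Vec<Option<String>> {
    const NAME: &'static str = "Vec";
    fn value(&self, i: usize) -> Option<&str> {
        self[i].as_deref()
    }
}

impl StringColumn for VecDeque<Option<String>> {
    const NAME: &'static str = "VecDeque";
    fn value(&self, i: usize) -> Option<&str> {
        self[i].as_deref()
    }
}

/// Builds `ARRAY_LENGTH` strings of `str_len` bytes; every `null_every`-th
/// slot is null (0 disables nulls). Works for any `StringColumn`.
fn build_column<A: StringColumn>(str_len: usize, null_every: usize) -> A {
    (0..ARRAY_LENGTH)
        .map(|i| {
            if null_every != 0 && i % null_every == 0 {
                None
            } else {
                // Deterministic filler: one letter repeated `str_len` times.
                let c = char::from(b'a' + (i % 26) as u8);
                Some(c.to_string().repeat(str_len))
            }
        })
        .collect()
}

/// Counts rows whose non-null value appears in `list`.
fn count_in_list<A: StringColumn>(column: &A, list: &[&str]) -> usize {
    (0..ARRAY_LENGTH)
        .filter(|&i| column.value(i).map_or(false, |v| list.contains(&v)))
        .count()
}

/// One benchmark-style case, written once and reused for every column type.
fn run_case<A: StringColumn>(list: &[&str]) -> usize {
    let column: A = build_column(3, 5); // 3-byte strings, about 20% nulls
    let hits = count_in_list(&column, list);
    println!("in_list/{}/list={}: {} hits", A::NAME, list.len(), hits);
    hits
}

fn main() {
    let list = ["aaa", "bbb", "zzz"];
    run_case::<Vec<Option<String>>>(&list);
    run_case::<VecDeque<Option<String>>>(&list);
}
```

With this shape, covering another representation means one more `impl` plus one more `run_case::<T>` call, which is the property the description is after.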
Are these changes tested?
Benchmark-only change.
Are there any user-facing changes?
No.