Conversation

etseidl (Contributor) commented Dec 4, 2025

Which issue does this PR close?

Closes the linked issue: Add round trip benchmark for Parquet writer/reader.

Rationale for this change

Provides a baseline against which future read/write performance improvements can be measured.

What changes are included in this PR?

Adds new benchmarks for reading and writing Parquet files. The benchmarks currently use a fixed number of row groups, pages, and rows, and cycle through data types and encodings.
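
For orientation, the harness follows the usual criterion shape: one read and one write benchmark per (data type, encoding) pair. A minimal sketch (hypothetical names and structure, not the PR's actual code):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use parquet::basic::Encoding;

fn bench_round_trip(c: &mut Criterion) {
    // One benchmark pair per encoding; the real harness also cycles data types.
    for encoding in [
        Encoding::PLAIN,
        Encoding::RLE_DICTIONARY,
        Encoding::DELTA_BINARY_PACKED,
    ] {
        let name = format!("int32 {encoding:?}");
        c.bench_function(&format!("write {name}"), |b| b.iter(|| { /* encode a file */ }));
        c.bench_function(&format!("read {name}"), |b| b.iter(|| { /* decode it back */ }));
    }
}

criterion_group!(benches, bench_round_trip);
criterion_main!(benches);
```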

Are these changes tested?

N/A

Are there any user-facing changes?

No

The github-actions bot added the parquet (Changes to the parquet crate) label on Dec 4, 2025.
etseidl (Author) commented Dec 4, 2025

@alamb I borrowed liberally from your parquet footer code 😉

etseidl (Author) commented Dec 4, 2025

Example run (criterion comparison output; times are mean ± standard deviation per iteration, and `? ?/sec` means no throughput measure was configured):

group                                 base
-----                                 ----
read Binary(100) delta_byte_array     1.00     21.7±0.49ms        ? ?/sec
read Binary(100) delta_length         1.00     11.4±0.18ms        ? ?/sec
read Binary(100) dict                 1.00     12.3±0.16ms        ? ?/sec
read Binary(100) plain                1.00     10.6±0.28ms        ? ?/sec
read Binary(20) delta_byte_array      1.00     12.6±0.18ms        ? ?/sec
read Binary(20) delta_length          1.00      8.3±0.19ms        ? ?/sec
read Binary(20) dict                  1.00      7.5±0.20ms        ? ?/sec
read Binary(20) plain                 1.00      7.4±0.25ms        ? ?/sec
read Fixed(16) byte_stream_split      1.00      6.8±0.38ms        ? ?/sec
read Fixed(16) delta_byte_array       1.00      8.0±0.20ms        ? ?/sec
read Fixed(16) dict                   1.00  1775.6±49.86µs        ? ?/sec
read Fixed(16) plain                  1.00  1757.0±43.28µs        ? ?/sec
read Fixed(2) byte_stream_split       1.00  1770.9±30.40µs        ? ?/sec
read Fixed(2) delta_byte_array        1.00      8.3±0.15ms        ? ?/sec
read Fixed(2) dict                    1.00  1223.6±28.11µs        ? ?/sec
read Fixed(2) plain                   1.00  1229.1±25.39µs        ? ?/sec
read f32 byte_stream_split            1.00      5.2±0.16ms        ? ?/sec
read f32 dict                         1.00      4.2±0.09ms        ? ?/sec
read f32 plain                        1.00      3.1±0.27ms        ? ?/sec
read f64 byte_stream_split            1.00      9.1±0.61ms        ? ?/sec
read f64 dict                         1.00      4.4±0.06ms        ? ?/sec
read f64 plain                        1.00      3.4±0.14ms        ? ?/sec
read int32 byte_stream_split          1.00      5.1±0.14ms        ? ?/sec
read int32 delta_binary               1.00      4.4±0.08ms        ? ?/sec
read int32 dict                       1.00      4.9±0.68ms        ? ?/sec
read int32 plain                      1.00      3.1±0.17ms        ? ?/sec
read int64 byte_stream_split          1.00      9.2±0.67ms        ? ?/sec
read int64 delta_binary               1.00      5.0±0.09ms        ? ?/sec
read int64 dict                       1.00      4.4±0.06ms        ? ?/sec
read int64 plain                      1.00      3.4±0.04ms        ? ?/sec
write Binary(100) delta_byte_array    1.00     64.1±1.77ms        ? ?/sec
write Binary(100) delta_length        1.00     55.6±1.15ms        ? ?/sec
write Binary(100) dict                1.00     39.4±0.82ms        ? ?/sec
write Binary(100) plain               1.00     51.1±1.04ms        ? ?/sec
write Binary(20) delta_byte_array     1.00     31.9±0.35ms        ? ?/sec
write Binary(20) delta_length         1.00     24.7±0.71ms        ? ?/sec
write Binary(20) dict                 1.00     32.4±0.78ms        ? ?/sec
write Binary(20) plain                1.00     24.1±0.22ms        ? ?/sec
write Fixed(16) byte_stream_split     1.00     67.5±0.67ms        ? ?/sec
write Fixed(16) delta_byte_array      1.00    148.0±2.06ms        ? ?/sec
write Fixed(16) dict                  1.00     62.2±0.53ms        ? ?/sec
write Fixed(16) plain                 1.00     62.6±1.38ms        ? ?/sec
write Fixed(2) byte_stream_split      1.00     57.7±0.76ms        ? ?/sec
write Fixed(2) delta_byte_array       1.00    144.4±0.98ms        ? ?/sec
write Fixed(2) dict                   1.00     59.7±0.73ms        ? ?/sec
write Fixed(2) plain                  1.00     59.7±0.63ms        ? ?/sec
write f32 byte_stream_split           1.00     17.6±0.44ms        ? ?/sec
write f32 dict                        1.00     31.5±0.33ms        ? ?/sec
write f32 plain                       1.00     18.0±1.41ms        ? ?/sec
write f64 byte_stream_split           1.00     21.0±0.25ms        ? ?/sec
write f64 dict                        1.00     31.8±0.38ms        ? ?/sec
write f64 plain                       1.00     19.6±0.20ms        ? ?/sec
write int32 byte_stream_split         1.00     21.7±0.32ms        ? ?/sec
write int32 delta_binary              1.00     29.0±0.33ms        ? ?/sec
write int32 dict                      1.00     38.4±2.54ms        ? ?/sec
write int32 plain                     1.00     22.2±1.35ms        ? ?/sec
write int64 byte_stream_split         1.00     21.7±0.47ms        ? ?/sec
write int64 delta_binary              1.00     27.6±0.40ms        ? ?/sec
write int64 dict                      1.00     32.4±0.42ms        ? ?/sec
write int64 plain                     1.00     20.2±0.22ms        ? ?/sec

alamb (Contributor) left a review:

Thanks @etseidl -- this looks great

I had a few suggestions, but nothing I think is required to merge

I also ran some basic profiling on these benchmarks to see what they measure:

samply record -- cargo bench --bench parquet_round_trip -- "write int64 dict"

And it looks like it is measuring what I would expect:

[samply profiler screenshots]

I have a good feeling that the encoder/decoder is about to get a lot faster...


/// Creates a [`PrimitiveArray`] of a given `size` and `null_density`
/// filling it with random numbers generated using the provided `seed`.
pub fn create_primitive_array_with_seed<T>(
alamb (Contributor) commented:

Do we need a separate copy of these functions? Maybe we can reuse the existing functions in bench_util.rs:

https://github.com/apache/arrow-rs/blob/f131b5469655c2a1afc3b23ce5e3f850d6a389cf/arrow/src/util/bench_util.rs#L252-L251
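
For reference, the shared helper can be called directly; a minimal sketch, assuming the arrow crate's `test_utils` feature is enabled:

```rust
use arrow::array::Array;
use arrow::datatypes::Int32Type;
use arrow::util::bench_util::create_primitive_array_with_seed;

fn main() {
    // 4096 values, 10% nulls, fixed seed so every run generates identical data
    let array = create_primitive_array_with_seed::<Int32Type>(4096, 0.1, 42);
    println!("{} rows", array.len());
}
```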

etseidl (Author) replied:

Ha, I just copied from https://github.com/alamb/parquet_footer_parsing/blob/main/src/datagen.rs 😅

I'll switch over 👍

alamb (Contributor) replied:

(I totally copy/pasted from arrow for that benchmark of course -- it all came full circle!)

etseidl (Author) replied:

done in d9bd421

.collect()
}

pub fn file_from_spec(spec: ParquetFileSpec, buf_size: Option<usize>) -> Bytes {
alamb (Contributor) commented:

👍

}

fn read_write(c: &mut Criterion, spec: ParquetFileSpec, msg: &str) {
let f = file_from_spec(spec, None);
alamb (Contributor) commented:

rather than passing in the buffer size, maybe the test could pass in the buffer directly (reusing it across calls) to avoid all output buffer allocations 🤔
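
A minimal sketch of the buffer-reuse idea (generic illustration, not the PR's code): the benchmark owns one `Vec<u8>` and clears it each iteration, so the allocation is retained across calls.

```rust
fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(1024 * 1024);
    for _iteration in 0..10 {
        buf.clear(); // drops old contents but keeps the capacity
        buf.extend_from_slice(b"...encoded parquet bytes..."); // writes reuse the allocation
    }
    assert!(buf.capacity() >= 1024 * 1024); // no reallocation happened
}
```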

etseidl (Author) replied:

Yes, I was thinking that too after submitting. I'll try switching it up some.

etseidl (Author) replied:

Done in f622a3d

for rg in 0..spec.num_row_groups {
let col_writers = row_group_factory.create_column_writers(rg).unwrap();

let encoded_columns = encode_row_group(&schema, &spec, col_writers);
alamb (Contributor) commented:

What is the reason to use this lower-level API (rather than just writer.write to write the whole batch)?

Using writer.write would likely be less code and I think it might more closely mirror the API people actually use (though now I write this I am not sure I really know what people use)
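
The higher-level path alamb mentions is parquet's ArrowWriter; a minimal sketch, assuming the crate's `arrow` feature:

```rust
use std::sync::Arc;

use arrow_array::{ArrayRef, Int32Array, RecordBatch};
use parquet::arrow::ArrowWriter;
use parquet::errors::Result;

fn write_high_level() -> Result<Vec<u8>> {
    let batch = RecordBatch::try_from_iter([(
        "v",
        Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef,
    )])
    .unwrap();
    let mut buf = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut buf, batch.schema(), None)?;
    writer.write(&batch)?; // row-group and page slicing handled internally
    writer.close()?; // writes the footer
    Ok(buf)
}
```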


alamb (Contributor) added:

😬 -- my own fault lol

etseidl (Author) replied:

Done in d783743


c.bench_function(&format!("read {msg}"), |b| {
b.iter(|| {
let record_reader = ParquetRecordBatchReaderBuilder::try_new(f.clone())
alamb (Contributor) commented:

I had to double check that f is Bytes here (and thus this clone is cheap)
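
For context on why the clone is cheap: bytes::Bytes is a reference-counted slice, so cloning copies a pointer and bumps a count rather than the file body, and parquet implements ChunkReader for Bytes. A sketch of the read side under those assumptions:

```rust
use bytes::Bytes;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use parquet::errors::Result;

fn count_rows(parquet_file: &Bytes) -> Result<usize> {
    // cheap: clones the refcounted handle, not the underlying buffer
    let reader = ParquetRecordBatchReaderBuilder::try_new(parquet_file.clone())?.build()?;
    let mut rows = 0;
    for batch in reader {
        rows += batch?.num_rows(); // each item is a Result<RecordBatch, _>
    }
    Ok(rows)
}
```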

etseidl (Author) replied:

I'll change to a more meaningful name 😅

etseidl (Author) replied:

Done in f622a3d

etseidl (Author) commented Dec 5, 2025

Thanks @alamb. I was a bit surprised that the read and write times were so different.

> I have a good feeling that the encoder/decoder is about to get a lot faster...

Hopefully. One thing to hit first is the fixed-len binary encoder. That seems to do a lot of allocations/copies that we should be able to avoid.
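
As a generic illustration of avoidable per-value copies (not the actual encoder's code): append fixed-width values into one pre-sized output buffer rather than materializing an intermediate allocation per value.

```rust
/// Packs fixed-width values into a single contiguous buffer with one allocation.
fn pack_fixed(values: &[[u8; 16]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 16); // size known up front
    for v in values {
        out.extend_from_slice(v); // no per-value temporary allocation
    }
    out
}
```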

Comment on lines +43 to +46
// arrow::util::bench_util::create_fsb_array with a seed

/// Creates a random (but fixed-seeded) array of fixed size with a given null density and length
fn create_fsb_array_with_seed(
etseidl (Author) commented:

Should I just move this to bench_util.rs?

alamb (Contributor) replied:

sure -- maybe a follow-on PR
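
A sketch of what the shared helper could look like in bench_util.rs, mirroring the other *_with_seed helpers there (assumes rand 0.8's API; not the PR's exact code):

```rust
use arrow::array::FixedSizeBinaryArray;
use rand::{rngs::StdRng, Rng, SeedableRng};

/// Creates a seeded FixedSizeBinaryArray with the given null density.
fn create_fsb_array_with_seed(
    size: usize,
    null_density: f32,
    value_len: usize,
    seed: u64,
) -> FixedSizeBinaryArray {
    let mut rng = StdRng::seed_from_u64(seed); // fixed seed => reproducible data
    let values = (0..size).map(|_| {
        if rng.gen::<f32>() < null_density {
            None
        } else {
            Some((0..value_len).map(|_| rng.gen::<u8>()).collect::<Vec<u8>>())
        }
    });
    FixedSizeBinaryArray::try_from_sparse_iter_with_size(values, value_len as i32).unwrap()
}
```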

alamb (Contributor) left a review:

🚀

alamb merged commit c9fca0b into apache:main on Dec 8, 2025 (17 checks passed).
alamb (Contributor) commented Dec 8, 2025

Thanks @etseidl

etseidl deleted the parquet_bench branch on December 9, 2025.

Labels: parquet (Changes to the parquet crate)

Linked issue that merging may close: Add round trip benchmark for Parquet writer/reader