GH-39704: [C++][Parquet] Benchmark levels decoding #39705
pitrou merged 8 commits into apache:main
Conversation
@pitrou @emkornfield @wgtmac Would you mind taking a look? Also cc @Hattonuri
Benchmark on my macOS with Release (-O2):
This benchmark shows that, when the data is not highly repeated, RLE without bit-packing is slow 😅 After changing RLE to BIT_PACKED, decoding gets a bit faster when the repeat count is low. We should also mention that our native unpack to int16 is slow =_=, which makes decoding a bit slower.
Is this use case relevant here? #34510 Reading a non-nullable fixed-size list is missing the fast path; it'd be nice to see it in the benchmark (even if it isn't improved yet). With all the AI nowadays, I assume tensor storage will become more and more common.
Yeah, I think it's related. I think I can optimize unpack later, but maybe I need some help optimizing RLE.
@emkornfield @pitrou Updated, so sorry for the delay
Results on my macOS with Release (-O2):
Ping @pitrou @emkornfield for help
Strange phenomenon: we get results like
Oh, it seems Google Benchmark has some weird behavior here. Unrelated to this PR, though.
Posted google/benchmark#1749 for the Google Benchmark oddity.
FTR, benchmark numbers here:
@pitrou I think the benchmark result shows that "batch_size" should be taken into consideration. For example, when the batch size grows, BIT_PACKED doesn't get faster, while RLE is well optimized.
My bad, changed the vector for output to
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 0c88d13. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
### Rationale for this change

This patch adds a level-decoding benchmark. It tests:

1. Different max levels (for a flat type the maximum level is 1; for nested types it grows)
2. Different repeat counts (repeated null / non-null data behaves differently from non-repeated data)
3. Different read-batch sizes (this part of the logic is a bit tricky in the original code)

### What changes are included in this PR?

Add a level-decoding benchmark.

### Are these changes tested?

No need.

### Are there any user-facing changes?

No.

* Closes: apache#39704

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>