GH-39398: [C++][Parquet] DNM: benchmark for readLevels by mapleFU · Pull Request #39486 · apache/arrow

mapleFU · 2024-01-06T08:47:22Z

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Closes: [C++] [Parquet] Use std::count in parquet ColumnReader #39398
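For context, the linked issue is about counting definition levels with `std::count`. The following is a minimal sketch of that idea, not the code from the Arrow change: the helper names `CountValuesLoop` and `CountValuesStd` are hypothetical, and it assumes the quantity of interest is the number of definition levels equal to the column's maximum definition level.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Arrow's API): hand-written loop that counts the
// definition levels equal to max_def_level.
inline int64_t CountValuesLoop(const std::vector<int16_t>& def_levels,
                               int16_t max_def_level) {
  assert(max_def_level >= 0);
  int64_t count = 0;
  for (int16_t level : def_levels) {
    if (level == max_def_level) ++count;
  }
  return count;
}

// Same result expressed with std::count, which compilers can often vectorize.
inline int64_t CountValuesStd(const std::vector<int16_t>& def_levels,
                              int16_t max_def_level) {
  assert(max_def_level >= 0);
  return static_cast<int64_t>(
      std::count(def_levels.begin(), def_levels.end(), max_def_level));
}
```

Both functions return the same count; whether the `std::count` form is actually faster depends on the compiler and target, which is what a benchmark like this one is meant to measure.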

mapleFU · 2024-01-06T08:48:58Z

Please do not merge this patch; it is only for benchmarking.

mapleFU · 2024-01-06T08:50:07Z

cpp/src/parquet/column_reader_benchmark.cc

+  for (auto _ : state) {
+    state.PauseTiming();
+    Int32Reader* reader = helper.ResetColumnReader();
+    [[maybe_unused]] bool v = reader->HasNext();


Using HasNext() to trigger initialization, so that it happens while timing is paused.
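To illustrate why the first `HasNext()` call is kept out of the timed region, here is a self-contained sketch. It uses `std::chrono` instead of Google Benchmark so it has no external dependencies, and `LazyReader`, `TimedRead`, and `TimeReadLevels` are hypothetical stand-ins rather than the Parquet classes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a column reader that sets up its page lazily.
class LazyReader {
 public:
  explicit LazyReader(int64_t num_levels) : num_levels_(num_levels) {}

  // The first call performs the (potentially expensive) page setup.
  bool HasNext() {
    if (!initialized_) {
      levels_.assign(static_cast<std::size_t>(num_levels_), int16_t{1});
      initialized_ = true;
    }
    return pos_ < num_levels_;
  }

  // Copies up to batch_size levels into out and returns the number copied.
  int64_t ReadLevels(int64_t batch_size, int16_t* out) {
    assert(initialized_ && "call HasNext() first");
    const int64_t n = std::min(batch_size, num_levels_ - pos_);
    std::copy_n(levels_.begin() + pos_, n, out);
    pos_ += n;
    return n;
  }

  bool initialized() const { return initialized_; }

 private:
  int64_t num_levels_;
  int64_t pos_ = 0;
  bool initialized_ = false;
  std::vector<int16_t> levels_;
};

struct TimedRead {
  int64_t levels_read;
  std::chrono::nanoseconds elapsed;
};

// Triggers initialization before starting the clock, then times only the reads.
inline TimedRead TimeReadLevels(LazyReader& reader, int64_t batch_size) {
  assert(batch_size > 0);
  std::vector<int16_t> buffer(static_cast<std::size_t>(batch_size));
  reader.HasNext();  // untimed: forces the lazy page setup
  const auto start = std::chrono::steady_clock::now();
  int64_t total = 0;
  while (reader.HasNext()) {
    total += reader.ReadLevels(batch_size, buffer.data());
  }
  const auto stop = std::chrono::steady_clock::now();
  return {total,
          std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

In the benchmark itself the same effect comes from calling `HasNext()` after `state.PauseTiming()`, so the setup cost is not attributed to the level-reading code.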

mapleFU · 2024-01-06T08:50:27Z

cpp/src/parquet/column_reader_benchmark.cc

+  const auto repetition = static_cast<Repetition::type>(state.range(0));
+  const auto batch_size = static_cast<int64_t>(state.range(1));
+
+  BenchmarkHelper helper(repetition, /*num_pages=*/1, /*levels_per_page=*/16 * 80000);


Using a single page to keep it simple.

emkornfield · 2024-01-06T09:06:45Z

cpp/src/parquet/column_reader.h

                                          int64_t* indices_read, const T** dict,
                                          int32_t* dict_len) = 0;
+
+  virtual void ReadLevels(int64_t batch_size, int16_t* def_levels, int16_t* rep_levels,


What do you think about making this private with a Test Peer that can be used in the benchmark, so we can check this PR in?
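A minimal sketch of the test-peer pattern suggested here, under the assumption that a `friend` declaration is acceptable in the class being tested. `LevelReader` and `LevelReaderTestPeer` are hypothetical names for illustration; the real `ReadLevels` is a virtual method on the column reader interface, so the actual wiring in Arrow could look different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class LevelReaderTestPeer;  // defined below; only tests and benchmarks use it

// Hypothetical reader whose ReadLevels stays out of the public API.
class LevelReader {
 public:
  explicit LevelReader(std::vector<int16_t> levels)
      : levels_(std::move(levels)) {}

  int64_t levels_remaining() const {
    return static_cast<int64_t>(levels_.size() - pos_);
  }

 private:
  friend class LevelReaderTestPeer;  // grants access without widening the API

  // Copies up to batch_size levels into out and returns the number copied.
  int64_t ReadLevels(int64_t batch_size, int16_t* out) {
    assert(batch_size >= 0);
    const int64_t n = std::min(batch_size, levels_remaining());
    std::copy_n(levels_.begin() + static_cast<std::ptrdiff_t>(pos_), n, out);
    pos_ += static_cast<std::size_t>(n);
    return n;
  }

  std::vector<int16_t> levels_;
  std::size_t pos_ = 0;
};

// The peer forwards to the private method so a benchmark can call it.
class LevelReaderTestPeer {
 public:
  static int64_t ReadLevels(LevelReader& reader, int64_t batch_size,
                            int16_t* out) {
    return reader.ReadLevels(batch_size, out);
  }
};
```

A benchmark would then call `LevelReaderTestPeer::ReadLevels(reader, batch_size, buffer)` while ordinary users of the class still cannot reach the method.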

emkornfield · 2024-01-06T09:08:04Z

Thank you @mapleFU, it would be great to check this in (without the count changes) so we have a good benchmark for ReadLevels. See my comment on how this might be done without breaking abstractions.

emkornfield · 2024-01-06T09:11:07Z

I guess looking at CI it would take a little more work than this proof of concept to check it in.

mapleFU · 2024-01-06T09:14:28Z

It's just a quick proof of concept for the ReadLevels optimization. I think exposing it would be too hacky, because ReadLevels only reads levels in the current page. That is why HasNext is called first and I only maintain one page. The interface would be weird here.

pitrou · 2024-01-06T12:22:49Z

@ursabot please benchmark

ursabot · 2024-01-06T12:22:56Z

Benchmark runs are scheduled for commit 0529cce. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

conbench-apache-arrow · 2024-01-06T21:14:45Z

Thanks for your patience. Conbench analyzed the 6 benchmarking runs that have been run so far on PR commit 0529cce.

There were 3 benchmark results indicating a performance regression:

Pull Request Run on ursa-thinkcentre-m75q at 2024-01-06 21:07:35Z
- ReadMmapUncachedFile (C++) with params=num_cols:64/is_partial:0/real_time, source=cpp-micro, suite=arrow-ipc-read-write-benchmark
- ValidateTinyNonAscii (C++) with source=cpp-micro, suite=arrow-utf8-util-benchmark
and 1 more (see the report linked below)

The full Conbench report has more details.

mapleFU · 2024-01-07T08:09:54Z

Hmm, are the regressed benchmarks unrelated?

pitrou · 2024-01-07T13:07:32Z

> Hmm, are the regressed benchmarks unrelated?

None of them are related to Parquet.

DNM: benchmark for std::count

0529cce

github-actions bot added the Component: Parquet, Component: C++, and awaiting review labels on Jan 6, 2024

mapleFU mentioned this pull request Jan 6, 2024

GH-39398: [C++][Parquet] Use std::count in ColumnReader ReadLevels #39397

Merged

mapleFU commented Jan 6, 2024


github-actions bot added the awaiting committer review label and removed the awaiting review label on Jan 6, 2024

emkornfield reviewed Jan 6, 2024


github-actions bot added the awaiting changes label and removed the awaiting committer review label on Jan 6, 2024

mapleFU closed this Jan 9, 2024

mapleFU wants to merge 1 commit into apache:main from mapleFU:dnm/verify-read-faster
