
GH-39398: [C++][Parquet] DNM: benchmark for readLevels#39486

Closed
mapleFU wants to merge 1 commit into apache:main from mapleFU:dnm/verify-read-faster


@mapleFU (Member) commented Jan 6, 2024

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@mapleFU (Member, Author) commented Jan 6, 2024

Please do not merge this patch; it is only for benchmarking.

for (auto _ : state) {
  state.PauseTiming();
  Int32Reader* reader = helper.ResetColumnReader();
  [[maybe_unused]] bool v = reader->HasNext();
@mapleFU (Member, Author):

Using HasNext() to trigger initialization (loading the first page) while timing is paused.
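The excerpt keeps per-iteration setup out of the measurement: timing is paused, the reader is reset, and HasNext() is called so that loading the page is not charged to ReadLevels (presumably timing is resumed right after). The sketch below reproduces that pattern with std::chrono instead of Google Benchmark. FakeLevelReader and TimeReadLevels are hypothetical names, and the reader is a toy stand-in, not parquet's Int32Reader.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a column reader with a single page. The page is
// only materialized the first time HasNext() is called (lazy initialization).
class FakeLevelReader {
 public:
  explicit FakeLevelReader(int64_t levels_per_page)
      : levels_per_page_(levels_per_page) {}

  bool HasNext() {
    if (!page_loaded_) {
      // Simulate decoding the page: fill it with definition levels of 1.
      page_.assign(static_cast<size_t>(levels_per_page_), int16_t{1});
      page_loaded_ = true;
    }
    return position_ < static_cast<int64_t>(page_.size());
  }

  // Copies up to batch_size levels from the loaded page and returns the count.
  int64_t ReadLevels(int64_t batch_size, int16_t* out) {
    const int64_t remaining = static_cast<int64_t>(page_.size()) - position_;
    const int64_t n = std::min(batch_size, remaining);
    std::copy_n(page_.begin() + position_, n, out);
    position_ += n;
    return n;
  }

 private:
  int64_t levels_per_page_;
  bool page_loaded_ = false;
  int64_t position_ = 0;
  std::vector<int16_t> page_;
};

// Measures only the ReadLevels loop. Reader construction and the HasNext()
// call that loads the page are excluded, analogous to running them between
// state.PauseTiming() and state.ResumeTiming() in Google Benchmark.
std::chrono::nanoseconds TimeReadLevels(int iterations, int64_t levels_per_page,
                                        int64_t batch_size, int64_t* levels_read) {
  std::chrono::nanoseconds total{0};
  std::vector<int16_t> buffer(static_cast<size_t>(batch_size));
  *levels_read = 0;
  for (int i = 0; i < iterations; ++i) {
    // Untimed setup: fresh reader plus lazy page loading.
    FakeLevelReader reader(levels_per_page);
    const bool has_data = reader.HasNext();
    assert(has_data);
    (void)has_data;  // avoid an unused-variable warning when NDEBUG is set

    // Timed region: drain the page in batch_size chunks.
    const auto start = std::chrono::steady_clock::now();
    int64_t n = 0;
    while ((n = reader.ReadLevels(batch_size, buffer.data())) > 0) {
      *levels_read += n;
    }
    total += std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
  }
  return total;
}
```

Resetting the reader on every iteration is what makes the pause necessary: without it, each iteration after the first would find the page already drained.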

const auto repetition = static_cast<Repetition::type>(state.range(0));
const auto batch_size = static_cast<int64_t>(state.range(1));

BenchmarkHelper helper(repetition, /*num_pages=*/1, /*levels_per_page=*/16 * 80000);
@mapleFU (Member, Author):

Using a single page to keep it simple.
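With one page, every iteration reads the same 16 * 80000 = 1,280,000 levels, and the batch size (state.range(1)) alone decides how many ReadLevels calls that takes. The batch sizes actually benchmarked are not shown in this excerpt, so the values below are only examples; CallsToDrain is a hypothetical helper that assumes each call returns at most batch_size levels.

```cpp
#include <cassert>
#include <cstdint>

// Levels on the single page used by the benchmark: 16 * 80000.
constexpr int64_t kLevelsPerPage = int64_t{16} * 80000;

// Number of ReadLevels calls needed to drain `levels` levels when each call
// returns at most `batch_size` of them (ceiling division; batch_size > 0).
constexpr int64_t CallsToDrain(int64_t levels, int64_t batch_size) {
  return (levels + batch_size - 1) / batch_size;
}

static_assert(kLevelsPerPage == 1280000, "16 * 80000 = 1,280,000 levels");
static_assert(CallsToDrain(kLevelsPerPage, 1024) == 1250, "1024 divides the page evenly");
static_assert(CallsToDrain(kLevelsPerPage, 3) == 426667, "last call is partial");
```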

@github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label on Jan 6, 2024
int64_t* indices_read, const T** dict,
int32_t* dict_len) = 0;

virtual void ReadLevels(int64_t batch_size, int16_t* def_levels, int16_t* rep_levels,
Review comment (Contributor):

What do you think about making this private and giving the benchmark access through a test peer, so that we can check this PR in?
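A "test peer" here means a small class, included only by tests and benchmarks, that is declared a friend of the production class and forwards calls to its private members. A minimal sketch of that pattern, assuming ReadLevels is made private; LevelReader and LevelReaderTestPeer are hypothetical names, and the toy ReadLevels just fills the buffer.

```cpp
#include <cassert>
#include <cstdint>

class LevelReaderTestPeer;  // forward declaration so it can be befriended

// Hypothetical reader whose ReadLevels is an internal detail, not public API.
class LevelReader {
 public:
  explicit LevelReader(int16_t max_def_level) : max_def_level_(max_def_level) {}

 private:
  friend class LevelReaderTestPeer;  // only the peer may reach private members

  // Toy implementation: writes batch_size definition levels equal to the max.
  int64_t ReadLevels(int64_t batch_size, int16_t* def_levels) {
    for (int64_t i = 0; i < batch_size; ++i) def_levels[i] = max_def_level_;
    return batch_size;
  }

  int16_t max_def_level_;
};

// Test/benchmark-only accessor: forwards to the private method so the
// public interface does not have to grow.
class LevelReaderTestPeer {
 public:
  static int64_t ReadLevels(LevelReader& reader, int64_t batch_size,
                            int16_t* def_levels) {
    return reader.ReadLevels(batch_size, def_levels);
  }
};
```

Only code that includes the peer can call ReadLevels directly, so users of the reader see no new public method.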

@github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label on Jan 6, 2024
@emkornfield (Contributor)

Thank you @mapleFU, it would be great to check this in (without the count changes) so that we have a good benchmark for ReadLevels. See my comment on how we might do it without breaking abstractions.
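The linked issue, "[C++] [Parquet] Use std::count in parquet ColumnReader", suggests what the count changes are about. One plausible spot is counting how many values in a batch are non-null: for flat (non-repeated) data, a value is present when its definition level equals the column's max definition level. The sketch compares a hand-written loop with std::count; both function names are hypothetical, and this is not the actual Arrow code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hand-written loop: counts definition levels equal to max_def_level,
// i.e. the number of non-null values in a batch of flat data.
int64_t CountPresentLoop(const int16_t* def_levels, int64_t n,
                         int16_t max_def_level) {
  int64_t count = 0;
  for (int64_t i = 0; i < n; ++i) {
    if (def_levels[i] == max_def_level) ++count;
  }
  return count;
}

// Same result via std::count, which states the intent directly and may be
// easier for the compiler to vectorize.
int64_t CountPresentStd(const int16_t* def_levels, int64_t n,
                        int16_t max_def_level) {
  return static_cast<int64_t>(std::count(def_levels, def_levels + n, max_def_level));
}
```

Whether std::count is actually faster depends on the compiler and flags, which is exactly what a ReadLevels benchmark would help measure.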

@emkornfield (Contributor)

Looking at CI, I guess it would take a bit more work than this proof of concept to check it in.

@mapleFU (Member, Author) commented Jan 6, 2024

It's just a quick PoC for the ReadLevels optimization. I think exporting it would be too hacky, because ReadLevels only means "read levels in the current page". That is why HasNext() is called first, and why the benchmark maintains only one page. The interface would be weird here.
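The point about the interface is that ReadLevels is scoped to whichever page is currently loaded, so a caller must drive page transitions through HasNext(). A toy model of that contract shows why exposing it is awkward: a batch never spans pages, and a call made before HasNext() reads nothing. PagedLevelReader is hypothetical and only models the behavior described in the comment above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical multi-page level source: ReadLevels only reads from the
// current page, and HasNext() is what moves on to the next page.
class PagedLevelReader {
 public:
  explicit PagedLevelReader(std::vector<std::vector<int16_t>> pages)
      : pages_(std::move(pages)) {}

  // Loads the next page if none is loaded yet or the current one is used up.
  bool HasNext() {
    while (page_ < 0 || (page_ < NumPages() && pos_ == PageSize(page_))) {
      ++page_;
      pos_ = 0;
      if (page_ == NumPages()) return false;
    }
    return page_ < NumPages();
  }

  // Reads at most batch_size levels, never crossing the current page boundary.
  int64_t ReadLevels(int64_t batch_size, int16_t* out) {
    if (page_ < 0 || page_ >= NumPages()) return 0;  // no page loaded
    const std::vector<int16_t>& page = pages_[static_cast<size_t>(page_)];
    const int64_t n =
        std::min<int64_t>(batch_size, static_cast<int64_t>(page.size()) - pos_);
    std::copy_n(page.begin() + pos_, n, out);
    pos_ += n;
    return n;
  }

 private:
  int64_t NumPages() const { return static_cast<int64_t>(pages_.size()); }
  int64_t PageSize(int64_t i) const {
    return static_cast<int64_t>(pages_[static_cast<size_t>(i)].size());
  }

  std::vector<std::vector<int16_t>> pages_;
  int64_t page_ = -1;  // index of the loaded page; -1 until HasNext() is called
  int64_t pos_ = 0;    // next unread level within the loaded page
};
```

With two pages of 3 and 2 levels, a ReadLevels(5, ...) call returns 3, a second one returns 0, and only after another HasNext() does the next call return the remaining 2.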

@pitrou (Member) commented Jan 6, 2024

@ursabot please benchmark

@ursabot commented Jan 6, 2024

Benchmark runs are scheduled for commit 0529cce. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@conbench-apache-arrow

Thanks for your patience. Conbench analyzed the 6 benchmarking runs that have been run so far on PR commit 0529cce.

There were 3 benchmark results indicating a performance regression:

The full Conbench report has more details.

@mapleFU (Member, Author) commented Jan 7, 2024

Hmm, are the regressed benchmarks unrelated?

@pitrou (Member) commented Jan 7, 2024

> Emm would regression benchmark un-related..?

None of them are related to Parquet.

@mapleFU closed this on Jan 9, 2024

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] [Parquet] Use std::count in parquet ColumnReader

4 participants