
Performance optimizations for parquet sub-rowgroup reader. #15020

Merged
rapids-bot[bot] merged 13 commits into rapidsai:branch-24.04 from nvdbaranec:sub_rowgroup_opt_cleanup
Feb 29, 2024

@nvdbaranec
Contributor

This PR implements a basket of optimizations for the parquet reader to bring non-chunked read performance back close to where it was before the sub-rowgroup reader was merged.

The primary culprit for the performance hit was that in the case where we perform no splits, we were making a full copy of all of the pages into the subpass struct (including a pinned memory allocation). This is unnecessary because we can just represent the pages in the subpass as a span that wraps the existing pages in the pass.
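The idea of wrapping the pass's pages instead of copying them can be sketched roughly as follows. This is a minimal host-only illustration, not the reader's actual code: `PageInfo`, `PageSpan`, `Pass`, `Subpass`, and `setup_subpass` are hypothetical names, and plain `std::vector` stands in for the real page storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the reader's per-page metadata.
struct PageInfo {
  int chunk_idx;
  std::size_t num_rows;
};

// Minimal non-owning view (pointer + length), similar in spirit to a span.
struct PageSpan {
  PageInfo* data;
  std::size_t size;
  PageInfo& operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical pass: owns the full set of pages.
struct Pass {
  std::vector<PageInfo> pages;
};

// Hypothetical subpass: always reads pages through `pages`, and only
// fills `page_buf` when it really needs its own subset.
struct Subpass {
  std::vector<PageInfo> page_buf;
  PageSpan pages{nullptr, 0};
};

// Set up a subpass covering pages [first, first + count) of the pass.
// The pass must outlive the subpass, since the view may alias its storage.
void setup_subpass(Subpass& sp, Pass& pass, std::size_t first, std::size_t count)
{
  if (first == 0 && count == pass.pages.size()) {
    // No split: alias the pass's pages, no allocation and no copy.
    sp.page_buf.clear();
    sp.pages = PageSpan{pass.pages.data(), pass.pages.size()};
  } else {
    // Split: copy only the pages this subpass covers.
    sp.page_buf.assign(pass.pages.begin() + first, pass.pages.begin() + first + count);
    sp.pages = PageSpan{sp.page_buf.data(), sp.page_buf.size()};
  }
}
```

Downstream code only ever touches `sp.pages`, so it does not need to know whether the view aliases the pass or points at the subpass's own buffer.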

In addition, several hostdevice_vectors used for work that could be done entirely device-side were converted to rmm::device_uvector.

Finally, I converted a number of functions that were taking hostdevice_vectors to use spans instead and added some missing operators to the hostdevice_vector class itself.
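As a rough illustration of why span-taking signatures help, the sketch below shows a function that accepts a read-only view and an owning buffer that converts to that view implicitly. Everything here is hypothetical and host-only: `ConstView`, `MirroredBuffer`, and `total_rows` are illustrative names, not cudf's `hostdevice_vector` or its span types, and the conversion operator is just one example of the kind of operator such a class might provide.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Minimal read-only view; a hypothetical stand-in for a span type.
template <typename T>
struct ConstView {
  T const* ptr;
  std::size_t len;
  T const* begin() const { return ptr; }
  T const* end() const { return ptr + len; }
};

// Hypothetical owning buffer standing in for a paired host/device vector.
// Only the host side is modeled here.
template <typename T>
class MirroredBuffer {
 public:
  explicit MirroredBuffer(std::vector<T> host) : host_(std::move(host)) {}

  // Conversion operator: lets the buffer be passed wherever a view is expected.
  operator ConstView<T>() const { return ConstView<T>{host_.data(), host_.size()}; }

 private:
  std::vector<T> host_;
};

// Takes a view rather than a specific container, so callers can hand in
// any storage that can produce a pointer and a length.
inline std::size_t total_rows(ConstView<std::size_t> rows_per_page)
{
  std::size_t sum = 0;
  for (std::size_t r : rows_per_page) { sum += r; }
  return sum;
}
```

With this shape, `total_rows(buf)` works for a `MirroredBuffer`, and the same function accepts a view built over any other contiguous storage, so swapping the owning container does not ripple through every signature.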

This PR doesn't recover all the time (there is some new work that we have to do in all cases) but it takes out most of the sting. A sample of some of the benchmarks that were most notably affected:

                       Original Time      Sub-rowgroup-implementation       This PR
parquet_read_decode
Int, device buffer 0   29260860778        26373181343                       28121328587
Int, device buffer 1   30692134492        27474241282                       29495189226

parquet_read_chunks
Int, device buffer     33895028252        29986276949                       32293548191
Float, device buffer   57055985251        49640274260                       55795392897

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

…is to avoid allocating a separate set of page data for the subpass when we are not actually causing any chunking to occur. We can simply wrap the base pages from the pass itself in a span and use it directly.
@nvdbaranec added the libcudf, cuIO, Performance, and non-breaking labels Feb 9, 2024
@nvdbaranec requested a review from a team as a code owner February 9, 2024 21:24
Contributor

@hyperbolic2346 left a comment

The tiniest of nits. This is a good find! Good job here.

@nvdbaranec added the improvement label and removed the code quality label Feb 22, 2024
@nvdbaranec requested a review from ttnghia February 28, 2024 21:12
@nvdbaranec
Contributor Author

/merge

@rapids-bot merged commit a9e41e7 into rapidsai:branch-24.04 Feb 29, 2024

Labels: cuIO, improvement, libcudf, non-breaking, Performance
