Performance optimizations for parquet sub-rowgroup reader. by nvdbaranec · Pull Request #15020 · rapidsai/cudf

nvdbaranec · 2024-02-09T21:24:10Z

This PR implements a set of optimizations for the parquet reader that bring non-chunked reads back close to the performance they had before the sub-rowgroup reader was merged.

The primary culprit for the performance hit was that, when we perform no splits, we were still making a full copy of all of the pages into the subpass struct (including a pinned memory allocation). This is unnecessary because we can represent the pages in the subpass as a span that wraps the existing pages in the pass.
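A minimal host-only sketch of the idea, not the actual cudf code: the `Page`, `PageSpan`, `Pass`, `Subpass`, and `make_subpass` names are hypothetical stand-ins, and the copy in the split branch is an assumption made only for contrast. When the subpass covers every page of the pass, it only records a view of the pass's pages; otherwise it owns a copy of the subset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a page descriptor; the real struct has many more fields.
struct Page {
  int num_rows;
};

// Minimal non-owning view over contiguous pages (the role a span type plays).
struct PageSpan {
  Page* data = nullptr;
  std::size_t size = 0;
  Page& operator[](std::size_t i) const
  {
    assert(i < size);
    return data[i];
  }
};

struct Pass {
  std::vector<Page> pages;
};

struct Subpass {
  Subpass() = default;
  Subpass(Subpass&&) = default;       // moving a vector keeps its data pointer valid
  Subpass(Subpass const&) = delete;   // a copy would leave `pages` pointing at the source
  std::vector<Page> owned_pages;      // only filled when the subpass is a strict subset
  PageSpan pages;                     // what downstream code reads in both cases
};

// Build a subpass covering pass.pages[first, first + count).
Subpass make_subpass(Pass& pass, std::size_t first, std::size_t count)
{
  assert(first + count <= pass.pages.size());
  Subpass sp;
  if (first == 0 && count == pass.pages.size()) {
    // No split: wrap the pass's pages directly, with no allocation and no copy.
    sp.pages = PageSpan{pass.pages.data(), pass.pages.size()};
  } else {
    // Split (assumed behavior): copy only the subset and view the copy.
    sp.owned_pages.assign(pass.pages.begin() + first, pass.pages.begin() + first + count);
    sp.pages = PageSpan{sp.owned_pages.data(), sp.owned_pages.size()};
  }
  return sp;
}
```

Because consumers only ever touch `pages`, they do not need to know which of the two cases produced the subpass.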

In addition, several hostdevice_vectors used for work that could be done entirely device-side were converted to rmm::device_uvector.

Finally, I converted a number of functions that were taking hostdevice_vectors to use spans instead and added some missing operators to the hostdevice_vector class itself.
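As a rough illustration of why span parameters are convenient, here is a host-only sketch. `IntSpan`, `ToyHostVector`, and `total_rows` are hypothetical names, and the conversion operator is only an assumption about the kind of operator that might be added; the description does not say which operators `hostdevice_vector` gained.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical non-owning, read-only view over contiguous ints.
struct IntSpan {
  int const* data;
  std::size_t size;
  IntSpan subspan(std::size_t offset, std::size_t count) const
  {
    assert(offset + count <= size);
    return IntSpan{data + offset, count};
  }
};

// Toy owning container standing in for something like hostdevice_vector
// (host side only; no device memory is modeled here).
class ToyHostVector {
 public:
  explicit ToyHostVector(std::vector<int> v) : host_(std::move(v)) {}
  // Conversion operator so the container can be passed wherever a span is expected.
  operator IntSpan() const { return IntSpan{host_.data(), host_.size()}; }

 private:
  std::vector<int> host_;
};

// Takes a view rather than a concrete container, so callers can pass
// a whole container or any sub-range of one without copying.
long total_rows(IntSpan counts)
{
  return std::accumulate(counts.data, counts.data + counts.size, 0L);
}
```

With this shape, `total_rows(v)` works on a whole `ToyHostVector`, and `total_rows(IntSpan(v).subspan(1, 2))` works on a slice of it, using the same function.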

This PR doesn't recover all of the lost performance (there is some new work that we have to do in all cases), but it takes out most of the sting. A sample of the benchmarks that were most notably affected:

                       Original Time      Sub-rowgroup-implementation       This PR
parquet_read_decode
Int, device buffer 0   29260860778        26373181343                       28121328587
Int, device buffer 1   30692134492        27474241282                       29495189226

parquet_read_chunks
Int, device buffer     33895028252        29986276949                       32293548191
Float, device buffer   57055985251        49640274260                       55795392897

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

cpp/src/io/parquet/page_hdr.cu

hyperbolic2346

The tiniest of nits. This is a good find! Good job here.

cpp/src/io/utilities/hostdevice_span.hpp

cpp/src/io/parquet/reader_impl_chunking.cu

cpp/src/io/parquet/reader_impl_chunking.hpp

cpp/src/io/parquet/reader_impl_preprocess.cu

nvdbaranec · 2024-02-29T16:31:46Z

/merge

nvdbaranec added 4 commits February 6, 2024 12:55

Optimization to sub-rowgroup parquet reader. The primary change here …

3cc11d3

…is to avoid allocating a separate set of page data for the subpass when we are not actually causing any chunking to occur. We can simply wrap the base pages from the pass itself in a span and use it directly.

Merge branch 'branch-24.04' into sub_rowgroup_opt_cleanup

2561893

A few more low-hanging fruit removals of unnecessary host<->device wo…

038b168

…rk. Added some more ranges.

Merge branch 'branch-24.04' into sub_rowgroup_opt_cleanup

1d96d20

nvdbaranec added the libcudf, cuIO, Performance, and non-breaking labels Feb 9, 2024

nvdbaranec requested a review from a team as a code owner February 9, 2024 21:24

nvdbaranec requested review from davidwendt and shrshi February 9, 2024 21:24

nvdbaranec added tech debt and removed tech debt labels Feb 9, 2024

ttnghia reviewed Feb 12, 2024

cpp/src/io/parquet/page_hdr.cu (resolved)

hyperbolic2346 approved these changes Feb 15, 2024

cpp/src/io/utilities/hostdevice_span.hpp (outdated; 2 resolved threads)

nvdbaranec added 2 commits February 22, 2024 10:39

Fix an out of bound indexing error in an exclusive scan.

1db3f78

Doc changes from PR review feedback.

d191b01

nvdbaranec added the improvement label and removed the code quality label Feb 22, 2024

nvdbaranec added 3 commits February 22, 2024 11:00

Merge branch 'branch-24.04' into sub_rowgroup_opt_cleanup

7267650

Fixed another missed index case in an exclusive scan.

61981f5

Merge branch 'branch-24.04' into sub_rowgroup_opt_cleanup

bc4974a