Performance optimizations for parquet sub-rowgroup reader. #15020
Merged
rapids-bot[bot] merged 13 commits into rapidsai:branch-24.04 on Feb 29, 2024
…is to avoid allocating a separate set of page data for the subpass when we are not actually causing any chunking to occur. We can simply wrap the base pages from the pass itself in a span and use it directly.
…rk. Added some more ranges.
ttnghia
reviewed
Feb 12, 2024
hyperbolic2346
approved these changes
Feb 15, 2024
Contributor
hyperbolic2346
left a comment
The tiniest of nits. This is a good find! Good job here.
ttnghia
reviewed
Feb 26, 2024
ttnghia
approved these changes
Feb 28, 2024
Contributor
Author
/merge
This PR implements a basket of optimizations for the parquet reader to bring non-chunked reads close to par following the merge of the sub-rowgroup reader.
The primary culprit for the performance hit was that in the case where we perform no splits, we were making a full copy of all of the pages into the subpass struct (including a pinned memory allocation). This is unnecessary because we can just represent the pages in the subpass as a span that wraps the existing pages in the pass.
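To illustrate the idea (not the actual cudf code), here is a minimal host-only sketch of a subpass that wraps the pass's pages in a non-owning view when no split occurs, and only copies into its own storage when it does. All names here (PageInfo, Pass, Subpass, make_subpass, and the small span type) are hypothetical stand-ins, and std::vector takes the place of the pinned/device allocations used by the real reader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the reader's per-page metadata.
struct PageInfo {
  int num_rows;
};

// Minimal non-owning view (pointer + length), similar in spirit to a span.
template <typename T>
struct span {
  span() = default;
  span(T* p, std::size_t n) : ptr(p), len(n) {}
  T* begin() const { return ptr; }
  T* end() const { return ptr + len; }
  std::size_t size() const { return len; }
  T& operator[](std::size_t i) const { return ptr[i]; }

  T* ptr          = nullptr;
  std::size_t len = 0;
};

// The pass owns the full set of pages.
struct Pass {
  std::vector<PageInfo> pages;
};

// The subpass exposes pages through a view. It is move-only so the view
// cannot be left pointing into a copied-from buffer.
struct Subpass {
  Subpass()                = default;
  Subpass(Subpass&&)       = default;
  Subpass(Subpass const&)  = delete;

  std::vector<PageInfo> page_buf;  // owned storage, filled only when splitting
  span<PageInfo> pages;            // what downstream code iterates over
};

// Build a subpass covering pages [first, first + count) of the pass.
Subpass make_subpass(Pass& pass, std::size_t first, std::size_t count)
{
  assert(first + count <= pass.pages.size());
  Subpass sp;
  if (first == 0 && count == pass.pages.size()) {
    // No split: wrap the pass's pages directly. No allocation, no copy.
    sp.pages = span<PageInfo>(pass.pages.data(), pass.pages.size());
  } else {
    // Split: copy only the needed pages into subpass-owned storage.
    sp.page_buf.assign(pass.pages.begin() + first, pass.pages.begin() + first + count);
    sp.pages = span<PageInfo>(sp.page_buf.data(), sp.page_buf.size());
  }
  return sp;
}
```

In the no-split case the view aliases the pass's storage, so the pass must outlive the subpass; that lifetime constraint is the price of skipping the copy.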
In addition, several hostdevice_vectors used for work that could be done entirely device-side were converted to rmm::device_uvector.
Finally, I converted a number of functions that were taking hostdevice_vectors to use spans instead and added some missing operators to the hostdevice_vector class itself.
This PR doesn't recover all of the lost time (there is some new work that we have to do in all cases) but it takes out most of the sting. A sample of some of the benchmarks that were most notably affected:
Checklist