feat(parquet): Support page-level pruning #14214
zhli1142015 wants to merge 15 commits into facebookincubator:main
@majetideepak @rui-mo @Yuhta could you please help review this PR?
rui-mo left a comment:
Thanks for this nice work! Added some comments.
rui-mo left a comment:
Thanks. Just added some nits.
@majetideepak could you help review this change? Thanks.
@zhli1142015 I will take a look today.
@zhli1142015 I started reviewing this. I need a couple more days to complete it. Thanks.
The UT failure is not related to this PR; see #15093.
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!
If the filter column is sorted or z-ordered, the benefit is obvious:
This PR implements Parquet page pruning:
• ColumnPageIndex: Implements parsing of column index pages and offset index pages, and the function to convert relevant metadata into dwio::common::ColumnStatistics.
• RowRanges: Introduces this to represent pushdown filter evaluation results for variably sized data pages, with added support in MetadataFilter.
• Parquet Reader: Implements index page reading, merges filtering results across columns, and generates final RowRanges. Skips unneeded rows during data reading based on computed RowRanges.
• ParquetData: Uses column index statistics to apply pushdown filters for page skipping. Loads only required data pages according to final RowRanges in function enqueueRowGroup.
• PageReader: Skips unneeded pages using offset index information.
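To make the row-range idea above concrete, here is a minimal, standalone sketch. It is not the code in this PR and does not use Velox types: `RowRange`, `PageStats`, `selectPages`, and `intersect` are hypothetical names chosen for illustration, and the filter is assumed to be a closed integer range. It relies only on what the Parquet page index provides: per-page min/max values (column index) and the first row index of each page (offset index). Because pages of different columns cover different row spans, each column's surviving pages are first turned into row ranges, and the ranges are then intersected across columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
struct RowRange {
  int64_t begin; // inclusive row index within the row group
  int64_t end;   // exclusive row index within the row group
};

struct PageStats {
  int64_t firstRow; // from the offset index
  int64_t min;      // from the column index
  int64_t max;      // from the column index
};

// Returns the row ranges of pages whose [min, max] overlaps the filter
// range [lo, hi]. Pages must be ordered by firstRow. Adjacent surviving
// pages are merged into a single range.
std::vector<RowRange> selectPages(
    const std::vector<PageStats>& pages,
    int64_t numRows,
    int64_t lo,
    int64_t hi) {
  std::vector<RowRange> out;
  for (std::size_t i = 0; i < pages.size(); ++i) {
    if (pages[i].max < lo || pages[i].min > hi) {
      continue; // No row in this page can pass the filter.
    }
    // A page ends where the next page starts, or at the row group end.
    const int64_t end =
        i + 1 < pages.size() ? pages[i + 1].firstRow : numRows;
    if (!out.empty() && out.back().end == pages[i].firstRow) {
      out.back().end = end;
    } else {
      out.push_back({pages[i].firstRow, end});
    }
  }
  return out;
}

// Intersects two sorted, non-overlapping range lists. Only rows that
// survive the filters on both columns remain.
std::vector<RowRange> intersect(
    const std::vector<RowRange>& a,
    const std::vector<RowRange>& b) {
  std::vector<RowRange> out;
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < a.size() && j < b.size()) {
    const int64_t begin = std::max(a[i].begin, b[j].begin);
    const int64_t end = std::min(a[i].end, b[j].end);
    if (begin < end) {
      out.push_back({begin, end});
    }
    // Advance the list whose current range finishes first.
    if (a[i].end < b[j].end) {
      ++i;
    } else {
      ++j;
    }
  }
  return out;
}
```

For example, with a sorted column split into pages of 100 rows each and a filter on values 150 to 160, `selectPages` keeps only the page covering rows 100 to 199. Intersecting that with a second column whose surviving pages cover rows 0 to 149 and 250 to 299 leaves rows 100 to 149 as the only rows that need to be read.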
Fixes: #14195