Reproducer for a bug in `parquet` 57.1.0+ where `RowSelectionStrategy::Mask` attempts to read skipped pages.
Running

```
cargo test
```

fails with:

```
Parquet error: Invalid offset in sparse column chunk data: 754, no matching page found. If you are using a `SelectionStrategyPolicy::Mask`, ensure that the OffsetIndex is provided when creating the InMemoryRowGroup.
```
Test data is located in `test.parquet`.
See the `create_parquet` binary for the code used to create it.
- Columns: `time` (timestamp), `tag` (string)
- 2 row groups, 300 rows each
Tag column:
- 3 tag values, `'a'`, `'b'` and `'c'`, with rows sorted by tag value
- Querying `tag IN ('a', 'c')` creates the selector list `[select 100, skip 100, select 100]`
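The selector list above is a run-length encoding of the per-row predicate result. The sketch below illustrates this under stated assumptions: `Run` is a hypothetical stand-in for a row selector (it is not the `parquet` crate's `RowSelector`), and the 300 rows are assumed to be 100 rows each of `'a'`, `'b'` and `'c'`.

```rust
/// Hypothetical stand-in for a row selector: `row_count` consecutive rows that
/// are either selected (`skip == false`) or skipped (`skip == true`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Run-length encode a per-row keep/drop vector into select/skip runs.
fn to_runs(keep: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &k in keep {
        let skip = !k;
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                // Same state as the previous row: extend the current run.
                last.row_count += 1;
                continue;
            }
        }
        // First row, or the state changed: start a new run.
        runs.push(Run { row_count: 1, skip });
    }
    runs
}

/// Evaluate `tag IN ('a', 'c')` over 300 rows sorted as 100 x 'a', 100 x 'b', 100 x 'c'.
fn tag_selection() -> Vec<Run> {
    let keep: Vec<bool> = (0..300usize)
        .map(|i| ['a', 'b', 'c'][i / 100])
        .map(|tag| tag == 'a' || tag == 'c')
        .collect();
    to_runs(&keep)
}

fn main() {
    // Prints one line per run: select 100, skip 100, select 100.
    for run in tag_selection() {
        let action = if run.skip { "skip" } else { "select" };
        println!("{action} {}", run.row_count);
    }
}
```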
Time column:
- Times are interleaved: even rows fall inside the predicate range, odd rows fall outside it
- Querying on a time range therefore creates a sparse selection, which triggers the `Mask` strategy
| Rows | Tag | Time |
|---|---|---|
| 0-99 | 'a' | Alternating in-range/out-of-range |
| 100-199 | 'b' | Alternating in-range/out-of-range |
| 200-299 | 'c' | Alternating in-range/out-of-range |
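Combining both predicates over the layout in the table gives a highly fragmented selection. The following sketch assumes even rows are in the time range, as described above, and uses a hypothetical average-run-length cut-over of 32 rows (`MASK_THRESHOLD`) to decide between selectors and a mask; the real heuristic and its default in `parquet` 57.1.0 may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Run-length encode a per-row keep/drop vector into select/skip runs.
fn to_runs(keep: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &k in keep {
        let skip = !k;
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        runs.push(Run { row_count: 1, skip });
    }
    runs
}

/// Hypothetical cut-over: prefer a bitmask when the average run is shorter
/// than this many rows. An assumption for illustration only.
const MASK_THRESHOLD: f64 = 32.0;

/// Rows kept by `tag IN ('a', 'c')` AND the time-range predicate.
fn combined_keep() -> Vec<bool> {
    (0..300usize)
        .map(|i| {
            let tag_ok = i / 100 != 1; // rows 100-199 have tag 'b'
            let time_ok = i % 2 == 0; // assumed: even rows are in range
            tag_ok && time_ok
        })
        .collect()
}

/// True when the selection is fragmented enough that a mask looks cheaper.
fn prefers_mask(runs: &[Run]) -> bool {
    if runs.is_empty() {
        return false;
    }
    let total: usize = runs.iter().map(|r| r.row_count).sum();
    (total as f64 / runs.len() as f64) < MASK_THRESHOLD
}

fn main() {
    let runs = to_runs(&combined_keep());
    println!("{} runs, mask preferred: {}", runs.len(), prefers_mask(&runs));
}
```

This prints `200 runs, mask preferred: true`: rows 0-98 and 200-299 alternate in single-row runs, while odd row 99 merges with the `'b'` rows into one 101-row skip run. The tag predicate alone yields 3 runs averaging 100 rows, well above the assumed threshold, so in this model it stays on selectors.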
Note: the page size is set so that the skipped section (`tag='b'`) contains at least one full page.
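Whether any page can be skipped depends on the page size. This sketch lists the pages that contain no selected rows for the tag selection. It assumes every page holds exactly `page_rows` rows (real Parquet pages vary in row count), and `fully_skipped_pages` is a hypothetical helper, not a `parquet` API.

```rust
/// Simplified run of selected (`skip == false`) or skipped rows.
#[derive(Debug, Clone, Copy)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// The selector list produced by `tag IN ('a', 'c')`.
fn tag_runs() -> Vec<Run> {
    vec![
        Run { row_count: 100, skip: false }, // tag = 'a'
        Run { row_count: 100, skip: true },  // tag = 'b'
        Run { row_count: 100, skip: false }, // tag = 'c'
    ]
}

/// Indices of pages containing no selected rows, assuming fixed-size pages.
fn fully_skipped_pages(runs: &[Run], page_rows: usize) -> Vec<usize> {
    let total: usize = runs.iter().map(|r| r.row_count).sum();
    let num_pages = (total + page_rows - 1) / page_rows; // ceiling division
    let mut has_selected = vec![false; num_pages];
    let mut row = 0;
    for run in runs {
        if !run.skip && run.row_count > 0 {
            // Mark every page overlapping rows [row, row + row_count).
            let first = row / page_rows;
            let last = (row + run.row_count - 1) / page_rows;
            for flag in &mut has_selected[first..=last] {
                *flag = true;
            }
        }
        row += run.row_count;
    }
    (0..num_pages).filter(|&p| !has_selected[p]).collect()
}

fn main() {
    for page_rows in [40, 50, 150] {
        let skipped = fully_skipped_pages(&tag_runs(), page_rows);
        println!("{page_rows} rows/page -> fully skipped pages {:?}", skipped);
    }
}
```

With 40 or 50 rows per page, two pages fall entirely inside the `tag='b'` section. With 150 rows per page none do, so in this model no page is skipped, which is why the page size matters for the reproducer.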
The bug occurs when:
- A predicate uses `RowSelectionStrategy::Selectors` with a `RowSelector` list that skips an entire page.
- Another predicate uses `RowSelectionStrategy::Mask` by triggering the mask run-length threshold.
- The column with `RowSelectionStrategy::Mask` is not in the output projection, so `should_force_selectors` will not force it to use `RowSelectionStrategy::Selectors`.
- The mask strategy then tries to fetch pages that were skipped, resulting in an error.
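The failure mode can be modeled with a toy sparse column chunk that holds only the pages a selector-based plan fetched. Everything here (`Page`, `SparseChunk`, the page offsets and sizes) is hypothetical and greatly simplified relative to the `parquet` crate's `InMemoryRowGroup`; the point is only that a reader which visits every page, as the mask strategy does in this model, reaches a page that was never fetched.

```rust
use std::collections::BTreeMap;

/// Hypothetical page metadata: byte offset plus the row range the page covers.
#[derive(Debug, Clone, Copy)]
struct Page {
    offset: u64,
    first_row: usize,
    num_rows: usize,
}

/// Toy sparse column chunk: only fetched pages are present, keyed by offset.
struct SparseChunk {
    fetched: BTreeMap<u64, Vec<u8>>,
}

impl SparseChunk {
    /// Look up a page by offset; fails if the page was never fetched.
    /// The message is modeled on the error shown above.
    fn read_page(&self, offset: u64) -> Result<&[u8], String> {
        self.fetched.get(&offset).map(|b| b.as_slice()).ok_or_else(|| {
            format!("Invalid offset in sparse column chunk data: {offset}, no matching page found")
        })
    }
}

/// Six hypothetical pages of 50 rows each, laid out 100 bytes apart.
fn build_pages() -> Vec<Page> {
    (0..6usize)
        .map(|i| Page { offset: 4 + 100 * i as u64, first_row: 50 * i, num_rows: 50 })
        .collect()
}

/// Offsets of pages overlapping any selected half-open row range [start, end).
fn pages_for_selectors(pages: &[Page], selected: &[(usize, usize)]) -> Vec<u64> {
    pages
        .iter()
        .filter(|p| {
            let (lo, hi) = (p.first_row, p.first_row + p.num_rows);
            selected.iter().any(|&(start, end)| start < hi && lo < end)
        })
        .map(|p| p.offset)
        .collect()
}

/// Fetch only the requested pages (dummy 100-byte bodies).
fn fetch(offsets: &[u64]) -> SparseChunk {
    let fetched: BTreeMap<u64, Vec<u8>> = offsets.iter().map(|&o| (o, vec![0u8; 100])).collect();
    SparseChunk { fetched }
}

/// Read the given pages in order and return the total bytes read.
fn read_pages(chunk: &SparseChunk, offsets: &[u64]) -> Result<usize, String> {
    let mut total = 0;
    for &o in offsets {
        total += chunk.read_page(o)?.len();
    }
    Ok(total)
}

fn main() {
    let pages = build_pages();
    // Rows 0-99 ('a') and 200-299 ('c') are selected; rows 100-199 are skipped.
    let needed = pages_for_selectors(&pages, &[(0, 100), (200, 300)]);
    let chunk = fetch(&needed);

    // Selector-style read: touches only fetched pages, so it succeeds.
    println!("selectors: {:?}", read_pages(&chunk, &needed));

    // Mask-style read in this model: visits every page, including skipped ones.
    let all: Vec<u64> = pages.iter().map(|p| p.offset).collect();
    println!("mask: {:?}", read_pages(&chunk, &all));
}
```

In this model the selector-style read succeeds with 400 bytes, while the mask-style read fails at offset 204, the first page of the skipped `'b'` section. The offset in the real error (754) depends on the actual file layout.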