erratic-pattern/parquet_mask_strategy_missing_pages

Parquet "Invalid offset in sparse column chunk data" Reproducer

Reproducer for a bug in parquet 57.1.0+ where RowSelectionStrategy::Mask attempts to read skipped pages.

Run

cargo test

Error Message

Parquet error: Invalid offset in sparse column chunk data: 754, no matching page found.
If you are using a `SelectionStrategyPolicy::Mask`, ensure that the OffsetIndex is provided when creating the InMemoryRowGroup.

Test Data

Test data is located in test.parquet.

See the create_parquet binary for the code that was used to create it.

File Layout

  • Columns: time (timestamp), tag (string)
  • 2 row groups, 300 rows each

Tag column:

  • 3 tag values 'a', 'b', 'c' with rows sorted by tag values
  • Querying tag IN ('a', 'c') produces the selector list [select 100, skip 100, select 100]
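
The selector list for the tag predicate can be reproduced with a small run-length encoding sketch. This is a simplified model using only the standard library: `Run`, `to_runs`, and `tag_column` are hypothetical names for illustration, not types or functions from the parquet crate.

```rust
/// One run in a run-length encoded row selection.
/// A simplified stand-in for a row selector, not the parquet crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Run {
    Select(usize),
    Skip(usize),
}

/// Collapse a per-row predicate result into alternating select/skip runs.
pub fn to_runs(matches: &[bool]) -> Vec<Run> {
    let mut runs = Vec::new();
    let mut i = 0;
    while i < matches.len() {
        let keep = matches[i];
        let start = i;
        // Advance to the end of the current run of equal values.
        while i < matches.len() && matches[i] == keep {
            i += 1;
        }
        let len = i - start;
        runs.push(if keep { Run::Select(len) } else { Run::Skip(len) });
    }
    runs
}

/// Tag values as laid out in the table: 100 rows each of 'a', 'b', 'c'.
pub fn tag_column() -> Vec<char> {
    ['a', 'b', 'c']
        .iter()
        .flat_map(|&t| std::iter::repeat(t).take(100))
        .collect()
}
```

Mapping `tag_column()` through the predicate `tag == 'a' || tag == 'c'` and passing the result to `to_runs` yields three runs: select 100, skip 100, select 100.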

Time column:

  • Times interleaved: even rows in predicate range, odd rows out of predicate range
  • Querying on a time range produces a sparse selection that triggers the Mask strategy

| Rows | Tag | Time |
|---|---|---|
| 0-99 | 'a' | Alternating in-range/out-of-range |
| 100-199 | 'b' | Alternating in-range/out-of-range |
| 200-299 | 'c' | Alternating in-range/out-of-range |
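
To see why the interleaved time values lead to a different strategy than the sorted tag values, the sketch below compares average run lengths. It is a rough model, not the crate's logic: the `Strategy` enum, `run_count`, `choose`, and `time_matches` are hypothetical names, and the `threshold` parameter stands in for whatever run-length threshold the reader actually applies.

```rust
/// Simplified model of the two strategies (not the parquet crate's enum).
#[derive(Debug, PartialEq, Eq)]
pub enum Strategy {
    Selectors,
    Mask,
}

/// Count maximal runs of equal values in a per-row predicate result.
pub fn run_count(matches: &[bool]) -> usize {
    if matches.is_empty() {
        return 0;
    }
    1 + matches.windows(2).filter(|w| w[0] != w[1]).count()
}

/// Pick a strategy from the average run length.
/// `threshold` is a hypothetical parameter chosen by the caller.
pub fn choose(matches: &[bool], threshold: usize) -> Strategy {
    let runs = run_count(matches);
    if runs > 0 && matches.len() / runs < threshold {
        Strategy::Mask
    } else {
        Strategy::Selectors
    }
}

/// Time predicate result as described above: even rows match, odd rows do not.
pub fn time_matches(rows: usize) -> Vec<bool> {
    (0..rows).map(|r| r % 2 == 0).collect()
}
```

With 300 alternating rows, every run has length 1, so any threshold above 1 selects `Strategy::Mask` in this model. The tag predicate has an average run length of 100 and stays on `Strategy::Selectors` for thresholds up to 100.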

Note: The page size is set so that the skip section (tag='b') contains at least one full page.
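
Whether a skip range contains a full page depends on where the page boundaries fall. The sketch below checks this for pages with a fixed number of rows. The fixed-rows-per-page layout and the names `page_ranges` and `fully_skipped` are assumptions made for illustration; the real file's page boundaries are determined by the writer's page size setting.

```rust
use std::ops::Range;

/// Split a row group into pages of `rows_per_page` rows (the last page may be shorter).
/// `rows_per_page` must be greater than zero.
pub fn page_ranges(total_rows: usize, rows_per_page: usize) -> Vec<Range<usize>> {
    (0..total_rows)
        .step_by(rows_per_page)
        .map(|start| start..(start + rows_per_page).min(total_rows))
        .collect()
}

/// Indices of pages that lie entirely inside the skipped row range.
pub fn fully_skipped(pages: &[Range<usize>], skip: &Range<usize>) -> Vec<usize> {
    pages
        .iter()
        .enumerate()
        .filter(|(_, p)| p.start >= skip.start && p.end <= skip.end)
        .map(|(idx, _)| idx)
        .collect()
}
```

For a 300-row group with a hypothetical 40 rows per page, the skip range 100..200 fully covers the pages holding rows 120..160 and 160..200. With 150 rows per page, no page is fully skipped, and the bug would not be triggered.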

Bug Description

The bug occurs when:

  1. A predicate uses RowSelectionStrategy::Selectors with a RowSelector list that skips an entire page.
  2. Another predicate uses RowSelectionStrategy::Mask by triggering the mask run-length threshold.
  3. The column with RowSelectionStrategy::Mask is not in the output projection, so should_force_selectors will not force it to use RowSelectionStrategy::Selectors.
  4. As a result, the mask strategy tries to fetch pages that were skipped, which produces the error shown above.
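
The interaction described in the steps above can be illustrated with a toy model. This is not the parquet crate's code: `SparseChunk`, `read_with_selectors`, and `read_with_mask` are hypothetical names, and the model assumes that a mask-style read walks every page in order while a selector-style read touches only pages overlapping a selected range.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// True if the two half-open row ranges share at least one row.
fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}

/// Toy model of a sparse column chunk: only fetched pages are present,
/// keyed by page index. Not the parquet crate's in-memory row group.
pub struct SparseChunk {
    pages: BTreeMap<usize, Range<usize>>,
}

impl SparseChunk {
    /// Keep only pages that overlap a selected row range, mimicking a
    /// fetch that prunes pages falling entirely inside a skip.
    pub fn fetch(all_pages: &[Range<usize>], selected: &[Range<usize>]) -> Self {
        let pages = all_pages
            .iter()
            .cloned()
            .enumerate()
            .filter(|(_, p)| selected.iter().any(|s| overlaps(p, s)))
            .collect();
        SparseChunk { pages }
    }

    fn page(&self, idx: usize) -> Result<&Range<usize>, String> {
        self.pages
            .get(&idx)
            .ok_or_else(|| format!("page {} was not fetched", idx))
    }
}

/// Selector-style read: visit only pages that overlap a selected range.
pub fn read_with_selectors(
    chunk: &SparseChunk,
    all_pages: &[Range<usize>],
    selected: &[Range<usize>],
) -> Result<usize, String> {
    let mut rows = 0;
    for (idx, p) in all_pages.iter().enumerate() {
        if selected.iter().any(|s| overlaps(p, s)) {
            rows += chunk.page(idx)?.len();
        }
    }
    Ok(rows)
}

/// Mask-style read (as modeled here): decode every page in order and
/// filter rows afterwards, so a pruned page is a hard error.
pub fn read_with_mask(chunk: &SparseChunk, page_count: usize) -> Result<usize, String> {
    let mut rows = 0;
    for idx in 0..page_count {
        rows += chunk.page(idx)?.len();
    }
    Ok(rows)
}
```

With three hypothetical 100-row pages and the selection [0..100, 200..300], `SparseChunk::fetch` drops the middle page. `read_with_selectors` then succeeds, while `read_with_mask` returns an error for page 1, which mirrors the "no matching page found" failure in the reproducer.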

Related Issues
