Parquet pagination can be tricky, since the reader only knows where row groups start and end.
However, with properly configured row group/page sizes, we should be able to read row group/page metadata from the Parquet file metadata and estimate, more or less, where to start reading, even for massive files.
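A minimal sketch in plain PHP of the idea (no specific reader API assumed; the function name and array shapes are illustrative): row counts taken from the footer metadata are enough to work out which row group a given offset falls into and how many rows to skip inside it.

```php
<?php

declare(strict_types=1);

/**
 * Given row counts per row group (taken from the Parquet footer metadata)
 * and a requested row offset, return the index of the row group to start
 * reading from and how many rows to skip inside that group.
 *
 * @param array<int, int> $rowsPerGroup row counts per row group, in file order
 *
 * @return array{group: int, skip: int}
 */
function locateRowGroup(array $rowsPerGroup, int $offset) : array
{
    $firstRow = 0;

    foreach ($rowsPerGroup as $index => $rows) {
        if ($offset < $firstRow + $rows) {
            return ['group' => $index, 'skip' => $offset - $firstRow];
        }

        $firstRow += $rows;
    }

    throw new \OutOfRangeException("Offset {$offset} is beyond the last row");
}

// Example: three row groups of 100k rows each, page of 50 rows starting at offset 250000.
var_dump(locateRowGroup([100_000, 100_000, 100_000], 250_000));
// => ['group' => 2, 'skip' => 50000] – open row group 2, skip 50k rows, read 50.
```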
Obviously, bigger row groups require more memory; row group size directly affects how much memory the reader needs.
Big row groups are perfect for Spark or Hadoop, since all calculations happen in memory, which saves I/O. In PHP, however, smaller row groups and smaller pages help keep memory consumption under control, and the additional I/O from reading more pages/groups from disk is not a big deal.
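As a rough illustration of the trade-off, a back-of-the-envelope estimate of the memory needed per decompressed row group; the average row size here is an assumption you would measure on real data, not something taken from the file.

```php
<?php

declare(strict_types=1);

/**
 * Rough estimate of the memory needed to hold one decompressed row group.
 * Illustrative only: $avgRowBytes is an assumed average you would measure
 * on a sample of your real data.
 */
function estimateRowGroupMemory(int $rowsPerGroup, int $avgRowBytes) : int
{
    return $rowsPerGroup * $avgRowBytes;
}

$memoryLimit = 128 * 1024 * 1024; // e.g. a 128M PHP memory_limit

// 1M rows * ~200 bytes ≈ 200 MB – does not fit next to the rest of the process ...
var_dump(estimateRowGroupMemory(1_000_000, 200) <= $memoryLimit); // bool(false)

// ... while 100k rows * ~200 bytes ≈ 20 MB leaves plenty of headroom.
var_dump(estimateRowGroupMemory(100_000, 200) <= $memoryLimit);   // bool(true)
```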