Error: missing required field ColumnIndex.null_pages when loading page indexes #6464

@alamb

Describe the bug
If ParquetMetaDataReader is asked to load page indexes from metadata that ParquetMetaDataWriter wrote from a ParquetMetaData loaded without its page indexes, it fails with an error like "missing required field ColumnIndex.null_pages".

Note: this depends on #6463.

To Reproduce
The full reproducer is in #6463. Here is the relevant piece:

        let parquet_bytes = create_parquet_file();

        // read the metadata from the file WITHOUT the page index structures
        let original_metadata = ParquetMetaDataReader::new()
            .parse_and_finish(&parquet_bytes)
            .unwrap();

        // read metadata back from the serialized bytes, this time requesting the page indexes
        let metadata_bytes = metadata_to_bytes(&original_metadata);
        let roundtrip_metadata = ParquetMetaDataReader::new()
            .with_page_indexes(true) // the serialized metadata contains no page indexes
            .parse_and_finish(&metadata_bytes)
            .unwrap(); // <******* This fails

Expected behavior
The reader should not return an error.

I am not sure if the right fix is to

  1. change the ParquetMetaDataWriter to clear the index offset fields before writing them
  2. change the ParquetMetaDataReader to ignore bogus offsets
  3. something else
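
To make options 1 and 2 concrete, here is a minimal standalone sketch. It does not use the parquet crate: IndexLocation, clear_index_location and usable_range are hypothetical names invented for illustration, and the real fix may look quite different. Note that a bounds check like the one in option 2 can only reject offsets that fall outside the buffer. It cannot detect an offset that lands inside the buffer on unrelated bytes, which may be why a decode error such as the one above can still occur.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Hypothetical stand-in for the page index location stored per column chunk
/// (an offset and a length, both optional).
#[derive(Debug, Clone, Copy, PartialEq)]
struct IndexLocation {
    offset: Option<i64>,
    length: Option<i32>,
}

/// Option 1 (writer side): drop the location when the index itself is not
/// going to be written, so a later reader has nothing stale to follow.
fn clear_index_location(loc: &mut IndexLocation) {
    loc.offset = None;
    loc.length = None;
}

/// Option 2 (reader side): return the byte range of the index only if it is
/// present, non-negative, non-empty and fully inside a buffer of
/// `buffer_len` bytes. Anything else is treated as "no index".
fn usable_range(loc: &IndexLocation, buffer_len: usize) -> Option<Range<usize>> {
    let start = usize::try_from(loc.offset?).ok()?;
    let len = usize::try_from(loc.length?).ok()?;
    let end = start.checked_add(len)?;
    if len > 0 && end <= buffer_len {
        Some(start..end)
    } else {
        None
    }
}
```

With this sketch, a location of offset 1000 and length 64 is rejected for a 128-byte buffer and accepted for a 2000-byte one, and a cleared location is always rejected.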

Additional context
@etseidl has added the APIs in #6431
