Describe the bug
If ParquetMetaDataReader is asked to load the page indexes from metadata that ParquetMetaDataWriter serialized from a ParquetMetaData read without its page indexes, you get an error like "missing required field ColumnIndex.null_pages".
Note: this depends on #6463.
To Reproduce
The full reproducer is in #6463. Here is the relevant piece:
```rust
// create_parquet_file and metadata_to_bytes are helpers defined in the full reproducer (#6463)
let parquet_bytes = create_parquet_file();

// read the metadata from the file WITHOUT the page index structures
let original_metadata = ParquetMetaDataReader::new()
    .parse_and_finish(&parquet_bytes)
    .unwrap();

// read the metadata back from the serialized bytes, requesting the page indexes
let metadata_bytes = metadata_to_bytes(&original_metadata);
let roundtrip_metadata = ParquetMetaDataReader::new()
    .with_page_indexes(true) // there are no page indexes in the metadata
    .parse_and_finish(&metadata_bytes)
    .unwrap(); // <******* This fails
```
Expected behavior
The reader should not return an error.
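One possible reader-side way to meet this expectation is to treat an index location that does not fit inside the metadata buffer as absent, instead of trying to decode whatever bytes it points at. The sketch below is a minimal model under stated assumptions: IndexLocation and index_range are hypothetical names, not the parquet crate's API. Note that a bounds check only catches stale offsets that fall past the end of the buffer; a stale offset that happens to land inside it would still be read.

```rust
use std::ops::Range;

// Hypothetical, simplified model of where a column chunk's page index lives.
// These are NOT the parquet crate's types.
#[derive(Debug, Clone, Copy, PartialEq)]
struct IndexLocation {
    offset: i64, // byte offset recorded in the footer
    length: i32, // length in bytes
}

/// Returns the byte range of an index if it is present and lies entirely
/// inside a buffer of `buf_len` bytes; otherwise returns `None`, so the
/// caller can skip the index instead of failing.
fn index_range(loc: Option<IndexLocation>, buf_len: usize) -> Option<Range<usize>> {
    let loc = loc?;
    // Negative offsets or lengths fail the conversion and are treated as absent.
    let start = usize::try_from(loc.offset).ok()?;
    let len = usize::try_from(loc.length).ok()?;
    let end = start.checked_add(len)?;
    (end <= buf_len).then(|| start..end)
}

fn main() {
    // A metadata-only buffer is much shorter than the original file, so an
    // offset copied from the file will often point past its end.
    let metadata_len = 512;
    let stale = Some(IndexLocation { offset: 40_000, length: 120 });
    let in_bounds = Some(IndexLocation { offset: 100, length: 50 });

    assert_eq!(index_range(stale, metadata_len), None);
    assert_eq!(index_range(in_bounds, metadata_len), Some(100..150));
    assert_eq!(index_range(None, metadata_len), None);
    println!("out-of-range index locations are skipped");
}
```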
I am not sure whether the right fix is to:
- change ParquetMetaDataWriter to clear the index offset fields before writing the metadata
- change ParquetMetaDataReader to ignore bogus offsets
- something else
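A possible writer-side approach is to drop the recorded index locations before serializing metadata whose page index structures were not loaded, since those stale offsets presumably still refer to the original file rather than to the bytes being written. In the Parquet format these locations are the ColumnChunk fields column_index_offset/column_index_length and offset_index_offset/offset_index_length. The sketch below models them with hypothetical types (IndexLocation, ColumnChunkMeta, clear_unloaded_index_offsets); it is not the parquet crate's writer, and it assumes the writer knows which index structures were loaded.

```rust
// Hypothetical, simplified model; NOT the parquet crate's types.
#[derive(Debug, Clone, Copy, PartialEq)]
struct IndexLocation {
    offset: i64,
    length: i32,
}

#[derive(Debug, Clone, PartialEq)]
struct ColumnChunkMeta {
    column_index: Option<IndexLocation>,
    offset_index: Option<IndexLocation>,
}

/// Before writing metadata on its own, clear the location of any index
/// structure that was not loaded: its recorded offset cannot be valid
/// in the serialized metadata bytes.
fn clear_unloaded_index_offsets(
    columns: &mut [ColumnChunkMeta],
    column_index_loaded: bool,
    offset_index_loaded: bool,
) {
    for col in columns.iter_mut() {
        if !column_index_loaded {
            col.column_index = None;
        }
        if !offset_index_loaded {
            col.offset_index = None;
        }
    }
}

fn main() {
    let loc = IndexLocation { offset: 40_000, length: 120 };
    let mut columns = vec![ColumnChunkMeta {
        column_index: Some(loc),
        offset_index: Some(loc),
    }];

    // The metadata was read without page indexes, so neither index is loaded.
    clear_unloaded_index_offsets(&mut columns, false, false);
    assert_eq!(columns[0].column_index, None);
    assert_eq!(columns[0].offset_index, None);
    println!("stale index locations cleared");
}
```

Compared with a reader-side bounds check, clearing the fields on the writer side also covers stale offsets that happen to fall inside the metadata buffer, at the cost of requiring every writer to do it.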