Describe the enhancement requested
ParquetFileFragment::EvaluateStatisticsAsExpression filters Parquet files using Parquet statistics. The relevant line of the function is:
if (statistics.num_values() == 0 && statistics.null_count() > 0) {
statistics.null_count() is used here without a guard; however, there are rare cases where !statistics.HasNullCount(), meaning the null count is not set. The function should check statistics.HasNullCount() before using statistics.null_count().
!statistics.HasNullCount() rarely happens, since parquet-java and parquet-c++ always write the null count, even when it is 0. However, parquet-rs previously did not write it when the null count was 0, and some legacy files may also lack it.
As a result, we need to check statistics.HasNullCount() here before reading statistics.null_count().
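A minimal sketch of the guarded check follows. It is not the Arrow implementation: `FakeStatistics` and `IsProvablyAllNull` are hypothetical names introduced only for illustration, with the struct mirroring the three accessors mentioned above (`HasNullCount()`, `null_count()`, `num_values()`). It assumes that a missing null count should be treated as "unknown", so the all-null shortcut is skipped.

```cpp
#include <cstdint>

// Hypothetical stand-in for the statistics accessors discussed in this issue.
// It is NOT parquet::Statistics; it only mimics the three methods used here.
struct FakeStatistics {
  bool has_null_count;
  int64_t null_count_value;
  int64_t num_values_value;

  bool HasNullCount() const { return has_null_count; }
  int64_t null_count() const { return null_count_value; }
  int64_t num_values() const { return num_values_value; }
};

// Returns true only when the statistics prove that every value is null.
// If the writer did not record a null count, nothing can be concluded,
// so the function conservatively returns false.
inline bool IsProvablyAllNull(const FakeStatistics& statistics) {
  if (!statistics.HasNullCount()) {
    return false;  // null count missing: do not read null_count()
  }
  return statistics.num_values() == 0 && statistics.null_count() > 0;
}
```

With this shape, a file whose writer omitted the null count is never classified as all-null based on an unset value; it simply falls through to whatever the remaining statistics allow.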
Component(s)
C++, Parquet