[C++][Parquet] Dataset: ParquetFileFragment::EvaluateStatisticsAsExpression should check Statistics::HasNullCount #43712

@mapleFU

Description

Describe the enhancement requested

ParquetFileFragment::EvaluateStatisticsAsExpression uses Parquet statistics to filter Parquet files. The function contains the following check:

if (statistics.num_values() == 0 && statistics.null_count() > 0) {

statistics.null_count() is used here; however, there are rare cases where !statistics.HasNullCount(). So this function should check statistics.HasNullCount() before using the null count.

!statistics.HasNullCount() rarely happens, since parquet-java and parquet-c++ always write the null count, even when it is 0. However, parquet-rs previously did not write it when the count was 0, and some legacy files may also lack it.

As a result, we need to handle the !statistics.HasNullCount() case here.
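A minimal sketch of the guarded condition, not the actual patch. `MockStatistics` and `IsAllNull` are hypothetical names introduced only for illustration; the mock mirrors the three accessors mentioned above (`num_values()`, `null_count()`, `HasNullCount()`), and the value it returns from `null_count()` when the count is absent is an assumption.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the Parquet statistics object; it only models
// the three accessors discussed in this issue.
struct MockStatistics {
  int64_t num_values_;
  std::optional<int64_t> null_count_;  // empty when the writer omitted it

  int64_t num_values() const { return num_values_; }
  bool HasNullCount() const { return null_count_.has_value(); }
  // Assumption: an absent null count reads as 0, so it must not be trusted
  // unless HasNullCount() is true.
  int64_t null_count() const { return null_count_.value_or(0); }
};

// Guarded form of the check quoted above: only conclude "all values are
// null" when the null count was actually recorded in the file.
bool IsAllNull(const MockStatistics& statistics) {
  return statistics.num_values() == 0 && statistics.HasNullCount() &&
         statistics.null_count() > 0;
}
```

With this guard, a column chunk whose statistics lack a null count is never classified as all-null, so the filter falls through to whatever handling applies when that information is unavailable.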

Component(s)

C++, Parquet
