Spark: Disable min/max aggregation push down for string under any mode #16320

Merged: huaxingao merged 1 commit into apache:main from hantangwangd:disable_min_max_aggregate_pushdown_for_string on May 13, 2026

@hantangwangd

Contributor

The metrics mode recorded in an Iceberg table reflects only the table's current configuration. The mode may have been changed in the past, and data files written earlier carry statistics that were collected under the mode in effect at write time. For example, the metrics mode might have been truncate(5) when the data was written and later changed to full. In that case the stored lower and upper bounds for string columns are truncated values rather than the actual minimum and maximum, so deciding that min/max aggregation on string columns can be pushed down based on the current metrics mode may produce incorrect results. Refer to the test case in this PR for details.
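
To illustrate the failure mode described above, here is a minimal Python sketch of how truncated bounds can differ from the true minimum and maximum. It is a simplified model and not Iceberg's implementation: the helper names `truncate_lower` and `truncate_upper` are hypothetical, the sample values are made up, and the upper-bound rule (keep a prefix and bump its last code point) ignores edge cases such as a last code point that cannot be incremented.

```python
def truncate_lower(value: str, width: int) -> str:
    """Hypothetical helper: a prefix always sorts <= the full value."""
    return value[:width]


def truncate_upper(value: str, width: int) -> str:
    """Hypothetical helper: keep a prefix and bump its last code point so the
    result sorts >= the full value. Simplified: assumes the last code point
    can be incremented."""
    if len(value) <= width:
        return value
    prefix = value[:width]
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


# Made-up column values written while the metrics mode was truncate(5).
values = ["applepie", "banana-split", "cherry-tart"]
width = 5

true_min = min(values)
true_max = max(values)

# Bounds as they would be stored in file metadata under truncation.
stored_lower = truncate_lower(true_min, width)
stored_upper = truncate_upper(true_max, width)

# The stored bounds are still valid for pruning (they enclose every value)...
bounds_enclose_values = all(stored_lower <= v <= stored_upper for v in values)

# ...but they are not the actual MIN/MAX, so answering an aggregate from
# metadata alone would return the wrong strings.
pushdown_would_be_wrong = (stored_lower != true_min) or (stored_upper != true_max)

print(f"true   MIN/MAX: {true_min!r}, {true_max!r}")
print(f"stored bounds : {stored_lower!r}, {stored_upper!r}")
```

Even if the table's metrics mode is later switched to full, the bounds already written for these files stay truncated, which is why the current mode alone cannot justify pushing down MIN/MAX on string columns.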

@github-actions github-actions Bot added the spark label May 13, 2026
@hantangwangd hantangwangd marked this pull request as ready for review May 13, 2026 18:04
@hantangwangd

Contributor Author

Hi @huaxingao, could you take a look at this fix when you have a chance? Thanks a lot!

@huaxingao huaxingao left a comment

Contributor

LGTM! Thanks for the fix.

@huaxingao huaxingao merged commit 721980d into apache:main May 13, 2026
27 checks passed
@hantangwangd hantangwangd deleted the disable_min_max_aggregate_pushdown_for_string branch May 13, 2026 23:55