-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Implementing partition_statistics for EmptyExec (Issue #15873) #16941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…play of the operator: `"EmptyExec: partitions=X, fields=Y"`
…play of the operator: `"EmptyExec: partitions=X, fields=Y"`
Implementing partition_statistics for EmptyExec (Issue apache#15873)
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me -- thank you @vim89
cc @xudong963
Added known statistics values
|
@alamb @xudong963 Added known statistics values. Please review |
| Ok(()) | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can find the partition statistics in the dedicated file: https://github.com/apache/datafusion/blob/main/datafusion/core/tests/physical_optimizer/partition_statistics.rs.
It also contains the real exection to check the results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah well, thank you & apologies for oversight.
Let me run integration tests and align column stats for values instead of Absent.
|
@xudong963 does this one look good to you now? |
@alamb I'm working on using column stats for values instead of Absent. Then Let me run integration tests |
|
I started the tests and marked the PR as a draft. Please mark it as ready for review when it is ready for another look |
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
|
Feel free to reopen if it becomes active again |
Which issue does this PR close?
partition_statisticsAPI for more operators #15873Rationale for this change
EmptyExecpreviously returned "unknown" statistics for both global and partition-level queries, which impacts planner accuracy. This change provides exact-zero stats for EmptyExec, aligning with how other operators likeAggregateExecandPlaceholderRowExechandle partition statistics.What changes are included in this PR?
partition_statistics(...)forEmptyExec:num_rowsandtotal_byte_sizetoPrecision::Exact(0)column_statisticswith oneColumnStatistics::new_unknown()per schema fieldempty_multi_partition_statistics:TreeRenderbranch toDisplayAsto provide informative display of the operator:"EmptyExec: partitions=X, fields=Y"Are these changes tested?
Yes.
Are there any user-facing changes?
No.
This change affects internal planner behavior only; no public APIs or outputs are changed.