@vim89 vim89 commented Jul 28, 2025

Which issue does this PR close?

Rationale for this change

EmptyExec previously returned "unknown" statistics for both global and partition-level queries, which impacts planner accuracy. This change provides exact-zero stats for EmptyExec, aligning with how other operators like AggregateExec and PlaceholderRowExec handle partition statistics.

What changes are included in this PR?

  • Implemented partition_statistics(...) for EmptyExec:
    • Sets num_rows and total_byte_size to Precision::Exact(0)
    • Populates column_statistics with one ColumnStatistics::new_unknown() per schema field
  • Added unit test empty_multi_partition_statistics:
    • Verifies default 1-partition behavior: the global query and partition 0 return zero stats; an out-of-range partition index returns an error
    • Verifies 2-partition behavior: partitions 0 and 1 return zero stats; an out-of-range partition index returns an error
  • Added a TreeRender branch to DisplayAs to provide informative display of the operator:
    "EmptyExec: partitions=X, fields=Y"
  • No other execution plans or tests were modified
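
The behavior described in the list above can be sketched without depending on DataFusion itself. The block below is a minimal, self-contained model, not the code from this PR: `Precision`, `ColumnStatistics`, and `Statistics` are simplified stand-ins that only mirror the shape of the DataFusion types, `EmptyExecSketch` is a hypothetical name, and the error type is a plain `String` rather than DataFusion's error type.

```rust
use std::fmt;

// Simplified stand-ins for DataFusion's statistics types (assumptions for
// illustration only; the real types carry more fields and variants).
#[derive(Debug, Clone, PartialEq)]
enum Precision {
    Exact(usize),
    Absent,
}

#[derive(Debug, Clone, PartialEq)]
struct ColumnStatistics {
    null_count: Precision,
    distinct_count: Precision,
}

impl ColumnStatistics {
    // Every per-column statistic is left unknown.
    fn new_unknown() -> Self {
        ColumnStatistics {
            null_count: Precision::Absent,
            distinct_count: Precision::Absent,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Precision,
    total_byte_size: Precision,
    column_statistics: Vec<ColumnStatistics>,
}

// Hypothetical model of an operator that never produces rows.
struct EmptyExecSketch {
    partitions: usize,
    fields: usize,
}

impl EmptyExecSketch {
    // `None` asks for plan-wide statistics; `Some(i)` asks for partition `i`.
    fn partition_statistics(&self, partition: Option<usize>) -> Result<Statistics, String> {
        if let Some(idx) = partition {
            if idx >= self.partitions {
                return Err(format!(
                    "invalid partition {}; plan has {} partition(s)",
                    idx, self.partitions
                ));
            }
        }
        // An empty plan has exactly zero rows and zero bytes in every partition.
        Ok(Statistics {
            num_rows: Precision::Exact(0),
            total_byte_size: Precision::Exact(0),
            column_statistics: vec![ColumnStatistics::new_unknown(); self.fields],
        })
    }
}

// Mirrors the display string quoted in the PR description.
impl fmt::Display for EmptyExecSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "EmptyExec: partitions={}, fields={}", self.partitions, self.fields)
    }
}
```

In this model, asking for `Some(2)` on a two-partition plan yields an error, while `None`, `Some(0)`, and `Some(1)` all yield the same exact-zero statistics with one unknown column entry per field.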

Are these changes tested?

Yes.

  • The new test covers the default single-partition case, the two-partition case, and out-of-range partition indices.
  • The full test suite still passes without regressions.

Are there any user-facing changes?

No.
This change affects internal planner behavior only; no public APIs or outputs are changed.

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jul 28, 2025
alamb previously approved these changes Jul 29, 2025
Contributor

@alamb alamb left a comment

Makes sense to me -- thank you @vim89

cc @xudong963

@vim89
Author

vim89 commented Jul 30, 2025

@alamb @xudong963 Added known statistics values. Please review

@vim89 vim89 requested a review from alamb July 30, 2025 08:07
Ok(())
}

#[test]
Member

You can find the partition statistics tests in the dedicated file: https://github.com/apache/datafusion/blob/main/datafusion/core/tests/physical_optimizer/partition_statistics.rs.

It also contains real execution to check the results.

Author

Ah well, thank you, and apologies for the oversight.
Let me run the integration tests and align the column stats to use values instead of Absent.

@alamb
Contributor

alamb commented Aug 4, 2025

@xudong963 does this one look good to you now?

@vim89
Author

vim89 commented Aug 5, 2025

@xudong963 does this one look good to you now?

@alamb I'm working on populating the column stats with values instead of Absent. I'll then run the integration tests.

@alamb alamb marked this pull request as draft August 5, 2025 10:39
@alamb
Contributor

alamb commented Aug 5, 2025

I started the tests and marked the PR as a draft. Please mark it as ready for review when it is ready for another look

@alamb alamb dismissed their stale review August 5, 2025 10:40

Still in progress

@github-actions

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Oct 19, 2025
@github-actions github-actions bot removed the Stale PR has not had any activity for some time label Oct 20, 2025
@github-actions

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Dec 19, 2025
@Jefffrey Jefffrey closed this Dec 30, 2025
@Jefffrey
Contributor

Feel free to reopen if it becomes active again

Labels

physical-plan Changes to the physical-plan crate Stale PR has not had any activity for some time

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement partition_statistics API for more operators

4 participants