-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Consolidate statistics merging code (try 2) #15661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| .collect() | ||
| } | ||
|
|
||
| /// Set the number of rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made these methods to reduce the boiler plate in the examples (otherwise I had to explicitly provide Precision::Absent for every field. This let's the examples be simpler)
| /// | ||
| /// # Example | ||
| /// ``` | ||
| /// # use datafusion_common::{ColumnStatistics, ScalarValue, Statistics}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is an example showing how merge works
| } | ||
|
|
||
| /// Set the null count | ||
| pub fn with_null_count(mut self, null_count: Precision<usize>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, adding these setters to make it easier (less verbose) to write examples and tests
| } | ||
|
|
||
| #[test] | ||
| fn test_try_merge_basic() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests were moved from datasource
| } | ||
|
|
||
| #[test] | ||
| fn test_try_merge_mismatched_size() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mismatched size is a new test
| } | ||
|
|
||
| /// Generic function to compute statistics across multiple items that have statistics | ||
| fn compute_summary_statistics<T, I>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this PR is to remove this function and replace it with Statistics::try_merge_iter
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests are moved
xudong963
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the PR still has the issue that is mentioned here: xudong963#5 (comment).
But from the comment: #15503, I think you plan to fix the issue after merging the PR.
Thank you
Yes I think you are right -- however I don't think this PR makes the problem worse (or better). I have filed this ticket to track: @xudong963 do you have time to look at that one? |
Sure, sorry for the late reply, I had a headache this weekend, so off my computer. |
|
I hope you feel better! |
Which issue does this PR close?
Precisioncombination code in favor ofPrecision::min/max/add#15659statistics_by_partition APIto ExecutionPlan #15503statistics_by_partitionAPI toExecutionPlan#15495Rationale for this change
As @xudong963 works on getting statistics code into better shape we have to move the code to combine multiple
Statisticsobjects (compute_summary_statistics) and make it public so it can be reused (see #15503)While we are doing this, lets consolidate the functionality into
Statisticswhere it isWhat changes are included in this PR?
Statistics::merge_iterStatistics::mergecompute_summary_statisticsAre these changes tested?
Yes, by existing tests and newly added doc test and unit test
Are there any user-facing changes?