-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: Enhance percentile_cont to support array argument #18663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Tim Trost <82676248+Tim-53@users.noreply.github.com>
| ) -> Result<ApproxPercentileAccumulator> { | ||
| let percentile = | ||
| validate_percentile_expr(&args.exprs[1], "APPROX_PERCENTILE_CONT")?; | ||
| validate_percentile_expr(&args.exprs[1], "APPROX_PERCENTILE_CONT")?[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was made purely to make percentile_cont compile and behave correctly.
I'm not yet sure whether approx_percentile_cont should be enhanced in the same way; that can be addressed in a separate issue.
| let list_f64 = | ||
| DataType::List(Arc::new(Field::new("item", DataType::Float64, true))); | ||
|
|
||
| for num in NUMERICS { | ||
| // Accept any numeric value paired with a float64 percentile | ||
| variants.push(TypeSignature::Exact(vec![num.clone(), DataType::Float64])); | ||
| //Accept any numeric value paired with a list of float64 percentiles | ||
| variants.push(TypeSignature::Exact(vec![num.clone(), list_f64.clone()])); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this might not be ideal, but I don't think our current signature API does have a clean way of representing what we want here 🤔
Might need some further work on the signature API to make this nicer (not to mention hopefully cleaning up the usage of NUMERICS here, which I've been slowly working on #18092)
Perhaps we go with the user_defined approach for now 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ye this part felt a little hacky to implement, but i diden't find a better way to express this.
Just to confirm: you're suggesting using TypeSignature::UserDefined instead of enumerating the Exact variants, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep; in future can look into refactoring away from user_defined once signature API has better support, but for now I think it'll be easier to use user_defined here
| Some(values[0]) | ||
| } else if percentile == 0.0 { | ||
| vec![Some(values[0]); percentiles.len()] | ||
| } else if percentiles[0] == 0.0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to consider length of percentiles
| percentiles | ||
| .iter() | ||
| .map(|percentile| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic might be inefficient; I don't know how postgres does it but perhaps they just sort the values once up front allowing each percentile to be calculated O(1) on the sorted output.
Maybe we need a separate path for when we have a single percentile (do it the old way) and multiple percentiles (sort input then calculate all percentiles at once)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually what i did in 27243fd
Since this is still a draft, I’d love feedback on whether I’m going in the right direction here and if the operator overloading is implemented in a way that aligns with the project’s conventions, or if there’s a better pattern to use.
Which issue does this PR close?
percentile_contaggregate function #18600.Rationale for this change
What changes are included in this PR?
percentile_contAre these changes tested?
Tests will be added before finalizing the PR
Are there any user-facing changes?
Yes:
percentile_contnow accepts arrays of percentile values.Documentation will be updated once the API is finalized.
This change is backwards compatible and should not(?) be considered a API change.