Skip to content

Conversation

@Tim-53
Copy link
Contributor

@Tim-53 Tim-53 commented Nov 13, 2025

Since this is still a draft, I’d love feedback on whether I’m going in the right direction here and if the operator overloading is implemented in a way that aligns with the project’s conventions, or if there’s a better pattern to use.

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  • Adds support for passing an array of percentile values to percentile_cont
  • Enhances return-type inference:
  • Scalar percentile → returns a scalar numeric value (existing behavior preserved).
  • Array percentile → returns a List of numeric values.

Are these changes tested?

Tests will be added before finalizing the PR

Are there any user-facing changes?

Yes:

  • percentile_cont now accepts arrays of percentile values.
  • When an array is supplied, the function returns a list of values instead of a scalar.

Documentation will be updated once the API is finalized.

This change is backwards compatible and should not(?) be considered a API change.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Nov 13, 2025
Signed-off-by: Tim Trost <82676248+Tim-53@users.noreply.github.com>
) -> Result<ApproxPercentileAccumulator> {
let percentile =
validate_percentile_expr(&args.exprs[1], "APPROX_PERCENTILE_CONT")?;
validate_percentile_expr(&args.exprs[1], "APPROX_PERCENTILE_CONT")?[0];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was made purely to make percentile_cont compile and behave correctly.
I'm not yet sure whether approx_percentile_cont should be enhanced in the same way; that can be addressed in a separate issue.

Comment on lines 146 to 155
let list_f64 =
DataType::List(Arc::new(Field::new("item", DataType::Float64, true)));

for num in NUMERICS {
// Accept any numeric value paired with a float64 percentile
variants.push(TypeSignature::Exact(vec![num.clone(), DataType::Float64]));
//Accept any numeric value paired with a list of float64 percentiles
variants.push(TypeSignature::Exact(vec![num.clone(), list_f64.clone()]));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this might not be ideal, but I don't think our current signature API does have a clean way of representing what we want here 🤔

Might need some further work on the signature API to make this nicer (not to mention hopefully cleaning up the usage of NUMERICS here, which I've been slowly working on #18092)

Perhaps we go with the user_defined approach for now 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ye this part felt a little hacky to implement, but i diden't find a better way to express this.
Just to confirm: you're suggesting using TypeSignature::UserDefined instead of enumerating the Exact variants, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep; in future can look into refactoring away from user_defined once signature API has better support, but for now I think it'll be easier to use user_defined here

Some(values[0])
} else if percentile == 0.0 {
vec![Some(values[0]); percentiles.len()]
} else if percentiles[0] == 0.0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to consider length of percentiles

Comment on lines 848 to 850
percentiles
.iter()
.map(|percentile| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic might be inefficient; I don't know how postgres does it but perhaps they just sort the values once up front allowing each percentile to be calculated O(1) on the sorted output.

Maybe we need a separate path for when we have a single percentile (do it the old way) and multiple percentiles (sort input then calculate all percentiles at once)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually what i did in 27243fd

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support array argument for percentile_cont aggregate function

2 participants