@kumarUjjawal
Contributor

Which issue does this PR close?

Rationale for this change

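Literal 0 and 1 percentiles don’t need percentile buffering: percentile_cont(x, 0) is the minimum of x and percentile_cont(x, 1) is the maximum, so rewriting them to min/max keeps results identical without buffering the input values.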
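The equivalence can be checked with a small self-contained sketch. It assumes the common linear-interpolation definition of percentile_cont (fractional rank = p × (n − 1)), non-NaN Float64 inputs, and no NULLs; `percentile_cont`, `min_of`, and `max_of` are hypothetical helpers, not DataFusion APIs. At p = 0 and p = 1 the rank lands exactly on the first or last sorted element, so no interpolation happens and the result is the minimum or maximum.

```rust
/// Hypothetical helper: continuous percentile with linear interpolation
/// between the two closest ranks. Returns None for empty input or p outside [0, 1].
/// Assumes no NaN values (NULLs are not modeled).
fn percentile_cont(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&p) {
        return None;
    }
    // The general case needs every input value buffered and sorted.
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let rank = p * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    // For p = 0 (rank 0) and p = 1 (rank n - 1), lo == hi and frac == 0.
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// Hypothetical helper: min only keeps one running value, no buffering.
fn min_of(values: &[f64]) -> Option<f64> {
    values.iter().copied().reduce(f64::min)
}

/// Hypothetical helper: max only keeps one running value, no buffering.
fn max_of(values: &[f64]) -> Option<f64> {
    values.iter().copied().reduce(f64::max)
}
```

For any other percentile the interpolation actually depends on neighbouring sorted values, which is why only the literal 0 and 1 cases can be rewritten.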

What changes are included in this PR?

  • Add a simplify hook that rewrites percentile_cont(..., 0) to min and percentile_cont(..., 1) to max, preserving DISTINCT, FILTER, and NULL handling and casting integers to Float64.
  • Add targeted tests for the rewrite and for the no‑rewrite path.
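The rewrite rule can be sketched on a toy expression model. All types and the `simplify_percentile_cont` function below are hypothetical stand-ins, not DataFusion's `Expr` or simplify API; NULL semantics are not modeled, and casting the argument (rather than the min/max result) is an assumption of this sketch.

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType { Int64, Float64, Boolean }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String, DataType),
    Literal(f64),
    Cast(Box<Expr>, DataType),
}

#[derive(Debug, Clone, PartialEq)]
enum AggFunc { PercentileCont, Min, Max }

#[derive(Debug, Clone, PartialEq)]
struct AggregateCall {
    func: AggFunc,
    args: Vec<Expr>,
    distinct: bool,
    filter: Option<Expr>,
}

fn data_type(expr: &Expr) -> DataType {
    match expr {
        Expr::Column(_, t) | Expr::Cast(_, t) => *t,
        Expr::Literal(_) => DataType::Float64,
    }
}

/// Returns the rewritten aggregate, or None when the rewrite does not apply.
fn simplify_percentile_cont(call: &AggregateCall) -> Option<AggregateCall> {
    if call.func != AggFunc::PercentileCont {
        return None;
    }
    // Only a literal percentile of exactly 0 or 1 qualifies.
    let func = match call.args.get(1)? {
        Expr::Literal(p) if *p == 0.0 => AggFunc::Min,
        Expr::Literal(p) if *p == 1.0 => AggFunc::Max,
        _ => return None,
    };
    let value = call.args.first()?.clone();
    // Cast integer input to Float64 so the output type stays Float64.
    let value = match data_type(&value) {
        DataType::Int64 => Expr::Cast(Box::new(value), DataType::Float64),
        _ => value,
    };
    Some(AggregateCall {
        func,
        args: vec![value],
        // DISTINCT and FILTER are carried over unchanged.
        distinct: call.distinct,
        filter: call.filter.clone(),
    })
}
```

Any other percentile, or a percentile that is not a literal, returns None and the original percentile_cont call is kept; that is the no-rewrite path.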

Are these changes tested?

Added tests

Are there any user-facing changes?

@github-actions github-actions bot added the functions Changes to functions implementation label Nov 20, 2025
}

#[cfg(test)]
mod tests {
Contributor

I suggest adding some tests in sqllogictest: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest

They should run some SQL queries to which this optimization applies, first checking that the results are as expected and then running EXPLAIN to confirm that the optimization is applied.

In fact, we can move most of the test coverage from unit tests to sqllogictests, because:

  1. SQL tests are simpler to maintain.
  2. The SQL interface is more stable, while internal APIs may change frequently. As a result, good unit-test coverage can easily be lost during refactoring.

Contributor Author

I kept the unit tests alongside the new SQL tests in sqllogictest. Should I remove the unit tests, or is it okay to keep both?

Contributor

We should remove the unit tests if they duplicate the sqllogictests

Contributor

+1 to "We should remove the unit tests if they duplicate the sqllogictests", unless there is something that can't be covered by slt tests.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 20, 2025
kumarUjjawal and others added 3 commits November 20, 2025 21:06
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Contributor

@Jefffrey Jefffrey left a comment

Thanks for picking this up. I have a few suggestions to simplify the code.

}

#[cfg(test)]
mod tests {
Contributor

We should remove the unit tests if they duplicate the sqllogictests

kumarUjjawal and others added 3 commits November 21, 2025 13:29
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
@Jefffrey Jefffrey added this pull request to the merge queue Nov 29, 2025
@Jefffrey
Contributor

Thanks @kumarUjjawal

Merged via the queue into apache:main with commit 73562e8 Nov 29, 2025
28 checks passed

Labels

  • functions: Changes to functions implementation
  • sqllogictest: SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Simplify percentile_cont to min/max when percentile is 0 or 1

4 participants