
Improve performance of min/max aggregates for Durations via GroupsAccumulator #15317

@alamb

Description

Is your feature request related to a problem or challenge?

@svranesevic implemented basic support for min/max on Duration types in #15310 (Support Duration in min/max agg functions). ❤

However, that PR only implements the slower "Accumulator" interface.
It would be really nice to implement the GroupsAccumulator interface as well, since it is faster when there are many groups.
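To illustrate why a grouped accumulator helps, here is a minimal standalone sketch of the idea: one accumulator holds the running max for every group in a single vector and is updated a whole batch at a time, instead of creating one accumulator object per group. This is not DataFusion's actual `GroupsAccumulator` trait. `MaxDurationGroups` and its methods are hypothetical names, and durations are modelled as plain `i64` counts of a fixed time unit with `None` standing in for NULL.

```rust
/// Hypothetical, simplified stand-in for a grouped max over Duration values.
/// Durations are modelled as i64 counts of a fixed time unit; None is NULL.
struct MaxDurationGroups {
    /// One slot per group; None means the group has seen no non-null value yet.
    maxes: Vec<Option<i64>>,
}

impl MaxDurationGroups {
    fn new() -> Self {
        Self { maxes: Vec::new() }
    }

    /// Fold one batch into the per-group state.
    /// `group_indices[i]` is the group that `values[i]` belongs to.
    fn update_batch(
        &mut self,
        values: &[Option<i64>],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        assert_eq!(values.len(), group_indices.len());
        // Grow the state so that every group seen so far has a slot.
        if self.maxes.len() < total_num_groups {
            self.maxes.resize(total_num_groups, None);
        }
        for (value, &group) in values.iter().zip(group_indices) {
            // NULL inputs are skipped and never overwrite an existing max.
            if let Some(v) = *value {
                let slot = &mut self.maxes[group];
                *slot = Some(match *slot {
                    Some(current) => current.max(v),
                    None => v,
                });
            }
        }
    }

    /// Return the current max for each group, in group-index order.
    fn evaluate(&self) -> Vec<Option<i64>> {
        self.maxes.clone()
    }
}
```

Because the state is one contiguous vector indexed by group, the cost per input row stays small no matter how many groups exist, which is where the speedup for high-cardinality GROUP BY queries would be expected to come from.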

Describe the solution you'd like

See "Notes on Implementing GroupsAccumulator" for more detail.

Describe alternatives you've considered

It basically involves:

  1. Adding the Duration types to the list of supported types (the one for max is here): https://github.com/apache/datafusion/blob/42ec5109bb4c6249d8404e862c457ebd86ee0623/datafusion/functions-aggregate/src/min_max.rs#L243-L242
  2. Instantiating an accumulator here: https://github.com/apache/datafusion/blob/42ec5109bb4c6249d8404e862c457ebd86ee0623/datafusion/functions-aggregate/src/min_max.rs#L270-L269
  3. Writing some tests (perhaps add a few more rows to the test added in Support Duration in min/max agg functions #15310)
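The shape of steps 1 and 2 can be sketched roughly as follows. This is a standalone illustration, not the code in `min_max.rs`: the `DataType` and `TimeUnit` enums are local stand-ins that only mirror the naming of Arrow's types, and `groups_accumulator_supported` and `create_groups_accumulator` are hypothetical helpers that return a description string instead of a real accumulator.

```rust
// Local stand-ins for illustration only; the real types live in Arrow.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Duration(TimeUnit),
}

/// Step 1 (sketch): list Duration, in any time unit, among the types
/// for which the grouped fast path is available.
fn groups_accumulator_supported(data_type: DataType) -> bool {
    matches!(data_type, DataType::Int64 | DataType::Duration(_))
}

/// Step 2 (sketch): add a match arm that instantiates an accumulator
/// for each Duration unit. A description string stands in for the
/// accumulator so the example stays self-contained.
fn create_groups_accumulator(data_type: DataType) -> Result<String, String> {
    match data_type {
        DataType::Int64 => Ok("max over Int64".to_string()),
        DataType::Duration(unit) => Ok(format!("max over Duration({:?})", unit)),
        other => Err(format!("no groups accumulator for {:?}", other)),
    }
}
```

The point of the sketch is that the two places must stay in sync: a type that is reported as supported but has no matching arm in the constructor would fail at runtime rather than fall back to the slower path.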

Additional context

No response

Labels

enhancement (New feature or request), performance (Make DataFusion faster)
