Skip to content

Support multiple order aware aggregate functions in a query #8582

@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

Today DataFusion supports three aggregate functions that can be "order aware": ARRAY_AGG, FIRST_VALUE and LAST_VALUE. This means that you can supply a ORDER BY clause to their argument, for example FIRST_VALUE(x ORDER BY time).

Today, there be only one single order specified across ALL order aware aggregate functions

For example

❯ create table t(x int, y int) as values (1, 1), (1, 2), (1, 1), (1, 4), (2, 20), (2, 10);;
0 rows in set. Query took 0.003 seconds.

❯ select x, first_value(x ORDER BY y) from t GROUP BY x;
+---+------------------+
| x | FIRST_VALUE(t.x) |
+---+------------------+
| 2 | 2                |
| 1 | 1                |
+---+------------------+
2 rows in set. Query took 0.004 seconds.

❯ select x, first_value(x ORDER BY y), first_value(x ORDER BY y DESC) from t GROUP BY x;
+---+------------------+-----------------+
| x | FIRST_VALUE(t.x) | LAST_VALUE(t.x) |
+---+------------------+-----------------+
| 1 | 1                | 1               |
| 2 | 2                | 2               |
+---+------------------+-----------------+
2 rows in set. Query took 0.004 seconds.

❯ select x, first_value(x ORDER BY y), first_value(x ORDER BY y DESC NULLS LAST) from t GROUP BY x;
This feature is not implemented: Conflicting ordering requirements in aggregate functions is not supported

Describe the solution you'd like

There are a few designs proposed here: #8558 (comment)

We are working on a more detailed proposal

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions