-
Notifications
You must be signed in to change notification settings - Fork 8.3k
TOTALS, BY and ORDER BY combinators for aggregate functions #34156
Description
Use case
Provide more convenient and possibly more efficient way to write queries with a non-standard syntax as an alternative to window functions when you need to calculate subtotal aggregations in a query or you need to pass values into aggregate function in specific order.
SELECT country, city, avg(salary), avg(salary BY country), avg(salary TOTAL) FROM table GROUP BY country, city
Describe the solution you'd like
Allow to write aggregate functions like this:
groupArray(x ORDER BY y)
agg(x, y ORDER BY a ASC, b DESC)
agg(x, y ORDER BY x LIMIT 10)
agg(x, y TOTALS)
SELECT a, b, c, agg(x), agg(x, y BY a, b), agg(x, y BY a), agg(x, y TOTALS) GROUP BY a, b, c
ORDER BY syntax will be converted to parametric aggregate functions:
Example:
agg(params...)(x, y ORDER BY a ASC, b DESC, x)
will be parsed as
aggOrderBy('a ASC, b DESC, x', params...)(x, y, a, b, x)
Sort description is prepended to parameters and the columns required for sorting are appended to the arguments.
-OrderBy combinator works by collecting all the passed arguments tuples into array and then sorting it. If LIMIT is specified, it can use heap (priority queue). After data is sorted, it is passed to the original aggregate function.
BY and TOTALS combinators are processed differently - the information about them will be stored in ASTFunction,
then passed to AggregateDescription and used by Aggregator.
Aggregator will calculate the aggregate function with BY or TOTALS as usual.
For example, avg(x BY a) is calculated as avg(x).
Then the resulting aggregation states are merged together according to the specified columns in BY.
Additional context
This is one of experimental tasks for students.
This task is resembling #10930 and #11235.