Support expressions with window functions#37848
Conversation
|
@Mergifyio update |
✅ Branch has been successfully updated |
| template <typename Visitor, typename T = ASTPtr> | ||
| struct InDepthNodeVisitorWithChildInfo : Visitor | ||
| { | ||
| using ChildInfo = typename Visitor::ChildInfo; |
There was a problem hiding this comment.
Why do we need special visitor for this, can't we use regular one and store the info in the data?
There was a problem hiding this comment.
I want explicitly show that:
- Node processing result depends on the results of children nodes.
- Children are processed completely independently.
Datarepresents global visitor values.
There was a problem hiding this comment.
For example, GetAggregatesVisitor could be implemented in mode efficient way using this approach.
Instead of traversing query AST for several times in getAggregates and getWindowFunctions functions, we can collect functions and check correctness on the fly if we know independent results for child nodes.
| public: | ||
| using Visitor = ConstInDepthNodeVisitor<ActionsMatcher, true>; | ||
|
|
||
| enum class WindowDependancyState |
There was a problem hiding this comment.
Why not just bool depends_on_window ?
There was a problem hiding this comment.
It's not affecting performance in any way, but IMHO makes code more readable. This variable describes the current state of the visitor, so enum represents it more straightforwardly than boolean.
| data.window_function_in_subtree = true; | ||
| return; | ||
| } | ||
| else if (node.compute_after_window_functions) |
There was a problem hiding this comment.
So, this flags subtree_contains_window_call, compute_after_window_functions, window_function_in_subtree, window_dependancy_state are required just to change the order of visiting the nodes, are they? It looks like this PR's trickiest part, but it just does visit.
There was a problem hiding this comment.
Yes, that's the most important part of this PR. Let's consider some general expression with a window function:
f(g(arg) OVER window_name)
Evaluation of this expression must be split to 3 separate parts:
- Evaluate all arguments of window functions (
argin the example) - Evaluate window functions. It's important to notice, that any window function can't have another window function as an argument. (In the example it's
g(arg) OVER window_name) - Evaluate expressions with window functions (in the example we will compute
fcall value). We can do this way because we are sure thatfis not:- Aggregate function (aggregation is executed before window functions, so I can't have a window function as an argument);
- Window function.
In other words, here we have some kind of topological sort.
| ExpressionActionsChain::Step & step = chain.lastStep(columns_after_window); | ||
| for (const auto & expression : syntax->expressions_with_window_function) | ||
| { | ||
| getRootActionsForWindowFunctions(expression->clone(), true, step.actions()); |
There was a problem hiding this comment.
So, generally, most of the work is done here. Have I got right that it's the part that drives the calculation of expressions for window functions?
There was a problem hiding this comment.
Yes, here we create actions for expression evaluation.
|
|
||
| bool is_window_function = false; | ||
|
|
||
| bool compute_after_window_functions = false; |
There was a problem hiding this comment.
AST Looks like not the best place for this flag, but we already have is_window_function here, so it seems now there's no easy way to get rid of it. And we need actually to determine which expressions are related to the window in WindowExpressionsCollectorChildInfo that also operates with AST, though
There was a problem hiding this comment.
In my opinion, it's completely okay to cache some useful data in AST. Instead of that we usually recompute the same values several times.
The only part that should be considered is that we don't want to make AST nodes too big. In this case, it's not a problem, because I just added a simple boolean flag (in the opposite situation we could store a pointer to a struct with cached values).
|
@Mergifyio update |
✅ Branch has been successfully updated |
Unrelated. |
|
@novikd, thanks for working on this functionality. SELECT sum(number) OVER() + 1 AS result FROM numbers(10);
SELECT arraySum(groupArray(number) OVER()) AS result FROM numbers(10);
SELECT sum(number) OVER() + avg(number) OVER() AS result FROM numbers(10);now works. But it is not working for array operations with lambda expressions: SELECT arrayMap(x -> x + 1, groupArray(number) OVER()) AS result FROM numbers(10);
Code: 183. DB::Exception: Unexpected lambda expression: While processing arrayMap(x -> (x + 1), groupArray(number) OVER ()) AS result. (UNEXPECTED_EXPRESSION) (version 22.7.1.710 (official build))And not working in case of nested window functions: SELECT sum(avg(number) OVER()) OVER() AS result FROM numbers(10);
Code: 184. DB::Exception: Window function avg(number) OVER () is found inside another window function in query: While processing avg(number) OVER (). (ILLEGAL_AGGREGATION) (version 22.7.1.710 (official build)) |
It's possible to have nested WINDOW functions? |
|
@vpanfilov Thank you for testing the feature. I'll take a look at higher-order functions.
SQL standard defines nested window functions as a completely different entity. The query you wrote is not supposed to work. If you want, you can use GROUP BY to fix this query. Nested window functions specified in the SQL standard require specific <row_marker> usage. As far as I know, it's not been supported in ClickHouse yet. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support expressions with window functions. Closes #19857