In the substitutions batch of our LogicalPlanOptimizer, there are 4 rules that take an expression like | STATS foo = avg(x*x) + 2 and turn it into a simple aggregation with enclosing EVALs; in this example, it becomes (essentially):
| EVAL $$x = x*x
| STATS $$foo_sum = sum($$x), $$foo_count = count($$x)
| EVAL $$foo = $$foo_sum/$$foo_count, foo = $$foo + 2
| KEEP foo
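As a sanity check on the rewrite above, here is a minimal sketch in plain Python (standing in for ES|QL; none of this is Elasticsearch code) that computes foo both ways. The sample values for x and the Python variable names are assumptions made up for illustration.

```python
# Toy check that the rewritten pipeline matches the original aggregation.
# Plain Python stands in for ES|QL; the data below is hypothetical.
from fractions import Fraction

xs = [1, 2, 3, 4]  # hypothetical values of column x

# Original: | STATS foo = avg(x*x) + 2
foo_direct = Fraction(sum(x * x for x in xs), len(xs)) + 2

# Rewritten form, one step per command
tmp_x = [x * x for x in xs]              # | EVAL $$x = x*x
foo_sum = sum(tmp_x)                     # | STATS $$foo_sum = sum($$x),
foo_count = len(tmp_x)                   #         $$foo_count = count($$x)
tmp_foo = Fraction(foo_sum, foo_count)   # | EVAL $$foo = $$foo_sum/$$foo_count,
foo_rewritten = tmp_foo + 2              #        foo = $$foo + 2

assert foo_direct == foo_rewritten
print(foo_rewritten)  # 19/2
```

Exact fractions are used so the comparison does not depend on floating-point rounding.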
This is becoming complicated and harder to reason about because the substitutions happen across 4 rules; let's see if we can make do with just 2 rules.
More specifically,
- ReplaceStatsNestedExpressionWithEval turns | STATS foo = avg(x*x) + 2 into | EVAL $$x = x*x | STATS foo = avg($$x) + 2.
- ReplaceStatsAggExpressionWithEval then turns | STATS foo = avg($$x) + 2 into | STATS $$foo = avg($$x) | EVAL foo = $$foo + 2.
- SubstituteSurrogates replaces | STATS $$foo = avg($$x) with | STATS $$foo_sum = sum($$x), $$foo_count = count($$x) | EVAL $$foo = $$foo_sum/$$foo_count.
- Then we run ReplaceStatsNestedExpressionWithEval again to account for changes made by TranslateMetricsAggregate.
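The first three steps above can be sketched as a toy rewriter over nested tuples. This is only a model for following the data flow, not the Java implementation: the function names pull_nested, push_outer and substitute_avg are hypothetical stand-ins for ReplaceStatsNestedExpressionWithEval, ReplaceStatsAggExpressionWithEval and SubstituteSurrogates (avg only), and the naming of temporaries is simplified.

```python
# Toy model of the three rewrites. Expressions are nested tuples such as
# ("+", ("avg", ("*", "x", "x")), 2). All helper names are hypothetical.
AGGS = {"avg", "sum", "count"}

def is_agg(e):
    return isinstance(e, tuple) and e[0] in AGGS

def first_attr(e):
    # Leftmost column name in an expression; used only to name temporaries.
    if isinstance(e, str):
        return e
    if isinstance(e, tuple):
        for arg in e[1:]:
            found = first_attr(arg)
            if found is not None:
                return found
    return None

def pull_nested(stats):
    """Step 1: move non-trivial agg arguments into an EVAL before the STATS."""
    before = {}
    def walk(e):
        if not isinstance(e, tuple):
            return e
        if is_agg(e) and isinstance(e[1], tuple):
            name = "$$" + (first_attr(e[1]) or "expr")
            before[name] = e[1]
            return (e[0], name)
        return (e[0],) + tuple(walk(a) for a in e[1:])
    return before, {k: walk(v) for k, v in stats.items()}

def push_outer(stats):
    """Step 2: keep bare aggs in STATS, move surrounding arithmetic to an EVAL after it.
    Simplification: assumes at most one agg per field, so one temporary name suffices."""
    new_stats, after = {}, {}
    for name, e in stats.items():
        if is_agg(e):
            new_stats[name] = e
            continue
        def walk(x, name=name):
            if is_agg(x):
                new_stats["$$" + name] = x
                return "$$" + name
            if isinstance(x, tuple):
                return (x[0],) + tuple(walk(a) for a in x[1:])
            return x
        after[name] = walk(e)
    return new_stats, after

def substitute_avg(stats):
    """Step 3: replace avg(v) by sum(v) and count(v), plus an EVAL dividing them."""
    new_stats, after = {}, {}
    for name, e in stats.items():
        if is_agg(e) and e[0] == "avg":
            new_stats[name + "_sum"] = ("sum", e[1])
            new_stats[name + "_count"] = ("count", e[1])
            after[name] = ("/", name + "_sum", name + "_count")
        else:
            new_stats[name] = e
    return new_stats, after

# | STATS foo = avg(x*x) + 2
stats = {"foo": ("+", ("avg", ("*", "x", "x")), 2)}
eval_before, stats = pull_nested(stats)
stats, eval_outer = push_outer(stats)
stats, eval_avg = substitute_avg(stats)
plan = [("EVAL", eval_before), ("STATS", stats), ("EVAL", {**eval_avg, **eval_outer})]
for command in plan:
    print(command)
```

Note that two different steps (push_outer and substitute_avg) each produce an EVAL after the STATS, which is the overlap this issue is about.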
It makes sense that there's one rule that creates EVALs after the aggregation (ReplaceStatsAggExpressionWithEval) and one that pulls nested expressions out of agg functions into an EVAL before the aggregation (ReplaceStatsNestedExpressionWithEval).
- SubstituteSurrogates should only substitute and let ReplaceStatsAggExpressionWithEval handle creating the EVAL after the STATS.
- ReplaceStatsNestedExpressionWithEval after TranslateMetricsAggregate
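One possible reading of the first point, sketched as a toy in Python: the surrogate step rewrites avg(v) to sum(v)/count(v) inline and creates no EVAL itself, and a single later step extracts all aggs and builds the one EVAL after the STATS. The helper names substitute_only and extract_aggs are hypothetical, and the ordering (substitute first, extract second) is an assumption, not something the optimizer does today.

```python
# Toy sketch of the proposed split. Expressions are nested tuples and every
# helper name here is hypothetical, not part of Elasticsearch.
AGGS = {"avg", "sum", "count"}

def substitute_only(e):
    """Rewrite avg(v) to sum(v) / count(v) inline, creating no EVAL."""
    if not isinstance(e, tuple):
        return e
    args = tuple(substitute_only(a) for a in e[1:])
    if e[0] == "avg":
        return ("/", ("sum",) + args, ("count",) + args)
    return (e[0],) + args

def extract_aggs(name, e):
    """Leave bare aggs in STATS and move everything around them into one EVAL.
    Simplification: temporaries are named after the agg function, so the same
    function appearing twice in one field would collide."""
    stats = {}
    def walk(x):
        if isinstance(x, tuple) and x[0] in AGGS:
            tmp = "$$" + name + "_" + x[0]
            stats[tmp] = x
            return tmp
        if isinstance(x, tuple):
            return (x[0],) + tuple(walk(a) for a in x[1:])
        return x
    return stats, {name: walk(e)}

# | STATS foo = avg($$x) + 2, after nested expressions were already pulled out
expr = substitute_only(("+", ("avg", "$$x"), 2))
stats, eval_after = extract_aggs("foo", expr)
print(stats)
print(eval_after)
```

In this sketch only extract_aggs ever creates an EVAL after the STATS, so the intermediate $$foo column disappears and foo is computed directly from $$foo_sum and $$foo_count.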