ESQL: Refactor STATS substitution optimizer rules #110345

@alex-spies


In the substitutions batch of our LogicalPlanOptimizer, there are 4 rules that take an expression like | STATS foo = avg(x*x) + 2 and turn it into a simple aggregation with enclosing EVALs; in this example, it becomes (essentially):

| EVAL $$x = x*x
| STATS $$foo_sum = sum($$x), $$foo_count = count($$x)
| EVAL $$foo = $$foo_sum/$$foo_count, foo = $$foo + 2
| KEEP foo

This is becoming complicated and harder to reason about because the substitutions are spread across 4 rules; let's see if we can make do with just 2 rules.

More specifically,

  1. ReplaceStatsNestedExpressionWithEval turns | STATS foo = avg(x*x) + 2 into | EVAL $$x = x*x | STATS foo = avg($$x) + 2.
  2. ReplaceStatsAggExpressionWithEval then turns | STATS foo = avg($$x) + 2 into | STATS $$foo = avg($$x) | EVAL foo = $$foo + 2.
  3. SubstituteSurrogates replaces | STATS $$foo = avg($$x) with | STATS $$foo_sum = sum($$x), $$foo_count = count($$x) | EVAL $$foo = $$foo_sum/$$foo_count.
  4. Then we run ReplaceStatsNestedExpressionWithEval again to account for changes made by TranslateMetricsAggregate.
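
To make the interplay of steps 1 to 3 easier to follow, here is a toy model in Python. It is only a sketch and not the Elasticsearch implementation: expressions are nested tuples, the function names (`extract_nested`, `split_agg_expressions`, `substitute_surrogates`, `optimize`) are hypothetical, and the temporary name `$$arg0` is an assumption standing in for the `$$x` used above. Step 4 and TranslateMetricsAggregate are not modeled.

```python
# Toy model only; all names here are hypothetical, not Elasticsearch APIs.
# Expressions: ("avg", arg), ("+", l, r), ("*", l, r), or a str/int leaf.
AGGS = {"avg", "sum", "count"}


def is_agg(e):
    return isinstance(e, tuple) and e[0] in AGGS


def extract_nested(stats):
    """Step 1: move non-trivial agg arguments into an EVAL before STATS."""
    pre = {}

    def walk(e):
        if not isinstance(e, tuple):
            return e
        if is_agg(e):
            arg = e[1]
            if isinstance(arg, tuple):
                tmp = f"$$arg{len(pre)}"  # assumed naming scheme
                pre[tmp] = arg
                arg = tmp
            return (e[0], arg)
        return (e[0],) + tuple(walk(c) for c in e[1:])

    return pre, {name: walk(expr) for name, expr in stats.items()}


def split_agg_expressions(stats):
    """Step 2: keep bare aggs in STATS, move the rest into an EVAL after it."""
    aggs, post = {}, {}

    def walk(e, name, seen):
        if not isinstance(e, tuple):
            return e
        if is_agg(e):
            tmp = f"$${name}" if not seen else f"$${name}_{len(seen)}"
            seen.append(tmp)
            aggs[tmp] = e
            return tmp
        return (e[0],) + tuple(walk(c, name, seen) for c in e[1:])

    for name, expr in stats.items():
        if is_agg(expr):
            aggs[name] = expr
        else:
            post[name] = walk(expr, name, [])
    return aggs, post


def substitute_surrogates(aggs):
    """Step 3: replace avg by sum and count plus a division in an EVAL."""
    out, post = {}, {}
    for name, (fn, arg) in aggs.items():
        if fn == "avg":
            out[f"{name}_sum"] = ("sum", arg)
            out[f"{name}_count"] = ("count", arg)
            post[name] = ("/", f"{name}_sum", f"{name}_count")
        else:
            out[name] = (fn, arg)
    return out, post


def optimize(stats):
    pre, stats = extract_nested(stats)
    aggs, post = split_agg_expressions(stats)
    aggs, surrogate_post = substitute_surrogates(aggs)
    # Surrogate EVALs must come first: later EVALs refer to their results.
    return pre, aggs, {**surrogate_post, **post}


# | STATS foo = avg(x*x) + 2
pre_eval, stats, post_eval = optimize({"foo": ("+", ("avg", ("*", "x", "x")), 2)})
```

Note that in this sketch both step 2 and step 3 emit EVALs after the STATS, which mirrors the overlap in responsibilities described here.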

It makes sense to have one rule that pulls nested expressions out of agg functions into an EVAL before the aggregation (ReplaceStatsNestedExpressionWithEval) and one that creates EVALs after the aggregation (ReplaceStatsAggExpressionWithEval).

  • However, SubstituteSurrogates should only substitute and let ReplaceStatsAggExpressionWithEval handle creating the EVAL after the STATS.
  • We should check whether we can do without a second run of ReplaceStatsNestedExpressionWithEval after TranslateMetricsAggregate.
