Skip to content

ESQL: Make planning/optimization invariants explicit via asserts/validations and tests #137362

@alex-spies

Description

@alex-spies

We noticed that folks who start hacking on ESQL struggle with some unwritten invariants, and understandably so. We should make them explicit so that tests immediately start to fail when we accidentally deviate from them, and so it's clear what assumptions can be made at various steps in the planner/optimizer pipeline. Asserts should be sufficient to avoid degrading Production performance.

We need to think about what the invariants are and how to expose them. Here's a few I can think of:

  • After analysis, the plan should be consistent in terms of plan nodes depending on attributes. This is ESQL: Verify that analyzed plan is consistent #137189.
  • After the substitutions batch, Aggregate nodes should have only simple agg functions and references to groupings in its aggregates, whereas the groupings must be simple attributes. There should also be no duplicate agg functions (over the same field, with the same parameters and filter).
  • There are more invariants like this that hold after the substitutions have run - ideally, each LogicalPlan node should expose such invariants via a method, say postSubstitutionInvariantCheck() (and we could have more invariant checks for the different stages of planning, if needed).
  • There are some logical plan nodes that only ever occur during parsing, but disappear after analysis. Maybe we should mark them with an interface, or just implement an invariant check which is "just doesn't exist after analysis". This way folks would know that they don't have to account for such plan nodes during optimization. Examples are Row, Drop, Rename and LookupJoin.
  • After the operator optimization batch is done, all foldable expressions in the plan must be folded, so that we can assume they're Literals. At least, we can check that in the post optimization verification that we already have (where the plan consistency is also checked).

We could adjust our LogicalPlanOptimizerTests (and friends, that is, the physical tests and both local variants) to automatically perform some additional checks that ensure we optimize correctly. At a minimum:

  • We only perform a consistency check after the whole optimizer was run, which allows individual rules to output inconsistent query plans that accidentally get fixed by other rules. (Happened e.g. with ResolveUnionTypes and was undiscovered for forever!) During tests, we should just check plan consistency after every rule application.
  • On multiple accounts, we failed to consider that some Project nodes may just not be able to be pulled up to the top of the plan (because new nodes, such as MvExpand or Join cannot (yet) be pushed down past them). We can inject random, no-op Projects during optimization and check that the rule still produces a consistent plan after being applied.

Finally, there are some more subtle optimization invariants that would be good to enforce:

  • Optimizations in the operators() batch should generally not depend on the order of the optimizer rules. If an optimizer rule is moved to a random place in the list, the optimizer should still output a consistent plan with the same output as before. (Although it may have been optimized differently.)
  • Most operators() optimizer rules should also be optional: if we switch them off, we at least shouldn't create inconsistent plans, and the resulting query plan should still have the same .output().

Metadata

Metadata

Assignees

No one assigned

    Labels

    :Analytics/ES|QLAKA ESQL>testIssues or PRs that are addressing/adding testsTeam:AnalyticsMeta label for analytical engine team (ESQL/Aggs/Geo)

    Type

    No fields configured for Epic.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions