You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We noticed that folks who start hacking on ESQL struggle with some unwritten invariants, and understandably so. We should make them explicit so that tests immediately start to fail when we accidentally deviate from them, and so it's clear what assumptions can be made at various steps in the planner/optimizer pipeline. Asserts should be sufficient to avoid degrading Production performance.
We need to think about what the invariants are and how to expose them. Here's a few I can think of:
After the substitutions batch, Aggregate nodes should have only simple agg functions and references to groupings in its aggregates, whereas the groupings must be simple attributes. There should also be no duplicate agg functions (over the same field, with the same parameters and filter).
There are more invariants like this that hold after the substitutions have run - ideally, each LogicalPlan node should expose such invariants via a method, say postSubstitutionInvariantCheck() (and we could have more invariant checks for the different stages of planning, if needed).
There are some logical plan nodes that only ever occur during parsing, but disappear after analysis. Maybe we should mark them with an interface, or just implement an invariant check which is "just doesn't exist after analysis". This way folks would know that they don't have to account for such plan nodes during optimization. Examples are Row, Drop, Rename and LookupJoin.
After the operator optimization batch is done, all foldable expressions in the plan must be folded, so that we can assume they're Literals. At least, we can check that in the post optimization verification that we already have (where the plan consistency is also checked).
We could adjust our LogicalPlanOptimizerTests (and friends, that is, the physical tests and both local variants) to automatically perform some additional checks that ensure we optimize correctly. At a minimum:
We only perform a consistency check after the whole optimizer was run, which allows individual rules to output inconsistent query plans that accidentally get fixed by other rules. (Happened e.g. with ResolveUnionTypes and was undiscovered for forever!) During tests, we should just check plan consistency after every rule application.
On multiple accounts, we failed to consider that some Project nodes may just not be able to be pulled up to the top of the plan (because new nodes, such as MvExpand or Join cannot (yet) be pushed down past them). We can inject random, no-op Projects during optimization and check that the rule still produces a consistent plan after being applied.
Finally, there are some more subtle optimization invariants that would be good to enforce:
Optimizations in the operators() batch should generally not depend on the order of the optimizer rules. If an optimizer rule is moved to a random place in the list, the optimizer should still output a consistent plan with the same output as before. (Although it may have been optimized differently.)
Most operators() optimizer rules should also be optional: if we switch them off, we at least shouldn't create inconsistent plans, and the resulting query plan should still have the same .output().
We noticed that folks who start hacking on ESQL struggle with some unwritten invariants, and understandably so. We should make them explicit so that tests immediately start to fail when we accidentally deviate from them, and so it's clear what assumptions can be made at various steps in the planner/optimizer pipeline. Asserts should be sufficient to avoid degrading Production performance.
We need to think about what the invariants are and how to expose them. Here's a few I can think of:
aggregates, whereas thegroupingsmust be simple attributes. There should also be no duplicate agg functions (over the same field, with the same parameters and filter).postSubstitutionInvariantCheck()(and we could have more invariant checks for the different stages of planning, if needed).Row,Drop,RenameandLookupJoin.Literals. At least, we can check that in the post optimization verification that we already have (where the plan consistency is also checked).We could adjust our LogicalPlanOptimizerTests (and friends, that is, the physical tests and both local variants) to automatically perform some additional checks that ensure we optimize correctly. At a minimum:
Finally, there are some more subtle optimization invariants that would be good to enforce:
operators()batch should generally not depend on the order of the optimizer rules. If an optimizer rule is moved to a random place in the list, the optimizer should still output a consistent plan with the same output as before. (Although it may have been optimized differently.)operators()optimizer rules should also be optional: if we switch them off, we at least shouldn't create inconsistent plans, and the resulting query plan should still have the same.output().