-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
A classic part of query optimization is algebraic transformations such as partially evaluating expressions once at plan time rather than over and over for each row during execution time.
For example, a predicate such as where time < date_trunc('2021-10-04Z10:12:13', 'year') can be rewritten to where time < '2021-01-01Z00:00:00' which both saves many redundant evaluations of the date_trunc functions and also unlocks additional optimizations such as parquet row group pruning and using constant comparison kernels.
DataFusion has a basic constant folding implementation here: https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/constant_folding.rs
However, as implemented, it has a few drawbacks:
- It only covers a few algebraic transformations (like boolean algebra and
now()expansion) - It effectively is a second implementation of expression evaluation so
2a. As new expression support is added, it would also need to be added to constant_folding.rs
2b. It runs the risk of producing different answers than if the expression had been evaluated at runtime - It does not handle functions and sorting out how to support it for user defined functions would be non trivial
Describe the solution you'd like
Reuse the existing expression evaluation framework (namely PhysicalExpr::evaluate and everything in physical_plan/expressions) to implement constant folding.
This would be beneficial because:
- All current and future expression types could be evaluated (including user defined functions)
- It would allow more sophisticated expression transformations / optimziations such as WIP: Extended Tokomak optimizer #1066
The high level idea would be to walk the Expr tree bottom up, and if a subtree contained only constants (and non volitalie functions #1069) create and run a PhysicalExpr to produce a single value, and then replace the subtree with that appropriate constant.
Describe alternatives you've considered
I think it is possible to implement expression evaluation as a set of rewrite rules (as is partially done in #1066) but that still has the downside that the behavior can deviate from the actual expression evaluation in PhysicalExpr
Additional context