-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
At the time of writing , DataFusion has three partially complete range/interval analysis implementations:
Range analysis used in cardinality estimation
https://docs.rs/datafusion-physical-expr/18.0.0/datafusion_physical_expr/trait.PhysicalExpr.html#method.analyze
https://docs.rs/datafusion-physical-expr/18.0.0/datafusion_physical_expr/struct.AnalysisContext.html
(cc @isidentical)
Pruning Predicate
https://docs.rs/datafusion/18.0.0/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html
(creates an expr and then evaluates it against statistics)
Intervals
https://github.com/apache/arrow-datafusion/blob/6a8c4a60114c7132de6d2fcd6d44111cf40b3df9/datafusion/physical-expr/src/intervals/mod.rs#L24
Introduced by @metesynnada and @ozankabak in #5322
The modules are all structured a differently / have different implementations but what they are computing (interval / range analysis) is basically the same and they all operate on PhysicalExprs.
Since there are three implementations, as we add new PhysicalExprs either the work must be triplicated or else (more likely) the implementations will keep diverging in functionality.
In addition it is hard to increase the scope of coverage when the logic is in different places
Describe the solution you'd like
I would like a single implementation of range analysis used for all three uses in DataFusion. The one introduced by @metesynnada and @ozankabak in #5322 seems to have the best documentation and theoretical foundation, but the other two are more functionally complete.
I don't have a preference for which approach is taken, other than that we end up with a single implementation that is easier to maintain
Describe alternatives you've considered
Additional context
I believe @ozankabak said they may have plans in this area (#5322 (comment)) but I am not sure
cc @mingmwang