Skip to content

Consolidate 3 Range Analysis / Interval implementations (cost model, pruning predicates, interval analysis) #5535

@alamb

Description

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

At the time of writing , DataFusion has three partially complete range/interval analysis implementations:

Range analysis used in cardinality estimation

https://docs.rs/datafusion-physical-expr/18.0.0/datafusion_physical_expr/trait.PhysicalExpr.html#method.analyze
https://docs.rs/datafusion-physical-expr/18.0.0/datafusion_physical_expr/struct.AnalysisContext.html
(cc @isidentical)

Pruning Predicate

https://docs.rs/datafusion/18.0.0/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html
(creates an expr and then evaluates it against statistics)

Intervals

https://github.com/apache/arrow-datafusion/blob/6a8c4a60114c7132de6d2fcd6d44111cf40b3df9/datafusion/physical-expr/src/intervals/mod.rs#L24
Introduced by @metesynnada and @ozankabak in #5322

The modules are all structured a differently / have different implementations but what they are computing (interval / range analysis) is basically the same and they all operate on PhysicalExprs.

Since there are three implementations, as we add new PhysicalExprs either the work must be triplicated or else (more likely) the implementations will keep diverging in functionality.

In addition it is hard to increase the scope of coverage when the logic is in different places

Describe the solution you'd like
I would like a single implementation of range analysis used for all three uses in DataFusion. The one introduced by @metesynnada and @ozankabak in #5322 seems to have the best documentation and theoretical foundation, but the other two are more functionally complete.

I don't have a preference for which approach is taken, other than that we end up with a single implementation that is easier to maintain

Describe alternatives you've considered

Additional context

I believe @ozankabak said they may have plans in this area (#5322 (comment)) but I am not sure

cc @mingmwang

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions