Keep track of common sub-expression across logical plan nodes #9576

@mustafasrepo

Is your feature request related to a problem or challenge?

No response

Describe the solution you'd like

Currently, the CommonSubexprEliminate LogicalPlan optimizer rule analyzes common sub-expressions in a query. It then caches a common sub-expression by adding a LogicalPlan::Projection if it thinks this is beneficial.
As an example, the following query

SELECT c3+c4, SUM(c3+c4) OVER(order by c3+c4)
FROM t

generates the following LogicalPlan:

Projection: t.c3 + t.c4, SUM(t.c3 + t.c4) ORDER BY [t.c3 + t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--WindowAggr: windowExpr=[[SUM(CAST(t.c3 + t.c4t.c4t.c3 AS t.c3 + t.c4 AS Int64)) ORDER BY [t.c3 + t.c4t.c4t.c3 AS t.c3 + t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS SUM(t.c3 + t.c4) ORDER BY [t.c3 + t.c4 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
----Projection: t.c3 + t.c4 AS t.c3 + t.c4t.c4t.c3, t.c3, t.c4
------TableScan: t projection=[c3, c4]

where t.c3+t.c4 is calculated once in the lower Projection and then referenced as a column by the subsequent WindowAggr.

However, the following query:

SELECT c3+c4, SUM(c3+c4) OVER()
FROM t

generates the following LogicalPlan:

Projection: t.c3 + t.c4, SUM(t.c3 + t.c4) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
--WindowAggr: windowExpr=[[SUM(CAST(t.c3 + t.c4 AS Int64)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
----TableScan: t projection=[c3, c4]

Instead, we could generate the following plan:

Projection: col(t.c3 + t.c4), SUM(t.c3 + t.c4) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
--WindowAggr: windowExpr=[[SUM(CAST(col(t.c3 + t.c4) AS t.c3 + t.c4 AS Int64)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
----Projection: t.c3 + t.c4 AS col(t.c3 + t.c4)
------TableScan: t projection=[c3, c4]

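This would be possible if we were to keep track of common sub-expression counts globally across different nodes in the LogicalPlan. In the plan above, t.c3 + t.c4 appears once in the WindowAggr and once in the Projection on top of it, so it is only recognizable as a common sub-expression when the counts from both nodes are combined. This would enable us to generate better LogicalPlans.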
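
To make the difference concrete, here is a minimal sketch of per-node versus plan-wide counting on a toy model of the second query. It is not DataFusion code: the `Node` class, the tuple encoding of expressions, and the helper names `subexprs`, `per_node_counts`, and `global_counts` are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


# Toy encoding (an assumption, not DataFusion's Expr type):
# a column is a str, a compound expression is a tuple (op, *args).
def subexprs(expr):
    """Yield every compound sub-expression of expr, including expr itself."""
    if isinstance(expr, tuple):
        yield expr
        for arg in expr[1:]:
            yield from subexprs(arg)


@dataclass
class Node:
    """Hypothetical single-input plan node holding its own expressions."""
    name: str
    exprs: list
    input: Optional["Node"] = None


def per_node_counts(node):
    """Count compound sub-expressions within one node only."""
    counts = Counter()
    for expr in node.exprs:
        counts.update(subexprs(expr))
    return counts


def global_counts(root):
    """Sum the per-node counts over every node from root down to the scan."""
    counts = Counter()
    node = root
    while node is not None:
        counts += per_node_counts(node)
        node = node.input
    return counts


# SELECT c3+c4, SUM(c3+c4) OVER() FROM t
add = ("+", "c3", "c4")
scan = Node("TableScan", [])
window = Node("WindowAggr", [("sum", ("cast", add))], scan)
# The second projection item is a plain column reference to the window output.
proj = Node("Projection", [add, "SUM(c3 + c4) OVER ()"], window)

print(per_node_counts(window)[add])  # 1: not common within WindowAggr
print(per_node_counts(proj)[add])    # 1: not common within Projection
print(global_counts(proj)[add])      # 2: common across the whole plan

common = [e for e, n in global_counts(proj).items() if n > 1]
print(common)
```

In this toy model only `c3 + c4` ends up in `common`. A real implementation would also have to check that the expression can be evaluated at a shared point below every node that uses it, which this sketch does not attempt.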

Describe alternatives you've considered

No response

Additional context

No response


Labels: enhancement
