[RAS] GroupLeafExec does not preserve outputPartitioning #11468

@baibaichen

Description

GroupLeafExec in RAS (the Relational Algebra Selector optimizer) always returns UnknownPartitioning for outputPartitioning. This causes incorrect behavior when spark.sql.unionOutputPartitioning=true (the default in Spark 4.1).

Background

Spark 4.1 introduced apache/spark#51623, which allows UnionExec to preserve the partitioning of its children. When all children have identical partitioning, Spark trusts this information during planning and may omit downstream Exchange operators.

However, in RAS, GroupLeafExec does not preserve the outputPartitioning from its wrapped plan, always returning UnknownPartitioning. This breaks the partitioning contract and can lead to incorrect query results.
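The effect can be illustrated with a toy model. The sketch below does not use Spark's or RAS's real classes: Partitioning, Plan, Source, OpaqueLeaf and Union are hypothetical stand-ins, and the union rule is simplified to "keep the partitioning only when every child reports the same one". It shows only how a leaf that hides its wrapped plan's partitioning changes what a parent union reports.

```scala
// Toy model only: these types mimic, but are not, Spark's physical-plan classes.
object UnionPartitioningSketch {
  sealed trait Partitioning
  case object UnknownPartitioning extends Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int)
      extends Partitioning

  trait Plan { def outputPartitioning: Partitioning }

  // A scan-like node that already knows how its output is partitioned.
  final case class Source(outputPartitioning: Partitioning) extends Plan

  // Hypothetical stand-in for a leaf that wraps a plan but hides its partitioning.
  final case class OpaqueLeaf(wrapped: Plan) extends Plan {
    def outputPartitioning: Partitioning = UnknownPartitioning
  }

  // Simplified union rule: keep the partitioning only when all children agree.
  final case class Union(children: Seq[Plan]) extends Plan {
    def outputPartitioning: Partitioning =
      children.map(_.outputPartitioning).distinct match {
        case Seq(single) => single
        case _           => UnknownPartitioning
      }
  }

  def main(args: Array[String]): Unit = {
    val hashed = Source(HashPartitioning(Seq("id"), 8))
    // Children visible directly: the union reports their hash partitioning.
    println(Union(Seq(hashed, hashed)).outputPartitioning)
    // Same children behind opaque leaves: the union reports UnknownPartitioning.
    println(Union(Seq(OpaqueLeaf(hashed), OpaqueLeaf(hashed))).outputPartitioning)
  }
}
```

In this model the same two inputs yield different partitioning for the union depending on whether the leaf exposes it, which is the kind of mismatch described above.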

Workaround

Disable the feature by setting spark.sql.unionOutputPartitioning=false when using RAS.
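As a sketch, the setting can be applied when the session is built. The config key is the one named in this issue; the application name and local master below are illustrative assumptions, not part of the report.

```scala
import org.apache.spark.sql.SparkSession

// Workaround sketch: turn off union output partitioning for this session.
// appName and master are placeholders chosen for illustration.
val spark = SparkSession.builder()
  .appName("ras-union-workaround")
  .master("local[*]")
  .config("spark.sql.unionOutputPartitioning", "false")
  .getOrCreate()
```

The same key can also be passed on the command line with --conf spark.sql.unionOutputPartitioning=false.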

Proposed Solution

GroupLeafExec should override outputPartitioning to return the correct partitioning from the underlying plan.
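A minimal sketch of the idea, again using a toy model rather than the real GroupLeafExec: Plan, Source and GroupLeaf are hypothetical names, and the sketch assumes the leaf has direct access to the plan it stands for. How the real operator would obtain that partitioning is not specified in this issue.

```scala
// Toy model only: illustrates delegation, not the actual GroupLeafExec code.
object GroupLeafSketch {
  sealed trait Partitioning
  case object UnknownPartitioning extends Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int)
      extends Partitioning

  trait Plan {
    // Default mirrors the current behavior: nothing is known.
    def outputPartitioning: Partitioning = UnknownPartitioning
  }

  final case class Source(override val outputPartitioning: Partitioning) extends Plan

  // Hypothetical leaf that forwards the wrapped plan's partitioning
  // instead of falling back to the UnknownPartitioning default.
  final case class GroupLeaf(wrapped: Plan) extends Plan {
    override def outputPartitioning: Partitioning = wrapped.outputPartitioning
  }

  def main(args: Array[String]): Unit = {
    val hashed = Source(HashPartitioning(Seq("id"), 8))
    // The leaf now reports the same partitioning as the plan it wraps.
    println(GroupLeaf(hashed).outputPartitioning == hashed.outputPartitioning)
  }
}
```

Delegating keeps the leaf's reported partitioning consistent with the plan it represents, so a parent operator sees the same information it would see without the wrapper.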

Labels

bug (Something isn't working)
