[data] Combine repartitions#59145
Conversation
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a new logical optimization rule, CombineRepartitions, which effectively merges consecutive Repartition or StreamingRepartition operators into a single one. This is a valuable optimization that can simplify the logical plan and potentially improve performance by reducing overhead from redundant repartitioning steps. The logic correctly handles the shuffle parameter for Repartition and target_num_rows_per_block for StreamingRepartition when fusing operators. The integration into the _LOGICAL_RULESET is also correctly done.
| if transformed_dag is original_dag: | ||
| return plan | ||
|
|
||
| # TODO replace w/ Plan.copy |
There was a problem hiding this comment.
The TODO comment here indicates a known area for improvement. Implementing a Plan.copy method would provide a more robust and idiomatic way to create a new LogicalPlan instance after transformations, ensuring all relevant attributes are correctly propagated and maintaining the immutability pattern consistently across the plan hierarchy. This is a good future enhancement.
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
## Description A Logical Optimizer rule to combine consecutive repartitions. If 2 repartitions are consecutive, take the last one. If one of them performs an all gather shuffle, then the fused one should also shuffle too. ## Related issues None ## Additional information None --------- Signed-off-by: iamjustinhsu <jhsu@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
Description
A Logical Optimizer rule to combine consecutive repartitions. If 2 repartitions are consecutive, take the last one. If one of them performs an all gather shuffle, then the fused one should also shuffle too.
Related issues
None
Additional information
None