[GLUTEN-11088][VL] Fix the Spark4.0 storage partition join#11184
jinchengchenghh merged 8 commits into apache:main
Run Gluten Clickhouse CI on x86
fcf56d4 to 8594ab9
b28e232 to 381591e
      applyPartialClustering: Boolean,
      replicatePartitions: Boolean,
      joinKeyPositions: Option[Seq[Int]] = None): Seq[Seq[InputPartition]] = {
    val original = batchScan.asInstanceOf[BatchScanExecShim]
Could you help refactor the orderPartitions function? It is mostly copied from BatchScanExec::inputRDD, and inputPartitionsShim can probably be replaced by inputPartitions. @beliefer Thanks! https://github.com/apache/incubator-gluten/blob/49389cd05ea07356f71bfdfe660410604c1461ea/shims/spark40/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AbstractBatchScanExec.scala#L60
https://github.com/apache/incubator-gluten/blob/d636fa77c49e991eb02159a0c25431eb499c6da2/shims/spark40/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AbstractBatchScanExec.scala#L142
@jinchengchenghh would you please create an issue to track this?
    }.flatMap(smj => collect(smj) { case s: ColumnarShuffleExchangeExec => s })
  }

  private def collectShuffles(plan: SparkPlan): Seq[ShuffleExchangeLike] = {
I will create a PR in apache/spark to change the function from private to protected, so that we only need to override that one function to check the plan.
Related issue: #11088
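
To illustrate the idea in the last comment, here is a minimal sketch of why a protected helper is enough. Everything in it is a hypothetical stand-in (`Plan`, `RowShuffle`, `ColumnarShuffle`, `BaseJoinSuite`, `ColumnarJoinSuite`, `shuffleCount`); none of it is the real Spark or Gluten code, and the actual upstream signature may differ.

```scala
// Hypothetical stand-ins for plan nodes; not the real Spark/Gluten classes.
sealed trait Plan { def children: Seq[Plan] }
case class Scan(name: String) extends Plan { val children: Seq[Plan] = Nil }
case class RowShuffle(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
case class ColumnarShuffle(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
case class Join(left: Plan, right: Plan) extends Plan { val children: Seq[Plan] = Seq(left, right) }

// Stand-in for an upstream test suite whose helper is protected, not private.
class BaseJoinSuite {
  // Pre-order traversal collecting every node the partial function matches.
  protected def collectNodes[T](plan: Plan)(pf: PartialFunction[Plan, T]): Seq[T] =
    pf.lift(plan).toSeq ++ plan.children.flatMap(c => collectNodes(c)(pf))

  // Because this is protected, a subclass can replace just this check.
  protected def collectShuffles(plan: Plan): Seq[Plan] =
    collectNodes(plan) { case s: RowShuffle => s }

  // Shared assertion logic that every test in the suite relies on.
  def shuffleCount(plan: Plan): Int = collectShuffles(plan).size
}

// Stand-in for the downstream suite: only the shuffle matcher is overridden,
// so the rest of the inherited test logic does not need to be copied.
class ColumnarJoinSuite extends BaseJoinSuite {
  override protected def collectShuffles(plan: Plan): Seq[Plan] =
    collectNodes(plan) { case s: ColumnarShuffle => s }
}
```

With the helper private, the subclass would have to duplicate every test that calls it; with it protected, the override above is the only downstream change in this sketch.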