
[GLUTEN-11088][CORE] Fix Spark 4.0 GlutenDynamicPartitionPruningV1SuiteAEOn#11212

Merged
jinchengchenghh merged 1 commit into apache:main from jinchengchenghh:dpp on Dec 1, 2025

Conversation

@jinchengchenghh
Contributor

@jinchengchenghh jinchengchenghh commented Nov 27, 2025

Support FileSourceScanExecTransformer stream.

FileSourceScanExecTransformer instances compare as unequal because their dataFilters differ: (cast(x#688 as double) = ReusedSubquery Subquery subquery#683, [id=#2324]) and (cast(x#688 as double) = Subquery subquery#683, [id=#2324]) should be treated as the same filter, so we must compare canonicalized expressions when deduplicating.
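To illustrate the deduplication problem, here is a hypothetical, self-contained Scala sketch (not code from this PR): expressions that differ only in cosmetic details, such as a different exprId or a ReusedSubquery wrapper, are semantically the same filter, so a plain Set keeps both while comparison by canonical form collapses them to one. The `Attr` case class and its `canonicalized` method below are stand-ins for Catalyst's real `Expression.canonicalized`.

```scala
// Hypothetical stand-in for a Catalyst expression: exprId is a
// cosmetic detail that canonicalization strips away.
case class Attr(name: String, exprId: Long) {
  def canonicalized: Attr = copy(exprId = 0L)
}

// Deduplicate by canonical form while keeping the first original
// (non-canonicalized) expression for each canonical class.
def dedupByCanonical(filters: Seq[Attr]): Seq[Attr] = {
  val seen = scala.collection.mutable.Set.empty[Attr]
  // mutable.Set.add returns true only when the element was absent,
  // so filter keeps exactly the first occurrence per canonical form.
  filters.filter(f => seen.add(f.canonicalized))
}

val filters = Seq(Attr("x", 688L), Attr("x", 690L))
assert(filters.toSet.size == 2)            // plain Set: both survive
assert(dedupByCanonical(filters).size == 1) // canonical dedup: one survives
```

The same idea carries over to real Catalyst expressions: deduplicate on `expr.canonicalized` but keep the original `expr`, so downstream consumers still see the un-normalized plan nodes.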

Fix getPartitions for dynamic partition pruning in FileSourceScanExec, which broke because Spark 4.0 added the new ScanFileListing class and refactored inputRDD.

Related issue: #11088

@github-actions github-actions bot added the CORE works for Gluten Core label Nov 27, 2025
@jinchengchenghh
Contributor Author

Run Gluten ClickHouse CI on x86

@github-actions

Run Gluten Clickhouse CI on x86

1 similar comment

@github-actions

Run Gluten Clickhouse CI on x86


def subtractFilters(left: Seq[Expression], right: Seq[Expression]): Seq[Expression] = {
-  (left.toSet -- right.toSet).toSeq
+  val scanSet = left.map(_.canonicalized).toSet
Contributor Author

Hi @Zouxxyy, I found this causes a ReuseExchange issue. Since you added combineFilters, do you have any suggestions? I think it may hit the same issue.

Contributor

Oh, I checked the commit history, and this PR (#6296) changed ExpressionSet to a regular Set. The original ExpressionSet compared expressions based on both determinism and canonical representation, whereas a regular Set only compares expressions directly (using their default equals/hashCode).

It seems what we actually need is comparison based on canonicalized forms. Perhaps we could modify the code like this:

def subtractFilters(left: Seq[Expression], right: Seq[Expression]): Seq[Expression] = {
  // Compare by canonical form so cosmetic differences (e.g. ExprIds,
  // ReusedSubquery wrappers) do not defeat the subtraction.
  val rightCanonicalSet = right.map(_.canonicalized).toSet
  left.filterNot(expr => rightCanonicalSet.contains(expr.canonicalized))
}

def combineFilters(left: Seq[Expression], right: Seq[Expression]): Seq[Expression] = {
  // Deduplicate by canonical form while preserving the original
  // (non-canonicalized) expressions, keeping first occurrences.
  val seen = scala.collection.mutable.Set[Expression]()
  val result = scala.collection.mutable.ListBuffer[Expression]()

  def tryAdd(expr: Expression): Unit = {
    val canon = expr.canonicalized
    if (!seen.contains(canon)) {
      seen += canon
      result += expr
    }
  }

  (left ++ right).foreach(tryAdd)
  result.toList
}

Contributor Author

Maybe we cannot use canonicalized expressions here; otherwise a "subquery is not finished" exception is thrown with the following code:

val scanSet = left.map(_.canonicalized).toSet
scanSet.toSeq ++ right.filter(f => !scanSet.contains(f.canonicalized))

Keeping the current code for now.

@github-actions

Run Gluten Clickhouse CI on x86

5 similar comments

@github-actions

Run Gluten Clickhouse CI on x86

2 similar comments

@jinchengchenghh jinchengchenghh marked this pull request as ready for review November 29, 2025 23:21
//
// See also org.apache.gluten.execution.FilterHandler#applyFilterPushdownToScan
// See also DynamicPartitionPruningSuite.scala:1362
assert(subqueryIds.size == 3, "Whole plan subquery reusing not working correctly")
Contributor Author

@jinchengchenghh Nov 29, 2025

Maybe the previous version had an extra subquery; the filter should not affect ReusedSubquery. Gluten on Spark 4.0 now produces the same result as vanilla JVM Spark.

@github-actions

Run Gluten Clickhouse CI on x86

Member

@zhouyuan zhouyuan left a comment

👍

@jinchengchenghh jinchengchenghh merged commit ba25aec into apache:main Dec 1, 2025
60 checks passed

Labels: CORE (works for Gluten Core), DATA_LAKE

3 participants