[GLUTEN-8852][CORE] PART0: Adding Spark400 support by zhouyuan · Pull Request #9768 · apache/gluten

zhouyuan · 2025-05-27T13:52:54Z

What changes were proposed in this pull request?

Adds a basic shim layer for Spark 4.0.
Adds a Spark 4.0 Maven profile (-Pspark-4.0).
TPC-H tests pass.

Here are the commands to build the package for Spark 4.0:

rm -f backends-velox/src/test/scala/org/apache/gluten/execution/VeloxStringFunctionsSuite.scala \
      backends-velox/src/test/scala/org/apache/spark/sql/execution/GlutenHiveUDFSuite.scala \
      backends-velox/src/test/scala/org/apache/spark/sql/execution/VeloxParquetWriteForHiveSuite.scala \
      backends-velox/src/test/scala/org/apache/spark/sql/execution/benchmark/VeloxRasBenchmark.scala \
      backends-velox/src/test/scala/org/apache/spark/sql/execution/joins/GlutenExistenceJoinSuite.scala

mvn clean package -Pbackends-velox -Pspark-4.0 -Pscala-2.13 -DskipTests

Note: Spark 4.0 enables ANSI mode by default. However, Gluten automatically falls back to the JVM code path when ANSI mode is on, so for now it is recommended to test Gluten with Spark 4.0 with ANSI mode turned off:

.config("spark.sql.ansi.enabled", "false")

co-authored-by: feilong.he@intel.com @philo-he

(Related: #8852)

How was this patch tested?

Passes the GitHub Actions (GHA) CI.

github-actions · 2025-05-27T13:53:11Z

#8852

github-actions · 2025-05-27T13:53:26Z

Run Gluten Clickhouse CI on x86

kapilks · 2025-06-30T09:07:04Z

Is there active work going on in this PR?

zhouyuan · 2025-06-30T09:34:41Z

@kapilks yes, I'm still working on this - waiting for some refactoring work on the main branch to land first

kapilks · 2025-06-30T10:54:00Z

@zhouyuan Let me know if there are other items to work on.
I would like to contribute to Spark 4 support.

zhouyuan · 2025-07-17T16:24:28Z

I have updated the patch. Part 1 is to enable the shim layer and be able to run TPC-H/TPC-DS. The follow-up tasks are defined in #8852 as sub-tasks.
@kapilks @philo-he

zjuwangg · 2025-07-30T03:33:11Z

Hi @zhouyuan, what's the latest status of this promising PR? I'd also like to take on some work for Spark 4.0 support.

zhouyuan · 2025-07-30T08:01:31Z

@zjuwangg Cool. With this patch, Gluten passes the basic TPC-H tests. TPC-DS fails because the logic for handling dynamic partition pruning is missing. This can be included in the 1.5 release (targeting August).
The remaining tasks (targeting the 1.6 release) are defined here: #8852. IMO the most important parts are enabling all the unit tests and adding ANSI support.
Thanks.

Cc @philo-he @weiting-chen

philo-he · 2025-07-30T15:42:22Z

We can adjust the goal of this PR. If the build passes for Spark 4.0 and there are no CI failures for earlier Spark versions, I think we can merge the PR. The runtime issues can be fixed in subsequent PRs.

philo-he

Looks good.

zhztheplayer · 2025-08-11T12:02:39Z

shims/common/src/main/scala/org/apache/gluten/execution/BatchCarrierRowBase.scala

 * number of PlaceholderRows + the TerminalRow equates to the size of the original columnar batch.
 */
-sealed abstract class BatchCarrierRow extends InternalRow {
+abstract class BatchCarrierRowBase extends InternalRow {


We can avoid shimming this class by making BatchCarrierRow extend a mixin trait. I'll help update the PR for that.

Fixed in 5379cc7
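
The mixin idea from this review thread can be sketched roughly as below. This is only an illustration under stated assumptions, not the code that landed in 5379cc7: `InternalRowLike` is a stand-in for Spark's `InternalRow`, and `MixinShimSketch`, `RowCompat`, and `getExtra` are hypothetical names. The point is that only a small trait needs a copy per Spark version, while the row class itself stays in shared code.

```scala
object MixinShimSketch {
  // Stand-in for Spark's InternalRow; the real class declares many more accessors.
  abstract class InternalRowLike {
    def numFields: Int
    def getInt(ordinal: Int): Int
  }

  // Hypothetical per-version mixin. Each Spark shim module would ship its own copy:
  // empty where nothing changed, or supplying members that only newer versions need.
  trait RowCompat {
    def getExtra(ordinal: Int): String =
      throw new UnsupportedOperationException("carrier rows hold no data")
  }

  // Shared code: the class keeps its name and stays sealed, since only the mixin varies.
  sealed abstract class BatchCarrierRow extends InternalRowLike with RowCompat {
    override def getInt(ordinal: Int): Int =
      throw new UnsupportedOperationException("carrier rows hold no data")
  }

  final class PlaceholderRow extends BatchCarrierRow {
    override def numFields: Int = 0
  }

  def main(args: Array[String]): Unit = {
    val row: BatchCarrierRow = new PlaceholderRow
    println(row.numFields)
  }
}
```

With this layout, the version-specific surface is limited to `RowCompat`, so the shared class does not have to be renamed or duplicated per shim.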

zhztheplayer · 2025-08-11T14:41:56Z

shims/common/src/main/scala/org/apache/spark/sql/connector/catalog/functions/Reducer.java

+ * @since 4.0.0
+ */
+@Evolving
+public interface Reducer<I, O> {


The class duplicates the one from vanilla Spark. I think we need to place it in each of the 3.x shims instead, although that is more tedious than the current practice.

zhztheplayer · 2025-08-11T14:47:01Z

gluten-ui/pom.xml

+                            <phase>generate-sources</phase>
+                            <configuration>
+                                <target>
+                                    <replaceregexp file="src/main/scala/org/apache/spark/sql/execution/ui/GlutenAllExecutionsPage.scala"


It's a Scala class, so we may be able to add a type shim in the shim layers to avoid this kind of hack. E.g.,

type HttpServletRequestShim = javax.servlet.http.HttpServletRequest

@zhztheplayer, I have created a PR to fix this: zhouyuan#33. Thanks.
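
A minimal sketch of how such a type shim could replace the regex rewrite is shown below. It is self-contained, so the two servlet request classes are replaced by stand-ins (`legacy.HttpServletRequest` and `modern.HttpServletRequest`), and `Spark3xShim`, `Spark40Shim`, and `SharedPage` are hypothetical names. That the 4.0 shim would alias a different servlet package is an assumption made for illustration; the actual change is in zhouyuan#33.

```scala
// Stand-ins for the servlet request classes; real code would use the servlet API jars.
object legacy { final class HttpServletRequest(val path: String) }
object modern { final class HttpServletRequest(val path: String) }

// Hypothetical shim for a Spark 3.x profile.
object Spark3xShim { type HttpServletRequestShim = legacy.HttpServletRequest }

// Hypothetical shim for the Spark 4.0 profile (assumed to alias a different class).
object Spark40Shim { type HttpServletRequestShim = modern.HttpServletRequest }

// Shared UI code names only the alias, so its source is identical for every profile.
object SharedPage {
  // In a real build the active Maven profile would decide which shim is on the classpath;
  // this sketch imports the 3.x one explicitly.
  import Spark3xShim.HttpServletRequestShim

  def render(request: HttpServletRequestShim): String =
    s"executions page for ${request.path}"
}
```

Because the alias is resolved at compile time, the shared source needs no build-time text substitution: switching profiles only changes which shim object provides `HttpServletRequestShim`.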

zhouyuan · 2025-08-12T10:24:31Z

@zhztheplayer @philo-he Thanks a lot for helping with this patch! I just merged this initial part so it won't block community efforts on further improvements.

zhztheplayer · 2025-08-12T10:36:14Z

Thank you for the effort, @zhouyuan @philo-he.

github-actions bot added the CORE works for Gluten Core label May 27, 2025

github-actions bot added the VELOX label Jul 10, 2025

zhouyuan force-pushed the wip_spark400_2 branch from cf7f54d to f11c500 Compare July 17, 2025 14:49

zhouyuan marked this pull request as ready for review July 17, 2025 14:50

zhouyuan changed the title ~~[GLUTEN-8852][CORE] Adding Spark400 support~~ [GLUTEN-8852][CORE] PART1: Adding Spark400 support Jul 17, 2025

zhouyuan mentioned this pull request Jul 17, 2025

[VL] Enable Spark 400 unit tests #10207

Closed

philo-he approved these changes Jul 30, 2025

zhouyuan changed the title ~~[GLUTEN-8852][CORE] PART1: Adding Spark400 support~~ [GLUTEN-8852][CORE] PART0: Adding Spark400 support Jul 30, 2025

zhouyuan and others added 5 commits August 11, 2025 10:05

adding spark400 support

f40b4e5

Signed-off-by: Yuan <yuanzhou@apache.org>

fix comile

a1352c2

Signed-off-by: Yuan <yuanzhou@apache.org>

fix compile on substrait

5c42d31

Signed-off-by: Yuan <yuanzhou@apache.org>

Propose some fixes for passing compilation (#28)

9777850

* Fix compilation * Add reducer * Fix * Fix UI * Remove isNullIntolerant (already done in upstream) * Fix substrait module * Fix newly found issues * Fix UT * Minor change

fix compile

784dcb5

Signed-off-by: Yuan <yuanzhou@apache.org>

zhouyuan and others added 8 commits August 11, 2025 10:07

split spark32/33/34/35 shim

ab7e681

Signed-off-by: Yuan <yuanzhou@apache.org>

Fix code compatibility for ColumnarArrowEvalPythonExec.scala (#29)

9f8d270

* Initial * Refine

fix compile

ef08106

Signed-off-by: Yuan <yuanzhou@apache.org>

fix rebase

57b27e3

Signed-off-by: Yuan <yuanzhou@apache.org>

add back python exec metrics

7e722ad

Signed-off-by: Yuan <yuanzhou@apache.org>

fix failed unit tests

2fa1b09

Signed-off-by: Yuan <yuanzhou@apache.org>

bump arrow

724d09b

Signed-off-by: Yuan <yuanzhou@apache.org>

revert changes on idea/vcs.xml

976645e

Signed-off-by: Yuan <yuanzhou@apache.org>

zhouyuan force-pushed the wip_spark400_2 branch from c2f5de4 to 976645e Compare August 11, 2025 09:07

fix rebase

fcc1632

Signed-off-by: Yuan <yuanzhou@apache.org>

zhouyuan force-pushed the wip_spark400_2 branch from ec9e2c7 to fcc1632 Compare August 11, 2025 11:27

zhztheplayer reviewed Aug 11, 2025

remove internal-row shim

5379cc7

zhztheplayer reviewed Aug 11, 2025

style

12c2c60

fixup

2214d23

zhouyuan merged commit 7a3b0e3 into apache:main Aug 12, 2025
92 of 94 checks passed

zhztheplayer mentioned this pull request Aug 13, 2025

[Doc][VL]Add documents for spark 4.0.0 support #10430

Draft

zjuwangg mentioned this pull request Aug 14, 2025

[GLUTEN-8852][VL] Fix compilation issues in existing tests for Spark 4.0.0 #10434

Merged

zhouyuan mentioned this pull request Sep 4, 2025

[CORE] Spark 4.x support #8852

Open
