[GLUTEN-8852][CORE] PART0: Adding Spark400 support #9768

Merged
zhouyuan merged 17 commits into apache:main from zhouyuan:wip_spark400_2
Aug 12, 2025

@zhouyuan
Member

@zhouyuan zhouyuan commented May 27, 2025

What changes were proposed in this pull request?

  • Adds a basic shim layer for Spark 4.0
  • Adds a Spark 4.0 Maven profile (spark-4.0)
  • TPC-H tests pass

Here are the commands to build the package for Spark 4.0 (first remove the following test suites, then run Maven):

rm -f backends-velox/src/test/scala/org/apache/gluten/execution/VeloxStringFunctionsSuite.scala \
     backends-velox/src/test/scala/org/apache/spark/sql/execution/GlutenHiveUDFSuite.scala \
     backends-velox/src/test/scala/org/apache/spark/sql/execution/VeloxParquetWriteForHiveSuite.scala \
     backends-velox/src/test/scala/org/apache/spark/sql/execution/benchmark/VeloxRasBenchmark.scala \
     backends-velox/src/test/scala/org/apache/spark/sql/execution/joins/GlutenExistenceJoinSuite.scala

mvn clean package -Pbackends-velox -Pspark-4.0 -Pscala-2.13 -DskipTests

Note: Spark 4.0 enables ANSI mode by default. However, Gluten automatically falls back to the JVM code path when ANSI mode is on, so for now it is recommended to test Gluten with Spark 4.0 with ANSI mode turned off:

.config("spark.sql.ansi.enabled", "false")
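
For context, the setting above could be applied when building a test session roughly as follows. This is a minimal sketch, not taken from the patch: the object name, application name, and local master are assumptions, Spark 4.0 is assumed to be on the classpath, and the Gluten plugin settings are assumed to be supplied elsewhere (for example in spark-defaults.conf).

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical test driver; only the ANSI setting comes from the PR description.
object AnsiOffExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("gluten-spark40-ansi-off") // hypothetical application name
      .master("local[*]")                 // assumption: local test run
      // Spark 4.0 turns ANSI mode on by default; switch it off so that
      // Gluten does not fall back to the JVM code path.
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()

    // Print the effective setting before running any queries.
    println(spark.conf.get("spark.sql.ansi.enabled"))

    spark.stop()
  }
}
```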

co-authored-by: feilong.he@intel.com @philo-he

(Related: #8852)

How was this patch tested?

Passes GitHub Actions (GHA) CI.

@github-actions github-actions bot added the CORE works for Gluten Core label May 27, 2025
@github-actions

#8852

@github-actions

Run Gluten Clickhouse CI on x86

@kapilks
Contributor

kapilks commented Jun 30, 2025

Is there active work going on this PR?

@zhouyuan
Member Author

> Is there active work going on this PR?

@kapilks Yes, I'm still working on this. I'm waiting for some refactoring work on the main branch to land first.

@kapilks
Contributor

kapilks commented Jun 30, 2025

> > Is there active work going on this PR?
>
> @kapilks yes, I'm still working on this - waiting for some refactoring work on the main branch to land first

@zhouyuan Let me know if there are other open items.
I would like to contribute to Spark 4 support.

@github-actions github-actions bot added the VELOX label Jul 10, 2025
@zhouyuan zhouyuan marked this pull request as ready for review July 17, 2025 14:50
@zhouyuan zhouyuan changed the title from "[GLUTEN-8852][CORE] Adding Spark400 support" to "[GLUTEN-8852][CORE] PART1: Adding Spark400 support" Jul 17, 2025

@zhouyuan
Member Author

zhouyuan commented Jul 17, 2025

I have updated the patch. Part 1 enables the shim layer and makes it possible to run TPC-H/TPC-DS. The remaining tasks are defined in #8852 as sub-tasks.
@kapilks @philo-he

@zjuwangg
Contributor

> I have updated the patch, part 1 is to enable the shim layer and able to run tpch/ds. The following tasks are defined in #8852 as sub-tasks @kapilks @philo-he

Hi @zhouyuan, what's the latest status of this promising MR? I'd also like to take on some of the work for Spark 4.0 support.

@zhouyuan
Member Author

> > I have updated the patch, part 1 is to enable the shim layer and able to run tpch/ds. The following tasks are defined in #8852 as sub-tasks @kapilks @philo-he
>
> hi @zhouyuan What's the latest status of this promising MR? I'd also like to take some work on Spark 4.0 support.

@zjuwangg Cool. With this patch, Gluten passes the basic TPC-H tests. TPC-DS fails because the logic for handling dynamic partition pruning is missing. This can be included in the 1.5 release (targeting August).
The remaining tasks (targeting the 1.6 release) are defined here: #8852. In my opinion, the most important parts are enabling all the unit tests and ANSI support.
Thanks.

Cc @philo-he @weiting-chen

@philo-he
Member

We can adjust the goal of this PR. If the build passes for Spark 4.0 and there are no CI failures for earlier Spark versions, I think we can merge the PR. The runtime issues can be fixed in subsequent PRs.

Member

@philo-he philo-he left a comment

Looks good.

@zhouyuan zhouyuan changed the title from "[GLUTEN-8852][CORE] PART1: Adding Spark400 support" to "[GLUTEN-8852][CORE] PART0: Adding Spark400 support" Jul 30, 2025

zhouyuan and others added 5 commits August 11, 2025 10:05
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
* Fix compilation

* Add reducer

* Fix

* Fix UI

* Remove isNullIntolerant (already done in upstream)

* Fix substrait module

* Fix newly found issues

* Fix UT

* Minor change
Signed-off-by: Yuan <yuanzhou@apache.org>
zhouyuan and others added 8 commits August 11, 2025 10:07
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>

 * number of PlaceholderRows + the TerminalRow equates to the size of the original columnar batch.
 */
-sealed abstract class BatchCarrierRow extends InternalRow {
+abstract class BatchCarrierRowBase extends InternalRow {
Member

We can avoid shimming this class by making BatchCarrierRow extend a mixin trait. I'll help update the PR for that.

Member

Fixed in 5379cc7
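
The mixin-trait idea suggested in this thread can be illustrated with a small, self-contained sketch. It is not the code from commit 5379cc7: `RowLike` stands in for Spark's `InternalRow`, and `VersionSpecificRowMethods` and `getExtra` are hypothetical names for whatever members differ between Spark versions. The point is only that the common class can stay sealed and be defined once, while each shim module supplies its own copy of the trait.

```scala
// Stand-in for Spark's InternalRow; the real class has many more accessors.
abstract class RowLike {
  def numFields: Int
}

// Hypothetical per-Spark-version mixin. Each shim module would ship its own
// copy of this trait under the same name, holding only the members that
// differ between Spark versions.
trait VersionSpecificRowMethods { self: RowLike =>
  // Assumed example of an accessor that exists only in newer Spark versions.
  def getExtra(ordinal: Int): AnyRef =
    throw new UnsupportedOperationException("Not supported on a carrier row")
}

// Common code: defined once and kept sealed, with no per-version copy.
sealed abstract class CarrierRow extends RowLike with VersionSpecificRowMethods {
  override def numFields: Int = 0
}

final class PlaceholderRow extends CarrierRow
final class TerminalRow(val batchSize: Int) extends CarrierRow

object MixinDemo {
  // Because CarrierRow is sealed, the compiler can check this match for exhaustiveness.
  def describe(row: CarrierRow): String = row match {
    case _: PlaceholderRow => "placeholder"
    case t: TerminalRow    => s"terminal(${t.batchSize})"
  }
}
```

Keeping the version-specific members in a trait means the sealed hierarchy does not need to be renamed or duplicated per Spark version.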

* @since 4.0.0
*/
@Evolving
public interface Reducer<I, O> {
Member

This class duplicates the one from vanilla Spark. I think we need to place it in all the 3.x shims instead? Although that is more tedious than the current practice.

<phase>generate-sources</phase>
<configuration>
<target>
<replaceregexp file="src/main/scala/org/apache/spark/sql/execution/ui/GlutenAllExecutionsPage.scala"
Member

It's a Scala class so we may be able to add a type shim in the shim layers to avoid such hacking? E.g.,

type HttpServletRequestShim = javax.servlet.http.HttpServletRequest
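
To make the suggestion concrete, here is a minimal, self-contained sketch of a per-version type alias. The stand-in classes replace the real servlet types so the example compiles without any dependencies; the shim object names and `PageRenderer` are hypothetical, and the use of a `jakarta` package in the Spark 4.0 shim is an assumption rather than something stated in this thread.

```scala
// Stand-ins for javax.servlet.http.HttpServletRequest (Spark 3.x) and for the
// request type assumed to be used by Spark 4.0.
object legacyapi { class Request(val path: String) }
object newapi { class Request(val path: String) }

// Hypothetical shim objects. In a real build each shim module would define
// the alias under the same fully qualified name, and only one module would
// be on the classpath at a time.
object Spark3Shims { type HttpServletRequestShim = legacyapi.Request }
object Spark4Shims { type HttpServletRequestShim = newapi.Request }

// Common code refers only to the alias, so its source stays identical across
// Spark versions and no build-time regex rewriting is needed.
object PageRenderer {
  import Spark4Shims.HttpServletRequestShim

  def render(request: HttpServletRequestShim): String =
    s"rendering ${request.path}"
}
```

With this layout, switching Spark versions changes which shim module is compiled in, not the source of the page class.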

Member

@zhztheplayer, I have created a PR to fix this: zhouyuan#33. Thanks.

@zhouyuan zhouyuan merged commit 7a3b0e3 into apache:main Aug 12, 2025
92 of 94 checks passed
@zhouyuan
Member Author

@zhztheplayer @philo-he Thanks a lot for helping with this patch! I just merged this initial part so that it won't block community efforts on further improvements.

@zhztheplayer
Member

Thank you for the effort, @zhouyuan @philo-he.

Labels

CORE (works for Gluten Core), VELOX

5 participants