[GLUTEN-11088][VL] Fix GlutenParquetIOSuite compatibility issues for Spark 4.0 by baibaichen · Pull Request #11281 · apache/gluten

baibaichen · 2025-12-11T07:15:28Z

What changes were proposed in this pull request?

This PR fixes compatibility issues in GlutenParquetIOSuite for Spark 4.0 by addressing the following Spark 4.0 shim layer changes:

- Respect mapreduce.output.basename configuration: Updated SparkWriteFilesCommitProtocol to honor the mapreduce.output.basename configuration when generating output file names, aligning with SPARK-49991.
- Proper error handling in write operations: Replaced direct exception throwing with GlutenFileFormatWriter.throwWriteError to use Spark's standardized error handling mechanism (QueryExecutionErrors.taskFailedWhileWritingRowsError), aligning with SPARK-45844.
- Code quality improvements:
  - Added explicit type annotations to sparkStageId, sparkPartitionId, and sparkAttemptNumber for better type safety
  - Changed fileNames initialization from null to the underscore idiom (_) for cleaner Scala style
  - Migrated from deprecated scala.collection.JavaConverters to scala.jdk.CollectionConverters
  - Simplified TextScan instantiation by removing the redundant new keyword (applies to Scala 3/case class patterns)
- Test coverage: Re-enabled 3 previously excluded tests in VeloxTestSettings:
  - SPARK-49991: Respect 'mapreduce.output.basename' to generate file names
  - SPARK-6330 regression test
  - SPARK-7837 Do not close output writer twice when commitTask() fails
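
For readers unfamiliar with the basename change, here is a minimal sketch of the naming rule, not the actual SparkWriteFilesCommitProtocol code. The object name, the `fileName` signature, and the use of a plain `Map` in place of a Hadoop configuration are all hypothetical; the `part` default is an assumption based on Spark's usual `part-00000-...` file names.

```scala
object OutputFileNames {
  // Configuration key named in the PR description.
  val BasenameKey = "mapreduce.output.basename"

  // Hypothetical helper: `conf` stands in for a Hadoop Configuration lookup.
  def fileName(conf: Map[String, String], split: Int, jobId: String, ext: String): String = {
    // Fall back to "part" when no custom basename is configured (assumed default).
    val basename = conf.getOrElse(BasenameKey, "part")
    // %05d pads the split number to at least five digits without truncating larger values.
    f"$basename-$split%05d-$jobId$ext"
  }
}
```

With this sketch, `OutputFileNames.fileName(Map("mapreduce.output.basename" -> "gluten"), 3, "job1", ".parquet")` yields `gluten-00003-job1.parquet`, and an empty configuration yields a name starting with `part-`.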

Why are the changes needed?

The Spark 4.0 upgrade introduced breaking changes in the shim layer:

- The file naming convention now supports custom basename configuration through mapreduce.output.basename
- Error handling APIs were refactored to use centralized error builders
- The previous direct exception throwing approach is incompatible with Spark 4.0's error handling framework
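
To illustrate what "centralized error builders" means in practice, here is a rough sketch using only the Scala standard library. `TaskWriteFailedException`, `writeRows`, and this `throwWriteError` signature are hypothetical stand-ins; the real GlutenFileFormatWriter.throwWriteError and QueryExecutionErrors.taskFailedWhileWritingRowsError may differ in shape.

```scala
import scala.util.control.NonFatal

object WriteErrors {
  // Hypothetical exception type; Spark's real builder returns its own exception class.
  final class TaskWriteFailedException(path: String, cause: Throwable)
    extends RuntimeException(s"Task failed while writing rows to $path.", cause)

  // Single place where a write failure is turned into the standardized error.
  def throwWriteError(path: String, cause: Throwable): Nothing =
    throw new TaskWriteFailedException(path, cause)

  // Runs a write body and routes any non-fatal failure through the builder,
  // keeping the original exception reachable as the cause.
  def writeRows(path: String)(body: => Unit): Unit =
    try body
    catch { case NonFatal(t) => throwWriteError(path, t) }
}
```

The point of the pattern is that callers always see the same outer exception type, while the original failure stays available through `getCause`.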

Without these changes, GlutenParquetIOSuite tests fail due to:

- Incorrect file name generation (missing basename support)
- Incompatible exception types when write operations fail
- Deprecated Scala collection conversion APIs
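
The collection conversion migration is mechanical. A minimal sketch, assuming Scala 2.13 or later (where `scala.jdk.CollectionConverters` is available); the object and method names are illustrative only:

```scala
// Replaces the deprecated `import scala.collection.JavaConverters._`.
import scala.jdk.CollectionConverters._

object ConvertersExample {
  // Java -> Scala: wrap the Java list, then copy into an immutable List.
  def toScalaList(names: java.util.List[String]): List[String] =
    names.asScala.toList

  // Scala -> Java: expose the Scala sequence through the java.util.List interface.
  def toJavaList(names: Seq[String]): java.util.List[String] =
    names.asJava
}
```

Only the import changes; the `asScala` and `asJava` call sites stay the same.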

How was this patch tested?

- Re-enabled and verified all 3 previously excluded tests pass successfully
- Existing GlutenParquetIOSuite tests continue to pass
- Validated file naming with custom mapreduce.output.basename configurations
- Confirmed error handling produces correct exception types and messages
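
When a write failure is wrapped in a standardized outer exception, a test that checks exception types or messages may need to look at the cause rather than the top-level exception. The following is only a sketch of that idea; the `Causes.rootCause` helper is hypothetical and is not taken from the PR's test code.

```scala
import scala.annotation.tailrec

object Causes {
  // Walks the cause chain to the innermost exception.
  // The self-reference check guards against a throwable that is its own cause.
  @tailrec
  def rootCause(t: Throwable): Throwable = {
    val cause = t.getCause
    if (cause == null || (cause eq t)) t else rootCause(cause)
  }
}
```

A test could then assert on `Causes.rootCause(e).getMessage` or on the root cause's class instead of on the wrapper.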

Related Issue

Addresses #11088 (Track on Spark-4.0 failed unit tests)

github-actions · 2025-12-11T07:16:08Z

Run Gluten Clickhouse CI on x86

zhouyuan · 2025-12-15T00:53:43Z

@baibaichen It looks like we should disable the ANSI test below, as Gluten-Velox does not support it yet:

2025-12-13T07:06:56.4659042Z - Throw exceptions on inserting out-of-range int value with ANSI casting policy *** FAILED ***
2025-12-13T07:06:56.4660344Z   Expected exception org.apache.spark.SparkArithmeticException to be thrown, but org.apache.spark.SparkException was thrown (InsertSuite.scala:775)
2025-12-13T07:06:56.4840500Z 07:06:56.483 WARN org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Project[QueryId=47908], due to: 
2025-12-13T07:06:56.4842112Z  - Validation failed with exception from: ProjectExecTransformer, reason: CheckOverflowInTableInsert is used in ANSI mode, but Gluten does not support ANSI mode.
2025-12-13T07:06:56.4843981Z

baibaichen · 2025-12-15T08:11:13Z

Thanks @zhouyuan. This isn't related to ANSI mode; the issue was introduced by my fix.

github-actions bot added the CORE (works for Gluten Core) and VELOX labels Dec 11, 2025

baibaichen mentioned this pull request Dec 11, 2025

[VL] Track on Spark-4.0 failed unit tests #11088

Open

baibaichen force-pushed the feature/GlutenParquetIOSuite branch from 73dfa1d to 7baaac2 Compare December 12, 2025 01:35

baibaichen force-pushed the feature/GlutenParquetIOSuite branch from 59fb290 to 5d2b367 Compare December 13, 2025 06:18

baibaichen force-pushed the feature/GlutenParquetIOSuite branch from 5d2b367 to 4674f9d Compare December 15, 2025 07:59

baibaichen and others added 7 commits December 16, 2025 09:55

- 296df37 Replace direct exception throwing with `GlutenFileFormatWriter.throwWriteError` for task failure handling.
- 34ec0aa Respect 'mapreduce.output.basename' configuration for file name generation, according to apache/spark#48494
- a98cc0f Refactor imports and variable initializations for improved clarity and consistency
- 0bf03a4 Remove exclusions
- 97e8906 Assert on the cause message
- 475a844 Enhance error handling in commit and abort tasks to provide better diagnostics
- 45e4618 Fix minor syntax inconsistency

baibaichen force-pushed the feature/GlutenParquetIOSuite branch from 4674f9d to 45e4618 Compare December 16, 2025 01:55

baibaichen requested review from JkSelf and zhouyuan December 17, 2025 05:34

rui-mo approved these changes Dec 17, 2025

baibaichen merged commit ee153ed into apache:main Dec 17, 2025
106 of 107 checks passed

baibaichen deleted the feature/GlutenParquetIOSuite branch December 17, 2025 06:55
