[UT] Add missing Gluten test suites for Spark 4.0 and 4.1 #11512

Merged
baibaichen merged 20 commits into apache:main from baibaichen:feature/internal_issue on Feb 2, 2026

Contributor

@baibaichen baibaichen commented Jan 28, 2026

What changes are proposed in this pull request?

This PR broadens Gluten's test coverage by adding wrapper test suites that extend Spark's original test suites. The new suites restore coverage that has been missing in Gluten since Spark 3.3. The changes include:

  • 208 new test suite files for Spark 4.0.0 (spark40)
  • 207 new test suite files for Spark 4.1.0 (spark41)
  • 237 enableSuite calls added to VeloxTestSettings for spark40
  • 236 enableSuite calls added to VeloxTestSettings for spark41
  • Test suites under hive/execution/ are not included in this addition yet.

All test suites follow the same pattern: each one extends the corresponding native Spark test suite and mixes in GlutenTestsCommonTrait, with the intent of running the inherited tests under Gluten.
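As a rough illustration of the wrapper pattern, the sketch below uses stand-in types so that it compiles without Spark or Gluten on the classpath. The `RowQueueSuite` and `GlutenTestsCommonTrait` definitions here are hypothetical stubs that only borrow the names of the real classes, and the test names and the `suiteOrigin` member are invented for the example.

```scala
// Stand-in for an upstream Spark suite (hypothetical stub, not the real class).
abstract class RowQueueSuite {
  // Test names are invented for illustration.
  def testNames: Seq[String] = Seq("in-memory queue", "disk queue")
}

// Stand-in for the Gluten mixin (hypothetical stub; the real trait lives in Gluten's test module).
trait GlutenTestsCommonTrait {
  def suiteOrigin: String = "gluten wrapper"
}

// The wrapper declares no tests of its own: it inherits every test from the Spark suite.
class GlutenRowQueueSuite extends RowQueueSuite with GlutenTestsCommonTrait

val wrapper = new GlutenRowQueueSuite
val inherited: Seq[String] = wrapper.testNames
```

In the repository itself, a wrapper file would presumably contain little more than the package declaration, an import of the Gluten trait, and the one-line class declaration.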

Suite Files Added by Package

| package (non-recursive) | spark40 | spark41 | comments |
|---|---|---|---|
| sql/streaming | 46 | 46 | Identical |
| sql | 36 | 35 | Spark41 missing GlutenDefaultANSIValueSuite.scala |
| sql/catalyst/expressions | 34 | 34 | Identical |
| sql/execution | 33 | 33 | Identical |
| sql/catalyst/expressions/aggregate | 11 | 11 | Identical |
| sql/execution/datasources | 10 | 10 | Identical |
| sql/connector | 10 | 10 | Identical |
| sql/sources | 8 | 8 | Identical |
| sql/execution/datasources/parquet | 5 | 5 | Identical |
| sql/execution/python | 4 | 4 | Identical |
| sql/execution/joins | 2 | 2 | Identical |
| sql/execution/datasources/v2 | 2 | 2 | Identical |
| sql/errors | 2 | 2 | Identical |
| sql/execution/metric | 1 | 1 | Identical |
| sql/execution/datasources/text | 1 | 1 | Identical |
| sql/execution/datasources/orc | 1 | 1 | Identical |
| sql/execution/datasources/json | 1 | 1 | Identical |
| sql/execution/datasources/csv | 1 | 1 | Identical |
| TOTAL | 208 | 207 | 1 file difference |

Difference Analysis:

  • Spark41 has no GlutenDefaultANSIValueSuite.scala in the sql package, although Spark40 does. The upstream DefaultANSIValueSuite exists in Spark 4.0 but not in Spark 4.1, so there is no suite to wrap.

enableSuite Calls Added by Package

Note: The enableSuite count is the number of enableSuite calls this PR adds to VeloxTestSettings.scala, including calls that are commented out. Some counts exceed file counts because certain suite files contain multiple test classes that are enabled separately.

| package (non-recursive) | spark40 | spark41 | comments |
|---|---|---|---|
| TOTAL | Total 237, 49 commented | Total 236, 52 commented | Spark41 has 3 more commented suites |
| sql/streaming | Active: 38, Commented: 8 | Active: 38, Commented: 8 | Identical |
| sql/catalyst/expressions | Active: 32, Commented: 1 | Active: 32, Commented: 1 | Identical (GlutenXmlExpressionsSuite commented in both) |
| sql | Active: 28, Commented: 11 | Active: 26, Commented: 12 | Spark41 has GlutenDataFrameSubquerySuite commented and lacks GlutenDefaultANSIValueSuite |
| sql/execution | Active: 27, Commented: 6 | Active: 27, Commented: 6 | Identical |
| sql/catalyst/expressions/aggregate | Active: 11, Commented: 0 | Active: 11, Commented: 0 | Identical |
| sql/execution/datasources | Active: 10, Commented: 0 | Active: 9, Commented: 1 | Spark41 has GlutenParquetVariantShreddingSuite commented |
| sql/connector | Active: 16, Commented: 2 | Active: 16, Commented: 2 | Identical |
| sql/sources | Active: 1, Commented: 7 | Active: 1, Commented: 7 | Many Hive-related suites commented |
| sql/execution/python | Active: 1, Commented: 3 | Active: 0, Commented: 4 | Spark41 has GlutenRowQueueSuite commented |
| sql/execution/joins | Active: 2, Commented: 0 | Active: 2, Commented: 0 | Identical |
| sql/errors | Active: 6, Commented: 0 | Active: 6, Commented: 0 | Identical |
| Other datasource packages | Active: 16, Commented: 11 | Active: 16, Commented: 11 | Identical |
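The per-package rows can be cross-checked against the stated totals. The following sketch only re-adds the Active and Commented figures copied from the table; the `PackageCounts` case class and the value names are introduced here for illustration.

```scala
// One row of the table: active and commented enableSuite calls per Spark version.
final case class PackageCounts(
    pkg: String,
    active40: Int,
    commented40: Int,
    active41: Int,
    commented41: Int)

// Figures copied from the table above.
val rows: Seq[PackageCounts] = Seq(
  PackageCounts("sql/streaming", 38, 8, 38, 8),
  PackageCounts("sql/catalyst/expressions", 32, 1, 32, 1),
  PackageCounts("sql", 28, 11, 26, 12),
  PackageCounts("sql/execution", 27, 6, 27, 6),
  PackageCounts("sql/catalyst/expressions/aggregate", 11, 0, 11, 0),
  PackageCounts("sql/execution/datasources", 10, 0, 9, 1),
  PackageCounts("sql/connector", 16, 2, 16, 2),
  PackageCounts("sql/sources", 1, 7, 1, 7),
  PackageCounts("sql/execution/python", 1, 3, 0, 4),
  PackageCounts("sql/execution/joins", 2, 0, 2, 0),
  PackageCounts("sql/errors", 6, 0, 6, 0),
  PackageCounts("Other datasource packages", 16, 11, 16, 11)
)

// Column sums per Spark version.
val activeTotal40: Int = rows.map(_.active40).sum
val commentedTotal40: Int = rows.map(_.commented40).sum
val activeTotal41: Int = rows.map(_.active41).sum
val commentedTotal41: Int = rows.map(_.commented41).sum

// Active plus commented gives the total number of enableSuite calls.
val total40: Int = activeTotal40 + commentedTotal40
val total41: Int = activeTotal41 + commentedTotal41
```

The sums come out to 188 active plus 49 commented for spark40 and 184 active plus 52 commented for spark41, which matches the 237 and 236 totals in the first row.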

Key Differences Between Spark40 and Spark41:

  1. GlutenDefaultANSIValueSuite (sql package)

    • Spark40: File exists and enableSuite is active
    • Spark41: File does not exist (the upstream DefaultANSIValueSuite is not present in Spark 4.1)
  2. GlutenDataFrameSubquerySuite (sql package)

    • Spark40: enableSuite[GlutenDataFrameSubquerySuite] is active
    • Spark41: // TODO: 4.x enableSuite[GlutenDataFrameSubquerySuite] // 1 failure (commented)
    • Reason: 1 test failure in Spark 4.1.0
  3. GlutenParquetVariantShreddingSuite (execution/datasources/parquet)

    • Spark40: enableSuite[GlutenParquetVariantShreddingSuite] is active
    • Spark41: // TODO: 4.x enableSuite[GlutenParquetVariantShreddingSuite] // 1 failure (commented)
    • Reason: 1 test failure in Spark 4.1.0
  4. GlutenRowQueueSuite (execution/python)

    • Spark40: enableSuite[GlutenRowQueueSuite] is active
    • Spark41: // TODO: 4.x enableSuite[GlutenRowQueueSuite] (commented)
    • Reason: Compatibility issue with Spark 4.1.0
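The differences listed above can be recovered mechanically from the settings lines. Below is a minimal sketch, assuming the line formats quoted in the list (an active `enableSuite[...]` call, or a `// TODO: 4.x` comment with an optional failure count); the `Entry` type, the `parse` helper, and the two line lists are hypothetical and are not part of VeloxTestSettings.

```scala
import scala.util.matching.Regex

// Parsed view of one settings line (hypothetical helper type).
final case class Entry(suite: String, active: Boolean, failures: Option[Int])

val SuiteCall: Regex = """enableSuite\[(\w+)\]""".r
val FailureNote: Regex = """//\s*(\d+) failures?""".r

// Returns None for lines that contain no enableSuite call.
def parse(line: String): Option[Entry] =
  SuiteCall.findFirstMatchIn(line).map { m =>
    val commented = line.trim.startsWith("//")
    val failures = FailureNote.findFirstMatchIn(line).map(_.group(1).toInt)
    Entry(m.group(1), active = !commented, failures)
  }

// Lines as quoted in the list above.
val spark40Lines: Seq[String] = Seq(
  "enableSuite[GlutenDataFrameSubquerySuite]",
  "enableSuite[GlutenParquetVariantShreddingSuite]",
  "enableSuite[GlutenRowQueueSuite]"
)
val spark41Lines: Seq[String] = Seq(
  "// TODO: 4.x enableSuite[GlutenDataFrameSubquerySuite] // 1 failure",
  "// TODO: 4.x enableSuite[GlutenParquetVariantShreddingSuite] // 1 failure",
  "// TODO: 4.x enableSuite[GlutenRowQueueSuite]"
)

// Suites that are active for spark40 but commented out for spark41.
val activeIn40: Set[String] = spark40Lines.flatMap(parse).filter(_.active).map(_.suite).toSet
val disabledOnlyIn41: Seq[String] =
  spark41Lines.flatMap(parse).filter(e => !e.active && activeIn40(e.suite)).map(_.suite)
```

Keeping the failure count in the trailing comment, as these lines do, makes it possible to tally outstanding failures without running the suites.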

Why enableSuite Count > File Count?

The number of enableSuite calls (237 for spark40, 236 for spark41) is higher than the number of Suite files (208 for spark40, 207 for spark41) because:

  1. Central Registry: All enableSuite calls live in VeloxTestSettings.scala, the central registry that enables test suites in Gluten
  2. Multiple Classes per File: Some suite files define more than one test class, and each class needs its own enableSuite call
  3. Ratio: 237 calls / 208 files ≈ 1.14, so most suite files still map 1:1 to an enableSuite call

Note: VeloxTestSettings.scala already contained some enableSuite calls before this change. Those are not counted here: 237/236 is the number of enableSuite calls added by this change, alongside the 208/207 new suite files.
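The note ahead of the enableSuite table attributes the gap to suite files that contain several test classes. Below is a minimal sketch of how one file can account for two registry entries. `SuiteRegistry` is a hypothetical stand-in rather than the real VeloxTestSettings API, and both example class names are invented.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

// Minimal stand-in for a central registry. The real VeloxTestSettings API may differ.
object SuiteRegistry {
  private val enabled = mutable.LinkedHashSet.empty[String]

  // Records one enablement per test class, keyed by class name.
  def enableSuite[T](implicit tag: ClassTag[T]): Unit = {
    enabled += tag.runtimeClass.getName
    ()
  }

  def count: Int = enabled.size
}

// Hypothetical: two test classes that would be declared in ONE suite file.
class GlutenExampleSuite
class GlutenExampleWithAnsiSuite

val filesAdded: Int = 1
val callsAdded: Int = {
  SuiteRegistry.enableSuite[GlutenExampleSuite]
  SuiteRegistry.enableSuite[GlutenExampleWithAnsiSuite]
  SuiteRegistry.count
}

// Ratio quoted in the text for spark40: 237 calls over 208 files.
val ratio40: Double = 237.0 / 208
```

Here one file yields two calls, and a handful of such files is enough to lift the overall ratio from 1.0 to roughly 1.14.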

How was this patch tested?

GitHub Actions (GHA) CI.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.5 (Analysis and PR message generation)

@github-actions github-actions bot added the CORE works for Gluten Core label Jan 28, 2026
@baibaichen baibaichen force-pushed the feature/internal_issue branch from 1ed92bc to 3f3c1eb Compare January 29, 2026 10:10
@github-actions github-actions bot added the INFRA label Jan 30, 2026
@baibaichen baibaichen changed the title from "[VL] Add missing Gluten test suites for Spark 4.0 and 4.1" to "[Don't Merge!!!!] [VL] Add missing Gluten test suites for Spark 4.0 and 4.1" Jan 30, 2026
@baibaichen baibaichen force-pushed the feature/internal_issue branch from 268e2cd to 0cab74d Compare January 30, 2026 07:46
baibaichen and others added 4 commits January 30, 2026 23:09
… testing

This commit streamlines the CI/CD pipeline to focus on Spark 4.0 and 4.1
compatibility testing by disabling unrelated workflows and Spark 3.x jobs.

Changes:
- Disabled 9 workflow files (renamed to .disabled):
  * ARM/Enhanced backend workflows
  * Flink and ClickHouse specific workflows
  * Code analysis and maintenance workflows

- Modified velox_backend_x86.yml:
  * Commented out all Spark 3.3/3.4/3.5 test jobs (19 jobs)
  * Modified tpc-test-ubuntu to only test Spark 4.0/4.1 with Java 17/21
  * Kept only Spark 4.0/4.1 unit tests and build job

- Kept active formatting/quality checks:
  * scala_code_format.yml
  * code_format.yml
  * check_license.yml

All changes are marked with "TEMP DISABLED - for Spark 4.0/4.1 focus"
for easy rollback when full testing is needed again.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Migrated 47 streaming test suite files from Spark 4.0 (commit 4a2a4d62)
to Spark 4.1. This commit adds comprehensive test coverage for Apache
Spark's streaming package to Spark 4.1.

Changes:
- Added 47 new GlutenXXXSuite.scala files in streaming package
- Updated VeloxTestSettings.scala with 57 new enableSuite calls
- All suites extend their corresponding Spark test suites with
  GlutenTestsCommonTrait

Test suites added:
- Streaming aggregation and deduplication suites
- File stream source/sink suites
- State management suites (FlatMapGroups, TransformWithState)
- Streaming join suites (Inner, Outer, FullOuter, LeftSemi)
- Watermarking and windowing suites
- RocksDB state store suites
- Streaming query management and listener suites

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add Gluten test suites for package: org.apache.spark.sql.streaming

Generated 47 new test suite files with 55 test classes for the
org.apache.spark.sql.streaming package. All suites extend their
corresponding Spark test suites with GlutenTestsCommonTrait or
GlutenSQLTestsTrait.

Changes:
- Added 47 new GlutenXXXSuite.scala files in streaming package
- Updated VeloxTestSettings.scala with 55 new enableSuite calls
- Added import statement for all new streaming suite classes

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@baibaichen baibaichen force-pushed the feature/internal_issue branch from 0cab74d to 630e5ee Compare January 30, 2026 16:07
baibaichen and others added 15 commits February 1, 2026 14:35
Temporarily comment out the following test suites that are failing:
- GlutenEventTimeWatermarkSuite
- GlutenFileStreamSourceSuite
- GlutenFlatMapGroupsWithStateDistributionSuite
- GlutenFlatMapGroupsWithStateSuite
- GlutenRocksDBStateStoreFlatMapGroupsWithStateSuite
- GlutenRocksDBStateStoreStreamingAggregationSuite
- GlutenRocksDBStateStoreStreamingDeduplicationSuite
- GlutenStreamSuite
- GlutenStreamingAggregationDistributionSuite
- GlutenStreamingAggregationSuite
- GlutenStreamingDeduplicationDistributionSuite
- GlutenStreamingDeduplicationSuite
- GlutenStreamingInnerJoinSuite
- GlutenStreamingOuterJoinSuite
- GlutenStreamingSessionWindowDistributionSuite
- GlutenStreamingStateStoreFormatCompatibilitySuite

…ressions

Remove GlutenSubExprEvaluationRuntimeSuite due to Guava shading conflict.

Note: GlutenDefaultANSIValueSuite is excluded as DefaultANSIValueSuite doesn't exist in Spark 4.1

Add Gluten test suites for package: org.apache.spark.sql

Commented out 31 failing test suites in Spark 4.0 and 32 in Spark 4.1
to allow builds to pass while test failures are investigated.

Changes:
- Spark 4.0: Disabled 31 test suites with 157 total test failures
- Spark 4.1: Disabled 32 test suites (31 common + 2 version-specific)

Notable disabled suites:
- GlutenParquetTypeWideningSuite: 74 failures (major issue)
- GlutenWholeStageCodegenSuite: 24 failures
- Multiple HiveSupport and HadoopFsRelation suites
- Various Python, Variant, and XML test suites

All suite enablements are commented with failure counts for tracking.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@github-actions github-actions bot removed the INFRA label Feb 2, 2026
@baibaichen baibaichen changed the title from "[Don't Merge!!!!] [VL] Add missing Gluten test suites for Spark 4.0 and 4.1" to "[UT] Add missing Gluten test suites for Spark 4.0 and 4.1" Feb 2, 2026
@baibaichen baibaichen marked this pull request as ready for review February 2, 2026 07:15
@baibaichen baibaichen requested a review from philo-he February 2, 2026 08:15
Contributor

@liuneng1994 liuneng1994 left a comment

LGTM

Member

@philo-he philo-he left a comment

Looks good. Thanks.

@baibaichen baibaichen merged commit 4e3df67 into apache:main Feb 2, 2026
64 checks passed
baibaichen added a commit to baibaichen/gluten that referenced this pull request Feb 6, 2026
…ml- invalid data'

- Enable GlutenXmlExpressionsSuite in VeloxTestSettings (was TODO disabled)
- Fix mixin: GlutenTestsCommonTrait → GlutenTestsTrait. The prior PR (apache#11512)
  added GlutenXmlExpressionsSuite with GlutenTestsCommonTrait, which does not
  enable Gluten execution for the test suite.
- Exclude 'from_xml- invalid data': Gluten overrides checkEvaluation to execute
  expressions via DataFrame, which throws SparkException directly instead of
  wrapping it in TestFailedException. Same pattern as 'from_json - invalid data'.
baibaichen added a commit that referenced this pull request Feb 28, 2026
…ml- invalid data' (#11580)

- Enable GlutenXmlExpressionsSuite in VeloxTestSettings (was TODO disabled)
- Fix mixin: GlutenTestsCommonTrait → GlutenTestsTrait. The prior PR (#11512)
  added GlutenXmlExpressionsSuite with GlutenTestsCommonTrait, which does not
  enable Gluten execution for the test suite.
- Exclude 'from_xml- invalid data': Gluten overrides checkEvaluation to execute
  expressions via DataFrame, which throws SparkException directly instead of
  wrapping it in TestFailedException. Same pattern as 'from_json - invalid data'.
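The exclusion of 'from_xml- invalid data' comes down to which exception type reaches the test. The sketch below models only the behaviour described in the commit message. The two exception classes are local stand-ins named after `org.apache.spark.SparkException` and `org.scalatest.exceptions.TestFailedException`, and both `checkEvaluation` variants are hypothetical simplifications, not the real helpers.

```scala
// Local stand-ins for the real exception classes (hypothetical simplifications).
final class SparkException(message: String) extends Exception(message)
final class TestFailedException(message: String, cause: Throwable)
    extends Exception(message, cause)

// Models a helper that reports evaluation errors wrapped in TestFailedException.
def wrappingCheckEvaluation(evaluate: => Any): Any =
  try evaluate
  catch { case e: SparkException => throw new TestFailedException("evaluation failed", e) }

// Models the Gluten override as described above: the SparkException escapes unwrapped.
def directCheckEvaluation(evaluate: => Any): Any = evaluate

// Reports which exception type a call ends with.
def thrownBy(body: => Any): String =
  try { body; "none" }
  catch {
    case _: TestFailedException => "TestFailedException"
    case _: SparkException => "SparkException"
  }

// Invented failure standing in for evaluating from_xml on invalid data.
def invalidXml: Any = throw new SparkException("malformed record")

val wrappedOutcome: String = thrownBy(wrappingCheckEvaluation(invalidXml))
val directOutcome: String = thrownBy(directCheckEvaluation(invalidXml))
```

A test that asserts on TestFailedException passes against the wrapping helper but not against the direct one, which is consistent with excluding the test rather than changing the suite.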
@baibaichen baibaichen deleted the feature/internal_issue branch March 22, 2026 04:12
baibaichen added a commit to baibaichen/gluten that referenced this pull request Mar 23, 2026
…rkSessionJobTaggingAndCancellationSuite

These 2 suites were disabled in apache#11512 but actually pass with GlutenPlugin
loaded (trait was correctly changed to GlutenSQLTestsTrait/GlutenTestsTrait
in apache#11800). This is a follow-up to re-enable them after diagnosis confirmed
they pass on both spark40 and spark41.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
zhouyuan pushed a commit that referenced this pull request Mar 23, 2026
…rkSessionJobTaggingAndCancellationSuite (#11812)

These 2 suites were disabled in #11512 but actually pass with GlutenPlugin
loaded (trait was correctly changed to GlutenSQLTestsTrait/GlutenTestsTrait
in #11800). This is a follow-up to re-enable them after diagnosis confirmed
they pass on both spark40 and spark41.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>