[GLUTEN-11088][VL] Add compatibility layer for StructsToJson and StaticInvoke expressions across Spark versions #11294

Merged
baibaichen merged 5 commits into apache:main from baibaichen:feature/invoke
Dec 16, 2025

@baibaichen
Contributor

What changes were proposed in this pull request?

This PR addresses Spark 4.0 compatibility issues and refactors the ExpressionConverter to improve code quality.

Main Changes:

  1. Add compatibility layer for StructsToJson expression across Spark versions

    • In Spark 4.0, StructsToJson was replaced with an Invoke expression that uses StructsToJsonEvaluator
    • Added StructsToJsonInvoke extractor in shims for all Spark versions (3.2-4.0) to provide a unified interface
    • For Spark 4.0: Extracts evaluator options, child expression, and timeZoneId from the Invoke pattern
    • For Spark 3.2-3.5: Returns None to maintain backward compatibility
  2. Add support for StaticInvoke expressions in Spark 4.0

    • lengthOfJsonArray and jsonObjectKeys now use StaticInvoke in Spark 4.0
    • Added mappings to handle these functions properly in ExpressionConverter
  3. Refactor ExpressionConverter to reduce return usage

    • Restructured code to avoid explicit return statements, following Scala best practices
    • Improved readability by using pattern matching and Option-based control flow
    • Extracted common mappings (icebergStaticInvokeMap, staticInvokeMap) to reduce code duplication
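
The extractor idea in item 1 can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: the expression classes below are toy stand-ins for Spark's Catalyst classes, and the names Spark3Extractor, Spark4Extractor, ToJsonCall, and normalize, as well as the "evaluate" method name and the exact shape of the Invoke node, are assumptions made for the example.

```scala
object StructsToJsonShimSketch {
  // Toy stand-ins for Catalyst expressions (NOT Spark's real classes).
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Literal(value: Any) extends Expr
  // Spark 3.x style: a dedicated expression node.
  final case class StructsToJson(
      options: Map[String, String],
      child: Expr,
      timeZoneId: Option[String]) extends Expr
  // Spark 4.0 style (assumed shape): a generic Invoke node whose target
  // wraps an evaluator object that carries the options and time zone.
  final case class Invoke(target: Expr, functionName: String, arguments: Seq[Expr])
    extends Expr
  final case class StructsToJsonEvaluator(
      options: Map[String, String],
      timeZoneId: Option[String])

  // Hypothetical version-independent result used by the converter.
  final case class ToJsonCall(
      options: Map[String, String],
      child: Expr,
      timeZoneId: Option[String])

  // Unified interface that every shim implements.
  trait StructsToJsonInvokeExtractor {
    def unapply(expr: Expr): Option[(Map[String, String], Expr, Option[String])]
  }

  // Spark 4.0 shim: pull options, child, and timeZoneId out of the Invoke pattern.
  object Spark4Extractor extends StructsToJsonInvokeExtractor {
    def unapply(expr: Expr): Option[(Map[String, String], Expr, Option[String])] =
      expr match {
        case Invoke(Literal(ev: StructsToJsonEvaluator), "evaluate", Seq(child)) =>
          Some((ev.options, child, ev.timeZoneId))
        case _ => None
      }
  }

  // Spark 3.2-3.5 shim: the Invoke form does not exist, so never match.
  object Spark3Extractor extends StructsToJsonInvokeExtractor {
    def unapply(expr: Expr): Option[(Map[String, String], Expr, Option[String])] = None
  }

  // Converter code is written once and works against whichever shim is loaded.
  def normalize(expr: Expr, invokeShim: StructsToJsonInvokeExtractor): Option[ToJsonCall] =
    expr match {
      case invokeShim(options, child, tz) => Some(ToJsonCall(options, child, tz))
      case StructsToJson(options, child, tz) => Some(ToJsonCall(options, child, tz))
      case _ => None
    }
}
```

Because the 3.x shim always returns None, the first case never fires on older versions and the match falls through to the plain StructsToJson case, so the converter needs no version checks of its own.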

Why are the changes needed?

  • Spark 4.0 introduced breaking changes where StructsToJson was replaced with Invoke expressions
  • Without this compatibility layer, JSON-related functions fail in Spark 4.0
  • The return keyword is discouraged in Scala: it works against expression-oriented style, and a return inside a closure is a non-local return implemented with an exception, which can lead to unexpected behavior
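
As a rough illustration of the refactoring style described above, the sketch below rewrites an early-return lookup as a single pattern match that yields an Option. It is not the PR's code: the contents of staticInvokeMap and the target function names on the right-hand side are illustrative assumptions, and resolveWithReturn and resolve are hypothetical helper names.

```scala
object StaticInvokeLookupSketch {
  // Hypothetical mapping from a StaticInvoke function name to a native
  // function name; the entries are assumptions made for this example.
  val staticInvokeMap: Map[String, String] = Map(
    "lengthOfJsonArray" -> "json_array_length",
    "jsonObjectKeys" -> "json_object_keys"
  )

  // Before: control flow driven by explicit `return` statements.
  def resolveWithReturn(functionName: String): Option[String] = {
    if (functionName.isEmpty) {
      return None
    }
    if (staticInvokeMap.contains(functionName)) {
      return Some(staticInvokeMap(functionName))
    }
    None
  }

  // After: one expression; the Option returned by Map.get carries the
  // "not found" case, so no early exit is needed.
  def resolve(functionName: String): Option[String] = functionName match {
    case "" => None
    case name => staticInvokeMap.get(name)
  }
}
```

Both versions return the same result for every non-null input; the second simply makes the whole method one expression, which is the style the refactoring aims for.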

Does this PR introduce any user-facing change?

No. This is an internal compatibility fix.

How was this patch tested?

  • Removed Spark-version-specific test limitations in JsonFunctionsValidateSuite
  • Existing JSON function tests now pass across all Spark versions (3.2-4.0)

Related issue: #11088

@github-actions (bot) added the CORE (works for Gluten Core) and VELOX labels on Dec 14, 2025
@github-actions

Run Gluten Clickhouse CI on x86

Member

@zhouyuan zhouyuan left a comment

👍

@baibaichen baibaichen merged commit 30d1ab6 into apache:main Dec 16, 2025
60 checks passed
@baibaichen baibaichen deleted the feature/invoke branch December 16, 2025 01:29