Skip to content

Conversation

@YanivKunda
Copy link
Contributor

Which issue does this PR close?

Closes #1461

Rationale for this change

Spark 3.5.5 is the latest stable 3.5.x version, and should be supported.

What changes are included in this PR?

Just the upgrade of spark 3.5.4 to 3.5.5 and the only code change required in ShimCometScanExec.

How are these changes tested?

Ran mvn test with the spark-3.5 profile.
This was sufficient since the build failed with just the upgrade and without the required code change.

Copy link
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.
Did we really need to regenerate the diff file?

@codecov-commenter
Copy link

codecov-commenter commented Mar 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.40%. Comparing base (f09f8af) to head (45fbe75).
Report is 98 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1565      +/-   ##
============================================
+ Coverage     56.12%   58.40%   +2.27%     
- Complexity      976      985       +9     
============================================
  Files           119      122       +3     
  Lines         11743    12214     +471     
  Branches       2251     2280      +29     
============================================
+ Hits           6591     7133     +542     
+ Misses         4012     3949      -63     
+ Partials       1140     1132       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kazuyukitanimura kazuyukitanimura merged commit 646dc9b into apache:main Mar 24, 2025
68 checks passed
@YanivKunda
Copy link
Contributor Author

@parthchandra I regenerated the diff file per the instructions in https://github.com/apache/datafusion-comet/blob/main/docs/source/contributor-guide/spark-sql-tests.md -
Although they use the 3.4.3 diff file as the base (which obviously created a lot of rejects).
I used the existing diff file for the previous 3.5.x minor (3.5.4) which resulted in no rejects.
Regenerating it ensures it is tailored fit for the exact code base of the version it supports.

maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile] =
PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable, maxSplitBytes, partitionValues)
PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable, maxSplitBytes, partitionValues)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems incompatible with versions below 3.5.4 of 3.5.X. Should we try to maintain compatibility among minor versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be possible using minor-version-specific shims, but it will change the release structure -
which assumed a single release for each spark major version.
What do the maintainers think?
If not, I think the latest minor should be preferred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may be able to simply make them compatible through reflection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be certainly much easier from a release-engineering perspective (it's also not big of a development effort) -
but the question is whether this is required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried using reflection in #1573 (also adding 3.5.4 to the testing matrix) -
Let's see how it fares.

@YanivKunda YanivKunda deleted the spark-3.5.5 branch March 31, 2025 18:48
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
## Which issue does this PR close?

Closes apache#1461

## Rationale for this change

Spark 3.5.5 is the latest stable 3.5.x version, and should be supported.

## What changes are included in this PR?

Just the upgrade of spark 3.5.4 to 3.5.5 and the only code change required in `ShimCometScanExec`.

## How are these changes tested?

Ran `mvn test` with the `spark-3.5` profile.
This was sufficient since the build failed with just the upgrade and without the required code change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NoSuchMethodError: PartitionedFileUtil$.splitFiles when running with Spark 3.5.5

5 participants