Bump Spark3 version to 3.2.2 #23805

Merged
aromanenko-dev merged 9 commits into apache:master from aromanenko-dev:spark3_version
Jul 20, 2023

@aromanenko-dev
Contributor

@aromanenko-dev aromanenko-dev commented Oct 24, 2022

Bump Spark3 default version to 3.2.2

Closes #23804


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@aromanenko-dev aromanenko-dev requested a review from mosche October 24, 2022 12:08
@aromanenko-dev
Contributor Author

Run Spark ValidatesRunner

@aromanenko-dev
Contributor Author

Run Spark StructuredStreaming ValidatesRunner

@aromanenko-dev
Contributor Author

Run Spark ValidatesRunner Java 11

@aromanenko-dev aromanenko-dev removed the request for review from mosche October 24, 2022 12:13
@aromanenko-dev aromanenko-dev marked this pull request as draft October 24, 2022 12:13
Member

@mosche mosche left a comment

Also please update

Additionally #23802 should be merged first and metrics-core should be bumped to 4.2.0.

Member

Why not 3.2.2?

Member

Btw 3.2.0 / 3.2.1 contain CVEs

Member

Wow, tests are failing... looks like 3.2.0 to 3.2.1 (which is currently tested as part of sparkVersionsTest) contains a breaking change

@mosche
Member

mosche commented Oct 24, 2022

Run Spark Runner Tpcds Tests

@mosche
Member

mosche commented Oct 24, 2022

I'd also suggest documenting this change in CHANGES.md. Particularly for anyone using the job-server, this is a breaking change, as they have to upgrade their cluster version.

@codecov

codecov bot commented Oct 24, 2022

Codecov Report

Merging #23805 (b8d6681) into master (8f12469) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #23805      +/-   ##
==========================================
- Coverage   71.15%   71.15%   -0.01%     
==========================================
  Files         861      861              
  Lines      104568   104568              
==========================================
- Hits        74404    74401       -3     
- Misses      28615    28618       +3     
  Partials     1549     1549              
Flag Coverage Δ
python 80.33% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 8 files with indirect coverage changes

@aromanenko-dev
Contributor Author

Run Spotless PreCommit

@aromanenko-dev aromanenko-dev force-pushed the spark3_version branch 2 times, most recently from f59c82e to 755613b November 8, 2022 14:32
@aromanenko-dev aromanenko-dev marked this pull request as ready for review November 9, 2022 17:01
@aromanenko-dev
Contributor Author

Run Spark ValidatesRunner

@aromanenko-dev
Contributor Author

Run Spark StructuredStreaming ValidatesRunner

@github-actions
Contributor

github-actions bot commented Nov 9, 2022

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @damccorm for label build.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@aromanenko-dev
Contributor Author

Run Spark ValidatesRunner

@aromanenko-dev
Contributor Author

@mosche All tests are passing now

@mosche
Member

mosche commented Nov 10, 2022

@aromanenko-dev a few things are still left

  • Please update the Spark version mentioned in the Spark runner docs: https://github.com/apache/beam/blame/master/website/www/site/content/en/documentation/runners/spark.md#L70. This change should also be mentioned in CHANGES.md.

  • What about the Spark versions used for compatibility tests? The list currently contains 3.2.1; if we bump to 3.2.0, we should keep that. But what about 3.1.1? Should we add it there to ensure we stay compatible?
    https://github.com/apache/beam/blob/master/runners/spark/3/build.gradle#L36-L39

  • The hadoop version tests won't work reliably anymore, as both dependencies are pulled. So it's unclear what actually gets used.

    hadoopVersions.each { kv ->
      configurations."hadoopVersion$kv.key" {
        resolutionStrategy {
          force "org.apache.hadoop:hadoop-common:$kv.value"
        }
      }
    }
    
  • Similarly, VR tests are pulling hadoop-format as a dependency, and it's not clear which version of hadoop is effectively used to run the tests. We have to make sure we run tests enforcing the version used by Spark.
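
One way to see which Hadoop jar a test JVM actually picks up is to ask the JVM where a class was loaded from. The following is a minimal sketch using only the JDK. The helper `LoadedFromCheck` and its method `locationOf` are hypothetical and not part of Beam; `org.apache.hadoop.util.VersionInfo` is used only as an example class name to look up.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical helper, not part of Beam: reports which jar or directory a class was loaded from.
final class LoadedFromCheck {

  static Optional<String> locationOf(String className) {
    try {
      // initialize=false: look the class up without running its static initializers.
      Class<?> cls = Class.forName(className, false, LoadedFromCheck.class.getClassLoader());
      CodeSource source = cls.getProtectionDomain().getCodeSource();
      // JDK classes such as java.lang.String typically have no code source.
      URL location = source == null ? null : source.getLocation();
      return Optional.ofNullable(location).map(URL::toString);
    } catch (ClassNotFoundException | LinkageError e) {
      // The class is not on the classpath (or could not be linked).
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // Assumption: the class name to inspect is passed as the first argument.
    String name = args.length > 0 ? args[0] : "org.apache.hadoop.util.VersionInfo";
    System.out.println(
        name + " -> " + locationOf(name).orElse("<not on classpath or no code source>"));
  }
}
```

Run from inside a test with the same classpath, the printed location names the jar file, which usually includes the resolved version in its file name.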

@github-actions
Contributor

Reminder, please take a look at this pr: @damccorm

@damccorm
Contributor

stop reviewer notifications

@aromanenko-dev aromanenko-dev marked this pull request as ready for review July 19, 2023 13:52
@aromanenko-dev
Contributor Author

Run Python_Integration PreCommit

@aromanenko-dev
Contributor Author

Run Spark ValidatesRunner

@aromanenko-dev
Contributor Author

Run Spark StructuredStreaming ValidatesRunner

@aromanenko-dev
Contributor Author

@mosche ping on this
As we discussed privately, the hadoop-related things are not so important in this case and could be addressed outside of this change.

@mosche mosche self-requested a review July 20, 2023 06:10
@aromanenko-dev
Contributor Author

Run Spark ValidatesRunner

@aromanenko-dev
Contributor Author

Run Spark StructuredStreaming ValidatesRunner

Member

@mosche mosche left a comment

Thanks @aromanenko-dev 🚀

@mosche
Member

mosche commented Jul 20, 2023

Run Spark ValidatesRunner

@mosche
Member

mosche commented Jul 20, 2023

Run SQL PreCommit

@aromanenko-dev aromanenko-dev merged commit 0cd9b90 into apache:master Jul 20, 2023
@aromanenko-dev aromanenko-dev deleted the spark3_version branch July 20, 2023 12:06
cushon pushed a commit to cushon/beam that referenced this pull request May 24, 2024
* Bump Spark3 version to 3.2.0

* [23804] Add Spark 3.2.0 constructors in EncoderFactory

* Update Hadoop deps

* Bump Spark3 version to 3.2.2

* Add Spark 3.1.1 for compatibility testing

* Update CHANGES.md on Spark version bump

* Fix whitespace check

* Add Spark 3.1.2 for compatibility testing

* Address review comments
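
The commit "Add Spark 3.2.0 constructors in EncoderFactory" suggests that some constructor signatures differ between Spark versions. The thread does not show how EncoderFactory handles this. As a rough sketch of one possible approach, code can inspect the constructor available at runtime and pick arguments by parameter count. All class names below (`VersionedConstructorDemo`, `OldExpr`, `NewExpr`) are hypothetical stand-ins and are not Spark or Beam classes.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch, not Beam's EncoderFactory: builds an object through whichever
// constructor signature the library version on the classpath provides.
final class VersionedConstructorDemo {

  // Stand-in for a class as shipped by an older library release.
  public static final class OldExpr {
    final String name;
    public OldExpr(String name) { this.name = name; }
    @Override public String toString() { return "OldExpr(" + name + ")"; }
  }

  // Stand-in for the same class after a release added a constructor parameter.
  public static final class NewExpr {
    final String name;
    final boolean nullable;
    public NewExpr(String name, boolean nullable) {
      this.name = name;
      this.nullable = nullable;
    }
    @Override public String toString() {
      return "NewExpr(" + name + ", nullable=" + nullable + ")";
    }
  }

  static Object create(Class<?> cls, String name) {
    // Each stand-in class has exactly one public constructor.
    Constructor<?> ctor = cls.getConstructors()[0];
    try {
      switch (ctor.getParameterCount()) {
        case 1:
          return ctor.newInstance(name);
        case 2:
          // Supply a default for the parameter added in the newer signature.
          return ctor.newInstance(name, true);
        default:
          throw new IllegalStateException("Unsupported constructor: " + ctor);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not instantiate " + cls.getName(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(create(OldExpr.class, "a"));
    System.out.println(create(NewExpr.class, "b"));
  }
}
```

Selecting by parameter count keeps a single code path compiling against several library versions, at the cost of failing only at runtime when an unexpected signature appears, which is why compatibility tests across versions matter.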
Development

Successfully merging this pull request may close these issues.

[Task]: Bump Spark3 version to 3.2.2

3 participants