Spark Runner : Replace queueStream with custom DStream in Spark streaming Flatten transform#34080
Conversation
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
assign set of reviewers |
|
Assigning reviewers. If you would like to opt out of this review, comment R: @damccorm for label build. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
Run Java_Spark3_Versions PreCommit |
|
Run Java PreCommit |
|
Run Spotless PreCommit |
|
Run Spark ValidatesRunner |
f0d1816 to
24118f7
Compare
|
The failing tests are also failing in the latest master branch, unrelated to this PR's changes. |
|
Thanks, spark runner tests are currently broken (by #33574) and we are working on Fix. |
|
The tests are fixed on HEAD. Would you mind rebasing your PR onto the latest master branch? Thanks! |
runners/spark/src/main/java/org/apache/beam/runners/spark/TestSparkPipelineOptions.java
Show resolved
Hide resolved
@Abacn Thank you. The test runs smoothly! |
…ming Flatten transform (apache#34080) * feat : add delete checkpoint dir option in TestSparkPipelineOptions * feat : implementation SingleEmitInputDStream * chore : spotlessApply * feat : Replace queueStream with SingleEmitInputDStream for Spark streaming * chore : remove logging in SingleEmitInputDStream * Touch trigger files * test : fix flaky StreamingTransformTranslatorTest
Please add a meaningful description for your change here
fixes #18144
fixes #20426
This PR replaces Spark's queueStream with a custom SingleEmitInputDStream implementation
for the Flatten transform in Spark streaming. The queueStream was not checkpoint-able,
which caused issues when recovering pipelines with flattened bounded and unbounded PCollections.
The custom SingleEmitInputDStream ensures:
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.