[#32004] Ensure all pcollection coders are length prefixed if necessary. #32012
Merged
damondouglas merged 3 commits into apache:master on Jul 30, 2024
Contributor
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment
Contributor
Author
Definitely weird. Please review, but let's give it a day to get clean reruns, I think.
Contributor
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment
Contributor
Author
Friendly ping.
damondouglas approved these changes on Jul 30, 2024
reeba212 pushed a commit to reeba212/beam that referenced this pull request on Dec 4, 2024:
…ecessary. (apache#32012)
* [apache#32004] Ensure input collection is wrapped. Send precise PCollections.
* error out if there's an issue rewriting coders.
* Unwrap length prefix coders in element hasher.
Co-authored-by: lostluck <13907733+lostluck@users.noreply.github.com>
Prism wasn't length-prefix wrapping unknown coders for bundle-internal PCollections or for the parallel input PCollection. This prevented Java and Python SplittableDoFns from correctly encoding their sub-element split results, since they pull their coder from the input PCollection.
Go is arguably incorrect in pulling its coder from the DataSource transform itself, but that explains why Prism developed this bug.
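To make the wrapping idea concrete, here is a minimal sketch of length-prefix wrapping over a coder table. It is not Prism's actual code: the `Coder` struct, the `known` set, the `lpWrap` function, and the `_lp` ID suffix are all hypothetical simplifications, and only the Beam coder URNs are real.

```go
package main

import "fmt"

// Coder is a hypothetical, simplified stand-in for a pipeline coder:
// a URN plus the IDs of its component coders.
type Coder struct {
	URN        string
	Components []string
}

const (
	urnLengthPrefix = "beam:coder:length_prefix:v1"
	urnBytes        = "beam:coder:bytes:v1"
	urnKV           = "beam:coder:kv:v1"
	urnVarInt       = "beam:coder:varint:v1"
)

// known lists the coders this sketch's runner can decode on its own.
// The exact set is an assumption made for illustration.
var known = map[string]bool{
	urnLengthPrefix: true,
	urnBytes:        true,
	urnKV:           true,
	urnVarInt:       true,
}

// lpWrap returns the ID of a coder the runner can handle. Unknown coders
// are wrapped in a length-prefix coder; known coders are rebuilt only if
// one of their components had to be wrapped. New coders are added to coders.
func lpWrap(id string, coders map[string]Coder) (string, error) {
	c, ok := coders[id]
	if !ok {
		return "", fmt.Errorf("coder %q not found", id)
	}
	if c.URN == urnLengthPrefix {
		return id, nil // Already length prefixed; nothing to do.
	}
	if !known[c.URN] {
		lpID := id + "_lp"
		coders[lpID] = Coder{URN: urnLengthPrefix, Components: []string{id}}
		return lpID, nil
	}
	changed := false
	comps := make([]string, len(c.Components))
	for i, comp := range c.Components {
		newComp, err := lpWrap(comp, coders)
		if err != nil {
			return "", err // Error out rather than send a broken coder.
		}
		comps[i] = newComp
		changed = changed || newComp != comp
	}
	if !changed {
		return id, nil
	}
	newID := id + "_lp"
	coders[newID] = Coder{URN: c.URN, Components: comps}
	return newID, nil
}
```

Under these assumptions, calling `lpWrap` on a KV coder whose value coder is unknown registers a wrapped value coder and a rebuilt KV coder, and returns the new KV coder's ID. A PCollection would then be updated to point at that ID.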
This change ensures all the internal PCollections are wrapped and updated with the correct coder. Further, Prism now sends only the PCollections needed or used in the bundle, instead of the entire set of pipeline PCollections.
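As a rough illustration of sending only the PCollections a bundle uses, the sketch below collects the PCollection IDs referenced by a bundle's transforms. The `Transform` struct and the `bundlePCollections` function are hypothetical names for this example and do not reflect Prism's real types.

```go
package main

import "sort"

// Transform is a hypothetical stand-in for a pipeline transform. Inputs and
// Outputs map local names to PCollection IDs.
type Transform struct {
	Inputs  map[string]string
	Outputs map[string]string
}

// bundlePCollections returns the sorted IDs of the PCollections referenced
// by the transforms in the bundle, and nothing else from the pipeline.
func bundlePCollections(bundle []string, transforms map[string]Transform) []string {
	seen := map[string]bool{}
	for _, tid := range bundle {
		t := transforms[tid]
		for _, pc := range t.Inputs {
			seen[pc] = true
		}
		for _, pc := range t.Outputs {
			seen[pc] = true
		}
	}
	ids := make([]string, 0, len(seen))
	for pc := range seen {
		ids = append(ids, pc)
	}
	sort.Strings(ids) // Deterministic order for easier comparison.
	return ids
}
```

Only the PCollections in the returned set, along with their (possibly rewritten) coders, would go into the bundle's descriptor in this sketch.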
Fixes #32004.