Fix RowCoderGenerator to use the encodingPositions when encoding and decoding the bit set representing null fields. #32389
R: @reuvenlax
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment
Also added internal Dataflow tests verifying that this fixes updates with reordered schemas that have null fields.
@Abacn @reuvenlax friendly ping |
Abacn left a comment
Thanks, left a few questions.
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java
GENERATED_CODERS is already a synchronized map, so it usually does not need to be wrapped in a synchronized block. I see "setOverridesLock" is used in other places, which is probably the reason. If so, consider adding a comment to note this?
Coders can be read without the synchronized block, but they are written under the extra synchronization because ConcurrentHashMap's computeIfAbsent is not reentrant (unlike synchronized) and the get/insert pattern is possibly racy.
That said, since the read coders are cached, I think I will just change to regular maps under synchronization and ditch the ConcurrentHashMap.
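For readers following the thread, here is a minimal sketch of the "regular map under synchronization" pattern described above. It is an illustration only, not the code in RowCoderGenerator: the class name, `LOCK`, `CACHE`, and `getOrCreate` are all hypothetical, and strings stand in for generated coders. The point it shows is that a `synchronized` block is reentrant, so a factory that looks up another entry on the same thread does not deadlock or throw.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; names are not taken from RowCoderGenerator.
final class SynchronizedCacheSketch {
  private static final Object LOCK = new Object();
  // A plain HashMap is sufficient because every access happens under LOCK.
  private static final Map<String, String> CACHE = new HashMap<>();

  static String getOrCreate(String key, Function<String, String> factory) {
    synchronized (LOCK) {
      String existing = CACHE.get(key);
      if (existing == null) {
        // The factory may call getOrCreate again on this thread.
        // That is safe here because intrinsic locks are reentrant.
        existing = factory.apply(key);
        CACHE.put(key, existing);
      }
      return existing;
    }
  }

  public static void main(String[] args) {
    // The outer factory re-enters the cache to build a nested entry.
    String outer =
        getOrCreate("outer", k -> "outer(" + getOrCreate("inner", k2 -> "inner") + ")");
    System.out.println(outer); // prints outer(inner)
  }
}
```

By contrast, the `ConcurrentHashMap.computeIfAbsent` documentation says the mapping function must not attempt to update other mappings of the same map, which is why a nested lookup like the one above is a problem there.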
Was this a bug? rowIndex and encodingPos look different.
Yes, this, along with the other null-fields fix above, is the purpose of this PR: to fix #32388.
The stack trace and synchronization changes were added because the initial belief was that the encoding corruption was due to late overrides arriving. Since that could still be an issue, I think we should keep those changes, but I can split them into a separate PR if you'd prefer.
Note that unless there are encoding overrides, rowIndex and encodingPos are equal. The improved unit tests catch the issue; the previous tests with encoding overrides didn't have null fields and thus missed it.
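To make the rowIndex versus encodingPos distinction concrete, here is a small sketch of how the null-field bit set differs depending on which index is used. It assumes, per the PR title, that the bit for a null field should be set at that field's encoding position. The class and method names are hypothetical and plain arrays stand in for a Row and its schema overrides; this is not the generated coder code.

```java
import java.util.BitSet;

// Hypothetical sketch; not the code generated by RowCoderGenerator.
final class NullBitSetSketch {
  // Sets the null bit at each null field's encoding position.
  static BitSet nullBitsByEncodingPos(Object[] fieldValues, int[] encodingPositions) {
    BitSet nullFields = new BitSet(fieldValues.length);
    for (int rowIndex = 0; rowIndex < fieldValues.length; rowIndex++) {
      if (fieldValues[rowIndex] == null) {
        nullFields.set(encodingPositions[rowIndex]);
      }
    }
    return nullFields;
  }

  // For comparison: sets the null bit at the field's schema index.
  // This agrees with the method above only when there are no overrides.
  static BitSet nullBitsByRowIndex(Object[] fieldValues) {
    BitSet nullFields = new BitSet(fieldValues.length);
    for (int rowIndex = 0; rowIndex < fieldValues.length; rowIndex++) {
      if (fieldValues[rowIndex] == null) {
        nullFields.set(rowIndex);
      }
    }
    return nullFields;
  }

  public static void main(String[] args) {
    Object[] row = {"a", null, 42};
    // Assumed override: field 0 is encoded third, field 1 first, field 2 second.
    int[] encodingPositions = {2, 0, 1};
    System.out.println(nullBitsByEncodingPos(row, encodingPositions)); // prints {0}
    System.out.println(nullBitsByRowIndex(row)); // prints {1}
  }
}
```

With identity positions `{0, 1, 2}` both methods return the same bit set, which is consistent with the note that tests without both overrides and null fields could not catch the mismatch.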
…decoding the bit set representing null fields.
70ab14b to a52564a
…decoding the bit set representing null fields. (apache#32389)
Add tests that fail without the change, covering encoding and decoding.
Also add tests that cover the static position overrides, which were not tested previously.
Some other cleanup to help debug other possible encoding positions issues in the future:
fixes #32388