Change Reify.asIterable to GBK in BigQueryIO File loads#33587
Abacn merged 4 commits into apache:master
  PCollectionTuple partitions =
      results
-         .apply("ReifyResults", new ReifyAsIterable<>())
+         .apply("ReifyResults", new CombineAsIterable<>())
This won't have any output for an empty PCollection (whereas ReifyAsIterable does).
Yes, that's right, and it fails a unit test (changed below).
Honestly, I don't fully understand the implications of this behavior change. One thing I wonder about: previously, a pipeline with no input would still create an empty table, whereas now it does not.
Could this be a problem?
I think this could be a problem. (I'm also concerned because the underlying issue is still there...)
The customer is waiting for this workaround. @Abacn, is it possible to add an option to enable this workaround?
Yes, we can add a pipeline option if we choose not to make it the default. The customer could then still hit the issue until they enable this option.
It's now guarded by a pipeline option "--groupFilesFileLoad". PTAL, thanks!
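To make the behavior difference discussed in this thread concrete, here is a minimal plain-Java model. It is not Beam code and not the code in this PR: the class and method names (`EmptyInputSketch`, `reifyViaSideInput`, `reifyViaGrouping`, `reify`) are hypothetical, and lists stand in for PCollections. It assumes, as described above, that the side-input path always emits one iterable (possibly empty), that the grouping path emits nothing when no element arrives, and that the new path is only taken when the flag is enabled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the two reification strategies; lists stand in for PCollections. */
final class EmptyInputSketch {

  /** Models the side-input approach: a single seed element always emits the whole view. */
  static <T> List<List<T>> reifyViaSideInput(List<T> input) {
    List<List<T>> out = new ArrayList<>();
    out.add(new ArrayList<>(input)); // emitted even when the input is empty
    return out;
  }

  /** Models the grouping approach: a group only exists if at least one element arrived. */
  static <T> List<List<T>> reifyViaGrouping(List<T> input) {
    List<List<T>> out = new ArrayList<>();
    if (!input.isEmpty()) {
      out.add(new ArrayList<>(input));
    }
    return out;
  }

  /** Hypothetical guard: the grouping path is used only when explicitly enabled. */
  static <T> List<List<T>> reify(List<T> input, boolean groupFilesFileLoad) {
    return groupFilesFileLoad ? reifyViaGrouping(input) : reifyViaSideInput(input);
  }

  public static void main(String[] args) {
    List<String> empty = new ArrayList<>();
    List<String> files = Arrays.asList("file-1", "file-2");

    // Empty input: the default path yields one (empty) iterable, the opt-in path yields none.
    System.out.println(reify(empty, false).size()); // 1
    System.out.println(reify(empty, true).size());  // 0

    // Non-empty input: both paths yield a single iterable with all elements.
    System.out.println(reify(files, false)); // [[file-1, file-2]]
    System.out.println(reify(files, true));  // [[file-1, file-2]]
  }
}
```

In this model, any downstream step that relies on receiving exactly one iterable (for example, to create an empty table) would not run on empty input when the flag is enabled, which matches the concern raised above.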
Assigning reviewers: R: @robertwb for label java.
The PR bot will only process comments in the main thread (not review comments).
liferoad left a comment
Can we add this to CHANGES.md and mention the user should use this option if they experience any issues with BatchLoads?
* Change Reify.asIterable to GBK in BigQueryIO File loads
* trigger postcommit
* the change is only effective when explicitly enabled
* update CHANGES.md
Work around an unknown side-input bug reported internally.
internal ref: 373833916