[Python BQ] Allow setting a fixed number of Storage API streams#28592
[Python BQ] Allow setting a fixed number of Storage API streams#28592ahmedabu98 merged 5 commits intoapache:masterfrom
Conversation
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
|
Run Python_Xlang_Gcp_Direct PostCommit |
|
Run Python_Xlang_Gcp_Dataflow PostCommit |
Codecov Report
@@ Coverage Diff @@
## master #28592 +/- ##
==========================================
- Coverage 72.23% 72.20% -0.03%
==========================================
Files 684 684
Lines 100991 101132 +141
==========================================
+ Hits 72949 73021 +72
- Misses 26463 26532 +69
Partials 1579 1579
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 11 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
| all of FILE_LOADS, STREAMING_INSERTS, and STORAGE_WRITE_API. Only | ||
| applicable to unbounded input. | ||
| num_storage_api_streams: If set, the Storage API sink will default to | ||
| using this number of write streams. Only applicable to unbounded data. |
There was a problem hiding this comment.
num_storage_api_streams: specifies the number of write streams that the Storage API sink will use. This parameter is only applicable to unbounded data.
There was a problem hiding this comment.
Shall we check this should be always set now for the unbounded data since it won't work otherwise.
There was a problem hiding this comment.
Shall we check this should be always set now for the unbounded data since it won't work otherwise.
streaming writes with at-least-once still works without setting this parameter
There was a problem hiding this comment.
changed the documentation, thanks for the suggestion!
There was a problem hiding this comment.
I see. Shall we add something like "This parameter must be set for Storage API writes with the exactly once method."?
There was a problem hiding this comment.
I hesitate on doing this because conventionally we don't do runner-based checks in the SDK.
There was a problem hiding this comment.
I clarified it in the public BigQuery connector doc (https://beam.apache.org/documentation/io/built-in/google-bigquery/)
|
Just had minor comments. LGTM from the Python side. Thanks. |
| streaming=True, | ||
| allow_unsafe_triggers=True) | ||
|
|
||
| auto_sharding = num_streams == 0 |
There was a problem hiding this comment.
Took me some time to parse it a bit, nit and might be common practice, but auto_sharding = (num_streams == 0) looks better
…age API streams (#28618) * expose num streams option; fix some streaming tests * add test phrase in description * lint fix
…he#28592) * expose num streams option; fix some streaming tests * clarify docs; address nit
…ms (apache#28592)" (apache#28613) This reverts commit 04a26da.
Allow users to set a number of streams for the Storage Write API sink.
Fixes #28587
Provides a workaround for issue in #28168
Update: had to revert this (#28613) and fix some linting errors. PR after fixes is in #28618