[Feature Request]: Enable withFormatRecordOnFailureFunction() equivalent for BigQuery STORAGE_WRITE_API #31354
Description
What would you like to happen?
We have a Dataflow pipeline that reads data from Pub/Sub and writes to BigQuery.
Current status:
BigQuery write method: STREAMING_INSERTS
Function used to get BigQuery deadletter: getFailedInsertsWithErr()
Deadletter format function: withFormatRecordOnFailureFunction()
The pipeline writes multiple event types dynamically to different BigQuery destination tables. We currently use withFormatRecordOnFailureFunction() to transform failed inserts into the format needed for further deadletter processing. Rather than returning the original TableRow, we supply a custom function that adds an eventType field to the returned TableRow, so we can tell which table the row was destined for.
While migrating the pipeline to STORAGE_WRITE_API, we ran into the following issue.
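To make the current behavior concrete, here is a minimal sketch of the kind of formatting function described above. It uses a plain `Map<String, Object>` as a stand-in for `TableRow` so it runs without Beam on the classpath. The `Event` class, its fields, and the `FORMAT_ON_FAILURE` name are hypothetical and only illustrate the idea; in the real pipeline the function would be passed to withFormatRecordOnFailureFunction().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class FailureFormatSketch {
    // Hypothetical event type; not part of Beam or of the actual pipeline.
    static final class Event {
        final String eventType;
        final Map<String, Object> fields;

        Event(String eventType, Map<String, Object> fields) {
            this.eventType = eventType;
            this.fields = fields;
        }
    }

    // Stand-in for the failure-format function: copies the event's fields
    // into a new row and adds an "eventType" column for deadletter routing.
    static final Function<Event, Map<String, Object>> FORMAT_ON_FAILURE = event -> {
        Map<String, Object> row = new LinkedHashMap<>(event.fields);
        row.put("eventType", event.eventType);
        return row;
    };

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("userId", "u1");
        Event click = new Event("click", fields);
        System.out.println(FORMAT_ON_FAILURE.apply(click));
    }
}
```

The original fields are copied rather than mutated, so the element that was attempted for insert stays unchanged.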
BigQuery write method: STORAGE_WRITE_API
Function used to get BigQuery deadletter: getFailedStorageApiInserts() (as getFailedInsertsWithErr() cannot be used for STORAGE_WRITE_API)
Deadletter format function: N/A
Without an equivalent of withFormatRecordOnFailureFunction(), we cannot format the TableRows of failed inserts, which blocks our upgrade.
How this might work:
Add a withFormatRecordOnFailureFunction() equivalent for the BigQuery STORAGE_WRITE_API method.
Because we write to BigQuery tables dynamically, the PCollection may contain multiple eventTypes. If no failure-formatting function can be applied to the failed inserts, the eventType information is lost.
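The sketch below illustrates why the eventType matters on the deadletter side. It again uses plain maps in place of `TableRow`, and the `groupByEventType` helper and the `"unknown"` bucket are hypothetical names chosen for illustration, not Beam APIs. Rows that carry an eventType can be routed per table; rows without one cannot be told apart.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class DeadLetterRoutingSketch {
    // Groups failed rows by their "eventType" field. Rows that lack the
    // field (as they would without a failure-format hook) fall into "unknown".
    static Map<String, List<Map<String, Object>>> groupByEventType(
            List<Map<String, Object>> failedRows) {
        return failedRows.stream().collect(Collectors.groupingBy(
                row -> String.valueOf(row.getOrDefault("eventType", "unknown")),
                TreeMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> failed = List.of(
                Map.<String, Object>of("eventType", "click", "userId", "u1"),
                Map.<String, Object>of("eventType", "purchase", "userId", "u2"),
                Map.<String, Object>of("userId", "u3")); // eventType lost
        groupByEventType(failed).forEach((type, rows) ->
                System.out.println(type + ": " + rows.size() + " row(s)"));
    }
}
```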
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner