WriteToBigQuery ignores insert_retry_strategy on HttpErrors #21080

@damccorm

Description

On 2.31.0, a streaming pipeline retries insertAll forever, even with insert_retry_strategy=RetryStrategy.RETRY_NEVER and create_disposition=BigQueryDisposition.CREATE_NEVER.

I found this while testing a pipeline's error handling by writing to a table that doesn't exist. No elements reached BigQueryWriteFn.FAILED_ROWS, and the following error repeated in the logs:


Error message from worker: generic::unknown: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1257, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
  File "apache_beam/runners/common.py", line 510, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "apache_beam/runners/common.py", line 516, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1268, in finish_bundle
    return self._flush_all_batches()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1278, in _flush_all_batches
    for destination in list(self._rows_buffer.keys())
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1279, in <listcomp>
    if self._rows_buffer[destination]
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1312, in _flush_batch
    skip_invalid_rows=True)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1125, in insert_rows
    project_id, dataset_id, table_id, final_rows, skip_invalid_rows)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 637, in _insert_all_rows
    response = self.client.tabledata.InsertAll(request)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 795, in InsertAll
    config, request, global_params=global_params)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://bigquery.googleapis.com/bigquery/v2/projects/<REDACTED>/datasets/testdb__dbo__raw/tables/customers/insertAll?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Sat, 21 Aug 2021 10:00:13 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'transfer-encoding': 'chunked', 'status': '404', 'content-length': '344', '-content-encoding': 'gzip'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
    "errors": [
      {
        "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
        "domain": "global",
        "reason": "notFound"
      }
    ],
    "status": "NOT_FOUND"
  }
}
...

Possibly related to BEAM-12362. I had previously been running on 2.29.0, which repeatedly logged the following message with no error details:


There were errors inserting to BigQuery. Will not retry. Errors were []

2.31.0 logs the errors but ignores the retry strategy, so the failed rows never reach the FailedRows tag and cannot be handled there.
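
To make the reported difference concrete, here is a minimal, self-contained simulation. It does not use Beam or apitools: `RetryStrategy`, `HttpNotFoundError`, `insert_all`, the two `flush_batch_*` functions, and `run_bundle` are all hypothetical stand-ins written for this illustration. It assumes, based on the traceback above, that the HTTP error escapes the flush step as an exception rather than coming back as a per-row error, and that the streaming runner then re-runs the failed bundle.

```python
from enum import Enum


class RetryStrategy(Enum):
    """Stand-in for Beam's RetryStrategy; not the real class."""
    RETRY_ALWAYS = "RETRY_ALWAYS"
    RETRY_NEVER = "RETRY_NEVER"


class HttpNotFoundError(Exception):
    """Stand-in for the apitools HTTP 404 exception seen in the traceback."""


def insert_all(rows, table_exists):
    """Hypothetical client call: raises if the table is missing, otherwise
    returns a list of per-row errors (always empty in this sketch)."""
    if not table_exists:
        raise HttpNotFoundError("Not found: Table")
    return []


def flush_batch_observed(rows, strategy, table_exists):
    """Models the reported behaviour: the strategy is consulted only for
    per-row errors, so an exception from the request escapes the bundle."""
    errors = insert_all(rows, table_exists)  # may raise before strategy is read
    if errors and strategy is RetryStrategy.RETRY_NEVER:
        return list(rows)
    return []


def flush_batch_expected(rows, strategy, table_exists):
    """Models the behaviour the report asks for: with RETRY_NEVER, a failed
    request routes the whole batch to the failed-rows output."""
    try:
        errors = insert_all(rows, table_exists)
    except HttpNotFoundError:
        if strategy is RetryStrategy.RETRY_NEVER:
            return list(rows)
        raise
    if errors and strategy is RetryStrategy.RETRY_NEVER:
        return list(rows)
    return []


def run_bundle(flush, rows, strategy, table_exists, max_attempts=3):
    """Models a runner re-running a failed bundle. The report describes
    retries that never stop; max_attempts only keeps this demo finite."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            return attempts, flush(rows, strategy, table_exists)
        except HttpNotFoundError:
            continue  # bundle failed; the runner tries again
    return attempts, None  # bundle never succeeded, no failed rows emitted
```

In this sketch, `run_bundle(flush_batch_observed, rows, RetryStrategy.RETRY_NEVER, table_exists=False)` uses up every attempt and emits no failed rows, which matches the symptom described above. The `flush_batch_expected` variant finishes on the first attempt and returns the batch as failed rows.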

Imported from Jira BEAM-12783. Original Jira may contain additional context.
Reported by: ajdub980a.
