BigQueryCreateEmptyTableOperator exists_ok parameter doesn't throw appropriate error when set to "False" #29301

Description

@mfjackson

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

I'm using apache-airflow-providers-google==8.2.0, but the relevant code that causes this behavior is still in use as of 8.8.0.

Apache Airflow version

2.3.2

Operating System

Debian (from Docker image apache/airflow:2.3.2-python3.10)

Deployment

Official Apache Airflow Helm Chart

Deployment details

Deployed on an EKS cluster via Helm.

What happened

The first task in one of my DAGs is to create an empty BigQuery table using the BigQueryCreateEmptyTableOperator as follows:

from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryCreateEmptyTableOperator,
)

create_staging_table = BigQueryCreateEmptyTableOperator(
    task_id="create_staging_table",
    dataset_id="my_dataset",
    table_id="tmp_table",
    schema_fields=[
        {"name": "field_1", "type": "TIMESTAMP", "mode": "NULLABLE"},
        {"name": "field_2", "type": "INTEGER", "mode": "NULLABLE"},
        {"name": "field_3", "type": "INTEGER", "mode": "NULLABLE"},
    ],
    exists_ok=False,
)

Note that exists_ok=False is set explicitly here, though it is also the default value.

This task exits with a SUCCESS status even when my_dataset.tmp_table already exists in the target BigQuery project. The task produces the following logs:

[2023-02-02, 05:52:29 UTC] {bigquery.py:875} INFO - Creating table
[2023-02-02, 05:52:29 UTC] {bigquery.py:901} INFO - Table my_dataset.tmp_table already exists.
[2023-02-02, 05:52:30 UTC] {taskinstance.py:1395} INFO - Marking task as SUCCESS. dag_id=my_fake_dag, task_id=create_staging_table, execution_date=20230202T044000, start_date=20230202T055229, end_date=20230202T055230
[2023-02-02, 05:52:30 UTC] {local_task_job.py:156} INFO - Task exited with return code 0

What you think should happen instead

Setting exists_ok=False should raise an exception and exit the task with a FAILED status if the table being created already exists in BigQuery.
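
For illustration, here is a rough sketch of a manual pre-check that produces the failure I'd expect today (BigQueryHook.table_exists is a real hook method in the google provider; the project id, helper name, and wiring are placeholders of mine):

from airflow.exceptions import AirflowException
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook


def fail_if_table_exists(dataset_id: str, table_id: str) -> None:
    # Raise so the task is marked FAILED, mirroring what I'd expect
    # exists_ok=False to do on its own.
    hook = BigQueryHook()
    if hook.table_exists(
        project_id="my-gcp-project",  # placeholder project id
        dataset_id=dataset_id,
        table_id=table_id,
    ):
        raise AirflowException(f"Table {dataset_id}.{table_id} already exists.")

This could run in a PythonOperator upstream of create_staging_table until the operator itself raises.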

How to reproduce

  1. Deploy Airflow 2.3.2 running Python 3.10 in some capacity.
  2. Ensure apache-airflow-providers-google==8.2.0 (or 8.8.0; I don't believe the issue has been fixed there either) is installed on the deployment.
  3. Set up a GCP project and create a BigQuery dataset.
  4. Create an empty BigQuery table with a schema.
  5. Create a DAG that uses the BigQueryCreateEmptyTableOperator with exists_ok=False to create a table with the same dataset and table IDs as the one from Step 4 (a minimal sketch follows this list).
  6. Run the DAG from Step 5 on the Airflow instance deployed in Step 1.
  7. Observe that the task is marked SUCCESS even though the table already exists.
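
To make Step 5 concrete, here is a minimal sketch of such a DAG (the DAG id, start date, and dataset/table names are placeholders, and the schema is trimmed to one field):

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryCreateEmptyTableOperator,
)

with DAG(
    dag_id="bq_exists_ok_repro",  # placeholder DAG id
    start_date=datetime(2023, 2, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # With exists_ok=False this should fail when the table from Step 4
    # already exists, but it is currently marked SUCCESS instead.
    create_staging_table = BigQueryCreateEmptyTableOperator(
        task_id="create_staging_table",
        dataset_id="my_dataset",
        table_id="tmp_table",
        schema_fields=[{"name": "field_1", "type": "TIMESTAMP", "mode": "NULLABLE"}],
        exists_ok=False,
    )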

Anything else

I believe the silent failure is occurring here: the except statement logs a message but doesn't actually raise an exception or change any state that would cause the task to fail.

If this is in fact the case, I'd be happy to submit a PR, but would appreciate any input on the error-handling standards/conventions that this provider package maintains.
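
For concreteness, here is a rough sketch of the shape of fix I have in mind, inside the operator's execute method (this paraphrases the 8.x provider source from memory; bq_hook, the create_empty_table arguments, and the exception message are approximations, not verbatim provider code):

from google.api_core.exceptions import Conflict

from airflow.exceptions import AirflowException

try:
    self.log.info("Creating table")
    table = bq_hook.create_empty_table(
        dataset_id=self.dataset_id,
        table_id=self.table_id,
        schema_fields=self.schema_fields,
        exists_ok=self.exists_ok,
    )
except Conflict:
    if not self.exists_ok:
        # Proposed change: surface the conflict instead of only logging it.
        raise AirflowException(
            f"Table {self.dataset_id}.{self.table_id} already exists."
        )
    self.log.info("Table %s.%s already exists.", self.dataset_id, self.table_id)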

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
