Apache Airflow Provider(s)
google
Versions of Apache Airflow Providers
I'm using apache-airflow-providers-google==8.2.0, but the code responsible for this behavior still appears to be in place as of 8.8.0.
Apache Airflow version
2.3.2
Operating System
Debian (from Docker image apache/airflow:2.3.2-python3.10)
Deployment
Official Apache Airflow Helm Chart
Deployment details
Deployed on an EKS cluster via Helm.
What happened
The first task in one of my DAGs is to create an empty BigQuery table using the BigQueryCreateEmptyTableOperator as follows:
```python
create_staging_table = BigQueryCreateEmptyTableOperator(
    task_id="create_staging_table",
    dataset_id="my_dataset",
    table_id="tmp_table",
    schema_fields=[
        {"name": "field_1", "type": "TIMESTAMP", "mode": "NULLABLE"},
        {"name": "field_2", "type": "INTEGER", "mode": "NULLABLE"},
        {"name": "field_3", "type": "INTEGER", "mode": "NULLABLE"},
    ],
    exists_ok=False,
)
```
Note that exists_ok=False is set explicitly here, but it is also the default value.
This task exits with a SUCCESS status even when my_dataset.tmp_table already exists in the target BigQuery project. The task produces the following logs:
```
[2023-02-02, 05:52:29 UTC] {bigquery.py:875} INFO - Creating table
[2023-02-02, 05:52:29 UTC] {bigquery.py:901} INFO - Table my_dataset.tmp_table already exists.
[2023-02-02, 05:52:30 UTC] {taskinstance.py:1395} INFO - Marking task as SUCCESS. dag_id=my_fake_dag, task_id=create_staging_table, execution_date=20230202T044000, start_date=20230202T055229, end_date=20230202T055230
[2023-02-02, 05:52:30 UTC] {local_task_job.py:156} INFO - Task exited with return code 0
```
What you think should happen instead
With exists_ok=False, the operator should raise an exception and the task should end with a FAILED status if the table being created already exists in BigQuery.
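For comparison, the underlying google-cloud-bigquery client does raise Conflict on a duplicate table when exists_ok=False, and that is the failure I would expect the operator to surface rather than swallow. A minimal sketch of the client-level behavior (project, dataset, and table names are placeholders; it assumes valid GCP credentials and that the table already exists):

```python
from google.api_core.exceptions import Conflict
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.my_dataset.tmp_table",  # placeholder fully-qualified table ID
    schema=[bigquery.SchemaField("field_1", "TIMESTAMP", mode="NULLABLE")],
)

try:
    # With exists_ok=False (also the client's default), creating a table that
    # already exists raises Conflict instead of returning successfully.
    client.create_table(table, exists_ok=False)
except Conflict as exc:
    print(f"Duplicate table: {exc}")  # the error I'd expect the task to propagate
```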
How to reproduce
1. Deploy Airflow 2.3.2 running Python 3.10 in some capacity.
2. Ensure apache-airflow-providers-google==8.2.0 (or 8.8.0, as I don't believe the issue has been fixed) is installed on the deployment.
3. Set up a GCP project and create a BigQuery dataset.
4. Create an empty BigQuery table with a schema.
5. Create a DAG that uses the BigQueryCreateEmptyTableOperator to create a table with the same dataset and table IDs as the table from Step 4 (a minimal example is sketched below this list).
6. Run the DAG from Step 5 on the Airflow instance deployed in Step 1.
7. Observe that the task is marked SUCCESS even though the table already exists.
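For Step 5, a minimal DAG along these lines reproduces the behavior for me (dataset and table names are placeholders, and it assumes the default google_cloud_default GCP connection):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryCreateEmptyTableOperator,
)

with DAG(
    dag_id="my_fake_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # The target table was already created in Step 4, so with exists_ok=False
    # this task should fail -- instead it is currently marked SUCCESS.
    create_staging_table = BigQueryCreateEmptyTableOperator(
        task_id="create_staging_table",
        dataset_id="my_dataset",
        table_id="tmp_table",
        schema_fields=[
            {"name": "field_1", "type": "TIMESTAMP", "mode": "NULLABLE"},
        ],
        exists_ok=False,
    )
```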
Anything else
I believe the silent failure may be occurring here: the except block only logs a message and doesn't re-raise the exception or set any state that would cause the task to fail.
If that is in fact the case, I'd be happy to submit a PR, but I would appreciate input on any error-handling standards or conventions this provider package maintains (a rough sketch of what I have in mind follows).
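For context, here is the pattern I'm describing (paraphrased, not copied verbatim from the provider source), rewritten as a hypothetical standalone helper around BigQueryHook.create_empty_table and showing one possible way to honor exists_ok=False; whether to re-raise Conflict or raise an AirflowException is exactly the kind of convention I'd like guidance on:

```python
import logging

from airflow.exceptions import AirflowException
from google.api_core.exceptions import Conflict

log = logging.getLogger(__name__)


def create_empty_table_or_fail(bq_hook, dataset_id, table_id, schema_fields, exists_ok=False):
    """Hypothetical helper mirroring the operator's try/except, but honoring exists_ok.

    Today the except-Conflict branch only logs "Table ... already exists.",
    so the task instance is still marked SUCCESS.
    """
    try:
        log.info("Creating table")
        return bq_hook.create_empty_table(
            dataset_id=dataset_id,
            table_id=table_id,
            schema_fields=schema_fields,
            exists_ok=exists_ok,
        )
    except Conflict:
        if exists_ok:
            # A duplicate table is acceptable; keep the informational log.
            log.info("Table %s.%s already exists.", dataset_id, table_id)
            return None
        # A duplicate table is an error; fail the task
        # (alternatively, a bare `raise` would re-raise the original Conflict).
        raise AirflowException(f"Table {dataset_id}.{table_id} already exists.")
```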
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct