Description
Apache Airflow version
2.6.2
What happened
When the GCSToBigQueryOperator runs in deferrable mode with an impersonation_chain service account whose default project_id differs from the project_id passed to the operator, the task fails with a 404 Not Found error when it resumes in execute_complete:
[2023-06-23, 11:38:37 UTC] {taskinstance.py:1824} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py", line 447, in execute_complete
raise AirflowException(event["message"])
airflow.exceptions.AirflowException: 404, message='Not Found: {\n "error": {\n "code": 404,\n "message": "Not found: Job king-cdmr-etl-sandbox:airflow_apptweak_king_itunes_connect_channels_load_active_devices_to_bq_2023_06_22T07_00_00_00_00_4842808969d21632ecbb76ffca48aabd",\n "errors": [\n {\n "message": "Not found: Job king-cdmr-etl-sandbox:airflow_apptweak_king_itunes_connect_channels_load_active_devices_to_bq_2023_06_22T07_00_00_00_00_4842808969d21632ecbb76ffca48aabd",\n "domain": "global",\n "reason": "notFound"\n }\n ],\n "status": "NOT_FOUND"\n }\n}\n', url=URL('https://www.googleapis.com/bigquery/v2/projects/king-cdmr-etl-sandbox/jobs/airflow_apptweak_king_itunes_connect_channels_load_active_devices_to_bq_2023_06_22T07_00_00_00_00_4842808969d21632ecbb76ffca48aabd')
I believe this happens because the BigQuery load job is submitted against self.project_id in _submit_job, but in deferrable mode the operator looks the job up in self.hook.project_id instead.
The default project_id of the impersonation chain service account can differ from the project_id passed to the operator, so the lookup targets the wrong project.
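The mismatch can be modelled with a small self-contained simulation. This is not Airflow code: FakeBigQuery, JobNotFound and run_deferred_load are hypothetical stand-ins. The only behaviour carried over from BigQuery is that a job is looked up within a single project, which is what the URL in the 404 (projects/<project>/jobs/<job_id>) shows.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


class JobNotFound(Exception):
    """Hypothetical stand-in for the 404 NOT_FOUND returned by the jobs API."""


@dataclass
class FakeBigQuery:
    # Jobs are keyed by (project_id, job_id): a job submitted in one project
    # is invisible to a lookup made against another project.
    jobs: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def insert_job(self, project_id: str, job_id: str) -> None:
        self.jobs[(project_id, job_id)] = "DONE"

    def get_job(self, project_id: str, job_id: str) -> str:
        try:
            return self.jobs[(project_id, job_id)]
        except KeyError:
            raise JobNotFound(f"Not found: Job {project_id}:{job_id}") from None


def run_deferred_load(bq: FakeBigQuery, submit_project: str,
                      poll_project: str, job_id: str) -> str:
    # Models _submit_job: the load job is created in the operator's project_id.
    bq.insert_job(submit_project, job_id)
    # Models the deferred trigger: it polls in whatever project it was handed.
    return bq.get_job(poll_project, job_id)


if __name__ == "__main__":
    bq = FakeBigQuery()
    operator_project = "king-coredatasets-sandbox"  # plays self.project_id
    hook_project = "king-cdmr-etl-sandbox"          # plays self.hook.project_id

    # Current behaviour: poll in the hook's default project, which fails.
    try:
        run_deferred_load(bq, operator_project, hook_project, "job_1")
    except JobNotFound as exc:
        print(exc)  # Not found: Job king-cdmr-etl-sandbox:job_1

    # Polling in the project the job was submitted to succeeds.
    print(run_deferred_load(bq, operator_project, operator_project, "job_2"))  # DONE
```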
The error above shows that BigQuery cannot find the job ID airflow_apptweak_king_itunes_connect_channels_load_active_devices_to_bq_2023_06_22T07_00_00_00_00_4842808969d21632ecbb76ffca48aabd in the project king-cdmr-etl-sandbox.
In fact, this job was created successfully in the project king-coredatasets-sandbox.
What you think should happen instead
I think the call to self.defer should pass self.project_id to the trigger rather than self.hook.project_id, so that the job is polled in the same project it was submitted to.
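A minimal sketch of the proposed rule, assuming the trigger should poll whichever project _submit_job used. The function name trigger_project_id is hypothetical and not part of the Google provider, and the fallback to the hook's default project when no project_id is given to the operator is an assumption about the intended behaviour, not confirmed provider code.

```python
from typing import Optional


def trigger_project_id(operator_project_id: Optional[str], hook_project_id: str) -> str:
    """Return the project the deferred trigger should poll for the job.

    Hypothetical helper: prefer the operator's project_id (where the job was
    submitted) and fall back to the hook's default project only when the
    operator was not given one (assumed fallback).
    """
    return operator_project_id or hook_project_id


if __name__ == "__main__":
    # Operator project set explicitly: the trigger should use it.
    print(trigger_project_id("king-coredatasets-sandbox", "king-cdmr-etl-sandbox"))
    # No operator project: fall back to the service account's default project.
    print(trigger_project_id(None, "king-cdmr-etl-sandbox"))
```

In the operator, this value would be what is passed as the trigger's project_id inside the self.defer call, in place of self.hook.project_id.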
How to reproduce
I don't yet have exact reproduction steps, but I will submit a PR for review soon.
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
apache-airflow-providers-google==10.0.0
Deployment
Astronomer
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct