Description
Apache Airflow version
2.6.0
What happened
When accessing data in a "Requester Pays" bucket, the caller's project must be supplied, either when the storage client creates the bucket object or by setting it on the ACL. The "list" and "download" functions of the GCSHook have no parameter for a user project ID, so calling them on such a bucket fails with the following error: "Bucket is a requester pays bucket but no user project provided."
This requirement is stated explicitly in the GCP documentation.
What you think should happen instead
The "insert_bucket_acl" function in the GCSHook already accepts an optional user_project for Requester Pays buckets. The relevant code looks like this:
""":param user_project: (Optional) The project to be billed for this request.
    Required for Requester Pays buckets."""
if user_project:
    bucket.acl.user_project = user_project
bucket.acl.save()
I believe the same optional user_project handling should be added to the list and download functions as well. That should also fix any transfer operators (GCP to GCP/S3/Azure) that read from a "Requester Pays" bucket.
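A minimal sketch of what that could look like, not the actual GCSHook code. It assumes the google-cloud-storage client, whose `Client.bucket()` accepts a `user_project` argument that is then applied to requests made through that bucket object. The helper names `list_object_names` and `download_object` are hypothetical, and the listing helper ignores the prefix handling the real hook performs when a delimiter is given.

```python
from typing import Any, List, Optional


def list_object_names(
    client: Any,
    bucket_name: str,
    prefix: Optional[str] = None,
    delimiter: Optional[str] = None,
    user_project: Optional[str] = None,
) -> List[str]:
    """Return object names, billing ``user_project`` if one is given.

    ``client`` is expected to behave like ``google.cloud.storage.Client``.
    """
    # Passing user_project here is what a Requester Pays bucket needs;
    # None keeps the current behaviour for ordinary buckets.
    bucket = client.bucket(bucket_name, user_project=user_project)
    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
    return [blob.name for blob in blobs]


def download_object(
    client: Any,
    bucket_name: str,
    object_name: str,
    user_project: Optional[str] = None,
) -> bytes:
    """Download one object as bytes, billing ``user_project`` if given."""
    bucket = client.bucket(bucket_name, user_project=user_project)
    # Blobs created from the bucket reuse the bucket's user_project.
    return bucket.blob(object_name).download_as_bytes()
```

With a real client this would be called as, for example, `list_object_names(storage.Client(), "some-bucket", user_project="my-billing-project")`, where both the bucket and project names are placeholders.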
How to reproduce
Call hook.list() on any GCS bucket with Requester Pays enabled, for example:
hook = GCSHook(
    gcp_conn_id=self.gcp_conn_id,
    delegate_to=self.delegate_to,
    impersonation_chain=self.google_impersonation_chain,
)
self.log.info(
    'Getting list of the files. Bucket: %s; Delimiter: %s; Prefix: %s',
    self.bucket,
    self.delimiter,
    self.prefix,
)
files = hook.list(
    bucket_name=self.bucket,
    prefix=self.prefix,
    delimiter=self.delimiter,
)
Operating System
Debian 11
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct