GCSHook's functions for list and download do not work with Requester Pays buckets. #31137

@ABoothInTheWild

Apache Airflow version

2.6.0

What happened

When accessing data in a "Requester Pays" bucket, the user's project must be supplied, either in the storage client's definition of the bucket or on the ACL. The "list" and "download" functions of the GCSHook offer no way to supply a user project ID, so calling them on such a bucket fails with the following error: Bucket is a requester pays bucket but no user project provided.

This is explicit in the GCP documentation.

What you think should happen instead

In the "insert_bucket_acl" function in the GCSHook, a user_project can optionally be supplied for Requester Pays buckets. The relevant code looks like this:

""":param user_project: (Optional) The project to be billed for this request.
            Required for Requester Pays buckets."""

if user_project:
    bucket.acl.user_project = user_project
bucket.acl.save()

I believe the same optional user_project support should be added to the list and download functions as well. This should also fix any operators that transfer data from GCP to GCP/S3/Azure and need to read from a "Requester Pays" bucket.
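As a rough illustration of the proposal, here is a minimal sketch of list and download helpers that accept an optional user_project. It is not the GCSHook implementation: the function names list_object_names and download_object are hypothetical, and the client argument is assumed to be a google.cloud.storage.Client (for example the one returned by the hook's get_conn()). The sketch relies on Client.bucket() accepting a user_project keyword, which then applies to requests made through that bucket object.

```python
from typing import Any, List, Optional


def list_object_names(
    client: Any,
    bucket_name: str,
    prefix: Optional[str] = None,
    delimiter: Optional[str] = None,
    user_project: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: list object names, billing user_project if given."""
    # `client` is assumed to be a google.cloud.storage.Client.
    # Passing user_project here bills that project for requests on this bucket.
    bucket = client.bucket(bucket_name, user_project=user_project)
    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
    return [blob.name for blob in blobs]


def download_object(
    client: Any,
    bucket_name: str,
    object_name: str,
    filename: str,
    user_project: Optional[str] = None,
) -> str:
    """Hypothetical helper: download one object to a local file."""
    bucket = client.bucket(bucket_name, user_project=user_project)
    # The blob inherits the bucket's user_project for the download request.
    bucket.blob(object_name).download_to_filename(filename)
    return filename
```

Leaving user_project as None keeps the current behaviour for ordinary buckets, so the parameter could be added without affecting existing callers.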

How to reproduce

Call hook.list() on any GCS bucket with Requester Pays enabled, for example from inside an operator:

hook = GCSHook(
    gcp_conn_id=self.gcp_conn_id,
    delegate_to=self.delegate_to,
    impersonation_chain=self.google_impersonation_chain,
)

self.log.info(
    'Getting list of the files. Bucket: %s; Delimiter: %s; Prefix: %s',
    self.bucket,
    self.delimiter,
    self.prefix,
)

files = hook.list(bucket_name=self.bucket,
                  prefix=self.prefix,
                  delimiter=self.delimiter)

Operating System

Debian 11

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
