
@hankehly (Contributor) commented Jul 1, 2023

(work in progress)

Related: #31137

Summary

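This PR adds support for retrieving data from Google Cloud Storage buckets that have Requester Pays enabled. With Requester Pays, the requester rather than the bucket owner is billed, so each request must name the project to bill.

Adds an optional user_project argument to the operators and GCSHook methods listed below.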

| Signature changed | Tested in |
|---|---|
| GCSToS3Operator | tests/system/providers/amazon/aws/example_gcs_to_s3.py |
| GCSListObjectsOperator | tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py |
| GCSDeleteObjectsOperator | tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py |
| GCSDeleteBucketOperator | tests/system/providers/amazon/aws/example_gcs_to_s3.py, tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py, tests/system/providers/google/cloud/gcs/example_gcs_upload_download.py |
| GCSSynchronizeBucketsOperator | |
| GCSToGCSOperator | tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py |
| GCSToLocalFilesystemOperator | tests/system/providers/google/cloud/gcs/example_gcs_upload_download.py |
| LocalFilesystemToGCSOperator | tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py, tests/system/providers/google/cloud/gcs/example_gcs_upload_download.py |
| GCSToGoogleDriveOperator | |
| GCSHook.copy | |
| GCSHook.rewrite | tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py |
| GCSHook.download | tests/system/providers/amazon/aws/example_gcs_to_s3.py, tests/system/providers/google/cloud/gcs/example_gcs_upload_download.py |
| GCSHook.download_as_byte_array | |
| GCSHook.provide_file | tests/system/providers/amazon/aws/example_gcs_to_s3.py |
| GCSHook.provide_file_and_upload | tests/system/providers/amazon/aws/example_gcs_to_s3.py |
| GCSHook.upload | tests/system/providers/amazon/aws/example_gcs_to_s3.py, tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py, tests/system/providers/google/cloud/gcs/example_gcs_upload_download.py |
| GCSHook.exists | tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py (check condition) |
| GCSHook.get_blob_update_time | tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py |
| GCSHook.is_updated_after | tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py |
| GCSHook.is_updated_between | (passes user_project to GCSHook.get_blob_update_time) |
| GCSHook.is_updated_before | (passes user_project to GCSHook.get_blob_update_time) |
| GCSHook.is_older_than | (passes user_project to GCSHook.get_blob_update_time) |
| GCSHook.delete | tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py |
| GCSHook.delete_bucket | tests/system/providers/amazon/aws/example_gcs_to_s3.py, tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py, tests/system/providers/google/cloud/gcs/example_gcs_upload_download.py |
| GCSHook.list (_list) | tests/system/providers/amazon/aws/example_gcs_to_s3.py, tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py |
| GCSHook.list_by_timespan | |
| GCSHook.get_size | |
| GCSHook.get_crc32c | |
| GCSHook.get_md5hash | |
| GCSHook.compose | |
| GCSHook.sync | |
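Per the table, GCSHook.is_updated_between, is_updated_before and is_older_than make no GCS call of their own; they pass user_project to GCSHook.get_blob_update_time. A minimal sketch of that forwarding, assuming a hypothetical SketchHook whose get_blob_update_time reads stub data instead of calling google-cloud-storage. The timestamp comparisons are illustrative and may not match GCSHook's exact boundary handling.

```python
from __future__ import annotations

import datetime as dt


class SketchHook:
    """Hypothetical stand-in for GCSHook; not the Airflow implementation."""

    def __init__(self, update_times: dict[tuple[str, str], dt.datetime]):
        # Stub data instead of a storage client: (bucket, object) -> last update time.
        self._update_times = update_times
        # Records the billing project of every metadata lookup, for inspection.
        self.billed_projects: list[str | None] = []

    def get_blob_update_time(
        self, bucket_name: str, object_name: str, user_project: str | None = None
    ) -> dt.datetime:
        # In a real hook, this is where user_project would reach client.bucket(...).
        self.billed_projects.append(user_project)
        return self._update_times[(bucket_name, object_name)]

    def is_updated_between(
        self, bucket_name: str, object_name: str,
        min_ts: dt.datetime, max_ts: dt.datetime, user_project: str | None = None,
    ) -> bool:
        # Forward user_project unchanged so the lookup is billed to the same project.
        updated = self.get_blob_update_time(bucket_name, object_name, user_project=user_project)
        return min_ts < updated < max_ts

    def is_updated_before(
        self, bucket_name: str, object_name: str,
        ts: dt.datetime, user_project: str | None = None,
    ) -> bool:
        updated = self.get_blob_update_time(bucket_name, object_name, user_project=user_project)
        return updated < ts

    def is_older_than(
        self, bucket_name: str, object_name: str,
        seconds: int, user_project: str | None = None,
    ) -> bool:
        updated = self.get_blob_update_time(bucket_name, object_name, user_project=user_project)
        cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(seconds=seconds)
        return updated < cutoff
```

Keeping user_project as a keyword argument that defaults to None means existing callers of these methods keep working unchanged.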

Notes:

  • Adds an optional user_project parameter to each GCSHook method and passes it through to client.bucket.
  • For operations that span multiple buckets (e.g. copy, rewrite, sync), only one user_project can be supplied; all requests are billed to that project.
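Both notes can be sketched as follows, assuming a client whose bucket() method accepts a user_project keyword, as google-cloud-storage's Client.bucket does. RecordingClient, FakeBucket and copy_object are hypothetical offline stand-ins, not GCSHook code; the point is that one user_project is applied to both the source and the destination bucket handle.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeBucket:
    name: str
    user_project: str | None  # project billed for requests made through this bucket


@dataclass
class RecordingClient:
    """Hypothetical offline stand-in for google.cloud.storage.Client."""

    created: list[FakeBucket] = field(default_factory=list)

    def bucket(self, bucket_name: str, user_project: str | None = None) -> FakeBucket:
        # Same keyword the real Client.bucket accepts; here it is only recorded.
        handle = FakeBucket(bucket_name, user_project)
        self.created.append(handle)
        return handle


def copy_object(
    client: RecordingClient,
    source_bucket: str,
    destination_bucket: str,
    user_project: str | None = None,
) -> tuple[FakeBucket, FakeBucket]:
    """Hypothetical helper: a single user_project covers both buckets."""
    src = client.bucket(source_bucket, user_project=user_project)
    dst = client.bucket(destination_bucket, user_project=user_project)
    # A real implementation would now copy the blob from src to dst.
    return src, dst
```

Because there is only one user_project argument, a cross-bucket call cannot bill the source and destination buckets to different projects.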

Testing

Amazon provider:

  • GCSToS3Operator
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_DEFAULT_REGION=
export GOOGLE_APPLICATION_CREDENTIALS=
export GCP_PROJECT_ID=
pytest --system amazon tests/system/providers/amazon/aws/example_gcs_to_s3.py

Google provider:

  • GCSSynchronizeBucketsOperator
  • GCSToLocalFilesystemOperator
  • GCSListObjectsOperator
  • GCSToGCSOperator
  • GCSDeleteObjectsOperator
  • GCSDeleteBucketOperator
  • LocalFilesystemToGCSOperator
export SYSTEM_TESTS_ENV_ID=
export SYSTEM_TESTS_GCP_PROJECT=
export GOOGLE_APPLICATION_CREDENTIALS=
pytest --system google tests/system/providers/google/cloud/gcs/example_gcs_copy_delete.py
pytest --system google tests/system/providers/google/cloud/gcs/example_gcs_upload_download.py
  • GCSToGoogleDriveOperator (requires a Google Workspace Admin account and the Google Drive API)
export SYSTEM_TESTS_ENV_ID=
export SYSTEM_TESTS_GCP_PROJECT=
export GOOGLE_APPLICATION_CREDENTIALS=
pytest --system google tests/system/providers/google/cloud/gcs/example_gcs_to_gdrive.py

@boring-cyborg (bot) added the labels area:providers, area:system-tests, provider:amazon and provider:google on Jul 1, 2023

@hankehly (Contributor, Author) commented:

The scope of this PR was too big. Closing in favor of smaller PRs.

@hankehly closed this on Jul 22, 2023.
@hankehly changed the title to "(dead) Add support for accessing data from GCS Requester Pays buckets", then back to "Add support for accessing data from GCS Requester Pays buckets", on Jul 22, 2023.
@hankehly deleted the feature/issue-31137 branch on Jul 22, 2023 at 06:49.