GCSSynchronizeBucketsOperator fails on 30-second timeout on large files #27488

@billsmithatg

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-google==6.7.0

Apache Airflow version

v2.2.3+composer

Operating System

whatever Linux version Cloud Composer uses

Deployment

Composer

Deployment details

No response

What happened

I am using GCSSynchronizeBucketsOperator to clone a folder between two buckets. The folder contains several thousand objects, ranging in size from 50MB to 10GB.

At some point the operation fails with this error:

`google.api_core.exceptions.GoogleAPICallError: 413 POST https://storage.googleapis.com/storage/v1/b//o/zoominfo%2FZI_COMP_DESCRIPTION_20211001.csv.gz/copyTo/b//o/zoominfo2%2FZI_COMP_DESCRIPTION_20211001.csv.gz?prettyPrint=false: Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method in the JSON API (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.`

What you think should happen instead

GCSSynchronizeBucketsOperator should use the Rewrite method so that it can deal with copies that do not complete within 30 seconds.
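As a rough illustration of what "use the Rewrite method" could look like, here is a minimal sketch built on `Blob.rewrite` from the `google-cloud-storage` client library. That method returns a `(token, bytes_rewritten, total_bytes)` tuple, and the token is `None` once the copy has finished. The helper name `rewrite_until_done` is hypothetical and is not part of the provider. This is not how the operator is currently implemented.

```python
from typing import Any, Optional


def rewrite_until_done(source_blob: Any, destination_blob: Any) -> int:
    """Copy source_blob to destination_blob by calling rewrite repeatedly.

    Both arguments are assumed to be google.cloud.storage.Blob objects.
    Returns the number of bytes rewritten once the copy has completed.
    """
    token: Optional[str] = None
    while True:
        # Each call copies one chunk; a non-None token means that more
        # work remains and must be passed to the next call.
        token, bytes_rewritten, total_bytes = destination_blob.rewrite(
            source_blob, token=token
        )
        if token is None:
            return bytes_rewritten
```

With the real client this might be called as `rewrite_until_done(client.bucket("src-bucket").blob("path/file.csv.gz"), client.bucket("dst-bucket").blob("path/file.csv.gz"))`, where the bucket and object names are placeholders. Because each request only has to finish one chunk, the whole copy is no longer bound by the single-request limit that the error message describes.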

How to reproduce

Use GCSSynchronizeBucketsOperator to copy a 10GB file between two buckets that differ in location and/or storage class.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
