The S3ToGCSOperator fails on templated dest_gcs URL #14682

@nvembar

Description

Apache Airflow version:

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: Docker

What happened:

When a templated dest_gcs argument is passed to S3ToGCSOperator, the DAG fails to import because the constructor validates the URL before the template has been rendered at run time.

The error is:

Broken DAG: [/opt/airflow/dags/bad_gs_dag.py] Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/gcs.py", line 1051, in gcs_object_is_directory
    _, blob = _parse_gcs_url(bucket)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/gcs.py", line 1063, in _parse_gcs_url
    raise AirflowException('Please provide a bucket name')
airflow.exceptions.AirflowException: Please provide a bucket name
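
The traceback suggests that the unrendered Jinja string is parsed as a URL and no bucket is found in it. The following standalone sketch reproduces that behaviour with the standard library only. It assumes the hook derives the bucket from the URL's network location; the helper name bucket_of is hypothetical and is not part of the provider.

```python
from urllib.parse import urlsplit


def bucket_of(url: str) -> str:
    """Hypothetical helper: return the bucket part of a gs:// URL.

    Assumption: the bucket is the URL's network location, which is
    empty for any string that is not of the form scheme://bucket/...
    """
    netloc = urlsplit(url).netloc
    if not netloc:
        raise ValueError("Please provide a bucket name")
    return netloc


# A rendered URL has a network location, so a bucket is found.
rendered_bucket = bucket_of("gs://example_bucket/fake/prefix/")

# The unrendered template has no scheme and no "//", so the whole
# string is treated as a path and the bucket lookup fails.
try:
    bucket_of("{{ var.gcs_url }}")
    template_error = None
except ValueError as exc:
    template_error = str(exc)
```

Any check of this kind that runs in the constructor sees the literal template text, not the value it will later render to.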

What you expected to happen:

The DAG should successfully parse when using a templatized dest_gcs value.

How to reproduce it:

Instantiating an S3ToGCSOperator task with dest_gcs="{{ var.gcs_url }}" fails when the DAG file is parsed.

from airflow.decorators import dag
from airflow.utils.dates import days_ago
from airflow.providers.google.cloud.transfers.s3_to_gcs import S3ToGCSOperator


@dag(
    schedule_interval=None,
    description="Demo S3-to-GS Bug",
    catchup=False,
    start_date=days_ago(1),
)
def demo_bug():

    S3ToGCSOperator(
        task_id="transfer_task",
        bucket="example_bucket",
        prefix="fake/prefix",
        # Jinja template: still unrendered when the constructor validates it
        dest_gcs="{{ var.gcs_url }}",
    )


demo_dag = demo_bug()

Anything else we need to know:

This should be fixable by moving the code that checks whether the URL is a folder out of the constructor and into execute().
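
A minimal sketch of that idea, using a stand-in class rather than the real operator so that it runs without Airflow installed. The class name DeferredValidationOperator, the _check_dest helper, and the exact validation rules are assumptions for illustration; the real operator's checks may differ.

```python
from urllib.parse import urlsplit


class DeferredValidationOperator:
    """Hypothetical stand-in for an operator; not the Airflow class."""

    # Mirrors the idea of templated fields: dest_gcs may hold Jinja text
    # until it is rendered shortly before execute() is called.
    template_fields = ("dest_gcs",)

    def __init__(self, dest_gcs: str) -> None:
        # Only store the value here. It may still be an unrendered
        # template, so it is not validated in the constructor.
        self.dest_gcs = dest_gcs

    def _check_dest(self) -> None:
        # Assumed rules: a bucket must be present, and the object path
        # must be empty or end with "/" to count as a directory.
        parts = urlsplit(self.dest_gcs)
        if not parts.netloc:
            raise ValueError("Please provide a bucket name")
        path = parts.path.lstrip("/")
        if path and not path.endswith("/"):
            raise ValueError("dest_gcs must be a directory path ending in '/'")

    def execute(self, context: dict) -> str:
        # By this point the template has been rendered, so the check
        # sees the real URL.
        self._check_dest()
        return self.dest_gcs


# Constructing with a template no longer raises at DAG parse time.
op = DeferredValidationOperator(dest_gcs="{{ var.gcs_url }}")

# Simulate template rendering, then run the task.
op.dest_gcs = "gs://example_bucket/fake/prefix/"
result = op.execute(context={})
```

With this arrangement an invalid or non-directory URL still fails, but as a task failure at run time rather than as a broken DAG at import time.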
