Skip to content

SNOW-721174: Support COPY INTO <location> with storage integration and gcs bucket #368

@dahlbaek

Description

@dahlbaek

What is the current behavior?

The COPY INTO <location> statement support gcs buckets, but there is no GCSBucket ClauseElement in https://github.com/snowflakedb/snowflake-sqlalchemy/blob/a84fa77ae4387ed60db036db61dc7d8f82849a98/src/snowflake/sqlalchemy/custom_commands.py. Similarly, the statement supports using a STORAGE_INTEGRATION option, but that option is not supported by CopyInto

class CopyInto(UpdateBase):
"""Copy Into Command base class, for documentation see:
https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html"""
__visit_name__ = "copy_into"
_bind = None
def __init__(self, from_, into, formatter=None):
self.from_ = from_
self.into = into
self.formatter = formatter
self.copy_options = {}
def __repr__(self):
"""
repr for debugging / logging purposes only. For compilation logic, see
the corresponding visitor in base.py
"""
return f"COPY INTO {self.into} FROM {repr(self.from_)} {repr(self.formatter)} ({self.copy_options})"
def bind(self):
return None
def force(self, force):
if not isinstance(force, bool):
raise TypeError("Parameter force should be a boolean value")
self.copy_options.update({"FORCE": translate_bool(force)})
return self
def single(self, single_file):
if not isinstance(single_file, bool):
raise TypeError("Parameter single_file should be a boolean value")
self.copy_options.update({"SINGLE": translate_bool(single_file)})
return self
def maxfilesize(self, max_size):
if not isinstance(max_size, int):
raise TypeError("Parameter max_size should be an integer value")
self.copy_options.update({"MAX_FILE_SIZE": max_size})
return self
def files(self, file_names):
self.copy_options.update({"FILES": FilesOption(file_names)})
return self
def pattern(self, pattern):
self.copy_options.update({"PATTERN": pattern})
return self

What is the desired behavior?

The CopyInto class supports Copy Into <location> statements of the type

COPY INTO '<gcs_bucket>'
FROM (<select>)
FILE_FORMAT = ( TYPE = PARQUET )
STORAGE_INTEGRATION = <gcs_storage_integration>
HEADER = TRUE;

How would this improve snowflake-connector-python?

This would make it possible to use the CopyInto class on order to execute such variants of the COPY INTO <location> statement.

References, Other Background

We're a small company that make use of the snowflake-sqlalchemy library. If someone will give us pointers as to what we should add in terms of classes and tests, we can allocate time to work on a pr.

Metadata

Metadata

Assignees

No one assigned

    Labels

    featurestatus-triage_doneInitial triage done, will be further handled by the driver team

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions