CockroachDB Serverless: IMPORT INTO using CSV with array-column fails with syntax error #84631

@its-felix

Description

Describe the problem

I'm trying to import CSV data from a Postgres export made using the following command:

psql --no-password -c "\COPY $table_name TO STDOUT WITH NULL AS 'SQLNULL' CSV HEADER FORCE QUOTE *" > "$PG_BACKUP_PATH/tables/$table_name.csv"

I'm using the following command from a SQL console to import the data to Cockroach Serverless:

IMPORT INTO gw2_api_tokens (account_id, gw2_account_id, creation_time, gw2_api_token, gw2_api_permissions, display_name, last_valid_check_time, is_valid)
    CSV DATA ( 's3 presigned url' )
WITH skip = '1' ;

The column gw2_api_permissions is a STRING[].

When executing the above command, I see the following error:

[XXUUU] ERROR: <s3 presigned url>: error parsing row 2: parse "gw2_api_permissions" as STRING[]: at or near "{": syntax error (row: <uuid>,<uuid>,<timestamp>,<token>,{somevalue,anothervalue,anothervalue123},SomeName,<timestamp>,t)

The CSV I'm trying to import is fully quoted with ". In the source CSV, the gw2_api_permissions column looks like this:

"{somevalue,anothervalue,anothervalue123}"

The issue also happens when using DELIMITED DATA:

IMPORT INTO gw2_api_tokens (account_id, gw2_account_id, creation_time, gw2_api_token, gw2_api_permissions, display_name, last_valid_check_time, is_valid)
    DELIMITED DATA ( 's3 presigned url' )
WITH skip = '1', fields_terminated_by = ',', fields_enclosed_by = '"', fields_escaped_by = '\' ;

To Reproduce

Table creation script:

CREATE TABLE test (
    id UUID NOT NULL,
    values TEXT[] NOT NULL,
    PRIMARY KEY (id)
) ;

Dummy CSV:

"id","values"
"cebd93ac-d9df-4952-ad3f-e1571b67fbb3","{somevalue,anothervalue,anothervalue123}"
  1. Set up a fresh CockroachDB Serverless Cluster
  2. Create the table above in the newly created CockroachDB Serverless Cluster
  3. Import the CSV above using IMPORT INTO test (id, values) CSV DATA ( 'csvurl' ) WITH skip = '1' ;
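For step 3, the dummy CSV can be materialized to a local file before uploading it to S3 (the file name test.csv is an arbitrary choice):

```shell
# Write the dummy CSV from the repro above to disk; test.csv is an arbitrary name.
cat > test.csv <<'EOF'
"id","values"
"cebd93ac-d9df-4952-ad3f-e1571b67fbb3","{somevalue,anothervalue,anothervalue123}"
EOF

# Sanity check: expect a header line plus one data row.
wc -l < test.csv
```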

Expected behavior
The import should complete successfully.

Additional data / screenshots
N/A

Environment:

  • CockroachDB Version: 22.1.1 (Serverless)
  • Client app: IntelliJ Query Console (JDBC)

Additional context
For the migration from a self-hosted Postgres database to CockroachDB Serverless, I'm planning to have a new node of my application running (disconnected from the LB at that point) but connected to CockroachDB Serverless. On startup, this application creates the schema, without data.

My plan was then to dump my Postgres tables as CSV, import them by running a script that creates presigned URLs for the CSVs stored in S3, and finally switch the LB over to the new node, which is already connected to CockroachDB Serverless.

An alternative approach would be to create a full export (including the schema) using pg_dump and import the whole database with IMPORT PGDUMP. I decided against this approach for the following reasons:

  • Without major refactoring of my application, I wouldn't be able to start a node without an existing schema
  • The SQL produced by pg_dump contains expression indexes (e.g. md5(column)), which are not yet supported by IMPORT PGDUMP

Jira issue: CRDB-17768
Epic: CRDB-14049

Metadata

Labels: C-bug (code not up to spec/doc; solution expected to change code/behavior), O-community (originated from the community), T-sql-foundations (SQL Foundations Team, formerly SQL Schema + SQL Sessions), X-blathers-triaged (blathers was able to find an owner)
