CockroachDB Serverless: IMPORT INTO using CSV with array-column fails with syntax error #84631
Describe the problem
I'm trying to import CSV data from a PostgreSQL export made using the following command:
psql --no-password -c "\COPY $table_name TO STDOUT WITH NULL AS 'SQLNULL' CSV HEADER FORCE QUOTE *" > "$PG_BACKUP_PATH/tables/$table_name.csv"
I'm using the following command from a SQL console to import the data into CockroachDB Serverless:
IMPORT INTO gw2_api_tokens (account_id, gw2_account_id, creation_time, gw2_api_token, gw2_api_permissions, display_name, last_valid_check_time, is_valid)
CSV DATA ( 's3 presigned url' )
WITH skip = '1' ;
The column gw2_api_permissions is a STRING[].
When executing the above command, I see the following error:
[XXUUU] ERROR: <s3 presigned url>: error parsing row 2: parse "gw2_api_permissions" as STRING[]: at or near "{": syntax error (row: <uuid>,<uuid>,<timestamp>,<token>,{somevalue,anothervalue,anothervalue123},SomeName,<timestamp>,t)
The CSV I'm trying to import is fully quoted with " as the quote character. The column gw2_api_permissions in the source CSV looks like this:
"{somevalue,anothervalue,anothervalue123}"
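For reference, the cell is a plain PostgreSQL array literal. A minimal sketch of parsing such a literal client-side (e.g. as a preprocessing step before import) is shown below; this is not a verified workaround, and the parsing assumes simple, unquoted elements:

```python
def parse_pg_array(literal):
    """Parse a simple PostgreSQL array literal like '{a,b,c}' into a list.

    Assumption: elements contain no commas, braces, quotes, or escapes;
    real array literals can be more complex than this handles.
    """
    inner = literal.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError(f"not an array literal: {literal!r}")
    inner = inner[1:-1]
    return inner.split(",") if inner else []

# The exact cell from the failing CSV row:
cell = "{somevalue,anothervalue,anothervalue123}"
print(parse_pg_array(cell))  # ['somevalue', 'anothervalue', 'anothervalue123']
```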
The issue also happens when using DELIMITED DATA:
IMPORT INTO gw2_api_tokens (account_id, gw2_account_id, creation_time, gw2_api_token, gw2_api_permissions, display_name, last_valid_check_time, is_valid)
DELIMITED DATA ( 's3 presigned url' )
WITH skip = '1', fields_terminated_by = ',', fields_enclosed_by = '"', fields_escaped_by = '\' ;
To Reproduce
Table creation script:
CREATE TABLE test (
id UUID NOT NULL,
values TEXT[] NOT NULL,
PRIMARY KEY (id)
) ;
Dummy CSV:
"id","values"
"cebd93ac-d9df-4952-ad3f-e1571b67fbb3","{somevalue,anothervalue,anothervalue123}"
- Set up a fresh CockroachDB Serverless Cluster
- Create the table above in the newly created CockroachDB Serverless Cluster
- Import the CSV above using
IMPORT INTO test (id, values) CSV DATA ( 'csvurl' ) WITH skip = '1' ;
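To make the repro self-contained, the dummy CSV above can be generated with a short script (the filename test.csv is my choice; QUOTE_ALL mirrors psql's FORCE QUOTE * output):

```python
import csv

# Write the dummy CSV used in the repro, fully quoted like the
# psql COPY ... FORCE QUOTE * export.
rows = [
    ("id", "values"),
    ("cebd93ac-d9df-4952-ad3f-e1571b67fbb3",
     "{somevalue,anothervalue,anothervalue123}"),
]
with open("test.csv", "w", newline="") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)

print(open("test.csv").read())
```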
Expected behavior
The data import should work
Additional data / screenshots
N/A
Environment:
- CockroachDB Version: 22.1.1 (Serverless)
- Client app: IntelliJ Query Console (JDBC)
Additional context
For the migration from a self-hosted PostgreSQL database to CockroachDB Serverless, I'm planning to run a new node of my application (disconnected from the LB at that point) that is connected to CockroachDB Serverless. This application creates the schema on startup, without data.
I was then planning to dump my PostgreSQL tables as CSV, import them by running a script that creates presigned URLs for the CSVs stored in S3, and finally switch the LB over to the new, already running node that is connected to CockroachDB Serverless.
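The import script mentioned above could look roughly like this sketch; the helper name, the example table/columns, and the omission of real presigning are my assumptions (the actual URL would come from aws s3 presign or boto3's generate_presigned_url):

```python
def build_import_stmt(table, columns, presigned_url):
    """Build the IMPORT INTO statement for one table.

    presigned_url is passed in so the sketch stays self-contained;
    in the real script it would be generated per CSV stored in S3.
    """
    cols = ", ".join(columns)
    return (
        f"IMPORT INTO {table} ({cols}) "
        f"CSV DATA ( '{presigned_url}' ) WITH skip = '1' ;"
    )

# Hypothetical usage with the repro table:
stmt = build_import_stmt(
    "test", ["id", "values"], "https://example-presigned-url"
)
print(stmt)
```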
An alternative approach would be to create a full export (including the schema) using pg_dump and import the whole database using IMPORT PGDUMP. I decided not to take this approach for the following reasons:
- Without major refactoring of my application, I wouldn't be able to start a node without an existing schema
- The SQL produced by pg_dump contains expression indexes such as md5(column), which are not yet supported by IMPORT PGDUMP
Jira issue: CRDB-17768
Epic CRDB-14049