
IMPORT failing to create inverted indexes  #32468

@tim-o


Describe the problem

On 2.1.0, a bulk CSV IMPORT fails to create an inverted index declared in the table definition for this payload (payload available to Cockroach Labs only).

The same index can be created successfully on the same table as long as it is created after the import.

To Reproduce

  1. Attempt the import with the inverted index in the table definition; it fails with an SST creation error:
root@:26257/defaultdb> IMPORT TABLE defaultdb.public.skusv2_30 (id UUID PRIMARY KEY, skujson JSONB, masterskuid UUID, createdby UUID, createddate string, lastsavedby UUID, lastsaveddate string, isdeleted BOOL, availabilityjson JSONB, quantityjson JSONB, INVERTED INDEX quantityjson_idx (quantityjson)) CSV DATA ('nodelocal:///SkusV2_full_export30.csv.gz') WITH decompress = 'gzip', "nullif" = '', skip = '1';
pq: SST creation error at /Table/85/2/"quantities"/Arr/"quantity"/0/"\x00\xc5\xf1\x1d\xf8bMۭ\x9f\x95\xe5\x84\xf6\xf7\xb6"/0; this can happen when a primary or unique index has duplicate keys: Invalid argument: Keys must be added in order
  2. Remove the inverted index from the table definition; the import completes:
root@:26257/defaultdb> IMPORT TABLE defaultdb.public.skusv2_30 (id UUID PRIMARY KEY, skujson JSONB, masterskuid UUID, createdby UUID, createddate string, lastsavedby UUID, lastsaveddate string, isdeleted BOOL, availabilityjson JSONB, quantityjson JSONB) CSV DATA ('nodelocal:///SkusV2_full_export30.csv.gz') WITH decompress = 'gzip', "nullif" = '', skip = '1';
        job_id       |  status   | fraction_completed |  rows  | index_entries | system_records |   bytes
+--------------------+-----------+--------------------+--------+---------------+----------------+------------+
  401644233494659073 | succeeded |                  1 | 166493 |             0 |              0 | 6303594991
(1 row)
  3. Create the index after the import; it succeeds without an issue:
root@:26257/defaultdb> create inverted index on skusv2_30 (quantityjson);
CREATE INDEX

Time: 17.085579657s

Metadata

Labels

A-disaster-recovery
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
S-3: Medium-low impact: incurs increased costs for some users (incl. lower avail, recoverable bad data)
