Improve COPY and batch insert performance #91831

@cucaroach

COPY/Batch insert performance master tracking bug

Batch inserts and COPY in CRDB are really slow (<1MB/s throughput). Below is a list of things we can do to speed up batch inserts and COPY in the front of the house. Once these are done, further speedups will require work at the KV/storage layers.
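
For readers unfamiliar with the term, a "batch insert" here is taken to mean a single INSERT statement that carries many rows. The Go sketch below shows how a client might build such a statement with numbered placeholders. The function name `buildBatchInsert` and its signature are hypothetical and not part of CRDB or any driver; table and column names are assumed to be trusted identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchInsert (hypothetical helper) returns one multi-row INSERT
// statement with $N placeholders plus the flattened argument list.
// Identifiers are interpolated unquoted, so they must be trusted.
func buildBatchInsert(table string, cols []string, rows [][]any) (string, []any, error) {
	if len(cols) == 0 || len(rows) == 0 {
		return "", nil, fmt.Errorf("need at least one column and one row")
	}
	var sb strings.Builder
	fmt.Fprintf(&sb, "INSERT INTO %s (%s) VALUES ", table, strings.Join(cols, ", "))
	args := make([]any, 0, len(rows)*len(cols))
	for i, row := range rows {
		if len(row) != len(cols) {
			return "", nil, fmt.Errorf("row %d has %d values, want %d", i, len(row), len(cols))
		}
		if i > 0 {
			sb.WriteString(", ")
		}
		sb.WriteString("(")
		for j, v := range row {
			if j > 0 {
				sb.WriteString(", ")
			}
			// Placeholders are numbered across the whole statement.
			fmt.Fprintf(&sb, "$%d", len(args)+1)
			args = append(args, v)
		}
		sb.WriteString(")")
	}
	return sb.String(), args, nil
}
```

The returned statement and arguments could then be passed to a `database/sql` `Exec` call, so that many rows travel in one round trip instead of one statement per row.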

COPY:

Batch insert:

IMPORT:

  • Make import use datum-free vectorized code
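
As a rough illustration of what "datum-free" could mean, the Go sketch below parses text fields straight into typed column slices instead of allocating one boxed value per field. It is not CRDB code: the `columnBatch` type, its fields, and the two-column schema are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// columnBatch (hypothetical) stores parsed values column by column in
// typed slices, so no per-value boxed object is created.
type columnBatch struct {
	ids   []int64
	names []string
}

// appendRow parses one two-field text row directly into the typed columns.
// Nothing is appended if the row fails to parse.
func (b *columnBatch) appendRow(fields []string) error {
	if len(fields) != 2 {
		return fmt.Errorf("got %d fields, want 2", len(fields))
	}
	id, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return fmt.Errorf("parsing id %q: %w", fields[0], err)
	}
	b.ids = append(b.ids, id)
	b.names = append(b.names, fields[1])
	return nil
}
```

The point of the layout is that each column is a contiguous slice of one concrete type, which downstream code can process in bulk without type switches on individual values.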

INDEX BACKFILL:

Does it make sense for index backfill to use the columnar encoding code? Don't see why not...

Speculative ideas, not currently being pursued:

  • Can we just do an import if the table was created in the same transaction as the COPY? See sql: support COPY ... with FREEZE #85573
  • Can batch inserts cheat if the table was created in the same transaction? Can we detect an empty table and use AddSST? Empty-table/same-transaction DDL tricks feel like wasted energy: most inserts won’t be into an empty table and won’t be in the same txn as the CREATE TABLE.
  • Can we take secondary indexes offline and update them lazily, outside the scope of the transaction? Maybe just for non-unique indexes?

Jira issue: CRDB-21448

Epic CRDB-25320

Labels: C-enhancement, T-sql-queries, meta-issue

Status: Backlog