copy: figure out how to reduce kv costs further #90743

@cucaroach

COPY is currently bound mostly by the CPU time required to execute KV batch requests. We want to figure out how to reduce these costs so that ideas about optimizing the SQL layer (removing datums, etc.) start to make more sense. Ideas to explore:

  • use columnar KV batches instead of row-oriented ones; @nvanbenschoten says this could "reduce the number of ranges that a given batch touches, so each range would handle larger batches"
  • experiment with bigger batch sizes and kv.transaction.write_pipelining_max_batch_size; why does throughput max out at 100 rows?
  • is load-based splitting kicking in? If not, could we make it kick in sooner?
  • can we make a case that multiple concurrent requests to the same range should be possible? Possibly this only makes sense for reads.
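One possible reading of the "ranges touched" point in the first idea can be sketched with a toy model. This is not CockroachDB code: `rangesTouched`, the split keys, and the key strings are all hypothetical, and real range lookup goes through the range cache rather than a sorted slice. The sketch only counts how many keys of a batch land in each range, comparing a batch that mixes two indexes' keys with a batch holding a single index's keys.

```go
package main

import (
	"fmt"
	"sort"
)

// rangesTouched is a hypothetical helper: given sorted split keys and a batch
// of keys, it returns how many keys of the batch fall into each range.
// Range 0 covers keys below splits[0], range i covers [splits[i-1], splits[i]),
// and the last range covers everything at or above the final split key.
func rangesTouched(splits []string, batch []string) map[int]int {
	counts := make(map[int]int)
	for _, k := range batch {
		// The first split key strictly greater than k identifies k's range.
		r := sort.Search(len(splits), func(i int) bool { return splits[i] > k })
		counts[r]++
	}
	return counts
}

func main() {
	// Made-up split points: index 1 and index 2 each span two ranges.
	splits := []string{"/t/1/m", "/t/2/", "/t/2/m"}

	// Row-oriented batch: two rows, each writing one key per index.
	rowBatch := []string{"/t/1/a", "/t/2/x", "/t/1/z", "/t/2/b"}
	// Per-index batch: the same number of keys, all from index 1.
	indexBatch := []string{"/t/1/a", "/t/1/b", "/t/1/y", "/t/1/z"}

	fmt.Println("row batch ranges:", len(rangesTouched(splits, rowBatch)))
	fmt.Println("per-index batch ranges:", len(rangesTouched(splits, indexBatch)))
}
```

In this toy setup the per-index batch touches fewer ranges with more keys per range, which is the effect the quoted comment describes; whether it holds for a real workload depends on the actual key distribution and range boundaries.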

See image for current COPY profile:

[Image: Screen Shot 2022-10-26 at 8 07 08 PM]

  • can we do anything about the kvclient-side costs, which also seem to dominate string parsing and KV encoding (see image)?
  • for example, can we do something about the sorting? That is, could we split the per-index sorting costs across multiple goroutines? The KV encoding costs too?
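The idea of splitting per-index sorting across goroutines could look roughly like the sketch below. It is a minimal illustration, not the actual COPY or kvclient code path: the `kv` type and `sortPerIndex` are hypothetical names, and it assumes each index's KVs already sit in their own slice so the sorts share no state.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"sync"
)

// kv is a hypothetical stand-in for an encoded key/value pair.
type kv struct {
	key   []byte
	value []byte
}

// sortPerIndex sorts each index's KVs by key in place, one goroutine per
// index. The slices are disjoint, so the only synchronization needed is
// waiting for all goroutines to finish.
func sortPerIndex(perIndex [][]kv) {
	var wg sync.WaitGroup
	for i := range perIndex {
		wg.Add(1)
		go func(kvs []kv) {
			defer wg.Done()
			sort.Slice(kvs, func(a, b int) bool {
				return bytes.Compare(kvs[a].key, kvs[b].key) < 0
			})
		}(perIndex[i])
	}
	wg.Wait()
}

func main() {
	// Made-up keys for two indexes of one table.
	perIndex := [][]kv{
		{{key: []byte("/t/1/3")}, {key: []byte("/t/1/1")}, {key: []byte("/t/1/2")}},
		{{key: []byte("/t/2/b")}, {key: []byte("/t/2/a")}},
	}
	sortPerIndex(perIndex)
	for i, kvs := range perIndex {
		for _, e := range kvs {
			fmt.Printf("index %d: %s\n", i, e.key)
		}
	}
}
```

One goroutine per index only helps when a table has several indexes and each slice is large enough to outweigh the scheduling overhead; for small batches a serial sort may well be faster, so this would need profiling before adoption.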

[Image: Screen Shot 2022-10-26 at 8 03 51 PM]

Jira issue: CRDB-20919

Epic CRDB-18892

Labels: C-investigation (Further steps needed to qualify. C-label will change.), T-storage (Storage Team)

Project status: Backlog
