
sql: large latency spikes when creating a large storing index while running tpcc #45888

@rohany


Investigation into #44501 and #44504 has uncovered that the core of the problem seems to be creating the index that stores all the columns.

To reproduce:

roachprod create $CLUSTER -n 4 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
roachprod stage $CLUSTER:1-3 cockroach
roachprod stage $CLUSTER:4 workload
roachprod start $CLUSTER:1-3
roachprod adminurl --open $CLUSTER:1
roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=2500 --db=tpcc --checks=false"
roachprod run $CLUSTER:4 "./workload run tpcc --ramp=5m --warehouses=2500 --active-warehouses=2000 --split --scatter {pgurl:1-3}"

After the ramp period, run the following in another shell:

roachprod sql $CLUSTER:3
> use tpcc;
> create unique index on customer (c_w_id, c_d_id, c_id) storing (c_first, c_middle, c_last, c_street_1, c_street_2, c_city, c_state, c_zip, c_phone, c_since, c_credit, c_credit_lim, c_discount, c_balance, c_ytd_payment, c_payment_cnt, c_delivery_cnt, c_data);

After some time, large p99 latency spikes show up in the running tpcc workload, sometimes reaching multiple seconds.
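
To make "spike" concrete when comparing runs, something like the following sketch could be used to flag time windows whose p99 exceeds a threshold. This is not part of the original repro: the input format (pairs of elapsed seconds and latency in milliseconds), the function names, the 10-second window, and the 1000 ms threshold are all assumptions chosen for illustration, and the workload's own output would need to be converted into that shape first.

```python
from collections import defaultdict


def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    # 1-based nearest rank, ceil(0.99 * n), computed with integer math.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def spike_windows(points, window_s=10, threshold_ms=1000.0):
    """Return [(window_start_s, p99_ms)] for windows whose p99 exceeds the threshold.

    `points` is a hypothetical list of (elapsed_s, latency_ms) samples.
    """
    buckets = defaultdict(list)
    for elapsed_s, latency_ms in points:
        start = int(elapsed_s // window_s) * window_s
        buckets[start].append(latency_ms)
    spikes = []
    for start, latencies in sorted(buckets.items()):
        value = p99(latencies)
        if value > threshold_ms:
            spikes.append((start, value))
    return spikes
```

For example, `spike_windows([(0, 5.0), (1, 6.0), (12, 2500.0), (13, 7.0)])` reports only the window starting at 10 seconds, since that is the only one containing a multi-second sample.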

Epic CRDB-8816

Jira issue: CRDB-5120

    Labels

    C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
    S-3-ux-surprise: Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.
    T-disaster-recovery
    X-stale
    no-issue-activity
