kv,bulkio: throttle per-store column/index backfill requests #82556

@irfansharif

Description

Is your feature request related to a problem? Please describe.

Using this issue to track the general case of index/column backfill induced performance impact.

In support escalations (https://github.com/cockroachlabs/support/issues/1628) we've observed that column backfills for a large table were able to consume all available disk write bandwidth on stores (it caps out at 150 MB/s in the graph below, which is what the store was provisioned with), resulting in starvation for foreground requests on those stores. The bandwidth saturation led to log commit p99s on the order of seconds (see graph below).

image

image

In internal experimentation (#admission-control) we've also observed throughput/latency effects due to aggressive follower write activity.

Describe the solution you'd like

Disburse byte-sized IO tokens over time to requests serving these large, long-running bulk operations, controlling how much bandwidth background operations use and ensuring foreground traffic has available capacity. Or, shorter term, something simpler (and backportable) that aims for a bandwidth target and paces incoming batch requests accordingly. Or perhaps introduce simpler client-side knobs to control the rate at which we issue these requests to KV.

Additional context

Relates broadly to #75066 + #79092. Unclear if addressed by #82440, need a repro.

Jira issue: CRDB-16542

Metadata

Labels

A-admission-control
A-kv: Anything in KV that doesn't belong in a more specific category.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
N-followup: Needs followup.
O-postmortem: Originated from a Postmortem action item.
sync-me
