kv,bulkio: throttle per-store column/index backfill requests #82556
Description
Is your feature request related to a problem? Please describe.
Using this issue to track the general case of index/column backfill induced performance impact.
In support escalations (https://github.com/cockroachlabs/support/issues/1628) we've observed that column backfills for a large table were able to consume all available disk write bandwidth on stores (it caps out at 150 MB/s in the graph below, which is what the store was provisioned with), resulting in starvation for foreground requests on those stores. The bandwidth saturation led to log commit p99s on the order of seconds (see graph below).
In internal experimentation (#admission-control) we've also observed throughput/latency effects due to aggressive follower write activity.
Describe the solution you'd like
Disburse byte-sized IO tokens over time to requests serving these large, long-running bulk operations, controlling how much bandwidth background operations may use and ensuring foreground traffic has available capacity. Alternatively, do something simpler (and backportable) in the shorter term that aims for a bandwidth target and paces incoming batch requests accordingly. Or perhaps introduce simpler client-side knobs to control the rate at which we issue these requests to KV.
Additional context
Relates broadly to #75066 and #79092. Unclear whether this is addressed by #82440; we need a repro.
Jira issue: CRDB-16542

