kv, storage: rebalance replicas when disk throughput / IOPS drops #62168
Description
We occasionally see instances of large production clusters where one node inexplicably got a slower disk (often an AWS/GCP local SSD), and the replicas on that node kept falling further and further behind in writes relative to the rest of the cluster. Since storage-level compactions also consume disk write throughput, the most obvious symptom is often compactions backing up and Pebble read amplification increasing.
When a node is disproportionately slower at committing to disk than other nodes, replicas on that node need to be rebalanced away so that its disk doesn't continue to be overloaded with writes.
One metric that can be observed to identify disk slowness is command commit latency; since a LogData will have to wait for batches ahead of it to be written to the WAL, an increase in the latency of a LogData call would signal a slow disk. We already leverage LogData as part of node liveness heartbeats; before a node responds to a heartbeat request, it does a LogData to each store's engine. Here's the associated comment from liveness.go, which suggests that we already move leases when this latency increases:
```go
// We synchronously write to all disks before updating liveness because we
// don't want any excessively slow disks to prevent leases from being
// shifted to other nodes. A slow/stalled disk would block here and cause
// the node to lose its leases.
```
Other candidate metrics to react to include changes in a node's disk write ops over time, or cross-node differences in disk write ops; the production instances of this issue that we've observed tend to show significantly lower disk write ops on the affected node than on the other nodes (in what is still an IO-bound workload).
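The cross-node comparison could look something like the sketch below: flag any node whose disk write-ops rate falls well below the cluster mean. The function name, the input shape, and the cutoff fraction are all hypothetical, chosen only to illustrate the heuristic.

```go
package main

import "fmt"

// flagSlowNodes returns the IDs of nodes whose disk write-ops rate is
// below `fraction` of the mean rate across all nodes. Purely an
// illustrative heuristic for the cross-node comparison; not CRDB code.
func flagSlowNodes(writeOps map[string]float64, fraction float64) []string {
	if len(writeOps) == 0 {
		return nil
	}
	var sum float64
	for _, v := range writeOps {
		sum += v
	}
	mean := sum / float64(len(writeOps))

	var slow []string
	for id, v := range writeOps {
		if v < fraction*mean {
			slow = append(slow, id)
		}
	}
	return slow
}

func main() {
	// Under an IO-bound workload, n3's write-ops rate lags the others,
	// suggesting its disk is the bottleneck.
	ops := map[string]float64{"n1": 900, "n2": 950, "n3": 300, "n4": 880}
	fmt.Println(flagSlowNodes(ops, 0.5)) // [n3]
}
```

A relative cutoff against the cluster mean (rather than an absolute rate) matters here: under an IO-bound workload every node's write ops are high, so only the ratio between nodes identifies the outlier.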
gz#9005
Jira issue: CRDB-2831