
perf: command commit latency is highly correlated with range count #30213

@nvb

Description


On a cluster running TPC-C for a few days, I've noticed that the p99 command commit latency and the p99 log commit latency are both slowly growing. This growth seems to be highly correlated with the range count in the cluster.

[Two screenshots, 2018-09-13: custom charts from the CockroachDB debug console]

Interestingly, TPC-C generates a fixed amount of load, so the range count appears to be the only variable that is changing. With more ranges and the same load, fewer writes land in any given Raft group, which means less batching of RocksDB writes within each group. However, our RocksDB commit pipeline attempts to transparently batch independent writes together, which should help avoid this kind of issue:

```go
var leader bool
c.pending, c.groupSize, leader = makeBatchGroup(c.pending, r, c.groupSize, maxBatchGroupSize)
```
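A minimal sketch of the leader/follower grouping that a call like the one above sets up, assuming batches accumulate into a group until a byte limit is reached and that a nil entry in the pending list separates one group from the next. The `batch` type, `addToGroup`, and `splitGroups` are hypothetical stand-ins, not CockroachDB's actual `makeBatchGroup`; in such a design the leader of each group would merge and commit the whole group with a single RocksDB write.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for a RocksDB write batch; repr holds
// its serialized mutations.
type batch struct {
	repr []byte
}

// addToGroup appends b to pending and reports whether b leads its group.
// A nil entry marks a group boundary. groupSize is the byte size of the
// group currently being formed; once adding b would exceed maxSize, the
// current group is closed and b becomes the leader of a new one.
func addToGroup(pending []*batch, b *batch, groupSize, maxSize int) ([]*batch, int, bool) {
	n := len(b.repr)
	leader := len(pending) == 0
	switch {
	case leader:
		groupSize = n
	case groupSize+n > maxSize:
		pending = append(pending, nil) // close the current group
		leader = true
		groupSize = n
	default:
		groupSize += n // b joins the current group as a follower
	}
	return append(pending, b), groupSize, leader
}

// splitGroups splits the pending list at its nil boundaries.
func splitGroups(pending []*batch) [][]*batch {
	var out [][]*batch
	var cur []*batch
	for _, b := range pending {
		if b == nil {
			out = append(out, cur)
			cur = nil
			continue
		}
		cur = append(cur, b)
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	var pending []*batch
	size := 0
	var leaders []bool
	// Five 4-byte batches against a 10-byte group limit.
	for i := 0; i < 5; i++ {
		var leader bool
		pending, size, leader = addToGroup(pending, &batch{repr: make([]byte, 4)}, size, 10)
		leaders = append(leaders, leader)
	}
	fmt.Println("leaders:", leaders)
	for _, g := range splitGroups(pending) {
		fmt.Println("group of", len(g))
	}
}
```

With these inputs the sketch forms groups of 2, 2, and 1 batches, and only the first batch of each group is a leader. Followers never write on their own, so any time the leader spends merging them sits on the critical path of every batch in the group.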

I'd like to instrument this pipeline and look for inefficiencies in it. Specifically, I'd like to check whether the pipeline stays full as the number of batches it tries to merge together grows. For instance, it may be the case that merging the write batches begins to take longer than the RocksDB writes themselves. That would leave gaps in the pipeline during which the RocksDB syncLoop sits idle.

Labels: A-storage, C-performance, X-stale
