storage: batch Raft entry application across ready struct #37426

@nvb

Old prototype: #15648.

The Raft proposal pipeline currently applies Raft entries one-by-one to our storage engine in handleRaftReadyRaftMuLocked. We've seen time and time again that batching writes to RocksDB provides a sizable speedup, so this is a perfect candidate for batching. This will come with three concrete wins:

  1. batching the writes reduces the total amount of work the system has to perform
  2. batching the writes reduces the number of cgo calls performed by Raft processing
  3. batching the writes should speed up a single iteration of handleRaftReadyRaftMuLocked, allowing the Range to schedule and run the next iteration sooner

Put together, this should be a nice win for the amount of concurrency that the Raft proposal pipeline can support.

Implementation notes

To make this change we will need to transpose the order of operations in the committed entries loop and in processRaftCommand. Instead of the control flow looking like:

for each entry:
    check if failed
    if not:
        apply to engine
    ack client, release latches

it will look something like:

for each entry:
    check if failed
    if not:
        apply to batch
commit batch
for each entry:
    ack client, release latches

Care will be needed around handling in-memory side effects of Raft entries correctly.
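
As a rough illustration of the transposed control flow, here is a minimal Go sketch. It does not use the real storage package: the `entry`, `engine`, and `batch` types and the `applyCommitted` function are hypothetical stand-ins invented for this example, and the `failed` flag stands in for whatever the "check if failed" step decides. The point it shows is that every non-failed entry is staged in one batch, the batch is committed once, and only then are clients acknowledged.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a committed Raft entry.
type entry struct {
	index  uint64
	key    string
	value  string
	failed bool // outcome of the "check if failed" step
}

// engine is a hypothetical storage engine that counts how many
// commits it receives, standing in for expensive engine writes.
type engine struct {
	data    map[string]string
	commits int
}

// batch buffers writes; nothing reaches the engine until commit.
type batch struct {
	eng    *engine
	writes []entry
}

func (e *engine) newBatch() *batch { return &batch{eng: e} }

func (b *batch) put(ent entry) { b.writes = append(b.writes, ent) }

// commit flushes all staged writes to the engine in a single call.
func (b *batch) commit() {
	for _, w := range b.writes {
		b.eng.data[w.key] = w.value
	}
	b.eng.commits++
	b.writes = nil
}

// applyCommitted stages every non-failed entry in one batch, commits
// the batch once, and only afterwards produces an acknowledgement per
// entry. Acks are returned in entry order.
func applyCommitted(eng *engine, ents []entry) []string {
	b := eng.newBatch()
	for _, ent := range ents {
		if ent.failed {
			continue // failed entries are not applied, but are still acked
		}
		b.put(ent)
	}
	b.commit()

	// Second pass: ack clients (and, in the real system, release latches)
	// only after the batch is durable in the engine.
	acks := make([]string, 0, len(ents))
	for _, ent := range ents {
		status := "applied"
		if ent.failed {
			status = "rejected"
		}
		acks = append(acks, fmt.Sprintf("%d:%s", ent.index, status))
	}
	return acks
}

func main() {
	eng := &engine{data: map[string]string{}}
	acks := applyCommitted(eng, []entry{
		{index: 1, key: "a", value: "1"},
		{index: 2, key: "b", value: "2", failed: true},
		{index: 3, key: "c", value: "3"},
	})
	fmt.Println(acks, "commits:", eng.commits)
}
```

In this sketch, three entries cost one engine commit rather than two separate writes, and no client is acknowledged before the commit. Any in-memory side effects of an entry would presumably also belong in the second pass, but that ordering is an assumption of the sketch rather than something the issue specifies.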

Predicted win

The improvement on single-range throughput as predicted by rafttoy is:

name                        old time/op    new time/op    delta
Raft/conc=256/bytes=256-16    15.6µs ± 5%    14.6µs ±15%  -6.87%  (p=0.046 n=16+20)

name                        old speed      new speed      delta
Raft/conc=256/bytes=256-16  16.4MB/s ± 5%  17.7MB/s ±17%  +8.30%  (p=0.043 n=16+20)

It's worth noting that rafttoy is using pebble instead of RocksDB. Writing to pebble doesn't incur a cgo call, which means that batching writes isn't quite as critical there. It's therefore reasonable to expect that the improvement to throughput in Cockroach will be even greater than this prediction.

Labels

A-kv-replication: Relating to Raft, consensus, and coordination.
C-performance: Perf of queries or internals. Solution not expected to change functional behavior.