perf: quantify space/write amplification #18659

@petermattis


Forked from #18657:

The disk io though is quite worrying at around 26MiB writes/s and 8% CPU spent in iowait as indicated by dstat. The data being updated is very small (one integer). Granted, CockroachDB keeps all past values so let's assume each update is like an insert. The string has 3 bytes plus 4 byte integer plus overhead for metadata and encoding. Let's assume a generous 64 bytes per entry. At 2500 qps, that would be around 256KiB/s. LSM storage engines have write amplification. Not sure how many levels were generated in this test but I'd assume not too many. So let's assume each row is actually written 4 times as time goes by. That's 1MiB/s. Still off by a factor of 26. Not sure where all this disk io comes from but it seems excessive.

Unless you were running for a long period of time, the LSM write amplification should be a non-factor. Note that every write involves a 4x write amplification outside of the LSM due to writing the Raft log and then subsequently committing the command. Both the Raft log write and the committing of the command involve a 2x write amplification due to RocksDB writing to its internal WAL and then subsequently flushing the memtable. There is some possibility this will be reduced in the future. See #16948.

But that write amplification still doesn't explain the 26MiB/sec that you're seeing. The 64 bytes per entry might be a bit optimistic given some of the other Raft and internal state that is written on each write. I'm going to file an issue to investigate in detail where the space overhead for each write is coming from.
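As a rough sanity check, the arithmetic above can be redone in a few lines of Python. This is only a sketch: the 64 bytes per entry, 2500 qps, and 26MiB/s figures come from the quoted report, the 4x factor is the Raft log write plus the command commit, each paying for a RocksDB WAL write and a memtable flush, and all constant and function names are made up for illustration. Taken at face value, 64 bytes at 2500 qps is about 156 KiB/s of logical writes (somewhat below the 256KiB/s quoted), or about 625 KiB/s after the 4x factor, while 26MiB/s works out to roughly 10.6 KiB written per statement.

```python
# Back-of-envelope estimate using only the figures quoted in this issue.
# All names here are illustrative; nothing is measured by this script.

ENTRY_BYTES = 64              # assumed bytes per entry (from the quoted estimate)
QPS = 2500                    # statements per second (from the quoted report)
OBSERVED_MIB_PER_SEC = 26     # disk writes per second reported by dstat

RAFT_STEPS = 2                # Raft log write, then committing the command
ROCKSDB_WRITES_PER_STEP = 2   # RocksDB WAL write, then memtable flush

KIB = 1024
MIB = 1024 * KIB


def estimate():
    """Return the expected and observed write rates in bytes per second."""
    logical = ENTRY_BYTES * QPS
    amplification = RAFT_STEPS * ROCKSDB_WRITES_PER_STEP
    expected = logical * amplification
    observed = OBSERVED_MIB_PER_SEC * MIB
    return {
        "logical_bytes_per_sec": logical,
        "amplification": amplification,
        "expected_bytes_per_sec": expected,
        "observed_bytes_per_sec": observed,
        "observed_bytes_per_statement": observed / QPS,
        "unexplained_factor": observed / expected,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```

Whatever per-entry size is assumed, the per-statement figure is the useful one to compare against the actual encoded sizes of what gets written on each write.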

The schema in question is:

CREATE TABLE bench (s string primary key, n int);

And the inserts have the form:

INSERT INTO bench (s, n) VALUES($1, 1) ON CONFLICT (s) DO UPDATE SET n = bench.n + 1

The load generator script generates random 3-letter keys from upper-case characters, giving a total of 26^3 = 17576 possible rows.
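The load generator script itself is not reproduced here. A minimal sketch of the key generation it is described as doing, with hypothetical names and assuming uniformly random keys, might look like this:

```python
import random
import string

KEY_LEN = 3                        # 3-letter keys, as described above
ALPHABET = string.ascii_uppercase  # 26 upper-case characters


def key_space_size():
    """Number of distinct keys, i.e. the maximum number of rows in bench."""
    return len(ALPHABET) ** KEY_LEN


def random_key(rng):
    """Draw one key uniformly at random (an assumption about the generator)."""
    return "".join(rng.choice(ALPHABET) for _ in range(KEY_LEN))


if __name__ == "__main__":
    rng = random.Random()
    print("possible rows:", key_space_size())
    print("sample key:", random_key(rng))
```

With such a small key space, nearly every statement quickly becomes an update of an existing row rather than a fresh insert.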

This issue is about understanding where the space and write amplification are coming from. Some places to investigate are the overhead of RaftCommand, raftpb.HardState and MVCCStats, which are all written on every write.

Labels

C-performance: Perf of queries or internals. Solution not expected to change functional behavior.
