
perf: separate latest and old MVCC versions for faster reads #1170

@sumeerbhola

This issue differs from #847 in that it is specifically about how CockroachDB uses Pebble to store MVCC data. When a key has many MVCC versions, reads slow down because (a) the block cache is less effective, since most of each block holds versions that the read does not need, and (b) a scan must seek from one versioned key to the next to avoid iterating over all the versions.

A prerequisite for doing anything here is that Pebble gains some understanding of MVCC timestamps. Compaction-time GC (cockroachdb/cockroach#57260) has the same prerequisite. One can also argue that compaction-time GC may make separating the latest and old MVCC versions less important.

If we ignore provisional values, a rough approach would be to put "live" and "dead" versions of data into separate files. This should not be hard when the multiple versions are in the same sstable: when generating that sstable, instead of writing one sstable we would write a pair, the "live" sstable and the "dead" sstable. The two will have overlapping keys and will participate in compactions as a pair. A read at the "current time" can ignore the "dead" sstable. But what is "current time", given that we don't have TrueTime like Spanner? One possibility is to track the newest timestamp in the "live" sstable: if the txn timestamp is newer than this newest timestamp, the read cannot need anything from the "dead" sstable. This should work for sstables in the lower levels, which hold most of the data.
The above approach will need adjustment to handle provisional values and intents.
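
To make the idea concrete, here is a minimal Go sketch of the split and of the read-time check. It is not Pebble code: `Version`, `SSTablePair`, `SplitVersions`, `CanSkipDead`, and `Get` are hypothetical names, sstables are modeled as in-memory slices, timestamps are plain integers, and provisional values, intents, and deletion tombstones are ignored.

```go
package main

// Version is one MVCC version of a user key (hypothetical type).
// A larger Timestamp means a newer version.
type Version struct {
	Key       string
	Timestamp uint64
	Value     string
}

// SSTablePair is a hypothetical stand-in for the paired "live" and
// "dead" sstables that would be written instead of a single sstable.
type SSTablePair struct {
	Live         []Version // newest version of each key
	Dead         []Version // all older versions
	NewestLiveTS uint64    // newest timestamp found in Live
}

// SplitVersions expects input sorted by Key ascending and, within a key,
// by Timestamp descending, so the first entry for a key is its newest.
func SplitVersions(sorted []Version) SSTablePair {
	var p SSTablePair
	for i, v := range sorted {
		if i == 0 || sorted[i-1].Key != v.Key {
			p.Live = append(p.Live, v)
			if v.Timestamp > p.NewestLiveTS {
				p.NewestLiveTS = v.Timestamp
			}
		} else {
			p.Dead = append(p.Dead, v)
		}
	}
	return p
}

// CanSkipDead reports whether a read at readTS can ignore the "dead"
// sstable: every live version is older than the read, so the live
// version of each key is the one the read must see.
func (p SSTablePair) CanSkipDead(readTS uint64) bool {
	return readTS > p.NewestLiveTS
}

// newestVisible returns the newest version of key with Timestamp <= readTS.
// A linear scan keeps the sketch short; a real sstable would seek.
func newestVisible(vs []Version, key string, readTS uint64) (Version, bool) {
	for _, v := range vs {
		if v.Key == key && v.Timestamp <= readTS {
			return v, true
		}
	}
	return Version{}, false
}

// Get reads key at readTS, consulting Dead only when it might be needed.
func (p SSTablePair) Get(key string, readTS uint64) (Version, bool) {
	if v, ok := newestVisible(p.Live, key, readTS); ok {
		return v, true
	}
	if p.CanSkipDead(readTS) {
		return Version{}, false // key is absent from this pair
	}
	return newestVisible(p.Dead, key, readTS)
}
```

In this sketch a read whose timestamp is newer than `NewestLiveTS` touches only `Live`, while a read at an older timestamp falls back to `Dead` only for keys whose live version is too new to be visible.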
