perf: separate latest and old MVCC versions for faster reads #1170

This issue is different from https://github.com/cockroachdb/pebble/issues/847 because it is specifically about how CockroachDB uses Pebble to store MVCC data. When a key has many MVCC versions, reads slow down because (a) the block cache is less effective, since most of each cached block holds older versions that the read does not need, and (b) a scan must either step over every version of a key or seek from one user key to the next to skip the older versions.
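To make point (b) concrete, here is a minimal Go sketch, assuming a hypothetical, simplified entry type (a string key plus a `uint64` timestamp) sorted by key ascending and then timestamp descending. It is not Pebble's iterator API: a slice stands in for an sstable and `sort.Search` stands in for an iterator seek. Stepping touches every version of every key, while seeking does one seek per user key, each with its own cost.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical, simplified MVCC entry. Entries are sorted by Key
// ascending, then TS descending, so the newest version of a key comes first.
type entry struct {
	Key string
	TS  uint64
}

// latestByStepping collects the newest version of every key by stepping over
// each entry, and reports how many entries were visited.
func latestByStepping(es []entry) (keys []string, visited int) {
	for i, e := range es {
		visited++
		if i == 0 || es[i-1].Key != e.Key {
			keys = append(keys, e.Key)
		}
	}
	return keys, visited
}

// latestBySeeking jumps from the first version of each key directly to the
// first entry of the next key. The binary search stands in for an iterator seek.
func latestBySeeking(es []entry) (keys []string, seeks int) {
	for i := 0; i < len(es); {
		cur := es[i].Key
		keys = append(keys, cur)
		start := i
		// Smallest offset n such that es[start+n].Key > cur (or len if none).
		n := sort.Search(len(es)-start, func(j int) bool { return es[start+j].Key > cur })
		i = start + n
		seeks++
	}
	return keys, seeks
}

func main() {
	var es []entry
	for _, k := range []string{"a", "b", "c"} {
		for ts := uint64(100); ts > 0; ts-- { // 100 versions per key
			es = append(es, entry{k, ts})
		}
	}
	k1, visited := latestByStepping(es)
	k2, seeks := latestBySeeking(es)
	fmt.Println(k1, visited) // [a b c] 300
	fmt.Println(k2, seeks)   // [a b c] 3
}
```

Either way, the old versions still sit in the same blocks as the latest ones, which is why neither strategy fixes point (a).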

A prerequisite for doing anything here is for Pebble to gain some understanding of MVCC timestamps. The same prerequisite applies to compaction-time GC (https://github.com/cockroachdb/cockroach/issues/57260). One could also argue that compaction-time GC may make separating the latest and old MVCC versions less important.
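As a rough illustration of what "understanding MVCC timestamps" could mean at the storage layer, here is a Go sketch using a hypothetical key encoding: the user key followed by a fixed 8-byte big-endian timestamp. This is not CockroachDB's real MVCC encoding, which differs. Pebble's `Comparer` does have a `Split` hook for locating the user-key prefix; the sketch goes one step further and decodes the suffix as a timestamp, so that a per-sstable "newest timestamp" property could be computed at write time.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// tsLen is the size of the timestamp suffix in this hypothetical encoding.
// CockroachDB's actual MVCC key encoding is different; this is illustrative only.
const tsLen = 8

// encodeKey appends a big-endian timestamp suffix to a user key.
func encodeKey(userKey []byte, ts uint64) []byte {
	k := make([]byte, len(userKey)+tsLen)
	copy(k, userKey)
	binary.BigEndian.PutUint64(k[len(userKey):], ts)
	return k
}

// split returns the length of the user-key prefix, in the spirit of a
// comparer's Split function.
func split(key []byte) int {
	if len(key) < tsLen {
		return len(key)
	}
	return len(key) - tsLen
}

// decodeTimestamp extracts the timestamp from the key's suffix.
func decodeTimestamp(key []byte) (uint64, error) {
	if len(key) < tsLen {
		return 0, errors.New("key too short for timestamp suffix")
	}
	return binary.BigEndian.Uint64(key[split(key):]), nil
}

// newestTimestamp computes the newest timestamp among a set of keys, the kind
// of property that could be recorded for an sstable when it is written.
func newestTimestamp(keys [][]byte) (uint64, error) {
	var newest uint64
	for _, k := range keys {
		ts, err := decodeTimestamp(k)
		if err != nil {
			return 0, err
		}
		if ts > newest {
			newest = ts
		}
	}
	return newest, nil
}

func main() {
	keys := [][]byte{encodeKey([]byte("a"), 30), encodeKey([]byte("b"), 25)}
	fmt.Println(string(keys[0][:split(keys[0])])) // a
	newest, err := newestTimestamp(keys)
	fmt.Println(newest, err) // 30 <nil>
}
```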

If we ignore provisional values, a rough approach would be to put “live” versions (the latest version of each key) and “dead” versions (older versions) into separate files. This should not be hard when the multiple versions are in the same sstable: when generating that sstable, instead of writing one sstable we would write a pair, a “live” sstable and a “dead” sstable. The two will have overlapping keys and will participate in compactions as a pair. A read at the “current time” can ignore the “dead” sstable. But what is the “current time”, given that we don’t have TrueTime like Spanner? One possibility is to track the newest timestamp in the “live” sstable: if the txn timestamp is newer than this newest timestamp, there is no possibility that the read will need anything from the “dead” sstable. This should work for sstables in lower levels, which hold most of the data.
The above approach will need adjustment to handle provisional values and intents.
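The split and the read-time check described above can be sketched in Go as follows. This is a minimal in-memory model, not Pebble code: `version`, `sstPair`, `splitVersions`, `needsDead`, and `get` are hypothetical names, slices stand in for sstables, and provisional values and intents are ignored, as in the text. The newest version of each key goes to the live half, older versions go to the dead half, and the newest live timestamp decides whether a read can skip the dead half.

```go
package main

import "fmt"

// version is a hypothetical, simplified MVCC entry.
type version struct {
	Key   string
	TS    uint64
	Value string
}

// sstPair models the "live" and "dead" sstables written together in place
// of a single sstable. Both are sorted by Key ascending, then TS descending.
type sstPair struct {
	Live       []version
	Dead       []version
	NewestLive uint64 // newest timestamp found in Live
}

// splitVersions routes the first (newest) version of each key to Live and all
// older versions to Dead. Input must be sorted by Key asc, then TS desc.
func splitVersions(sorted []version) sstPair {
	var p sstPair
	for i, v := range sorted {
		if i == 0 || sorted[i-1].Key != v.Key {
			p.Live = append(p.Live, v)
			if v.TS > p.NewestLive {
				p.NewestLive = v.TS
			}
		} else {
			p.Dead = append(p.Dead, v)
		}
	}
	return p
}

// needsDead reports whether a read at readTS might need the Dead half. If
// readTS is newer than every timestamp in Live, the newest version of every
// key is visible, so Dead can be skipped.
func (p sstPair) needsDead(readTS uint64) bool {
	return readTS <= p.NewestLive
}

// get returns the newest version of key visible at readTS (TS <= readTS).
func (p sstPair) get(key string, readTS uint64) (version, bool) {
	search := func(vs []version) (version, bool) {
		for _, v := range vs { // linear scan for brevity
			if v.Key == key && v.TS <= readTS {
				return v, true
			}
		}
		return version{}, false
	}
	if v, ok := search(p.Live); ok {
		return v, true
	}
	if !p.needsDead(readTS) {
		return version{}, false
	}
	return search(p.Dead)
}

func main() {
	entries := []version{
		{"a", 30, "a@30"}, {"a", 20, "a@20"}, {"a", 10, "a@10"},
		{"b", 25, "b@25"}, {"b", 5, "b@5"},
	}
	p := splitVersions(entries)
	fmt.Println(len(p.Live), len(p.Dead), p.NewestLive) // 2 3 30
	fmt.Println(p.needsDead(40))                         // false
	v, _ := p.get("a", 40)
	fmt.Println(v.Value) // a@30
	v, _ = p.get("a", 15)
	fmt.Println(v.Value) // a@10
}
```

The check is conservative: a read at or below `NewestLive` may still find its answer in the live half, but only reads newer than `NewestLive` are guaranteed never to touch the dead half.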
