
db: benchmark framework for compaction heuristic comparisons #1865

@sumeerbhola

Description


This has come up in past discussions, such as on #1603 and #1746.

We need benchmarks that satisfy the following (non-exhaustive) requirements:

  • Workloads with different distributions of writes in the key space: for example, (a) uniform (probably the most challenging for compactions), (b) hash-sharded sequential writes (which seem common in the CockroachDB context), (c) Zipfian.
  • Large LSMs: we need many levels to be populated, at least L2-L6, for the benchmarks to be representative of large deployments. We also need to be able to run benchmarks over short time intervals, say 30min, which means capturing and storing already-built LSMs corresponding to various workloads (say in S3 or GCS) and using those as starting points for benchmarking. Since these benchmarks will be comparing write amplification, we can live with the performance implications of the starting files staying in blob storage (rather than copying them locally to run experiments). The SharedFS developed by @itsbilal could be useful for this.
  • Pacing the writes so that compactions do not fall behind. One cannot make a fair comparison between two schemes if one fell behind and then caught up at the end while the other kept up. The termination condition of the benchmarks also needs to be consistent -- one option is to stop writes and then wait until all levels have compaction scores < 1.0 and there are no more compactions left to run.

Sub-issues:

TODO(jackson): Add issues for remaining work.
