perf: don't prioritize compaction of pinned range tombstones #872
Description
With the introduction of the min-overlapping ratio heuristic (#707), Pebble started prioritizing the compaction of range tombstones by inflating start-level file sizes by an estimate of the data covered by those tombstones.
Prioritizing the compaction of range tombstones has a few benefits:
- Disk space is reclaimed promptly.
- These compactions suffer less write amplification than their uncompensated input file sizes / overlapping ratio suggest.
- Moving broad tombstones into lower levels allows ingested sstables to be placed in lower levels (see cockroach#44048, "storage: avoid excessively wide range tombstones during Raft snapshot reception").
But if an open snapshot prevents a range tombstone from dropping keys, the first two benefits do not apply. Additionally, if the output level is L6, these compactions may have the negative effect of cementing tombstones into the bottommost level (#517 (comment)), where they're only cleared by low-priority elision-only compactions.
We might want to improve the prioritization of elision-only compactions. However, since these compactions add write amplification that could otherwise have been avoided, maybe we should avoid prioritizing compactions that are unlikely to reclaim disk space. This could be done by using uncompensated file sizes during compaction picking under some conditions, such as when a start-level file's largest sequence number does not fall in the last snapshot stripe.
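To make the proposal concrete, here is a rough sketch of the size selection, not Pebble's actual picker code. The `fileMeta` type, its field names, and the `pickingSize` and `inLastSnapshotStripe` helpers are all hypothetical. It also assumes that a sequence number is in the last snapshot stripe when it is at or above every open snapshot's sequence number; the real boundary convention may differ.

```go
package main

import "fmt"

// fileMeta is a hypothetical, simplified stand-in for sstable metadata.
type fileMeta struct {
	Size                        uint64 // on-disk size in bytes
	RangeDeletionsBytesEstimate uint64 // estimated bytes covered by the file's range tombstones
	LargestSeqNum               uint64 // largest sequence number in the file
}

// inLastSnapshotStripe reports whether seqNum is at or above every open
// snapshot's sequence number. Assumption: this is the stripe boundary; with
// no open snapshots every sequence number is in the last stripe.
func inLastSnapshotStripe(seqNum uint64, snapshots []uint64) bool {
	for _, s := range snapshots {
		if seqNum < s {
			return false
		}
	}
	return true
}

// pickingSize returns the size to use for a start-level file during
// compaction picking. Following the condition suggested above, the
// compensated size is used only when the file's largest sequence number
// falls in the last snapshot stripe; otherwise the raw size is used.
func pickingSize(f fileMeta, snapshots []uint64) uint64 {
	if inLastSnapshotStripe(f.LargestSeqNum, snapshots) {
		return f.Size + f.RangeDeletionsBytesEstimate
	}
	return f.Size
}

func main() {
	f := fileMeta{Size: 100, RangeDeletionsBytesEstimate: 900, LargestSeqNum: 50}
	fmt.Println(pickingSize(f, nil))              // no open snapshots
	fmt.Println(pickingSize(f, []uint64{60}))     // a snapshot above the file's largest seqnum
	fmt.Println(pickingSize(f, []uint64{10, 40})) // all snapshots below the file's largest seqnum
}
```

Whether the largest sequence number is the right signal, or whether the check should be per tombstone rather than per file, is left open here, as it is in the description above.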