
perf: don't prioritize compaction of pinned range tombstones #872

@jbowens


With the introduction of the min-overlapping ratio heuristic (#707), Pebble started prioritizing the compaction of range tombstones by inflating start-level file sizes by an estimate of the data covered by their range tombstones.
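To make the effect of this size inflation concrete, here is a minimal sketch in Go. It is not Pebble's picker code: the `file` type, its fields, and the function names are hypothetical, and the ratio is assumed to be the overlapping bytes in the output level divided by the compensated start-level file size, with the smallest ratio winning.

```go
package main

import "fmt"

// file is a hypothetical, simplified stand-in for sstable metadata.
type file struct {
	name             string
	size             uint64 // on-disk size in bytes
	rangeDelEstimate uint64 // estimated bytes covered by this file's range tombstones
	overlapBytes     uint64 // bytes of output-level files that overlap this file
}

// compensatedSize inflates the file size by the range tombstone estimate.
func compensatedSize(f file) uint64 {
	return f.size + f.rangeDelEstimate
}

// overlappingRatio divides the output-level overlap by the given start-level size.
func overlappingRatio(f file, size uint64) float64 {
	return float64(f.overlapBytes) / float64(size)
}

// pickMinOverlappingRatio returns the index of the file with the smallest
// ratio computed from compensated sizes, or -1 if files is empty.
func pickMinOverlappingRatio(files []file) int {
	best := -1
	var bestRatio float64
	for i, f := range files {
		r := overlappingRatio(f, compensatedSize(f))
		if best == -1 || r < bestRatio {
			best, bestRatio = i, r
		}
	}
	return best
}

func main() {
	files := []file{
		{name: "a", size: 100, rangeDelEstimate: 0, overlapBytes: 200},
		{name: "b", size: 100, rangeDelEstimate: 900, overlapBytes: 400},
	}
	// Uncompensated, "b" has ratio 4.0 and loses to "a" (2.0).
	// Compensated, "b" has ratio 0.4 and is picked first.
	fmt.Println(files[pickMinOverlappingRatio(files)].name)
}
```

In this toy input, the file carrying a wide range tombstone jumps ahead of a file with less overlap purely because of the compensation term.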

Prioritizing the compaction of range tombstones has a few benefits:

  1. Disk space is reclaimed promptly.
  2. These compactions suffer less write amplification than their uncompensated input file sizes and overlapping ratios suggest.
  3. Moving broad tombstones into lower levels allows sstables to be ingested into lower levels (see "storage: avoid excessively wide range tombstones during Raft snapshot reception", cockroach#44048).

But if an open snapshot prevents a range tombstone from dropping keys, the first two benefits do not apply. Additionally, if the output level is L6, these compactions may have the negative effect of cementing tombstones into the bottommost level (#517 (comment)), where they are only cleared by low-priority elision-only compactions.

We might want to improve the prioritization of elision-only compactions. However, since these compactions add write amplification that could otherwise have been avoided, maybe we should avoid prioritizing compactions that are unlikely to reclaim disk space. This could be done by using uncompensated file sizes during compaction picking under some conditions, such as when a start-level file's largest sequence number does not fall in the last snapshot stripe.
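A rough sketch of what such a condition could look like follows. Everything here is hypothetical: `fileMeta`, `inLastSnapshotStripe`, and `pickingSize` are illustrative names, not Pebble APIs. The sketch also assumes that the "last snapshot stripe" means sequence numbers below the earliest open snapshot, and that `snapshots` holds open snapshot sequence numbers sorted in ascending order.

```go
package main

import "fmt"

// fileMeta is a hypothetical, simplified stand-in for sstable metadata.
type fileMeta struct {
	size             uint64 // on-disk size in bytes
	rangeDelEstimate uint64 // estimated bytes covered by range tombstones
	largestSeqNum    uint64 // largest sequence number of any key in the file
}

// inLastSnapshotStripe reports whether largestSeqNum lies below every open
// snapshot. Assumption: snapshots is sorted ascending, so snapshots[0] is
// the earliest open snapshot. With no open snapshots, every sequence
// number is treated as being in the last stripe.
func inLastSnapshotStripe(largestSeqNum uint64, snapshots []uint64) bool {
	if len(snapshots) == 0 {
		return true
	}
	return largestSeqNum < snapshots[0]
}

// pickingSize returns the size to use when scoring a start-level file.
// The range tombstone compensation is applied only when no open snapshot
// could pin the data that the file's tombstones cover.
func pickingSize(f fileMeta, snapshots []uint64) uint64 {
	if inLastSnapshotStripe(f.largestSeqNum, snapshots) {
		return f.size + f.rangeDelEstimate
	}
	return f.size
}

func main() {
	f := fileMeta{size: 100, rangeDelEstimate: 900, largestSeqNum: 50}
	fmt.Println(pickingSize(f, nil))          // no snapshots: compensated
	fmt.Println(pickingSize(f, []uint64{60})) // file entirely below the snapshot: compensated
	fmt.Println(pickingSize(f, []uint64{40})) // snapshot at or below the file's keys: uncompensated
}
```

This is deliberately conservative: a single open snapshot at or below the file's largest sequence number disables compensation for the whole file, even if some of its tombstones could still drop data. A finer-grained check would need per-tombstone sequence numbers, which this sketch does not model.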
