perf: split L6 data into separate obsolete, live files #847

@jbowens

In discussing compactions to remove obsolete keys from L6, Peter had an idea:

> I wonder if we could arrange for any key written to a bottommost level that is pinned by a snapshot to be written to a separate sstable. That is, L6 would be separated into L6-live and L6-obsolete where L6-obsolete contains records that are pinned by a snapshot. With this setup, we simply have to wait until snapshots are released and then can perform a compaction which deletes the L6-obsolete sstable.

L6 files may contain obsolete records that must be preserved because they're pinned by an open snapshot. When the pinning snapshot eventually closes, we want to reclaim the disk space occupied by these obsolete records (#838), but L6 tables are large and expensive to compact. If the obsolete keys were split out into a separate file, that file could be dropped cheaply with a simple manifest edit once the pinning snapshots are released. This would avoid the read and write IO of rewriting the live data in a full L6 compaction.
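
To make the split concrete, here is a minimal sketch in Go of how a bottommost compaction might route records. Everything in it (the `record` type and the `visible`, `pinned`, and `split` functions) is hypothetical and is not Pebble's actual API. It assumes a record is visible to a snapshot when its sequence number is strictly less than the snapshot's, and it ignores tombstones, merges, and range keys.

```go
package main

import "sort"

// record is a hypothetical stand-in for a point key and its sequence number.
type record struct {
	key string
	seq uint64
}

// visible reports whether r can be seen by a snapshot taken at snap.
// Assumption for this sketch: visibility means r.seq < snap.
func visible(r record, snap uint64) bool { return r.seq < snap }

// pinned reports whether some open snapshot sees older but not the
// next-newer version of the same key, which forces older to be retained.
func pinned(older, newer record, snapshots []uint64) bool {
	for _, s := range snapshots {
		if visible(older, s) && !visible(newer, s) {
			return true
		}
	}
	return false
}

// split routes the records of a bottommost compaction into two outputs:
// live holds the newest version of each key, obsolete holds shadowed
// versions that an open snapshot still pins. Shadowed versions that no
// snapshot pins are dropped entirely.
func split(recs []record, snapshots []uint64) (live, obsolete []record) {
	// Work on a copy, ordered by key ascending, then seq descending.
	sorted := append([]record(nil), recs...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].key != sorted[j].key {
			return sorted[i].key < sorted[j].key
		}
		return sorted[i].seq > sorted[j].seq
	})
	for i, r := range sorted {
		if i == 0 || sorted[i-1].key != r.key {
			live = append(live, r) // newest version of this key
			continue
		}
		if pinned(r, sorted[i-1], snapshots) {
			obsolete = append(obsolete, r)
		}
	}
	return live, obsolete
}
```

For example, with versions `a@30`, `a@12`, `a@5`, `b@8`, `b@3` and a single open snapshot at sequence number 10, `a@30` and `b@8` go to the live output, `a@5` goes to the obsolete output (the snapshot sees it but not `a@12`), and `a@12` and `b@3` are dropped. Once the snapshot closes, nothing in the obsolete output is pinned any longer, which is what would make dropping the whole file safe.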

The L6-obsolete files form an additional level beneath L6. Because records in the L6-obsolete level are known to be obsolete (unlike ordinary levels, where records only might be obsolete), reads may skip any L6-obsolete file containing only sequence numbers strictly less than the iterator sequence number. Read amplification for reads at recent sequence numbers is unaffected, and range iterators at these recent sequence numbers no longer need to skip over obsolete keys. Read amplification for reads at old sequence numbers increases by 1.

We would need to measure how much obsolete data is written to L6 sstables in practice before deciding whether such a large undertaking is worthwhile.

Jira issue: PEBBLE-215

    Labels

    A-write-amp: potential to reduce write amplification
