Skip to content

perf: skip output level sstables with no data overlap with input level #389

@petermattis

Description

@petermattis

See facebook/rocksdb#6021.

The idea is that if we're doing a compaction from Ln->Ln+1, we can skip sstables in Ln+1 which do contain any keys in Ln. Right now this determination is done solely based on the sstable boundaries, but it is only a little more work to create a levelIter on the input sstables in Ln and seek to the boundary keys in Ln+1 to see if any actual key is contained within the sstables in Ln+1.

Here's an example of where this could be beneficial:

L0: a-----g------------t-----z
L1:   c----h j------q s-----y        

L0 has one sstable with the bounds [a,z] containing the keys a, g, t, and z. A compaction from L0 to L1 would normally involve every sstable in L1 because the bounds [a,z] covers every sstable in L1. But notice that the L1 sstable [j,q] doesn't overlap with any keys in L0.

After selecting the initial set of input sstables in L0 and L1, we could open a levelIter on the sstables in L0 and seek that iterator to the boundary keys of the tables in L1, trimming the L1 inputs of any sstables that do not contain an overlapping key in L0. The compaction logic would have to be changed to ensure that the tables output to L1 are cut so as not to overlap with the excluded sstables in L1.

Cc @sumeerbhola as this overlaps with other thinking we've been doing in this area. I'm not sure how often this situation occurs in practice. We'd want to do some experimentation to determine if it is common or rare.

Jira issue: PEBBLE-180

Epic CRDB-40361

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-write-amppotential to reduce write amplification

    Type

    No type

    Projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions