compaction: Don't split user keys during compactions #734

@itsbilal

Currently, in some cases the compaction loop can split the versions of a single user key across two different sstables, like this:

000157.sst:
a.RANGEDEL.6:f
b.SET.15:foo
b.MERGE.8:bar

000158.sst:
b.RANGEDEL.6:f
b.SET.5:baz
b.DEL.3

This implicitly creates an "atomic compaction group": both of these sstables must be included in compactions together, or deleted keys can reappear after a sequence of compactions (see the comment above expandInputs on how and why this happens).
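
To make the grouping concrete, here is a rough sketch of how adjacent sstables that share a boundary user key could be collected into one group. It is not Pebble's actual logic (that lives around expandInputs and also has to handle range tombstone boundaries); `tableMeta`, `atomicGroups`, and the bounds in `main` are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// tableMeta is a hypothetical, simplified view of an sstable: a file number
// plus its smallest and largest user keys. Real metadata holds internal keys.
type tableMeta struct {
	fileNum  int
	smallest string
	largest  string
}

// atomicGroups takes the tables of one level, sorted by smallest user key,
// and groups adjacent tables where the previous table's largest user key
// equals the next table's smallest user key. Such tables hold versions of
// the same user key and so must be compacted together.
func atomicGroups(tables []tableMeta) [][]tableMeta {
	var groups [][]tableMeta
	for i, t := range tables {
		if i > 0 && tables[i-1].largest == t.smallest {
			// Shared boundary user key: extend the current group.
			last := len(groups) - 1
			groups[last] = append(groups[last], t)
			continue
		}
		// No shared boundary: this table starts a new group.
		groups = append(groups, []tableMeta{t})
	}
	return groups
}

func main() {
	// Assumed bounds: 157 and 158 both contain user key "b".
	tables := []tableMeta{
		{157, "a", "b"},
		{158, "b", "f"},
		{159, "g", "k"},
	}
	for _, g := range atomicGroups(tables) {
		fmt.Print("group:")
		for _, t := range g {
			fmt.Printf(" %06d.sst", t.fileNum)
		}
		fmt.Println()
	}
}
```

With these assumed bounds, 000157.sst and 000158.sst end up in the same group because both contain versions of `b`, while 000159.sst stands alone.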

These consequences of splitting user keys ultimately work against increasing compaction parallelization and reducing compaction sizes. The only reason we split user keys is to match RocksDB's behaviour. We should explore not splitting user keys across different sstables for all compactions (something we already do for flushes as of #675). This should help simplify some compaction logic.
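
As a minimal sketch of what "not splitting user keys" could look like, the loop below only finishes an output table once the size target is reached and the next key has a different user key. It is an illustration under simplifying assumptions, not the actual compaction loop: `internalKey`, `splitOutputs`, and the key-count target are hypothetical, and range tombstones are ignored.

```go
package main

import "fmt"

// internalKey is a simplified stand-in for an internal key:
// a user key, a sequence number, and a kind (SET, MERGE, DEL, ...).
type internalKey struct {
	userKey string
	seqNum  uint64
	kind    string
}

// splitOutputs partitions keys (sorted by user key ascending, then sequence
// number descending) into output tables of roughly targetKeys entries.
// A table is only finished at a user-key boundary, so every version of a
// user key lands in the same table, even if that exceeds the target.
func splitOutputs(keys []internalKey, targetKeys int) [][]internalKey {
	var outputs [][]internalKey
	var cur []internalKey
	for _, k := range keys {
		full := len(cur) >= targetKeys
		newUserKey := len(cur) > 0 && cur[len(cur)-1].userKey != k.userKey
		if full && newUserKey {
			// Safe split point: the previous user key is complete.
			outputs = append(outputs, cur)
			cur = nil
		}
		cur = append(cur, k)
	}
	if len(cur) > 0 {
		outputs = append(outputs, cur)
	}
	return outputs
}

func main() {
	keys := []internalKey{
		{"a", 9, "SET"},
		{"b", 15, "SET"},
		{"b", 8, "MERGE"},
		{"b", 5, "SET"},
		{"b", 3, "DEL"},
		{"c", 7, "SET"},
	}
	for i, out := range splitOutputs(keys, 3) {
		fmt.Printf("output %d:", i)
		for _, k := range out {
			fmt.Printf(" %s.%s.%d", k.userKey, k.kind, k.seqNum)
		}
		fmt.Println()
	}
}
```

The trade-off in this sketch is that an output can overshoot its target when one user key has many versions, in exchange for never producing two tables that must be compacted together.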
