compaction: Don't split user keys during compactions #734
Description
Currently, there are some cases in the compaction loop where the entries for a single user key can be output to two different sstables, like this:
000157.sst:
a.RANGEDEL.6:f
b.SET.15:foo
b.MERGE.8:bar
000158.sst:
b.RANGEDEL.6:f
b.SET.5:baz
b.DEL.3
This results in the implicit creation of an "atomic compaction group": both of these sstables must be present together in any compaction that includes either of them, or it's possible for deleted keys to reappear after a sequence of compactions (see the comment above expandInputs on how and why this happens).
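To make the grouping concrete, here is a minimal sketch of how adjacent sstables that share a boundary user key end up in the same atomic compaction group. The `fileMeta` type and `atomicGroups` function are hypothetical names used only for illustration; they are not Pebble's actual types, and the sketch ignores details such as exclusive range-deletion end keys.

```go
package main

import "bytes"

// fileMeta is a hypothetical, simplified stand-in for sstable metadata.
// Only the smallest and largest user keys of the file are tracked.
type fileMeta struct {
	name              string
	smallest, largest []byte
}

// atomicGroups partitions files, which must be sorted by key range, into
// groups that have to be compacted together. A file joins the previous
// file's group when its smallest user key equals the previous file's
// largest user key, i.e. when a user key was split across the two files.
func atomicGroups(files []fileMeta) [][]fileMeta {
	var groups [][]fileMeta
	for i, f := range files {
		if i > 0 && bytes.Equal(files[i-1].largest, f.smallest) {
			last := len(groups) - 1
			groups[last] = append(groups[last], f)
			continue
		}
		groups = append(groups, []fileMeta{f})
	}
	return groups
}
```

With the example above, 000157.sst ends at user key `b` and 000158.sst starts at `b`, so this sketch places them in a single group.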
These consequences of splitting user keys ultimately work against increasing compaction parallelization and reducing compaction sizes. The only reason we split user keys is to maintain similarity in behaviour with RocksDB. We should explore never splitting a user key across different sstables in any compaction (something we already do for flushes as of #675). This should help simplify some compaction logic.
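One way the proposal could look, as a rough sketch rather than the actual compaction loop: once an output reaches its target size, defer the cut until the user key changes. The `internalKey` type, the `splitByUserKey` function, and the key-count target are all assumptions made for illustration, and range tombstones are left out for brevity.

```go
package main

// internalKey is a hypothetical, simplified internal key: a user key plus
// a sequence number and a kind such as "SET", "MERGE" or "DEL".
type internalKey struct {
	userKey string
	seqNum  uint64
	kind    string
}

// splitByUserKey distributes keys (sorted by user key, then by descending
// sequence number) into outputs of roughly targetKeys entries each. An
// output is only finished at a user key boundary, so all entries for one
// user key land in the same output, even if that exceeds the target.
func splitByUserKey(keys []internalKey, targetKeys int) [][]internalKey {
	var outputs [][]internalKey
	var cur []internalKey
	for _, k := range keys {
		// Cut only when the target is reached AND the user key changes.
		if len(cur) >= targetKeys && cur[len(cur)-1].userKey != k.userKey {
			outputs = append(outputs, cur)
			cur = nil
		}
		cur = append(cur, k)
	}
	if len(cur) > 0 {
		outputs = append(outputs, cur)
	}
	return outputs
}
```

The trade-off in this sketch is that an output can grow past its target when one user key has many versions, in exchange for never creating an atomic compaction group.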