compaction: use sstable writer for detecting user key changes#1340
jbowens merged 1 commit into cockroachdb:master
Conversation
jbowens
left a comment
@itsbilal @sumeerbhola thoughts on this approach?
This is motivated largely by the pragmatic concern that all our test cases assume that once a sstable is larger than the target size, we'll split. If we start to prevent user keys from being split across sstables, we can't split immediately: we need to wait until we also observe a user key change (because we don't buffer the user key until we want to split).
Reviewable status: 0 of 5 files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)
nicktrav
left a comment
I looked mainly at the implementation, and it made sense. Wasn't sure if there was some subtle background that I was missing though.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)
itsbilal
left a comment
Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens and @sumeerbhola)
compaction.go, line 326 at r1 (raw file):
// any splitters that don't guarantee user key splits (i.e. splitters that make
// their determination in ways other than comparing the current key against a
// limit key.) If a wrapped splitter advises a split, it must continue
Should this be moved up to the splitter interface comment as well? I'm good with making it a part of the splitter contract, so to speak - if `shouldSplitBefore(a)` advises a split, then `shouldSplitBefore(b)` should also advise a split given `b >= a`, until `onNewOutput` is called.
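The contract proposed above can be sketched as a minimal, self-contained example. This is not Pebble's actual code; `sizeSplitter` and its fields are hypothetical, illustrating how a splitter whose condition is not a key comparison can still satisfy the "keep advising a split until `onNewOutput`" rule by latching its decision.

```go
package main

import "fmt"

// splitter mirrors the contract discussed above: once shouldSplitBefore(a)
// returns true, shouldSplitBefore(b) must also return true for any b >= a,
// until onNewOutput resets the splitter.
type splitter interface {
	shouldSplitBefore(key []byte) bool
	onNewOutput(key []byte)
}

// sizeSplitter is an illustrative splitter that latches its decision, so it
// satisfies the contract even though its underlying condition (accumulated
// size) is not a comparison of the current key against a limit key.
type sizeSplitter struct {
	target, written int
	split           bool // latched once we advise a split
}

func (s *sizeSplitter) shouldSplitBefore(key []byte) bool {
	if s.written >= s.target {
		s.split = true // latch: keep advising a split for all later keys
	}
	return s.split
}

func (s *sizeSplitter) onNewOutput(key []byte) {
	s.written = 0
	s.split = false
}

func main() {
	s := &sizeSplitter{target: 10}
	s.written = 12
	fmt.Println(s.shouldSplitBefore([]byte("a"))) // true
	s.written = 0                                 // condition no longer holds...
	fmt.Println(s.shouldSplitBefore([]byte("b"))) // ...but still true: latched
	s.onNewOutput([]byte("b"))
	fmt.Println(s.shouldSplitBefore([]byte("c"))) // false after reset
}
```

The latch is what lets a wrapping splitter (like `userKeyChangeSplitter`) defer the split until a later key without losing the advice.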
The `userKeyChangeSplitter` is responsible for avoiding splits within a user key during flushes. Previously, it worked by recording the current user key when a split is requested. This introduced a delay in splitting: when the split is requested, the current key might already be different from the last key written to the sstable. This change alters the `userKeyChangeSplitter` to read the last written point key from the current sstable writer and the last written range key from the range deletion fragmenter.

This has a couple of advantages:

a) It avoids an extra copy of a user key.

b) It may split sooner, closer to the target file size. This is a practical concern in writing unit tests that involve flushes and small target file sizes. If we prevent splitting user keys across outputs in compactions too, it will become a practical concern there too.

c) It exposes the previous point key to the broader compaction loop, which is necessary for narrowing the conditions during which the compaction loop must flush all/additional range tombstones. Currently, we ignore the splitters' suggested split point during flushes because we may have already output a key with the same user key to the current sstable. Future commits will be able to read this previous point key and use the splitters' suggestion if the previous point key's user key is not equal to the splitters' suggestion.
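The mechanism described above can be illustrated with a hedged sketch. This is not Pebble's implementation; the `unsafePrevUserKey` closure and the trivial `alwaysSplit` wrapped splitter are stand-ins showing how reading the last written key from the writer (rather than copying it) lets the splitter defer a split until the user key actually changes.

```go
package main

import (
	"bytes"
	"fmt"
)

type splitter interface {
	shouldSplitBefore(key []byte) bool
}

// userKeyChangeSplitter defers a wrapped splitter's advice until the user
// key changes, so a single user key is never spread across two sstables.
// Instead of copying and remembering the user key, it asks for the last
// written key on demand via the unsafePrevUserKey closure (in the real
// change, this reads the sstable writer's last point key).
type userKeyChangeSplitter struct {
	wrapped           splitter
	unsafePrevUserKey func() []byte
}

func (u *userKeyChangeSplitter) shouldSplitBefore(key []byte) bool {
	if !u.wrapped.shouldSplitBefore(key) {
		return false
	}
	// Only split once the current key differs from the last written key.
	return !bytes.Equal(key, u.unsafePrevUserKey())
}

// alwaysSplit is a trivial wrapped splitter for demonstration.
type alwaysSplit struct{}

func (alwaysSplit) shouldSplitBefore([]byte) bool { return true }

func main() {
	prev := []byte("apple") // pretend this was the last key written
	u := &userKeyChangeSplitter{
		wrapped:           alwaysSplit{},
		unsafePrevUserKey: func() []byte { return prev },
	}
	fmt.Println(u.shouldSplitBefore([]byte("apple")))  // false: same user key
	fmt.Println(u.shouldSplitBefore([]byte("banana"))) // true: key changed
}
```

Because the closure reads the writer's own buffered key, no extra copy of the user key is made (advantage (a) above).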
fd8a944 to c260ba8
jbowens
left a comment
TFTRs!
Reviewable status: 2 of 5 files reviewed, 1 unresolved discussion (waiting on @itsbilal, @nicktrav, and @sumeerbhola)
compaction.go, line 326 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Should this be moved up to the splitter interface comment as well? I'm good with making it a part of the splitter contract, so to speak - if `shouldSplitBefore(a)` advises a split, then `shouldSplitBefore(b)` should also advise a split given `b >= a`, until `onNewOutput` is called.
Good call, done.
sumeerbhola
left a comment
Reviewed 2 of 3 files at r2.
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions
Quoted 5 lines of code…
Currently, we ignore the splitters' suggested split point during flushes because we may have already output a key with the user key to the current sstable. Future commits will be able to read this previous point key and use the splitters' suggestion if the previous point key's user key is not equal to the splitters' suggestion.
I didn't understand this comment, specifically about future commits. Hasn't this code already changed userKeyChangeSplitter in this manner?
compaction.go, line 333 at r2 (raw file):
cmp               Compare
splitter          compactionOutputSplitter
unsafePrevUserKey func() []byte
apologies for the extreme tardiness in reviewing this code.
I really like the elimination of this key copy.
compaction.go, line 2404 at r2 (raw file):
	},
}
outputSplitters = append(outputSplitters, &l0LimitSplitter{c: c, ve: ve})
Can you remind me why having these two splitters be peers in a `splitterGroup` is correct, in that the group does not split a user key across two sstables?
Is it because we know that the `l0LimitSplitter` also splits on a user key boundary because it uses the `limit`?
jbowens
left a comment
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions
Previously, sumeerbhola wrote…
Currently, we ignore the splitters' suggested split point during flushes because we may have already output a key with the user key to the current sstable. Future commits will be able to read this previous point key and use the splitters' suggestion if the previous point key's user key is not equal to the splitters' suggestion.
I didn't understand this comment, specifically about future commits. Hasn't this code already changed `userKeyChangeSplitter` in this manner?
Sorry, this was poorly worded. This is referring to this case. In thinking about this again, this case will become irrelevant if we start using `userKeyChangeSplitter` everywhere.
compaction.go, line 333 at r2 (raw file):
Previously, sumeerbhola wrote…
apologies for the extreme tardiness in reviewing this code.
I really like the elimination of this key copy.
🎊
compaction.go, line 2404 at r2 (raw file):
Previously, sumeerbhola wrote…
Can you remind me why having these two splitters be peers in a `splitterGroup` is correct, in that the group does not split a user key across two sstables?
Is it because we know that the `l0LimitSplitter` also splits on a user key boundary because it uses the `limit`?
Yeah, exactly — it never requests splits mid-userkey, since it requests a split as soon as we've exceeded limit.
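The reasoning in this exchange can be sketched in a minimal example. None of this is Pebble's actual code: `splitterGroup`, `limitSplitter`, and `neverSplit` here are simplified stand-ins showing why OR-ing splitters together is safe with respect to user keys only if each member (like the real `l0LimitSplitter`) only ever requests a split at a user key boundary.

```go
package main

import (
	"bytes"
	"fmt"
)

type splitter interface {
	shouldSplitBefore(key []byte) bool
}

// splitterGroup advises a split as soon as any member does. The group never
// splits mid-user-key only because each member guarantees its own advice
// lands on a user key boundary.
type splitterGroup struct {
	splitters []splitter
}

func (g *splitterGroup) shouldSplitBefore(key []byte) bool {
	for _, s := range g.splitters {
		if s.shouldSplitBefore(key) {
			return true
		}
	}
	return false
}

// limitSplitter is an illustrative stand-in for l0LimitSplitter: it requests
// a split as soon as the current key reaches the limit key. Because the
// decision compares whole user keys, the split always falls on a user key
// boundary, never mid-user-key.
type limitSplitter struct{ limit []byte }

func (l limitSplitter) shouldSplitBefore(key []byte) bool {
	return bytes.Compare(key, l.limit) >= 0
}

// neverSplit is a do-nothing peer, standing in for a splitter that has not
// yet reached its own condition.
type neverSplit struct{}

func (neverSplit) shouldSplitBefore([]byte) bool { return false }

func main() {
	g := &splitterGroup{splitters: []splitter{
		neverSplit{},
		limitSplitter{limit: []byte("m")},
	}}
	fmt.Println(g.shouldSplitBefore([]byte("a"))) // false: no member advises
	fmt.Println(g.shouldSplitBefore([]byte("z"))) // true: limit member advises
}
```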