compaction: use sstable writer for detecting user key changes#1340
jbowens merged 1 commit into cockroachdb:master
Conversation
jbowens
left a comment
@itsbilal @sumeerbhola thoughts on this approach?
This is motivated largely by the pragmatic concern that all our test cases assume that once a sstable is larger than the target size, we'll split. If we start to prevent user keys from being split across sstables, we can't split immediately: we need to wait until we also observe a user key change (because we don't buffer the user key until we want to split).
Reviewable status: 0 of 5 files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)
nicktrav
left a comment
I looked mainly at the implementation, and it made sense. Wasn't sure if there was some subtle background that I was missing though.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)
itsbilal
left a comment
Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens and @sumeerbhola)
compaction.go, line 326 at r1 (raw file):
// any splitters that don't guarantee user key splits (i.e. splitters that make
// their determination in ways other than comparing the current key against a
// limit key.) If a wrapped splitter advises a split, it must continue
Should this be moved up to the splitter interface comment as well? I'm good with making it a part of the splitter contract, so to speak - if `shouldSplitBefore(a)` advises a split, then `shouldSplitBefore(b)` should also advise a split given `b >= a`, until `onNewOutput` is called.
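The contract proposed above can be sketched as a minimal, self-contained example. This is not Pebble's actual code; `sizeSplitter` and its fields are hypothetical, illustrating how a splitter whose condition is not a key comparison can still satisfy the "keep advising a split until `onNewOutput`" rule by latching its decision.

```go
package main

import "fmt"

// splitter mirrors the contract discussed above: once shouldSplitBefore(a)
// returns true, shouldSplitBefore(b) must also return true for any b >= a,
// until onNewOutput resets the splitter.
type splitter interface {
	shouldSplitBefore(key []byte) bool
	onNewOutput(key []byte)
}

// sizeSplitter is an illustrative splitter that latches its decision, so it
// satisfies the contract even though its underlying condition (accumulated
// size) is not a comparison of the current key against a limit key.
type sizeSplitter struct {
	target, written int
	split           bool // latched once we advise a split
}

func (s *sizeSplitter) shouldSplitBefore(key []byte) bool {
	if s.written >= s.target {
		s.split = true // latch: keep advising a split for all later keys
	}
	return s.split
}

func (s *sizeSplitter) onNewOutput(key []byte) {
	s.written = 0
	s.split = false
}

func main() {
	s := &sizeSplitter{target: 10}
	s.written = 12
	fmt.Println(s.shouldSplitBefore([]byte("a"))) // true
	s.written = 0                                 // condition no longer holds...
	fmt.Println(s.shouldSplitBefore([]byte("b"))) // ...but still true: latched
	s.onNewOutput([]byte("b"))
	fmt.Println(s.shouldSplitBefore([]byte("c"))) // false after reset
}
```

The latch is what lets a wrapping splitter (like `userKeyChangeSplitter`) defer the split until a later key without losing the advice.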
The `userKeyChangeSplitter` is responsible for avoiding splits within a user key during flushes. Previously, it worked by recording the current user key when a split is requested. This introduced a delay in splitting: when the split is requested, the current key might already be different from the last key written to the sstable. This change alters the `userKeyChangeSplitter` to read the last written point key from the current sstable writer and the last written range key from the range deletion fragmenter.

This has a couple of advantages:

a) It avoids an extra copy of a user key.

b) It may split sooner, closer to the target file size. This is a practical concern in writing unit tests that involve flushes and small target file sizes. If we prevent splitting user keys across outputs in compactions too, it will become a practical concern there too.

c) It exposes the previous point key to the broader compaction loop, which is necessary for narrowing the conditions during which the compaction loop must flush all/additional range tombstones. Currently, we ignore the splitters' suggested split point during flushes because we may have already output a key with the same user key to the current sstable. Future commits will be able to read this previous point key and use the splitters' suggestion if the previous point key's user key is not equal to the splitters' suggestion.
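The mechanism described above can be illustrated with a hedged sketch. This is not Pebble's implementation; the `unsafePrevUserKey` closure and the trivial `alwaysSplit` wrapped splitter are stand-ins showing how reading the last written key from the writer (rather than copying it) lets the splitter defer a split until the user key actually changes.

```go
package main

import (
	"bytes"
	"fmt"
)

type splitter interface {
	shouldSplitBefore(key []byte) bool
}

// userKeyChangeSplitter defers a wrapped splitter's advice until the user
// key changes, so a single user key is never spread across two sstables.
// Instead of copying and remembering the user key, it asks for the last
// written key on demand via the unsafePrevUserKey closure (in the real
// change, this reads the sstable writer's last point key).
type userKeyChangeSplitter struct {
	wrapped           splitter
	unsafePrevUserKey func() []byte
}

func (u *userKeyChangeSplitter) shouldSplitBefore(key []byte) bool {
	if !u.wrapped.shouldSplitBefore(key) {
		return false
	}
	// Only split once the current key differs from the last written key.
	return !bytes.Equal(key, u.unsafePrevUserKey())
}

// alwaysSplit is a trivial wrapped splitter for demonstration.
type alwaysSplit struct{}

func (alwaysSplit) shouldSplitBefore([]byte) bool { return true }

func main() {
	prev := []byte("apple") // pretend this was the last key written
	u := &userKeyChangeSplitter{
		wrapped:           alwaysSplit{},
		unsafePrevUserKey: func() []byte { return prev },
	}
	fmt.Println(u.shouldSplitBefore([]byte("apple")))  // false: same user key
	fmt.Println(u.shouldSplitBefore([]byte("banana"))) // true: key changed
}
```

Because the closure reads the writer's own buffered key, no extra copy of the user key is made (advantage (a) above).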
fd8a944 to c260ba8
jbowens
left a comment
TFTRs!
Reviewable status: 2 of 5 files reviewed, 1 unresolved discussion (waiting on @itsbilal, @nicktrav, and @sumeerbhola)
compaction.go, line 326 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Should this be moved up to the splitter interface comment as well? I'm good with making it a part of the splitter contract, so to speak - if `shouldSplitBefore(a)` advises a split, then `shouldSplitBefore(b)` should also advise a split given `b >= a`, until `onNewOutput` is called.
Good call, done.
sumeerbhola
left a comment
Reviewed 2 of 3 files at r2.
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions
Quoted 5 lines of code…
Currently, we ignore the splitters' suggested split point during flushes because we may have already output a key with the user key to the current sstable. Future commits will be able to read this previous point key and use the splitters' suggestion if the previous point key's user key is not equal to the splitters' suggestion.
I didn't understand this comment, specifically about future commits. Hasn't this code already changed userKeyChangeSplitter in this manner?
compaction.go, line 333 at r2 (raw file):
cmp               Compare
splitter          compactionOutputSplitter
unsafePrevUserKey func() []byte
apologies for the extreme tardiness in reviewing this code.
I really like the elimination of this key copy.
compaction.go, line 2404 at r2 (raw file):
	},
}
outputSplitters = append(outputSplitters, &l0LimitSplitter{c: c, ve: ve})
Can you remind me why having these two splitters be peers in a `splitterGroup` is correct, in that the group does not split a user key across two sstables?
Is it because we know that the `l0LimitSplitter` also splits on a user key boundary because it uses the `limit`?
jbowens
left a comment
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions
Previously, sumeerbhola wrote…
Currently, we ignore the splitters' suggested split point during flushes because we may have already output a key with the user key to the current sstable. Future commits will be able to read this previous point key and use the splitters' suggestion if the previous point key's user key is not equal to the splitters' suggestion.
I didn't understand this comment, specifically about future commits. Hasn't this code already changed `userKeyChangeSplitter` in this manner?
Sorry, this was poorly worded. This is referring to this case. In thinking about this again, this case will become irrelevant if we start using `userKeyChangeSplitter` everywhere.
compaction.go, line 333 at r2 (raw file):
Previously, sumeerbhola wrote…
apologies for the extreme tardiness in reviewing this code.
I really like the elimination of this key copy.
🎊
compaction.go, line 2404 at r2 (raw file):
Previously, sumeerbhola wrote…
Can you remind me why having these two splitters be peers in a `splitterGroup` is correct, in that the group does not split a user key across two sstables?
Is it because we know that the `l0LimitSplitter` also splits on a user key boundary because it uses the `limit`?
Yeah, exactly — it never requests splits mid-userkey, since it requests a split as soon as we've exceeded limit.
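The reasoning in this exchange can be sketched in a minimal example. None of this is Pebble's actual code: `splitterGroup`, `limitSplitter`, and `neverSplit` here are simplified stand-ins showing why OR-ing splitters together is safe with respect to user keys only if each member (like the real `l0LimitSplitter`) only ever requests a split at a user key boundary.

```go
package main

import (
	"bytes"
	"fmt"
)

type splitter interface {
	shouldSplitBefore(key []byte) bool
}

// splitterGroup advises a split as soon as any member does. The group never
// splits mid-user-key only because each member guarantees its own advice
// lands on a user key boundary.
type splitterGroup struct {
	splitters []splitter
}

func (g *splitterGroup) shouldSplitBefore(key []byte) bool {
	for _, s := range g.splitters {
		if s.shouldSplitBefore(key) {
			return true
		}
	}
	return false
}

// limitSplitter is an illustrative stand-in for l0LimitSplitter: it requests
// a split as soon as the current key reaches the limit key. Because the
// decision compares whole user keys, the split always falls on a user key
// boundary, never mid-user-key.
type limitSplitter struct{ limit []byte }

func (l limitSplitter) shouldSplitBefore(key []byte) bool {
	return bytes.Compare(key, l.limit) >= 0
}

// neverSplit is a do-nothing peer, standing in for a splitter that has not
// yet reached its own condition.
type neverSplit struct{}

func (neverSplit) shouldSplitBefore([]byte) bool { return false }

func main() {
	g := &splitterGroup{splitters: []splitter{
		neverSplit{},
		limitSplitter{limit: []byte("m")},
	}}
	fmt.Println(g.shouldSplitBefore([]byte("a"))) // false: no member advises
	fmt.Println(g.shouldSplitBefore([]byte("z"))) // true: limit member advises
}
```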