Skip to content

compaction: use sstable writer for detecting user key changes#1340

Merged
jbowens merged 1 commit intocockroachdb:masterfrom
jbowens:jackson/prevuserkey
Dec 2, 2021
Merged

compaction: use sstable writer for detecting user key changes#1340
jbowens merged 1 commit intocockroachdb:masterfrom
jbowens:jackson/prevuserkey

Conversation

@jbowens
Copy link
Copy Markdown
Contributor

@jbowens jbowens commented Oct 18, 2021

The userKeyChangeSplitter is responsible for avoiding splits within a
user key during flushes. Previously, it worked by recording the current
user key when a split is requested. This introduced a delay in
splitting. When the split is requested, the current key might already be
different than the last key written to the sstable. This change alters
the userKeyChangeSplitter to read the last written point key from the
current sstable writer and the last written range key from the range
deletion fragmenter.

This has a couple advantages:
a) It avoids an extra copy of a user key.
b) It may split sooner, closer to the target file size. This is a
practical concern in writing unit tests that involves flushes and
small target file sizes. If we prevent splitting user keys across
outputs in compactions too, it will become a practical concern there
too.
c) It exposes the previous point key to the broader compaction loop,
which is necessary for narrowing the conditions during which the
compaction loop must flush all/additional range tombstones.
Currently, we ignore the splitters' suggested split point during
flushes because we may have already output a key with the user key to
the current sstable. Future commits will be able to read this
previous point key and use the splitters' suggestion if the previous
point key's user key is not equal to the splitters' suggestion.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itsbilal @sumeerbhola thoughts on this approach?

This is motivated largely by the pragmatic concern that all our test cases assume that once a sstable is larger than the target size, we'll split. If we start to prevent user keys from being split across sstables, we can't split immediately: we need to wait until we also observe a user key change (because we don't buffer the user key until we want to split).

Reviewable status: 0 of 5 files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)

@jbowens jbowens mentioned this pull request Nov 30, 2021
29 tasks
Copy link
Copy Markdown
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I looked mainly at the implementation, and it made sense. Wasn't sure if there was some subtle background that I was missing though.

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)

Copy link
Copy Markdown
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens and @sumeerbhola)


compaction.go, line 326 at r1 (raw file):

// any splitters that don't guarantee user key splits (i.e. splitters that make
// their determination in ways other than comparing the current key against a
// limit key.) If a wrapped splitter advises a split, it must continue

Should this be moved up to the splitter interface comment as well? I'm good with making it a part of the splitter contract, so to speak - if shouldSplitBefore(a) advises a split then shouldSplitBefore(b) should also advise a split given b >= a, until onNewOutput is called.

The `userKeyChangeSplitter` is responsible for avoiding splits within a
user key during flushes. Previously, it worked by recording the current
user key when a split is requested. This introduced a delay in
splitting. When the split is requested, the current key might already be
different than the last key written to the sstable. This change alters
the `userKeyChangeSplitter` to read the last written point key from the
current sstable writer and the last written range key from the range
deletion fragmenter.

This has a couple advantages:
a) It avoids an extra copy of a user key.
b) It may split sooner, closer to the target file size. This is a
   practical concern in writing unit tests that involves flushes and
   small target file sizes. If we prevent splitting user keys across
   outputs in compactions too, it will become a practical concern there
   too.
c) It exposes the previous point key to the broader compaction loop,
   which is necessary for narrowing the conditions during which the
   compaction loop must flush all/additional range tombstones.
   Currently, we ignore the splitters' suggested split point during
   flushes because we may have already output a key with the user key to
   the current sstable. Future commits will be able to read this
   previous point key and use the splitters' suggestion if the previous
   point key's user key is not equal to the splitters' suggestion.
@jbowens jbowens force-pushed the jackson/prevuserkey branch from fd8a944 to c260ba8 Compare December 1, 2021 23:08
Copy link
Copy Markdown
Contributor Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: 2 of 5 files reviewed, 1 unresolved discussion (waiting on @itsbilal, @nicktrav, and @sumeerbhola)


compaction.go, line 326 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Should this be moved up to the splitter interface comment as well? I'm good with making it a part of the splitter contract, so to speak - if shouldSplitBefore(a) advises a split then shouldSplitBefore(b) should also advise a split given b >= a, until onNewOutput is called.

Good call, done.

@jbowens jbowens merged commit 4b595de into cockroachdb:master Dec 2, 2021
@jbowens jbowens deleted the jackson/prevuserkey branch December 2, 2021 00:22
Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r2.
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions


-- commits, line 27 at r2:

Quoted 5 lines of code…
     Currently, we ignore the splitters' suggested split point during
     flushes because we may have already output a key with the user key to
     the current sstable. Future commits will be able to read this
     previous point key and use the splitters' suggestion if the previous
     point key's user key is not equal to the splitters' suggestion.

I didn't understand this comment, specifically about future commits. Hasn't this code already changed userKeyChangeSplitter in this manner?


compaction.go, line 333 at r2 (raw file):

	cmp               Compare
	splitter          compactionOutputSplitter
	unsafePrevUserKey func() []byte

apologies for the extreme tardiness in reviewing this code.
I really like the elimination of this key copy.


compaction.go, line 2404 at r2 (raw file):

			},
		}
		outputSplitters = append(outputSplitters, &l0LimitSplitter{c: c, ve: ve})

Can you remind me why having these two splitters being peers in a splitterGroup is correct, in that the group does not split a userkey across two sstables?
Is it because we know that the l0LimitSplitter also splits on a user key boundary because it uses the limit?

Copy link
Copy Markdown
Contributor Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions


-- commits, line 27 at r2:

Previously, sumeerbhola wrote…
     Currently, we ignore the splitters' suggested split point during
     flushes because we may have already output a key with the user key to
     the current sstable. Future commits will be able to read this
     previous point key and use the splitters' suggestion if the previous
     point key's user key is not equal to the splitters' suggestion.

I didn't understand this comment, specifically about future commits. Hasn't this code already changed userKeyChangeSplitter in this manner?

Sorry, this was poorly worded. This is referring to this case: 1. In thinking about this again, this case will become irrelevant if we start using userKeyChangeSplitter everywhere.


compaction.go, line 333 at r2 (raw file):

Previously, sumeerbhola wrote…

apologies for the extreme tardiness in reviewing this code.
I really like the elimination of this key copy.

🎊


compaction.go, line 2404 at r2 (raw file):

Previously, sumeerbhola wrote…

Can you remind me why having these two splitters being peers in a splitterGroup is correct, in that the group does not split a userkey across two sstables?
Is it because we know that the l0LimitSplitter also splits on a user key boundary because it uses the limit?

Yeah, exactly — it never requests splits mid-userkey, since it requests a split as soon as we've exceeded limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants