internal/keyspan: Emit empty spans where there are no range keys by itsbilal · Pull Request #1639 · cockroachdb/pebble

itsbilal · 2022-04-11T19:02:59Z

This change updates keyspan.LevelIter to emit empty Spans (i.e.
spans with valid start/end but no values/Keys) between files.
This is used as an optimization to avoid loading the next file
after a file has been exhausted.

To preserve this optimization up the iterator stack, the
merging iter and defragmenting iter were also updated to
special-case empty spans returned by child iterators and
preserve them as-is instead of defragmenting them
or iterating beyond them.

Fixes #1605

cockroach-teamcity · 2022-04-11T19:03:08Z

This change is

itsbilal · 2022-04-11T20:14:44Z

Nevermind, this change is tripping up the metamorphic test. I'll take a deeper look; marking as draft.

sumeerbhola

Reviewable status: 0 of 11 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)

internal/keyspan/defragment.go, line 318 at r1 (raw file):

// checkEqual checks the two spans for logical equivalence. Uses the passed-in
// DefragmentMethod and ensures both spans are NOT empty; not defragmenting
// empty spans is an optimization that lets us load fewer sstable blocks.

Wouldn't this become non-deterministic, because file boundaries are non-deterministic?

jbowens

Reviewable status: 0 of 11 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @sumeerbhola)

internal/keyspan/defragment.go, line 318 at r1 (raw file):

Previously, sumeerbhola wrote…

Wouldn't this become non-deterministic, because file boundaries are non-deterministic?

I don't think so, because the user doesn't observe the empty spans. One database may not have any empty spans (eg, RangeKeyDel(a,c), RangeKeySet(a,c,@1,) whereas another one finds the entire keyspace is one empty span [a,z). Either way, the bounds of the empty span are never surfaced through user iteration.

itsbilal

Finally managed to track down the challenging cases in the top level Iterator. Not a lot of code, but a lot of investigation. This is ready for a review now.

Reviewable status: 0 of 12 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @sumeerbhola)

internal/keyspan/defragment.go, line 318 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I don't think so, because the user doesn't observe the empty spans. One database may not have any empty spans (eg, RangeKeyDel(a,c), RangeKeySet(a,c,@1,) whereas another one finds the entire keyspace is one empty span [a,z). Either way, the bounds of the empty span are never surfaced through user iteration.

This is correct; we never surface the empty spans through the high-level Iterator so this remains deterministic.

nicktrav

Still picking through some test cases. Flushing some comments.

Reviewed 6 of 11 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: 9 of 12 files reviewed, 5 unresolved discussions (waiting on @itsbilal and @sumeerbhola)

internal/keyspan/defragment.go, line 316 at r2 (raw file):

}

// checkEqual checks the two spans for logical equivalence. Uses the passed-in

nit: s/. Uses/. It uses/

internal/keyspan/level_iter.go, line 358 at r2 (raw file):

			Keys:  nil,
		}
		l.straddle = l.checkLowerBound(l.straddle)

I noticed that there is no corresponding check in SeekGE (i.e. checkUpperBound). Seems like the tests fail if I add it, so figured it was intentional, but wanted to check.

Also - with this line commented out, the tests still pass. Worth adding a covering case?

internal/keyspan/level_iter.go, line 493 at r2 (raw file):

		// We were at a straddle key, but are now changing directions. l.iterFile
		// was already moved backward by skipEmptyFileBackward, so advance it
		// forward and load the file.

nit: the "and load the file" technically happens a few operations down. Same for the backwards method.

internal/keyspan/testdata/level_iter, line 226 at r2 (raw file):

prev
----
bb-c:{} (file = 000001.sst)

I'm a bit confused by this one. SeekGE is documented to seek "to the first span whose start key is greater than or equal to the given key", in which case shouldn't this be c-d?

Compare with the case where we seek into the middle of a non-empty span, in which case we move up to the next span, e.g. seek-ge cc -> d-e:{}.

nicktrav

Reviewed 3 of 11 files at r1.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @itsbilal and @sumeerbhola)

itsbilal

TFTR!

Reviewable status: 9 of 12 files reviewed, 2 unresolved discussions (waiting on @nicktrav and @sumeerbhola)

internal/keyspan/level_iter.go, line 358 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

I noticed that there is no corresponding check in SeekGE (i.e. checkUpperBound). Seems like the tests fail if I add it, so figured it was intentional, but wanted to check.

Also - with this line commented out, the tests still pass. Worth adding a covering case?

This is a good catch, I expected to add a check there as well. The failing test pointed out a bug; we were actually not checking the right lower/upper bounds whenever we were at a straddle key. We should use l.lower and l.upper, not l.tableOpts.{Lower,Upper}Bound as the latter are stale from the last file we loaded (i.e. we could have unset them if the last file we loaded was within bounds). I've made that fix and added a test, thanks for catching this!

internal/keyspan/level_iter.go, line 493 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

nit: the "and load the file" technically happens a few operations down. Same for the backwards method.

Done.

internal/keyspan/testdata/level_iter, line 226 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

I'm a bit confused by this one. SeekGE is documented to seek "to the first span whose start key is greater than or equal to the given key", in which case shouldn't this be c-d?

Compare with the case where we seek into the middle of a non-empty span, in which case we move up to the next span, e.g. seek-ge cc -> d-e:{}.

The issue is that the file 00002.sst only starts at c, so loading c-d would necessitate loading that file and its rangekey block. We want to avoid that as much as possible, so when we recognize that we are in between two files, we just return a straddle span until the start of the next file. And to keep it consistent with the SeekGE semantics, we return a made-up straddle bound that begins right at the seeked key.

That's still consistent with the seek in the middle of a non-empty span case that you've described; it's why InterleavingIter does a SeekLT (and then potentially a next based on where it lands) on the range key iterator for every SeekGE on the point key iterator. We only seek based on start keys.

nicktrav

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens and @sumeerbhola)

internal/keyspan/testdata/level_iter, line 226 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The issue is that the file 00002.sst only starts at c, so loading c-d would necessitate loading that file and its rangekey block. We want to avoid that as much as possible, so when we recognize that we are in between two files, we just return a straddle span until the start of the next file. And to keep it consistent with the SeekGE semantics, we return a made-up straddle bound that begins right at the seeked key.

That's still consistent with the seek in the middle of a non-empty span case that you've described; it's why InterleavingIter does a SeekLT (and then potentially a next based on where it lands) on the range key iterator for every SeekGE on the point key iterator. We only seek based on start keys.

Ah yes, that makes much more sense now framing it in light of these "synthetic spans". Thank you.

One remaining, related question here is the iteration order being "asymmetric" in the sense that the operations seek-ge bb, next, prev returns bb-c:{}, c-d:{NON_EMPTY}, b-c:{} (the current test case), whereas if we started with a seek within a "real" span, with something like seek-ge cc, next, prev, we'd get d-e:{}, c-d{NON_EMPTY}, d-e:{} (i.e. first and last span have identical bounds, vs. the first example, where is asymmetric).

I recall @jbowens mentioning a possible implementation for one of the iterators where it would "remember" where it had been truncated, so that it could return the same set of spans when iterating in reverse order. I can't seem to find the thread / comment. Could this cause issues elsewhere (I feel like the answer is "no", but just wanted to check)? And / or is this something we can just address with documentation. It is mildly surprising.

jbowens

Thanks for doing this! I think this will go a long way in making the ordinary overhead of checking for mvcc delete ranges close to zero.

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)

iterator.go, line 382 at r3 (raw file):

				return
			}
			if !pointKeyExists && !i.rangeKey.hasRangeKey {

I think a simpler solution would be to check i.rangeKey.iter.Span().Empty() up above when we begin handling the InternalKeyKindRangeKeySet. If the span is empty, then we don't care about the interleaved range key start and can simply:

			i.stats.ForwardStepCount[InternalIterCall]++
			i.iterKey, i.iterValue = i.iter.Next()
 			continue

We'll either land on the first point key at the same user key, or the next user key. Either is fine.

iterator.go, line 736 at r3 (raw file):

			// return the key even if the MERGE point key is deleted, but only if
			// the current span was non-empty.
			rangeKeyBoundary = i.rangeKey.hasRangeKey

I think we can do something simpler here too. When we encounter the range key set, we check i.rangeKey.iter.Span().Empty(). If it's empty, there's no point in even considering stopping at i.key just for the range key. We can just

			i.iterKey, i.iterValue = i.iter.Prev()
			i.stats.ReverseStepCount[InternalIterCall]++
			continue

and the subsequent iteration will necessarily either exhaust the iterator, or encounter a key with different user key.

internal/keyspan/defragment.go, line 332 at r3 (raw file):

		return i.iterSpan
	}
	i.saveCurrent(i.iterSpan)

can we avoid the i.iter.Next() below if i.iterSpan.Empty()?

With the current logic of avoiding defragmenting abutting spans if they're empty, we only actually avoid the block load if we have two adjacent empty spans. Avoiding advancing i.iter would avoid the block load. (and same in reverse)

internal/keyspan/merging_iter.go, line 624 at r3 (raw file):

		// the start of the next, OR if the configured transform filters any
		// keys out.  We allow empty spans that were emitted by child iterators, but
		// we elide empty spans created by the mergingIter itself (i.e. that bridge

Should we still elide these? It seems like the optimization benefits from empty spans wherever they're constructed.

Code quote:

		// we elide empty spans created by the mergingIter itself (i.e. that bridge
		// two child-iterator-defined spans).

jbowens

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)

iterator.go, line 382 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think a simpler solution would be to check i.rangeKey.iter.Span().Empty() up above when we begin handling the InternalKeyKindRangeKeySet. If the span is empty, then we don't care about the interleaved range key start and can simply:
			i.stats.ForwardStepCount[InternalIterCall]++
			i.iterKey, i.iterValue = i.iter.Next()
 			continue
We'll either land on the first point key at the same user key, or the next user key. Either is fine.

On second thought, I think we want to make the changes to InterleavingIter made in #1654. When we spoke I forgot exactly what the interleaving iter TODO around empty spans was. We just don't want to surface the marker to the caller if there's nothing there. We still want the empty span to sit in i.span as long as the current point key sits within the empty span.

Avoiding yielding the marker for empty spans allows us to return a marker with a more reasonable Kind, which is important when we use the interleaving iterator for rangedels.

itsbilal

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @jbowens and @sumeerbhola)

iterator.go, line 382 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

On second thought, I think we want to make the changes to InterleavingIter made in #1654. When we spoke I forgot exactly what the interleaving iter TODO around empty spans was. We just don't want to surface the marker to the caller if there's nothing there. We still want the empty span to sit in i.span as long as the current point key sits within the empty span.

Avoiding yielding the marker for empty spans allows us to return a marker with a more reasonable Kind, which is important when we use the interleaving iterator for rangedels.

Done.

iterator.go, line 736 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think we can do something simpler here too. When we encounter the range key set, we check i.rangeKey.iter.Span().Empty(). If it's empty, there's no point in even considering stopping at i.key just for the range key. We can just
			i.iterKey, i.iterValue = i.iter.Prev()
			i.stats.ReverseStepCount[InternalIterCall]++
			continue
and the subsequent iteration will necessarily either exhaust the iterator, or encounter a key with different user key.

Done (N/A after interleaving iter change)

internal/keyspan/defragment.go, line 332 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

can we avoid the i.iter.Next() below if i.iterSpan.Empty()?

With the current logic of avoiding defragmenting abutting spans if they're empty, we only actually avoid the block load if we have two adjacent empty spans. Avoiding advancing i.iter would avoid the block load. (and same in reverse)

Done.

internal/keyspan/merging_iter.go, line 624 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Should we still elide these? It seems like the optimization benefits from empty spans wherever they're constructed.

Is it still a benefit though? Like if we've already loaded a block, may as well return the next non-empty span instead of returning an empty span. The leveliter is already doing the work of returning empty spans where it makes sense.

My main motivation here was to keep the contract of mergingIters just merging child iterator spans, as a bunch of tests rely on it. Changing the tests wouldn't be hard, but it just feels safer this way and relatively easy to implement.

internal/keyspan/testdata/level_iter, line 226 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Ah yes, that makes much more sense now framing it in light of these "synthetic spans". Thank you.

One remaining, related question here is the iteration order being "asymmetric" in the sense that the operations seek-ge bb, next, prev returns bb-c:{}, c-d:{NON_EMPTY}, b-c:{} (the current test case), whereas if we started with a seek within a "real" span, with something like seek-ge cc, next, prev, we'd get d-e:{}, c-d{NON_EMPTY}, d-e:{} (i.e. first and last span have identical bounds, vs. the first example, where is asymmetric).

I recall @jbowens mentioning a possible implementation for one of the iterators where it would "remember" where it had been truncated, so that it could return the same set of spans when iterating in reverse order. I can't seem to find the thread / comment. Could this cause issues elsewhere (I feel like the answer is "no", but just wanted to check)? And / or is this something we can just address with documentation. It is mildly surprising.

It shouldn't cause any issues as empty spans are just not surfaced by the top level iterator, so the view up there would be deterministic and consistent in that case. I'll still add a comment about this where we create this synthetic span, just in case it trips up something in the future. For that reason I'm also unsure if we need to make the effort to remember where we truncated it; it might just be unnecessary.

nicktrav

Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @jbowens and @sumeerbhola)

jbowens

Reviewed 3 of 11 files at r1, 3 of 4 files at r4, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)

internal/keyspan/interleaving_iter.go, line 445 at r4 (raw file):

						} else {
							i.keyspanInterleaved = true
						}

ah, this makes sense. can we unconditionally advance the keyspan iter here? eg:

if i.span.Empty() {
    i.checkForwardBound(i.keyspanIter.Next())
    continue
}

internal/keyspan/merging_iter.go, line 624 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Is it still a benefit though? Like if we've already loaded a block, may as well return the next non-empty span instead of returning an empty span. The leveliter is already doing the work of returning empty spans where it makes sense.

My main motivation here was to keep the contract of mergingIters just merging child iterator spans, as a bunch of tests rely on it. Changing the tests wouldn't be hard, but it just feels safer this way and relatively easy to implement.

I was misunderstanding your comment. If one child iterator returns an empty span [a,d) and another returns an empty span [c,f), we'll still return empty spans [a,c), [c,d), [d,f)

This change updates keyspan.LevelIter to emit empty Spans (i.e. spans with valid start/end but no values/Keys) between files. This is used as an optimization to avoid loading the next file after a file has been exhausted. To preserve this optimization up the iterator stack, the merging iter and defragmenting iter were also updated to special-case empty spans returned by child iterators and preserve them as-is instead of defragmenting them or iterating beyond them. The top-level Iterator was also updated to elide rangekey boundary keys that correspond to empty spans that do not cover any point keys. Fixes cockroachdb#1605

itsbilal

TFTR!

Reviewable status: 12 of 13 files reviewed, 5 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)

internal/keyspan/interleaving_iter.go, line 445 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

ah, this makes sense. can we unconditionally advance the keyspan iter here? eg:
if i.span.Empty() {
    i.checkForwardBound(i.keyspanIter.Next())
    continue
}

I think that works too, but this is slightly more efficient I believe; it keeps the keyspanIter "covering" the current point key instead of going past it and loading the next range key.

internal/keyspan/merging_iter.go, line 624 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I was misunderstanding your comment. If one child iterator returns an empty span [a,d) and another returns an empty span [c,f), we'll still return empty spans [a,c), [c,d), [d,f)

Ah fair. I'll clarify the comment, I can see how it can be misunderstood that way.

itsbilal requested review from a team and jbowens April 11, 2022 19:03

itsbilal self-assigned this Apr 11, 2022

itsbilal marked this pull request as draft April 11, 2022 19:12

itsbilal force-pushed the leveliter-opt branch from 496f8bd to 101c1b7 Compare April 11, 2022 19:28

itsbilal marked this pull request as ready for review April 11, 2022 19:28

itsbilal marked this pull request as draft April 11, 2022 20:14

sumeerbhola reviewed Apr 11, 2022

View reviewed changes

jbowens reviewed Apr 11, 2022

View reviewed changes

itsbilal force-pushed the leveliter-opt branch from 101c1b7 to b796c61 Compare April 13, 2022 15:28

itsbilal marked this pull request as ready for review April 13, 2022 15:29

itsbilal commented Apr 13, 2022

View reviewed changes

itsbilal force-pushed the leveliter-opt branch from b796c61 to 4194a92 Compare April 13, 2022 17:35

nicktrav reviewed Apr 13, 2022

View reviewed changes

itsbilal force-pushed the leveliter-opt branch from 4194a92 to 76b0d63 Compare April 14, 2022 20:55

itsbilal commented Apr 14, 2022

View reviewed changes

nicktrav reviewed Apr 14, 2022

View reviewed changes

jbowens reviewed Apr 15, 2022

View reviewed changes

itsbilal force-pushed the leveliter-opt branch from 76b0d63 to a746721 Compare April 19, 2022 15:18

itsbilal commented Apr 19, 2022

View reviewed changes

nicktrav reviewed Apr 19, 2022

View reviewed changes

jbowens approved these changes Apr 19, 2022

View reviewed changes

itsbilal force-pushed the leveliter-opt branch from a746721 to ab7c881 Compare April 19, 2022 16:51

itsbilal commented Apr 19, 2022

View reviewed changes

itsbilal merged commit 8b3db52 into cockroachdb:master Apr 19, 2022

itsbilal mentioned this pull request Apr 20, 2022

*: Use keyspan.LevelIter for rangedels in compactions #1648

Merged

Conversation

itsbilal commented Apr 11, 2022

Uh oh!

cockroach-teamcity commented Apr 11, 2022

Uh oh!

itsbilal commented Apr 11, 2022

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

jbowens left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

nicktrav left a comment

Choose a reason for hiding this comment

Uh oh!

nicktrav left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

nicktrav left a comment

Choose a reason for hiding this comment

Uh oh!

jbowens left a comment

Choose a reason for hiding this comment

Uh oh!

jbowens left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

nicktrav left a comment

Choose a reason for hiding this comment

Uh oh!

jbowens left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants