internal/keyspan: Add LevelIter for range keys by itsbilal · Pull Request #1576 · cockroachdb/pebble

itsbilal · 2022-03-11T17:44:52Z

When reading range keys directly from sstables, we will
need to instantiate levelIters for each level that contains
sstables that could contain range keys. This change adds
a levelIter that implements keyspan.FragmentIterator and
takes advantage of level invariants to only have one sstable's
range key block iter open at once. On top of that,
as fragmentBoundIterator does not currently support
SetBounds or truncation of spans based on iterator bounds, those
features are provided by LevelIter.

Also moves TableNewRangeKeyIter from the base package to
internal/keyspan.

First commit is #1568 .

cockroach-teamcity · 2022-03-11T17:45:03Z

This change is

nicktrav

Reviewed 11 of 11 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: 11 of 20 files reviewed, 12 unresolved discussions (waiting on @itsbilal and @jbowens)

internal/base/options.go, line 70 at r2 (raw file):

// within the sstable. It should not maintain any per-sstable state, and must
// be thread-safe.
type BlockPropertyFilter interface {

Should we move the collector in here too? Can be a follow-up.

internal/keyspan/iter.go, line 52 at r2 (raw file):

// per-sstable range key iterators.
type RangeIterOptions struct {
	// LowerBound specifies the smallest key (inclusive) that the iterator will

Could these say "smallest user key"?

internal/keyspan/iter.go, line 69 at r2 (raw file):

	// RangeKeyFilters can be usefd to avoid scanning tables and blocks in tables
	// when iterating over range keys.
	RangeKeyFilters []base.BlockPropertyFilter

Could this be BlockFilters or Filters? The Range part is implicit given the struct name.

internal/keyspan/level_iter.go, line 5 at r2 (raw file):

// the LICENSE file.

package keyspan

This is a bit of a bikeshed comment, and not a strong opinion by any means: we also have the rangekey package? Given the iterator has range key specific behavior, should it be in that package instead? I was under the impression that keyspan is more general.

internal/keyspan/level_iter.go, line 31 at r2 (raw file):

	// The lower/upper bounds for iteration as specified at creation or the most
	// recent call to SetBounds.
	lower []byte

Is it worth a brief comment on the lifetimes of these (either here or in the RangeIterOptions)? I realize we don't do that on the top level version, but it might be helpful.

internal/keyspan/level_iter.go, line 53 at r2 (raw file):

	// operations that could make the iterator valid again.
	exhaustedBounds bool
}

Do we want to add in InternalIteratorStats like we have in level_iter.go at the top level?

internal/keyspan/level_iter.go, line 135 at r2 (raw file):

func (l *LevelIter) findFileGE(key []byte) *manifest.FileMetadata {
	// Find the earliest file whose largest key is >= ikey.

There's some additional commentary in the top level level_iter.go around range del sentinels that would be useful to put here too.

internal/keyspan/level_iter.go, line 219 at r2 (raw file):

		}
		if !file.HasRangeKeys {
			// Skip empty file.

nit: it's not necessarily "empty"? Just has no range keys.

internal/keyspan/level_iter.go, line 246 at r2 (raw file):

		}

		var iter FragmentIterator

Can we add a comment here for completeness for the 0 case? i.e. "This table overlaps the iteration bounds"?

internal/keyspan/level_iter.go, line 289 at r3 (raw file):

	if l.cmp(ikey.UserKey, l.tableOpts.LowerBound) < 0 {
		// Truncate the start key at the lower bound.
		ikey.UserKey = l.tableOpts.LowerBound

I had a comment elsewhere about lifetimes. Seems more relevant to state that it shouldn't be mutated if we're reusing that lower / upper bound byte slice in keys that get returned.

internal/keyspan/testdata/level_iter, line 25 at r2 (raw file):

seek-lt d
----
a-b#2

Could the result of the iteration also print the file number? It makes it easy to see when the iter moves to a new file.

internal/keyspan/testdata/level_iter, line 131 at r3 (raw file):

----
.
b-c#2

Why isn't aa returned for the first SeekGE if it's equal to the lower bound? We go past it, but then on prev, we return it?

itsbilal

TFTR!

Reviewable status: 11 of 20 files reviewed, 4 unresolved discussions (waiting on @jbowens and @nicktrav)

internal/base/options.go, line 70 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Should we move the collector in here too? Can be a follow-up.

The filter seems like it'll be used/referenced in more places than the collector so maybe that's less necessary? I'm good with moving it if it puts everything in one place.

internal/keyspan/iter.go, line 52 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Could these say "smallest user key"?

Done.

internal/keyspan/iter.go, line 69 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Could this be BlockFilters or Filters? The Range part is implicit given the struct name.

Done.

internal/keyspan/level_iter.go, line 5 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

This is a bit of a bikeshed comment, and not a strong opinion by any means: we also have the rangekey package? Given the iterator has range key specific behavior, should it be in that package instead? I was under the impression that keyspan is more general.

My understanding was that anything that's abstract enough to deal only with Spans should be in keyspan, while things that concern rangekey-specific encoding and all should be in rangekey. It's why we also have a merging_iter in keyspan, and since the eventual goal is to also be able to use this iter for range deletes, I think it makes sense here?

internal/keyspan/level_iter.go, line 31 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Is it worth a brief comment on the lifetimes of these (either here or in the RangeIterOptions)? I realize we don't do that on the top level version, but it might be helpful.

Done. It should mimic the point key level_iter, but it's never a bad idea to add a comment about lifetimes!

internal/keyspan/level_iter.go, line 53 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Do we want to add in InternalIteratorStats like we have in level_iter.go at the top level?

I'll add a TODO For this now, but it'll make sense to do this at the same time as merging_iter gets stats?

internal/keyspan/level_iter.go, line 289 at r3 (raw file):

Previously, nicktrav (Nick Travers) wrote…

I had a comment elsewhere about lifetimes. Seems more relevant to state that it shouldn't be mutated if we're reusing that lower / upper bound byte slice in keys that get returned.

Done.

internal/keyspan/testdata/level_iter, line 25 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Could the result of the iteration also print the file number? It makes it easy to see when the iter moves to a new file.

Done.

internal/keyspan/testdata/level_iter, line 131 at r3 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Why isn't aa returned for the first SeekGE if it's equal to the lower bound? We go past it, but then on prev, we return it?

The actual reason is that the underlying fragment is a-b; it's just being truncated by the bound to aa-b, so SeekGE(aa) actually returns the next fragment. But I agree, this is kinda odd and I'm not 100% sure what the right behaviour should be in this case.

We seek based on the start keys only in all iterators right now; should we change that logic to account for start bounds falling in between existing fragments? cc @jbowens . I'm particularly worried about, say, a First at a toplevel Iterator being transformed to a SeekGE at the level iterator that you can then Prev from and land at the range key covering the start bound that you're probably interested in.

jbowens

Looking great! Sorry for all the churn with the changing FragmentIterator interface.

Reviewed 3 of 11 files at r2, 4 of 17 files at r4, all commit messages.
Reviewable status: 7 of 20 files reviewed, 13 unresolved discussions (waiting on @itsbilal and @nicktrav)

internal/keyspan/iter.go, line 68 at r4 (raw file):

	// false to skip scanning. This function must be thread-safe since the same
	// function can be used by multiple iterators, if the iterator is cloned.
	TableFilter func(userProps map[string]string) bool

I'm wondering if we should omit TableFilter. Block property filters also offer the ability to filter whole tables, and it seems like we should eventually deprecate and remove the TableFilter option from IterOptions. For now, we could redefine it as only pertaining to point reads.

internal/keyspan/level_iter.go, line 5 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

My understanding was that anything that's abstract enough to deal only with Spans should be in keyspan, while things that concern rangekey-specific encoding and all should be in rangekey. It's why we also have a merging_iter in keyspan, and since the eventual goal is to also be able to use this iter for range deletes, I think it makes sense here?

Yeah, I think keyspan makes sense given where we're going.

internal/keyspan/level_iter.go, line 48 at r4 (raw file):

	err      error

	tableOpts RangeIterOptions

RangeIterOptions also has LowerBound and UpperBound, in addition to the above lower, upper. What's the relationship and should we document it here?

internal/keyspan/level_iter.go, line 138 at r4 (raw file):

func (l *LevelIter) findFileGE(key []byte) *manifest.FileMetadata {
	// Find the earliest file whose largest key is >= ikey.

nit: s/ikey/key/ here and below

internal/keyspan/level_iter.go, line 156 at r4 (raw file):

func (l *LevelIter) findFileLT(key []byte) *manifest.FileMetadata {
	// Find the last file whose smallest key is < ikey.

nit: s/ikey/key/

internal/keyspan/level_iter.go, line 167 at r4 (raw file):

	l.tableOpts.UpperBound = l.upper
	if l.tableOpts.LowerBound != nil {
		if l.cmp(f.Largest.UserKey, l.tableOpts.LowerBound) < 0 {

Does this need to also handle the f.Largest.IsExclusiveSentinel() case? On second thought, should we be comparing using f.LargestRangeKey for which f.Largest.IsExclusiveSentinel() is necessarily true, and so this can be <= 0 (with an explanatory comment)?

internal/keyspan/level_iter.go, line 171 at r4 (raw file):

			return -1
		}
		if l.cmp(l.tableOpts.LowerBound, f.Smallest.UserKey) <= 0 {

Same question here (and throughout initTableBounds) about using the range key bounds.

internal/keyspan/level_iter.go, line 210 at r4 (raw file):

			// We don't bother comparing the file bounds with the iteration bounds when we have
			// an already open iterator. It is possible that the iter may not be relevant given the
			// current iteration bounds, but it knows those bounds, so it will enforce them.

Is that true right now? l.iter is an iterator constructed from (*sstable.Reader).NewRawRangeKeyIter, right? And that iterator doesn't (yet) enforce bounds. I think we could push that bound checking into the fragmentBlockIter and make this comment correct.

internal/keyspan/level_iter.go, line 228 at r4 (raw file):

			return noFileLoaded
		}
		if !file.HasRangeKeys {

Is this possible now that the manifest.LevelIterator is constructed with Filter(manifest.KeyTypeRange)?

internal/keyspan/level_iter.go, line 287 at r4 (raw file):

// truncateLowerBound truncates l.span's start key according to the lower bound.
// Expects that l.span has already been populated with the current key.

I think we should drop truncation to the bounds and only check that the span does overlap the key range [lower, upper). The interleaving iterator at the top of the iterator stack will perform truncation, which avoids needing to do it at each of these leaf iterators.

internal/keyspan/testdata/level_iter, line 131 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The actual reason is that the underlying fragment is a-b; it's just being truncated by the bound to aa-b, so SeekGE(aa) actually returns the next fragment. But I agree, this is kinda odd and I'm not 100% sure what the right behaviour should be in this case.

We seek based on the start keys only in all iterators right now; should we change that logic to account for start bounds falling in between existing fragments? cc @jbowens . I'm particularly worried about, say, a First at a toplevel Iterator being transformed to a SeekGE at the level iterator that you can then Prev from and land at the range key covering the start bound that you're probably interested in.

Yeah, currently the interface of FragmentIterator.SeekGE(k) dictates that it should seek to the first span whose start key is ≥ k. This is different than the more intuitive semantics of the rangekey.Iter for which SeekGE(k) seeks to the first span that covers a key > k.

For now, I think we should implement this level iterator to adhere to the existing FragmentIterator semantics. The higher levels of the iterator stack depend on them.

I think longer term we should consider changing the interface to use the rangekey.Iter-style semantics. It's unclear to me right now whether the change would increase or decrease key comparisons across the iterator stack. Leaf iterators would need to perform additional key comparisons, but higher level iterators would avoid oddities like MergingIter.SeekLT's seek and prev. It would also remove the need for the keyspan.SeekGE, keyspan.SeekLE, keyspan.Get functions. But I think that should be left for its own PR.

nicktrav

Closed out all my outstanding questions. I'll wait for the rebase before taking another look. 👍

Reviewed 10 of 10 files at r1, 17 of 17 files at r4, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @itsbilal)

When reading range keys directly from sstables, we will need to instantiate levelIters for each level that contains sstables that could contain range keys. This change adds a levelIter that implements keyspan.FragmentIterator and takes advantage of level invariants to only have one sstable's range key block iter open at once. On top of that, as fragmentBoundIterator does not currently support SetBounds or truncation of spans based on iterator bounds, those features are provided by LevelIter. Also moves TableNewRangeKeyIter from the base package to internal/keyspan.

itsbilal

Reviewable status: 12 of 21 files reviewed, 7 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)

internal/keyspan/iter.go, line 68 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I'm wondering if we should omit TableFilter. Block property filters also offer the ability to filter whole tables, and it seems like we should eventually deprecate and remove the TableFilter option from IterOptions. For now, we could redefine it as only pertaining to point reads.

Done.

internal/keyspan/level_iter.go, line 48 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

RangeIterOptions also has LowerBound and UpperBound, in addition to the above lower, upper. What's the relationship and should we document it here?

The relationship is similar to the point key levelIter. It's a little odd so I wrote a comment about it. Thanks!

internal/keyspan/level_iter.go, line 167 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Does this need to also handle the f.Largest.IsExclusiveSentinel() case? On second thought, should we be comparing using f.LargestRangeKey for which f.Largest.IsExclusiveSentinel() is necessarily true, and so this can be <= 0 (with an explanatory comment)?

Done. Take a look - just to ensure I understood your comment correctly?

internal/keyspan/level_iter.go, line 171 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Same question here (and throughout initTableBounds) about using the range key bounds.

Done.

internal/keyspan/level_iter.go, line 210 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Is that true right now? l.iter is an iterator constructed from (*sstable.Reader).NewRawRangeKeyIter, right? And that iterator doesn't (yet) enforce bounds. I think we could push that bound checking into the fragmentBlockIter and make this comment correct.

Oops, this is a stale comment. Removed and updated the code here. There's already a todo to move the bound checking to fragmentBlockIter.

internal/keyspan/level_iter.go, line 228 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Is this possible now that the manifest.LevelIterator is constructed with Filter(manifest.KeyTypeRange)?

Good point, this should not happen. Done.

internal/keyspan/level_iter.go, line 287 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think we should drop truncation to the bounds and only check that the span does overlap the key range [lower, upper). The interleaving iterator at the top of the iterator stack will perform truncation, which avoids needing to do it at each of these leaf iterators.

Done.

itsbilal · 2022-03-25T21:46:03Z

TFTRs! Made the rebase for the big FragmentIterator API change, and removed span truncation - this should be ready for another look now.

jbowens

Looks great!

Reviewed 3 of 9 files at r5.
Reviewable status: 15 of 21 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @nicktrav)

internal/keyspan/level_iter_test.go, line 24 at r5 (raw file):

		name   string
		levels []level
	}{

Should these be datadriven test cases using ParseSpan to construct the spans? Fine for a TODO.

itsbilal

TFTR! Biggest todo that remains is to return empty spans until the next file instead of opening the next file, so I'll get on that soon.

Reviewable status: 15 of 21 files reviewed, 1 unresolved discussion (waiting on @jbowens and @nicktrav)

internal/keyspan/level_iter_test.go, line 24 at r5 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Should these be datadriven test cases using ParseSpan to construct the spans? Fine for a TODO.

TestLevelIter below is already like that? This one is more of a "run the same suite through two different iters". I could datadriven-it, but the output would be more like "ok", so it feels less necessary to convert. What do you think?

itsbilal · 2022-03-28T16:13:38Z

Merged - discussed with @nicktrav , he's good with reviewing post-merge and I'll address any points afterwards.

jbowens

Reviewable status: 15 of 21 files reviewed, 1 unresolved discussion

internal/keyspan/level_iter_test.go, line 24 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

TestLevelIter below is already like that? This one is more of a "run the same suite through two different iters". I could datadriven-it, but the output would be more like "ok", so it feels less necessary to convert. What do you think?

I'm just not a fan of these large struct test case definitions that only contain data. The goal to converting it to a datadriven test would be to make the data in the test cases more legible.

itsbilal · 2022-03-28T16:17:25Z

internal/keyspan/level_iter_test.go, line 24 at r5 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I'm just not a fan of these large struct test case definitions that only contain data. The goal to converting it to a datadriven test would be to make the data in the test cases more legible.

Ah I see. That's understandable. I could also replace the spans with strings. Also this test will have to change significantly in the light of #1605 so I'll revisit this when we get to that.

nicktrav

Did another pass - stil

Reviewed 9 of 9 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion

sumeerbhola

Reviewed 1 of 11 files at r2, 1 of 17 files at r4, 2 of 9 files at r5.
Reviewable status: all files reviewed, 4 unresolved discussions

internal/keyspan/level_iter.go, line 31 at r5 (raw file):

	// recent call to SetBounds. These slices could point to the same underlying
	// memory as {Lower,Upper}Bounds in the passed-in RangeIterOptions, and also
	// returned as part of Spans when the iterator is at bounds. While the

this "returned as part of Spans" was not true prior to range keys.
I think this is still consistent with what we do in pebbleIterator.{lowerBoundBuf,upperBoundBuf} since someone who retrieved a range key from an iterator should know (just like a point key) that it is valid only until the next thing done to the iterator.
But we've had subtle bugs in the past so we should try to do more testing of the kind we do with storage.unsafeMVCCIterator, but at a lower level (in Pebble).

internal/keyspan/level_iter.go, line 122 at r5 (raw file):

	//
	// If the earliest file has its largest key == key and that largest key is a
	// range deletion sentinel, we know that we manufactured this sentinel to convert

Range keys are different from range deletes, and IIUC this iter is only for range keys. Should this comment be reworded?

internal/keyspan/level_iter.go, line 130 at r5 (raw file):

	m := l.files.SeekGE(l.cmp, key)
	for m != nil && m.LargestRangeKey.IsExclusiveSentinel() &&
		l.cmp(m.LargestRangeKey.UserKey, key) == 0 {

what if there are no range keys in this file? We would want to skip it. What am I missing?

itsbilal self-assigned this Mar 11, 2022

itsbilal requested a review from jbowens March 11, 2022 17:44

itsbilal requested review from a team and nicktrav March 11, 2022 17:45

itsbilal force-pushed the leveliter-rangekeys branch from 70148e5 to 5ad67e2 Compare March 11, 2022 18:31

nicktrav reviewed Mar 11, 2022

View reviewed changes

itsbilal force-pushed the leveliter-rangekeys branch from 5ad67e2 to 48093ef Compare March 23, 2022 16:18

itsbilal commented Mar 23, 2022

View reviewed changes

jbowens reviewed Mar 24, 2022

View reviewed changes

nicktrav reviewed Mar 24, 2022

View reviewed changes

itsbilal force-pushed the leveliter-rangekeys branch from 48093ef to 1470ffe Compare March 25, 2022 21:45

itsbilal commented Mar 25, 2022

View reviewed changes

jbowens approved these changes Mar 28, 2022

View reviewed changes

jbowens mentioned this pull request Mar 28, 2022

db: optimize for reads without range keys #1605

Closed

itsbilal commented Mar 28, 2022

View reviewed changes

itsbilal merged commit 0b5f42a into cockroachdb:master Mar 28, 2022

jbowens reviewed Mar 28, 2022

View reviewed changes

nicktrav reviewed Mar 28, 2022

View reviewed changes

sumeerbhola reviewed Mar 29, 2022

View reviewed changes

Conversation

itsbilal commented Mar 11, 2022

Uh oh!

cockroach-teamcity commented Mar 11, 2022

Uh oh!

nicktrav left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

jbowens left a comment

Choose a reason for hiding this comment

Uh oh!

nicktrav left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal commented Mar 25, 2022

Uh oh!

jbowens left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal commented Mar 28, 2022

Uh oh!

jbowens left a comment

Choose a reason for hiding this comment

Uh oh!

itsbilal commented Mar 28, 2022

Uh oh!

nicktrav left a comment

Choose a reason for hiding this comment

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants