internal/keyspan: Add LevelIter for range keys #1576
internal/keyspan: Add LevelIter for range keys #1576itsbilal merged 1 commit intocockroachdb:masterfrom
Conversation
70148e5 to
5ad67e2
Compare
nicktrav
left a comment
There was a problem hiding this comment.
Reviewed 11 of 11 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: 11 of 20 files reviewed, 12 unresolved discussions (waiting on @itsbilal and @jbowens)
internal/base/options.go, line 70 at r2 (raw file):
// within the sstable. It should not maintain any per-sstable state, and must // be thread-safe. type BlockPropertyFilter interface {
Should we move the collector in here too? Can be a follow-up.
internal/keyspan/iter.go, line 52 at r2 (raw file):
// per-sstable range key iterators. type RangeIterOptions struct { // LowerBound specifies the smallest key (inclusive) that the iterator will
Could these say "smallest user key"?
internal/keyspan/iter.go, line 69 at r2 (raw file):
// RangeKeyFilters can be usefd to avoid scanning tables and blocks in tables // when iterating over range keys. RangeKeyFilters []base.BlockPropertyFilter
Could this be BlockFilters or Filters? The Range part is implicit given the struct name.
internal/keyspan/level_iter.go, line 5 at r2 (raw file):
// the LICENSE file. package keyspan
This is a bit of a bikeshed comment, and not a strong opinion by any means: we also have the rangekey package? Given the iterator has range key specific behavior, should it be in that package instead? I was under the impression that keyspan is more general.
internal/keyspan/level_iter.go, line 31 at r2 (raw file):
// The lower/upper bounds for iteration as specified at creation or the most // recent call to SetBounds. lower []byte
Is it worth a brief comment on the lifetimes of these (either here or in the RangeIterOptions)? I realize we don't do that on the top level version, but it might be helpful.
internal/keyspan/level_iter.go, line 53 at r2 (raw file):
// operations that could make the iterator valid again. exhaustedBounds bool }
Do we want to add in InternalIteratorStats like we have in level_iter.go at the top level?
internal/keyspan/level_iter.go, line 135 at r2 (raw file):
func (l *LevelIter) findFileGE(key []byte) *manifest.FileMetadata { // Find the earliest file whose largest key is >= ikey.
There's some additional commentary in the top level level_iter.go around range del sentinels that would be useful to put here too.
internal/keyspan/level_iter.go, line 219 at r2 (raw file):
} if !file.HasRangeKeys { // Skip empty file.
nit: it's not necessarily "empty"? Just has no range keys.
internal/keyspan/level_iter.go, line 246 at r2 (raw file):
} var iter FragmentIterator
Can we add a comment here for completeness for the 0 case? i.e. "This table overlaps the iteration bounds"?
internal/keyspan/level_iter.go, line 289 at r3 (raw file):
if l.cmp(ikey.UserKey, l.tableOpts.LowerBound) < 0 { // Truncate the start key at the lower bound. ikey.UserKey = l.tableOpts.LowerBound
I had a comment elsewhere about lifetimes. Seems more relevant to state that it shouldn't be mutated if we're reusing that lower / upper bound byte slice in keys that get returned.
internal/keyspan/testdata/level_iter, line 25 at r2 (raw file):
seek-lt d ---- a-b#2
Could the result of the iteration also print the file number? It makes it easy to see when the iter moves to a new file.
internal/keyspan/testdata/level_iter, line 131 at r3 (raw file):
---- . b-c#2
Why isn't aa returned for the first SeekGE if it's equal to the lower bound? We go past it, but then on prev, we return it?
5ad67e2 to
48093ef
Compare
itsbilal
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status: 11 of 20 files reviewed, 4 unresolved discussions (waiting on @jbowens and @nicktrav)
internal/base/options.go, line 70 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Should we move the collector in here too? Can be a follow-up.
The filter seems like it'll be used/referenced in more places than the collector so maybe that's less necessary? I'm good with moving it if it puts everything in one place.
internal/keyspan/iter.go, line 52 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Could these say "smallest user key"?
Done.
internal/keyspan/iter.go, line 69 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Could this be
BlockFiltersorFilters? TheRangepart is implicit given the struct name.
Done.
internal/keyspan/level_iter.go, line 5 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
This is a bit of a bikeshed comment, and not a strong opinion by any means: we also have the
rangekeypackage? Given the iterator has range key specific behavior, should it be in that package instead? I was under the impression thatkeyspanis more general.
My understanding was that anything that's abstract enough to deal only with Spans should be in keyspan, while things that concern rangekey-specific encoding and all should be in rangekey. It's why we also have a merging_iter in keyspan, and since the eventual goal is to also be able to use this iter for range deletes, I think it makes sense here?
internal/keyspan/level_iter.go, line 31 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Is it worth a brief comment on the lifetimes of these (either here or in the
RangeIterOptions)? I realize we don't do that on the top level version, but it might be helpful.
Done. It should mimic the point key level_iter, but it's never a bad idea to add a comment about lifetimes!
internal/keyspan/level_iter.go, line 53 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Do we want to add in
InternalIteratorStatslike we have inlevel_iter.goat the top level?
I'll add a TODO For this now, but it'll make sense to do this at the same time as merging_iter gets stats?
internal/keyspan/level_iter.go, line 289 at r3 (raw file):
Previously, nicktrav (Nick Travers) wrote…
I had a comment elsewhere about lifetimes. Seems more relevant to state that it shouldn't be mutated if we're reusing that lower / upper bound byte slice in keys that get returned.
Done.
internal/keyspan/testdata/level_iter, line 25 at r2 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Could the result of the iteration also print the file number? It makes it easy to see when the iter moves to a new file.
Done.
internal/keyspan/testdata/level_iter, line 131 at r3 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Why isn't
aareturned for the firstSeekGEif it's equal to the lower bound? We go past it, but then on prev, we return it?
The actual reason is that the underlying fragment is a-b; it's just being truncated by the bound to aa-b, so SeekGE(aa) actually returns the next fragment. But I agree, this is kinda odd and I'm not 100% sure what the right behaviour should be in this case.
We seek based on the start keys only in all iterators right now; should we change that logic to account for start bounds falling in between existing fragments? cc @jbowens . I'm particularly worried about, say, a First at a toplevel Iterator being transformed to a SeekGE at the level iterator that you can then Prev from and land at the range key covering the start bound that you're probably interested in.
jbowens
left a comment
There was a problem hiding this comment.
Looking great! Sorry for all the churn with the changing FragmentIterator interface.
Reviewed 3 of 11 files at r2, 4 of 17 files at r4, all commit messages.
Reviewable status: 7 of 20 files reviewed, 13 unresolved discussions (waiting on @itsbilal and @nicktrav)
internal/keyspan/iter.go, line 68 at r4 (raw file):
// false to skip scanning. This function must be thread-safe since the same // function can be used by multiple iterators, if the iterator is cloned. TableFilter func(userProps map[string]string) bool
I'm wondering if we should omit TableFilter. Block property filters also offer the ability to filter whole tables, and it seems like we should eventually deprecate and remove the TableFilter option from IterOptions. For now, we could redefine it as only pertaining to point reads.
internal/keyspan/level_iter.go, line 5 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
My understanding was that anything that's abstract enough to deal only with
Spans should be in keyspan, while things that concern rangekey-specific encoding and all should be inrangekey. It's why we also have amerging_iterin keyspan, and since the eventual goal is to also be able to use this iter for range deletes, I think it makes sense here?
Yeah, I think keyspan makes sense given where we're going.
internal/keyspan/level_iter.go, line 48 at r4 (raw file):
err error tableOpts RangeIterOptions
RangeIterOptions also has LowerBound and UpperBound, in addition to the above lower, upper. What's the relationship and should we document it here?
internal/keyspan/level_iter.go, line 138 at r4 (raw file):
func (l *LevelIter) findFileGE(key []byte) *manifest.FileMetadata { // Find the earliest file whose largest key is >= ikey.
nit: s/ikey/key/ here and below
internal/keyspan/level_iter.go, line 156 at r4 (raw file):
func (l *LevelIter) findFileLT(key []byte) *manifest.FileMetadata { // Find the last file whose smallest key is < ikey.
nit: s/ikey/key/
internal/keyspan/level_iter.go, line 167 at r4 (raw file):
l.tableOpts.UpperBound = l.upper if l.tableOpts.LowerBound != nil { if l.cmp(f.Largest.UserKey, l.tableOpts.LowerBound) < 0 {
Does this need to also handle the f.Largest.IsExclusiveSentinel() case? On second thought, should we be comparing using f.LargestRangeKey for which f.Largest.IsExclusiveSentinel() is necessarily true, and so this can be <= 0 (with an explanatory comment)?
internal/keyspan/level_iter.go, line 171 at r4 (raw file):
return -1 } if l.cmp(l.tableOpts.LowerBound, f.Smallest.UserKey) <= 0 {
Same question here (and throughout initTableBounds) about using the range key bounds.
internal/keyspan/level_iter.go, line 210 at r4 (raw file):
// We don't bother comparing the file bounds with the iteration bounds when we have // an already open iterator. It is possible that the iter may not be relevant given the // current iteration bounds, but it knows those bounds, so it will enforce them.
Is that true right now? l.iter is an iterator constructed from (*sstable.Reader).NewRawRangeKeyIter, right? And that iterator doesn't (yet) enforce bounds. I think we could push that bound checking into the fragmentBlockIter and make this comment correct.
internal/keyspan/level_iter.go, line 228 at r4 (raw file):
return noFileLoaded } if !file.HasRangeKeys {
Is this possible now that the manifest.LevelIterator is constructed with Filter(manifest.KeyTypeRange)?
internal/keyspan/level_iter.go, line 287 at r4 (raw file):
// truncateLowerBound truncates l.span's start key according to the lower bound. // Expects that l.span has already been populated with the current key.
I think we should drop truncation to the bounds and only check that the span does overlap the key range [lower, upper). The interleaving iterator at the top of the iterator stack will perform truncation, which avoids needing to do it at each of these leaf iterators.
internal/keyspan/testdata/level_iter, line 131 at r3 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
The actual reason is that the underlying fragment is
a-b; it's just being truncated by the bound toaa-b, soSeekGE(aa)actually returns the next fragment. But I agree, this is kinda odd and I'm not 100% sure what the right behaviour should be in this case.We seek based on the start keys only in all iterators right now; should we change that logic to account for start bounds falling in between existing fragments? cc @jbowens . I'm particularly worried about, say, a
Firstat a toplevel Iterator being transformed to aSeekGEat the level iterator that you can thenPrevfrom and land at the range key covering the start bound that you're probably interested in.
Yeah, currently the interface of FragmentIterator.SeekGE(k) dictates that it should seek to the first span whose start key is ≥ k. This is different than the more intuitive semantics of the rangekey.Iter for which SeekGE(k) seeks to the first span that covers a key > k.
For now, I think we should implement this level iterator to adhere to the existing FragmentIterator semantics. The higher levels of the iterator stack depend on them.
I think longer term we should consider changing the interface to use the rangekey.Iter-style semantics. It's unclear to me right now whether the change would increase or decrease key comparisons across the iterator stack. Leaf iterators would need to perform additional key comparisons, but higher level iterators would avoid oddities like MergingIter.SeekLT's seek and prev. It would also remove the need for the keyspan.SeekGE, keyspan.SeekLE, keyspan.Get functions. But I think that should be left for its own PR.
nicktrav
left a comment
There was a problem hiding this comment.
Closed out all my outstanding questions. I'll wait for the rebase before taking another look. 👍
Reviewed 10 of 10 files at r1, 17 of 17 files at r4, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @itsbilal)
When reading range keys directly from sstables, we will need to instantiate levelIters for each level that contains sstables that could contain range keys. This change adds a levelIter that implements keyspan.FragmentIterator and takes advantage of level invariants to only have one sstable's range key block iter open at once. On top of that, as fragmentBoundIterator does not currently support SetBounds or truncation of spans based on iterator bounds, those features are provided by LevelIter. Also moves TableNewRangeKeyIter from the base package to internal/keyspan.
48093ef to
1470ffe
Compare
itsbilal
left a comment
There was a problem hiding this comment.
Reviewable status: 12 of 21 files reviewed, 7 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)
internal/keyspan/iter.go, line 68 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I'm wondering if we should omit
TableFilter. Block property filters also offer the ability to filter whole tables, and it seems like we should eventually deprecate and remove theTableFilteroption fromIterOptions. For now, we could redefine it as only pertaining to point reads.
Done.
internal/keyspan/level_iter.go, line 48 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
RangeIterOptionsalso hasLowerBoundandUpperBound, in addition to the abovelower,upper. What's the relationship and should we document it here?
The relationship is similar to the point key levelIter. It's a little odd so I wrote a comment about it. Thanks!
internal/keyspan/level_iter.go, line 167 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Does this need to also handle the
f.Largest.IsExclusiveSentinel()case? On second thought, should we be comparing usingf.LargestRangeKeyfor whichf.Largest.IsExclusiveSentinel()is necessarily true, and so this can be<= 0(with an explanatory comment)?
Done. Take a look - just to ensure I understood your comment correctly?
internal/keyspan/level_iter.go, line 171 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Same question here (and throughout
initTableBounds) about using the range key bounds.
Done.
internal/keyspan/level_iter.go, line 210 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Is that true right now?
l.iteris an iterator constructed from(*sstable.Reader).NewRawRangeKeyIter, right? And that iterator doesn't (yet) enforce bounds. I think we could push that bound checking into thefragmentBlockIterand make this comment correct.
Oops, this is a stale comment. Removed and updated the code here. There's already a todo to move the bound checking to fragmentBlockIter.
internal/keyspan/level_iter.go, line 228 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Is this possible now that the
manifest.LevelIteratoris constructed withFilter(manifest.KeyTypeRange)?
Good point, this should not happen. Done.
internal/keyspan/level_iter.go, line 287 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I think we should drop truncation to the bounds and only check that the span does overlap the key range [lower, upper). The interleaving iterator at the top of the iterator stack will perform truncation, which avoids needing to do it at each of these leaf iterators.
Done.
|
TFTRs! Made the rebase for the big |
jbowens
left a comment
There was a problem hiding this comment.
Looks great!
Reviewed 3 of 9 files at r5.
Reviewable status: 15 of 21 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @nicktrav)
internal/keyspan/level_iter_test.go, line 24 at r5 (raw file):
name string levels []level }{
Should these be datadriven test cases using ParseSpan to construct the spans? Fine for a TODO.
itsbilal
left a comment
There was a problem hiding this comment.
TFTR! Biggest todo that remains is to return empty spans until the next file instead of opening the next file, so I'll get on that soon.
Reviewable status: 15 of 21 files reviewed, 1 unresolved discussion (waiting on @jbowens and @nicktrav)
internal/keyspan/level_iter_test.go, line 24 at r5 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Should these be datadriven test cases using
ParseSpanto construct the spans? Fine for a TODO.
TestLevelIter below is already like that? This one is more of a "run the same suite through two different iters". I could datadriven-it, but the output would be more like "ok", so it feels less necessary to convert. What do you think?
|
Merged - discussed with @nicktrav , he's good with reviewing post-merge and I'll address any points afterwards. |
jbowens
left a comment
There was a problem hiding this comment.
Reviewable status: 15 of 21 files reviewed, 1 unresolved discussion
internal/keyspan/level_iter_test.go, line 24 at r5 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
TestLevelIterbelow is already like that? This one is more of a "run the same suite through two different iters". I could datadriven-it, but the output would be more like "ok", so it feels less necessary to convert. What do you think?
I'm just not a fan of these large struct test case definitions that only contain data. The goal to converting it to a datadriven test would be to make the data in the test cases more legible.
|
internal/keyspan/level_iter_test.go, line 24 at r5 (raw file): Previously, jbowens (Jackson Owens) wrote…
Ah I see. That's understandable. I could also replace the spans with strings. Also this test will have to change significantly in the light of #1605 so I'll revisit this when we get to that. |
nicktrav
left a comment
There was a problem hiding this comment.
Reviewed 9 of 9 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 1 of 11 files at r2, 1 of 17 files at r4, 2 of 9 files at r5.
Reviewable status: all files reviewed, 4 unresolved discussions
internal/keyspan/level_iter.go, line 31 at r5 (raw file):
// recent call to SetBounds. These slices could point to the same underlying // memory as {Lower,Upper}Bounds in the passed-in RangeIterOptions, and also // returned as part of Spans when the iterator is at bounds. While the
this "returned as part of Spans" was not true prior to range keys.
I think this is still consistent with what we do in pebbleIterator.{lowerBoundBuf,upperBoundBuf} since someone who retrieved a range key from an iterator should know (just like a point key) that it is valid only until the next thing done to the iterator.
But we've had subtle bugs in the past so we should try to do more testing of the kind we do with storage.unsafeMVCCIterator, but at a lower level (in Pebble).
internal/keyspan/level_iter.go, line 122 at r5 (raw file):
// // If the earliest file has its largest key == key and that largest key is a // range deletion sentinel, we know that we manufactured this sentinel to convert
Range keys are different from range deletes, and IIUC this iter is only for range keys. Should this comment be reworded?
internal/keyspan/level_iter.go, line 130 at r5 (raw file):
m := l.files.SeekGE(l.cmp, key) for m != nil && m.LargestRangeKey.IsExclusiveSentinel() && l.cmp(m.LargestRangeKey.UserKey, key) == 0 {
what if there are no range keys in this file? We would want to skip it. What am I missing?
When reading range keys directly from sstables, we will
need to instantiate levelIters for each level that contains
sstables that could contain range keys. This change adds
a levelIter that implements keyspan.FragmentIterator and
takes advantage of level invariants to only have one sstable's
range key block iter open at once. On top of that,
as fragmentBoundIterator does not currently support
SetBounds or truncation of spans based on iterator bounds, those
features are provided by LevelIter.
Also moves TableNewRangeKeyIter from the base package to
internal/keyspan.
First commit is #1568 .