batch: add RangeKey{Set,Unset,Delete} methods#1386
batch: add RangeKey{Set,Unset,Delete} methods#1386jbowens merged 1 commit intocockroachdb:masterfrom
Conversation
5a01fd2 to
ba28d5c
Compare
806d547 to
0463e08
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
I didn't finish reading -- flushing some comments.
Reviewed 7 of 15 files at r1.
Reviewable status: 7 of 15 files reviewed, 4 unresolved discussions (waiting on @jbowens and @nicktrav)
format_major_version.go, line 73 at r1 (raw file):
FormatBlockPropertyCollector // FormatRangeKeys is a format major version that introduces range keys. FormatRangeKeys
Since support for range keys will be spread across many PRs, should the format major version be introduced once the code is complete? Meanwhile we can just mark the public interfaces with "DO NOT USE" in case they are partially usable.
internal/rangekey/rangekey.go, line 29 at r1 (raw file):
// EncodedSetValueLen precomputes the length of a RangeKeySet's value when // encoded. It may be used to construct a buffer of the appropriate size before // encoding.
Can you add a comment at the top of this file that specifies that RANGEKEYSET [start, end), Set of SuffixValue pairs is encoded as
RANGEKEYSET, start => end, Set of SuffixValue pairs.
internal/rangekey/rangekey.go, line 36 at r1 (raw file):
// Add len(varint(len(endKey))). x := uint32(len(endKey)) n++
is there any reason to believe that we should be inlining instead of defining func lenVarint(x uint64) int?
internal/rangekey/rangekey.go, line 88 at r1 (raw file):
// Encode the value itself. n += copy(dst[n:], suffixValues[i].Value)
This encoding looks simpler than what was mentioned in the design doc. Are you leaving that for later, if needed? If yes, that sounds fine.
nicktrav
left a comment
There was a problem hiding this comment.
Reviewed 15 of 15 files at r1, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @jbowens and @sumeerbhola)
batch.go, line 220 at r1 (raw file):
countRangeDels uint64 // The count of range sets and unsets in the batch. Updated every time a
nit: count of range sets, unsets and deletes ...?
batch.go, line 669 at r1 (raw file):
} // RangeKeySet sets a range key mapping the key range [start, end) at the MVCC
Just a general comment on references to MVCC. My understanding is that the support we're adding is more general than MVCC timestamps? i.e. a suffix doesn't necessarily have to be an MVCC ts.
Is it worth referring to plain suffixes rather than talking about MVCC?
Same comment applies elsewhere.
batch.go, line 700 at r1 (raw file):
Quoted 9 lines of code…
b.countRangeKeys++ if b.index != nil { b.rangeKeys = nil // Range keys are rare, so we lazily allocate the index for them. if b.rangeKeyIndex == nil { b.rangeKeyIndex = batchskl.NewSkiplist(&b.data, b.cmp, b.abbreviatedKey) } b.deferredOp.index = b.rangeKeyIndex }
It looks like we repeat this chunk across the three deferred ops. Worth extracting out to a helper?
batch.go, line 718 at r1 (raw file):
... RangeKeyUnset returns.
batch_test.go, line 50 at r1 (raw file):
// RangeKeySet and RangeKeyUnset are untested here because they don't expose // deferred variants. This is a consequence of these keys more complex value
nit: keys' (possessive?). Or just ... consequence of these keys having more complex ....
internal/base/internal_test.go, line 40 at r1 (raw file):
"foo", "foo\x08\x07\x06\x05\x04\x03\x02", "foo\x16\x07\x06\x05\x04\x03\x02\x01",
nit: Thoughts on making this less brittle by making the value base16(InternalKeyKindMax)?
internal/rangekey/rangekey.go, line 77 at r1 (raw file):
// Encode the list of (suffix, value-len) tuples. for i := 0; i < len(suffixValues); i++ {
nit: if you were to for _, sv := range suffixValues here, we avoid indexing into the slice four times. Not sure that's a massive win, but slightly less verbose.
internal/rangekey/rangekey.go, line 88 at r1 (raw file):
Previously, sumeerbhola wrote…
This encoding looks simpler than what was mentioned in the design doc. Are you leaving that for later, if needed? If yes, that sounds fine.
On this point, is it worth documenting the encoding somewhere in this file? I didn't see it mentioned.
internal/rangekey/rangekey.go, line 119 at r1 (raw file):
func DecodeSuffixValue(data []byte) (sv SuffixValue, rest []byte, ok bool) { // Decode the suffix. sv.Suffix, data, ok = decodeVarstring(data)
Worth documenting that we're mutating data? The caller should be aware of this.
internal/rangekey/rangekey.go, line 269 at r1 (raw file):
// // For example: // - a.RANGEKEYSET.5: c [(@t10, foo), (@t9, bar)]
nit: This seems at odds with the format definition. Should it be [(@t10=foo), (@t9=bar)] instead? Or is the format definition [(s1, v1), (s2, v2), (s3, v3)]
0463e08 to
8b9e0ed
Compare
jbowens
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
batch.go, line 669 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Just a general comment on references to MVCC. My understanding is that the support we're adding is more general than MVCC timestamps? i.e. a suffix doesn't necessarily have to be an MVCC ts.
Is it worth referring to plain suffixes rather than talking about MVCC?
Same comment applies elsewhere.
We actually already document Split as splitting a MVCC key into a user key and an encoded timestamp:
// Split returns the length of the prefix of the user key that corresponds to
// the key portion of an MVCC encoding scheme to enable the use of prefix bloom
// filters.
In some of the back-and-forth on the RFC draft, we came to the conclusion that Pebble will likely increasingly depend on MVCC semantics of suffixes.
batch.go, line 700 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
b.countRangeKeys++ if b.index != nil { b.rangeKeys = nil // Range keys are rare, so we lazily allocate the index for them. if b.rangeKeyIndex == nil { b.rangeKeyIndex = batchskl.NewSkiplist(&b.data, b.cmp, b.abbreviatedKey) } b.deferredOp.index = b.rangeKeyIndex }It looks like we repeat this chunk across the three deferred ops. Worth extracting out to a helper?
Sure. This was mirroring the manual inlining used throughout the batch implementation, but our range key design requires range keys to be rare.
batch.go, line 718 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
... RangeKeyUnset returns.
Done.
format_major_version.go, line 73 at r1 (raw file):
Previously, sumeerbhola wrote…
Since support for range keys will be spread across many PRs, should the format major version be introduced once the code is complete? Meanwhile we can just mark the public interfaces with "DO NOT USE" in case they are partially usable.
Sure, done.
internal/rangekey/rangekey.go, line 29 at r1 (raw file):
Previously, sumeerbhola wrote…
Can you add a comment at the top of this file that specifies that RANGEKEYSET [start, end), Set of SuffixValue pairs is encoded as
RANGEKEYSET, start => end, Set of SuffixValue pairs.
Done.
internal/rangekey/rangekey.go, line 36 at r1 (raw file):
Previously, sumeerbhola wrote…
is there any reason to believe that we should be inlining instead of defining
func lenVarint(x uint64) int?
factored it out: just premature optimization
internal/rangekey/rangekey.go, line 88 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
On this point, is it worth documenting the encoding somewhere in this file? I didn't see it mentioned.
I implemented the more complicated encoding first and was dissatisfied with the encoding/decoding API that it required. Yeah, I figured if needed, we can switch back to the more complicated scheme.
I'll add a TODO for documenting the encoding so that we can document once we've made that decision.
internal/rangekey/rangekey.go, line 119 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Worth documenting that we're mutating
data? The caller should be aware of this.
The data slice that the caller provides is unmodified, we just save a resliced tail of it to our data local variable.
internal/rangekey/rangekey.go, line 269 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
nit: This seems at odds with the format definition. Should it be
[(@t10=foo), (@t9=bar)]instead? Or is the format definition[(s1, v1), (s2, v2), (s3, v3)]
Thanks, I changed the format and forgot to update this.
8b9e0ed to
2d67350
Compare
nicktrav
left a comment
There was a problem hiding this comment.
Reviewed 8 of 8 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @sumeerbhola)
batch.go, line 669 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
We actually already document
Splitas splitting a MVCC key into a user key and an encoded timestamp:// Split returns the length of the prefix of the user key that corresponds to // the key portion of an MVCC encoding scheme to enable the use of prefix bloom // filters.In some of the back-and-forth on the RFC draft, we came to the conclusion that Pebble will likely increasingly depend on MVCC semantics of suffixes.
Gotcha. Makes sense.
internal/rangekey/rangekey.go, line 119 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
The
dataslice that the caller provides is unmodified, we just save a resliced tail of it to ourdatalocal variable.
Woops, my bad. Misinterpreted.
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 2 of 15 files at r1, 3 of 8 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @jbowens and @sumeerbhola)
batch.go, line 376 at r3 (raw file):
b.countRangeDels++ case InternalKeyKindRangeKeySet, InternalKeyKindRangeKeyUnset, InternalKeyKindRangeKeyDelete: b.countRangeKeys++
do we need to set b.rangeKeys = nil?
batch.go, line 678 at r3 (raw file):
// WARNING: This is an experimental feature with limited functionality. func (b ExperimentalBatch) RangeKeySet(start, end, suffix, value []byte, _ *WriteOptions) error { suffixValues := [1]rangekey.SuffixValue{{Suffix: suffix, Value: value}}
where do we enforce that start, end are prefix keys?
2d67350 to
433c422
Compare
Add key kinds for range key sets, unsets and deletes. Add corresponding methods to the Batch, behind an experimental type. With this commit, these keys may be encoded into a batch but are discarded when committed.
433c422 to
e2cab32
Compare
jbowens
left a comment
There was a problem hiding this comment.
Reviewable status: 10 of 15 files reviewed, 6 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
batch.go, line 376 at r3 (raw file):
Previously, sumeerbhola wrote…
do we need to set
b.rangeKeys = nil?
It'll get set a little below here where we handle indexed batches.
batch.go, line 678 at r3 (raw file):
Previously, sumeerbhola wrote…
where do we enforce that start, end are prefix keys?
currently nowhere. doing it here would require all batches, including non-indexed batches, to have a split function. we currently rely on the Batch{} zero value being valid, so I think we should defer it to batch application. I added a TODO there, and a check to ensure that split is defined if the batch contains any range keys.
jbowens
left a comment
There was a problem hiding this comment.
TFTRs!
Reviewable status: 10 of 15 files reviewed, 6 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
Add key kinds for range key sets, unsets and deletes. Add corresponding methods
to the Batch, behind an experimental type. With this commit, these keys may be
encoded into a batch but are discarded when committed.