sstable: add (*sstable.Writer).RangeKey{Set,Unset,Delete} methods#1445
sstable: add (*sstable.Writer).RangeKey{Set,Unset,Delete} methods#1445nicktrav merged 1 commit intocockroachdb:masterfrom
Conversation
nicktrav
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions
internal/rangekey/rangekey.go, line 73 at r1 (raw file):
// SuffixValues, when encoded for a RangeKeySet. It may be used to construct a // buffer of the appropriate size before encoding. func EncodedSetSuffixValuesLen(suffixValues []SuffixValue) int {
I found myself doing a lot of gymnastics with just the suffix values (i.e. the value without the end key encoded). I split these functions out.
sstable/writer.go, line 398 at r1 (raw file):
// not required to be fragmented. func (w *Writer) RangeKeySet(start, end, suffix, value []byte) error { // FIXME(travers): Should the API allow for adding multiple (suffix, value)
Open question: is this sufficient? Or should we support allowing the caller to pass in multiple suffix / value pairs. Same for RangeKeyUnset.
sstable/writer.go, line 401 at r1 (raw file):
// pairs at the same time? startKey := base.MakeInternalKey(start, 0, base.InternalKeyKindRangeKeySet) return w.addRangeKeySpan(startKey, end, suffix, value)
Open question: there's some subtlety in tracking the lifetimes of the byte slices that are passed to these methods. As we only perform a copy when the fragmented span is coalesced and finalized, it's possible that the caller could alter the underlying data before we add to the block. Do we want to preemptively copy?
jbowens
left a comment
There was a problem hiding this comment.
Looks great!
Reviewed 2 of 6 files at r1.
Reviewable status: 2 of 6 files reviewed, 11 unresolved discussions (waiting on @nicktrav and @sumeerbhola)
internal/rangekey/rangekey.go, line 180 at r1 (raw file):
} // EncodeUnsetSuffixValues encodes a slice of SuffixValues for a RangeKeyUnset
nit: s/slice of SuffixValues/slice of suffixes/
internal/rangekey/rangekey.go, line 182 at r1 (raw file):
// EncodeUnsetSuffixValues encodes a slice of SuffixValues for a RangeKeyUnset // into dst. The length of dst must be greater than or equal to // EncodedSetSuffixValuesLen. EncodeSetSuffixValues returns the number of bytes
nit: s/EncodeSetSuffixValues/EncodeUnsetSuffixes/
internal/rangekey/rangekey.go, line 185 at r1 (raw file):
// written, which should always equal the EncodedSetValueLen with the same // arguments. func EncodeUnsetSuffixValues(dst []byte, suffixes [][]byte) int {
s/EncodeUnsetSuffixValues/EncodeUnsetSuffixes/
sstable/writer.go, line 398 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Open question: is this sufficient? Or should we support allowing the caller to pass in multiple suffix / value pairs. Same for
RangeKeyUnset.
I think this is sufficient. 👍
sstable/writer.go, line 401 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Open question: there's some subtlety in tracking the lifetimes of the byte slices that are passed to these methods. As we only perform a
copywhen the fragmented span is coalesced and finalized, it's possible that the caller could alter the underlying data before we add to the block. Do we want to preemptively copy?
I vote preemptively copy to err towards guarding against misuse, since we expect range keys to be rare in CockroachDB. We can walk that decision back if we need to during performance testing.
We could share a bytes.Buffer for all range key buffered data (including the encoded suffix-values tuples) to reduce the allocations..
sstable/writer.go, line 451 at r1 (raw file):
case base.InternalKeyKindRangeKeyUnset: suffixes := [][]byte{suffix} buf := make([]byte, rangekey.EncodedUnsetSuffixValuesLen(suffixes))
I think it might be worthwhile to use a shared bytes.Buffer on the Writer for these encoded Span.Values to reduce allocations.
sstable/writer.go, line 468 at r1 (raw file):
// only flush when we've moved past the previous key. if w.compare(span.Start.UserKey, w.prevRangeKeyUserKey) > 0 { w.fragmenter.TruncateAndFlushTo(w.prevRangeKeyUserKey)
Is this necessary? I think Fragmenter.Add will flush if necessary. I think we can Add each key span and call Finish when finishing the sstable, and that should be enough to ensure everything is flushed.
sstable/writer.go, line 496 at r1 (raw file):
} func (w *Writer) addCoalescedSpan(s rangekey.CoalescedSpan) {
Can you add a comment here stating that this method is used as the emit function of the Coalescer?
sstable/testdata/writer_range_keys, line 1 at r1 (raw file):
# NOTE: The operations SET, UNSET, and DEL in this test file are aliases for
These test cases are great (and so easy to read!)
sstable/testdata/writer_range_keys, line 85 at r1 (raw file):
d.RANGEKEYSET.0: e [(@9=v9)] d.RANGEKEYDEL.0: e f.RANGEKEYSET.0: i [(@4=v4),(@5=v5),(@7=v7),(@8=v8),(@9=v9)]
nit: The ordering of these suffix-values is opposite, because the test uses base.DefaultComparer which compares suffixes as byte slices. I think it'd be worthwhile to use testkeys.Comparer and reverse the order here to avoid confusion.
41e32ac to
2bf7d93
Compare
nicktrav
left a comment
There was a problem hiding this comment.
Dismissed @jbowens from a discussion.
Reviewable status: 1 of 6 files reviewed, 4 unresolved discussions (waiting on @jbowens and @sumeerbhola)
internal/rangekey/rangekey.go, line 185 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
s/EncodeUnsetSuffixValues/EncodeUnsetSuffixes/
Done.
sstable/writer.go, line 398 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I think this is sufficient. 👍
👍
sstable/writer.go, line 401 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I vote preemptively copy to err towards guarding against misuse, since we expect range keys to be rare in CockroachDB. We can walk that decision back if we need to during performance testing.
We could share a
bytes.Bufferfor all range key buffered data (including the encoded suffix-values tuples) to reduce the allocations..
Done. I added (*Writer).tempRangeKeyBuf, which is similar to how we do things over in the block property collector.
sstable/writer.go, line 451 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I think it might be worthwhile to use a shared
bytes.Bufferon theWriterfor these encodedSpan.Values to reduce allocations.
Same as above. Will re-use the same mechanism for buffering.
sstable/writer.go, line 468 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Is this necessary? I think
Fragmenter.Addwill flush if necessary. I think we canAddeach key span and callFinishwhen finishing the sstable, and that should be enough to ensure everything is flushed.
Removed.
sstable/writer.go, line 496 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Can you add a comment here stating that this method is used as the emit function of the
Coalescer?
Done.
sstable/testdata/writer_range_keys, line 1 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
These test cases are great (and so easy to read!)
👍
sstable/testdata/writer_range_keys, line 85 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: The ordering of these suffix-values is opposite, because the test uses
base.DefaultComparerwhich compares suffixes as byte slices. I think it'd be worthwhile to usetestkeys.Comparerand reverse the order here to avoid confusion.
Fixed.
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 4 of 5 files at r2.
Reviewable status: 5 of 6 files reviewed, 7 unresolved discussions (waiting on @jbowens, @nicktrav, and @sumeerbhola)
sstable/writer.go, line 442 at r2 (raw file):
// caller to avoid re-use of buffers until the Writer is closed. endCopy := w.tempRangeKeyBuf(len(end)) copy(endCopy, end)
where are we copying start? I am assuming that like the other slices it can be mutated by the caller when this method returns.
sstable/writer.go, line 497 at r2 (raw file):
var unsets [][]byte for _, item := range s.Items { sv := rangekey.SuffixValue{Suffix: item.Suffix}
nit: initializing sv here adds a level of indirection that is unnecessary for the if-block.
how about
if item.Unset {
unsets = append(unsets, item.Suffix)
} else {
sets = append(sets, rangekey.SuffixValue{Suffix: item.Suffix, Value: item.Value)
}
sstable/writer_rangekey_test.go, line 50 at r2 (raw file):
start, end := startEndSplit[0], startEndSplit[1] switch kind {
Consider scrambling the bytes in the byte slices passed as parameters after the return from the method. We've had hard to track down bugs in the past.
2bf7d93 to
8fc3e51
Compare
Users of `sstable.Writer` other than Pebble itself (e.g. Cockroach)
require a means of adding range keys to an sstable that is slightly more
flexible than the existing `AddRangeKey` method in the way in which key
spans can be added to the table.
Specifically, `AddRangeKey` requires that spans be sorted (start / end
key ascending, sequence number and key kind descending), in addition to
fragmented and coalesced (i.e. multiple suffixes within the same span
are coalesced into a single value).
Add the `RangeKey{Set,Unset,Delete}` methods on the `sstable.Writer`
that allow adding spans in order of start key. These methods have no
requirement that the spans be fragmented or coalesced. Instead, the
implementation handles the fragmenting of spans as keys are added. A
`keyspan.Fragmenter` emits completed spans into a `rangekey.coalescer`,
which aggregates the fragments for the same span into a
`rangekey.CoalescedSpan`, which can then be used to write the individual
`RangeKey{Set,Unset,Delete}` key / value pairs into the rang key block
in the sstable.
The new `RangeKey{Set,Unset,Delete}` methods have similar semantics to
the existing `Set,Delete,DeleteRange,Merge` methods that write the key /
value pairs at sequence number zero. In the range key case, these new
methods are intended to be used by external callers when preparing
sstables for ingestion.
Add a data-driven test specific to the new range key methods. These
tests are separate to the existing data driven test cases for the
`AddRangeKey` method, which has stricter requirements on how keys are
added (i.e. already ordered, fragmented and coalesced).
Rename `AddInternalRangeKey` to `AddRangeKey` to better match the
existing naming of `(*sstable.Writer).Add`.
Add new utility methods to the `rangekey` package to aid in encoding
range key values.
Related to cockroachdb#1339.
8fc3e51 to
f2b06ed
Compare
nicktrav
left a comment
There was a problem hiding this comment.
Reviewable status: 3 of 6 files reviewed, 6 unresolved discussions (waiting on @jbowens and @sumeerbhola)
sstable/writer.go, line 442 at r2 (raw file):
Previously, sumeerbhola wrote…
where are we copying
start? I am assuming that like the other slices it can be mutated by the caller when this method returns.
Added.
sstable/writer.go, line 497 at r2 (raw file):
Previously, sumeerbhola wrote…
nit: initializing
svhere adds a level of indirection that is unnecessary for the if-block.
how aboutif item.Unset { unsets = append(unsets, item.Suffix) } else { sets = append(sets, rangekey.SuffixValue{Suffix: item.Suffix, Value: item.Value) }
Done.
sstable/writer_rangekey_test.go, line 50 at r2 (raw file):
Previously, sumeerbhola wrote…
Consider scrambling the bytes in the byte slices passed as parameters after the return from the method. We've had hard to track down bugs in the past.
Good point. Done.
|
cc: @erikgrinaker - FYI. Feel free to take this for a spin. There might be some subtleties we're not aware of that you may be able point out. |
|
TFTRs! |
|
Thanks @nicktrav, will do! |
Users of
sstable.Writerother than Pebble itself (e.g. Cockroach)require a means of adding range keys to an sstable that is slightly more
flexible than the existing
AddRangeKeymethod in the way in which keyspans can be added to the table.
Specifically,
AddRangeKeyrequires that spans be sorted (start / endkey ascending, sequence number and key kind descending), in addition to
fragmented and coalesced (i.e. multiple suffixes within the same span
are coalesced into a single value).
Add the
RangeKey{Set,Unset,Delete}methods on thesstable.Writerthat allow adding spans in order of start key. These methods have no
requirement that the spans be fragmented or coalesced. Instead, the
implementation handles the fragmenting of spans as keys are added. A
keyspan.Fragmenteremits completed spans into arangekey.coalescer,which aggregates the fragments for the same span into a
rangekey.CoalescedSpan, which can then be used to write the individualRangeKey{Set,Unset,Delete}key / value pairs into the rang key blockin the sstable.
The new
RangeKey{Set,Unset,Delete}methods have similar semantics tothe existing
Set,Delete,DeleteRange,Mergemethods that write the key /value pairs at sequence number zero. In the range key case, these new
methods are intended to be used by external callers when preparing
sstables for ingestion.
Add a data-driven test specific to the new range key methods. These
tests are separate to the existing data driven test cases for the
AddRangeKeymethod, which has stricter requirements on how keys areadded (i.e. already ordered, fragmented and coalesced).
Rename
AddInternalRangeKeytoAddRangeKeyto better match theexisting naming of
(*sstable.Writer).Add.Add new utility methods to the
rangekeypackage to aid in encodingrange key values.
Related to #1339.
Some open questions: