kvserver: add range keys to replica consistency checks#76880
kvserver: add range keys to replica consistency checks#76880erikgrinaker wants to merge 9 commits intocockroachdb:masterfrom
Conversation
tbg
left a comment
There was a problem hiding this comment.
Would be good if Oleg gave this a look as well.
| } | ||
|
|
||
| // Encode the length of the start key, end key, and value. | ||
| binary.LittleEndian.PutUint64(intBuf[:], uint64(len(unsafeKey.StartKey))) |
There was a problem hiding this comment.
I know we do that for point keys too, but could you remind me why?
Also, what does intBuf[:] do again? It seems identical to intBuf: https://go.dev/play/p/QjMUqx3Z8Ks
There was a problem hiding this comment.
I know we do that for point keys too, but could you remind me why?
No idea. I'm guessing it was maybe lifted from somewhere that we do KV encoding that needs length prefixes for decoding, e.g. MVCCScanDecodeKeyValue.
I figured I'd just keep it consistent here, but happy to drop the length hashing too.
Also, what does
intBuf[:]do again?
intBuf is an [8]byte array, PutUInt64() takes a []byte slice. [:] casts an array to a slice.
| expect string | ||
| }{ | ||
| // Start with a noop write to hash the initial state. | ||
| {"", "", 0, "", "cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e"}, |
There was a problem hiding this comment.
Consider making this datadriven using echotest.Require:
cockroach/pkg/testutils/echotest/echotest.go
Lines 19 to 28 in ad59351
That way all you have to do is to print the checksums, and let datadriven handle the rest via -rewrite. This is relevant since this test will break every time anything changes about the initial state, which does happen occasionally.
This patch adds an iteration key type `IterKeyTypePointsWithRanges` for `MVCCIterator` which iterates only over point keys, but surfaces range keys overlapping those points. Release note: None
This patch automatically clears all range keys (at any timestamp) in the `Engine` methods `ClearMVCCRangeAndIntents` and `ClearIterRange`. Range keys are not affected by `ClearRawRange` (which clears raw engine keys) and `ClearMVCCRange` (which is intended for clearing a subset of versions for a single key). This is implemented via a new `Engine.ExperimentalClearMVCCRangeKeys` method which calls through to Pebble's `RangeKeyDelete`, efficiently deleting all range keys in a key span. By extension, the `ClearRange` RPC method now also clears range keys, both when using point deletions and range deletions. Release note: None
This patch adds basic support for MVCC range tombstones in garbage collection. It garbage collects points below a range tombstone, as well as the range tombstones themselves. However, `MVCCStats` do not currently take range tombstones into account for garbage statistics -- this will be addressed separately. Garbage collection below range tombstones does not do anything fancy like dropping a Pebble range tombstone when there are no newer versions above the range tombstone -- it still uses point clears for every GCed key. This can be optimized later. Release note: None
This adds a parameter `ExperimentalRanges` to `GCRequest`, and a corresponding `ExperimentalGCRanges` version gate, which allows GCing large swathes of keys using Pebble range tombstones. This parameter is not yet in use, but is added preemptively to allow garbage collection to make use of it in the future when GCing below an MVCC range tombstone with no keys above it. Since MVCC range tombstones are experimental, this can possibly be added in a 22.1 patch release. Release note: None
This patch split out `MVCCRangeKeyDefragmenter` from `MVCCRangeKeyIterator`. Keeping defragmentation separate from the iterator allows for ad hoc range key defragmentation in other contexts, e.g. during combined point/range key iteration. Release note: None
This patch adds `MVCCStats` tracking for range keys. It only considers range tombstones for now, since the semantics of other range keys are still unclear (e.g. how should overlapping range keys be interpreted?), and will error if encountering non-tombstone range keys. Two new fields are added to `MVCCStats`: * `RangeKeyCount`: the number of individual defragmented logical range keys, regardless of any overlap with other range keys. Multiple versions count as separate range keys, even if they overlap exactly. * `RangeKeyBytes`: the logical encoded byte size of all range keys, excluding value. This ignores fragmentation, and counts the key bounds separately for each version even if multiple range keys overlap exactly. Unlike point keys, which for historical reasons use a fixed-size timestamp contribution, this uses the actual variable-length timestamp encoding contribution. `ComputeStatsForRange()` has been extended to calculate the above quantities, and additionally account for range tombstones themselves in `GCBytesAge` along with their effect on point keys. However, these statistics are not yet updated during range key mutations and GC, nor on CRDB range splits and merges -- this will be addressed separately. All relevant call sites have been updated to surface range keys for the MVCC iterators passed to `ComputeStatsForRange()`. Release note: None
These will be used e.g. for generating debug output for MVCC range keys. Release note: None
Release note: None
6f6694a to
b2aa422
Compare
This patch adds handling of range keys in replica consistency checks. Range keys are defragmented and iterated over as part of `MVCCStats` calculations, and then hashed similarly to point keys. Range keys are experimental, and will only be written when `storage.CanUseExperimentalMVCCRangeTombstones` is enabled (off by default). This will only happen once all nodes have upgraded to the version gate `ExperimentalMVCCRangeTombstones`, so there is no need for a separate version gate here. Release note: None
b2aa422 to
5518d64
Compare
|
Superseded by #78104. |
Only the last 3 commits should be reviewed here. The previous ones are from #76783 and #76860.
storage: export
EncodeMVCCKeyPrefixandEncodeMVCCTimestampSuffixThese will be used e.g. for generating debug output for MVCC range keys.
Release note: None
storage: add
MVCCKey.Compare()Release note: None
kvserver: add range keys to replica consistency checks
This patch adds handling of range keys in replica consistency checks.
Range keys are defragmented and iterated over as part of
MVCCStatscalculations, and then hashed similarly to point keys.
Range keys are experimental, and will only be written when
storage.CanUseExperimentalMVCCRangeTombstonesis enabled (off bydefault). This will only happen once all nodes have upgraded to the
version gate
ExperimentalMVCCRangeTombstones, so there is no need fora separate version gate here.
Release note: None