Skip to content

kvserver: add range keys to replica consistency checks#76880

Closed
erikgrinaker wants to merge 9 commits intocockroachdb:masterfrom
erikgrinaker:consistency-checker-range-keys
Closed

kvserver: add range keys to replica consistency checks#76880
erikgrinaker wants to merge 9 commits intocockroachdb:masterfrom
erikgrinaker:consistency-checker-range-keys

Conversation

@erikgrinaker
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker commented Feb 22, 2022

Only the last 3 commits should be reviewed here. The previous ones are from #76783 and #76860.


storage: export EncodeMVCCKeyPrefix and EncodeMVCCTimestampSuffix

These will be used e.g. for generating debug output for MVCC range keys.

Release note: None

storage: add MVCCKey.Compare()

Release note: None

kvserver: add range keys to replica consistency checks

This patch adds handling of range keys in replica consistency checks.
Range keys are defragmented and iterated over as part of MVCCStats
calculations, and then hashed similarly to point keys.

Range keys are experimental, and will only be written when
storage.CanUseExperimentalMVCCRangeTombstones is enabled (off by
default). This will only happen once all nodes have upgraded to the
version gate ExperimentalMVCCRangeTombstones, so there is no need for
a separate version gate here.

Release note: None

@erikgrinaker erikgrinaker self-assigned this Feb 22, 2022
@erikgrinaker erikgrinaker requested review from a team as code owners February 22, 2022 10:51
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good if Oleg gave this a look as well.

}

// Encode the length of the start key, end key, and value.
binary.LittleEndian.PutUint64(intBuf[:], uint64(len(unsafeKey.StartKey)))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we do that for point keys too, but could you remind me why?

Also, what does intBuf[:] do again? It seems identical to intBuf: https://go.dev/play/p/QjMUqx3Z8Ks

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we do that for point keys too, but could you remind me why?

No idea. I'm guessing it was maybe lifted from somewhere that we do KV encoding that needs length prefixes for decoding, e.g. MVCCScanDecodeKeyValue.

I figured I'd just keep it consistent here, but happy to drop the length hashing too.

Also, what does intBuf[:] do again?

intBuf is an [8]byte array, PutUInt64() takes a []byte slice. [:] casts an array to a slice.

expect string
}{
// Start with a noop write to hash the initial state.
{"", "", 0, "", "cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e"},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making this datadriven using echotest.Require:

// Require checks that the string matches what is found in the file located at
// the provided path. The file must follow the datadriven format:
//
// echo
// ----
// <output of exp>
//
// The contents of the file can be updated automatically using datadriven's
// -rewrite flag.
func Require(t *testing.T, act, path string) {

That way all you have to do is to print the checksums, and let datadriven handle the rest via -rewrite. This is relevant since this test will break every time anything changes about the initial state, which does happen occasionally.

This patch adds an iteration key type `IterKeyTypePointsWithRanges` for
`MVCCIterator` which iterates only over point keys, but surfaces range
keys overlapping those points.

Release note: None
This patch automatically clears all range keys (at any timestamp) in the
`Engine` methods `ClearMVCCRangeAndIntents` and `ClearIterRange`. Range
keys are not affected by `ClearRawRange` (which clears raw engine keys)
and `ClearMVCCRange` (which is intended for clearing a subset of
versions for a single key).

This is implemented via a new `Engine.ExperimentalClearMVCCRangeKeys`
method which calls through to Pebble's `RangeKeyDelete`, efficiently
deleting all range keys in a key span.

By extension, the `ClearRange` RPC method now also clears range keys,
both when using point deletions and range deletions.

Release note: None
This patch adds basic support for MVCC range tombstones in garbage
collection. It garbage collects points below a range tombstone, as well
as the range tombstones themselves. However, `MVCCStats` do not
currently take range tombstones into account for garbage statistics --
this will be addressed separately.

Garbage collection below range tombstones does not do anything fancy
like dropping a Pebble range tombstone when there are no newer versions
above the range tombstone -- it still uses point clears for every GCed
key. This can be optimized later.

Release note: None
This adds a parameter `ExperimentalRanges` to `GCRequest`, and a
corresponding `ExperimentalGCRanges` version gate, which allows GCing
large swathes of keys using Pebble range tombstones.

This parameter is not yet in use, but is added preemptively to allow
garbage collection to make use of it in the future when GCing below an
MVCC range tombstone with no keys above it. Since MVCC range tombstones
are experimental, this can possibly be added in a 22.1 patch release.

Release note: None
This patch split out `MVCCRangeKeyDefragmenter` from
`MVCCRangeKeyIterator`. Keeping defragmentation separate from the
iterator allows for ad hoc range key defragmentation in other contexts,
e.g. during combined point/range key iteration.

Release note: None
This patch adds `MVCCStats` tracking for range keys. It only considers
range tombstones for now, since the semantics of other range keys are
still unclear (e.g. how should overlapping range keys be interpreted?),
and will error if encountering non-tombstone range keys.

Two new fields are added to `MVCCStats`:

* `RangeKeyCount`: the number of individual defragmented logical range
  keys, regardless of any overlap with other range keys. Multiple versions
  count as separate range keys, even if they overlap exactly.

* `RangeKeyBytes`: the logical encoded byte size of all range keys,
  excluding value. This ignores fragmentation, and counts the key bounds
  separately for each version even if multiple range keys overlap exactly.
  Unlike point keys, which for historical reasons use a fixed-size
  timestamp contribution, this uses the actual variable-length timestamp
  encoding contribution.

`ComputeStatsForRange()` has been extended to calculate the above
quantities, and additionally account for range tombstones themselves in
`GCBytesAge` along with their effect on point keys.  However, these
statistics are not yet updated during range key mutations and GC, nor on
CRDB range splits and merges -- this will be addressed separately. All
relevant call sites have been updated to surface range keys for the MVCC
iterators passed to `ComputeStatsForRange()`.

Release note: None
These will be used e.g. for generating debug output for MVCC range keys.

Release note: None
@erikgrinaker erikgrinaker force-pushed the consistency-checker-range-keys branch from 6f6694a to b2aa422 Compare February 22, 2022 12:02
This patch adds handling of range keys in replica consistency checks.
Range keys are defragmented and iterated over as part of `MVCCStats`
calculations, and then hashed similarly to point keys.

Range keys are experimental, and will only be written when
`storage.CanUseExperimentalMVCCRangeTombstones` is enabled (off by
default). This will only happen once all nodes have upgraded to the
version gate `ExperimentalMVCCRangeTombstones`, so there is no need for
a separate version gate here.

Release note: None
@erikgrinaker erikgrinaker force-pushed the consistency-checker-range-keys branch from b2aa422 to 5518d64 Compare February 22, 2022 12:20
@erikgrinaker erikgrinaker marked this pull request as draft February 23, 2022 09:33
@erikgrinaker
Copy link
Copy Markdown
Contributor Author

Superseded by #78104.

@erikgrinaker erikgrinaker deleted the consistency-checker-range-keys branch June 4, 2022 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants