storage: add initial experimental MVCC range tombstone primitives #76090
erikgrinaker wants to merge 1 commit into cockroachdb:master
3330cb9 to 7c5c746
This patch adds initial experimental primitives for MVCC range tombstones, based on experimental Pebble range keys:

* Data structures for in-memory and on-disk representation:
  * `storage.MVCCRangeKey`
  * `enginepb.MVCCRangeValue`
  * `enginepb.MVCCRangeTombstone`
* Functions for writing and clearing range tombstones:
  * `Engine.ExperimentalClearMVCCRangeTombstone()`
  * `Engine.ExperimentalDeleteMVCCRange()`
  * `storage.ExperimentalMVCCDeleteRangeUsingTombstone()`
* Iterator and function for reading range tombstones:
  * `Engine.NewMVCCRangeTombstoneIterator()`
  * `storage.MVCCRangeTombstoneIterator`
  * `storage.ScanMVCCRangeTombstones()`

Range tombstones do not have a distinct identity, and should instead be considered a tombstone continuum: they will merge with abutting tombstones, can be partially cleared, can split or merge along with ranges, and so on. Bounded scans will truncate them to the scan bounds.

Range tombstones are not yet handled in the rest of the MVCC API, nor are they exposed via KV APIs. They are not persisted to disk either, due to Pebble range key limitations. Subsequent pull requests will extend their functionality and integrate them with other components.

Release note: None
7c5c746 to 3c5dbc3
I was hoping we could use point tombstone synthesis to integrate with

However, while prototyping backup support for range tombstones, it's clear that I would essentially be reimplementing the combined point/range key iteration mode in Pebble, which seems unnecessary. It would be useful to integrate range tombstones into

It might be better to just expose range keys via

```go
type SimpleMVCCIterator interface {
	// ...
	// HasPointAndRangeKey reports whether a point and/or range key is present.
	HasPointAndRangeKey() (bool, bool)
	// RangeBounds returns the bounds of the current range key.
	RangeBounds() (roachpb.Key, roachpb.Key)
	// RangeTombstones returns the timestamps of the covering range tombstones.
	RangeTombstones() []hlc.Timestamp
}
```

I'm not sure if we should use range tombstones as a fundamental primitive, or simply expose arbitrary range values. This also extends to e.g.
I implemented this alternative API in #76131 and exposed range keys as a first-class primitive. I think I prefer that approach. WDYT?
Looks like we're going with #76131; closing this.
76131: storage: add experimental MVCC range tombstone primitives r=jbowens,sumeerbhola,nicktrav a=erikgrinaker

This is an alternative to #76090. Rather than hiding Pebble range keys inside the MVCC API, this exposes them as a first-class primitive for internal use. This allows callers to take advantage of Pebble range key functionality such as combined point/range key iteration and range key masking, which is useful e.g. for `IncrementalMVCCIterator` and `SSTIterator` (used by backups and rangefeed catchup scans). It will also allow the upcoming point synthesizing iterator to just be another `MVCCIterator` implementation that can be plugged in as needed, instead of sitting below `pebbleIterator`.

For consistency with point keys, this also stores range tombstones as a range key with a `nil` value instead of a mostly-useless Protobuf that only adds overhead.

---

This patch adds initial experimental primitives for MVCC range tombstones and the range keys they build on, based on experimental Pebble range keys:

* Data structures:
  * `MVCCRangeKey`
  * `MVCCRangeKeyValue`
  * `nil` value for range tombstones (as with point tombstones)
* Engine support for reading, writing, and clearing range keys:
  * `Engine.ExperimentalClearMVCCRangeKey()`
  * `Engine.ExperimentalPutMVCCRangeKey()`
  * `SimpleMVCCIterator.HasPointAndRange()`
  * `SimpleMVCCIterator.RangeBounds()`
  * `SimpleMVCCIterator.RangeKeys()`
  * `MVCCRangeKeyIterator`
* MVCC function for writing range tombstones:
  * `ExperimentalMVCCDeleteRangeUsingTombstone()`

Range tombstones do not have a distinct identity, and should instead be considered a tombstone continuum: they will merge with abutting tombstones, can be partially cleared, can split or merge along with ranges, and so on. Bounded scans will truncate them to the scan bounds.

The generalized range keys that range tombstones build on are also exposed via the `Engine` API. This is primarily for internal MVCC use. Exposing this in terms of range key/value pairs rather than range tombstones allows for additional use cases such as ranged intents.

Range tombstones are not yet handled in the rest of the MVCC or KV API, nor are they persisted to disk. Subsequent pull requests will extend their functionality and integrate them with other components.

Touches #70412.

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
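The "range key with a `nil` value" encoding can be sketched with simplified stand-ins for the data structures listed above. These are not the actual `MVCCRangeKey`/`MVCCRangeKeyValue` definitions: keys are plain `[]byte`, the timestamp is an `int64` instead of `hlc.Timestamp`, and `Validate`, `IsTombstone`, and `hypotheticalDeleteRange` are illustrative helpers invented for this sketch.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// mvccRangeKey is a simplified stand-in for MVCCRangeKey: a key span
// [StartKey, EndKey) at a single MVCC timestamp.
type mvccRangeKey struct {
	StartKey, EndKey []byte
	Timestamp        int64 // stand-in for hlc.Timestamp
}

// mvccRangeKeyValue is a simplified stand-in for MVCCRangeKeyValue.
type mvccRangeKeyValue struct {
	RangeKey mvccRangeKey
	Value    []byte
}

// Validate checks basic invariants a range key is assumed to satisfy.
func (k mvccRangeKey) Validate() error {
	switch {
	case len(k.StartKey) == 0 || len(k.EndKey) == 0:
		return errors.New("range key must have start and end keys")
	case bytes.Compare(k.StartKey, k.EndKey) >= 0:
		return errors.New("start key must sort before end key")
	case k.Timestamp == 0:
		return errors.New("range key must have a timestamp")
	}
	return nil
}

// IsTombstone reports whether the pair encodes a range tombstone. As with
// point tombstones, an empty (nil) value means "deleted".
func (kv mvccRangeKeyValue) IsTombstone() bool { return len(kv.Value) == 0 }

// hypotheticalDeleteRange builds the range key/value pair that a range
// deletion using a tombstone would write: the span and timestamp, nil value.
func hypotheticalDeleteRange(start, end []byte, ts int64) (mvccRangeKeyValue, error) {
	kv := mvccRangeKeyValue{RangeKey: mvccRangeKey{StartKey: start, EndKey: end, Timestamp: ts}}
	if err := kv.RangeKey.Validate(); err != nil {
		return mvccRangeKeyValue{}, err
	}
	return kv, nil
}

func main() {
	kv, err := hypotheticalDeleteRange([]byte("a"), []byte("m"), 10)
	fmt.Println(kv.IsTombstone(), err) // true <nil>
	_, err = hypotheticalDeleteRange([]byte("m"), []byte("a"), 10)
	fmt.Println(err) // start key must sort before end key
}
```

Keeping the value generic (rather than a tombstone-specific Protobuf) is what leaves room for other range key/value uses such as the ranged intents mentioned above.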
There is an alternative API in #76131, which I think I prefer. It exposes MVCC range keys as a first-class primitive, which is useful e.g. for backup/restore, rangefeed catchup scans, and other components that need to access range tombstones as distinct entities; these components can then take advantage of Pebble facilities such as combined iteration and range key masking.
Touches #70412.
For a high-level summary, see the internal draft tech note.