storage: add initial experimental MVCC range tombstone primitives#76090

Closed
erikgrinaker wants to merge 1 commit into cockroachdb:master from erikgrinaker:mvcc-range-tombstones

@erikgrinaker
Contributor

@erikgrinaker erikgrinaker commented Feb 4, 2022

There is an alternative API in #76131, which I think I prefer. This exposes MVCC range keys as a first-class primitive, which is useful e.g. for backup/restore, rangefeed catchup scans, and other components that need to access range tombstones as distinct entities -- these can then take advantage of Pebble facilities such as combined iteration and range key masking.


This patch adds initial experimental primitives for MVCC range
tombstones, based on experimental Pebble range keys:

* Data structures for in-memory and on-disk representation:
  * `storage.MVCCRangeKey`
  * `enginepb.MVCCRangeValue`
  * `enginepb.MVCCRangeTombstone`

* Functions for writing and clearing range tombstones:
  * `Engine.ExperimentalClearMVCCRangeTombstone()`
  * `Engine.ExperimentalDeleteMVCCRange()`
  * `storage.ExperimentalMVCCDeleteRangeUsingTombstone()`

* Iterator and function for reading range tombstones:
  * `Engine.NewMVCCRangeTombstoneIterator()`
  * `storage.MVCCRangeTombstoneIterator`
  * `storage.ScanMVCCRangeTombstones()`

Range tombstones do not have a distinct identity, and should instead be
considered a tombstone continuum: they will merge with abutting
tombstones, can be partially cleared, can split or merge along with
ranges, and so on. Bounded scans will truncate them to the scan bounds.
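
To make the continuum behavior concrete, here is a rough sketch of merging, truncation, and partial clearing over a toy model. The `span` type and the `mergeAbutting`, `truncate`, and `clearSpan` helpers are hypothetical names made up for illustration; they are not part of this patch, and keys and timestamps are simplified to strings and integers.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical stand-in for a range tombstone over [start, end) at
// a single timestamp. It is not the storage.MVCCRangeKey type from this PR.
type span struct {
	start, end string // end is exclusive
	ts         int64
}

// mergeAbutting joins spans with equal timestamps whose bounds touch or
// overlap, so that no individual tombstone keeps a distinct identity.
func mergeAbutting(spans []span) []span {
	sorted := append([]span(nil), spans...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].ts != sorted[j].ts {
			return sorted[i].ts < sorted[j].ts
		}
		return sorted[i].start < sorted[j].start
	})
	var out []span
	for _, s := range sorted {
		if n := len(out); n > 0 && out[n-1].ts == s.ts && s.start <= out[n-1].end {
			if s.end > out[n-1].end {
				out[n-1].end = s.end
			}
			continue
		}
		out = append(out, s)
	}
	return out
}

// truncate clips spans to the scan bounds [lo, hi) and drops spans that fall
// entirely outside of them.
func truncate(spans []span, lo, hi string) []span {
	var out []span
	for _, s := range spans {
		if s.start < lo {
			s.start = lo
		}
		if s.end > hi {
			s.end = hi
		}
		if s.start < s.end {
			out = append(out, s)
		}
	}
	return out
}

// clearSpan removes [lo, hi) at timestamp ts and keeps whatever remains on
// either side, which is what "partially cleared" means in this model.
func clearSpan(spans []span, lo, hi string, ts int64) []span {
	var out []span
	for _, s := range spans {
		if s.ts != ts || s.end <= lo || s.start >= hi {
			out = append(out, s) // different timestamp or no overlap
			continue
		}
		if s.start < lo {
			out = append(out, span{s.start, lo, ts})
		}
		if s.end > hi {
			out = append(out, span{hi, s.end, ts})
		}
	}
	return out
}

func main() {
	merged := mergeAbutting([]span{{"a", "c", 3}, {"c", "f", 3}, {"b", "d", 5}})
	fmt.Println("merged:   ", merged)
	fmt.Println("truncated:", truncate(merged, "b", "e"))
	fmt.Println("cleared:  ", clearSpan(merged, "c", "d", 3))
}
```

In this model the two tombstones at timestamp 3 collapse into a single span over `[a, f)`, a scan bounded to `[b, e)` only sees the clipped portion, and clearing `[c, d)` leaves two remnants behind.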

Range tombstones are not yet handled in the rest of the MVCC API, nor
are they exposed via KV APIs. They are not persisted to disk either, due
to Pebble range key limitations. Subsequent pull requests will extend
their functionality and integrate them with other components.

Touches #70412.

Release note: None


For a high-level summary, see the internal draft tech note.

@erikgrinaker erikgrinaker self-assigned this Feb 4, 2022
@erikgrinaker erikgrinaker requested review from a team as code owners February 4, 2022 22:10
@cockroach-teamcity
Member

This change is Reviewable

@erikgrinaker erikgrinaker force-pushed the mvcc-range-tombstones branch 3 times, most recently from 3330cb9 to 7c5c746 Compare February 5, 2022 15:00
@erikgrinaker erikgrinaker force-pushed the mvcc-range-tombstones branch from 7c5c746 to 3c5dbc3 Compare February 6, 2022 11:22
@erikgrinaker
Contributor Author

I was hoping we could use point tombstone synthesis to integrate with MVCCIterator, and keep MVCCRangeTombstoneIterator off to the side for anything that needed to know about range tombstones as a distinct entity.

However, while prototyping backup support for range tombstones, it's clear that I would essentially be reimplementing the combined point/range key iteration mode in Pebble, which seems unnecessary. It would be useful to integrate range tombstones into IncrementalMVCCIterator, also used by e.g. rangefeed catchup scans and read refreshes, and this operates over an MVCCIterator that currently doesn't know anything about range keys. Similarly, we're likely going to need range key support in SSTIterator, which implements SimpleMVCCIterator as well.

It might be better to just expose range keys via SimpleMVCCIterator, plumbing it through to Pebble. The MVCCRangeTombstoneIterator would then build on top of a SimpleMVCCIterator instead of being constructed via the Engine. Essentially:

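type SimpleMVCCIterator interface {
	// ...
	// HasPointAndRangeKey reports whether the current position has a point
	// key and a range key, respectively.
	HasPointAndRangeKey() (bool, bool)
	// RangeBounds returns the bounds of the range key at the current position.
	RangeBounds() (roachpb.Key, roachpb.Key)
	// RangeTombstones returns the range tombstone timestamps at the current position.
	RangeTombstones() []hlc.Timestamp
}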
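
As a rough illustration of how a caller might consume the three methods proposed above, here is a toy in-memory sketch. `Key` and `Timestamp` are simplified stand-ins for `roachpb.Key` and `hlc.Timestamp`, and `rangeKeyIterator`, `memIter`, `tombstone`, and `describe` are hypothetical names that exist only in this example. It also assumes that tombstones covering a position share the same bounds, as they would if they were already fragmented.

```go
package main

import "fmt"

// Key and Timestamp are simplified stand-ins for roachpb.Key and
// hlc.Timestamp, so that the sketch compiles without CockroachDB imports.
type Key string
type Timestamp int64

// rangeKeyIterator holds only the three proposed methods; positioning
// methods such as seek and next are left out of this sketch.
type rangeKeyIterator interface {
	HasPointAndRangeKey() (bool, bool)
	RangeBounds() (Key, Key)
	RangeTombstones() []Timestamp
}

// tombstone is a hypothetical range tombstone over [start, end).
type tombstone struct {
	start, end Key
	ts         Timestamp
}

// memIter is a toy iterator positioned on a single key.
type memIter struct {
	pos        Key
	points     map[Key]bool
	tombstones []tombstone // assumed to be pre-fragmented
}

// covering returns the tombstones whose span contains the current position.
func (it *memIter) covering() []tombstone {
	var out []tombstone
	for _, t := range it.tombstones {
		if t.start <= it.pos && it.pos < t.end {
			out = append(out, t)
		}
	}
	return out
}

func (it *memIter) HasPointAndRangeKey() (bool, bool) {
	return it.points[it.pos], len(it.covering()) > 0
}

func (it *memIter) RangeBounds() (Key, Key) {
	c := it.covering()
	if len(c) == 0 {
		return "", ""
	}
	return c[0].start, c[0].end
}

func (it *memIter) RangeTombstones() []Timestamp {
	var out []Timestamp
	for _, t := range it.covering() {
		out = append(out, t.ts)
	}
	return out
}

// describe shows the intended call pattern: check what exists at the current
// position first, then read the range bounds and tombstone timestamps.
func describe(it rangeKeyIterator) string {
	hasPoint, hasRange := it.HasPointAndRangeKey()
	if !hasRange {
		return fmt.Sprintf("point=%t, no range tombstones", hasPoint)
	}
	start, end := it.RangeBounds()
	return fmt.Sprintf("point=%t, tombstones %v over [%s, %s)",
		hasPoint, it.RangeTombstones(), start, end)
}

func main() {
	it := &memIter{
		pos:        "b",
		points:     map[Key]bool{"b": true},
		tombstones: []tombstone{{"a", "c", 5}, {"a", "c", 3}},
	}
	fmt.Println(describe(it))
	it.pos = "x"
	fmt.Println(describe(it))
}
```

The point of the shape is that a single iterator position can surface a point key, a set of range tombstones, or both, so callers such as backup do not need a separate iterator for tombstones.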

I'm not sure if we should use range tombstones as a fundamental primitive, or simply expose arbitrary range values. This also extends to e.g. Engine.DeleteMVCCRange() vs Engine.MVCCRangeSet(). I'm leaning towards range tombstones, and we can change it later if necessary.

@erikgrinaker
Contributor Author

> It might be better to just expose range keys via SimpleMVCCIterator, plumbing it through to Pebble. The MVCCRangeTombstoneIterator would then build on top of a SimpleMVCCIterator instead of being constructed via the Engine.

I implemented this alternative API in #76131, and exposed range keys as a first-class primitive. I think I prefer that approach. WDYT?

@erikgrinaker
Contributor Author

Looks like we're going with #76131, closing this.

craig bot pushed a commit that referenced this pull request Feb 21, 2022
76131: storage: add experimental MVCC range tombstone primitives r=jbowens,sumeerbhola,nicktrav a=erikgrinaker

This is an alternative to #76090. Rather than hiding Pebble range keys inside the MVCC API, this exposes them as a first-class primitive for internal use. This allows callers to take advantage of Pebble range key functionality such as combined point/range key iteration and range key masking, which is useful e.g. for `IncrementalMVCCIterator` and `SSTIterator` (used by backups and rangefeed catchup scans). It will also allow the upcoming point synthesizing iterator to just be another `MVCCIterator` implementation that can be plugged in as needed, instead of sitting below `pebbleIterator`.

For consistency with point keys, this also stores range tombstones as a range key with a `nil` value instead of a mostly-useless Protobuf that only adds overhead.

---

This patch adds initial experimental primitives for MVCC range
tombstones and the range keys they build on, based on experimental
Pebble range keys:

* Data structures:
  * `MVCCRangeKey`
  * `MVCCRangeKeyValue`
  * `nil` value for range tombstones (as with point tombstones)

* Engine support for reading, writing, and clearing range keys:
  * `Engine.ExperimentalClearMVCCRangeKey()`
  * `Engine.ExperimentalPutMVCCRangeKey()`
  * `SimpleMVCCIterator.HasPointAndRange()`
  * `SimpleMVCCIterator.RangeBounds()`
  * `SimpleMVCCIterator.RangeKeys()`
  * `MVCCRangeKeyIterator`

* MVCC function for writing range tombstones:
  * `ExperimentalMVCCDeleteRangeUsingTombstone()`

Range tombstones do not have a distinct identity, and should instead be
considered a tombstone continuum: they will merge with abutting
tombstones, can be partially cleared, can split or merge along with
ranges, and so on. Bounded scans will truncate them to the scan bounds.

The generalized range keys that range tombstones build on are also
exposed via the `Engine` API. This is primarily for internal MVCC use.
Exposing this in terms of range key/value pairs rather than range
tombstones allows for additional use-cases such as ranged intents.
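
A minimal sketch of that split between generic range key/value pairs and tombstones might look as follows. The `rangeKey` and `rangeKeyValue` types, their fields, and the `isTombstone` and `tombstonesOnly` helpers are assumptions made for illustration and are not the definitions in this patch; the non-empty value is just a placeholder for some other use case.

```go
package main

import "fmt"

// rangeKey is a simplified stand-in for an MVCC range key: a span
// [start, end) at a timestamp. The field layout is assumed.
type rangeKey struct {
	start, end string
	ts         int64
}

// rangeKeyValue pairs a range key with an arbitrary value.
type rangeKeyValue struct {
	key   rangeKey
	value []byte
}

// isTombstone treats an empty value as a range tombstone, mirroring how
// point tombstones are represented.
func (kv rangeKeyValue) isTombstone() bool {
	return len(kv.value) == 0
}

// tombstonesOnly filters generic range key/value pairs down to tombstones.
func tombstonesOnly(kvs []rangeKeyValue) []rangeKey {
	var out []rangeKey
	for _, kv := range kvs {
		if kv.isTombstone() {
			out = append(out, kv.key)
		}
	}
	return out
}

func main() {
	kvs := []rangeKeyValue{
		{key: rangeKey{"a", "c", 3}, value: nil},                   // range tombstone
		{key: rangeKey{"c", "e", 4}, value: []byte("placeholder")}, // some other range value
	}
	fmt.Println(tombstonesOnly(kvs))
}
```

Keeping the engine-level API in terms of key/value pairs means the tombstone interpretation lives in one small predicate rather than in the storage format.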

Range tombstones are not yet handled in the rest of the MVCC or KV API,
nor are they persisted to disk. Subsequent pull requests will extend
their functionality and integrate them with other components.

Touches #70412.

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
@erikgrinaker erikgrinaker deleted the mvcc-range-tombstones branch March 4, 2022 14:04