Skip to content

sstable: add RangeKeyIter iterator#1513

Closed
jbowens wants to merge 1 commit intocockroachdb:masterfrom
jbowens:rk-reader-external
Closed

sstable: add RangeKeyIter iterator#1513
jbowens wants to merge 1 commit intocockroachdb:masterfrom
jbowens:rk-reader-external

Conversation

@jbowens
Copy link
Copy Markdown
Contributor

@jbowens jbowens commented Feb 15, 2022

Add a range key iterator type intended to be used externally for reading the
range-keys contained within an sstable. The interface is simple, providing
forward iteration from beginning to end only. Each individual range key is
surfaced separately, even if coalesced into single physical fragments.

Add a range key iterator type intended to be used externally for reading the
range-keys contained within an sstable. The interface is simple, providing
forward iteration from beginning to end only. Each individual range key is
surfaced separately, even if coalesced into single physical fragments.
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikgrinaker A simple forward iteration interface is enough, right?

Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @erikgrinaker, @nicktrav, and @sumeerbhola)

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple forward iteration interface is enough, right?

Yep, that'll do, thanks! Would it be easy to allow lower/upper key bounds? Otherwise, it's trivial to just do the filtering in the MVCC range key iterator too.

Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @nicktrav and @sumeerbhola)

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, we'd ideally want the CRDB SSTIterator to satisfy the SimpleMVCCIterator interface, which uses interleaved point/range key iteration. This is based on sstable.Reader.NewIter(). Would it be possible to reuse the interleaving logic that we get from pebble.Iterator?

If not, nbd -- I can create separate SST iterator implementations for point and range keys, which covers my needs for now, and we can revisit this later.

Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @nicktrav and @sumeerbhola)

Copy link
Copy Markdown
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. :lgtm:

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @sumeerbhola)

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, we'd ideally want the CRDB SSTIterator to satisfy the SimpleMVCCIterator interface, which uses interleaved point/range key iteration.

Where is this sst originating from? We can't defragment across ssts in this context, so we would need to expose range keys that are not prefix keys. I'm worried about polluting the SimpleMVCCIterator interface -- I thought we had managed to come up with a scheme such that our common iterator interfaces in the storage package would expose [start, end) pairs that were roachpb.Keys.

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens)


sstable/external.go, line 100 at r1 (raw file):

	endKey, value, ok := rangekey.DecodeEndKey(i.iterKey.Kind(), v)
	if !ok {
		panic("pebble: unable to decode rangekey end")

I believe this can happen due to a corruption, in which case we should not panic.

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this sst originating from? We can't defragment across ssts in this context, so we would need to expose range keys that are not prefix keys. I'm worried about polluting the SimpleMVCCIterator interface -- I thought we had managed to come up with a scheme such that our common iterator interfaces in the storage package would expose [start, end) pairs that were roachpb.Keys.

These will e.g. be emitted by Export or ingested by AddSSTable, so they're generated by client and server logic. Consider e.g. streaming replication, which will reads events off of a rangefeed, build SSTs which include range tombstones, and ingest these into the target cluster. Requiring range keys to have roachpb.Key bounds seems perfectly fine for these, and we can assert that in AddSSTable during race test builds (or maybe always, if it's cheap enough).

The iteration is needed e.g. by backup restoration reading exported SSTs from an S3 bucket somewhere, combining them to figure out the latest visible point keys, then building an SST containing the final visible point keys to be ingested and restored. It's also needed for tests.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens)

@jbowens
Copy link
Copy Markdown
Contributor Author

jbowens commented Feb 23, 2022

Closing in favor of #1529.

@jbowens jbowens closed this Feb 23, 2022
@jbowens jbowens deleted the rk-reader-external branch September 13, 2022 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants