Skip to content

*: Expose range key iterators through table cache#1517

Merged
itsbilal merged 1 commit intocockroachdb:masterfrom
itsbilal:table-cache-rangekeys
Feb 24, 2022
Merged

*: Expose range key iterators through table cache#1517
itsbilal merged 1 commit intocockroachdb:masterfrom
itsbilal:table-cache-rangekeys

Conversation

@itsbilal
Copy link
Copy Markdown
Contributor

Currently, we instantiate new per-table iterators through a
tableNewIters function that is defined on the tableCacheContainer
and has a reference in the db. This function returns a point key
and a rangedel iterator. Having the same function return a range
key iterator would be wasteful as we are likely to have
point and range key iterators open on different files at once;
they will be merged at the high level iterator and the range key
iterators will live in their own levelIter.

This change adds a newRangeKeyIter function to the table cache
that creates this new rangeKeyIter using the Reader managed
by the table cache.

@itsbilal itsbilal requested review from jbowens and nicktrav February 16, 2022 21:29
@itsbilal itsbilal self-assigned this Feb 16, 2022
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@itsbilal itsbilal force-pushed the table-cache-rangekeys branch from c739fbf to 41957a7 Compare February 16, 2022 21:48
Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I forgot about the block property filter complication. Pinging @nicktrav because he might remember the options we were considering before.

Reviewed 4 of 7 files at r1, all commit messages.
Reviewable status: 4 of 7 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @nicktrav)


db.go, line 264 at r1 (raw file):

	tableCache           *tableCacheContainer
	newIters             tableNewIters
	tableNewRangeKeyIter tableNewRangeKeyIter

nit: call the field newRangeKeyIter to match the naming of newIters/tableNewIters?


level_iter.go, line 26 at r1 (raw file):

// tableNewRangeKeyIter creates a new range key iterator for the given file
// number.
type tableNewRangeKeyIter func(file *manifest.FileMetadata, opts *IterOptions) (internalIterator, error)

Rather than internalIterator, I think we want the return type to be new keyspan.FragmentIterator. It's for now a superset of the internalIterator interface, although we intend to eventually disentangle them.

Also, I think we want to move this type definition into internal/rangekey. I'm hoping that we'll also be able to stick the range-key level iterator in that package as well, to keep the range-key iterator stack organized together.


table_cache.go, line 409 at r1 (raw file):

	}

	_, empty, err := c.checkAndIntersectFilters(v, opts)

I think we'll need to tweak something here, because I believe we're collecting different block properties for range keys and point keys, but we currently only have the one shared IterOptions.BlockPropertyFilters.

@nicktrav I completely forgot what we discussed about range key iterators and configuring separate filters for them. Did we have a plan here?

Copy link
Copy Markdown
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 7 files reviewed, 3 unresolved discussions (waiting on @itsbilal and @jbowens)


table_cache.go, line 409 at r1 (raw file):

Did we have a plan here?

We'd talked about this over here. Specifically:

It would be configured with IterOptions.{PointFilters,RangeKeyFilters}.

The general idea was that filters for points and ranges would be set separately via separate fields onIterOptions and you'd use the appropriate slice depending on the situation.

Copy link
Copy Markdown
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @itsbilal and @jbowens)


table_cache.go, line 308 at r1 (raw file):

func (c *tableCacheShard) checkAndIntersectFilters(
	v *tableCacheValue, opts *IterOptions,
) (filterer *sstable.BlockPropertiesFilterer, empty bool, err error) {

Could we adopt ok-like semantics and return true if the filter intersects, and false if it doesn't? e.g. rename empty to intersects and flip the return values.


table_cache.go, line 316 at r1 (raw file):

	var bpfs []BlockPropertyFilter
	if opts != nil {

Could we return early ifopts == nil at the top of the method instead? Saves checking twice, and bpfs := opts.BlockPropertyFilters will be safe to call.

@itsbilal itsbilal force-pushed the table-cache-rangekeys branch from 41957a7 to 30781f6 Compare February 23, 2022 16:01
Copy link
Copy Markdown
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 2 of 9 files reviewed, 2 unresolved discussions (waiting on @itsbilal, @jbowens, and @nicktrav)


db.go, line 264 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: call the field newRangeKeyIter to match the naming of newIters/tableNewIters?

That's what I was doing initially, but the issue is that db already has a newRangeKeyIter method that creates a merged rangeKeyIter. Since this function is to be used for creating table-specific iterators, I went with prefixing it with table, even though it's slightly inconsistent with newIters.


level_iter.go, line 26 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Rather than internalIterator, I think we want the return type to be new keyspan.FragmentIterator. It's for now a superset of the internalIterator interface, although we intend to eventually disentangle them.

Also, I think we want to move this type definition into internal/rangekey. I'm hoping that we'll also be able to stick the range-key level iterator in that package as well, to keep the range-key iterator stack organized together.

Done - at least about the FragmentIterator. I ended up not moving the definition to internal/RangeKey as untangling the dependency on IterOptions and particularly BlockPropertiesFilter was pretty unintuitive and dirty, so it feels cleaner this way.


table_cache.go, line 308 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Could we adopt ok-like semantics and return true if the filter intersects, and false if it doesn't? e.g. rename empty to intersects and flip the return values.

Done. Good idea!


table_cache.go, line 409 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Did we have a plan here?

We'd talked about this over here. Specifically:

It would be configured with IterOptions.{PointFilters,RangeKeyFilters}.

The general idea was that filters for points and ranges would be set separately via separate fields onIterOptions and you'd use the appropriate slice depending on the situation.

Done. Incorporated the change to add those fields to IterOptions too. Thanks!

Copy link
Copy Markdown
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @itsbilal and @jbowens)


options.go, line 97 at r2 (raw file):

	TableFilter func(userProps map[string]string) bool
	// PointKeyFilters can be used to avoid scanning tables and blocks in tables
	// when iterating over point keys. It is requires that this slice is sorted in

nit: it is required

Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @itsbilal)


level_iter.go, line 26 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Done - at least about the FragmentIterator. I ended up not moving the definition to internal/RangeKey as untangling the dependency on IterOptions and particularly BlockPropertiesFilter was pretty unintuitive and dirty, so it feels cleaner this way.

Dang, yeah, I forgot about the dependency from the sstable package on internal/rangekey.


sstable/reader.go, line 107 at r2 (raw file):

// support to raw block iterators. For use when the only part of the table
// we want to iterate over is contained in one block, eg. the range keys block.
type wrappedBlockIter struct {

#1529 reworked the range key iterator to have a separate type fragmentBlockIter, which does its own wrapping of the blockIter. I think we can move the SetCloseHook method onto that new type now.


sstable/reader.go, line 120 at r2 (raw file):

	var err error
	if w.closeHook != nil {
		err = w.closeHook(w)

Soon keyspan.FragmentIterator will not be able to satisfy the sstable.Iterator interface, so we won't be able to pass the range key iterator into a func(Iterator)error. Can we change the SetCloseHook interface to take a func(FragmentIterator) error instead?

Currently, we instantiate new per-table iterators through a
`tableNewIters` function that is defined on the tableCacheContainer
and has a reference in the db. This function returns a point key
and a rangedel iterator. Having the same function return a range
key iterator would be wasteful as we are likely to have
point and range key iterators open on _different_ files at once;
they will be merged at the high level iterator and the range key
iterators will live in their own levelIter.

This change adds a newRangeKeyIter function to the table cache
that creates this new rangeKeyIter using the Reader managed
by the table cache.

Also adds a RangeKeyFilters field to IterOptions to specify
different block-level filters for range key iteration. Renames
BlockFilters to PointKeyFilters.
@itsbilal itsbilal force-pushed the table-cache-rangekeys branch from 30781f6 to 319b28b Compare February 24, 2022 15:37
Copy link
Copy Markdown
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @itsbilal and @jbowens)


sstable/reader.go, line 107 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

#1529 reworked the range key iterator to have a separate type fragmentBlockIter, which does its own wrapping of the blockIter. I think we can move the SetCloseHook method onto that new type now.

Done.


sstable/reader.go, line 120 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Soon keyspan.FragmentIterator will not be able to satisfy the sstable.Iterator interface, so we won't be able to pass the range key iterator into a func(Iterator)error. Can we change the SetCloseHook interface to take a func(FragmentIterator) error instead?

Done.

@itsbilal itsbilal merged commit 09203fd into cockroachdb:master Feb 24, 2022
@itsbilal
Copy link
Copy Markdown
Contributor Author


options.go, line 97 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

nit: it is required

oops, made this fix but then lost it in a stash when rebasing 😬 I guess a future PR will fix this grammar.

@nicktrav nicktrav mentioned this pull request Feb 26, 2022
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants