Skip to content

*: Add IterOption to optionally read L6 filter blocks.#1685

Merged
itsbilal merged 1 commit intocockroachdb:masterfrom
itsbilal:opt-into-l6-filters
May 6, 2022
Merged

*: Add IterOption to optionally read L6 filter blocks.#1685
itsbilal merged 1 commit intocockroachdb:masterfrom
itsbilal:opt-into-l6-filters

Conversation

@itsbilal
Copy link
Copy Markdown
Contributor

@itsbilal itsbilal commented May 5, 2022

This change adds an IterOption, defaulting to false,
that lets the caller opt into reading L6 filter blocks
if they exist on the sstable. Adding this option
allows for a low-risk, low-behaviour-change enabling
of always writing filter blocks to L6 sstables, as we
will not be reading them by default.

Necessary to unblock cockroachdb/cockroach#80980 .

This change adds an IterOption, defaulting to false,
that lets the caller opt into reading L6 filter blocks
if they exist on the sstable. Adding this option
allows for a low-risk, low-behaviour-change enabling
of always writing filter blocks to L6 sstables, as we
will not be reading them by default.

Necessary to unblock cockroachdb/cockroach#80980 .
@itsbilal itsbilal requested a review from a team May 5, 2022 13:33
@itsbilal itsbilal self-assigned this May 5, 2022
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 4 files at r1, all commit messages.
Reviewable status: 1 of 4 files reviewed, 1 unresolved discussion (waiting on @itsbilal)


options.go line 150 at r1 (raw file):

	// existing is not low or if we just expect a one-time Seek (where loading the
	// data block directly is better).
	UseL6Filters bool

Do sstables ingested from snapshots have filter blocks today? I'm wondering if there's some minor benefit to this already, even before we start writing filter blocks in L6 for Pebble-constructed sstables.

Copy link
Copy Markdown
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 1 of 4 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)


options.go line 150 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Do sstables ingested from snapshots have filter blocks today? I'm wondering if there's some minor benefit to this already, even before we start writing filter blocks in L6 for Pebble-constructed sstables.

That's correct, any ingested SST that gets ingested straight into L6 will have filter blocks. That should be relatively rare on a longer running cluster/node though, so I'm not sure how obvious the difference would be.

@itsbilal
Copy link
Copy Markdown
Contributor Author

itsbilal commented May 6, 2022

options.go line 150 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

That's correct, any ingested SST that gets ingested straight into L6 will have filter blocks. That should be relatively rare on a longer running cluster/node though, so I'm not sure how obvious the difference would be.

clarification: SSTs made using the MakeIngestionSSTWriter SST writer have filters. Which is KV snapshots, but not backup SSTs.

@itsbilal itsbilal merged commit f889707 into cockroachdb:master May 6, 2022
itsbilal added a commit to itsbilal/cockroach that referenced this pull request May 16, 2022
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
itsbilal added a commit to itsbilal/cockroach that referenced this pull request May 20, 2022
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
itsbilal added a commit to itsbilal/cockroach that referenced this pull request May 24, 2022
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
itsbilal added a commit to itsbilal/cockroach that referenced this pull request May 25, 2022
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
itsbilal added a commit to itsbilal/cockroach that referenced this pull request May 25, 2022
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
itsbilal added a commit to itsbilal/cockroach that referenced this pull request May 26, 2022
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request May 26, 2022
81062: storage,kvserver: Improve SST collision checking for wide SSTs r=sumeerbhola,nicktrav a=itsbilal

This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in #73514
and then reverted later).
2) Write bloom filters in L6 by default, to help speed up 1.
Also thread a new IterOption to only optionally read these
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes #80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.

Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Jun 10, 2022
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Jun 22, 2022
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants