*: Add IterOption to optionally read L6 filter blocks.#1685
itsbilal merged 1 commit into cockroachdb:master
This change adds an IterOption, defaulting to false, that lets the caller opt into reading L6 filter blocks if they exist in the sstable. Adding this option makes it low-risk to start always writing filter blocks to L6 sstables, with little change in behaviour, since those filter blocks will not be read by default. Necessary to unblock cockroachdb/cockroach#80980.
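To make the default-off behaviour concrete, here is a minimal Go sketch of how a per-iterator option might gate filter-block use during a prefix seek. Only the field name UseL6Filters comes from the options.go excerpt in this PR. IterOptions as written here is a trimmed-down stand-in, and table, useFilter, mayContain and numLevels are hypothetical names for illustration, not Pebble's actual internals. The exact-match prefix set also stands in for a real bloom filter, which can return false positives.

```go
package main

import "fmt"

// numLevels models an LSM with levels L0..L6, so L6 is the bottommost level
// (assumption for this sketch).
const numLevels = 7

// IterOptions is a hypothetical, trimmed-down stand-in for an iterator
// options struct; only the UseL6Filters field name comes from the PR.
type IterOptions struct {
	// UseL6Filters opts into consulting filter blocks on L6 sstables during
	// prefix iteration. The zero value (false) skips them.
	UseL6Filters bool
}

// table is a hypothetical sstable summary: its level, whether it carries a
// filter block, and the prefixes that filter would admit (an exact set here,
// unlike a real bloom filter).
type table struct {
	level     int
	hasFilter bool
	prefixes  map[string]bool
}

// useFilter reports whether a prefix seek on t should consult its filter block.
func useFilter(t table, opts *IterOptions) bool {
	if !t.hasFilter {
		return false
	}
	if t.level == numLevels-1 {
		// Bottommost level: only read the filter if the caller opted in.
		return opts != nil && opts.UseL6Filters
	}
	return true
}

// mayContain returns false only when a consulted filter rules the prefix out;
// otherwise the caller has to fall back to reading the data block.
func mayContain(t table, prefix string, opts *IterOptions) bool {
	if !useFilter(t, opts) {
		return true
	}
	return t.prefixes[prefix]
}

func main() {
	l6 := table{level: 6, hasFilter: true, prefixes: map[string]bool{"a": true}}
	fmt.Println(mayContain(l6, "b", nil))                              // default: filter skipped
	fmt.Println(mayContain(l6, "b", &IterOptions{UseL6Filters: true})) // opted in: filter consulted
}
```

Running it prints `true` and then `false`: with the option left at its zero value, the L6 filter is ignored and the data block would have to be read, while opting in lets the filter rule the prefix out early. Tables on L0 through L5 consult their filters either way.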
jbowens
left a comment
Reviewed 1 of 4 files at r1, all commit messages.
Reviewable status: 1 of 4 files reviewed, 1 unresolved discussion (waiting on @itsbilal)
options.go line 150 at r1 (raw file):
// existing is not low or if we just expect a one-time Seek (where loading the
// data block directly is better).
UseL6Filters bool
Do sstables ingested from snapshots have filter blocks today? I'm wondering if there's some minor benefit to this already, even before we start writing filter blocks in L6 for Pebble-constructed sstables.
itsbilal
left a comment
TFTR!
Reviewable status: 1 of 4 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @jbowens)
options.go line 150 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Do sstables ingested from snapshots have filter blocks today? I'm wondering if there's some minor benefit to this already, even before we start writing filter blocks in L6 for Pebble-constructed sstables.
That's correct: any SST that is ingested directly into L6 will have filter blocks. That should be relatively rare on a longer-running cluster/node, though, so I'm not sure how noticeable the difference would be.
Previously, itsbilal (Bilal Akhtar) wrote…
clarification: SSTs made using the
This change includes 4 changes to significantly improve the performance of CheckSSTCollisions (and by extension, AddSSTable) for cases where we add very wide sstables relative to the engine:
1) Check whether we're adding an sstable that has a small number of keys, or whose overlap with the engine is 100x greater than its own size. In that case, switch to doing prefix seeks inside CheckSSTCollisions (similar to what we did in cockroachdb#73514, which was later reverted).
2) Thread a new IterOption to optionally read L6 filter blocks in prefix iteration, defaulting to not reading them. (This iterator option hooks into the one added to Pebble in cockroachdb/pebble#1685.)
3) Add a ParseHook so that command-line Pebble options can configure the writing of bloom filters. This allows L6 filter block writing to be turned on through a command-line argument.
4) Update intentInterleavingIter to not prefix-seek the intent/lock table iterator if we are in prefix seek mode and the MVCC iterator returned nothing.
Fixes cockroachdb#80980.
Release note (performance improvement): Significantly improve performance of IMPORTs when the source is producing data not sorted by the destination table's primary key, especially if the destination table has a very large primary key with lots of columns.
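The mode switch in item 1 can be sketched as follows, under loud assumptions: keys are plain strings with no MVCC timestamps, the "engine" is a sorted in-memory slice, overlap is measured in bytes, and the key-count cutoff (smallSSTKeyCount) is invented for illustration; only the 100x overlap ratio comes from the commit message. usePrefixSeeks and collisions are hypothetical helpers, not CockroachDB's actual CheckSSTCollisions.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	smallSSTKeyCount = 10  // hypothetical cutoff, not taken from the source
	overlapRatio     = 100 // "100x greater overlap" from the commit message
)

// usePrefixSeeks picks per-key prefix seeks when the SST has few keys, or when
// the engine data overlapping its span is more than 100x the SST's own size.
func usePrefixSeeks(sstKeys int, sstBytes, overlapBytes int64) bool {
	return sstKeys < smallSSTKeyCount || overlapBytes > overlapRatio*sstBytes
}

// collisions returns the SST keys that already exist in the engine.
// Both sstKeys and engineKeys must be sorted.
func collisions(sstKeys, engineKeys []string, prefixSeek bool) []string {
	if len(sstKeys) == 0 {
		return nil
	}
	var out []string
	if prefixSeek {
		// One point lookup per SST key: cost scales with the SST, not with
		// how much engine data its span happens to cover.
		for _, k := range sstKeys {
			i := sort.SearchStrings(engineKeys, k)
			if i < len(engineKeys) && engineKeys[i] == k {
				out = append(out, k)
			}
		}
		return out
	}
	// Merge-style scan: visit every engine key within [first, last] SST key.
	first, last := sstKeys[0], sstKeys[len(sstKeys)-1]
	j := 0
	for i := sort.SearchStrings(engineKeys, first); i < len(engineKeys) && engineKeys[i] <= last; i++ {
		for j < len(sstKeys) && sstKeys[j] < engineKeys[i] {
			j++
		}
		if j < len(sstKeys) && sstKeys[j] == engineKeys[i] {
			out = append(out, engineKeys[i])
		}
	}
	return out
}

func main() {
	engine := []string{"a", "b", "c", "d", "e"}
	sst := []string{"b", "d", "z"}
	mode := usePrefixSeeks(len(sst), 300, 1<<20)
	fmt.Println(mode, collisions(sst, engine, mode))
}
```

This prints `true [b d]`, and the scan mode finds the same collisions. The trade-off is that the scan mode touches every engine key inside the SST's span, so a very wide SST over a large engine does a lot of work for few keys, while the prefix-seek mode does one lookup per SST key; the filter blocks from item 2 are what would let those lookups skip L6 sstables cheaply.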
81062: storage,kvserver: Improve SST collision checking for wide SSTs r=sumeerbhola,nicktrav a=itsbilal
This change includes 4 changes to significantly improve the performance of CheckSSTCollisions (and by extension, AddSSTable) for cases where we add very wide sstables relative to the engine:
1) Check whether we're adding an sstable that has a small number of keys, or whose overlap with the engine is 100x greater than its own size. In that case, switch to doing prefix seeks inside CheckSSTCollisions (similar to what we did in #73514, which was later reverted).
2) Write bloom filters in L6 by default, to help speed up 1. Also thread a new IterOption to optionally read these filter blocks in prefix iteration, defaulting to not reading them. (This iterator option hooks into the one added to Pebble in cockroachdb/pebble#1685.)
3) Add a ParseHook so that command-line Pebble options can configure the writing of bloom filters.
4) Update intentInterleavingIter to not prefix-seek the intent/lock table iterator if we are in prefix seek mode and the MVCC iterator returned nothing.
Fixes #80980.
Release note (performance improvement): Significantly improve performance of IMPORTs when the source is producing data not sorted by the destination table's primary key, especially if the destination table has a very large primary key with lots of columns.
Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>