Skip to content

[release-22.1] storage,kvserver: Improve SST collision checking for wide SSTs #82746

Merged
itsbilal merged 1 commit intocockroachdb:release-22.1from
itsbilal:improve-checksstcollisions-22.1
Jun 24, 2022
Merged

[release-22.1] storage,kvserver: Improve SST collision checking for wide SSTs #82746
itsbilal merged 1 commit intocockroachdb:release-22.1from
itsbilal:improve-checksstcollisions-22.1

Conversation

@itsbilal
Copy link
Copy Markdown
Contributor

22.1 backport of the first part of #81062. Also bumps Pebble to pick up the relevant backported commit there.


This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

  1. Check if we're adding an sstable that has a small number of
    keys, or has a 100x greater overlap with the engine relative
    to its own size. In that case, switch to doing prefix seeks
    inside CheckSSTCollisions, similar to what we did in storage: Use prefix iteration for CheckSSTConflicts #73514
    and then reverted later).
  2. Thread a new IterOption to only optionally read L6
    filter blocks in prefix iteration, defaulting to not reading
    them. This iterator option hooks into the one added in
    Pebble in *: Add IterOption to optionally read L6 filter blocks. pebble#1685 ).
  3. Add a ParseHook to allow command-line pebble options to
    allow configuring the writing of bloom filters. This allows
    L6 filter block writing to be turned on through a command
    line argument.
  4. Update intentInterleavingIter to not prefix-seek the
    intent/lock table iterator if we are in prefix seek mode and
    the MVCC iterator returned nothing.

Fixes #80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.

@itsbilal itsbilal requested review from a team as code owners June 10, 2022 19:28
@itsbilal itsbilal self-assigned this Jun 10, 2022
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@itsbilal
Copy link
Copy Markdown
Contributor Author

This would benefit from a closer re-review, as the patch did not apply cleanly and had to be manually merged.

This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
@itsbilal itsbilal force-pushed the improve-checksstcollisions-22.1 branch from aa53099 to bd3c083 Compare June 22, 2022 16:08
Copy link
Copy Markdown
Collaborator

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r1, 14 of 16 files at r2, 7 of 7 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 189 at r3 (raw file):

		// case of no conflicts.
		usePrefixSeek := false
		bytes, err := cArgs.EvalCtx.GetApproximateDiskBytes(start.Key, end.Key)

Any reason for not returning the err here?

Copy link
Copy Markdown
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nicktrav)


pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 189 at r3 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Any reason for not returning the err here?

We do the same thing on master, but it's a "best effort" check that falls back to the old behaviour in case of error. Maybe we should return the error because this could mean something else is awry in pebble/storage/the disk; can change that here and on master if you feel strongly that way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants