Skip to content

storage,kvserver: Improve SST collision checking for wide SSTs#81062

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
itsbilal:improve-checksstcollisions
May 26, 2022
Merged

storage,kvserver: Improve SST collision checking for wide SSTs#81062
craig[bot] merged 2 commits intocockroachdb:masterfrom
itsbilal:improve-checksstcollisions

Conversation

@itsbilal
Copy link
Copy Markdown
Contributor

@itsbilal itsbilal commented May 5, 2022

This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

  1. Check if we're adding an sstable that has a small number of
    keys, or has a 100x greater overlap with the engine relative
    to its own size. In that case, switch to doing prefix seeks
    inside CheckSSTCollisions, similar to what we did in storage: Use prefix iteration for CheckSSTConflicts #73514
    and then reverted later).
  2. Write bloom filters in L6 by default, to help speed up 1.
    Also thread a new IterOption to only optionally read these
    filter blocks in prefix iteration, defaulting to not reading
    them. This iterator option hooks into the one added in
    Pebble in *: Add IterOption to optionally read L6 filter blocks. pebble#1685 ).
  3. Add a ParseHook to allow command-line pebble options to
    allow configuring the writing of bloom filters.
  4. Update intentInterleavingIter to not prefix-seek the
    intent/lock table iterator if we are in prefix seek mode and
    the MVCC iterator returned nothing.

Fixes #80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.

@itsbilal itsbilal requested review from a team as code owners May 5, 2022 18:39
@itsbilal itsbilal self-assigned this May 5, 2022
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@itsbilal itsbilal force-pushed the improve-checksstcollisions branch from 66132fd to 0e1fa35 Compare May 5, 2022 19:54
@itsbilal
Copy link
Copy Markdown
Contributor Author

itsbilal commented May 6, 2022

Actually, it'd be better for me to split this into multiple commits. At least the write-L6-bloom-filters-by-default piece can be moved into its own commit, as that'll be the only thing not backported to 22.1.

@itsbilal itsbilal force-pushed the improve-checksstcollisions branch from 0e1fa35 to 195485f Compare May 6, 2022 14:29
@itsbilal itsbilal force-pushed the improve-checksstcollisions branch 2 times, most recently from 91340f7 to a4093cc Compare May 16, 2022 14:58
Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 13 of 13 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 172 at r1 (raw file):

		bytes, err := cArgs.EvalCtx.GetApproximateDiskBytes(start.Key, end.Key)
		if err == nil {
			usePrefixSeek = bytes > 100*uint64(len(sst))

Can you add some code commentary about how these two constants were chosen. It will help someone in the future if they want to refine the heuristic.


pkg/server/config.go line 636 at r1 (raw file):

						case "none":
							return nil, nil
						case "rocksdb.BuiltinBloomFilter":

we already do this in DefaultPebbleOptions, so I am confused on why we are adding this.


pkg/storage/intent_interleaving_iter.go line 331 at r1 (raw file):

		intentSeekKey = nil
		i.intentKey = nil
	}

are we covering this in the datadriven test?


pkg/storage/pebble.go line 637 at r2 (raw file):

	// of the benefit of having bloom filters on every level for only 10% of the
	// memory cost.
	opts.Levels[6].FilterPolicy = nil

don't we need a way for someone to configure that they want L0-L5 bloomfilters, but not L6 bloomfilters?


pkg/storage/sst.go line 74 at r1 (raw file):

	}

	extIter := reader.NewMVCCIterator(MVCCKeyAndIntentsIterKind, IterOptions{

nit: can this extIter initialization be moved down after the sstIter initialization, so it is all in one place?


pkg/storage/sst.go line 102 at r1 (raw file):

		extOK, extErr = extIter.Valid()
	}
	for sstErr == nil && sstOK && (usePrefixSeek || (extOK && extErr == nil)) {

Can you try coding separate loops for the two cases and see if it looks simple and doesn't have too much code duplication. The loop here is confusing me -- large parts are gated by usePrefixSeek and the part that does the coordinated seeks isn't gated by !seekPrefixGE which is also confusing.


pkg/kv/kvserver/batcheval/cmd_add_sstable_test.go line 703 at r1 (raw file):

						// non-prefix Seek in the conflict check. Sending in nil stats
						// makes this easier as that forces the function to rely exclusively
						// on ApproxDiskBytes.

I didn't understand this comment. I thought approxDiskBytes==0 is when we want the SeekPrefixGE to be used. And why don't we always compute the MVCCStats for the sst?


pkg/storage/sst_test.go line 90 at r1 (raw file):

					startKey, endKey := MVCCKey{Key: roachpb.Key(start)}, MVCCKey{Key: roachpb.Key(end)}
					_, err := CheckSSTConflicts(ctx, sstFile.Bytes(), engine, startKey, endKey,
						false /*disallowShadowing*/, hlc.Timestamp{} /*disallowShadowingBelow*/, tc.maxIntents, false /* usePrefixSeek */)

are we not testing the true case?

Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 13 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/storage/engine.go line 295 at r1 (raw file):

	// is minimized if the probability of the key existing is not low or if
	// this is a one-time Seek (where loading the data block directly is better).
	UseL6Filters bool

Is this expected to be used outside of storage.CheckSSTConflicts? I think we should generally err towards fewer options and knobs exposed through the storage package interface. Can we unexport it?


pkg/storage/sst.go line 110 at r1 (raw file):

			// key in sstIter. Note that we can't just use an exhausted extIter
			// as a sign that we are done; extIter could be skipping keys, so it
			// must be re-seeked.

by "skipping keys" this comment is referring to the fact that exhaustion of a prefix iterator indicates the prefix is exhausted, not that there are no keys after the prefix? I think it could use some rephrasing.

Copy link
Copy Markdown
Collaborator

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 13 of 13 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


-- commits line 40 at r2:

mitigated

nit: is it worth mentioning the tradeoff of additional disk usage due to the writing of these L6 filter blocks?

Also - is this worth of a release note, given it is "visible" under the interpretation that an operator would observe a material disk usage increase after this change.


pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 172 at r1 (raw file):

Previously, sumeerbhola wrote…

Can you add some code commentary about how these two constants were chosen. It will help someone in the future if they want to refine the heuristic.

In addition to the commentary, mind lifting to a const (either in this method, or outside of it). Will focus the commentary a bit more.


pkg/storage/sst.go line 67 at r1 (raw file):

		nonPrefixIter := reader.NewMVCCIterator(MVCCKeyAndIntentsIterKind, IterOptions{UpperBound: end.Key})
		nonPrefixIter.SeekGE(start)
		valid, _ := nonPrefixIter.Valid()

nit: store err here and return a few lines below? IIUC valid will be false if err != nil, so you don't need to check the err.

@itsbilal itsbilal force-pushed the improve-checksstcollisions branch 3 times, most recently from d459f4f to 40a9a92 Compare May 20, 2022 20:31
Copy link
Copy Markdown
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

I actually hadn't run the microbenchmark earlier for figuring out a good threshold to set usePrefixSeek. Here it is. I just adapted BenchmarkCheckSSTCollisions to test along the overlap bytes axis, not along the earlier number-of-versions axis. Old = usePrefixSeek false, new = usePrefixSeek true:

CheckSSTConflicts/keys=1000/sstKeys=10/overlap=true/prefixSeek=false-24         76.6µs ± 1%   69.7µs ± 2%    -9.01%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=1000/sstKeys=200/overlap=true/prefixSeek=false-24         379µs ± 1%    181µs ± 1%   -52.07%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=1000/sstKeys=1000/overlap=true/prefixSeek=false-24        887µs ± 2%    654µs ± 1%   -26.23%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=1000/sstKeys=10000/overlap=true/prefixSeek=false-24      2.13ms ± 1%   5.90ms ± 1%  +177.30%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=10/overlap=true/prefixSeek=false-24        76.7µs ± 1%   67.9µs ± 1%   -11.44%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=200/overlap=true/prefixSeek=false-24        414µs ± 1%    181µs ± 1%   -56.20%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=1000/overlap=true/prefixSeek=false-24      1.76ms ± 1%   0.66ms ± 1%   -62.53%  (p=0.000 n=9+10)
CheckSSTConflicts/keys=10000/sstKeys=10000/overlap=true/prefixSeek=false-24     7.96ms ± 2%   5.99ms ± 1%   -24.70%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=100000/sstKeys=10/overlap=true/prefixSeek=false-24       73.8µs ± 1%   66.3µs ± 1%   -10.16%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=100000/sstKeys=200/overlap=true/prefixSeek=false-24       472µs ± 0%    176µs ± 0%   -62.63%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=1000/overlap=true/prefixSeek=false-24     2.09ms ± 1%   0.65ms ± 0%   -68.81%  (p=0.000 n=10+8)
CheckSSTConflicts/keys=100000/sstKeys=10000/overlap=true/prefixSeek=false-24    19.4ms ± 1%    6.1ms ± 1%   -68.84%  (p=0.000 n=10+9)
CheckSSTConflicts/keys=1000/sstKeys=100/overlap=true/prefixSeek=false-24         219µs ± 0%    123µs ± 1%   -44.12%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=1000/sstKeys=100000/overlap=true/prefixSeek=false-24     6.97ms ± 0%  58.27ms ± 0%  +735.83%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=100/overlap=true/prefixSeek=false-24        240µs ± 2%    123µs ± 1%   -48.57%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=100000/overlap=true/prefixSeek=false-24    21.5ms ± 1%   59.2ms ± 2%  +174.97%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=100/overlap=true/prefixSeek=false-24       257µs ± 2%    120µs ± 2%   -53.25%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=100000/overlap=true/prefixSeek=false-24   81.3ms ± 1%   60.5ms ± 0%   -25.59%  (p=0.008 n=5+5)

I've split it up between overlap ratios between sst keys (numerator) and engine keys (denominator):

1/10000:
CheckSSTConflicts/keys=100000/sstKeys=10/overlap=true/prefixSeek=false-24       73.8µs ± 1%   66.3µs ± 1%   -10.16%  (p=0.000 n=10+10)

1/1000:
CheckSSTConflicts/keys=10000/sstKeys=10/overlap=true/prefixSeek=false-24        76.7µs ± 1%   67.9µs ± 1%   -11.44%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=100000/sstKeys=100/overlap=true/prefixSeek=false-24       257µs ± 2%    120µs ± 2%   -53.25%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=200/overlap=true/prefixSeek=false-24       472µs ± 0%    176µs ± 0%   -62.63%  (p=0.008 n=5+5)

1/100:
CheckSSTConflicts/keys=1000/sstKeys=10/overlap=true/prefixSeek=false-24         76.6µs ± 1%   69.7µs ± 2%    -9.01%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=100/overlap=true/prefixSeek=false-24        240µs ± 2%    123µs ± 1%   -48.57%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=200/overlap=true/prefixSeek=false-24        414µs ± 1%    181µs ± 1%   -56.20%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=1000/overlap=true/prefixSeek=false-24     2.09ms ± 1%   0.65ms ± 0%   -68.81%  (p=0.000 n=10+8)

1/10:
CheckSSTConflicts/keys=1000/sstKeys=100/overlap=true/prefixSeek=false-24         219µs ± 0%    123µs ± 1%   -44.12%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=1000/sstKeys=200/overlap=true/prefixSeek=false-24         379µs ± 1%    181µs ± 1%   -52.07%  (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=1000/overlap=true/prefixSeek=false-24      1.76ms ± 1%   0.66ms ± 1%   -62.53%  (p=0.000 n=9+10)
CheckSSTConflicts/keys=100000/sstKeys=10000/overlap=true/prefixSeek=false-24    19.4ms ± 1%    6.1ms ± 1%   -68.84%  (p=0.000 n=10+9)

1/1:
CheckSSTConflicts/keys=1000/sstKeys=1000/overlap=true/prefixSeek=false-24        887µs ± 2%    654µs ± 1%   -26.23%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=10000/overlap=true/prefixSeek=false-24     7.96ms ± 2%   5.99ms ± 1%   -24.70%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=100000/sstKeys=100000/overlap=true/prefixSeek=false-24   81.3ms ± 1%   60.5ms ± 0%   -25.59%  (p=0.008 n=5+5)

10/1:
CheckSSTConflicts/keys=1000/sstKeys=10000/overlap=true/prefixSeek=false-24      2.13ms ± 1%   5.90ms ± 1%  +177.30%  (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=100000/overlap=true/prefixSeek=false-24    21.5ms ± 1%   59.2ms ± 2%  +174.97%  (p=0.008 n=5+5)

100/1:
CheckSSTConflicts/keys=1000/sstKeys=100000/overlap=true/prefixSeek=false-24     6.97ms ± 0%  58.27ms ± 0%  +735.83%  (p=0.008 n=5+5)

Excluding the tiny-sst cases (where very little time is spent iterating through keys), we see that cases where the sst:engine overlap is 1:10 or lower (i.e. more engine keys per sst key), then there's a clear >50% benefit to using prefix seeks. At 1:1 it's more of a wash, and when the ratio is more sst-heavy, it's definitely better to do the regular seek.

Maybe this calls for setting the ratio to 10 instead of the currently-conservative 100. I could rerun my end-to-end experiments with the const at 10, but I'm sure it won't change anything because all the overlap ratios (in bytes) that I saw there were in the 200-10,000 range.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)


-- commits line 40 at r2:

Previously, nicktrav (Nick Travers) wrote…

mitigated

nit: is it worth mentioning the tradeoff of additional disk usage due to the writing of these L6 filter blocks?

Also - is this worth of a release note, given it is "visible" under the interpretation that an operator would observe a material disk usage increase after this change.

I don't think that'd be necessary. We saw only a 1% increase in space amplification in L6 with filter blocks enabled. For the LSm as a whole, the % increase will be even lower. The memory usage of repeatedly reading that block is a bigger concern and that's why we mitigated it with that IterOption.


pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 172 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

In addition to the commentary, mind lifting to a const (either in this method, or outside of it). Will focus the commentary a bit more.

Done.


pkg/server/config.go line 636 at r1 (raw file):

Previously, sumeerbhola wrote…

we already do this in DefaultPebbleOptions, so I am confused on why we are adding this.

The goal is to be able to opt into L6 bloom filters with a command line argument. Currently you can't do that because our parser ignores filter policies. A customer that wants to do a big import on 22.1 could opt into it by passing this flag, without us changing the default behaviour.

The DefaultPebbleOptions change is in the second commit which won't be backported.


pkg/storage/engine.go line 295 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Is this expected to be used outside of storage.CheckSSTConflicts? I think we should generally err towards fewer options and knobs exposed through the storage package interface. Can we unexport it?

Unexported. Feels a little weird to have an unexported field here but it's better than the alternative.


pkg/storage/intent_interleaving_iter.go line 331 at r1 (raw file):

Previously, sumeerbhola wrote…

are we covering this in the datadriven test?

yep, pkg/storage/testdata/intent_interleaving_iter/basic:143 tests this already.


pkg/storage/pebble.go line 637 at r2 (raw file):

Previously, sumeerbhola wrote…

don't we need a way for someone to configure that they want L0-L5 bloomfilters, but not L6 bloomfilters?

They can do that by passing in a [Level 6]TableFilter=none pebble option. But we're changing the default write-time behaviour on master (after 22.1), as the disk space amplification is minimal (1%). The bigger worry is read-time memory usage, which we'll mitigate with the IterOption defaulting to false except in CheckSSTCollisions.


pkg/storage/sst.go line 67 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

nit: store err here and return a few lines below? IIUC valid will be false if err != nil, so you don't need to check the err.

Done.


pkg/storage/sst.go line 74 at r1 (raw file):

Previously, sumeerbhola wrote…

nit: can this extIter initialization be moved down after the sstIter initialization, so it is all in one place?

Done.


pkg/storage/sst.go line 102 at r1 (raw file):

Previously, sumeerbhola wrote…

Can you try coding separate loops for the two cases and see if it looks simple and doesn't have too much code duplication. The loop here is confusing me -- large parts are gated by usePrefixSeek and the part that does the coordinated seeks isn't gated by !seekPrefixGE which is also confusing.

There are only two places in the loop where we use if usePrefixSeek, so the separate loop thing duplicates a lot of code. I did add more comments if that helps. What part are you referring to where we do the coordinated seeks? The one where cmp < 0 down below? In that case we would be seeking the engine iterator forward either way. I could maybe rewrite that if statement, but you'd still need a conditional for the sstIter.NextKey() part

Any of the cases near the bottom where we're calling a Next on the sst iter and then seeking the engine iterator, we don't need to do anything as those cases will work as-is for both kinds of seeks. The trySeekUsingNext optimization also helps a lot there.


pkg/storage/sst.go line 110 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

by "skipping keys" this comment is referring to the fact that exhaustion of a prefix iterator indicates the prefix is exhausted, not that there are no keys after the prefix? I think it could use some rephrasing.

Yep I meant exhaustion mostly, but also I just generally meant that we can't rely on prefix iteration to give us all keys. Clarifying.


pkg/kv/kvserver/batcheval/cmd_add_sstable_test.go line 703 at r1 (raw file):

Previously, sumeerbhola wrote…

I didn't understand this comment. I thought approxDiskBytes==0 is when we want the SeekPrefixGE to be used. And why don't we always compute the MVCCStats for the sst?

approxDiskBytes=0 is no overlap with the engine, so we want a regular seek to just skip past everything.

Not sending in stats is a weird test-only knob to get around the "if less than 100 keys then always prefix seek" check in cmd_add_sstable, because that relies on the stats to be passed in. We still calculate stats but further down in that function. That's what the comment is talking about. I'll clarify it a little bit.


pkg/storage/sst_test.go line 90 at r1 (raw file):

Previously, sumeerbhola wrote…

are we not testing the true case?

We were, in the EvalAddSSTable test. But now I fixed it here too, good catch.

Copy link
Copy Markdown
Collaborator

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my feedback addressed. :lgtm:

Reviewed 12 of 12 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, and @sumeerbhola)


pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 89 at r3 (raw file):

// i.e. the engine:sst byte/key ratio is high and skewed in favour of the engine.
// In those cases, since we are less likely to skip SST keys with regular seeks
// on the engine, it is more efficient to utilize bloom filters the engine's SSTs

nit: "utilize bloom filters on the engine's SSTs"?

Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/storage/pebble_iterator.go line 129 at r4 (raw file):

	// through in pebble.
	//
	// p.options.useL6Filters = opts.useL6Filters

Pebble bump is merged now. 👍


pkg/storage/sst.go line 102 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

There are only two places in the loop where we use if usePrefixSeek, so the separate loop thing duplicates a lot of code. I did add more comments if that helps. What part are you referring to where we do the coordinated seeks? The one where cmp < 0 down below? In that case we would be seeking the engine iterator forward either way. I could maybe rewrite that if statement, but you'd still need a conditional for the sstIter.NextKey() part

Any of the cases near the bottom where we're calling a Next on the sst iter and then seeking the engine iterator, we don't need to do anything as those cases will work as-is for both kinds of seeks. The trySeekUsingNext optimization also helps a lot there.

Since the usePrefixSeek-gated sections are concentrated at the beginning, where we're positioning the iterators, maybe there's a split implementation that factors out the remainder of the loop into a compareForConflicts function?


pkg/kv/kvserver/batcheval/cmd_add_sstable_test.go line 703 at r1 (raw file):

approxDiskBytes=0 is no overlap with the engine, so we want a regular seek to just skip past everything.

Do we know how frequent this case is? Is it only relevant for AddSSTables appending to the very end of the keyspace? For the same reason snapshots tend to end up in L0 (#80589), I'd expect boundary overlap would produce small non-zero byte estimates for key ranges other than those at the very end of the keyspace (eg, during an ordered import into a single table with no indexes).

@itsbilal itsbilal force-pushed the improve-checksstcollisions branch from 40a9a92 to 823ba67 Compare May 24, 2022 16:23
Copy link
Copy Markdown
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens, @nicktrav, and @sumeerbhola)


pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 89 at r3 (raw file):

Previously, nicktrav (Nick Travers) wrote…

nit: "utilize bloom filters on the engine's SSTs"?

Done.


pkg/storage/pebble_iterator.go line 129 at r4 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Pebble bump is merged now. 👍

Done.


pkg/storage/sst.go line 102 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Since the usePrefixSeek-gated sections are concentrated at the beginning, where we're positioning the iterators, maybe there's a split implementation that factors out the remainder of the loop into a compareForConflicts function?

Done.


pkg/kv/kvserver/batcheval/cmd_add_sstable_test.go line 703 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

approxDiskBytes=0 is no overlap with the engine, so we want a regular seek to just skip past everything.

Do we know how frequent this case is? Is it only relevant for AddSSTables appending to the very end of the keyspace? For the same reason snapshots tend to end up in L0 (#80589), I'd expect boundary overlap would produce small non-zero byte estimates for key ranges other than those at the very end of the keyspace (eg, during an ordered import into a single table with no indexes).

I saw it a lot for sorted imports, where the overlap was straight up 0 in the engine. More commonly than when the overlap was a small nonzero value. Either way this is just a special case in the test, the actual code in practice doesn't special-case == 0.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 12 files at r3, 3 of 6 files at r5, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)


pkg/storage/sst.go line 86 at r6 (raw file):

	}
	defer sstIter.Close()
	sstIter.SeekGE(start)

can you move this sstIter.SeekGE to after compareForCollision so it is in the same place as sstIter.Valid()?


pkg/storage/sst.go line 224 at r6 (raw file):

			if err := ctx.Err(); err != nil {
				return enginepb.MVCCStats{}, err
			}

Can we move the following here and not do it before the for loop?

extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key})
extOK, extErr = extIter.Valid()

pkg/storage/sst.go line 234 at r6 (raw file):

				sstOK, sstErr = sstIter.Valid()
				if sstOK {
					extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key})

this seek could happen in the next iteration instead of being coded here.

Also, we seem to be ignoring the case where extErr != nil.


pkg/storage/sst.go line 244 at r6 (raw file):

			// Keep seeking the iterators until both keys are equal.
			if cmp := bytes.Compare(extKey.Key, sstKey.Key); cmp < 0 {

aren't the cases cmp < 0 and cmp > 0 an internal error since we are doing prefix iteration and SeekPrefixGE guarantees that the prefix must match?

@itsbilal itsbilal force-pushed the improve-checksstcollisions branch from fbcf2e0 to dc37cce Compare May 25, 2022 19:14
Copy link
Copy Markdown
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens, @nicktrav, and @sumeerbhola)


pkg/storage/sst.go line 86 at r6 (raw file):

Previously, sumeerbhola wrote…

can you move this sstIter.SeekGE to after compareForCollision so it is in the same place as sstIter.Valid()?

Done.


pkg/storage/sst.go line 224 at r6 (raw file):

Previously, sumeerbhola wrote…

Can we move the following here and not do it before the for loop?

extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key})
extOK, extErr = extIter.Valid()

Done.


pkg/storage/sst.go line 234 at r6 (raw file):

Previously, sumeerbhola wrote…

this seek could happen in the next iteration instead of being coded here.

Also, we seem to be ignoring the case where extErr != nil.

Done.


pkg/storage/sst.go line 244 at r6 (raw file):

Previously, sumeerbhola wrote…

aren't the cases cmp < 0 and cmp > 0 an internal error since we are doing prefix iteration and SeekPrefixGE guarantees that the prefix must match?

Not necessarily, since the prefix is defined as only the user key part of the MVCC key and we could still have cmp < 0 or cmp > 0 for cases where we find a different version of the other key.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)


pkg/storage/sst.go line 244 at r6 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Not necessarily, since the prefix is defined as only the user key part of the MVCC key and we could still have cmp < 0 or cmp > 0 for cases where we find a different version of the other key.

But bytes.Compare(extKey.Key, sstKey.Key) is comparing the roachpb.Key, which is the prefix.


pkg/storage/sst.go line 212 at r8 (raw file):

		if sstOK {
			extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key})
		}

I was suggesting removing this SeekGE since we now do it at the beginning of the for loop.


pkg/storage/sst.go line 213 at r8 (raw file):

			extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key})
		}
		// In the case of prefix seeks, only look at sst iterator exhaustion/errors.

... seeks, do not look at engine prefix iter exhaustion. This is because ...

(since we do want to look at engine prefix errors)

@itsbilal itsbilal force-pushed the improve-checksstcollisions branch from dc37cce to 555e92c Compare May 25, 2022 23:56
Copy link
Copy Markdown
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens, @nicktrav, and @sumeerbhola)


pkg/storage/sst.go line 244 at r6 (raw file):

Previously, sumeerbhola wrote…

But bytes.Compare(extKey.Key, sstKey.Key) is comparing the roachpb.Key, which is the prefix.

Done. Good point.


pkg/storage/sst.go line 212 at r8 (raw file):

Previously, sumeerbhola wrote…

I was suggesting removing this SeekGE since we now do it at the beginning of the for loop.

Oops, yeah my bad. Done.


pkg/storage/sst.go line 213 at r8 (raw file):

Previously, sumeerbhola wrote…

... seeks, do not look at engine prefix iter exhaustion. This is because ...

(since we do want to look at engine prefix errors)

Done.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 12 files at r3, 1 of 2 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)


pkg/storage/sst.go line 242 at r10 (raw file):

			// We just seeked the engine iter. If it has a mismatching prefix,
			// the iterator is not obeying its contract.
			if cmp := bytes.Compare(extKey.Key, sstKey.Key); cmp != 0 {

nit: could use bytes.Equal.

itsbilal added 2 commits May 25, 2022 20:28
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:

1) Check if we're adding an sstable that has a small number of
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in cockroachdb#73514
and then reverted later).
2) Thread a new IterOption to only optionally read L6
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in cockroachdb/pebble#1685 ).
3) Add a ParseHook to allow command-line pebble options to
allow configuring the writing of bloom filters. This allows
L6 filter block writing to be turned on through a command
line argument.
4) Update intentInterleavingIter to not prefix-seek the
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.

Fixes cockroachdb#80980.

Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.
Previously, we did not write bloom filters on L6 sstables.
The main worry here was the higher memory usage and the
overhead of loading a large filter block for the large L6
SSTs in memory. This is now mitigated thanks to the optional
reading of L6 filter blocks only in narrow use cases where
they're most helpful.

Release note: None.
@itsbilal itsbilal force-pushed the improve-checksstcollisions branch from 555e92c to 7531327 Compare May 26, 2022 00:28
@itsbilal
Copy link
Copy Markdown
Contributor Author

TFTR!

bors r=sumeerbhola,nicktrav

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 26, 2022

Build succeeded:

@craig craig bot merged commit 7a17498 into cockroachdb:master May 26, 2022
jbowens added a commit to jbowens/cockroach that referenced this pull request May 26, 2022
Skip TestStoreMetrics assertions about bloom filter lookups.  Pebble now
only consults L6 bloom filters if explicitly requested, and CockroachDB
only requests their use when checking for ingested sstable collisions
(see cockroachdb#81062).

Release note: none

Fix cockroachdb#81823.
craig bot pushed a commit that referenced this pull request May 26, 2022
81859: util/mon: augment "budget exceeded" errors from root monitor with a hint r=yuzefovich a=yuzefovich

This commit adjusts the root SQL memory monitor to return "budget
exceeded" errors with a hint about considering increasing the
`--max-sql-memory` startup argument. This should hopefully help the
users to be a bit more self-sufficient.

Here is an example error from SQL shell:
```
ERROR: scan with start key /Table/105/2/0: root: memory budget exceeded: 4198400 bytes requested, 4689920 currently allocated, 8388608 bytes in budget
SQLSTATE: 53200
HINT: Consider increasing --max-sql-memory startup parameter.
```

Release note: None

81923: kv/kvserver: fix flaky TestStoreMetrics r=nicktrav,nvanbenschoten a=jbowens

Skip TestStoreMetrics assertions about bloom filter lookups.  Pebble now
only consults L6 bloom filters if explicitly requested, and CockroachDB
only requests their use when checking for ingested sstable collisions
(see #81062).

Release note: none

Fix #81823.

81927: sql: audit json.AsText for nil return value r=yuzefovich a=yuzefovich

This commit audits all calls of `json.AsText` to make sure that we
always do nil-pointer check. I think all of the callsites where this
commit adds the check are impossible to hit the nil-pointer panic, but
it's still good to have the checks explicitly.

Informs: #81097.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage,kvserver: Improve CheckSSTCollisions for unsorted imports

5 participants