storage,kvserver: Improve SST collision checking for wide SSTs#81062
storage,kvserver: Improve SST collision checking for wide SSTs#81062craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
66132fd to
0e1fa35
Compare
|
Actually, it'd be better for me to split this into multiple commits. At least the write-L6-bloom-filters-by-default piece can be moved into its own commit, as that'll be the only thing not backported to 22.1. |
0e1fa35 to
195485f
Compare
91340f7 to
a4093cc
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 13 of 13 files at r1, 1 of 1 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 172 at r1 (raw file):
bytes, err := cArgs.EvalCtx.GetApproximateDiskBytes(start.Key, end.Key) if err == nil { usePrefixSeek = bytes > 100*uint64(len(sst))
Can you add some code commentary about how these two constants were chosen. It will help someone in the future if they want to refine the heuristic.
pkg/server/config.go line 636 at r1 (raw file):
case "none": return nil, nil case "rocksdb.BuiltinBloomFilter":
we already do this in DefaultPebbleOptions, so I am confused on why we are adding this.
pkg/storage/intent_interleaving_iter.go line 331 at r1 (raw file):
intentSeekKey = nil i.intentKey = nil }
are we covering this in the datadriven test?
pkg/storage/pebble.go line 637 at r2 (raw file):
// of the benefit of having bloom filters on every level for only 10% of the // memory cost. opts.Levels[6].FilterPolicy = nil
don't we need a way for someone to configure that they want L0-L5 bloomfilters, but not L6 bloomfilters?
pkg/storage/sst.go line 74 at r1 (raw file):
} extIter := reader.NewMVCCIterator(MVCCKeyAndIntentsIterKind, IterOptions{
nit: can this extIter initialization be moved down after the sstIter initialization, so it is all in one place?
pkg/storage/sst.go line 102 at r1 (raw file):
extOK, extErr = extIter.Valid() } for sstErr == nil && sstOK && (usePrefixSeek || (extOK && extErr == nil)) {
Can you try coding separate loops for the two cases and see if it looks simple and doesn't have too much code duplication. The loop here is confusing me -- large parts are gated by usePrefixSeek and the part that does the coordinated seeks isn't gated by !seekPrefixGE which is also confusing.
pkg/kv/kvserver/batcheval/cmd_add_sstable_test.go line 703 at r1 (raw file):
// non-prefix Seek in the conflict check. Sending in nil stats // makes this easier as that forces the function to rely exclusively // on ApproxDiskBytes.
I didn't understand this comment. I thought approxDiskBytes==0 is when we want the SeekPrefixGE to be used. And why don't we always compute the MVCCStats for the sst?
pkg/storage/sst_test.go line 90 at r1 (raw file):
startKey, endKey := MVCCKey{Key: roachpb.Key(start)}, MVCCKey{Key: roachpb.Key(end)} _, err := CheckSSTConflicts(ctx, sstFile.Bytes(), engine, startKey, endKey, false /*disallowShadowing*/, hlc.Timestamp{} /*disallowShadowingBelow*/, tc.maxIntents, false /* usePrefixSeek */)
are we not testing the true case?
jbowens
left a comment
There was a problem hiding this comment.
Reviewed 6 of 13 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @itsbilal)
pkg/storage/engine.go line 295 at r1 (raw file):
// is minimized if the probability of the key existing is not low or if // this is a one-time Seek (where loading the data block directly is better). UseL6Filters bool
Is this expected to be used outside of storage.CheckSSTConflicts? I think we should generally err towards fewer options and knobs exposed through the storage package interface. Can we unexport it?
pkg/storage/sst.go line 110 at r1 (raw file):
// key in sstIter. Note that we can't just use an exhausted extIter // as a sign that we are done; extIter could be skipping keys, so it // must be re-seeked.
by "skipping keys" this comment is referring to the fact that exhaustion of a prefix iterator indicates the prefix is exhausted, not that there are no keys after the prefix? I think it could use some rephrasing.
nicktrav
left a comment
There was a problem hiding this comment.
Reviewed 13 of 13 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)
mitigated
nit: is it worth mentioning the tradeoff of additional disk usage due to the writing of these L6 filter blocks?
Also - is this worth of a release note, given it is "visible" under the interpretation that an operator would observe a material disk usage increase after this change.
pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 172 at r1 (raw file):
Previously, sumeerbhola wrote…
Can you add some code commentary about how these two constants were chosen. It will help someone in the future if they want to refine the heuristic.
In addition to the commentary, mind lifting to a const (either in this method, or outside of it). Will focus the commentary a bit more.
pkg/storage/sst.go line 67 at r1 (raw file):
nonPrefixIter := reader.NewMVCCIterator(MVCCKeyAndIntentsIterKind, IterOptions{UpperBound: end.Key}) nonPrefixIter.SeekGE(start) valid, _ := nonPrefixIter.Valid()
nit: store err here and return a few lines below? IIUC valid will be false if err != nil, so you don't need to check the err.
d459f4f to
40a9a92
Compare
itsbilal
left a comment
There was a problem hiding this comment.
TFTRs!
I actually hadn't run the microbenchmark earlier for figuring out a good threshold to set usePrefixSeek. Here it is. I just adapted BenchmarkCheckSSTCollisions to test along the overlap bytes axis, not along the earlier number-of-versions axis. Old = usePrefixSeek false, new = usePrefixSeek true:
CheckSSTConflicts/keys=1000/sstKeys=10/overlap=true/prefixSeek=false-24 76.6µs ± 1% 69.7µs ± 2% -9.01% (p=0.000 n=10+10)
CheckSSTConflicts/keys=1000/sstKeys=200/overlap=true/prefixSeek=false-24 379µs ± 1% 181µs ± 1% -52.07% (p=0.008 n=5+5)
CheckSSTConflicts/keys=1000/sstKeys=1000/overlap=true/prefixSeek=false-24 887µs ± 2% 654µs ± 1% -26.23% (p=0.000 n=10+10)
CheckSSTConflicts/keys=1000/sstKeys=10000/overlap=true/prefixSeek=false-24 2.13ms ± 1% 5.90ms ± 1% +177.30% (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=10/overlap=true/prefixSeek=false-24 76.7µs ± 1% 67.9µs ± 1% -11.44% (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=200/overlap=true/prefixSeek=false-24 414µs ± 1% 181µs ± 1% -56.20% (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=1000/overlap=true/prefixSeek=false-24 1.76ms ± 1% 0.66ms ± 1% -62.53% (p=0.000 n=9+10)
CheckSSTConflicts/keys=10000/sstKeys=10000/overlap=true/prefixSeek=false-24 7.96ms ± 2% 5.99ms ± 1% -24.70% (p=0.000 n=10+10)
CheckSSTConflicts/keys=100000/sstKeys=10/overlap=true/prefixSeek=false-24 73.8µs ± 1% 66.3µs ± 1% -10.16% (p=0.000 n=10+10)
CheckSSTConflicts/keys=100000/sstKeys=200/overlap=true/prefixSeek=false-24 472µs ± 0% 176µs ± 0% -62.63% (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=1000/overlap=true/prefixSeek=false-24 2.09ms ± 1% 0.65ms ± 0% -68.81% (p=0.000 n=10+8)
CheckSSTConflicts/keys=100000/sstKeys=10000/overlap=true/prefixSeek=false-24 19.4ms ± 1% 6.1ms ± 1% -68.84% (p=0.000 n=10+9)
CheckSSTConflicts/keys=1000/sstKeys=100/overlap=true/prefixSeek=false-24 219µs ± 0% 123µs ± 1% -44.12% (p=0.008 n=5+5)
CheckSSTConflicts/keys=1000/sstKeys=100000/overlap=true/prefixSeek=false-24 6.97ms ± 0% 58.27ms ± 0% +735.83% (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=100/overlap=true/prefixSeek=false-24 240µs ± 2% 123µs ± 1% -48.57% (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=100000/overlap=true/prefixSeek=false-24 21.5ms ± 1% 59.2ms ± 2% +174.97% (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=100/overlap=true/prefixSeek=false-24 257µs ± 2% 120µs ± 2% -53.25% (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=100000/overlap=true/prefixSeek=false-24 81.3ms ± 1% 60.5ms ± 0% -25.59% (p=0.008 n=5+5)
I've split it up between overlap ratios between sst keys (numerator) and engine keys (denominator):
1/10000:
CheckSSTConflicts/keys=100000/sstKeys=10/overlap=true/prefixSeek=false-24 73.8µs ± 1% 66.3µs ± 1% -10.16% (p=0.000 n=10+10)
1/1000:
CheckSSTConflicts/keys=10000/sstKeys=10/overlap=true/prefixSeek=false-24 76.7µs ± 1% 67.9µs ± 1% -11.44% (p=0.000 n=10+10)
CheckSSTConflicts/keys=100000/sstKeys=100/overlap=true/prefixSeek=false-24 257µs ± 2% 120µs ± 2% -53.25% (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=200/overlap=true/prefixSeek=false-24 472µs ± 0% 176µs ± 0% -62.63% (p=0.008 n=5+5)
1/100:
CheckSSTConflicts/keys=1000/sstKeys=10/overlap=true/prefixSeek=false-24 76.6µs ± 1% 69.7µs ± 2% -9.01% (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=100/overlap=true/prefixSeek=false-24 240µs ± 2% 123µs ± 1% -48.57% (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=200/overlap=true/prefixSeek=false-24 414µs ± 1% 181µs ± 1% -56.20% (p=0.008 n=5+5)
CheckSSTConflicts/keys=100000/sstKeys=1000/overlap=true/prefixSeek=false-24 2.09ms ± 1% 0.65ms ± 0% -68.81% (p=0.000 n=10+8)
1/10:
CheckSSTConflicts/keys=1000/sstKeys=100/overlap=true/prefixSeek=false-24 219µs ± 0% 123µs ± 1% -44.12% (p=0.008 n=5+5)
CheckSSTConflicts/keys=1000/sstKeys=200/overlap=true/prefixSeek=false-24 379µs ± 1% 181µs ± 1% -52.07% (p=0.008 n=5+5)
CheckSSTConflicts/keys=10000/sstKeys=1000/overlap=true/prefixSeek=false-24 1.76ms ± 1% 0.66ms ± 1% -62.53% (p=0.000 n=9+10)
CheckSSTConflicts/keys=100000/sstKeys=10000/overlap=true/prefixSeek=false-24 19.4ms ± 1% 6.1ms ± 1% -68.84% (p=0.000 n=10+9)
1/1:
CheckSSTConflicts/keys=1000/sstKeys=1000/overlap=true/prefixSeek=false-24 887µs ± 2% 654µs ± 1% -26.23% (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=10000/overlap=true/prefixSeek=false-24 7.96ms ± 2% 5.99ms ± 1% -24.70% (p=0.000 n=10+10)
CheckSSTConflicts/keys=100000/sstKeys=100000/overlap=true/prefixSeek=false-24 81.3ms ± 1% 60.5ms ± 0% -25.59% (p=0.008 n=5+5)
10/1:
CheckSSTConflicts/keys=1000/sstKeys=10000/overlap=true/prefixSeek=false-24 2.13ms ± 1% 5.90ms ± 1% +177.30% (p=0.000 n=10+10)
CheckSSTConflicts/keys=10000/sstKeys=100000/overlap=true/prefixSeek=false-24 21.5ms ± 1% 59.2ms ± 2% +174.97% (p=0.008 n=5+5)
100/1:
CheckSSTConflicts/keys=1000/sstKeys=100000/overlap=true/prefixSeek=false-24 6.97ms ± 0% 58.27ms ± 0% +735.83% (p=0.008 n=5+5)
Excluding the tiny-sst cases (where very little time is spent iterating through keys), we see that cases where the sst:engine overlap is 1:10 or lower (i.e. more engine keys per sst key), then there's a clear >50% benefit to using prefix seeks. At 1:1 it's more of a wash, and when the ratio is more sst-heavy, it's definitely better to do the regular seek.
Maybe this calls for setting the ratio to 10 instead of the currently-conservative 100. I could rerun my end-to-end experiments with the const at 10, but I'm sure it won't change anything because all the overlap ratios (in bytes) that I saw there were in the 200-10,000 range.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)
Previously, nicktrav (Nick Travers) wrote…
mitigated
nit: is it worth mentioning the tradeoff of additional disk usage due to the writing of these L6 filter blocks?
Also - is this worth of a release note, given it is "visible" under the interpretation that an operator would observe a material disk usage increase after this change.
I don't think that'd be necessary. We saw only a 1% increase in space amplification in L6 with filter blocks enabled. For the LSm as a whole, the % increase will be even lower. The memory usage of repeatedly reading that block is a bigger concern and that's why we mitigated it with that IterOption.
pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 172 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
In addition to the commentary, mind lifting to a
const(either in this method, or outside of it). Will focus the commentary a bit more.
Done.
pkg/server/config.go line 636 at r1 (raw file):
Previously, sumeerbhola wrote…
we already do this in
DefaultPebbleOptions, so I am confused on why we are adding this.
The goal is to be able to opt into L6 bloom filters with a command line argument. Currently you can't do that because our parser ignores filter policies. A customer that wants to do a big import on 22.1 could opt into it by passing this flag, without us changing the default behaviour.
The DefaultPebbleOptions change is in the second commit which won't be backported.
pkg/storage/engine.go line 295 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Is this expected to be used outside of
storage.CheckSSTConflicts? I think we should generally err towards fewer options and knobs exposed through the storage package interface. Can we unexport it?
Unexported. Feels a little weird to have an unexported field here but it's better than the alternative.
pkg/storage/intent_interleaving_iter.go line 331 at r1 (raw file):
Previously, sumeerbhola wrote…
are we covering this in the datadriven test?
yep, pkg/storage/testdata/intent_interleaving_iter/basic:143 tests this already.
pkg/storage/pebble.go line 637 at r2 (raw file):
Previously, sumeerbhola wrote…
don't we need a way for someone to configure that they want L0-L5 bloomfilters, but not L6 bloomfilters?
They can do that by passing in a [Level 6]TableFilter=none pebble option. But we're changing the default write-time behaviour on master (after 22.1), as the disk space amplification is minimal (1%). The bigger worry is read-time memory usage, which we'll mitigate with the IterOption defaulting to false except in CheckSSTCollisions.
pkg/storage/sst.go line 67 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
nit: store err here and return a few lines below? IIUC
validwill be false iferr != nil, so you don't need to check the err.
Done.
pkg/storage/sst.go line 74 at r1 (raw file):
Previously, sumeerbhola wrote…
nit: can this extIter initialization be moved down after the sstIter initialization, so it is all in one place?
Done.
pkg/storage/sst.go line 102 at r1 (raw file):
Previously, sumeerbhola wrote…
Can you try coding separate loops for the two cases and see if it looks simple and doesn't have too much code duplication. The loop here is confusing me -- large parts are gated by
usePrefixSeekand the part that does the coordinated seeks isn't gated by!seekPrefixGEwhich is also confusing.
There are only two places in the loop where we use if usePrefixSeek, so the separate loop thing duplicates a lot of code. I did add more comments if that helps. What part are you referring to where we do the coordinated seeks? The one where cmp < 0 down below? In that case we would be seeking the engine iterator forward either way. I could maybe rewrite that if statement, but you'd still need a conditional for the sstIter.NextKey() part
Any of the cases near the bottom where we're calling a Next on the sst iter and then seeking the engine iterator, we don't need to do anything as those cases will work as-is for both kinds of seeks. The trySeekUsingNext optimization also helps a lot there.
pkg/storage/sst.go line 110 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
by "skipping keys" this comment is referring to the fact that exhaustion of a prefix iterator indicates the prefix is exhausted, not that there are no keys after the prefix? I think it could use some rephrasing.
Yep I meant exhaustion mostly, but also I just generally meant that we can't rely on prefix iteration to give us all keys. Clarifying.
pkg/kv/kvserver/batcheval/cmd_add_sstable_test.go line 703 at r1 (raw file):
Previously, sumeerbhola wrote…
I didn't understand this comment. I thought
approxDiskBytes==0is when we want the SeekPrefixGE to be used. And why don't we always compute the MVCCStats for the sst?
approxDiskBytes=0 is no overlap with the engine, so we want a regular seek to just skip past everything.
Not sending in stats is a weird test-only knob to get around the "if less than 100 keys then always prefix seek" check in cmd_add_sstable, because that relies on the stats to be passed in. We still calculate stats but further down in that function. That's what the comment is talking about. I'll clarify it a little bit.
pkg/storage/sst_test.go line 90 at r1 (raw file):
Previously, sumeerbhola wrote…
are we not testing the true case?
We were, in the EvalAddSSTable test. But now I fixed it here too, good catch.
nicktrav
left a comment
There was a problem hiding this comment.
Reviewed 12 of 12 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, and @sumeerbhola)
pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 89 at r3 (raw file):
// i.e. the engine:sst byte/key ratio is high and skewed in favour of the engine. // In those cases, since we are less likely to skip SST keys with regular seeks // on the engine, it is more efficient to utilize bloom filters the engine's SSTs
nit: "utilize bloom filters on the engine's SSTs"?
jbowens
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r4.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)
pkg/storage/pebble_iterator.go line 129 at r4 (raw file):
// through in pebble. // // p.options.useL6Filters = opts.useL6Filters
Pebble bump is merged now. 👍
pkg/storage/sst.go line 102 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
There are only two places in the loop where we use
if usePrefixSeek, so the separate loop thing duplicates a lot of code. I did add more comments if that helps. What part are you referring to where we do the coordinated seeks? The one wherecmp < 0down below? In that case we would be seeking the engine iterator forward either way. I could maybe rewrite that if statement, but you'd still need a conditional for thesstIter.NextKey()partAny of the cases near the bottom where we're calling a Next on the sst iter and then seeking the engine iterator, we don't need to do anything as those cases will work as-is for both kinds of seeks. The
trySeekUsingNextoptimization also helps a lot there.
Since the usePrefixSeek-gated sections are concentrated at the beginning, where we're positioning the iterators, maybe there's a split implementation that factors out the remainder of the loop into a compareForConflicts function?
pkg/kv/kvserver/batcheval/cmd_add_sstable_test.go line 703 at r1 (raw file):
approxDiskBytes=0 is no overlap with the engine, so we want a regular seek to just skip past everything.
Do we know how frequent this case is? Is it only relevant for AddSSTables appending to the very end of the keyspace? For the same reason snapshots tend to end up in L0 (#80589), I'd expect boundary overlap would produce small non-zero byte estimates for key ranges other than those at the very end of the keyspace (eg, during an ordered import into a single table with no indexes).
40a9a92 to
823ba67
Compare
itsbilal
left a comment
There was a problem hiding this comment.
TFTRs!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens, @nicktrav, and @sumeerbhola)
pkg/kv/kvserver/batcheval/cmd_add_sstable.go line 89 at r3 (raw file):
Previously, nicktrav (Nick Travers) wrote…
nit: "utilize bloom filters on the engine's SSTs"?
Done.
pkg/storage/pebble_iterator.go line 129 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Pebble bump is merged now. 👍
Done.
pkg/storage/sst.go line 102 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Since the
usePrefixSeek-gated sections are concentrated at the beginning, where we're positioning the iterators, maybe there's a split implementation that factors out the remainder of the loop into acompareForConflictsfunction?
Done.
pkg/kv/kvserver/batcheval/cmd_add_sstable_test.go line 703 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
approxDiskBytes=0 is no overlap with the engine, so we want a regular seek to just skip past everything.
Do we know how frequent this case is? Is it only relevant for AddSSTables appending to the very end of the keyspace? For the same reason snapshots tend to end up in L0 (#80589), I'd expect boundary overlap would produce small non-zero byte estimates for key ranges other than those at the very end of the keyspace (eg, during an ordered import into a single table with no indexes).
I saw it a lot for sorted imports, where the overlap was straight up 0 in the engine. More commonly than when the overlap was a small nonzero value. Either way this is just a special case in the test, the actual code in practice doesn't special-case == 0.
823ba67 to
fbcf2e0
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 4 of 12 files at r3, 3 of 6 files at r5, 1 of 1 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)
pkg/storage/sst.go line 86 at r6 (raw file):
} defer sstIter.Close() sstIter.SeekGE(start)
can you move this sstIter.SeekGE to after compareForCollision so it is in the same place as sstIter.Valid()?
pkg/storage/sst.go line 224 at r6 (raw file):
if err := ctx.Err(); err != nil { return enginepb.MVCCStats{}, err }
Can we move the following here and not do it before the for loop?
extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key})
extOK, extErr = extIter.Valid()
pkg/storage/sst.go line 234 at r6 (raw file):
sstOK, sstErr = sstIter.Valid() if sstOK { extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key})
this seek could happen in the next iteration instead of being coded here.
Also, we seem to be ignoring the case where extErr != nil.
pkg/storage/sst.go line 244 at r6 (raw file):
// Keep seeking the iterators until both keys are equal. if cmp := bytes.Compare(extKey.Key, sstKey.Key); cmp < 0 {
aren't the cases cmp < 0 and cmp > 0 an internal error since we are doing prefix iteration and SeekPrefixGE guarantees that the prefix must match?
fbcf2e0 to
dc37cce
Compare
itsbilal
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens, @nicktrav, and @sumeerbhola)
pkg/storage/sst.go line 86 at r6 (raw file):
Previously, sumeerbhola wrote…
can you move this
sstIter.SeekGEto aftercompareForCollisionso it is in the same place assstIter.Valid()?
Done.
pkg/storage/sst.go line 224 at r6 (raw file):
Previously, sumeerbhola wrote…
Can we move the following here and not do it before the for loop?
extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key}) extOK, extErr = extIter.Valid()
Done.
pkg/storage/sst.go line 234 at r6 (raw file):
Previously, sumeerbhola wrote…
this seek could happen in the next iteration instead of being coded here.
Also, we seem to be ignoring the case where
extErr != nil.
Done.
pkg/storage/sst.go line 244 at r6 (raw file):
Previously, sumeerbhola wrote…
aren't the cases
cmp < 0andcmp > 0an internal error since we are doing prefix iteration andSeekPrefixGEguarantees that the prefix must match?
Not necessarily, since the prefix is defined as only the user key part of the MVCC key and we could still have cmp < 0 or cmp > 0 for cases where we find a different version of the other key.
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 1 of 2 files at r7.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)
pkg/storage/sst.go line 244 at r6 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Not necessarily, since the prefix is defined as only the user key part of the MVCC key and we could still have cmp < 0 or cmp > 0 for cases where we find a different version of the other key.
But bytes.Compare(extKey.Key, sstKey.Key) is comparing the roachpb.Key, which is the prefix.
pkg/storage/sst.go line 212 at r8 (raw file):
if sstOK { extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key}) }
I was suggesting removing this SeekGE since we now do it at the beginning of the for loop.
pkg/storage/sst.go line 213 at r8 (raw file):
extIter.SeekGE(MVCCKey{Key: sstIter.UnsafeKey().Key}) } // In the case of prefix seeks, only look at sst iterator exhaustion/errors.
... seeks, do not look at engine prefix iter exhaustion. This is because ...
(since we do want to look at engine prefix errors)
dc37cce to
555e92c
Compare
itsbilal
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens, @nicktrav, and @sumeerbhola)
pkg/storage/sst.go line 244 at r6 (raw file):
Previously, sumeerbhola wrote…
But
bytes.Compare(extKey.Key, sstKey.Key)is comparing theroachpb.Key, which is the prefix.
Done. Good point.
pkg/storage/sst.go line 212 at r8 (raw file):
Previously, sumeerbhola wrote…
I was suggesting removing this SeekGE since we now do it at the beginning of the for loop.
Oops, yeah my bad. Done.
pkg/storage/sst.go line 213 at r8 (raw file):
Previously, sumeerbhola wrote…
... seeks, do not look at engine prefix iter exhaustion. This is because ...
(since we do want to look at engine prefix errors)
Done.
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 1 of 12 files at r3, 1 of 2 files at r9.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @jbowens, @nicktrav, and @sumeerbhola)
pkg/storage/sst.go line 242 at r10 (raw file):
// We just seeked the engine iter. If it has a mismatching prefix, // the iterator is not obeying its contract. if cmp := bytes.Compare(extKey.Key, sstKey.Key); cmp != 0 {
nit: could use bytes.Equal.
This change includes 4 changes to significantly improve the performance of CheckSSTCollisions (and by extension, AddSSTable) for cases where we add very wide sstables relative to the engine: 1) Check if we're adding an sstable that has a small number of keys, or has a 100x greater overlap with the engine relative to its own size. In that case, switch to doing prefix seeks inside CheckSSTCollisions, similar to what we did in cockroachdb#73514 and then reverted later). 2) Thread a new IterOption to only optionally read L6 filter blocks in prefix iteration, defaulting to not reading them. This iterator option hooks into the one added in Pebble in cockroachdb/pebble#1685 ). 3) Add a ParseHook to allow command-line pebble options to allow configuring the writing of bloom filters. This allows L6 filter block writing to be turned on through a command line argument. 4) Update intentInterleavingIter to not prefix-seek the intent/lock table iterator if we are in prefix seek mode and the MVCC iterator returned nothing. Fixes cockroachdb#80980. Release note (performance improvement): Significantly improve performance of IMPORTs when the source is producing data not sorted by the destination table's primary key, especially if the destination table has a very large primary key with lots of columns.
Previously, we did not write bloom filters on L6 sstables. The main worry here was the higher memory usage and the overhead of loading a large filter block for the large L6 SSTs in memory. This is now mitigated thanks to the optional reading of L6 filter blocks only in narrow use cases where they're most helpful. Release note: None.
555e92c to
7531327
Compare
|
TFTR! bors r=sumeerbhola,nicktrav |
|
Build succeeded: |
Skip TestStoreMetrics assertions about bloom filter lookups. Pebble now only consults L6 bloom filters if explicitly requested, and CockroachDB only requests their use when checking for ingested sstable collisions (see cockroachdb#81062). Release note: none Fix cockroachdb#81823.
81859: util/mon: augment "budget exceeded" errors from root monitor with a hint r=yuzefovich a=yuzefovich This commit adjusts the root SQL memory monitor to return "budget exceeded" errors with a hint about considering increasing the `--max-sql-memory` startup argument. This should hopefully help the users to be a bit more self-sufficient. Here is an example error from SQL shell: ``` ERROR: scan with start key /Table/105/2/0: root: memory budget exceeded: 4198400 bytes requested, 4689920 currently allocated, 8388608 bytes in budget SQLSTATE: 53200 HINT: Consider increasing --max-sql-memory startup parameter. ``` Release note: None 81923: kv/kvserver: fix flaky TestStoreMetrics r=nicktrav,nvanbenschoten a=jbowens Skip TestStoreMetrics assertions about bloom filter lookups. Pebble now only consults L6 bloom filters if explicitly requested, and CockroachDB only requests their use when checking for ingested sstable collisions (see #81062). Release note: none Fix #81823. 81927: sql: audit json.AsText for nil return value r=yuzefovich a=yuzefovich This commit audits all calls of `json.AsText` to make sure that we always do nil-pointer check. I think all of the callsites where this commit adds the check are impossible to hit the nil-pointer panic, but it's still good to have the checks explicitly. Informs: #81097. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
This change includes 4 changes to significantly improve
the performance of CheckSSTCollisions (and by extension,
AddSSTable) for cases where we add very wide sstables
relative to the engine:
keys, or has a 100x greater overlap with the engine relative
to its own size. In that case, switch to doing prefix seeks
inside CheckSSTCollisions, similar to what we did in storage: Use prefix iteration for CheckSSTConflicts #73514
and then reverted later).
Also thread a new IterOption to only optionally read these
filter blocks in prefix iteration, defaulting to not reading
them. This iterator option hooks into the one added in
Pebble in *: Add IterOption to optionally read L6 filter blocks. pebble#1685 ).
allow configuring the writing of bloom filters.
intent/lock table iterator if we are in prefix seek mode and
the MVCC iterator returned nothing.
Fixes #80980.
Release note (performance improvement): Significantly improve
performance of IMPORTs when the source is producing data
not sorted by the destination table's primary key, especially
if the destination table has a very large primary key with
lots of columns.