
rangefeed: use time-bound iterator for some catchup scans#66312

Merged
craig[bot] merged 3 commits into cockroachdb:master from stevendanna:tbi-for-catchup-scan
Sep 1, 2021

Conversation

@stevendanna
Collaborator

@stevendanna stevendanna commented Jun 10, 2021

We've been seeing periodic failures near the end of cdc/ledger
tests. During these failures, our changefeed's high-watermark falls
too far behind real time. We believe this is the result of a
range split that happens when the entry table grows past 512 MiB. The
range split results in a catch-up scan which takes considerable time.

We've seen similar behaviour in customer installations, where we've
observed rangefeed catch-up scans taking many minutes.

Using the IncrementalIterator, our cdc/ledger test survives the
range-split with no substantial increase in our changefeed latency.

The existing IncrementalIterator treated any intents or inline
values inside the range as errors. To fully support rangefeeds, we
need to see both. To facilitate this, we add two new
configuration options to the IncrementalIterator: IntentPolicy and
InlinePolicy.

The code for the catch-up scan has been extracted into its own method
to make it easier to benchmark. Below are some benchmark
results. In our best case, TBI improves catch-up scan performance
substantially; however, there are some worst cases where it
doesn't. We've opted to make this a cluster setting that is
disabled by default for now, allowing us to recommend enabling it for
customers where we think it will have a positive impact.

goos: darwin
goarch: amd64
pkg: github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkCatchupScan
BenchmarkCatchupScan/linear-keys
BenchmarkCatchupScan/linear-keys/useTBI=true
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=0.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-16                       2         683491479 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=50.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=50.00-16                      4         328828152 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=75.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=75.00-16                      7         160052262 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=95.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=95.00-16                     39          28729009 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=99.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=99.00-16                     57          19237201 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=0.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=0.00-16                      3         408734474 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=50.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=50.00-16                     4         329713989 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=75.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=75.00-16                     4         286746010 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=95.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=95.00-16                     4         255228312 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=99.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16                     4         259558576 ns/op
BenchmarkCatchupScan/random-keys
BenchmarkCatchupScan/random-keys/useTBI=true
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=0.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=0.00-16                       2         697607634 ns/op
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=50.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=50.00-16                      2         577198422 ns/op
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=75.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=75.00-16                      2         543513832 ns/op
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=95.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=95.00-16                      3         483985331 ns/op
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=99.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=99.00-16                      3         456677720 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=0.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=0.00-16                      3         417899254 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=50.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=50.00-16                     3         347611040 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=75.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=75.00-16                     4         310572985 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=95.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=95.00-16                     4         264969646 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=99.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=99.00-16                     4         300928994 ns/op
BenchmarkCatchupScan/mixed-case
BenchmarkCatchupScan/mixed-case/useTBI=true
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=0.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=0.00-16                        2         668092698 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=50.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=50.00-16                       2         556421222 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=75.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=75.00-16                       2         501923714 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=95.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=95.00-16                       6         187828123 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=99.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=99.00-16                       6         172837738 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=0.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=0.00-16                       3         414819631 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=50.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=50.00-16                      3         336780253 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=75.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=75.00-16                      3         344123744 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=95.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=95.00-16                      4         255228776 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=99.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=99.00-16                      4         491268277 ns/op
PASS
ok      github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed      75.496s

When using the incremental iterator, the previous value of a key might
fall outside the time window of the iterator. A future PR will extend
the incremental iterator to handle this case; for now, we've disabled
the optimization when withDiff is specified for the rangefeed.

Release note (enterprise change): Improve the performance for
CHANGEFEEDs during some range-split operations.

@cockroach-teamcity
Member

This change is Reviewable

@ajwerner
Contributor

This looks right to me. @nvanbenschoten, any reason for hesitation here that you can think of?

Contributor

@nvb nvb left a comment


This seems safe in theory, but I don't understand the current state of MVCCIncrementalIterator enough to vouch for the safety of the use of MinTimestampHint without a corresponding non-time bound iterator. I remember some concern from years ago about an intent's metadata key and its provisional value key being split between SSTs and then iteration considering the provisional value to be a committed value. I don't know whether this is still possible.

Perhaps @adityamaru or @pbardea could weigh-in, as they have touched MVCCIncrementalIterator fairly recently.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)

@stevendanna
Collaborator Author

I remember some concern from years ago about an intent's metadata key and its provisional value key being split between SSTs and then iteration considering the provisional value to be a committed value. I don't know whether this is still possible.

That sounds very bad indeed. It looks like there was an attempt to address just that here:

#31290

but I'll need to dig into it to see if that fix also covers the case of time-bound iterators.

@stevendanna
Collaborator Author

stevendanna commented Jun 11, 2021

I wrote a version of TestMVCCIncrementalIteratorIntentStraddlesSStables that matches our usage here, so I can confirm we do get the metadata key even when it straddles an SSTable. However, that test ends up creating a database with interleaved intents. It would be nice to create a similar test for separated intents. This comment above MinTimestampHint:

// These fields are only relevant for MVCCIterators. Additionally, an
// MVCCIterator with timestamp hints will not see separated intents, and may
// not see some interleaved intents. Currently, the only way to correctly
// use such an iterator is to use it in concert with an iterator without
// timestamp hints, as done by MVCCIncrementalIterator.

I took to mean that in the case of separated intents, we wouldn't see the intent and also wouldn't see the provisional value. But now I worry it means we won't see the intents but will see the provisional values.

#43799 also has lot of conversation I should probably peruse as well.

I can see if NewMVCCIncrementalIterator provides a similar performance improvement to using the TBI alone; however, I think we'll need a different version of it, since it doesn't provide the semantics we need with respect to seeing every revision of a key.

@stevendanna stevendanna added the do-not-merge bors won't merge a PR with this label. label Jun 11, 2021
@stevendanna
Collaborator Author

stevendanna commented Jun 11, 2021

From what I can tell, this is going to be wrong for separated intents and it makes me pretty nervous that we don't have a test that failed here.

For example, if I create a database with a few keys and one pending transaction, then open a TBI on that DB with the settings used here: with separated intents off, we still end up getting the intent and the uncommitted key ("c"):

    mvcc_incremental_iterator_test.go:1063: ------- db 1 ------- 
    mvcc_incremental_iterator_test.go:1069: separated intents: false
    mvcc_incremental_iterator_test.go:1031: saw key "b"/0.000000001,0 -> b value
    mvcc_incremental_iterator_test.go:1038: saw meta for c
    mvcc_incremental_iterator_test.go:1031: saw key "c"/0.000000002,0 -> c value

But, with separated intents on:

    mvcc_incremental_iterator_test.go:1063: ------- db 1 ------- 
    mvcc_incremental_iterator_test.go:1069: separated intents: true
    mvcc_incremental_iterator_test.go:1031: saw key "b"/0.000000001,0 -> b value
    mvcc_incremental_iterator_test.go:1031: saw key "c"/0.000000002,0 -> c value

we see key "c" but with no indication of the intent. In retrospect, this is exactly what that comment told me (and is somewhat clear now that I understand a bit more about these APIs), I just let optimism get the better of me.

So, as-is, this definitely isn't usable here unless I'm missing something. I can look into making something like MVCCIncrementalIterator but with the semantics needed for the catch-up scan in rangefeeds (still experimenting here to see if anything needs to change. I think we just need to be careful with the inclusive vs exclusive time bounds and potentially provide some different behaviour around intent handling).

@stevendanna stevendanna force-pushed the tbi-for-catchup-scan branch 2 times, most recently from 9d5f89b to d2e33f7 Compare June 15, 2021 11:07
@stevendanna stevendanna changed the title rangefeed: use time-bound iterator for catchup scans rangefeed: use time-bound iterator for some catchup scans Jun 15, 2021
@stevendanna
Collaborator Author

I've redone this based on MVCCIncrementalIterator. Unfortunately, we can only turn this optimization on in some cases, since the withDiff option to rangefeeds requires that we see values from arbitrarily far in the past.

@ajwerner
Contributor

I've redone this based on MVCCIncrementalIterator. Unfortunately, we can only turn this optimization on in some cases, since the withDiff option to rangefeeds requires that we see values from arbitrarily far in the past.

What's unfortunate is that the MVCCIncrementalIterator, under the hood, has ~exactly the technology we need to implement withDiff. The MVCCIncrementalIterator internally has `iter`, a "sanity iterator", which is a normal iterator. In order to get the "prev" value for withDiff, we'd just want to peek at the next value of the underlying storage iterator under `iter`. I'm also pretty certain that looking for that next value would not break the sanity iteration. All that being said, the interfaces don't look obvious for how to exploit this situation. In the shorter term, you could just run yet another iterator and seek it for the withDiff case.

@stevendanna
Collaborator Author

All that being said, the interfaces don't look obvious for how to exploit this situation. In the shorter term, you could just run yet another iterator and seek it for the withDiff case.

Yup. I've started in on this but wanted to at least get this first slice of work out.

@stevendanna stevendanna force-pushed the tbi-for-catchup-scan branch 2 times, most recently from 9fe409e to 5928791 Compare June 21, 2021 09:50
@stevendanna stevendanna removed the do-not-merge bors won't merge a PR with this label. label Jun 21, 2021
@stevendanna stevendanna force-pushed the tbi-for-catchup-scan branch from 5928791 to 9a8248c Compare June 21, 2021 09:55
@stevendanna
Collaborator Author

I think this should be ready for a second look. I added an additional iterator to handle the previous values and added some testing for the intent and inline policies.

@stevendanna stevendanna force-pushed the tbi-for-catchup-scan branch from 9a8248c to 60ae339 Compare August 9, 2021 11:47
@stevendanna stevendanna requested review from a team as code owners August 9, 2021 11:47
Contributor

@ajwerner ajwerner left a comment


Are we trying to get this in for 21.2?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @nvanbenschoten)

@stevendanna
Collaborator Author

stevendanna commented Aug 18, 2021

@ajwerner I would like that. I'm working on some other emergent bugs at the moment, but the TODO list is

  1. Push a few more benchmarks that better show the performance improvements (I have these done locally)
  2. Re-implement the withDiff support using the iterator that the MVCCIncrementalIterator already has, that way we don't need to worry about iterator consistency.
  3. Change this to be a default-off option

@stevendanna stevendanna force-pushed the tbi-for-catchup-scan branch from 60ae339 to f0d5f9d Compare August 18, 2021 10:30
@stevendanna stevendanna requested a review from a team as a code owner August 18, 2021 10:30
Contributor

@miretskiy miretskiy left a comment


Reviewed 5 of 7 files at r12.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy)

@stevendanna
Collaborator Author

stevendanna commented Aug 20, 2021

Would love it if we could get this PR in for the next release. Given that slow catch-up scans have caused a number of customer-facing issues, and given that this is default-off, I feel like we could even make an argument for it going in during stability.

Also, I've opened a PR for the withDiff support: #69191

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 3 of 7 files at r10, 4 of 7 files at r12, 1 of 1 files at r13, 3 of 3 files at r14.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @stevendanna)


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 119 at r14 (raw file):

		if ok, err := i.Valid(); err != nil {
			return err
		} else if !ok || !i.UnsafeKey().Less(endKey) {

Iterators are required to enforce the upper bound, so this additional comparison is unnecessary.


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 149 at r14 (raw file):

			// If write is inline, it doesn't have a timestamp so we don't
			// filter on the registration's starting timestamp. Instead, we
			// return all inline writes.

We may not see inline meta values that have been updated since the last time this rangefeed was active, as the catchup scan using TBI could miss them. Is that acceptable?
Also, do we really want to return all inline meta, even if they have not changed?


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 169 at r14 (raw file):

		// or before the registration's (exclusive) starting timestamp.
		ignore := !(ts.IsEmpty() || catchupTimestamp.Less(ts))
		if ignore && !withDiff {

Can we move this if-block before we do the copying due to sameKey being false? If most of the keys we are encountering only have old versions, we would save many unnecessary key copies.


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 178 at r14 (raw file):

		var val []byte
		a, val = a.Copy(unsafeVal, 0)
		if withDiff {

if there is no update to a key within the time interval of the catchup scan, there is no diff, yes? So we are unnecessarily copying the val just to pass it to addPrevToLastEvent which will ignore it because reorderBuf is empty?


pkg/storage/mvcc_incremental_iterator.go, line 120 at r14 (raw file):

	// MVCCIncrementalIterIntentPolicyEmit will return intents to
	// the caller if they are inside the time range. Intents
	// outside of the time range will be filtered without error.

In this emit case, which is what we are using in the catchup scan, it seems we will parse MVCCMetadata twice: in here, and in the caller. Is there a reason the MVCCIncrementalIterator implementation needs to promise to filter in this case? It could just return the intent to the caller and let it deal with it.
It is ok to leave this optimization as a TODO for the future.


pkg/storage/mvcc_incremental_iterator.go, line 527 at r14 (raw file):

		}

		// We have a valid KV.

or an intent to emit.


pkg/util/constants.go, line 119 at r14 (raw file):

func ConstantWithMetamorphicTestBool(name string, defaultValue bool) bool {
	if metamorphicBuild {
		if rng.Float64() < metamorphicValueProbability {

metamorphicValueProbability is 0.75, so it will usually return !defaultValue. I don't quite see why the comment says "occasionally".
And can't we just use ConstantWithMetamorphicTestValue(name, defaultValue, !defaultValue) instead of defining this function?

Collaborator Author

@stevendanna stevendanna left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @stevendanna, and @sumeerbhola)


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 149 at r14 (raw file):

We may not see inline meta values that have been updated since the last time this rangefeed was active, as the catchup scan using TBI could miss them. Is that acceptable?

This is a good point. In the non-TBI case, you can miss intermediate values (if the value was updated more than once), but the range feed would at least send you the most recent version.

From the perspective of changefeeds I think it is acceptable to miss inline values because we only handle user data which should all be MVCC and so we shouldn't be seeing inline values at all.

One short-term option would be to have a per-rangefeed setting that the caller uses to opt-into TBI. Then we could turn it on for changefeeds without turning it on for other users who might care about inline values (although it isn't clear to me that there are any, see below). I had this in an earlier iteration of this and could dig it back up. I think this would be my preference.

Also, do we really want to return all inline meta, even if they have not changed?

This was the existing behaviour which I've left. Outside of changefeeds I believe the other current users of rangefeeds are: Settings Watcher (following the settings table), Lease Manager (following the descriptor table), and the Stats Cache (following the statistics table). From my reading of the code, those spans should not have inline values in them.

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @stevendanna)


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 149 at r14 (raw file):

Previously, stevendanna (Steven Danna) wrote…

We may not see inline meta values that have been updated since the last time this rangefeed was active, as the catchup scan using TBI could miss them. Is that acceptable?

This is a good point. In the non-TBI case, you can miss intermediate values (if the value was updated more than once), but the range feed would at least send you the most recent version.

From the perspective of changefeeds I think it is acceptable to miss inline values because we only handle user data which should all be MVCC and so we shouldn't be seeing inline values at all.

One short-term option would be to have a per-rangefeed setting that the caller uses to opt-into TBI. Then we could turn it on for changefeeds without turning it on for other users who might care about inline values (although it isn't clear to me that there are any, see below). I had this in an earlier iteration of this and could dig it back up. I think this would be my preference.

Also, do we really want to return all inline meta, even if they have not changed?

This was the existing behaviour which I've left. Outside of changefeeds I believe the other current users of rangefeeds are: Settings Watcher (following the settings table), Lease Manager (following the descriptor table), and the Stats Cache (following the statistics table). From my reading of the code, those spans should not have inline values in them.

Thanks for the details. I don't have much understanding of the various rangefeed use cases, so as long as folks who do understand agree with this reasoning, fine with me.

I do think it is worth specifying the difference in semantics of what the rangefeed provides in these 2 cases in a code comment somewhere.

Contributor

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @stevendanna, and @sumeerbhola)


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 149 at r14 (raw file):

Previously, sumeerbhola wrote…

Thanks for the details. I don't have much understanding of the various rangefeed use cases, so as long as folks who do understand agree with this reasoning, fine with me.

I do think it is worth specifying the difference in semantics of what the rangefeed provides in these 2 cases in a code comment somewhere.

Today all spans watched are real, live SQL tables and never see inline values.

@stevendanna stevendanna force-pushed the tbi-for-catchup-scan branch from 0c2d082 to 63584c7 Compare August 25, 2021 10:52
Collaborator Author

@stevendanna stevendanna left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @sumeerbhola)

a discussion (no related file):
TFTR!

I've left the handling of inline values as-is for right now. I've opened an issue (#69357) to look into whether we can remove support for them.



pkg/kv/kvserver/rangefeed/catchup_scan.go, line 119 at r14 (raw file):

Previously, sumeerbhola wrote…

Iterators are required to enforce the upper bound, so this additional comparison is unnecessary,

Done. I'm not sure why this was added originally, but currently it looks like the main use for this is that we have a test iterator implementation that doesn't take an upper bound and there is an existing test to make sure that keys are still not emitted in this case. I've updated the testIterator to respect an upperBound if set and removed this comparison. It appears to be a nice win.


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 169 at r14 (raw file):

Previously, sumeerbhola wrote…

Can we move this if-block before we do the copying due to sameKey being false? If most of the keys we are encountering only have old versions, we would save many unnecessary key copies.

Done. See the second commit message for some benchmark results of this and the other avoided copy.


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 178 at r14 (raw file):

Previously, sumeerbhola wrote…

if there is no update to a key within the time interval of the catchup scan, there is no diff, yes? So we are unnecessarily copying the val just to pass it to addPrevToLastEvent which will ignore it because reorderBuf is empty?

Done. See the second commit message for some benchmark results of this and the other avoided copy.


pkg/storage/mvcc_incremental_iterator.go, line 120 at r14 (raw file):

Previously, sumeerbhola wrote…

In this emit case, which is what we are using in the catchup scan, it seems we will parse MVCCMetadata twice: in here, and in the caller. Is there a reason the MVCCIncrementalIterator implementation needs to promise to filter in this case? It could just return the intent to the caller and let it deal with it.
It is ok to leave this optimization as a TODO for the future.

I've left this as a TODO. One thing I would want to double-check and test here is that we always return the following key if the intent was returned.


pkg/storage/mvcc_incremental_iterator.go, line 527 at r14 (raw file):

Previously, sumeerbhola wrote…

or an intent to emit.

Done.


pkg/util/constants.go, line 119 at r14 (raw file):

metamorphicValueProbability is 0.75, so it will usually return !defaultValue. I don't quite see why the comment says "occasionally".

I've updated the comment and changed the probability for bools to 0.50, which seems more appropriate.

And can't we just use ConstantWithMetamorphicTestValue(name, defaultValue, !defaultValue) instead of defining this function?

I don't think we can without changing the argument and return types of that function which I think would be annoying for most other users? There is no implicit conversion/cast between bool and int. Or have I misunderstood something here?

@stevendanna
Collaborator Author

Anything else we would like to see here? I'm happy to keep plugging away at it.

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 1 of 2 files at r11, 1 of 7 files at r12, 3 of 3 files at r15, 5 of 5 files at r16.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy, @stevendanna, and @sumeerbhola)

a discussion (no related file):

Previously, stevendanna (Steven Danna) wrote…

TFTR!

I've left the handling of inline values as-is for right now. I've opened an issue (#69357) to look into whether we can remove support for them.

:lgtm:



pkg/kv/kvserver/rangefeed/catchup_scan.go, line 155 at r16 (raw file):

			// miss inline values completely and normal
			// iterators may result in the rangefeed not
			// seeing some intermeidate values.

intermediate


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 182 at r16 (raw file):

		lastKey.Timestamp = unsafeKey.Timestamp

Would help to have something like

// INVARIANT: !ignore || withDiff
// Cases
// - !ignore: we need to copy the unsafeVal to add to the reorderBuf, regardless of the value of withDiff
// - withDiff && ignore: we need to copy the unsafeVal only if there is already something in the reorderBuf for which we need to set the previous value.
//   The common case will be an empty reorderBuf.

pkg/kv/kvserver/rangefeed/catchup_scan.go, line 188 at r16 (raw file):

		}

		if withDiff {

Can you hoist this if-block into the block where we create the copy of unsafeVal. And same for the else-below which uses val. Then we can move the declaration val into the if-block itself and it will be obvious to the reader that the subsequent uses of val are working with an initialized val. Currently one needs to look at the various predicates to be sure of correctness.


pkg/kv/kvserver/rangefeed/catchup_scan_bench_test.go, line 43 at r16 (raw file):

		EndKey: endKey,
	}
	// fmt.Println(eng.GetMetrics().String())

stale debugging code


pkg/kv/kvserver/rangefeed/catchup_scan_bench_test.go, line 99 at r16 (raw file):

		// low and also return a read only engine to prevent
		// read-based compactions after the initial data
		// generation.

can you add something like the following to this comment:
// This case is trying to simulate a larger store, but with fewer bytes. If we did not reduce LBaseMaxBytes, almost all data would be in Lbase or L6, and TBI would be ineffective. By reducing LBaseMaxBytes, the data should spread out over more levels, like in a real store. The LSM state depicted below shows that this was only partially successful.


pkg/kv/kvserver/rangefeed/catchup_scan_bench_test.go, line 241 at r16 (raw file):

	if opts.randomKeyOrder {
		// Randomize the order in which the keys are written.

can you use rng.Shuffle?


pkg/util/constants.go, line 119 at r14 (raw file):

Or have I misunderstood something here?

never mind. I am not sure what I was thinking.

@stevendanna stevendanna force-pushed the tbi-for-catchup-scan branch from 63584c7 to ac2365f Compare August 29, 2021 20:49
Collaborator Author

@stevendanna stevendanna left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @miretskiy, @stevendanna, and @sumeerbhola)


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 155 at r16 (raw file):

Previously, sumeerbhola wrote…

intermediate

Done.


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 182 at r16 (raw file):

Previously, sumeerbhola wrote…

Would help to have something like

// INVARIANT: !ignore || withDiff
// Cases
// - !ignore: we need to copy the unsafeVal to add to the reorderBuf, regardless of the value of withDiff
// - withDiff && ignore: we need to copy the unsafeVal only if there is already something in the reorderBuf for which we need to set the previous value.
//   The common case will be an empty reorderBuf.

Done.


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 188 at r16 (raw file):

Previously, sumeerbhola wrote…

Can you hoist this if-block into the block where we create the copy of unsafeVal. And same for the else-below which uses val. Then we can move the declaration val into the if-block itself and it will be obvious to the reader that the subsequent uses of val are working with an initialized val. Currently one needs to look at the various predicates to be sure of correctness.

Done.


pkg/kv/kvserver/rangefeed/catchup_scan_bench_test.go, line 43 at r16 (raw file):

Previously, sumeerbhola wrote…

stale debugging code

Done.


pkg/kv/kvserver/rangefeed/catchup_scan_bench_test.go, line 99 at r16 (raw file):

Previously, sumeerbhola wrote…

can you add something like the following to this comment:
// This case is trying to simulate a larger store, but with fewer bytes. If we did not reduce LBaseMaxBytes, almost all data would be in Lbase or L6, and TBI would be ineffective. By reducing LBaseMaxBytes, the data should spread out over more levels, like in a real store. The LSM state depicted below shows that this was only partially successful.

Done.


pkg/kv/kvserver/rangefeed/catchup_scan_bench_test.go, line 241 at r16 (raw file):

Previously, sumeerbhola wrote…

can you use rng.Shuffle?

Looks like it. Done.

Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 4 of 4 files at r17.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy, @stevendanna, and @sumeerbhola)

Contributor

@nvb nvb left a comment


This seems like a great change to me. I didn't scrutinize the logic in (*CatchupIterator).CatchupScan, but I trust that it has had plenty of eyes on it.

Reviewed 1 of 7 files at r10, 3 of 7 files at r12, 1 of 1 files at r13, 3 of 3 files at r15, 4 of 5 files at r16, 4 of 4 files at r17.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy, @stevendanna, and @sumeerbhola)


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 73 at r15 (raw file):

}

type outputEventFn func(e *roachpb.RangeFeedEvent) error

Could we give this a comment that mentions ownership over the provided *RangeFeedEvent. CatchupScan assumes that it retains ownership of the *RangeFeedEvent after the call, but it's not clear what the contract is for interior memory.


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 47 at r17 (raw file):

	if useTBI && !args.WithDiff {
		ret.SimpleMVCCIterator = storage.NewMVCCIncrementalIterator(reader, storage.MVCCIncrementalIterOptions{
			EnableTimeBoundIteratorOptimization: true,

Out of curiosity, is the branching logic in this function equivalent to passing EnableTimeBoundIteratorOptimization: useTBI && !args.WithDiff?


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 49 at r17 (raw file):

			EnableTimeBoundIteratorOptimization: true,
			EndKey:                              args.Span.EndKey,
			// StartTime is exclusive

nit: Say a bit more here. StartTime is exclusive but args.Timestamp is inclusive.
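The exclusive/inclusive conversion via `Prev()` can be illustrated with a simplified stand-in for `hlc.Timestamp` (the real type lives in pkg/util/hlc; this sketch only mirrors its walltime/logical ordering):

```go
package main

import (
	"fmt"
	"math"
)

// Timestamp is a simplified stand-in for hlc.Timestamp.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// Prev returns the largest timestamp strictly less than t.
func (t Timestamp) Prev() Timestamp {
	if t.Logical > 0 {
		return Timestamp{t.WallTime, t.Logical - 1}
	}
	return Timestamp{t.WallTime - 1, math.MaxInt32}
}

// Less reports whether t orders strictly before u.
func (t Timestamp) Less(u Timestamp) bool {
	return t.WallTime < u.WallTime || (t.WallTime == u.WallTime && t.Logical < u.Logical)
}

func main() {
	argTS := Timestamp{WallTime: 100}
	// StartTime is an exclusive lower bound, so passing
	// args.Timestamp.Prev() makes args.Timestamp itself visible:
	// a version written exactly at argTS satisfies start < ts.
	start := argTS.Prev()
	fmt.Println(start.Less(argTS))
}
```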


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 52 at r17 (raw file):

			StartTime:    args.Timestamp.Prev(),
			EndTime:      hlc.MaxTimestamp,
			IntentPolicy: storage.MVCCIncrementalIterIntentPolicyEmit,

I think it would be valuable to justify the use of both of these policies and explain why CatchupScan wants to see both intents and inline values.


pkg/storage/mvcc_incremental_iterator.go, line 343 at r17 (raw file):

// The method sets i.err with the error for future processing.
func (i *MVCCIncrementalIterator) initMetaAndCheckForIntentOrInlineError() error {
	i.meta.Reset()

Any reason for this movement? protoutil.Unmarshal already call Reset.


pkg/storage/mvcc_incremental_iterator.go, line 397 at r17 (raw file):

			return nil
		case MVCCIncrementalIterIntentPolicyEmit:
			// We will emit this intent to the caller

nit: punctuation.

Collaborator Author

@stevendanna stevendanna left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy, @nvanbenschoten, @stevendanna, and @sumeerbhola)


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 73 at r15 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could we give this a comment that mentions ownership over the provided *RangeFeedEvent. CatchupScan assumes that it retains ownership of the *RangeFeedEvent after the call, but it's not clear what the contract is for interior memory.

I've added a few words, but as a TODO. I think we likely want to think more carefully about the buffer management we do in this function: #69357. Ideally the contract would be that outputEventFn assumes the caller owns the memory and that it may become invalid after outputEventFn returns. That way we would have a clear point at which we could reset our buffer so that it doesn't grow unbounded throughout the catchup scan.
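The proposed ownership contract can be sketched as follows. Names here (`RangeFeedEvent`, `runScan`) are simplified stand-ins for the real types in catchup_scan.go; the point is that the callee copies what it keeps, giving the caller a safe spot to reset its buffer each iteration:

```go
package main

import "fmt"

// RangeFeedEvent is a simplified stand-in for roachpb.RangeFeedEvent.
type RangeFeedEvent struct {
	Key, Value []byte
}

// outputEventFn is invoked once per event. Under the proposed
// contract, e and the memory it references are only valid for the
// duration of the call; the caller reuses its buffers afterwards.
type outputEventFn func(e *RangeFeedEvent) error

func runScan(fn outputEventFn) error {
	buf := make([]byte, 0, 8)
	ev := &RangeFeedEvent{}
	for _, kv := range [][2]string{{"a", "1"}, {"b", "2"}} {
		buf = append(buf[:0], kv[1]...) // reset and reuse the buffer per event
		ev.Key, ev.Value = []byte(kv[0]), buf
		if err := fn(ev); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var seen []string
	_ = runScan(func(e *RangeFeedEvent) error {
		// Copy: e.Value aliases the caller's buffer and is invalid
		// after this callback returns.
		seen = append(seen, string(e.Value))
		return nil
	})
	fmt.Println(seen)
}
```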


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 47 at r17 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Out of curiosity, is the branching logic in this function equivalent to passing EnableTimeBoundIteratorOptimization: useTBI && !args.WithDiff?

Yes, I believe so, but my original goal (before the subsequent improvements to the CatchupScan code) was to make it clear that the old code path was relatively unchanged when this option was not enabled.


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 49 at r17 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: Say a bit more here. StartTime is exclusive but args.Timestamp is inclusive.

Done.


pkg/kv/kvserver/rangefeed/catchup_scan.go, line 52 at r17 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think it would be valuable to justify the use of both of these policies and explain why CatchupScan wants to see both intents and inline values.

Done.


pkg/storage/mvcc_incremental_iterator.go, line 343 at r17 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Any reason for this movement? protoutil.Unmarshal already call Reset.

Ah, did not know Unmarshal called Reset. Moved back.


pkg/storage/mvcc_incremental_iterator.go, line 397 at r17 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: punctuation.

Done.

@stevendanna stevendanna force-pushed the tbi-for-catchup-scan branch 2 times, most recently from c93d769 to d892624 Compare August 31, 2021 10:31
We've been seeing periodic failures near the end of cdc/ledger
tests. During these failures, our changefeed's high-watermark falls
too far behind realtime.  We believe that this is the result of a
range-split that happens when the `entry` table reaches > 512 MiB. The
range split results in a catch-up scan, which takes considerable time.

We've seen similar behaviour in customer installations where we've
observed rangefeed catch-up scans taking many minutes.

Using the IncrementalIterator, our cdc/ledger test survives the
range-split with no substantial increase in our changefeed latency.

The existing IncrementalIterator assumed that any intent or inline
values inside the range were errors. To fully support range-feeds, we
want to see both of these. To facilitate this, we add two new
configuration options to the IncrementalIterator, IntentPolicy and
InlinePolicy.

The code for the catch-up scan has been extracted into its own method
to make it easier to benchmark. Below are some benchmark
results. Here, we see that in our best case, TBI improves catch-up
scan performance substantially; however, there are some worst cases
where it doesn't. We've opted to make this a cluster option that is
disabled by default for now -- allowing us to recommend enabling it
for customers where we think it will have a positive impact.

```
goos: darwin
goarch: amd64
pkg: github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkCatchupScan
BenchmarkCatchupScan/linear-keys
BenchmarkCatchupScan/linear-keys/useTBI=true
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=0.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-16                       2         683491479 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=50.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=50.00-16                      4         328828152 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=75.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=75.00-16                      7         160052262 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=95.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=95.00-16                     39          28729009 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=99.00
BenchmarkCatchupScan/linear-keys/useTBI=true/withDiff=false/perc=99.00-16                     57          19237201 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=0.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=0.00-16                      3         408734474 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=50.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=50.00-16                     4         329713989 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=75.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=75.00-16                     4         286746010 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=95.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=95.00-16                     4         255228312 ns/op
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=99.00
BenchmarkCatchupScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16                     4         259558576 ns/op
BenchmarkCatchupScan/random-keys
BenchmarkCatchupScan/random-keys/useTBI=true
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=0.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=0.00-16                       2         697607634 ns/op
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=50.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=50.00-16                      2         577198422 ns/op
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=75.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=75.00-16                      2         543513832 ns/op
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=95.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=95.00-16                      3         483985331 ns/op
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=99.00
BenchmarkCatchupScan/random-keys/useTBI=true/withDiff=false/perc=99.00-16                      3         456677720 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=0.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=0.00-16                      3         417899254 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=50.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=50.00-16                     3         347611040 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=75.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=75.00-16                     4         310572985 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=95.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=95.00-16                     4         264969646 ns/op
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=99.00
BenchmarkCatchupScan/random-keys/useTBI=false/withDiff=false/perc=99.00-16                     4         300928994 ns/op
BenchmarkCatchupScan/mixed-case
BenchmarkCatchupScan/mixed-case/useTBI=true
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=0.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=0.00-16                        2         668092698 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=50.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=50.00-16                       2         556421222 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=75.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=75.00-16                       2         501923714 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=95.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=95.00-16                       6         187828123 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=99.00
BenchmarkCatchupScan/mixed-case/useTBI=true/withDiff=false/perc=99.00-16                       6         172837738 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=0.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=0.00-16                       3         414819631 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=50.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=50.00-16                      3         336780253 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=75.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=75.00-16                      3         344123744 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=95.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=95.00-16                      4         255228776 ns/op
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=99.00
BenchmarkCatchupScan/mixed-case/useTBI=false/withDiff=false/perc=99.00-16                      4         491268277 ns/op
PASS
ok      github.com/cockroachdb/cockroach/pkg/kv/kvserver/rangefeed      75.496s
```

When using the incremental iterator, the previous value of a key might
be outside the time window of the iterator. A future PR will extend
the incremental iterator to handle this case; but here, we've disabled
the optimization if withDiff is specified for the rangefeed.

Release note (enterprise change): Improve the performance for
CHANGEFEEDs during some range-split operations.

Release justification: Performance improvement for catchup scans helps
address the cause of recent high-priority customer incidents.
This avoids two copies in the catchup-scan code.  In the best case,
this provides a nice win even without TBI enabled:

```
name                                                               old time/op    new time/op    delta
CatchupScan/linear-keys/useTBI=false/withDiff=true/perc=99.00-16      316ms ± 1%     261ms ± 2%  -17.17%  (p=0.000 n=23+24)
CatchupScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16     257ms ± 2%     226ms ± 3%  -12.05%  (p=0.000 n=24+23)

name                                                               old alloc/op   new alloc/op   delta
CatchupScan/linear-keys/useTBI=false/withDiff=true/perc=99.00-16     79.0MB ± 0%    10.4MB ± 0%  -86.89%  (p=0.000 n=24+25)
CatchupScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16    10.4MB ± 0%     2.5MB ± 0%  -76.03%  (p=0.000 n=20+25)

name                                                               old allocs/op  new allocs/op  delta
CatchupScan/linear-keys/useTBI=false/withDiff=true/perc=99.00-16      35.8k ± 0%     31.4k ± 0%  -12.20%  (p=0.000 n=22+25)
CatchupScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16     31.4k ± 0%     30.8k ± 0%   -2.04%  (p=0.000 n=25+25)
```

Release justification: Performance improvement for catchup scans helps
address the cause of recent high-priority customer incidents.

Release note: None
Iterators already respect the upper bound we set at construction time
making this comparison redundant. Removing this comparison provides a
nice performance win:

```
name                                                               old time/op    new time/op    delta
CatchupScan/linear-keys/useTBI=false/withDiff=true/perc=99.00-16      261ms ± 2%     249ms ± 4%  -4.90%  (p=0.000 n=24+25)
CatchupScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16     226ms ± 3%     214ms ± 4%  -5.55%  (p=0.000 n=23+25)
```

Release note: None

Release justification: Performance improvement for catchup scans helps
address the cause of recent high-priority customer incidents.
@stevendanna stevendanna force-pushed the tbi-for-catchup-scan branch from d892624 to 0417041 Compare August 31, 2021 14:32
Contributor

@nvb nvb left a comment


:lgtm:

Reviewed 8 of 8 files at r18.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @miretskiy, @stevendanna, and @sumeerbhola)

@stevendanna
Collaborator Author

bors r=[nvanbenschoten,sumeerbhola]

@craig
Contributor

craig bot commented Sep 1, 2021

Build succeeded:
