Skip to content

kvserver: default kv.rangefeed.catchup_scan_iterator_optimization.enabled to true#73473

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:ssd/flip-tbi-setting
Dec 14, 2021
Merged

kvserver: default kv.rangefeed.catchup_scan_iterator_optimization.enabled to true#73473
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:ssd/flip-tbi-setting

Conversation

@stevendanna
Copy link
Copy Markdown
Collaborator

We've been testing this at random for some time and, as far as I know,
have not seen any CDC failures caused by this optimization.

This switches the default value to true. Further, I've lowered the
probability that this boolean is flipped in the metamorphic builds
such that the default case gets test more often.

Release note (ops change): The default value of the
kv.rangefeed.catchup_scan_iterator_optimization.enabled cluster
setting is now true.

@stevendanna stevendanna requested a review from a team as a code owner December 4, 2021 11:30
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switches the default value to true. Further, I've lowered the
probability that this boolean is flipped in the metamorphic builds
such that the default case gets test more often.

Since the default case is also implicitly tested in lots of other integration/end-to-end tests, does it really make sense to reduce the probability of non-standard settings in the few tests that actually exercise them?

…bled to true

We've been testing this at random for some time and, as far as I know,
have not seen any CDC failures caused by this optimization.

This switches the default value to true. Further, I've lowered the
probability that this boolean is flipped in the metamorphic builds
such that the default case gets test more often.

Release note (ops change): The default value of the
kv.rangefeed.catchup_scan_iterator_optimization.enabled cluster
setting is now true.
@stevendanna
Copy link
Copy Markdown
Collaborator Author

bors r=erikgrinaker

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 14, 2021

Build succeeded:

@shermanCRL
Copy link
Copy Markdown
Contributor

I like the idea of defaulting this value to true in a v21.2 patch. Reason being upgrades from v21.1 → v21.2 in the presence of changefeeds. The IOPS spike that comes from rolling upgrade + node cycling + catchup scans can be very disruptive across the cluster, and discourage upgrades.

If the value is defaulted to true, I expect these disruptions to be reduced.

Mechanically, it would just be a backport of this PR, I suppose. Thoughts? @stevendanna @erikgrinaker

@shermanCRL
Copy link
Copy Markdown
Contributor

And while we’re at it, ditto for kv.rangefeed.separated_intent_scan.enabled.

@erikgrinaker
Copy link
Copy Markdown
Contributor

The backport policy doesn't explicitly mention changes to default settings, so it's a bit of a grey area.

I think we've gained enough confidence in separated intents that it's safe to enable kv.rangefeed.separated_intent_scan.enabled, at least. To put it another way, if we didn't have a cluster setting for it, I wouldn't hesitate to backport a change to only scan separated intents.

I'm less confident in kv.rangefeed.catchup_scan_iterator_optimization.enabled, but only because I know time-bound iterators have been problematic in the past. I haven't experienced any issues with them personally. Perhaps @stevendanna has thoughts on this.

@erikgrinaker
Copy link
Copy Markdown
Contributor

PS: we'd have to make sure these new default settings respect the relevant version gates, if they don't already, to avoid breakage in mixed-version 21.1/21.2 clusters.

@stevendanna
Copy link
Copy Markdown
Collaborator Author

I think we should definitely change the default of kv.rangefeed.separated_intent_scan.enabled. I see very little downside.

I'm less confident in kv.rangefeed.catchup_scan_iterator_optimization.enabled, but only because I know time-bound iterators have been problematic in the past. I haven't experienced any issues with them personally. Perhaps @stevendanna has thoughts on this.

We've had it default true on master for some time and rangefeeds have seen increased usage on master over this release cycle. Looking at the git history, I haven't seen any correctness changes to mvcc_incremental_iterator in the meantime either.

Both will be shipping by default in 22.1 and at this point we've have these settings turned on in our largest installations. So, while catchup_scan_iterator_optimization is riskier than separated_intent_scan, I would have very little hesitation in turning both on by default in 21.2 if we think it'll cut down on problems for our users.

PS: we'd have to make sure these new default settings respect the relevant version gates, if they don't already, to avoid breakage in mixed-version 21.1/21.2 clusters.

We respect a cluster setting for the intent scanner. I don't think there are any cluster settings we need to worry about for catchup-scans, but if I'm wrong there let me know.

onlySeparatedIntents := r.store.cfg.Settings.Version.IsActive(ctx, clusterversion.PostSeparatedIntentsMigration)
useSeparatedIntents := RangefeedSeparatedIntentScanEnabled.Get(&r.store.cfg.Settings.SV)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants