changefeedccl: change default flush interval to 5s by dt · Pull Request #49770 · cockroachdb/cockroach

dt · 2020-06-01T20:11:35Z

We observed a customer cluster's changefeeds to cloud storage 'getting stuck'.
On further investigation, this turned out to be because they were spending too
much time flushing: they were writing to a cloud sink, and the default flush
interval of 200ms (poller interval of 1s / 5) meant they spent all of their time
flushing. This default was picked while testing with lower-latency sinks and was
noted in a comment as somewhat arbitrary.

This change does two things. First, if no interval is specified, it increases
the default to the poller interval instead of poller interval / 5, meaning 1s
instead of 200ms at the default setting. Second, if the sink being used is cloud
storage, it uses the greater of that or 5s. Users who truly desire lower latency
can of course specify their own 'resolved' interval, so this change in the
default is for those who are indifferent, and increasing the latency to 1s or 5s
reduces the chance of hitting this unfortunate edge case when the sink is too slow.

Release note (enterprise change): The default flush interval for changefeeds that do not specify a 'resolved' option is now 1s instead of 200ms, or 5s if the changefeed sink is cloud-storage.
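
The arithmetic in this description can be illustrated with a minimal Go sketch. It is not the code from the PR: the function and constant names below are hypothetical, and only the values (1s poller interval, division by 5, 5s floor for cloud storage) come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the PR description; the names are hypothetical.
const (
	pollInterval             = 1 * time.Second // default poller interval
	cloudStorageDefaultFlush = 5 * time.Second // floor for cloud storage sinks
)

// oldDefault mirrors the previous default: poller interval / 5.
func oldDefault(poll time.Duration) time.Duration {
	return poll / 5
}

// newDefault mirrors the default described above: the poller interval,
// raised to 5s when the sink is cloud storage.
func newDefault(poll time.Duration, cloudStorage bool) time.Duration {
	d := poll
	if cloudStorage && d < cloudStorageDefaultFlush {
		d = cloudStorageDefaultFlush
	}
	return d
}

func main() {
	fmt.Println(oldDefault(pollInterval))        // 200ms
	fmt.Println(newDefault(pollInterval, false)) // 1s
	fmt.Println(newDefault(pollInterval, true))  // 5s
}
```

A poller interval that is already larger than 5s is left unchanged by the cloud storage floor in this sketch.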


ajwerner

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @miretskiy)

pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

		} else {
			timeBetweenFlushes = changefeedbase.TableDescriptorPollInterval.Get(&settings.SV)
			if _, ok := sink.(*cloudStorageSink); ok && timeBetweenFlushes < cloudStorageDefaultFlush {

Does this type assertion work? I thought we wrap the sink a couple of times before it gets here. See:

https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/changefeedccl/changefeed_processors.go#L187-L188
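
The concern can be reproduced in isolation. The sketch below uses stand-in types (the `Sink` interface, `wrappingSink`, and `isCloudStorage` are hypothetical and much smaller than the real changefeed types) to show that a Go type assertion inspects only the outermost dynamic type, so it does not see through a wrapper.

```go
package main

import "fmt"

// Sink is a stand-in for the changefeed sink interface.
// All types in this sketch are hypothetical.
type Sink interface {
	Flush() error
}

type cloudStorageSink struct{}

func (*cloudStorageSink) Flush() error { return nil }

// wrappingSink decorates another Sink, as a metrics or
// error-handling wrapper might.
type wrappingSink struct {
	wrapped Sink
}

func (w *wrappingSink) Flush() error { return w.wrapped.Flush() }

// isCloudStorage reports whether the dynamic type of s is
// *cloudStorageSink. It does not look through wrappers.
func isCloudStorage(s Sink) bool {
	_, ok := s.(*cloudStorageSink)
	return ok
}

func main() {
	var bare Sink = &cloudStorageSink{}
	var wrapped Sink = &wrappingSink{wrapped: bare}

	fmt.Println(isCloudStorage(bare))    // true
	fmt.Println(isCloudStorage(wrapped)) // false
}
```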

We observed a customer cluster's changefeeds to cloud storage 'getting stuck'. On further investigation, this turned out to be because they were spending too much time flushing: they were writing to a cloud sink, and the default flush interval of 200ms (poller interval of 1s / 5) meant they spent all of their time flushing. This default was picked while testing with lower-latency sinks and was noted in a comment as somewhat arbitrary.

This change increases the default to 5s. Users who truly desire lower latency can of course specify their own 'resolved' interval, so this change in the default is for those who are indifferent, and increasing the latency to 5s reduces the chance of hitting this unfortunate edge case when the sink is too slow.

Release note (enterprise change): The default flush interval for changefeeds that do not specify a 'resolved' option is now 5s instead of 200ms to more gracefully handle higher-latency sinks.

miretskiy

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)

pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

Previously, ajwerner wrote…

Does this type assertion work? I thought we wrap the sink a couple of times before it gets here. See:

https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/changefeedccl/changefeed_processors.go#L187-L188

What do we think about extending the Sink API somewhat?
I'm thinking something like:

type SinkTraits struct {
	defaultFlushPeriod ...
	// whatever else we want to add
}

And then adding to the Sink interface:
GetTraits() SinkTraits?

No need for assertions.
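
One way to read this proposal is sketched below. It is an illustration only, not code from the PR (which ended up dropping the per-sink logic). The `time.Duration` field type, the `cloudStorageFlushPeriod` constant, and the `wrappingSink` type are assumptions. The point is that a wrapper can forward `GetTraits()` to the sink it wraps, so the caller never needs a type assertion.

```go
package main

import (
	"fmt"
	"time"
)

// cloudStorageFlushPeriod is a hypothetical constant for this sketch.
const cloudStorageFlushPeriod = 5 * time.Second

// SinkTraits follows the shape proposed in the comment above.
// The field type is an assumption.
type SinkTraits struct {
	defaultFlushPeriod time.Duration
}

// Sink is a stand-in interface carrying only the proposed method.
type Sink interface {
	GetTraits() SinkTraits
}

type cloudStorageSink struct{}

func (*cloudStorageSink) GetTraits() SinkTraits {
	return SinkTraits{defaultFlushPeriod: cloudStorageFlushPeriod}
}

// wrappingSink forwards GetTraits to the sink it decorates, so the
// traits survive any number of wrapping layers.
type wrappingSink struct {
	wrapped Sink
}

func (w *wrappingSink) GetTraits() SinkTraits { return w.wrapped.GetTraits() }

func main() {
	var s Sink = &wrappingSink{wrapped: &cloudStorageSink{}}
	fmt.Println(s.GetTraits().defaultFlushPeriod) // 5s
}
```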

dt

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)

pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

What do we think about extending the Sink API somewhat?
I'm thinking something like:

type SinkTraits struct {
	defaultFlushPeriod ...
	// whatever else we want to add
}

And then adding to the Sink interface:
GetTraits() SinkTraits?

No need for assertions.

I removed the assertion and just made the default 5s rather than trying to be fancy. Easier to doc that way too.

miretskiy

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @miretskiy)

pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I removed the assertion and just made the default 5s rather than trying to be fancy. Easier to doc that way too.

5s is probably long enough... However, 5s is also rather arbitrary; I wonder if we are just postponing this issue until the next time we hit it (when e.g. cross DC connectivity is down for long periods of time).

Can we produce better error messages perhaps? "Flush took X > flush period" Or some such?
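
A check along these lines could look like the sketch below. The function name, the constant, and the exact message wording are hypothetical; the thread only suggests the general form "Flush took X > flush period".

```go
package main

import (
	"fmt"
	"time"
)

// flushPeriod is a hypothetical configured flush period for this sketch.
const flushPeriod = 5 * time.Second

// slowFlushWarning returns a message when a flush took longer than the
// configured period, and the empty string otherwise.
func slowFlushWarning(took, period time.Duration) string {
	if took <= period {
		return ""
	}
	return fmt.Sprintf("flush took %s > flush period %s", took, period)
}

func main() {
	// A flush that ran for 7s against a 5s period produces a warning.
	fmt.Println(slowFlushWarning(7*time.Second, flushPeriod))
	// flush took 7s > flush period 5s
}
```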

dt

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)

pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

5s is probably long enough... However, 5s is also rather arbitrary; I wonder if we are just postponing this issue until the next time we hit it (when e.g. cross DC connectivity is down for long periods of time).

Can we produce better error messages perhaps? "Flush took X > flush period" Or some such?

yeah, we're just postponing it until we make improvements so we don't try to flush again when we're still behind.

miretskiy

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)

dt · 2020-06-04T18:22:14Z

bors r+

craig · 2020-06-04T18:26:23Z

Merge conflict (retrying...)

craig · 2020-06-04T19:11:37Z

Build succeeded

GitHub CI (Cockroach)

ajwerner · 2020-06-10T14:50:36Z

@dt should we backport this?

dt requested review from a team, ajwerner and miretskiy and removed request for a team June 1, 2020 20:11

ajwerner reviewed Jun 1, 2020


dt force-pushed the cdc-resolved-default branch from e5b9972 to ddb5507 Compare June 1, 2020 21:30

dt force-pushed the cdc-resolved-default branch from ddb5507 to 848abda Compare June 2, 2020 18:06

miretskiy requested review from ajwerner and miretskiy June 2, 2020 18:26

miretskiy reviewed Jun 2, 2020


dt commented Jun 2, 2020


miretskiy reviewed Jun 2, 2020


miretskiy self-requested a review June 2, 2020 19:11

dt commented Jun 2, 2020


miretskiy self-requested a review June 2, 2020 21:15

miretskiy approved these changes Jun 2, 2020


craig bot merged commit 994d306 into cockroachdb:master Jun 4, 2020

ajwerner mentioned this pull request Jun 5, 2020

changefeedccl: cope better with slow sinks and high flush rates #49720

Closed

dt mentioned this pull request Jun 16, 2020

release-20.1: changefeedccl: change default flush interval to 5s #50251

Merged

dt deleted the cdc-resolved-default branch July 12, 2020 13:37

jseldess mentioned this pull request Jul 30, 2020

changefeedccl: change default flush interval to 5s cockroachdb/docs#7829

Closed
