Skip to content

changefeedccl: change default flush interval to 5s#49770

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
dt:cdc-resolved-default
Jun 4, 2020
Merged

changefeedccl: change default flush interval to 5s#49770
craig[bot] merged 1 commit intocockroachdb:masterfrom
dt:cdc-resolved-default

Conversation

@dt
Copy link
Copy Markdown
Contributor

@dt dt commented Jun 1, 2020

We observed a customer cluster's changefeeds to cloud storage 'getting stuck'
which on further investigation was determined to be happening because they
were spending too much time in flushing. This was because they were writing to
a cloud sink and the default flush interval of 200ms (poller interval of 1s / 5)
meant it spent all of its time flushing. This default was picked testing with
lower-latency sinks and was noted in a comment as somewhat arbitrary.

This change does two things: it increases the default to the poller interval
if unspecified, instead of poller interval / 5, meaning 1s instead of 200ms
at the default setting, and if the sink being used is cloud storage, it
changes it to the greater of that or 5s. Users who truely desire lower latency
can of course specify their own 'resolved' interval, so this change in the
default is for those that are indifferent, and increasing the latency to 1s or 5s
reduces the chance of hiitting this unfortunate edge case when the sink is too slow.

Release note (enterprise change): The default flush interval for changefeeds that do not specify a 'resolved' option is now 1s instead of 200ms, or 5s if the changefeed sink is cloud-storage.

@dt dt requested review from a team, ajwerner and miretskiy and removed request for a team June 1, 2020 20:11
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @miretskiy)


pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

		} else {
			timeBetweenFlushes = changefeedbase.TableDescriptorPollInterval.Get(&settings.SV)
			if _, ok := sink.(*cloudStorageSink); ok && timeBetweenFlushes < cloudStorageDefaultFlush {

Does this type assertion work? I thought we wrap sink the sink a couple of times before it gets here. See:

https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/changefeedccl/changefeed_processors.go#L187-L188

@dt dt force-pushed the cdc-resolved-default branch from e5b9972 to ddb5507 Compare June 1, 2020 21:30
We observed a customer cluster's changefeeds to cloud storage 'getting stuck'
which on further investigation was determined to be happening because they
were spending too much time in flushing. This was because they were writing to
a cloud sink and the default flush interval of 200ms (poller interval of 1s / 5)
meant it spent all of its time flushing. This default was picked testing with
lower-latency sinks and was noted in a comment as somewhat arbitrary.

This change increases the default to  5s. Users who truely desire lower latency
can of course specify their own 'resolved' interval, so this change in the
default is for those that are indifferent, and increasing the latency to 5s
reduces the chance of hiitting this unfortunate edge case when the sink is too slow.

Release note (enterprise change): The default flush interval for changefeeds that do not specify a 'resolved' option is now 5s instead of 200ms to more gracefully handle higher-latency sinks.
@dt dt force-pushed the cdc-resolved-default branch from ddb5507 to 848abda Compare June 2, 2020 18:06
@miretskiy miretskiy requested review from ajwerner and miretskiy June 2, 2020 18:26
Copy link
Copy Markdown
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)


pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

Previously, ajwerner wrote…

Does this type assertion work? I thought we wrap sink the sink a couple of times before it gets here. See:

https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/changefeedccl/changefeed_processors.go#L187-L188

What do we think about extending Sink api somewhat?
I'm thinking something like:

type SinkTraits struct {
defaultFlushPeriod ...
whatever else we want to add
}

And then adding to Sink interface:
GetTraits() SinkTraits?

No need to assertions.

Copy link
Copy Markdown
Contributor Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)


pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

What do we think about extending Sink api somewhat?
I'm thinking something like:

type SinkTraits struct {
defaultFlushPeriod ...
whatever else we want to add
}

And then adding to Sink interface:
GetTraits() SinkTraits?

No need to assertions.

I removed the assertion and just made the default 5s rather than trying to be fancy. Easier to doc that way too.

Copy link
Copy Markdown
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @miretskiy)


pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I removed the assertion and just made the default 5s rather than trying to be fancy. Easier to doc that way too.

5s is probably long enough... However, 5s is also rather arbitrary; I wonder if we are just postponing this issue until the next time we hit it (when e.g. cross DC connectivity is down for long periods of time).

Can we produce better error messages perhaps? "Flush took X > flush period" Or some such?

@miretskiy miretskiy self-requested a review June 2, 2020 19:11
Copy link
Copy Markdown
Contributor Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)


pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

5s is probably long enough... However, 5s is also rather arbitrary; I wonder if we are just postponing this issue until the next time we hit it (when e.g. cross DC connectivity is down for long periods of time).

Can we produce better error messages perhaps? "Flush took X > flush period" Or some such?

yeah, we're just postponing it until we make improvements so we don't try to flush again when we're still behind.

@miretskiy miretskiy self-requested a review June 2, 2020 21:15
Copy link
Copy Markdown
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)

@dt
Copy link
Copy Markdown
Contributor Author

dt commented Jun 4, 2020

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 4, 2020

Merge conflict (retrying...)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 4, 2020

Build succeeded

@craig craig bot merged commit 994d306 into cockroachdb:master Jun 4, 2020
@ajwerner
Copy link
Copy Markdown
Contributor

@dt should we backport this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants