changefeedccl: change default flush interval to 5s#49770
changefeedccl: change default flush interval to 5s#49770craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
ajwerner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @miretskiy)
pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):
} else { timeBetweenFlushes = changefeedbase.TableDescriptorPollInterval.Get(&settings.SV) if _, ok := sink.(*cloudStorageSink); ok && timeBetweenFlushes < cloudStorageDefaultFlush {
Does this type assertion work? I thought we wrap sink the sink a couple of times before it gets here. See:
We observed a customer cluster's changefeeds to cloud storage 'getting stuck' which on further investigation was determined to be happening because they were spending too much time in flushing. This was because they were writing to a cloud sink and the default flush interval of 200ms (poller interval of 1s / 5) meant it spent all of its time flushing. This default was picked testing with lower-latency sinks and was noted in a comment as somewhat arbitrary. This change increases the default to 5s. Users who truely desire lower latency can of course specify their own 'resolved' interval, so this change in the default is for those that are indifferent, and increasing the latency to 5s reduces the chance of hiitting this unfortunate edge case when the sink is too slow. Release note (enterprise change): The default flush interval for changefeeds that do not specify a 'resolved' option is now 5s instead of 200ms to more gracefully handle higher-latency sinks.
miretskiy
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)
pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):
Previously, ajwerner wrote…
Does this type assertion work? I thought we wrap sink the sink a couple of times before it gets here. See:
What do we think about extending Sink api somewhat?
I'm thinking something like:
type SinkTraits struct {
defaultFlushPeriod ...
whatever else we want to add
}
And then adding to Sink interface:
GetTraits() SinkTraits?
No need to assertions.
dt
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)
pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
What do we think about extending Sink api somewhat?
I'm thinking something like:type SinkTraits struct {
defaultFlushPeriod ...
whatever else we want to add
}And then adding to Sink interface:
GetTraits() SinkTraits?No need to assertions.
I removed the assertion and just made the default 5s rather than trying to be fancy. Easier to doc that way too.
miretskiy
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @miretskiy)
pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):
Previously, dt (David Taylor) wrote…
I removed the assertion and just made the default 5s rather than trying to be fancy. Easier to doc that way too.
5s is probably long enough... However, 5s is also rather arbitrary; I wonder if we are just postponing this issue until the next time we hit it (when e.g. cross DC connectivity is down for long periods of time).
Can we produce better error messages perhaps? "Flush took X > flush period" Or some such?
dt
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)
pkg/ccl/changefeedccl/changefeed.go, line 345 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
5s is probably long enough... However, 5s is also rather arbitrary; I wonder if we are just postponing this issue until the next time we hit it (when e.g. cross DC connectivity is down for long periods of time).
Can we produce better error messages perhaps? "Flush took X > flush period" Or some such?
yeah, we're just postponing it until we make improvements so we don't try to flush again when we're still behind.
miretskiy
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)
|
bors r+ |
Merge conflict (retrying...) |
Build succeeded |
|
@dt should we backport this? |
We observed a customer cluster's changefeeds to cloud storage 'getting stuck'
which on further investigation was determined to be happening because they
were spending too much time in flushing. This was because they were writing to
a cloud sink and the default flush interval of 200ms (poller interval of 1s / 5)
meant it spent all of its time flushing. This default was picked testing with
lower-latency sinks and was noted in a comment as somewhat arbitrary.
This change does two things: it increases the default to the poller interval
if unspecified, instead of poller interval / 5, meaning 1s instead of 200ms
at the default setting, and if the sink being used is cloud storage, it
changes it to the greater of that or 5s. Users who truely desire lower latency
can of course specify their own 'resolved' interval, so this change in the
default is for those that are indifferent, and increasing the latency to 1s or 5s
reduces the chance of hiitting this unfortunate edge case when the sink is too slow.
Release note (enterprise change): The default flush interval for changefeeds that do not specify a 'resolved' option is now 1s instead of 200ms, or 5s if the changefeed sink is cloud-storage.