kvcoord: Correctly handle stuck rangefeeds#92582
kvcoord: Correctly handle stuck rangefeeds#92582craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
| // ping notifies the canceler that the rangefeed has received an | ||
| // event, i.e. is making progress. | ||
| func (w *stuckRangeFeedCanceler) ping() { | ||
| func (w *stuckRangeFeedCanceler) do(fn func() error) error { |
There was a problem hiding this comment.
The way this is currently structured, we don't need to return error from either of these functions. The closure can close over err and we can check it outside of do().
There are a couple of other approaches we can consider:
-
Have
do()also do error handling, returningerrRestartStuckRangewhen the rangefeed is stuck. -
Return
trueif the rangefeed is stuck when the function returns.
These would allow us to get rid of the stuck() method and simplify the caller logic. I think I have a preference for returning errRestartStuckRange and encapsulating as much of the logic as possible inside do(), but it's not a strongly held view. The current approach is also fine.
There was a problem hiding this comment.
Ack -- I was thinking of doing that. Still, I think returning an error is better -- I prefer not to close over errors.
There was a problem hiding this comment.
Not sure I can get rid of stuck() -- we call it in few places , and I think it just reads better when error handling is explicit.
There was a problem hiding this comment.
Ok. I'm not sure why we bother with returning errStuckRangefeed from do() in that case, since none of the callers use it anyway. You call though, nbd.
There was a problem hiding this comment.
Reverted return of errStuckRange.
miretskiy
left a comment
There was a problem hiding this comment.
not sure if I can get rid of stuck() -- but other comments addressed.
erikgrinaker
left a comment
There was a problem hiding this comment.
Just some minor comments, the core logic seems fine. Preemptively approving once addressed.
We should also have a test, either for the DistSender or as a CDC roachtest, which makes sure a stalled sink won't cause undue restarts.
| // ping notifies the canceler that the rangefeed has received an | ||
| // event, i.e. is making progress. | ||
| func (w *stuckRangeFeedCanceler) ping() { | ||
| func (w *stuckRangeFeedCanceler) do(fn func() error) error { |
There was a problem hiding this comment.
Ok. I'm not sure why we bother with returning errStuckRangefeed from do() in that case, since none of the callers use it anyway. You call though, nbd.
Added canceller test. |
Fixes cockroachdb#92570 A watcher responsible for restarting stuck range feeds may incorrectly cancel rangefeed if the downstream event consumer is slow. Release note (bug fix): Fix incorrect cancellation logic when attempting to detect stuck range feeds.
|
bors r+ |
1 similar comment
|
bors r+ |
|
Already running a review |
|
Build succeeded: |
Fixes #92570
A watcher responsible for restarting stuck range feeds may incorrectly cancel rangefeed if the downstream event consumer is slow.
Release note (bug fix): Fix incorrect cancellation logic when attempting to detect stuck range feeds.