rangefeed: dropping events (in catch up scan?) #44878
Description
Found with kvnemesis.
Repro: Stress #44769 with roachprod-stress. This can take a while (sometimes 1h+):
make roachprod-stress PKG=./pkg/kv/kvnemesis TESTS=TestKVNemesisMultiNode TESTTIMEOUT=60s TESTFLAGS='-v -show-logs' CLUSTER=dan-stress
A bunch of previous repros with increasing amounts of debug logging are documented in #44769.
The issue is that a write from a committed txn appears in KV but never comes out of the RangeFeed, even though the resolved timestamp is bumped past it. In the following repro, follow the "cdfa3544" key. This key is written 3 times, and none of the writes come out of the RangeFeed. Other repros have hit keys that were written only once, so it's likely not interesting that this one was overwritten.
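For reference, here's the contract being violated, written out as a checker. This is a hand-wavy sketch of what the kvnemesis watcher effectively asserts; all types and names below are made up for illustration, not the actual kvnemesis API.

```go
// Sketch of the resolved-timestamp contract: record every value the
// rangefeed emits, and on each checkpoint assert that every write
// known (from reading KV directly) to be committed at or below the
// resolved timestamp has already been emitted.
package main

import "fmt"

type Timestamp struct {
	WallTime int64 // nanos, e.g. 1581104277953233825
	Logical  int32
}

func (t Timestamp) LessEq(o Timestamp) bool {
	return t.WallTime < o.WallTime ||
		(t.WallTime == o.WallTime && t.Logical <= o.Logical)
}

type Write struct {
	Key string
	TS  Timestamp
}

type Watcher struct {
	committed []Write        // ground truth, read back from KV
	emitted   map[Write]bool // everything the rangefeed delivered
}

// OnValue records a Put emitted by the rangefeed.
func (w *Watcher) OnValue(key string, ts Timestamp) {
	w.emitted[Write{Key: key, TS: ts}] = true
}

// OnCheckpoint enforces the contract: no further events at or below
// the resolved timestamp may arrive, so every committed write at or
// below it must already have been seen. The repro trips exactly this
// check for the "cdfa3544" key.
func (w *Watcher) OnCheckpoint(resolved Timestamp) error {
	for _, wr := range w.committed {
		if wr.TS.LessEq(resolved) && !w.emitted[wr] {
			return fmt.Errorf("%s@%v missing at resolved ts %v",
				wr.Key, wr.TS, resolved)
		}
	}
	return nil
}
```

(The `main` at the bottom of this writeup plugs the actual timestamps from the logs into this checker.)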
First, the range splits, and the rangefeeds disconnect and reconnect on the two halves. Each half gets a resolved timestamp of 0,0 and then a real resolved timestamp:
I200207 19:37:57.980265 5594 storage/replica_command.go:396 [n2,s2,r30/2:/{Table/50/"50…-Max}] initiating a split of this range at key /Table/50/"5b41bf37" [r33] (manual)
...
I200207 19:37:57.991045 5104 storage/rangefeed/registry.go:539 remove /Table/5{0-1} 1581104277.875275044,0 retry rangefeed (REASON_RANGE_SPLIT)
...
I200207 19:37:57.993231 5796 storage/rangefeed/registry.go:453 register /Table/50{-/"50060c6d"} 1581104277.947404728,0
I200207 19:37:57.993275 5796 storage/rangefeed/registry.go:199 maybeStripEvent req=/Table/50-/Table/50/"50060c6d" event=/Table/50-/Table/50/"50060c6d" ts=0,0
...
I200207 19:37:57.993406 5796 storage/rangefeed/registry.go:483 PublishToOverlapping event=/Min-/Max ts=1581104277.947404728,0
I200207 19:37:57.993463 5796 storage/rangefeed/registry.go:199 maybeStripEvent req=/Table/50-/Table/50/"50060c6d" event=/Table/50-/Table/50/"50060c6d" ts=1581104277.947404728,0
...
I200207 19:37:57.997228 5761 storage/rangefeed/registry.go:453 register /Table/5{0/"50060c6d"-1} 1581104277.947404728,0
I200207 19:37:57.997267 5761 storage/rangefeed/registry.go:199 maybeStripEvent req=/Table/50/"50060c6d"-/Table/51 event=/Table/50/"50060c6d"-/Table/51 ts=0,0
...
I200207 19:37:57.997454 5761 storage/rangefeed/registry.go:483 PublishToOverlapping event=/Min-/Max ts=1581104277.947404728,0
I200207 19:37:57.997515 5761 storage/rangefeed/registry.go:199 maybeStripEvent req=/Table/50/"50060c6d"-/Table/51 event=/Table/50/"50060c6d"-/Table/51 ts=1581104277.947404728,0
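For context on why both halves register at 1581104277.947404728,0: the client side resumes each post-split half from the last resolved timestamp it saw, rather than from the feed's original start time. A rough sketch of that retry loop, with made-up names (not the actual DistSender code):

```go
// Sketch of client-side split handling: on REASON_RANGE_SPLIT the feed
// is re-dialed once per post-split half, resuming from the last
// checkpoint the client observed.
package sketch

import "errors"

type TS struct {
	Wall    int64
	Logical int32
}

type Span struct{ Start, End string }

type Event struct {
	IsCheckpoint bool
	ResolvedTS   TS
	// value payload elided
}

var errRangeSplit = errors.New("retry rangefeed (REASON_RANGE_SPLIT)")

// rangefeedOneRange streams one range's feed until it fails; stubbed.
func rangefeedOneRange(sp Span, start TS, emit func(Event)) error {
	return errRangeSplit
}

// postSplitSpans looks up the post-split descriptors; stubbed.
func postSplitSpans(sp Span) []Span { return nil }

func runFeed(sp Span, start TS, out chan<- Event) {
	resume := start
	for {
		err := rangefeedOneRange(sp, resume, func(ev Event) {
			if ev.IsCheckpoint {
				resume = ev.ResolvedTS // remember progress for retries
			}
			out <- ev
		})
		if errors.Is(err, errRangeSplit) {
			// Re-dial each half from the last checkpoint we saw, not
			// from the original start timestamp.
			for _, half := range postSplitSpans(sp) {
				go runFeed(half, resume, out)
			}
			return
		}
		// transient errors would retry the same span here
	}
}
```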
The key is written at time 1581104277.953233825,0. The WriteIntent and CommitIntent ops are processed by 2/3 replicas, presumably the ones the rangefeed is not connected to:
I200207 19:37:58.014006 1072 storage/replica_rangefeed.go:506 [n2,s2,r33/2:/{Table/50/"5b…-Max}] WIP MVCCWriteIntentOp /Table/50/"cdfa3544" 1581104277.953233825,0 1581104277.953233825,0 1978ce52-d8db-480a-885b-529eab65460d
...
I200207 19:37:58.014461 1363 storage/replica_rangefeed.go:506 [n3,s3,r33/3:/{Table/50/"5b…-Max}] WIP MVCCWriteIntentOp /Table/50/"cdfa3544" 1581104277.953233825,0 1581104277.953233825,0 1978ce52-d8db-480a-885b-529eab65460d
...
I200207 19:37:58.018226 1032 storage/replica_rangefeed.go:506 [n2,s2,r33/2:/{Table/50/"5b…-Max}] WIP MVCCCommitIntentOp /Table/50/"cdfa3544":1581104277.953233825,0 -> /<empty> 1978ce52-d8db-480a-885b-529eab65460d
...
I200207 19:37:58.018934 1392 storage/replica_rangefeed.go:506 [n3,s3,r33/3:/{Table/50/"5b…-Max}] WIP MVCCCommitIntentOp /Table/50/"cdfa3544":1581104277.953233825,0 -> /<empty> 1978ce52-d8db-480a-885b-529eab65460d
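As a reminder of why these two ops matter: a replica's rangefeed processor tracks unresolved intents, and the resolved timestamp it publishes is the closed timestamp held back to just below the oldest unresolved intent. MVCCWriteIntentOp adds an intent for the txn; MVCCCommitIntentOp emits the committed value and releases it. A processor that misses or mishandles these ops can publish a resolved timestamp above a write it never emitted. A rough sketch, with made-up names and simplifications noted in comments:

```go
// Sketch of intent tracking for the resolved timestamp.
package sketch

type TS struct {
	Wall    int64
	Logical int32
}

func (t TS) Less(o TS) bool {
	return t.Wall < o.Wall || (t.Wall == o.Wall && t.Logical < o.Logical)
}

type processor struct {
	closedTS TS
	intents  map[string]TS // txn ID -> earliest unresolved intent ts
}

func (p *processor) onWriteIntent(txnID string, ts TS) {
	if cur, ok := p.intents[txnID]; !ok || ts.Less(cur) {
		p.intents[txnID] = ts
	}
}

func (p *processor) onCommitIntent(txnID string) {
	// Simplification: the real tracker reference-counts a txn's
	// intents; here one commit clears the whole txn.
	delete(p.intents, txnID)
}

// resolvedTS is the closed timestamp, held back by any unresolved
// intent: no new write may appear at or below this timestamp.
func (p *processor) resolvedTS() TS {
	r := p.closedTS
	for _, ts := range p.intents {
		if ts.Less(r) || ts == r {
			// Hold just below the oldest unresolved intent (a crude
			// stand-in for a proper ts.Prev()).
			r = TS{Wall: ts.Wall, Logical: ts.Logical - 1}
		}
	}
	return r
}
```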
The key is written a few more times. Each time, 2/3 replicas process a WriteIntent and CommitIntent, but nothing is emitted from the relevant rangefeed /{Table/50/"5b…-Max} for a very long time. Eventually this range splits at "5b41bf37", so our key is still on the rhs, now in range /Table/5{0/"5b41bf37"-1}. This is redialed at the same timestamp as last time, 1581104277.947404728,0 (so still correctly before our write). Some writes do come out of the rangefeed, past the timestamp our key was first written at (unclear if this is interesting).
I200207 19:37:59.087320 7382 kv/dist_sender_rangefeed.go:151 [n1] RangeFeed dial /Table/5{0/"5b41bf37"-1} 1581104277.947404728,0
I200207 19:37:59.088383 7384 storage/rangefeed/registry.go:453 register /Table/5{0/"5b41bf37"-1} 1581104277.947404728,0
I200207 19:37:59.088425 7384 storage/rangefeed/registry.go:199 maybeStripEvent req=/Table/50/"5b41bf37"-/Table/51 event=/Table/50/"5b41bf37"-/Table/51 ts=0,0
I200207 19:37:59.088505 7382 kv/dist_sender_rangefeed.go:277 [n1] RangeFeedCheckpoint req=/Table/50/"5b41bf37"-/Table/51 event=/Table/50/"5b41bf37"-/Table/51 ts=0,0
I200207 19:37:59.088532 5099 kv/kvnemesis/watcher.go:162 rangefeed Put /Table/50/"ad44127e" 1581104277.967639047,0 -> /BYTES/v-4 (prev /<empty>)
I200207 19:37:59.088554 5099 kv/kvnemesis/watcher.go:162 rangefeed Put /Table/50/"ff815392" 1581104277.962125945,0 -> /BYTES/v-3 (prev /<empty>)
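Since the registration timestamp 1581104277.947404728,0 predates the 1581104277.953233825,0 write, the catch-up scan for the new registration should replay that write. Roughly what a catch-up scan is supposed to do (again a sketch with made-up names; the title's hypothesis is that this step is where the event gets dropped):

```go
// Sketch of a catch-up scan: walk the range's committed MVCC versions
// and emit every one in the registration's span with a timestamp above
// the (exclusive) registration timestamp.
package sketch

type TS struct {
	Wall    int64
	Logical int32
}

func (t TS) After(o TS) bool {
	return t.Wall > o.Wall || (t.Wall == o.Wall && t.Logical > o.Logical)
}

type version struct {
	key   string
	ts    TS
	value []byte
}

// versions stands in for an MVCC iterator over the range's data.
func catchUpScan(versions []version, startKey, endKey string, startTS TS,
	emit func(version)) {
	for _, v := range versions {
		if v.key < startKey || v.key >= endKey {
			continue // outside the registration's span
		}
		if v.ts.After(startTS) {
			emit(v) // must be delivered before any checkpoint >= v.ts
		}
	}
}
```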
Very shortly afterward (two lines later in the logs), this range emits a resolved timestamp that is past our write, but the write still hasn't come out. Incorrect!
I200207 19:37:59.088671 7384 storage/rangefeed/registry.go:199 maybeStripEvent req=/Table/50/"5b41bf37"-/Table/51 event=/Table/50/"5b41bf37"-/Table/51 ts=1581104278.161414786,0
I200207 19:37:59.088699 7382 kv/dist_sender_rangefeed.go:277 [n1] RangeFeedCheckpoint req=/Table/50/"5b41bf37"-/Table/51 event=/Table/50/"5b41bf37"-/Table/51 ts=1581104278.161414786,0
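Plugging the actual numbers from these two log lines into the checker sketched at the top of this writeup: the checkpoint at 1581104278.161414786,0 covers the 1581104277.953233825,0 write of "cdfa3544", and since OnValue was never called for that write, OnCheckpoint must report the violation.

```go
// Completes the earlier package main sketch with a concrete run using
// the timestamps from the logs above.
func main() {
	w := &Watcher{
		committed: []Write{
			{Key: `/Table/50/"cdfa3544"`, TS: Timestamp{WallTime: 1581104277953233825}},
		},
		emitted: map[Write]bool{}, // the rangefeed never emitted the key
	}
	if err := w.OnCheckpoint(Timestamp{WallTime: 1581104278161414786}); err != nil {
		fmt.Println(err) // reports the missing /Table/50/"cdfa3544" write
	}
}
```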