rangefeed: mux rangefeeds failing in metamorphic test builds #100783

@AlexTalks

During investigation of #99207, it was discovered that whenever mux rangefeeds were enabled in metamorphic test runs, the kvserver test TestReplicateQueueDecommissioningNonVoters/remove would fail, with the following messages in the logs:

I230405 23:35:41.812451 205024 kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go:212 ⋮ [T1,n1,rangefeed=‹spanconfig-subscriber›] 308544  MuxRangeFeed starting for range ‹/Table/47/{1-2}›@1680737707.726498812,0 (rangeID 49, attempt 0)
I230405 23:35:41.812503 205378 kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go:212 ⋮ [T1,n1,rangefeed=‹tenant-settings-watcher›] 308546  MuxRangeFeed starting for range ‹/Table/5{0-1}›@1680737700.374403140,0 (rangeID 51, attempt 0)
I230405 23:35:41.812503 205363 kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go:212 ⋮ [T1,n1,job=‹AUTO SPAN CONFIG RECONCILIATION id=854154456477892609›,rangefeed=‹sql-watcher-protected-ts-records-rangefeed›] 308547  MuxRangeFeed starting for range ‹/Table/3{2-3}›@1680737704.854354224,0 (rangeID 34, attempt 0)
I230405 23:35:41.812527 205359 kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go:471 ⋮ [T1,n3,rangefeed=‹tenant-settings-watcher›] 308548  RangeFeed ‹/Table/5{0-1}›@1680737734.040746992,0 disconnected with last checkpoint 466871h35m41.81252653s ago: r51 was not found on s3
I230405 23:35:41.812551 204734 kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go:471 ⋮ [T1,n1,rangefeed=‹spanconfig-subscriber›] 308549  RangeFeed ‹/Table/47/{1-2}›@1680737707.726498812,0 disconnected with last checkpoint 466871h35m41.81255031s ago: r49 was not found on s1
I230405 23:35:41.812574 205379 kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go:471 ⋮ [T1,n1,rangefeed=‹tenant-settings-watcher›] 308550  RangeFeed ‹/Table/5{0-1}›@1680737700.374403140,0 disconnected with last checkpoint 466871h35m41.812573013s ago: r51 was not found on s1
I230405 23:35:41.812574 205364 kv/kvclient/kvcoord/dist_sender_mux_rangefeed.go:471 ⋮ [T1,n1,job=‹AUTO SPAN CONFIG RECONCILIATION id=854154456477892609›,rangefeed=‹sql-watcher-protected-ts-records-rangefeed›] 308551  RangeFeed ‹/Table/3{2-3}›@1680737704.854354224,0 disconnected with last checkpoint 466871h35m41.812572923s ago: r34 was not found on s1

It seems that when the metamorphic test enabled mux rangefeeds, there were frequent errors, likely due to a failure to apply the updated span config received from the rangefeed. The resulting test failure appeared as follows:

=== RUN   TestReplicateQueueDecommissioningNonVoters/remove
    replicate_queue_test.go:793: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/replicate_queue_test.go:793
        	Error:      	Condition never satisfied
        	Test:       	TestReplicateQueueDecommissioningNonVoters/remove
    --- FAIL: TestReplicateQueueDecommissioningNonVoters/remove (68.62s)

This ticket tracks resolving these errors and re-enabling TestReplicateQueueDecommissioningNonVoters (and/or any other affected tests) under metamorphic builds.

Jira issue: CRDB-26619

Epic CRDB-28879

Labels

- A-cdc: Change Data Capture
- C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
- C-test-failure: Broken test (automatically or manually discovered).
- T-cdc
