kvserver,cdc: high rangefeed count leads to scheduler overload #96395
Description
Describe the problem
In recent incidents we've observed a high number of rangefeeds resulting in CPU scheduler overload, causing excessive scheduling latencies. These scheduling latencies in turn contribute to high foreground latencies. All of this can happen even with CPU utilization far from saturation (<25%). The underlying cause is the various O(ranges) goroutines we have on both the client and server side; even after using MuxRangefeed, we have:
Client side:
a. 1 per range in singleRangeFeed
```
kvcoord dist_sender_rangefeed.go:516 (*DistSender).singleRangeFeed.func3()
kvcoord dist_sender_rangefeed_canceler.go:100 (*stuckRangeFeedCanceler).do(*, *)
kvcoord dist_sender_rangefeed.go:515 (*DistSender).singleRangeFeed(#1050, {#184, *}, {{*, *, *}, {*, *, *}}, {*, ...}, ...)
kvcoord dist_sender_rangefeed.go:336 (*DistSender).partialRangeFeed(#1050, {#184, *}, *, {{*, *, *}, {*, *, *}}, ...)
kvcoord dist_sender_rangefeed.go:120 (*DistSender).RangeFeed.func1.1({#184, *})
ctxgroup ctxgroup.go:169 Group.GoCtx.func1()
errgroup errgroup.go:57 (*Group).Go.func1()
```
Server side:
b. 1 per range blocked in RangeFeed
```
kvserver replica_rangefeed.go:254 (*Replica).rangeFeedWithRangeID(*, #505, *, {#43378, *})
kvserver replica_rangefeed.go:150 (*Replica).RangeFeed(...)
kvserver store.go:3082 (*Store).RangeFeed(#1543, *, {#505, *})
kvserver stores.go:216 (*Stores).RangeFeed(*, *, {#505, *})
server node.go:1227 (*Node).RangeFeed(#505, *, {#505, *})
rpc context.go:753 internalClientAdapter.RangeFeed.func1()
```
c. 1 per range in the processor
```
rangefeed processor.go:269 (*Processor).run(*, {#5, *}, *, *, #1177)
rangefeed processor.go:222 (*Processor).Start.func1({#5, *})
stop stopper.go:494 (*Stopper).RunAsyncTaskEx.func2()
```
d. 1 per range in the registration output loop
```
rangefeed registry.go:310 (*registration).outputLoop(*, {#485, *})
rangefeed registry.go:335 (*registration).runOutputLoop(*, {#5, *}, *)
rangefeed processor.go:297 (*Processor).run.func1({#5, *})
stop stopper.go:494 (*Stopper).RunAsyncTaskEx.func2()
```
Known incidents:
- https://github.com/cockroachlabs/support/issues/2036
- https://github.com/cockroachlabs/support/issues/1997
To understand, consider these metrics from an actual incident:
In these incidents we also see AC (admission control) kicking in, but helplessly, since it's unable to keep the runnable goroutine count stable under its configured threshold. For example, the KV queue on n113 fires throughout this period, observing wild swings between used and total slots (with used >> total at some points), so there are a lot of spurious goroutine wakeups. The p75 AC delay is quite high (1s+) across all CPU queues. Dividing the delay rate of n113's KV queue by its admission rate gives a mean latency of 277.78ms per request, which is very high.
To Reproduce
Run a changefeed over a table with many underlying ranges. We should start by adding a roachtest that reproduces this goroutine-count-induced latency hit; it's tracked in #95236.
Describe alternatives you've considered
- Reducing the frequency of these goroutine wakeups has helped in incidents, e.g. by increasing `kv.rangefeed.closed_timestamp_refresh_interval` to 2s instead of 200ms (read from `kv.closed_timestamp.side_transport_interval`).
- Reducing the number of ranges over which we need to establish rangefeeds. Depending on the use case, we could either leverage the predicate pushdown introduced in https://www.cockroachlabs.com/docs/stable/cdc-transformations.html for primary keys, or add smarts to also consider secondary indexes.
- @miretskiy has a prototype where he shuts down quiescent ranges for some period of time, reducing not only client goroutines but also the server-side ones.
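The first mitigation above is a cluster-setting change, applied like any other. A sketch, assuming an insecure local cluster (adjust the connection flags for your deployment):

```shell
# Raise the rangefeed closed-timestamp refresh interval from its default
# (read from kv.closed_timestamp.side_transport_interval, 200ms) to 2s.
# This reduces per-range goroutine wakeup frequency at the cost of less
# frequent changefeed checkpoints.
cockroach sql --insecure -e \
  "SET CLUSTER SETTING kv.rangefeed.closed_timestamp_refresh_interval = '2s';"
```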
Jira issue: CRDB-24101
Epic: CRDB-26372