kvserver: unbounded memory use when falling behind on sideloaded MsgApp #73376
Description
In #71802 (comment), we are seeing occasional failures due to nodes running out of memory. The heap profiles show large amounts of memory allocated while loading sideloaded SSTs into memory for appending to followers. Each individual raft leader will pull only ~one SST per append (due to our 32kb max-append-size target), but it may do so for each follower, so for every leader in the system we can expect at most num_followers * sst_size bytes to be pulled into memory per raft cycle. Unfortunately, outgoing messages are buffered, so even a single group could in theory put up to 10k SSTs into memory at once.
We don't have a single group but potentially tens of thousands of them, and in theory each of them can do the above (though they all share the 10k message limit, beyond which messages are dropped wholesale). In practice, the quota pool should, on each leader, prevent too many SSTs from entering the raft layer before they have been fully distributed to the followers. The quota pool size is half the raft log truncation threshold, which is 16mb, i.e. an 8mb proposal quota. So, assuming SSTs no larger than 8mb, we expect at most 8mb * num_followers in flight at any given time, per local raft leader.
Here we saw the heap profile track 2.11GiB. Unfortunately, the artifacts are no longer available, and even with them it might be difficult to tell whether we were dealing with a small number of extraordinarily large SSTs or a homogeneous flood of reasonably sized SSTs. Still, investigating another occurrence would be helpful, in particular with an eye toward when during the restore the problem occurs.
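The back-of-the-envelope bound above can be sketched in Go. The threshold and quota values come from the issue text; `numFollowers` is an illustrative assumption (a typical 3x-replicated range has two followers):

```go
package main

import "fmt"

func main() {
	const (
		// Raft log truncation threshold per the issue text: 16 MiB.
		raftLogTruncationThreshold = 16 << 20
		// Proposal quota is half the truncation threshold: 8 MiB.
		proposalQuota = raftLogTruncationThreshold / 2
		// Illustrative assumption: a 3x-replicated range has 2 followers.
		numFollowers = 2
	)
	// Worst case per leader, assuming no single SST exceeds the quota:
	// at most the full proposal quota in flight to each follower.
	maxInFlightPerLeader := proposalQuota * numFollowers
	fmt.Printf("max in-flight sideloaded bytes per leader: %d MiB\n",
		maxInFlightPerLeader>>20)
}
```

This bound is per leader; with tens of thousands of leaseholders on a node, the aggregate can still be large, which is consistent with the 2.11GiB observation.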
Action items
- add a histogram of raft append sizes (making sure it lets us distinguish between a few large messages and many reasonably sized ones)
- switch the queuing here from cardinality-based to message-size-based, and selectively drop messages that don't fit into the queue (how to size the queue remains an open question)
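A minimal sketch of the second action item, with hypothetical names (`msg`, `byteQueue`, `tryEnqueue` are not CockroachDB's actual types): instead of capping the send queue at a fixed message count, cap it by total payload bytes and reject messages that would exceed the budget, relying on raft's retransmission to recover dropped appends:

```go
package main

import "fmt"

// msg stands in for an outgoing raft message with a payload size in
// bytes (hypothetical type, not CockroachDB's raftpb.Message).
type msg struct {
	size int64
}

// byteQueue bounds the outgoing send queue by total payload bytes
// rather than by message count.
type byteQueue struct {
	maxBytes int64
	curBytes int64
	msgs     []msg
}

// tryEnqueue appends m if it fits within the byte budget and reports
// whether it was accepted; callers drop rejected messages, and raft
// retransmits them later.
func (q *byteQueue) tryEnqueue(m msg) bool {
	if q.curBytes+m.size > q.maxBytes {
		return false // selectively drop this message
	}
	q.msgs = append(q.msgs, m)
	q.curBytes += m.size
	return true
}

// dequeue pops the oldest message, releasing its share of the budget.
func (q *byteQueue) dequeue() (msg, bool) {
	if len(q.msgs) == 0 {
		return msg{}, false
	}
	m := q.msgs[0]
	q.msgs = q.msgs[1:]
	q.curBytes -= m.size
	return m, true
}

func main() {
	q := &byteQueue{maxBytes: 32 << 20} // e.g. a 32 MiB budget
	fmt.Println(q.tryEnqueue(msg{size: 20 << 20})) // fits
	fmt.Println(q.tryEnqueue(msg{size: 20 << 20})) // would exceed budget
	q.dequeue()
	fmt.Println(q.tryEnqueue(msg{size: 20 << 20})) // fits again
}
```

How to choose `maxBytes` (a fixed budget? proportional to the quota pool? shared across ranges?) is exactly the open question noted above; the sketch only shows the accounting mechanism.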
Jira issue: CRDB-11564