storage: Raft entry cache grows (inefficiently) to huge size #13231

@bdarnell

Description

Each store has a cache for raft entries, configured by default to be 16MB. On gamma, we see that one node (at a time) has over a gigabyte of raft entries in memory. I believe that not all of these entries are in the cache, but they are all being held in place by the cache: Replica.Entries allocates one large array of raftpb.Entry objects, so a reference to any single entry keeps the entire backing array alive.

Additionally, whenever we load a large array of entries, we try to add them all to the cache, inserting them one by one and then evicting all but the last 16MB. This is actually the bigger concern in the gamma cluster at this time, since the process takes long enough (and blocks the server) that the node loses its leases and never makes any progress.

Why do we load this monolithic block of entries? Whenever a new node becomes leader, it loads all uncommitted entries to see if there are any config changes. This is the one place where we load raft entries without any chunking, and it would be easy to add chunking here. In addition, we may want to change the behavior of the raft.Entries method to break up the arrays it uses when adding entries to the cache (trading the overhead of many smaller allocations against the wasted memory of pinned sibling entries). Finally, this problem is also a consequence of our inability to throttle incoming raft entries: the log appears to keep growing faster than we can process it.
