Skip to content

kvserver: scale Raft entry cache size with system memory#107424

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
erikgrinaker:raft-entry-cache-sizing
Jul 26, 2023
Merged

kvserver: scale Raft entry cache size with system memory#107424
craig[bot] merged 1 commit intocockroachdb:masterfrom
erikgrinaker:raft-entry-cache-sizing

Conversation

@erikgrinaker
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker commented Jul 23, 2023

The Raft entry cache size defaulted to 16 MB, which is rather small. This has been seen to cause tail latency and throughput degradation with high write volume on large nodes, correlating with a reduction in the entry cache hit rate.

This patch linearly scales the Raft entry cache size as 1/256 of total system/cgroup memory, shared evenly between all stores, with a minimum 32 MB. For example, a 32 GB 8-vCPU node will have a 128 MB entry cache.

This is a conservative default, since this memory is not accounted for in existing memory budgets nor by the --cache flag. We rarely see cache misses in production clusters anyway, and have seen significantly improved hit rates with this scaling (e.g. a 64 KB kv0 workload on 8-vCPU nodes increased from 87% to 99% hit rate).

Resolves #98666.
Epic: none

Release note (performance improvement): The default Raft entry cache size has been increased from 16 MB to 1/256 of system memory with a minimum of 32 MB, divided evenly between all stores. This can be configured via COCKROACH_RAFT_ENTRY_CACHE_SIZE.

@erikgrinaker erikgrinaker requested review from a team, nvb and pav-kv July 23, 2023 14:57
@erikgrinaker erikgrinaker requested a review from a team as a code owner July 23, 2023 14:57
@erikgrinaker erikgrinaker self-assigned this Jul 23, 2023
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Jul 23, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@erikgrinaker
Copy link
Copy Markdown
Contributor Author

erikgrinaker commented Jul 23, 2023

I could be convinced to increase the scaling here to e.g. 1/128 with 64 MB minimum, but I decided to err on the side of caution here, and I haven't seen compelling evidence that this scaling is insufficient for the majority of workloads. It might be more relevant in multi-region clusters, where replication latencies are higher.

@nvanbenschoten Do you have any further workloads or experiments you'd like to calibrate this on? I'm currently rerunning #98666 (comment) and will post results in #98666, but after 1 hour I'm seeing ~100% hit rate with the new setting while the old default is at 98.9%.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker and @pavelkalinnikov)


pkg/kv/kvserver/store.go line 1294 at r1 (raw file):

	}
	if sc.RaftEntryCacheSize == 0 {
		sc.RaftEntryCacheSize = uint64(defaultRaftEntryCacheSize)

nit: should this just be sc.RaftEntryCacheSize = uint64(defaultRaftEntryCacheSize) / uint64(numStores)?

@erikgrinaker erikgrinaker force-pushed the raft-entry-cache-sizing branch 2 times, most recently from 31a30dc to 3ef9509 Compare July 26, 2023 08:27
Copy link
Copy Markdown
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pavelkalinnikov)


pkg/kv/kvserver/store.go line 1294 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: should this just be sc.RaftEntryCacheSize = uint64(defaultRaftEntryCacheSize) / uint64(numStores)?

Figured I'd guard against division by zero. This is called unconditionally on the server startup path, and I don't know if there are cases where we can end up running without a store (e.g. SQL pods).

The Raft entry cache size defaulted to 16 MB, which is rather small.
This has been seen to cause tail latency and throughput degradation with
high write volume on large nodes, correlating with a reduction in the
entry cache hit rate.

This patch linearly scales the Raft entry cache size as 1/256 of total
system/cgroup memory, shared evenly between all stores, with a minimum
32 MB. For example, a 32 GB 8-vCPU node will have a 128 MB entry cache.

This is a conservative default, since this memory is not accounted for
in existing memory budgets nor by the `--cache` flag. We rarely see
cache misses in production clusters anyway, and have seen significantly
improved hit rates with this scaling (e.g. a 64 KB kv0 workload on
8-vCPU nodes increased from 87% to 99% hit rate).

Epic: none
Release note (performance improvement): The default Raft entry cache
size has been increased from 16 MB to 1/256 of system memory with a
minimum of 32 MB, divided evenly between all stores. This can be
configured via `COCKROACH_RAFT_ENTRY_CACHE_SIZE`.
@erikgrinaker erikgrinaker force-pushed the raft-entry-cache-sizing branch from 3ef9509 to c435472 Compare July 26, 2023 08:30
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 26, 2023

Canceled.

@erikgrinaker
Copy link
Copy Markdown
Contributor Author

bors r+

@craig craig bot merged commit f147c2b into cockroachdb:master Jul 26, 2023
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 26, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kvserver: investigate raft entry cache sizing

3 participants