perf(blockstore): Add LRU caches to blockstore operations used in consensus (backport #3003) by mergify[bot] · Pull Request #3083 · cometbft/cometbft

mergify · 2024-05-15T05:45:29Z

We are seeing that blockstore loading operations are used in hot loops within the gossip and queryMaj23 routines. This PR reduces that overhead with an LRU cache.
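Here is a minimal sketch of the idea, assuming a fixed-capacity LRU cache keyed by block height. `heightLRU` and `newHeightLRU` are hypothetical names, and the description does not say which LRU implementation the PR uses. `Get` takes the mutex too, because a hit moves the entry to the front of the recency list, and that move is a write.

```go
package main

import (
	"container/list"
	"sync"
)

// entry is one cached item. The list element stores *entry so that eviction
// can recover the map key to delete.
type entry[V any] struct {
	height int64
	value  V
}

// heightLRU is a hypothetical fixed-capacity LRU cache keyed by block height.
// It is an illustration, not CometBFT's implementation.
type heightLRU[V any] struct {
	mu       sync.Mutex
	capacity int
	order    *list.List              // front = most recently used
	items    map[int64]*list.Element // height -> element in order
}

func newHeightLRU[V any](capacity int) *heightLRU[V] {
	if capacity < 1 {
		capacity = 1 // eviction below assumes at least one slot
	}
	return &heightLRU[V]{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[int64]*list.Element, capacity),
	}
}

// Get returns the cached value and marks it most recently used.
// It locks because MoveToFront mutates the recency list.
func (c *heightLRU[V]) Get(height int64) (V, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[height]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry[V]).value, true
	}
	var zero V
	return zero, false
}

// Add inserts or updates a value. When the cache is full, it first evicts
// the least recently used entry.
func (c *heightLRU[V]) Add(height int64, value V) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[height]; ok {
		el.Value.(*entry[V]).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry[V]).height)
	}
	c.items[height] = c.order.PushFront(&entry[V]{height: height, value: value})
}
```

With capacity 2, reading height 1 before adding height 3 makes height 2 the least recently used entry, so height 2 is the one evicted.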

The LRU cache does take a mutex on every get, but loads with the cache are about 9x faster than without it (before even counting DB overhead), because the proto unmarshalling is skipped. We could imagine a setup that avoids the lock entirely. I don't think that is worth it right now: the new code is already about 9x faster, and these calls mostly appear in catch-up code, which should not be highly contended across peers at the same time.

With the new benchmark I added:

OLD:

```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```

NEW:

```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```
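The ratios between the OLD and NEW lines follow directly from the reported numbers. The roughly 8.5x drop in time per operation is what the description rounds to "9x":

```latex
\frac{54691\ \text{ns/op}}{6401\ \text{ns/op}} \approx 8.5
\qquad
\frac{46495\ \text{B/op}}{8320\ \text{B/op}} \approx 5.6
\qquad
\frac{319\ \text{allocs/op}}{2\ \text{allocs/op}} \approx 160
```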

It turns out these gossip routines don't need mutable copies, so the remaining large allocation could be optimized out in the future if we want.
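One reading of the mutable-copies remark is sketched here in hypothetical Go. A cache can either hand each caller a deep copy, which is safe to mutate but allocates on every hit, or return the cached pointer itself, which allocates nothing but must be treated as read-only. `commit`, `commitCache`, `GetCopy`, and `GetShared` are illustrative names. The description does not say how the PR's cache returns values. Locking is omitted to keep the contrast visible.

```go
package main

// commit is a hypothetical decoded commit. Signatures is a slice, so a
// shallow struct copy would still share its backing array.
type commit struct {
	Height     int64
	Signatures []string
}

// clone returns a deep copy that a caller may mutate freely.
func (c *commit) clone() *commit {
	sigs := make([]string, len(c.Signatures))
	copy(sigs, c.Signatures)
	return &commit{Height: c.Height, Signatures: sigs}
}

// commitCache is a hypothetical cache that exposes both return styles.
type commitCache struct {
	entries map[int64]*commit
}

// GetCopy allocates on every hit so that callers can mutate the result
// without corrupting the cached entry.
func (c *commitCache) GetCopy(h int64) (*commit, bool) {
	v, ok := c.entries[h]
	if !ok {
		return nil, false
	}
	return v.clone(), true
}

// GetShared returns the cached pointer itself. It does no allocation, but
// callers must not modify the result.
func (c *commitCache) GetShared(h int64) (*commit, bool) {
	v, ok := c.entries[h]
	return v, ok
}
```

Read-only callers such as gossip could use the shared form and skip the per-hit copy.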

A 1-hour CPU profile showing this in production:

[Image: 1-hour CPU profile, https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d]

For context, state machine execution time in this profile is 92 seconds, so this overhead adds up in system load and in GC: the GC load is 52GB out of a 200GB trace, and other parts have already been optimized down by recent PRs.

PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use unclog to manage our changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code comments
- [ ] Title follows the Conventional Commits spec

This is an automatic backport of pull request #3003 done by [Mergify](https://mergify.com).

…sensus (#3003)

Closes #2844

Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 46e2484)

# Conflicts:
#	.golangci.yml
#	go.mod
#	go.sum
#	store/store.go
#	store/store_test.go
#	types/block.go

mergify · 2024-05-15T05:45:30Z

Cherry-pick of 46e2484 has failed:

```
On branch mergify/bp/v0.37.x/pr-3003
Your branch is up to date with 'origin/v0.37.x'.

You are currently cherry-picking commit 46e24848f.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   .changelog/unreleased/improvements/3003-use-lru-caches-in-blockstore.md
	new file:   store/bench_test.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   .golangci.yml
	both modified:   go.mod
	both modified:   go.sum
	both modified:   store/store.go
	both modified:   store/store_test.go
	both modified:   types/block.go
```

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

…sensus (backport cometbft#3003) (cometbft#3083)

Closes cometbft#2844

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>

…sensus (backport cometbft#3003) (cometbft#3083)

Closes cometbft#2844

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
(cherry picked from commit 0c10bd5)

…… (backport #76) (#81)

* perf(blockstore): Add LRU caches to blockstore operations used in consensus (backport cometbft#3003) (cometbft#3083)

  Closes cometbft#2844

  Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
  Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
  (cherry picked from commit 0c10bd5)

* Add Changelog

  (cherry picked from commit 4594f29)

  # Conflicts:
  #	CHANGELOG.md

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Dev Ojha <dojha@berkeley.edu>
Co-authored-by: PaddyMc <paddymchale@hotmail.com>

mergify bot requested a review from a team as a code owner May 15, 2024 05:45

mergify bot added the conflicts label May 15, 2024

melekes added 3 commits May 15, 2024 12:05

- fix conflicts (0f85ca0)
- fix conflicts 2 (21a1af8)
- fix linter error (9682e77)

melekes approved these changes May 15, 2024


melekes merged commit 8a0f481 into v0.37.x May 15, 2024

melekes deleted the mergify/bp/v0.37.x/pr-3003 branch May 15, 2024 09:26

sergio-mena removed the conflicts label May 17, 2024

ValarDragon mentioned this pull request May 25, 2024

perf(blockstore): Add LRU caches to blockstore operations used in con… osmosis-labs/cometbft#76 (Merged)

mergify bot mentioned this pull request May 28, 2024

perf(blockstore): Add LRU caches to blockstore operations used in con… (backport #76) osmosis-labs/cometbft#81 (Merged)

melekes merged 4 commits into v0.37.x from mergify/bp/v0.37.x/pr-3003
