Skip to content

perf(blockstore): Add LRU caches to blockstore operations used in consensus#3003

Merged
melekes merged 4 commits intocometbft:mainfrom
osmosis-labs:dev/blockstore_lru
May 15, 2024
Merged

perf(blockstore): Add LRU caches to blockstore operations used in consensus#3003
melekes merged 4 commits intocometbft:mainfrom
osmosis-labs:dev/blockstore_lru

Conversation

@ValarDragon
Copy link
Contributor

@ValarDragon ValarDragon commented May 3, 2024

Closes #2844

We are seeing that the blockstore loading operations get used in hot loops within gossip routines, and queryMaj23 routines. This PR reduces that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU cache is 9x faster than without (before even adding in DB overheads), due to the proto unmarshalling saved. We could imagine a setup where we avoided a lock there entirely. I don't think this is worth right now, since the new code is 9x faster, and these mostly appear in catchup code which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:

BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op

NEW:

BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op

It turns out these gossip routines don't need mutative copies, so we could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod:
image

The state machine execution time here for context is 92 seconds. So this is adding up in system load (and GC! The GC load is 52GB, the entire trace is 200GB, with other parts being optimized down from recent PRs)


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

@ValarDragon ValarDragon requested a review from a team as a code owner May 3, 2024 20:07
@ValarDragon ValarDragon requested a review from a team May 3, 2024 20:07
@ValarDragon
Copy link
Contributor Author

Seems like the PR is failing due to the use of the lru import from hashicorp. Open question I guess if thats safe as an import? As far as I understand its pretty standard, and doesn't have licensing issues: https://pkg.go.dev/github.com/hashicorp/golang-lru/v2 . Its used in many of the big blockchain repos as well: https://pkg.go.dev/github.com/hashicorp/golang-lru/v2?tab=importedby (Filecoin, libp2p, Polygon Hermez' bridge server )

Copy link
Collaborator

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ValarDragon ❤️

@melekes
Copy link
Collaborator

melekes commented May 6, 2024

Seems like the PR is failing due to the use of the lru import from hashicorp. Open question I guess if thats safe as an import? As far as I understand its pretty standard, and doesn't have licensing issues: https://pkg.go.dev/github.com/hashicorp/golang-lru/v2 . Its used in many of the big blockchain repos as well: https://pkg.go.dev/github.com/hashicorp/golang-lru/v2?tab=importedby (Filecoin, libp2p, Polygon Hermez' bridge server )

You need to add it to

allow:

Copy link

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this.

Should we add a changelog entry?

func (bs *BlockStore) LoadBlockCommit(height int64) *types.Commit {
comm, ok := bs.blockCommitCache.Get(height)
if ok {
return comm.Clone()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to return a clone here? I assume this data never changes...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning the Clone replicates existing behavior. (A caller mutation would pollute the cache)

We can change the API to remove the clone, which I think is a good idea, we would just have to look for all usages and ensure they don't mutate it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ValarDragon would you prefer changing the API here or in another PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another PR

func (bs *BlockStore) addCaches() {
var err error
// err can only occur if the argument is non-positive, so is impossible in context.
bs.blockCommitCache, err = lru.New[int64, *types.Commit](100)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that it is somehow arbitrary, but why 100? :D

Copy link
Contributor Author

@ValarDragon ValarDragon May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was very much arbitrary, it has to be bigger than like 3 for live consensus + peer catchup. Was hoping that if its a bigger value, we wouldn't get too much thrashing while also serving blocksync.

If LRU lock contention or thrashing ever becomes a bottleneck we can make more complicated lock-free caching schemes though

ValarDragon and others added 2 commits May 7, 2024 19:00
Co-authored-by: Daniel <daniel.cason@informal.systems>
@melekes melekes added backport-to-v0.37.x backport-to-v0.38.x Tell Mergify to backport the PR to v0.38.x labels May 9, 2024
@melekes melekes added this pull request to the merge queue May 15, 2024
Merged via the queue into cometbft:main with commit 46e2484 May 15, 2024
mergify bot pushed a commit that referenced this pull request May 15, 2024
…sensus (#3003)

Closes #2844

We are seeing that the blockstore loading operations get used in hot
loops within gossip routines, and queryMaj23 routines. This PR reduces
that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU
cache is 9x faster than without (before even adding in DB overheads),
due to the proto unmarshalling saved. We could imagine a setup where we
avoided a lock there entirely. I don't think this is worth right now,
since the new code is 9x faster, and these mostly appear in catchup code
which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:
```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```
NEW:
```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```

It turns out these gossip routines don't need mutative copies, so we
could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod:

![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

The state machine execution time here for context is 92 seconds. So this
is adding up in system load (and GC! The GC load is 52GB, the entire
trace is 200GB, with other parts being optimized down from recent PRs)

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 46e2484)
mergify bot pushed a commit that referenced this pull request May 15, 2024
…sensus (#3003)

Closes #2844

We are seeing that the blockstore loading operations get used in hot
loops within gossip routines, and queryMaj23 routines. This PR reduces
that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU
cache is 9x faster than without (before even adding in DB overheads),
due to the proto unmarshalling saved. We could imagine a setup where we
avoided a lock there entirely. I don't think this is worth right now,
since the new code is 9x faster, and these mostly appear in catchup code
which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:
```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```
NEW:
```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```

It turns out these gossip routines don't need mutative copies, so we
could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod:

![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

The state machine execution time here for context is 92 seconds. So this
is adding up in system load (and GC! The GC load is 52GB, the entire
trace is 200GB, with other parts being optimized down from recent PRs)

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 46e2484)

# Conflicts:
#	.golangci.yml
#	go.mod
#	go.sum
#	store/store.go
#	store/store_test.go
#	types/block.go
mergify bot pushed a commit that referenced this pull request May 15, 2024
…sensus (#3003)

Closes #2844

We are seeing that the blockstore loading operations get used in hot
loops within gossip routines, and queryMaj23 routines. This PR reduces
that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU
cache is 9x faster than without (before even adding in DB overheads),
due to the proto unmarshalling saved. We could imagine a setup where we
avoided a lock there entirely. I don't think this is worth right now,
since the new code is 9x faster, and these mostly appear in catchup code
which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:
```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```
NEW:
```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```

It turns out these gossip routines don't need mutative copies, so we
could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod:

![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

The state machine execution time here for context is 92 seconds. So this
is adding up in system load (and GC! The GC load is 52GB, the entire
trace is 200GB, with other parts being optimized down from recent PRs)

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 46e2484)

# Conflicts:
#	.golangci.yml
#	go.mod
#	go.sum
#	store/store.go
melekes pushed a commit that referenced this pull request May 15, 2024
…sensus (backport #3003) (#3081)

Closes #2844

We are seeing that the blockstore loading operations get used in hot
loops within gossip routines, and queryMaj23 routines. This PR reduces
that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU
cache is 9x faster than without (before even adding in DB overheads),
due to the proto unmarshalling saved. We could imagine a setup where we
avoided a lock there entirely. I don't think this is worth right now,
since the new code is 9x faster, and these mostly appear in catchup code
which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:
```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```
NEW:
```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```

It turns out these gossip routines don't need mutative copies, so we
could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod: 

![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

The state machine execution time here for context is 92 seconds. So this
is adding up in system load (and GC! The GC load is 52GB, the entire
trace is 200GB, with other parts being optimized down from recent PRs)

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3003 done by
[Mergify](https://mergify.com).

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
melekes added a commit that referenced this pull request May 15, 2024
…sensus (backport #3003) (#3083)

Closes #2844

We are seeing that the blockstore loading operations get used in hot
loops within gossip routines, and queryMaj23 routines. This PR reduces
that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU
cache is 9x faster than without (before even adding in DB overheads),
due to the proto unmarshalling saved. We could imagine a setup where we
avoided a lock there entirely. I don't think this is worth right now,
since the new code is 9x faster, and these mostly appear in catchup code
which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:
```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```
NEW:
```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```

It turns out these gossip routines don't need mutative copies, so we
could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod: 

![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

The state machine execution time here for context is 92 seconds. So this
is adding up in system load (and GC! The GC load is 52GB, the entire
trace is 200GB, with other parts being optimized down from recent PRs)

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3003 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
melekes added a commit that referenced this pull request May 15, 2024
…sensus (backport #3003) (#3082)

Closes #2844

We are seeing that the blockstore loading operations get used in hot
loops within gossip routines, and queryMaj23 routines. This PR reduces
that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU
cache is 9x faster than without (before even adding in DB overheads),
due to the proto unmarshalling saved. We could imagine a setup where we
avoided a lock there entirely. I don't think this is worth right now,
since the new code is 9x faster, and these mostly appear in catchup code
which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:
```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```
NEW:
```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```

It turns out these gossip routines don't need mutative copies, so we
could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod: 

![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

The state machine execution time here for context is 92 seconds. So this
is adding up in system load (and GC! The GC load is 52GB, the entire
trace is 200GB, with other parts being optimized down from recent PRs)

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3003 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request May 25, 2024
…sensus (backport cometbft#3003) (cometbft#3083)

Closes cometbft#2844

We are seeing that the blockstore loading operations get used in hot
loops within gossip routines, and queryMaj23 routines. This PR reduces
that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU
cache is 9x faster than without (before even adding in DB overheads),
due to the proto unmarshalling saved. We could imagine a setup where we
avoided a lock there entirely. I don't think this is worth right now,
since the new code is 9x faster, and these mostly appear in catchup code
which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:
```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```
NEW:
```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```

It turns out these gossip routines don't need mutative copies, so we
could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod:

![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

The state machine execution time here for context is 92 seconds. So this
is adding up in system load (and GC! The GC load is 52GB, the entire
trace is 200GB, with other parts being optimized down from recent PRs)

---

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3003 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
mergify bot added a commit to osmosis-labs/cometbft that referenced this pull request May 28, 2024
…sensus (backport cometbft#3003) (cometbft#3083)

Closes cometbft#2844

We are seeing that the blockstore loading operations get used in hot
loops within gossip routines, and queryMaj23 routines. This PR reduces
that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU
cache is 9x faster than without (before even adding in DB overheads),
due to the proto unmarshalling saved. We could imagine a setup where we
avoided a lock there entirely. I don't think this is worth right now,
since the new code is 9x faster, and these mostly appear in catchup code
which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:
```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```
NEW:
```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```

It turns out these gossip routines don't need mutative copies, so we
could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod:

![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

The state machine execution time here for context is 92 seconds. So this
is adding up in system load (and GC! The GC load is 52GB, the entire
trace is 200GB, with other parts being optimized down from recent PRs)

---

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3003 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
(cherry picked from commit 0c10bd5)
PaddyMc added a commit to osmosis-labs/cometbft that referenced this pull request May 28, 2024
…… (backport #76) (#81)

* perf(blockstore): Add LRU caches to blockstore operations used in consensus (backport cometbft#3003) (cometbft#3083)

Closes cometbft#2844

We are seeing that the blockstore loading operations get used in hot
loops within gossip routines, and queryMaj23 routines. This PR reduces
that overhead using an LRU cache.

The LRU cache does have a mutex on every get, but the time with the LRU
cache is 9x faster than without (before even adding in DB overheads),
due to the proto unmarshalling saved. We could imagine a setup where we
avoided a lock there entirely. I don't think this is worth right now,
since the new code is 9x faster, and these mostly appear in catchup code
which should not be highly contended for across peers at the same time.

With the new benchmark I added:
OLD:
```
BenchmarkRepeatedLoadSeenCommit-12         24447             54691 ns/op           46495 B/op        319 allocs/op
```
NEW:
```
BenchmarkRepeatedLoadSeenCommit-12        224131              6401 ns/op            8320 B/op          2 allocs/op
```

It turns out these gossip routines don't need mutative copies, so we
could optimize out the large allocation in the future if we want.

1 hour cpu profile that shows this appearing in prod:

![image](https://github.com/cometbft/cometbft/assets/6440154/5a7e0f02-8385-4c01-aa6a-dba2a2bf376d)

The state machine execution time here for context is 92 seconds. So this
is adding up in system load (and GC! The GC load is 52GB, the entire
trace is 200GB, with other parts being optimized down from recent PRs)

---

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3003 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
(cherry picked from commit 0c10bd5)

* Add Changelog

(cherry picked from commit 4594f29)

# Conflicts:
#	CHANGELOG.md

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Dev Ojha <dojha@berkeley.edu>
Co-authored-by: PaddyMc <paddymchale@hotmail.com>
github-merge-queue bot pushed a commit that referenced this pull request Jun 26, 2024
…p routines (#3342)

We speedup the gossip routines by using caches for LoadBlockPart, and
making LoadBlockCommit no longer call Clone, as the caller does
read-only operations to it.

This is a followup to #3003

This results in a net performance improvement, based on a 2hr sync on
latest osmosis release, of:
- 20% less RAM allocated over program lifetime. (76GB no longer
allocated, 380 GB alloc total)
- 50% speedup to gossipDataRoutine
- ~10% speedup to gossipVotesRoutine
- 1% less overall system CPU usage. (If GC proportional to bytes
allocated, then 5% less)

I have checked that every usage of `LoadBlockPart` and `LoadBlockCommit`
do not modify the return value, so returning this underlying cache copy
is safe for the current codebase

I used an `!` in the title, since this is technically API breaking, even
though all current usages are safe.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
mergify bot pushed a commit that referenced this pull request Jul 10, 2024
…p routines (#3342)

We speedup the gossip routines by using caches for LoadBlockPart, and
making LoadBlockCommit no longer call Clone, as the caller does
read-only operations to it.

This is a followup to #3003

This results in a net performance improvement, based on a 2hr sync on
latest osmosis release, of:
- 20% less RAM allocated over program lifetime. (76GB no longer
allocated, 380 GB alloc total)
- 50% speedup to gossipDataRoutine
- ~10% speedup to gossipVotesRoutine
- 1% less overall system CPU usage. (If GC proportional to bytes
allocated, then 5% less)

I have checked that every usage of `LoadBlockPart` and `LoadBlockCommit`
do not modify the return value, so returning this underlying cache copy
is safe for the current codebase

I used an `!` in the title, since this is technically API breaking, even
though all current usages are safe.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

(cherry picked from commit 2c88a26)
melekes pushed a commit that referenced this pull request Jul 10, 2024
…p routines (backport #3342) (#3482)

We speedup the gossip routines by using caches for LoadBlockPart, and
making LoadBlockCommit no longer call Clone, as the caller does
read-only operations to it.

This is a followup to #3003

This results in a net performance improvement, based on a 2hr sync on
latest osmosis release, of:
- 20% less RAM allocated over program lifetime. (76GB no longer
allocated, 380 GB alloc total)
- 50% speedup to gossipDataRoutine
- ~10% speedup to gossipVotesRoutine
- 1% less overall system CPU usage. (If GC proportional to bytes
allocated, then 5% less)

I have checked that every usage of `LoadBlockPart` and `LoadBlockCommit`
do not modify the return value, so returning this underlying cache copy
is safe for the current codebase

I used an `!` in the title, since this is technically API breaking, even
though all current usages are safe.

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [ ] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3342 done by
[Mergify](https://mergify.com).

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-to-v0.38.x Tell Mergify to backport the PR to v0.38.x

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add lru cache for loading commit from blockstore

3 participants