perf(consensus): Use TrySend for hasVote/HasBlockPart messages#3407
perf(consensus): Use TrySend for hasVote/HasBlockPart messages#3407
Conversation
|
Going to test this in osmosis prod to see the scheduler improvement, mostly because I'm curious if it reflects my understanding. The test I added tests TryBroadcast to same extent Broadcast is tested within switch. (I didn't see any other test) |
|
I fixed the old Go Benchmark (It hasn't been working for months). So a significant CPU cost reduction! |
|
This appears to have been an empiric 20% reduction in scheduler time! (At consensus state and mempool times being roughly equal) And a 5-10% GC time reduction! |
|
I will take a detailed look maybe tomorrow, but conceptually I agree, in particular because missing those two messages is not a problem: they are optimizations, preventing duplication, they are not core messages of the protocol. |
cason
left a comment
There was a problem hiding this comment.
Great!
Please fix the changelog before committing.
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
| use TrySend instead of Send. This saves notable amounts of performance, | ||
| as TrySend just drops sending when the channel is full, instead of creating | ||
| a timer / new channel. | ||
| ([\#3342](https://github.com/cometbft/cometbft/issues/3342)) |
There was a problem hiding this comment.
Is this link right? #3342 is something else.
…-use-trysend.md Co-authored-by: Daniel <daniel.cason@informal.systems>
…-use-trysend.md Co-authored-by: Daniel <daniel.cason@informal.systems>
Co-authored-by: Daniel <daniel.cason@informal.systems>
…-use-trysend.md Co-authored-by: Daniel <daniel.cason@informal.systems>
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
…-use-trysend.md Co-authored-by: Daniel <daniel.cason@informal.systems>
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
.changelog/unreleased/improvements/3407-make-hasvote-broadcast-use-trysend.md
Outdated
Show resolved
Hide resolved
|
|
I'm at conferences this week, I can work on debugging on weekend! Sorry for delay on replies |
|
This error is odd, are you sure it was introduced by this PR? |
|
I mean, |
not sure |
Closes #3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> (cherry picked from commit 55493e0) # Conflicts: # internal/consensus/reactor.go # p2p/switch.go # p2p/switch_test.go
…ort #3407) (#3466) Closes #3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3407 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…ort cometbft#3407) (cometbft#3466) Closes cometbft#3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3407 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Closes #3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Closes #3151 The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the `NewTimer` call in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following. We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is `CheckTimers` in the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce total `mallocgc` time here by 5% by removing NewTimer, which should hopefully reduce GC load as well. --- - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Daniel <daniel.cason@informal.systems> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Closes #3151
The broadcast routine in its entirely is the source of ~75% of all created channels in the system, all coming from the
NewTimercall in Send. (and 75% of all NewTimer) Furthermore, the NewTimer call from Broadcast is 2% of all CPU time across the system. However I think this is actually more impact, due to the following.We spend 11% of all CPU time in the scheduler, and 4% of all CPU time is
CheckTimersin the scheduler logic. I'd expect reducing most of the Timer calls reduces this. We also reduce totalmallocgctime here by 5% by removing NewTimer, which should hopefully reduce GC load as well.PR checklist
.changelog(we use unclog to manage our changelog)docs/orspec/) and code comments