perf(consensus): Run broadcast routines out of process#3180
Conversation
perf(consensus): Run broadcast routines out of process cometbft#3180
…/pr-101 perf(consensus): Run broadcast routines out of process cometbft#3180 (backport #101)
Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what #3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in #3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- #### PR checklist - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
|
So we are creating a Go routine to create |
|
Yes! This was surprisingly costly to me as well, but this is what the CPU profiles were saying! Post this change, this fireEvent time is ~entirely removed |
|
Can't it be the contention of the for loop in the |
|
No, this CPU profile is directly saying its runtime.NewProc and runtime.newObject, which is not based on mutex contentions We just broadcast alot. |
cason
left a comment
There was a problem hiding this comment.
Lets go with this.
I think the best solution is to improve the Broadcast implementation, but the benefits of this minimal change seem to be relevant.
|
@mergify backport v1.x |
✅ Backports have been createdDetails
|
Run broadcast routines out of process. Right now each broadcast routine blocks the consensus mutex for roughly `num_peers * process_creation_time`, which is genuinely notable! This PR reduces the consensus blocking overhead to just be `process_creation_time`. On the latest osmosis branch with improvements, thats 20s of blocking time out of 140s (over the course of 1 hour. This 140s includes block execution!)  Note that WAL write time should go significantly down with open PR's. For `HasVote`, this is a meaningful increase to consensus mutex lock time, so its worth reducing. --- #### PR checklist - [x] Tests written/updated - I can't think of any test to add - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - I don't know of any related docs here - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec (cherry picked from commit 110817b) # Conflicts: # .changelog/unreleased/improvements/3180-lower-broadcasts-consensus-overhead.md # internal/consensus/reactor.go
Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what #3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in #3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- #### PR checklist - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> (cherry picked from commit e731a3f) # Conflicts: # p2p/switch.go # p2p/switch_test.go
… (#3477) Run broadcast routines out of process. Right now each broadcast routine blocks the consensus mutex for roughly `num_peers * process_creation_time`, which is genuinely notable! This PR reduces the consensus blocking overhead to just be `process_creation_time`. On the latest osmosis branch with improvements, thats 20s of blocking time out of 140s (over the course of 1 hour. This 140s includes block execution!)  Note that WAL write time should go significantly down with open PR's. For `HasVote`, this is a meaningful increase to consensus mutex lock time, so its worth reducing. --- #### PR checklist - [x] Tests written/updated - I can't think of any test to add - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - I don't know of any related docs here - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3180 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what #3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in #3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- #### PR checklist - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request #3182 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…ometbft#3480) Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what cometbft#3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in cometbft#3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3182 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…ometbft#3480) (#134) Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what cometbft#3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in cometbft#3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3182 done by [Mergify](https://mergify.com). --------- Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…tbft#3180) (cometbft#3477) Run broadcast routines out of process. Right now each broadcast routine blocks the consensus mutex for roughly `num_peers * process_creation_time`, which is genuinely notable! This PR reduces the consensus blocking overhead to just be `process_creation_time`. On the latest osmosis branch with improvements, thats 20s of blocking time out of 140s (over the course of 1 hour. This 140s includes block execution!)  Note that WAL write time should go significantly down with open PR's. For `HasVote`, this is a meaningful increase to consensus mutex lock time, so its worth reducing. --- #### PR checklist - [x] Tests written/updated - I can't think of any test to add - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - I don't know of any related docs here - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3180 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…tbft#318… (#135) * perf(consensus): Run broadcast routines out of process (backport cometbft#3180) (cometbft#3477) Run broadcast routines out of process. Right now each broadcast routine blocks the consensus mutex for roughly `num_peers * process_creation_time`, which is genuinely notable! This PR reduces the consensus blocking overhead to just be `process_creation_time`. On the latest osmosis branch with improvements, thats 20s of blocking time out of 140s (over the course of 1 hour. This 140s includes block execution!)  Note that WAL write time should go significantly down with open PR's. For `HasVote`, this is a meaningful increase to consensus mutex lock time, so its worth reducing. --- #### PR checklist - [x] Tests written/updated - I can't think of any test to add - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - I don't know of any related docs here - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request cometbft#3180 done by [Mergify](https://mergify.com). --------- Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> * add back has vote message broadcast --------- Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what #3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in #3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- #### PR checklist - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Run broadcast routines out of process. Right now each broadcast routine blocks the consensus mutex for roughly `num_peers * process_creation_time`, which is genuinely notable! This PR reduces the consensus blocking overhead to just be `process_creation_time`. On the latest osmosis branch with improvements, thats 20s of blocking time out of 140s (over the course of 1 hour. This 140s includes block execution!)  Note that WAL write time should go significantly down with open PR's. For `HasVote`, this is a meaningful increase to consensus mutex lock time, so its worth reducing. --- - [x] Tests written/updated - I can't think of any test to add - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - I don't know of any related docs here - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
Remove the return channel and waitgroup from switch.Broadcast as its unused. This was costing us some CPU overhead (2% of broadcast routine as it stands right now, but broadcast routine time should be dropping) and one extra goroutine. This removes more overhead than what #3180 may introduce. It also better highlights the point that maybe all broadcast usecases should consider using TrySend (And we potentially adapt channel buffers), ala the point in #3151 . (This would have significant system-wide impact, by significantly lowering the number of timers/scheduler overhead we have) --- #### PR checklist - [x] Tests written/updated - Simplifies code, the behavior being removed was never tested! - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Run broadcast routines out of process. Right now each broadcast routine blocks the consensus mutex for roughly `num_peers * process_creation_time`, which is genuinely notable! This PR reduces the consensus blocking overhead to just be `process_creation_time`. On the latest osmosis branch with improvements, thats 20s of blocking time out of 140s (over the course of 1 hour. This 140s includes block execution!)  Note that WAL write time should go significantly down with open PR's. For `HasVote`, this is a meaningful increase to consensus mutex lock time, so its worth reducing. --- - [x] Tests written/updated - I can't think of any test to add - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - I don't know of any related docs here - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
Run broadcast routines out of process. Right now each broadcast routine blocks the consensus mutex for roughly
num_peers * process_creation_time, which is genuinely notable! This PR reduces the consensus blocking overhead to just beprocess_creation_time.On the latest osmosis branch with improvements, thats 20s of blocking time out of 140s (over the course of 1 hour. This 140s includes block execution!)

Note that WAL write time should go significantly down with open PR's. For
HasVote, this is a meaningful increase to consensus mutex lock time, so its worth reducing.PR checklist
.changelog(we use unclog to manage our changelog)docs/orspec/) and code comments - I don't know of any related docs here