Skip to content

perf(consensus): Run broadcast routines out of process#3180

Merged
melekes merged 2 commits intomainfrom
dev/make_broadcast_routines_out_of_process
Jun 19, 2024
Merged

perf(consensus): Run broadcast routines out of process#3180
melekes merged 2 commits intomainfrom
dev/make_broadcast_routines_out_of_process

Conversation

@ValarDragon
Copy link
Contributor

@ValarDragon ValarDragon commented Jun 4, 2024

Run broadcast routines out of process. Right now each broadcast routine blocks the consensus mutex for roughly num_peers * process_creation_time, which is genuinely notable! This PR reduces the consensus blocking overhead to just be process_creation_time.

On the latest osmosis branch with improvements, thats 20s of blocking time out of 140s (over the course of 1 hour. This 140s includes block execution!)
image

Note that WAL write time should go significantly down with open PR's. For HasVote, this is a meaningful increase to consensus mutex lock time, so its worth reducing.


PR checklist

  • Tests written/updated - I can't think of any test to add
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments - I don't know of any related docs here
  • Title follows the Conventional Commits spec

@ValarDragon ValarDragon requested a review from a team as a code owner June 4, 2024 13:59
@ValarDragon ValarDragon requested a review from a team June 4, 2024 13:59
Copy link
Collaborator

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ValarDragon ❤️

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 5, 2024
 perf(consensus): Run broadcast routines out of process cometbft#3180
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 5, 2024
…/pr-101

 perf(consensus): Run broadcast routines out of process cometbft#3180  (backport #101)
github-merge-queue bot pushed a commit that referenced this pull request Jun 7, 2024
Remove the return channel and waitgroup from switch.Broadcast as its
unused. This was costing us some CPU overhead (2% of broadcast routine
as it stands right now, but broadcast routine time should be dropping)
and one extra goroutine. This removes more overhead than what #3180 may
introduce.

It also better highlights the point that maybe all broadcast usecases
should consider using TrySend (And we potentially adapt channel
buffers), ala the point in #3151 . (This would have significant
system-wide impact, by significantly lowering the number of
timers/scheduler overhead we have)

---

#### PR checklist

- [x] Tests written/updated - Simplifies code, the behavior being
removed was never tested!
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
@cason
Copy link

cason commented Jun 11, 2024

So we are creating a Go routine to create num_peers Go routines to send a message to each peer? I am really impressed (and somehow suspicious) by the improvement this can provide.

@ValarDragon
Copy link
Contributor Author

Yes! This was surprisingly costly to me as well, but this is what the CPU profiles were saying! Post this change, this fireEvent time is ~entirely removed

@cason
Copy link

cason commented Jun 12, 2024

Can't it be the contention of the for loop in the Switch() method? Because it has to retrieve the list of connected peers, and it is synchronized....

@ValarDragon
Copy link
Contributor Author

ValarDragon commented Jun 14, 2024

No, this CPU profile is directly saying its runtime.NewProc and runtime.newObject, which is not based on mutex contentions

We just broadcast alot.

Copy link

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets go with this.

I think the best solution is to improve the Broadcast implementation, but the benefits of this minimal change seem to be relevant.

@melekes melekes added this pull request to the merge queue Jun 19, 2024
Merged via the queue into main with commit 110817b Jun 19, 2024
@melekes melekes deleted the dev/make_broadcast_routines_out_of_process branch June 19, 2024 14:33
@melekes
Copy link
Collaborator

melekes commented Jul 10, 2024

@mergify backport v1.x

@mergify
Copy link
Contributor

mergify bot commented Jul 10, 2024

backport v1.x

✅ Backports have been created

Details

mergify bot pushed a commit that referenced this pull request Jul 10, 2024
Run broadcast routines out of process. Right now each broadcast routine
blocks the consensus mutex for roughly `num_peers *
process_creation_time`, which is genuinely notable! This PR reduces the
consensus blocking overhead to just be `process_creation_time`.

On the latest osmosis branch with improvements, thats 20s of blocking
time out of 140s (over the course of 1 hour. This 140s includes block
execution!)

![image](https://github.com/cometbft/cometbft/assets/6440154/4c202988-a0d1-460e-89bc-7c1be11fd36f)

Note that WAL write time should go significantly down with open PR's.
For `HasVote`, this is a meaningful increase to consensus mutex lock
time, so its worth reducing.

---

#### PR checklist

- [x] Tests written/updated - I can't think of any test to add
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments - I don't know of any related docs here
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

(cherry picked from commit 110817b)

# Conflicts:
#	.changelog/unreleased/improvements/3180-lower-broadcasts-consensus-overhead.md
#	internal/consensus/reactor.go
mergify bot pushed a commit that referenced this pull request Jul 10, 2024
Remove the return channel and waitgroup from switch.Broadcast as its
unused. This was costing us some CPU overhead (2% of broadcast routine
as it stands right now, but broadcast routine time should be dropping)
and one extra goroutine. This removes more overhead than what #3180 may
introduce.

It also better highlights the point that maybe all broadcast usecases
should consider using TrySend (And we potentially adapt channel
buffers), ala the point in #3151 . (This would have significant
system-wide impact, by significantly lowering the number of
timers/scheduler overhead we have)

---

#### PR checklist

- [x] Tests written/updated - Simplifies code, the behavior being
removed was never tested!
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
(cherry picked from commit e731a3f)

# Conflicts:
#	p2p/switch.go
#	p2p/switch_test.go
melekes added a commit that referenced this pull request Jul 10, 2024
… (#3477)

Run broadcast routines out of process. Right now each broadcast routine
blocks the consensus mutex for roughly `num_peers *
process_creation_time`, which is genuinely notable! This PR reduces the
consensus blocking overhead to just be `process_creation_time`.

On the latest osmosis branch with improvements, thats 20s of blocking
time out of 140s (over the course of 1 hour. This 140s includes block
execution!)

![image](https://github.com/cometbft/cometbft/assets/6440154/4c202988-a0d1-460e-89bc-7c1be11fd36f)

Note that WAL write time should go significantly down with open PR's.
For `HasVote`, this is a meaningful increase to consensus mutex lock
time, so its worth reducing.

---

#### PR checklist

- [x] Tests written/updated - I can't think of any test to add
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments - I don't know of any related docs here
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3180 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
melekes added a commit that referenced this pull request Jul 10, 2024
Remove the return channel and waitgroup from switch.Broadcast as its
unused. This was costing us some CPU overhead (2% of broadcast routine
as it stands right now, but broadcast routine time should be dropping)
and one extra goroutine. This removes more overhead than what #3180 may
introduce.

It also better highlights the point that maybe all broadcast usecases
should consider using TrySend (And we potentially adapt channel
buffers), ala the point in #3151 . (This would have significant
system-wide impact, by significantly lowering the number of
timers/scheduler overhead we have)

---

#### PR checklist

- [x] Tests written/updated - Simplifies code, the behavior being
removed was never tested!
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3182 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
…ometbft#3480)

Remove the return channel and waitgroup from switch.Broadcast as its
unused. This was costing us some CPU overhead (2% of broadcast routine
as it stands right now, but broadcast routine time should be dropping)
and one extra goroutine. This removes more overhead than what cometbft#3180 may
introduce.

It also better highlights the point that maybe all broadcast usecases
should consider using TrySend (And we potentially adapt channel
buffers), ala the point in cometbft#3151 . (This would have significant
system-wide impact, by significantly lowering the number of
timers/scheduler overhead we have)

---

- [x] Tests written/updated - Simplifies code, the behavior being
removed was never tested!
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3182 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
…ometbft#3480) (#134)

Remove the return channel and waitgroup from switch.Broadcast as its
unused. This was costing us some CPU overhead (2% of broadcast routine
as it stands right now, but broadcast routine time should be dropping)
and one extra goroutine. This removes more overhead than what cometbft#3180 may
introduce.

It also better highlights the point that maybe all broadcast usecases
should consider using TrySend (And we potentially adapt channel
buffers), ala the point in cometbft#3151 . (This would have significant
system-wide impact, by significantly lowering the number of
timers/scheduler overhead we have)

---

- [x] Tests written/updated - Simplifies code, the behavior being
removed was never tested!
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3182 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
…tbft#3180) (cometbft#3477)

Run broadcast routines out of process. Right now each broadcast routine
blocks the consensus mutex for roughly `num_peers *
process_creation_time`, which is genuinely notable! This PR reduces the
consensus blocking overhead to just be `process_creation_time`.

On the latest osmosis branch with improvements, thats 20s of blocking
time out of 140s (over the course of 1 hour. This 140s includes block
execution!)

![image](https://github.com/cometbft/cometbft/assets/6440154/4c202988-a0d1-460e-89bc-7c1be11fd36f)

Note that WAL write time should go significantly down with open PR's.
For `HasVote`, this is a meaningful increase to consensus mutex lock
time, so its worth reducing.

---

#### PR checklist

- [x] Tests written/updated - I can't think of any test to add
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments - I don't know of any related docs here
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3180 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
…tbft#318… (#135)

* perf(consensus): Run broadcast routines out of process (backport cometbft#3180) (cometbft#3477)

Run broadcast routines out of process. Right now each broadcast routine
blocks the consensus mutex for roughly `num_peers *
process_creation_time`, which is genuinely notable! This PR reduces the
consensus blocking overhead to just be `process_creation_time`.

On the latest osmosis branch with improvements, thats 20s of blocking
time out of 140s (over the course of 1 hour. This 140s includes block
execution!)

![image](https://github.com/cometbft/cometbft/assets/6440154/4c202988-a0d1-460e-89bc-7c1be11fd36f)

Note that WAL write time should go significantly down with open PR's.
For `HasVote`, this is a meaningful increase to consensus mutex lock
time, so its worth reducing.

---

#### PR checklist

- [x] Tests written/updated - I can't think of any test to add
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments - I don't know of any related docs here
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3180 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>

* add back  has vote message broadcast

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
mattac21 pushed a commit that referenced this pull request Sep 9, 2025
Remove the return channel and waitgroup from switch.Broadcast as its
unused. This was costing us some CPU overhead (2% of broadcast routine
as it stands right now, but broadcast routine time should be dropping)
and one extra goroutine. This removes more overhead than what #3180 may
introduce.

It also better highlights the point that maybe all broadcast usecases
should consider using TrySend (And we potentially adapt channel
buffers), ala the point in #3151 . (This would have significant
system-wide impact, by significantly lowering the number of
timers/scheduler overhead we have)

---

#### PR checklist

- [x] Tests written/updated - Simplifies code, the behavior being
removed was never tested!
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
mattac21 pushed a commit that referenced this pull request Sep 9, 2025
Run broadcast routines out of process. Right now each broadcast routine
blocks the consensus mutex for roughly `num_peers *
process_creation_time`, which is genuinely notable! This PR reduces the
consensus blocking overhead to just be `process_creation_time`.

On the latest osmosis branch with improvements, thats 20s of blocking
time out of 140s (over the course of 1 hour. This 140s includes block
execution!)

![image](https://github.com/cometbft/cometbft/assets/6440154/4c202988-a0d1-460e-89bc-7c1be11fd36f)

Note that WAL write time should go significantly down with open PR's.
For `HasVote`, this is a meaningful increase to consensus mutex lock
time, so its worth reducing.

---

- [x] Tests written/updated - I can't think of any test to add
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments - I don't know of any related docs here
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
mattac21 pushed a commit that referenced this pull request Sep 10, 2025
Remove the return channel and waitgroup from switch.Broadcast as its
unused. This was costing us some CPU overhead (2% of broadcast routine
as it stands right now, but broadcast routine time should be dropping)
and one extra goroutine. This removes more overhead than what #3180 may
introduce.

It also better highlights the point that maybe all broadcast usecases
should consider using TrySend (And we potentially adapt channel
buffers), ala the point in #3151 . (This would have significant
system-wide impact, by significantly lowering the number of
timers/scheduler overhead we have)

---

#### PR checklist

- [x] Tests written/updated - Simplifies code, the behavior being
removed was never tested!
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
mattac21 pushed a commit that referenced this pull request Sep 10, 2025
Run broadcast routines out of process. Right now each broadcast routine
blocks the consensus mutex for roughly `num_peers *
process_creation_time`, which is genuinely notable! This PR reduces the
consensus blocking overhead to just be `process_creation_time`.

On the latest osmosis branch with improvements, thats 20s of blocking
time out of 140s (over the course of 1 hour. This 140s includes block
execution!)

![image](https://github.com/cometbft/cometbft/assets/6440154/4c202988-a0d1-460e-89bc-7c1be11fd36f)

Note that WAL write time should go significantly down with open PR's.
For `HasVote`, this is a meaningful increase to consensus mutex lock
time, so its worth reducing.

---

- [x] Tests written/updated - I can't think of any test to add
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [x] Updated relevant documentation (`docs/` or `spec/`) and code
comments - I don't know of any related docs here
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants