
perf(p2p/conn): Buffer secret connection writes #3346

Merged
cason merged 11 commits into main from dev/buffer_secretconn_writes on Jul 3, 2024

Conversation

@ValarDragon (Contributor) commented Jun 26, 2024

Component of #3198: this PR buffers secret connection writes. A secret connection often receives a single large write (say, 64 KiB, i.e. 65,536 bytes) and splits it into 64 frames of 1024 bytes. Previously it issued a separate write syscall for each frame before moving on to the next one. This PR buffers those frame writes instead and issues a single write syscall at the end.
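
A minimal Go sketch of the idea, under simplifying assumptions: sealing (encryption) is left out, countingWriter stands in for the TCP connection so that each Write call models one write syscall, and the names frameDataSize, writeFrames and bufferedConn are hypothetical, not CometBFT's actual types. Writing frames straight to the connection costs one call per frame; routing them through a bufio.Writer that is flushed once per call collapses them into a single underlying write.

```go
package main

import (
	"bufio"
	"io"
)

// frameDataSize is the per-frame payload size discussed in this PR (1 KiB).
const frameDataSize = 1024

// countingWriter stands in for the TCP connection: each Write call models
// one write syscall, so counting calls counts syscalls.
type countingWriter struct {
	calls int
	bytes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.bytes += len(p)
	return len(p), nil
}

// writeFrames splits data into frameDataSize chunks and writes each chunk
// to w separately. A real secret connection would seal (encrypt) each
// frame first; that step is omitted here.
func writeFrames(w io.Writer, data []byte) error {
	for len(data) > 0 {
		n := len(data)
		if n > frameDataSize {
			n = frameDataSize
		}
		if _, err := w.Write(data[:n]); err != nil {
			return err
		}
		data = data[n:]
	}
	return nil
}

// bufferedConn (hypothetical) routes frame writes through a bufio.Writer
// and flushes once at the end of each Write call.
type bufferedConn struct {
	bw *bufio.Writer
}

func newBufferedConn(conn io.Writer, bufSize int) *bufferedConn {
	return &bufferedConn{bw: bufio.NewWriterSize(conn, bufSize)}
}

func (b *bufferedConn) Write(data []byte) (int, error) {
	if err := writeFrames(b.bw, data); err != nil {
		return 0, err
	}
	// Push everything buffered to the connection before returning, so the
	// data has been handed to the connection when Write returns.
	if err := b.bw.Flush(); err != nil {
		return 0, err
	}
	return len(data), nil
}
```

In this model, a 64 KiB write through writeFrames makes 64 calls on the connection, while the same write through a bufferedConn with a 64 KiB buffer makes one. The buffer size is a tuning knob: a write larger than the buffer still makes bufio.Writer flush part-way through.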

I'll follow up with benchmarks from mainnet. The baseline is netconn.Write taking 33% of the time in sendRoutine.
[Two profiling screenshots]

We should do something similar for reads, but because the evil_secret_connection test makes non-black-box use of the connection (it relies on its internals), that will require notable test refactors.
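
For reads, a comparable Go sketch wraps the connection in a bufio.Reader, so that reading each fixed-size frame with io.ReadFull is mostly served from memory instead of costing its own read syscall. Assumptions: a sealed frame is taken to be 1044 bytes (1024 data bytes plus an assumed 20 bytes of length prefix and authentication tag), decryption is omitted, and countingReader, newBufferedReader and readFrames are hypothetical names.

```go
package main

import (
	"bufio"
	"io"
)

// sealedFrameSize is an assumed on-the-wire frame size for this sketch.
const sealedFrameSize = 1044

// countingReader stands in for the TCP connection: it serves bytes from
// data and counts Read calls, each modelling one read syscall.
type countingReader struct {
	data  []byte
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	if len(c.data) == 0 {
		return 0, io.EOF
	}
	n := copy(p, c.data)
	c.data = c.data[n:]
	return n, nil
}

// newBufferedReader wraps conn so that frame-sized reads are served from a
// 64 KiB in-memory buffer that is refilled with large reads.
func newBufferedReader(conn io.Reader) *bufio.Reader {
	return bufio.NewReaderSize(conn, 64*1024)
}

// readFrames reads count fixed-size frames from r with io.ReadFull and
// returns the number of bytes consumed. Opening (decrypting) each frame
// is omitted.
func readFrames(r io.Reader, count int) (int, error) {
	frame := make([]byte, sealedFrameSize)
	total := 0
	for i := 0; i < count; i++ {
		n, err := io.ReadFull(r, frame)
		total += n
		if err != nil {
			return total, err
		}
	}
	return total, nil
}
```

In this model, reading 64 frames directly from the connection costs 64 reads, while reading them through newBufferedReader costs only a couple of large reads. The sketch sidesteps the test-refactoring concern, since it has no equivalent of evil_secret_connection.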


PR checklist

  • [ ] Tests written/updated
  • [x] Changelog entry added in .changelog (we use unclog to manage our changelog)
  • [ ] Updated relevant documentation (docs/ or spec/) and code comments
  • [x] Title follows the Conventional Commits spec

@ValarDragon ValarDragon requested a review from a team as a code owner June 26, 2024 15:24
@ValarDragon ValarDragon requested a review from a team June 26, 2024 15:24
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 26, 2024
…115)

* Buffer secret connection writes

* Add changelog

* Add changelog v2
@zmanian commented Jun 26, 2024

I think at the very least we should also make a breaking change to increase the frame size from 1 KiB.

If we aren't going to move to adaptive frames...
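
To make the frame-size trade-off concrete, a small Go sketch counts frames (one AEAD seal call each) and wire bytes for a payload at a given frame data size. Assumptions, to be checked against p2p/conn/secret_connection.go: each frame carries a 4-byte length prefix and a 16-byte ChaCha20-Poly1305 authentication tag, and every frame is padded to its full size. frameCost is a hypothetical helper, and 16 KiB is just an example value.

```go
package main

// Assumed per-frame overhead for this sketch.
const (
	lenPrefixSize = 4  // bytes used to encode the payload length
	aeadTagSize   = 16 // ChaCha20-Poly1305 authentication tag
)

// frameCost returns how many frames (and therefore seal calls) a payload
// of the given size needs at a given frame data size, and the total bytes
// sent on the wire, assuming every frame is padded to full size.
func frameCost(payload, frameDataSize int) (frames, wireBytes int) {
	frames = (payload + frameDataSize - 1) / frameDataSize // ceiling division
	wireBytes = frames * (frameDataSize + lenPrefixSize + aeadTagSize)
	return frames, wireBytes
}
```

Under these assumptions, a 64 KiB write needs 64 seal calls and 66,816 wire bytes with 1 KiB frames, versus 4 seal calls and 65,616 bytes with 16 KiB frames. Per-byte encryption work stays roughly the same; what shrinks is the fixed per-frame cost. The flip side is that small messages padded to a larger fixed frame waste more bytes, which is one argument for adaptive frames.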

@melekes (Collaborator) left a comment

Thanks @ValarDragon ❤️

@melekes melekes added the p2p label Jun 27, 2024
@cason left a comment

I'd ask that we keep this PR simple: just add the connWriter to buffer writes, which is good in general.

@ValarDragon (Contributor, Author)

> I think at very least we should also make a breaking change to increase the frame size from 1k.
> If we aren't going to adaptive frames....

Agreed! I'm happy to increase the frame size in separate PRs, so that at least that change can ship in the next coordinated release if putting in another secret transport layer doesn't work out. The amount of chacha20poly1305 overhead really surprised me, though; I hope it shrinks proportionately with larger frames.

@cason left a comment

Good.

Can we really measure the improvements that this provides?

@cason cason changed the title perf(p2p/secretconn): Buffer secret connection writes perf(p2p/conn): Buffer secret connection writes Jun 28, 2024
…on-writes.md

Co-authored-by: Daniel <daniel.cason@informal.systems>
@ValarDragon (Contributor, Author)

Yes, will get it! Sorry for the delay; two silly issues got in the way. (The snapshot service was down on my first attempt, and on my second profiling run I didn't download the result in time, so it was deleted under the retention policy. Re-running the mainnet benchmarks now to get the ratio here.)

@ValarDragon (Contributor, Author) commented Jul 1, 2024

This successfully eliminates the overhead coming from writing packets, but not from flushes. That makes sense: the flush path handles the case where we can't fill even one frame anyway, because the flush size is currently parameterized to equal the frame size. I don't know whether that is coincidence or design; the two are independently variable parameters.

We should expect the ratio of seal time to file.Write time in the sendPacketMsg call (the non-flush-throttle case) to indicate what we are speeding up. On this latest benchmark, the ratio is 2.5 (seal) : 1.5 (file write). Normalized, 1 s of sealing needed 0.6 s of net conn write.

Originally, this case was 1 s of sealing needing 1.73 s of net conn write, so this is an almost 3x speedup for this part of the bottleneck! (And now the unneeded buffer .Put and .Get calls are significant enough to be worth optimizing out.)
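
The arithmetic behind the "almost 3x" figure, using only the profile numbers quoted in this comment (connection-write time per unit of sealing time), and assuming the sealing cost itself did not change between the two runs:

```math
\frac{t_{\text{write}}}{t_{\text{seal}}}\Big|_{\text{before}} = 1.73,
\qquad
\frac{t_{\text{write}}}{t_{\text{seal}}}\Big|_{\text{after}} = \frac{1.5}{2.5} = 0.6,
\qquad
\frac{1.73}{0.6} \approx 2.9
```

Counting sealing as well, the combined seal-plus-write time on this path would drop from 2.73 to 1.6 units per unit of sealing, roughly a 1.7x improvement (2.73 / 1.6 ≈ 1.71).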

@cason left a comment

Minor suggestions. Let's merge this.

@cason cason added this pull request to the merge queue Jul 3, 2024
Merged via the queue into main with commit 8422f57 Jul 3, 2024
@cason cason deleted the dev/buffer_secretconn_writes branch July 3, 2024 22:36
itsdevbear pushed a commit to berachain/cometbft that referenced this pull request Jul 4, 2024
…ometbft#115)

* Buffer secret connection writes

* Add changelog

* Add changelog v2
github-merge-queue bot pushed a commit that referenced this pull request Jul 5, 2024
Closes #3198 

Similar to #3346, this buffers the secret connection reads. This is a
notable CPU-time saving (25% of `recvRoutine` time on Osmosis'
version).

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
@melekes (Collaborator) commented Jul 10, 2024

@mergify backport v1.x

@mergify bot (Contributor) commented Jul 10, 2024

backport v1.x

✅ Backports have been created


mergify bot pushed a commit that referenced this pull request Jul 10, 2024
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Daniel <daniel.cason@informal.systems>
(cherry picked from commit 8422f57)
melekes pushed a commit that referenced this pull request Jul 10, 2024
This is an automatic backport of pull request #3346 done by
[Mergify](https://mergify.com).

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
mergify bot pushed a commit that referenced this pull request Jul 10, 2024
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
(cherry picked from commit e1eabe0)

# Conflicts:
#	p2p/conn/evil_secret_connection_test.go
melekes added a commit that referenced this pull request Jul 10, 2024
…3419) (#3489)

This is an automatic backport of pull request #3419 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
…) (cometbft#3485)

This is an automatic backport of pull request cometbft#3346 done by
[Mergify](https://mergify.com).

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
mattac21 pushed a commit that referenced this pull request Sep 9, 2025
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Daniel <daniel.cason@informal.systems>
mattac21 pushed a commit that referenced this pull request Sep 9, 2025
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
mattac21 pushed a commit that referenced this pull request Sep 10, 2025
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Co-authored-by: Daniel <daniel.cason@informal.systems>
mattac21 pushed a commit that referenced this pull request Sep 10, 2025
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>