-
Notifications
You must be signed in to change notification settings - Fork 780
Description
Feature Request
Summary
Whenever we want to send a packet, we spend a considerable amount of time in CPU overheads (that are actually under the flowrate limitation, and thereby hurting throughput). When its time to send, we have to spend awhile:
- re-encoding data
- doing proto marshalling
- allocating buffers
This time is all under flow-rate limitation and a single threaded bottleneck for communicating to this peer (as far as I understand)
Time delays
If we look at the time here, there is a few parts:
- We are spending quite sizable serial time "triple-wrapping" and proto marshalling each packet.
- Making a new stack object here: https://github.com/cometbft/cometbft/blob/main/p2p/conn/connection.go#L836
- Putting it on heap here (the
&call does this): https://github.com/cometbft/cometbft/blob/main/p2p/conn/connection.go#L854 - Making two new wrappers here, both get heap allocated due to
&indirections: https://github.com/cometbft/cometbft/blob/main/p2p/conn/connection.go#L918-L922 - We then proto marshal this triple wrapper in the proto writer.
- Proto writer is supposed to use an internal buffer. We make a new object and then must re-allocate the internal buffer on every packet.
- New writer per packet: https://github.com/cometbft/cometbft/blob/main/p2p/conn/connection.go#L854
- The buffer we re-allocate every single time: https://github.com/cometbft/cometbft/blob/main/internal/protoio/writer.go#L59
- A bit heavy handed interface matching costs here, but I think this should be re-measured after other things are improved.
Re: The triple wrapping problem. It seems this is something we can backwards compatibly speedup with some "hand rolled" mimicking the proto-encoding tricks, or having less heap allocations. Neither feels elegant right now though. I hope other people have better solutions. Longer term, ideally we just don't triple wrap.
For the new internal buffer per packet, ideally we can just make the channel use the already allocated protowriter that exists in sendRoutine?
There are other flow rate problems that aren't part of this, but I leave that to another github issue.
