perf(consensus): Make reactor check for duplicate/old block parts#3161
perf(consensus): Make reactor check for duplicate/old block parts#3161ValarDragon wants to merge 6 commits intocometbft:mainfrom
Conversation
| // once we have the full block. | ||
| func (cs *State) addProposalBlockPart(msg *BlockPartMessage, peerID p2p.ID) (added bool, err error) { | ||
| height, round, part := msg.Height, msg.Round, msg.Part | ||
| // TODO: better handle block parts for future heights, by saving them and processing them later. |
There was a problem hiding this comment.
do you mind opening an issue for this?
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
* Backport cometbft#3211 * Fix Race * bp cometbft#3157 * Speedup tests that were hitting timeouts * bp cometbft#3161 * Fix data race * Mempool async processing * Forgot to commit important part * Add changelog
* Backport cometbft#3211 * Fix Race * bp cometbft#3157 * Speedup tests that were hitting timeouts * bp cometbft#3161 * Fix data race * Mempool async processing * Forgot to commit important part * Add changelog
cason
left a comment
There was a problem hiding this comment.
We could change our mind regarding the Block we are looking for at any given instant.
We usually initialize cs.ProposalBlockParts upon receiving the first valid Proposal message for that round. But we might reset it upon receiving 2/3+ Prevotes or Precommits for a different block.
As a result, we might drop a block part that matches the new Block we are looking for, because we didn't change the block we are looking for because we are still processing votes that might made us to change our mind.
|
@cason are you saying that because we can change our minds about I'm in favor of this PR because, in most cases, we don't change our minds, and preventing duplicate parts from entering WAL as a perf optimization makes sense to me. |
So, first, I don't think we should write block parts to the WAL at all. Except if we are using the full block. Namely, we either write the full block, or nothing. The best way to achieve this is to move block propagation outside of the consensus logic, which is the right direction to go. At the moment, what can happen, and we might even write a test unit for that is:
So, if I receive a part of block |
@ValarDragon did you roll this out in production? Are you observing any slowness or blocks failing to commit as a consequence of this PR? |
|
This should be done as part of this #2127 |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
cc @ValarDragon |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
|
This has been in prod on Osmosis for ~2-3 months, no issues observed :) |
* bp cometbft#3161 * Fix data race * Fix bug due to there being a consistency check buried in Verify * Fix accidental rsMtx revert
|
We are probably not seen issues derived from this optimization in stable systems. This happens in a real corner case when we change the block we are looking at after receiving prevotes for that block, that does not match the proposed block. Which essentially requires Byzantine agents. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Closes #3158
I made a common method between the reactor and consensus. In the future may add another argument to indicate if it should check the block part, but decided to not try that until I got vote signature checking into the reactor, as thats the bottleneck for Osmosis right now. (I suspect WAL is dominated from here though, will be straightforward to check once this is merged / in a patch release that I can A/B test)
The consensus state.go logic should be unchanged.
PR checklist
.changelog(we use unclog to manage our changelog)docs/orspec/) and code comments