perf(consensus): Make late votes outside of last block commits not get to peerMsgQueue#3157
perf(consensus): Make late votes outside of last block commits not get to peerMsgQueue#3157ValarDragon wants to merge 8 commits intomainfrom
Conversation
cason
left a comment
There was a problem hiding this comment.
We should not spend much resources on messages (any type) which are old and useless for the consensus protocol. So, the goal here is perfectly legit.
But adding the solution to the consensus reactor is not really the best approach. In fact, the consensus reactor should not take the consensus state lock for every received vote, which is super inefficient in any case. While this PR hasn't introduced this behavior, if we really want to decouple reactor and state, this solution goes towards the wrong direction.
I would instead add this check to the state implementation. The goal is anyway to prevent writing this useless vote to the WAL and keeping the consensus lock for longer than needed.
.changelog/unreleased/improvements/3157-late-votes-dont-take-cs-mtx.md
Outdated
Show resolved
Hide resolved
My thinking right now is that we can get the lock removed with #3211 , and then have this able to proceed concurrently (not under the single threaded state.go workload). So we would rather it happen in the reactor |
* Backport cometbft#3211 * Fix Race * bp cometbft#3157 * Speedup tests that were hitting timeouts * bp cometbft#3161 * Fix data race * Mempool async processing * Forgot to commit important part * Add changelog
* Backport cometbft#3211 * Fix Race * bp cometbft#3157 * Speedup tests that were hitting timeouts * bp cometbft#3161 * Fix data race * Mempool async processing * Forgot to commit important part * Add changelog
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
cason
left a comment
There was a problem hiding this comment.
While I don't really like this solution, I do agree it can have good consequences in terms of performance.
My suggestions would render this code as similar as possible as the code in addVote() method of state.go.
internal/consensus/reactor.go
Outdated
| ps.SetHasVoteFromPeer(msg.Vote, height, valSize, lastCommitSize) | ||
| ps.SetHasVoteFromPeer(msg.Vote, csHeight, valSize, lastCommitSize) | ||
|
|
||
| // if vote is late, and is not a precommit for the last block, mark it late and return. |
There was a problem hiding this comment.
I would here, copy the logic from state.go:
if vote.Height < cs.Height || (vote.Height == cs.Height && vote.Round < cs.Round) {
}
| // if vote is late, and is not a precommit for the last block, mark it late and return. | ||
| isLate := msg.Vote.Height < csHeight || (msg.Vote.Height == csHeight && msg.Vote.Round < csRound) | ||
| if isLate { | ||
| notLastBlockPrecommit := msg.Vote.Type != types.PrecommitType || msg.Vote.Height != csHeight-1 |
There was a problem hiding this comment.
Again, the logic from state.go:
vote.Height+1 == cs.Height && vote.Type == types.PrecommitType
Then in the if you negate this condition.
.changelog/unreleased/improvements/3157-late-votes-dont-take-cs-mtx.md
Outdated
Show resolved
Hide resolved
.changelog/unreleased/improvements/3157-late-votes-dont-take-cs-mtx.md
Outdated
Show resolved
Hide resolved
.changelog/unreleased/improvements/3157-late-votes-dont-take-cs-mtx.md
Outdated
Show resolved
Hide resolved
cason
left a comment
There was a problem hiding this comment.
We need to check a corner case, though.
If a vote is from the current height but from a previous round, we might still need it. We can use it for deciding a block, if its a Precommit, and for accepting a new Proposal, if it is a Prevote for the PolRound.
See test units TestState_PrevotePOLFromPreviousRound and TestCommitFromPreviousRound, for instance.
Co-authored-by: Daniel <daniel.cason@informal.systems>
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@cason could you check this? Then, if we need these votes, we can close this PR. If not, merge this PR. |
We would have to make a copy of those tests that run with a reactor, instead of manually providing messages to the consensus implementation. This is not so trivial, unfortunately. But in general I am pretty convinced that we need in some cases messages from previous rounds, these tests are just some examples. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
|
Should we table this until theres time to investigate the corner case? |
|
We cannot discard votes from previous rounds because the algorithm may require those votes to progress. The problem here is that our test units test consensus If you manage to adapt the two mentioned tests to run with a consensus |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
|
Do we mind closing this PR? |
Closes #3154 . Make late votes outside of last blocks precommits not get into the peer msg queue. This lowers the delay for processing more important messages from the peerMsgQueue, and lowers WAL write delays. As noted in the github issue, the number of late votes is actually quite significant.
I don't think theres any test updates needed (all relevant behavior is covered under existing tests. And if you look into the consensus logic, aside from WAL write, all we'd otherwise do is:
We could still do that if we wanted? But doesn't seem that important, since we have the mark late vote still being sent.
PR checklist
.changelog(we use unclog to manage our changelog)docs/orspec/) and code comments