Skip to content

perf(consensus): Make late votes outside of last block commits not get to peerMsgQueue#3157

Closed
ValarDragon wants to merge 8 commits intomainfrom
dev/make_late_votes_not_get_consensus_locks
Closed

perf(consensus): Make late votes outside of last block commits not get to peerMsgQueue#3157
ValarDragon wants to merge 8 commits intomainfrom
dev/make_late_votes_not_get_consensus_locks

Conversation

@ValarDragon
Copy link
Contributor

@ValarDragon ValarDragon commented May 31, 2024

Closes #3154 . Make late votes outside of last blocks precommits not get into the peer msg queue. This lowers the delay for processing more important messages from the peerMsgQueue, and lowers WAL write delays. As noted in the github issue, the number of late votes is actually quite significant.

I don't think theres any test updates needed (all relevant behavior is covered under existing tests. And if you look into the consensus logic, aside from WAL write, all we'd otherwise do is:

		cs.Logger.Debug("vote ignored and not added", "vote_height", vote.Height, "cs_height", cs.Height, "peer", peerID)
                return

We could still do that if we wanted? But doesn't seem that important, since we have the mark late vote still being sent.


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

@ValarDragon ValarDragon requested a review from a team as a code owner May 31, 2024 04:19
@ValarDragon ValarDragon requested a review from a team May 31, 2024 04:19
Copy link
Collaborator

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request May 31, 2024
Copy link

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not spend much resources on messages (any type) which are old and useless for the consensus protocol. So, the goal here is perfectly legit.

But adding the solution to the consensus reactor is not really the best approach. In fact, the consensus reactor should not take the consensus state lock for every received vote, which is super inefficient in any case. While this PR hasn't introduced this behavior, if we really want to decouple reactor and state, this solution goes towards the wrong direction.

I would instead add this check to the state implementation. The goal is anyway to prevent writing this useless vote to the WAL and keeping the consensus lock for longer than needed.

@ValarDragon
Copy link
Contributor Author

ValarDragon commented Jun 10, 2024

But adding the solution to the consensus reactor is not really the best approach. In fact, the consensus reactor should not take the consensus state lock for every received vote, which is super inefficient in any case. While this PR hasn't introduced this behavior, if we really want to decouple reactor and state, this solution goes towards the wrong direction.

My thinking right now is that we can get the lock removed with #3211 , and then have this able to proceed concurrently (not under the single threaded state.go workload). So we would rather it happen in the reactor

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 15, 2024
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 17, 2024
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 17, 2024
* Backport cometbft#3211

* Fix Race

* bp cometbft#3157

* Speedup tests that were hitting timeouts

* bp cometbft#3161

* Fix data race

* Mempool async processing

* Forgot to commit important part

* Add changelog
czarcas7ic pushed a commit to osmosis-labs/cometbft that referenced this pull request Jun 19, 2024
* Backport cometbft#3211

* Fix Race

* bp cometbft#3157

* Speedup tests that were hitting timeouts

* bp cometbft#3161

* Fix data race

* Mempool async processing

* Forgot to commit important part

* Add changelog
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Jun 26, 2024
Copy link

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't really like this solution, I do agree it can have good consequences in terms of performance.

My suggestions would render this code as similar as possible as the code in addVote() method of state.go.

ps.SetHasVoteFromPeer(msg.Vote, height, valSize, lastCommitSize)
ps.SetHasVoteFromPeer(msg.Vote, csHeight, valSize, lastCommitSize)

// if vote is late, and is not a precommit for the last block, mark it late and return.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would here, copy the logic from state.go:

        if vote.Height < cs.Height || (vote.Height == cs.Height && vote.Round < cs.Round) {

        }

// if vote is late, and is not a precommit for the last block, mark it late and return.
isLate := msg.Vote.Height < csHeight || (msg.Vote.Height == csHeight && msg.Vote.Round < csRound)
if isLate {
notLastBlockPrecommit := msg.Vote.Type != types.PrecommitType || msg.Vote.Height != csHeight-1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the logic from state.go:

vote.Height+1 == cs.Height && vote.Type == types.PrecommitType

Then in the if you negate this condition.

Copy link

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check a corner case, though.

If a vote is from the current height but from a previous round, we might still need it. We can use it for deciding a block, if its a Precommit, and for accepting a new Proposal, if it is a Prevote for the PolRound.

See test units TestState_PrevotePOLFromPreviousRound and TestCommitFromPreviousRound, for instance.

itsdevbear pushed a commit to berachain/cometbft that referenced this pull request Jul 4, 2024
Co-authored-by: Daniel <daniel.cason@informal.systems>
@github-actions github-actions bot removed the stale For use by stalebot label Jul 26, 2024
@github-actions
Copy link

github-actions bot commented Aug 5, 2024

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Aug 5, 2024
@melekes
Copy link
Collaborator

melekes commented Aug 5, 2024

We need to check a corner case, though.

If a vote is from the current height but from a previous round, we might still need it. We can use it for deciding a block, if its a Precommit, and for accepting a new Proposal, if it is a Prevote for the PolRound.

See test units TestState_PrevotePOLFromPreviousRound and TestCommitFromPreviousRound, for instance.

@cason could you check this? Then, if we need these votes, we can close this PR. If not, merge this PR.

@melekes melekes removed the stale For use by stalebot label Aug 5, 2024
@cason
Copy link

cason commented Aug 5, 2024

@cason could you check this? Then, if we need these votes, we can close this PR. If not, merge this PR.

We would have to make a copy of those tests that run with a reactor, instead of manually providing messages to the consensus implementation. This is not so trivial, unfortunately.

But in general I am pretty convinced that we need in some cases messages from previous rounds, these tests are just some examples.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Aug 16, 2024
@ValarDragon
Copy link
Contributor Author

Should we table this until theres time to investigate the corner case?

@github-actions github-actions bot removed the stale For use by stalebot label Aug 20, 2024
@cason
Copy link

cason commented Aug 20, 2024

We cannot discard votes from previous rounds because the algorithm may require those votes to progress.

The problem here is that our test units test consensus State where we deliver the messages we want it to process. Adapting these tests to rely on a reactor, i.e., messages received from peers is not simple.

If you manage to adapt the two mentioned tests to run with a consensus Reactor and they do not fail I will be surprised but accept this change.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Aug 31, 2024
@cason
Copy link

cason commented Sep 2, 2024

Do we mind closing this PR?

@github-actions github-actions bot removed the stale For use by stalebot label Sep 3, 2024
@melekes melekes closed this Sep 9, 2024
@zrbecker zrbecker deleted the dev/make_late_votes_not_get_consensus_locks branch February 7, 2025 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make late votes not enter consensus peer msg queue

4 participants