Skip to content

perf(consensus): Make reactor check for duplicate/old block parts#3161

Closed
ValarDragon wants to merge 6 commits intocometbft:mainfrom
osmosis-labs:dev/make_reactor_check_duplicate_blockpart
Closed

perf(consensus): Make reactor check for duplicate/old block parts#3161
ValarDragon wants to merge 6 commits intocometbft:mainfrom
osmosis-labs:dev/make_reactor_check_duplicate_blockpart

Conversation

@ValarDragon
Copy link
Contributor

@ValarDragon ValarDragon commented May 31, 2024

Closes #3158

I made a common method between the reactor and consensus. In the future may add another argument to indicate if it should check the block part, but decided to not try that until I got vote signature checking into the reactor, as thats the bottleneck for Osmosis right now. (I suspect WAL is dominated from here though, will be straightforward to check once this is merged / in a patch release that I can A/B test)

The consensus state.go logic should be unchanged.


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

@ValarDragon ValarDragon requested a review from a team as a code owner May 31, 2024 14:21
@ValarDragon ValarDragon requested a review from a team May 31, 2024 14:21
@ValarDragon ValarDragon changed the title Make reactor check for duplicate/old block parts perf(consensus): Make reactor check for duplicate/old block parts May 31, 2024
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request May 31, 2024
Copy link
Collaborator

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ValarDragon ❤️

// once we have the full block.
func (cs *State) addProposalBlockPart(msg *BlockPartMessage, peerID p2p.ID) (added bool, err error) {
height, round, part := msg.Height, msg.Round, msg.Part
// TODO: better handle block parts for future heights, by saving them and processing them later.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mind opening an issue for this?

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Jun 14, 2024
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 16, 2024
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 17, 2024
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 17, 2024
* Backport cometbft#3211

* Fix Race

* bp cometbft#3157

* Speedup tests that were hitting timeouts

* bp cometbft#3161

* Fix data race

* Mempool async processing

* Forgot to commit important part

* Add changelog
czarcas7ic pushed a commit to osmosis-labs/cometbft that referenced this pull request Jun 19, 2024
* Backport cometbft#3211

* Fix Race

* bp cometbft#3157

* Speedup tests that were hitting timeouts

* bp cometbft#3161

* Fix data race

* Mempool async processing

* Forgot to commit important part

* Add changelog
Copy link

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could change our mind regarding the Block we are looking for at any given instant.

We usually initialize cs.ProposalBlockParts upon receiving the first valid Proposal message for that round. But we might reset it upon receiving 2/3+ Prevotes or Precommits for a different block.

As a result, we might drop a block part that matches the new Block we are looking for, because we didn't change the block we are looking for because we are still processing votes that might made us to change our mind.

itsdevbear pushed a commit to berachain/cometbft that referenced this pull request Jul 4, 2024
@melekes melekes self-requested a review July 10, 2024 08:25
@melekes
Copy link
Collaborator

melekes commented Jul 11, 2024

@cason are you saying that because we can change our minds about cs.ProposalBlockParts at any moment, we should still write "invalid" block parts to WAL because between the time we write them and try to add them to cs.ProposalBlockParts, cs.ProposalBlockParts might have changed and "invalid" block part may "become valid"?

I'm in favor of this PR because, in most cases, we don't change our minds, and preventing duplicate parts from entering WAL as a perf optimization makes sense to me.

@cason
Copy link

cason commented Jul 12, 2024

@cason are you saying that because we can change our minds about cs.ProposalBlockParts at any moment, we should still write "invalid" block parts to WAL because between the time we write them and try to add them to cs.ProposalBlockParts, cs.ProposalBlockParts might have changed and "invalid" block part may "become valid"?

I'm in favor of this PR because, in most cases, we don't change our minds, and preventing duplicate parts from entering WAL as a perf optimization makes sense to me.

So, first, I don't think we should write block parts to the WAL at all. Except if we are using the full block. Namely, we either write the full block, or nothing. The best way to achieve this is to move block propagation outside of the consensus logic, which is the right direction to go.

At the moment, what can happen, and we might even write a test unit for that is:

  • I am looking for block X, because we received a Proposal for id(X)
  • But I am receiving votes (either type) for id(Y) != id(X)
  • When I get 2/3+ of such votes, I will drop the block parts and part set of X and replace the block part set structure to accept block Y
  • Meanwhile, I start receiving parts of block Y

So, if I receive a part of block Y at the reactor level but the votes that will make me change my mind are still on the queue to be processed, we are rejecting, basing on our current state, something we should accept, that we need by the way. Of course, if we don't mark that part as received, we should eventually get it again. But this is something to be tested, in my opinion.

@melekes
Copy link
Collaborator

melekes commented Jul 12, 2024

Of course, if we don't mark that part as received, we should eventually get it again. But this is something to be tested, in my opinion.

@ValarDragon did you roll this out in production? Are you observing any slowness or blocks failing to commit as a consequence of this PR?

@cason
Copy link

cason commented Jul 12, 2024

This should be done as part of this #2127

@github-actions github-actions bot removed the stale For use by stalebot label Jul 26, 2024
@github-actions
Copy link

github-actions bot commented Aug 5, 2024

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Aug 5, 2024
@melekes melekes removed the stale For use by stalebot label Aug 5, 2024
@melekes
Copy link
Collaborator

melekes commented Aug 5, 2024

Of course, if we don't mark that part as received, we should eventually get it again. But this is something to be tested, in my opinion.

@ValarDragon did you roll this out in production? Are you observing any slowness or blocks failing to commit as a consequence of this PR?

cc @ValarDragon

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Aug 16, 2024
@ValarDragon
Copy link
Contributor Author

ValarDragon commented Aug 19, 2024

This has been in prod on Osmosis for ~2-3 months, no issues observed :)

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
* bp cometbft#3161

* Fix data race

* Fix bug due to there being a consistency check buried in Verify

* Fix accidental rsMtx revert
@github-actions github-actions bot removed the stale For use by stalebot label Aug 20, 2024
@cason
Copy link

cason commented Aug 20, 2024

We are probably not seen issues derived from this optimization in stable systems. This happens in a real corner case when we change the block we are looking at after receiving prevotes for that block, that does not match the proposed block. Which essentially requires Byzantine agents.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Aug 31, 2024
@github-actions github-actions bot closed this Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale For use by stalebot

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make late block parts not enter consensus peer message queue

3 participants