perf(consensus): Make reactor check for duplicate/old block parts by ValarDragon · Pull Request #3161 · cometbft/cometbft

ValarDragon · 2024-05-31T14:21:27Z

I made a common method shared between the reactor and consensus. In the future I may add another argument to indicate whether it should check the block part, but I decided not to try that until I get vote signature checking into the reactor, as that's the bottleneck for Osmosis right now. (I suspect WAL is dominated from here though; this will be straightforward to check once this is merged or in a patch release that I can A/B test.)

The consensus state.go logic should be unchanged.
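For readers following along, a minimal sketch of what a reactor-side duplicate/old block part check could look like. This is not the code from this PR: `roundView`, `shouldForwardPart`, and `markReceived` are hypothetical names, and the sketch assumes the reactor can read the current height and which part indexes have already been added.

```go
package main

import "fmt"

// roundView is a hypothetical, simplified stand-in for the consensus state
// that the reactor can read. It is not a CometBFT type.
type roundView struct {
	Height   int64
	Total    uint32 // number of parts in the block we are waiting for
	received []bool // received[i] is true once part i has been added
}

// shouldForwardPart reports whether a block part is worth handing to the
// consensus state (and therefore to the WAL). It rejects parts for another
// height, out-of-range indexes, and parts that were already received.
func (v *roundView) shouldForwardPart(height int64, index uint32) bool {
	if height != v.Height {
		return false // old (or future) part
	}
	if index >= v.Total || index >= uint32(len(v.received)) {
		return false // index does not belong to the expected part set
	}
	return !v.received[index] // duplicate parts are not forwarded
}

// markReceived records that part index has been added.
func (v *roundView) markReceived(index uint32) { v.received[index] = true }

func main() {
	v := &roundView{Height: 10, Total: 3, received: make([]bool, 3)}
	fmt.Println(v.shouldForwardPart(10, 1)) // true
	v.markReceived(1)
	fmt.Println(v.shouldForwardPart(10, 1)) // false: duplicate
	fmt.Println(v.shouldForwardPart(9, 0))  // false: old height
}
```

The point of sharing one check is that the reactor and the consensus state cannot drift apart on what counts as a duplicate or old part.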

PR checklist

Tests written/updated
Changelog entry added in .changelog (we use unclog to manage our changelog)
Updated relevant documentation (docs/ or spec/) and code comments
Title follows the Conventional Commits spec

melekes

Thanks @ValarDragon ❤️

melekes · 2024-06-03T07:50:46Z

internal/consensus/state.go

+// once we have the full block.
+func (cs *State) addProposalBlockPart(msg *BlockPartMessage, peerID p2p.ID) (added bool, err error) {
+	height, round, part := msg.Height, msg.Round, msg.Part
+	// TODO: better handle block parts for future heights, by saving them and processing them later.


do you mind opening an issue for this?

github-actions · 2024-06-14T00:14:19Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

* Backport cometbft#3211
* Fix Race
* bp cometbft#3157
* Speedup tests that were hitting timeouts
* bp cometbft#3161
* Fix data race
* Mempool async processing
* Forgot to commit important part
* Add changelog

cason

We could change our mind regarding the Block we are looking for at any given instant.

We usually initialize cs.ProposalBlockParts upon receiving the first valid Proposal message for that round. But we might reset it upon receiving 2/3+ Prevotes or Precommits for a different block.

As a result, we might drop a block part that matches the new Block we are looking for, because we have not switched to it yet: we are still processing the votes that might make us change our mind.

melekes · 2024-07-11T06:21:23Z

@cason are you saying that because we can change our minds about cs.ProposalBlockParts at any moment, we should still write "invalid" block parts to WAL because between the time we write them and try to add them to cs.ProposalBlockParts, cs.ProposalBlockParts might have changed and "invalid" block part may "become valid"?

I'm in favor of this PR because, in most cases, we don't change our minds, and preventing duplicate parts from entering WAL as a perf optimization makes sense to me.

cason · 2024-07-12T08:14:55Z

> @cason are you saying that because we can change our minds about cs.ProposalBlockParts at any moment, we should still write "invalid" block parts to WAL because between the time we write them and try to add them to cs.ProposalBlockParts, cs.ProposalBlockParts might have changed and "invalid" block part may "become valid"?
>
> I'm in favor of this PR because, in most cases, we don't change our minds, and preventing duplicate parts from entering WAL as a perf optimization makes sense to me.

So, first, I don't think we should write block parts to the WAL at all, except when we are using the full block. Namely, we either write the full block or nothing. The best way to achieve this is to move block propagation outside of the consensus logic, which is the right direction to go.
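To make the "full block or nothing" idea concrete, here is a rough sketch under stated assumptions. It is not how CometBFT's WAL works today: `partBuffer` and the `wal` slice are hypothetical, and parts are treated as raw byte slices that concatenate into the block.

```go
package main

import "fmt"

// partBuffer is a hypothetical in-memory collector for block parts.
type partBuffer struct {
	parts   [][]byte
	have    []bool
	missing int
}

func newPartBuffer(total int) *partBuffer {
	return &partBuffer{
		parts:   make([][]byte, total),
		have:    make([]bool, total),
		missing: total,
	}
}

// add stores a part and returns the assembled block once every part is
// present. Duplicates and out-of-range indexes are ignored.
func (b *partBuffer) add(index int, data []byte) (block []byte, complete bool) {
	if index < 0 || index >= len(b.parts) || b.have[index] {
		return nil, false
	}
	b.parts[index], b.have[index] = data, true
	b.missing--
	if b.missing > 0 {
		return nil, false
	}
	for _, p := range b.parts {
		block = append(block, p...)
	}
	return block, true
}

func main() {
	var wal [][]byte // stand-in for WAL records (assumption)
	buf := newPartBuffer(3)
	inputs := []struct {
		index int
		data  string
	}{{0, "a"}, {1, "b"}, {1, "b"}, {2, "c"}}
	for _, in := range inputs {
		// Only a complete block is ever written; single parts never are.
		if block, ok := buf.add(in.index, []byte(in.data)); ok {
			wal = append(wal, block)
		}
	}
	fmt.Println(len(wal), string(wal[0])) // 1 abc
}
```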

At the moment, what can happen, and we might even write a unit test for it, is:

* I am looking for block X, because we received a Proposal for id(X).
* But I am receiving votes (of either type) for id(Y) != id(X).
* When I get 2/3+ of such votes, I will drop the block parts and part set of X and replace the block part set structure to accept block Y.
* Meanwhile, I start receiving parts of block Y.

So, if I receive a part of block Y at the reactor level while the votes that will make me change my mind are still in the queue to be processed, we reject, based on our current state, something we should accept and that we in fact need. Of course, if we don't mark that part as received, we should eventually get it again. But this is something to be tested, in my opinion.
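The scenario above can be modelled with a toy sketch. Everything here is hypothetical (`node`, `reactorReceivePart`, `processQueuedVotes`) and block IDs are plain strings; the real state machine is far more involved. It only illustrates the ordering problem: the reactor-level check runs against the current state, before the queued votes are processed.

```go
package main

import "fmt"

// node is a hypothetical, heavily simplified model of the state discussed above.
type node struct {
	wantBlock   string   // ID of the block whose parts we currently accept
	queuedVotes []string // block IDs backed by 2/3+ votes, not yet processed
	accepted    []string // parts that passed the reactor-level check
}

// reactorReceivePart applies the reactor-level check against the current
// state only; it knows nothing about votes still sitting in the queue.
func (n *node) reactorReceivePart(blockID string, index int) bool {
	if blockID != n.wantBlock {
		return false // dropped: not the block we are looking for right now
	}
	n.accepted = append(n.accepted, fmt.Sprintf("%s/%d", blockID, index))
	return true
}

// processQueuedVotes models consensus catching up: 2/3+ votes for another
// block make us switch the block we are looking for.
func (n *node) processQueuedVotes() {
	for _, id := range n.queuedVotes {
		n.wantBlock = id
	}
	n.queuedVotes = nil
}

func main() {
	n := &node{wantBlock: "X", queuedVotes: []string{"Y"}}
	fmt.Println(n.reactorReceivePart("Y", 0)) // false: votes for Y still queued
	n.processQueuedVotes()
	fmt.Println(n.reactorReceivePart("Y", 0)) // true: accepted once re-sent
}
```

A unit test for the real code would need to assert the second half: that the dropped part is not marked as received for the peer, so it is eventually sent again.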

melekes · 2024-07-12T08:49:05Z

> Of course, if we don't mark that part as received, we should eventually get it again. But this is something to be tested, in my opinion.

@ValarDragon did you roll this out in production? Are you observing any slowness or blocks failing to commit as a consequence of this PR?

cason · 2024-07-12T09:11:57Z

This should be done as part of #2127.

melekes · 2024-08-05T05:45:08Z

> Of course, if we don't mark that part as received, we should eventually get it again. But this is something to be tested, in my opinion.
>
> @ValarDragon did you roll this out in production? Are you observing any slowness or blocks failing to commit as a consequence of this PR?

cc @ValarDragon

ValarDragon · 2024-08-19T19:58:01Z

This has been in prod on Osmosis for ~2-3 months, no issues observed :)

cason · 2024-08-20T11:30:21Z

We are probably not seeing issues caused by this optimization in stable systems. The problem arises only in a real corner case: when we change the block we are looking for after receiving prevotes for a block that does not match the proposed block. This essentially requires Byzantine agents.

Make reactor check for duplicate/old block parts

71e1372

ValarDragon requested a review from a team as a code owner May 31, 2024 14:21

ValarDragon requested a review from a team May 31, 2024 14:21

Add changelog

039631f

ValarDragon changed the title ~~Make reactor check for duplicate/old block parts~~ perf(consensus): Make reactor check for duplicate/old block parts May 31, 2024

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request May 31, 2024

perf(consensus): Make reactor check for duplicate/old block parts co…

71186a0

…metbft#3161

melekes approved these changes Jun 3, 2024

github-actions bot added the stale For use by stalebot label Jun 14, 2024

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 16, 2024

bp cometbft#3161

8d66a86

ValarDragon mentioned this pull request Jun 16, 2024

Backport consensus improvements osmosis-labs/cometbft#108

Merged

3 tasks

ValarDragon and others added 3 commits June 15, 2024 22:42

Merge branch 'main' into dev/make_reactor_check_duplicate_blockpart

7081ec4

Fix bug

54d8c7c

Improve comment

bef1368

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Jun 17, 2024

bp cometbft#3161

4c7c73e

Fix one more bug

b775c57

melekes approved these changes Jun 26, 2024

cason suggested changes Jul 1, 2024

itsdevbear pushed a commit to berachain/cometbft that referenced this pull request Jul 4, 2024

bp cometbft#3161

468c6f0

melekes self-requested a review July 10, 2024 08:25

github-actions bot removed the stale For use by stalebot label Jul 26, 2024

github-actions bot added the stale For use by stalebot label Aug 5, 2024

melekes removed the stale For use by stalebot label Aug 5, 2024

github-actions bot added the stale For use by stalebot label Aug 16, 2024

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024

bp cometbft#3161

a29aad3

ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024

Move blockpart redundancy checks to reactor (#138)

fd16bab

* bp cometbft#3161
* Fix data race
* Fix bug due to there being a consistency check buried in Verify
* Fix accidental rsMtx revert

github-actions bot removed the stale For use by stalebot label Aug 20, 2024

github-actions bot added the stale For use by stalebot label Aug 31, 2024

github-actions bot closed this Sep 4, 2024
