Skip to content

Fix issue with double-signing causing blockage#229

Closed
jaekwon wants to merge 8 commits intomasterfrom
bft_fix_2
Closed

Fix issue with double-signing causing blockage#229
jaekwon wants to merge 8 commits intomasterfrom
bft_fix_2

Conversation

@jaekwon
Copy link
Contributor

@jaekwon jaekwon commented Jul 4, 2016

This is not complete, but I'll document the thinking process while working on it.

The problem is that, the way things are, due to the way peers communicate to each other which votes it has, if a validator double-signs for a given height/round/step, then it can prevent its own other double-votes from propagating correct, e.g. "occluding".

The case in which this was discovered to be an issue was during a BFT fuzz test, where it halted the blockchain with less than 1/3 of Byzantine validators, due to the Commit being occluded.

What are all the issues related to double-signing occlusion?

  • 2/3 Precommits, or a Commit, from being broadcast and letting the network progress in height
  • 2/3 Prevotes, or a POL, from being broadcast and letting validators unlock to let consensus progress

What about when there are more than 1 2/3 majority for a prevote or precommit step, due to 1/3+ being Byzantine? Currently we don't have logic to automatically broadcast this issue. This would be a good time to implement that logic, so we fix BFT as well as the first step in identifying an attack. This isn't the same as Tendermint V2, where a Commit is fully justified, but, it would be ideal to work toward that, while we fix this issue.

So, we want to:

  • prevent occlusion when < 1/3 have double-signed
  • ensure broadcasting of double-signing by each validator
  • ensure broadcasting of 2 conflicting 2/3 precommits (forks)
  • maybe, ensure broadcasting of 2 conflicting 2/3 prevotes

I don't have the whole solution yet, but the first step might be to refactor the VoteSet structure to allow tracking of double-signing. But we need to make sure this is done in such a way that double-signers can't sign a million votes and make nodes run out of memory, or cause some other issue.

The general solution I have in mind is to do the following:

  • By default, allow VoteSet to keep track of up to 2 conflicting votes for each validator
  • Implement a method on VoteSet to allow, for each unique peer, one 2/3-majority blockhash to be tracked as extra, beyond the 2 maximum conflicting votes. So even though we have votes from ByzVal1 for BlockA and BlockB, if Peer1 says it has a 2/3-majority blockhash for BlockC, then we allow this VoteSet to also remember a third vote for ByzVal1 for BlockC.
  • It is the responsibility of the consensus reactor to manage calling the above method, as well as any cleanup methods, upon peer connection and disconnection, and communicate 2/3-majority blockhash announcements as necessary.

As I implement these features in VoteSet, I'll make sure the functionality is reusable for Tendermint V2, so that justifications can be requested for (H,R,S) as necessary.

@ebuchman
Copy link
Contributor

ebuchman commented Jul 5, 2016

Do you plan to change the peer bit arrays to track the block hash as well? Currently we only track the votes a peer has by index using a single bitarray, but we may instead want to use one bitarray per blockhash we've seen a vote for. Of course we should limit this to two unless there is good reason to admit more

return voteSet.votesBitArray.Copy()
}

// NOTE: if validator has conflicting votes, picks random.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pick random?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@ebuchman
Copy link
Contributor

ebuchman commented Aug 8, 2016

This looks good other than the comments I don't understand.

Let's add the bit for removing peers and merge this to develop? I don't want it to languish if "the full bft fix" might take a while longer.

@ebuchman ebuchman mentioned this pull request Aug 11, 2016
3 tasks
@jaekwon
Copy link
Contributor Author

jaekwon commented Sep 6, 2016

Comments have been fixed.
2eba1c1 and onwards still need review.
Removing peers, I don't think that's a priority yet, lets add an issue and do it later

BlockID: maj23,
}})
time.Sleep(peerQueryMaj23SleepDuration)
rs = conR.conS.GetRoundState()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these rs = and prs = necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, removed on bft_fix_rebase branch

jaekwon added a commit that referenced this pull request Sep 16, 2016
@ebuchman ebuchman mentioned this pull request Oct 12, 2016
11 tasks
ebuchman pushed a commit that referenced this pull request Nov 15, 2016
@ebuchman ebuchman mentioned this pull request Nov 15, 2016
@ebuchman
Copy link
Contributor

rebased in #314

@ebuchman ebuchman closed this Nov 15, 2016
@ebuchman ebuchman deleted the bft_fix_2 branch August 24, 2017 06:05
ebuchman added a commit that referenced this pull request Jun 20, 2018
specify DeliverTx/CheckTx Tags format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants