Description
peer.Send() can fail and return an error upon timeout. This return value is not checked, which may leave a node in an unnecessary halt state.
```
// From a good node, the peer state of the hung node
// http://52.91.18.251:46657/dump_consensus_state
{
  "Height": 1,
  "Round": 0,
  "Step": 4,
  "StartTime": "2015-12-31T04:10:13.916Z",
  "Proposal": false,
  "ProposalBlockPartsHeader": {
    "total": 0,
    "hash": ""
  },
  "ProposalBlockParts": null,
  "ProposalPOLRound": -1,
  "ProposalPOL": {
    "bits": 4,
    "elems": [
      0
    ]
  },
  "Prevotes": {
    "bits": 4,
    "elems": [
      0
    ]
  },
  "Precommits": {
    "bits": 4,
    "elems": [
      0
    ]
  },
  "LastCommitRound": -1,
  "LastCommit": null,
  "CatchupCommitRound": 2,
  "CatchupCommit": {
    "bits": 4,
    "elems": [
      14
    ]
  }
}
```
and
```
// From the hung node, the consensus state:
RoundState{
H:1 R:0 S:RoundStepPrevote
StartTime: 2015-12-31 04:10:13.746229076 +0000 UTC
CommitTime: 0001-01-01 00:00:00 +0000 UTC
Validators: ValidatorSet{
Proposer: Validator{1EF06943F4BF19A210672D97C1AA9918D5544443 PubKeyEd25519{15C27E4A78AB260BC87903BCD3A84B88491387443B24A96B15647E3C1B430861} 0-0-0 VP:21000000 A:0}
Validators:
Validator{1EF06943F4BF19A210672D97C1AA9918D5544443 PubKeyEd25519{15C27E4A78AB260BC87903BCD3A84B88491387443B24A96B15647E3C1B430861} 0-0-0 VP:21000000 A:0}
Validator{3AF5BF8915C109223E0A009F49470D12D0419E62 PubKeyEd25519{2D2885E36D7E8D9434032892917069855F1C24DC0B76D0FF043C5032407D3F68} 0-0-0 VP:21000000 A:0}
Validator{956E95DEEBF1D80889677F86B56FEFECC1F62082 PubKeyEd25519{DC4D76559214D573B269C95EAB5CAB5F87FC549D320C1DD1C522C44BDA000AC3} 0-0-0 VP:21000000 A:0}
Validator{DAFE04C5E432C47086C7D6ACA6BFCFC8556D4E3E PubKeyEd25519{C46E8B4CEF3C1F1E78614EE6B73C65313ACCF0ADB2D8FAFEC61FAC8427979BBD} 0-0-0 VP:21000000 A:0}
}
Proposal: <nil>
ProposalBlock: nil-PartSet nil-Block
LockedRound: 0
LockedBlock: nil-PartSet nil-Block
Votes: HeightVoteSet{H:1 R:0~1
VoteSet{H:1 R:0 T:1 +2/3:false BA{4:X___}}
VoteSet{H:1 R:0 T:2 +2/3:false BA{4:____}}
VoteSet{H:1 R:1 T:1 +2/3:false BA{4:____}}
VoteSet{H:1 R:1 T:2 +2/3:false BA{4:____}}
VoteSet{H:1 R:2 T:1 +2/3:false BA{4:____}}
VoteSet{H:1 R:2 T:2 +2/3:false BA{4:__XX}}
}
LastCommit: nil-VoteSet
LastValidators: ValidatorSet{
Proposer: nil-Validator
Validators:
}
}
```
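It seems that the second validator's H:1 R:2 T:2 vote was dropped, was never received, or was invalid. Note that the CatchupCommit bit-array element "14" is 1110 in binary, which reads 0111 when written least-significant bit first (little endian), so that position i corresponds to validator i. So we should be seeing _XXX on the hung node, but we're seeing __XX.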
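To make the bit-array reading concrete, here is a minimal sketch that renders `bits`/`elems` pairs from the JSON dump in the same `X`/`_` notation used by the RoundState output. The helper name `bitArrayString` and the assumption that each element is a 64-bit word with bit i stored least-significant first are mine, not taken from the tendermint source.

```go
package main

import (
	"fmt"
	"strings"
)

// bitArrayString renders a bit array in the RoundState dump style:
// one character per index, 'X' for a set bit and '_' for a clear bit.
// Assumption: bit i lives in elems[i/64] at position i%64 (LSB first).
func bitArrayString(bits int, elems []uint64) string {
	var sb strings.Builder
	for i := 0; i < bits; i++ {
		word := i / 64
		if word < len(elems) && elems[word]&(uint64(1)<<uint(i%64)) != 0 {
			sb.WriteByte('X')
		} else {
			sb.WriteByte('_')
		}
	}
	return sb.String()
}

func main() {
	// CatchupCommit as reported by the good node: bits=4, elems=[14].
	fmt.Println(bitArrayString(4, []uint64{14})) // _XXX
	// Prevotes as reported by the good node: bits=4, elems=[0].
	fmt.Println(bitArrayString(4, []uint64{0})) // ____
}
```

Under that assumption, element 14 decodes to `_XXX`, which is what the good node believes the hung node holds, while the hung node itself reports `__XX` for H:1 R:2 T:2.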
If sending that vote failed (timed out), the reactor would still mark the vote as having been sent, because the result of the send is not checked: https://github.com/eris-ltd/eris-db/blob/master/Godeps/_workspace/src/github.com/tendermint/tendermint/consensus/reactor.go#L617
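One possible shape for a fix is to update the peer's bookkeeping only after the send succeeds. This is a rough sketch only: the `sender` interface, `peerState` type, `sendVote` function, and `flakySender` test double are all hypothetical and do not reflect the actual tendermint p2p or reactor API. `Send` is modelled as returning an error, following the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// sender is a hypothetical stand-in for a peer connection; it is NOT the
// tendermint p2p API. Send reports a failure (e.g. a timeout) via error.
type sender interface {
	Send(msg string) error
}

// peerState is a hypothetical record of which votes the peer is believed to have.
type peerState struct {
	hasVote map[int]bool
}

// sendVote marks the vote as delivered only when Send succeeds, so a failed
// send leaves the vote eligible for a later retry.
func sendVote(p sender, ps *peerState, index int, vote string) bool {
	if err := p.Send(vote); err != nil {
		return false // do not touch ps: the peer never got the vote
	}
	ps.hasVote[index] = true
	return true
}

// flakySender fails the first `failures` sends, then succeeds.
type flakySender struct{ failures int }

func (f *flakySender) Send(msg string) error {
	if f.failures > 0 {
		f.failures--
		return errors.New("send timed out")
	}
	return nil
}

func main() {
	ps := &peerState{hasVote: map[int]bool{}}
	p := &flakySender{failures: 1}
	fmt.Println(sendVote(p, ps, 1, "precommit"), ps.hasVote[1]) // false false
	fmt.Println(sendVote(p, ps, 1, "precommit"), ps.hasVote[1]) // true true
}
```

With this ordering, a timed-out send leaves the peer's record unchanged, so a later gossip pass can pick the same vote again instead of skipping it.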
This problem would manifest in poor network conditions with sparse connections.
This problem exists in consensus/reactor, but there may be similar issues in other reactors.