Skip to content

consensus: Wait timeout precommit before starting new round#2493

Merged
ebuchman merged 4 commits intodevelopfrom
zarko/1690-wait-timeout-precommit-before-new-round
Oct 4, 2018
Merged

consensus: Wait timeout precommit before starting new round#2493
ebuchman merged 4 commits intodevelopfrom
zarko/1690-wait-timeout-precommit-before-new-round

Conversation

@milosevic
Copy link
Contributor

@milosevic milosevic commented Sep 26, 2018

Pull in ensureVote test function from #2132

  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG_PENDING.md

Fix #1690.

@milosevic
Copy link
Contributor Author

Note that there is an open PR (#2132) that mentions this issue, but it's actually more related to #1745. I have pulled in ensureVote test function from #2132 (sorry for squashing it, needed to fix some stuff after merged #2132 here). #1745 will be addressed in the follow up PR so we can probably close #2132 for those two new PRs.

@milosevic milosevic self-assigned this Sep 26, 2018
@xla xla added this to the launch milestone Sep 26, 2018
@xla xla changed the title Wait timeout precommit before starting new round consensus: Wait timeout precommit before starting new round Sep 26, 2018
@xla xla added T:bug Type Bug (Confirmed) C:consensus Component: Consensus labels Sep 26, 2018
@codecov-io
Copy link

codecov-io commented Sep 26, 2018

Codecov Report

Merging #2493 into develop will increase coverage by 0.01%.
The diff coverage is 83.33%.

@@             Coverage Diff             @@
##           develop    #2493      +/-   ##
===========================================
+ Coverage    61.29%   61.31%   +0.01%     
===========================================
  Files          202      202              
  Lines        16720    16658      -62     
===========================================
- Hits         10249    10214      -35     
+ Misses        5602     5582      -20     
+ Partials       869      862       -7
Impacted Files Coverage Δ
p2p/peer.go 62.73% <ø> (ø) ⬆️
p2p/test_util.go 60.15% <0%> (ø) ⬆️
p2p/conn/connection.go 79.71% <0%> (+0.28%) ⬆️
config/config.go 69.28% <100%> (ø) ⬆️
consensus/state.go 77.69% <100%> (+0.39%) ⬆️
consensus/reactor.go 73.9% <100%> (+0.77%) ⬆️
libs/log/tmfmt_logger.go 85.71% <0%> (-2.97%) ⬇️
blockchain/pool.go 65.73% <0%> (-0.7%) ⬇️
libs/db/fsdb.go 68.47% <0%> (+0.42%) ⬆️
... and 3 more

@milosevic milosevic force-pushed the zarko/1690-wait-timeout-precommit-before-new-round branch from 6446be9 to 1d8f155 Compare September 26, 2018 12:42
Copy link
Contributor

@xla xla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 :octocat: :shipit: 🍡

signAddVotes(cs1, types.VoteTypePrecommit, nil, types.PartSetHeader{}, vs2, vs3, vs4)

<-timeoutWaitCh
<-newRoundCh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably a good idea to wrap these channel reads with timeouts to prevent the tests from hanging accidently and ease in debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will use the same pattern as in ensureX methods to read from round and timeout channels.

if cs.config.SkipTimeoutCommit && precommits.HasAll() {
// if we have all the votes now,
// go straight to new round (skip timeout commit)
// cs.scheduleTimeout(time.Duration(0), cs.Height, 0, cstypes.RoundStepNewHeight)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep this commented code line around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. @melekes @ebuchman?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think so

// cs.scheduleTimeout(time.Duration(0), cs.Height, 0, cstypes.RoundStepNewHeight)
cs.enterNewRound(cs.Height, 0)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insignificant newline.

}
vote := edv.Vote
if vote.Height != height {
panic(fmt.Sprintf("expected height %v, got %v", height, vote.Height))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of panics this function should return errors, which makes it easier to report back in the tests.

@xla xla requested review from jaekwon and removed request for jaekwon September 26, 2018 13:30
Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

@milosevic milosevic force-pushed the zarko/1690-wait-timeout-precommit-before-new-round branch 4 times, most recently from 2e9e654 to 7a60808 Compare September 27, 2018 12:56
Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! Would be good to test the type (edv, ok := v.(types.EventDataUnlock)) in functions like ensureNewUnlock, .. etc The same thing we do in ensureVote

errorMessage string) {
timer := time.NewTimer(timeout)
select {
case <-timer.C:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<-time.After(timeout)

voteType byte) {
timer := time.NewTimer(ensureTimeout)
select {
case <-timer.C:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<-time.After(ensureTimeout)

edv, ok := v.(types.EventDataVote)
if !ok {
panic(fmt.Sprintf("expected a *types.Vote, got %v. wrong subscription channel?",
t.Error(fmt.Sprintf("expected a *types.Vote, got %v. wrong subscription channel?",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.Errorf(...)

@milosevic milosevic force-pushed the zarko/1690-wait-timeout-precommit-before-new-round branch from 7a60808 to 300330a Compare October 2, 2018 07:53
@milosevic milosevic force-pushed the zarko/1690-wait-timeout-precommit-before-new-round branch from 298f60b to bb8411b Compare October 2, 2018 08:26
@melekes melekes requested a review from jaekwon October 3, 2018 07:32
// cs.scheduleTimeout(time.Duration(0), cs.Height, 0, cstypes.RoundStepNewHeight)
cs.enterNewRound(cs.Height, 0)
}
if ok && len(blockID.Hash) != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If len(blockID.Hash) == 0, don't we need to explicitly enterPrecommitWait?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always enterPrecommitWait upon receiving 2f+1 any precommits. We don't need to treat 2f+1 Precommit nil as a special case. We do it here:

tendermint/consensus/state.go

Lines 1657 to 1660 in bb8411b

} else if cs.Round <= vote.Round && precommits.HasTwoThirdsAny() {
cs.enterNewRound(height, vote.Round)
cs.enterPrecommit(height, vote.Round)
cs.enterPrecommitWait(height, vote.Round)
. Only in case we have 2f+1 precommits for some block we go straight to commit and we don't need to trigger timeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok right, it's taken care of by the else if. The first condition is only for +2/3 majority for a block hash. If there's a +2/3 majority for nil, it will be treated by the second condition. Thanks.

case <-stepCh:
panic("We should be stuck waiting, not moving to the next step")
case <-ch:
t.Error(errorMessage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want these to be panics for now until we build something more intelligent, otherwise we won't know which line in the actual test function caused the failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively to the error check in the actual test and t.Fatalf there. I know some may be concerned about readability but it's a test after all and the execution should be clear.

Copy link
Contributor Author

@milosevic milosevic Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I would also say to leave panic. Those tests are mostly covering critical algorithm scenarios so it is good if they are readable. Tests will probably become shorter after consensus state refactor and then we can do it properly.

@ebuchman ebuchman merged commit 12675ec into develop Oct 4, 2018
@ebuchman ebuchman deleted the zarko/1690-wait-timeout-precommit-before-new-round branch October 4, 2018 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C:consensus Component: Consensus T:bug Type Bug (Confirmed)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants