ValidatorSet change delayed by 1 block, and lite refactor by jaekwon · Pull Request #1756 · tendermint/tendermint

jaekwon · 2018-06-16T04:23:29Z

The transactions in the LastBlock are sufficient for validators to know what the next validator set should be, because they can update the validator set based on the validator updates returned by EndBlock. Light clients don't know that, so if the application wants to replace the validator set entirely, a general-purpose Tendermint light client can't validate the change, because it doesn't know how to interpret the txs that caused it.

Delaying validator set updates by 1 block allows the next pending validator set to be included in the block header as "NextValidators", so light clients can know how the validator set changed for a blockchain, even with arbitrary validator set changes.
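To make the idea concrete, here is a minimal sketch of the check a light client could perform once each header commits to the next validator set. The `Header` type, its field names, and `checkHandoff` are hypothetical and are not the types in this PR. Signature verification and hashing of the actual validator set are left out.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Header is a hypothetical, stripped-down block header. The field names are
// illustrative only and are not the real tendermint types.
type Header struct {
	Height             int64
	ValidatorsHash     []byte // hash of the set that signs this block
	NextValidatorsHash []byte // hash of the set that signs the next block
}

// checkHandoff verifies that the validator set of block H+1 is exactly the
// set that block H announced as its next validator set.
func checkHandoff(prev, next Header) error {
	if next.Height != prev.Height+1 {
		return errors.New("headers are not adjacent")
	}
	if !bytes.Equal(prev.NextValidatorsHash, next.ValidatorsHash) {
		return fmt.Errorf("validator set at height %d was not announced at height %d",
			next.Height, prev.Height)
	}
	return nil
}

func main() {
	// The set changes completely from "A" to "B", yet the change is still
	// verifiable because header 10 committed to "B" in advance.
	h10 := Header{Height: 10, ValidatorsHash: []byte("A"), NextValidatorsHash: []byte("B")}
	h11 := Header{Height: 11, ValidatorsHash: []byte("B"), NextValidatorsHash: []byte("B")}
	fmt.Println(checkHandoff(h10, h11))
}
```

The client never has to interpret application transactions here. It only compares two hashes taken from headers it has already verified.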

This PR includes updates for package "lite" as well.

Also, fix consensus liveness issue.

jaekwon · 2018-06-16T04:29:14Z

consensus/state.go

-				cs.enterNewRound(height, vote.Round+1)
+				cs.enterNewRound(height, vote.Round)
+				cs.enterPrecommit(height, vote.Round)
+				cs.enterPrecommitWait(height, vote.Round)


This could be a separate PR but I made the change here so... LMK if you want me to fork it into a separate PR.

Can you link to the issue? Maybe @milosevic can too.

jaekwon · 2018-06-16T04:30:44Z

consensus/state_test.go

 	propBlock := rs.ProposalBlock

-	<-voteCh // prevote
+	ensureVote(voteCh, h, r, types.VoteTypePrevote)


I only did this for this test case because it's the one that broke (awesome btw, tests are working as expected... pubsub or eventbus needs better debug tooling).

ensureVote etc instead of blindly receiving on the channel will help. For example, it would have helped me tremendously to know that a prevote was expected but a precommit came through.
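As an illustration of why a checked receive helps, here is a self-contained sketch of such a helper. The `Vote` and `VoteType` types, the timeout parameter, and the error-returning signature are assumptions made for this example. The real `ensureVote` in `consensus/state_test.go` may look different.

```go
package main

import (
	"fmt"
	"time"
)

// VoteType and Vote are hypothetical stand-ins for the real types.
type VoteType byte

const (
	VoteTypePrevote   VoteType = 1
	VoteTypePrecommit VoteType = 2
)

type Vote struct {
	Height int64
	Round  int
	Type   VoteType
}

// voteTimeout is an arbitrary value chosen for this sketch.
const voteTimeout = 100 * time.Millisecond

// ensureVote waits for one vote and reports how it differs from the
// expectation, instead of silently consuming whatever arrives.
func ensureVote(voteCh <-chan Vote, height int64, round int, typ VoteType, timeout time.Duration) error {
	select {
	case v := <-voteCh:
		if v.Height != height || v.Round != round || v.Type != typ {
			return fmt.Errorf("expected vote h=%d r=%d type=%d, got h=%d r=%d type=%d",
				height, round, typ, v.Height, v.Round, v.Type)
		}
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("timed out waiting for vote h=%d r=%d type=%d", height, round, typ)
	}
}

func main() {
	voteCh := make(chan Vote, 1)
	// A prevote is expected, but a precommit comes through.
	voteCh <- Vote{Height: 1, Round: 0, Type: VoteTypePrecommit}
	fmt.Println(ensureVote(voteCh, 1, 0, VoteTypePrevote, voteTimeout))
}
```

With a bare `<-voteCh`, the mismatch above would go unnoticed until a later assertion failed for an unrelated-looking reason.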

melekes · 2018-06-16T12:49:34Z

Makefile

 test:
 	@echo "--> Running go test"
-	@go test $(PACKAGES)
+	@GOCACHE=off go test -p 1 $(PACKAGES)


Is there a certain reason we're restricting tests to just 1 CPU?

It doesn't restrict tests to 1 CPU, AFAIK.
It runs one package's tests at a time, which mitigates timeouts caused by too many tests running simultaneously.

e.g. test make100 will succeed (except 1 minor issue) on my mac with this, whereas without, I see all kinds of nondeterministic test failures, which I'm sure we're seeing on Circle as well.

too slow due to too many tests running simultaneously.

I thought by default it's number of cores available https://golang.org/pkg/runtime/#GOMAXPROCS. So if you have 2 cores, it should be 2 tests running simultaneously.

You should be running with more cores than 1... and some tests test multiple reactors etc and have real-time timeouts (for lack of virtual deterministic time) that would be reasonable in a -p 1 scenario (utilizing potentially many cores if needed by the test), but not reasonable on Circle or my mac.

The target for Tendermint is for multicore machines. I see no practical problem with setting -p 1... the tests pass fairly quickly anyways. The deterministic success makes us waste less time trying to figure out why something broke.

The deterministic success makes us waste less time trying to figure out why something broke.

fair argument. thank you

melekes · 2018-06-16T12:55:09Z

consensus/replay.go

+	// If appBlockHeight == 0 it means that we are at genesis and hence should send InitChain.
 	if appBlockHeight == 0 {
-		validators := types.TM2PB.Validators(state.Validators)
+		nvals := types.TM2PB.Validators(state.Validators) // state.Validators would work too.


What does the n in nvals stand for? The first thing that comes to my mind is "number". Maybe we can come up with a better name, like pbVals or just vals? Or we can just inline this code (i.e. no variable at all).

melekes · 2018-06-16T14:32:33Z

lite/client/provider.go

-	if !bytes.Equal(hash, vhash) {
-		return fc, liteErr.ErrCommitNotFound()
+	if maxHeight != 0 && maxHeight < minHeight {
+		err = fmt.Errorf("need maxHeight == 0 or minHeight <= maxHeight, got %v and %v",


got %v and %v" => got min: %v and max: %v"
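A small sketch of the check with the labeled message, so the reader does not have to remember the argument order. The `checkHeightRange` wrapper is hypothetical and exists only to make the snippet runnable.

```go
package main

import "fmt"

// checkHeightRange is a hypothetical helper mirroring the check in the diff.
// maxHeight == 0 means "no upper bound".
func checkHeightRange(minHeight, maxHeight int64) error {
	if maxHeight != 0 && maxHeight < minHeight {
		return fmt.Errorf("need maxHeight == 0 or minHeight <= maxHeight, got min: %v and max: %v",
			minHeight, maxHeight)
	}
	return nil
}

func main() {
	fmt.Println(checkHeightRange(10, 5))
}
```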

melekes · 2018-06-17T08:31:44Z

lite/commit.go

-		return 0
+		SignedHeader:   signedHeader,
+		Validators:     valset,
+		NextValidators: nvalset,


nextValSet. n is too ambiguous

ebuchman

Nice work with the cleanup.

Note there's a number of related issues tagged under 'litecli' that we'll still need to follow up on: https://github.com/tendermint/tendermint/labels/litecli

ebuchman · 2018-06-22T17:26:40Z

consensus/replay.go

+	// If appBlockHeight == 0 it means that we are at genesis and hence should send InitChain.
 	if appBlockHeight == 0 {
-		validators := types.TM2PB.Validators(state.Validators)
+		nextVals := types.TM2PB.Validators(state.Validators) // state.Validators would work too.


state.Validators would work too ?

ebuchman · 2018-06-22T17:28:50Z

consensus/state.go

-				cs.enterNewRound(height, vote.Round+1)
+				cs.enterNewRound(height, vote.Round)
+				cs.enterPrecommit(height, vote.Round)
+				cs.enterPrecommitWait(height, vote.Round)


Can you link to the issue? Maybe @milosevic can too.

ebuchman · 2018-06-23T13:42:37Z

lite/client/provider.go

-	)
-	return fc, nil
+	valset = types.NewValidatorSet(res.Validators)
+	valset.TotalVotingPower() // to test deep equality.


what is this testing?

not used, removing.

ebuchman · 2018-06-23T13:44:32Z

lite/client/provider.go

-	fc.Commit = CommitFromResult(commit)
+// This does no validation.
+func (p *provider) fillFullCommit(signedHeader types.SignedHeader) (fc lite.FullCommit, err error) {
+	fc.SignedHeader = signedHeader


would be clearer if we just use NewFullCommit at the end of the function, and either remove the named return or use it for the error cases

ebuchman · 2018-06-23T13:45:44Z

lite/commit.go

+// the validator set which signed the commit, and the next validator set. The
+// next validator set (which is proven from the block header) allows us to
+// revert to block-by-block updating of lite certifier's latest validator set,
+// even in the face of arbitrarily power changes.


"arbitrarily large"

ebuchman · 2018-06-23T14:18:33Z

lite/inquiring_certifier.go

+		}
+		// Maybe we have nothing to do.
+		if tfc.Height() == h {
+			return FullCommit{}, nil


Empty? The comment says "Returns nil iff we successfully verify and persist a full commit." Shouldn't we return the tfc here?

nil error... and yeah we could return tfc there.

ebuchman · 2018-06-23T14:19:36Z

lite/inquiring_certifier.go

-		return liteErr.ErrNoPathFound()
+
+	// If sfc.Height() != h, we can't do it.
+	if sfc.Height() != h {


This is just a sanity check on LatestFullCommit, right ? since we set min and max to h already there ?

yes. But not so much of a sanity check in the technical sense because it's not insane to assume that the source will give us insane responses.

ebuchman · 2018-06-23T14:21:01Z

lite/inquiring_certifier.go

+			return sfc, nil
+		} else {
+			// Handle special case when err is ErrTooMuchChange.
+			if lerr.IsErrTooMuchChange(err) {


combine it with the else to be an else if and move the other return into an else

ebuchman · 2018-06-23T14:30:12Z

rpc/core/types/responses.go

-	// one level in the json output
-	types.SignedHeader
-	CanonicalCommit bool `json:"canonical"`
+	types.SignedHeader `json:"signed_header"`


NOTE breaking
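To show why the tag is a breaking change for JSON consumers, here is a sketch using the standard `encoding/json` package. This is an assumption for illustration: the RPC layer may use a different encoder, and `SignedHeader` below is a hypothetical one-field stand-in for `types.SignedHeader`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SignedHeader is a hypothetical one-field stand-in for types.SignedHeader.
type SignedHeader struct {
	Height int64 `json:"height"`
}

// Before: an untagged embedded struct is flattened into the parent object.
type resultCommitOld struct {
	SignedHeader
	CanonicalCommit bool `json:"canonical"`
}

// After: a tagged embedded struct is nested under the tag name.
type resultCommitNew struct {
	SignedHeader    `json:"signed_header"`
	CanonicalCommit bool `json:"canonical"`
}

// mustJSON marshals v and panics on error, to keep the example short.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	h := SignedHeader{Height: 5}
	fmt.Println(mustJSON(resultCommitOld{SignedHeader: h, CanonicalCommit: true}))
	// {"height":5,"canonical":true}
	fmt.Println(mustJSON(resultCommitNew{SignedHeader: h, CanonicalCommit: true}))
	// {"signed_header":{"height":5},"canonical":true}
}
```

Clients that read header fields from the top level of the response would need to look under `signed_header` after this change.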

ebuchman · 2018-06-23T14:36:14Z

types/proposal.go

 		Height:           height,
 		Round:            round,
-		Timestamp:        time.Now().UTC(),
+		Timestamp:        time.Now().Round(0).UTC(),


we should introduce a Now() function in this folder that we can call everywhere

ebuchman · 2018-06-23T15:49:40Z

Oh we should also make the linter happy.

Should we retarget this PR against develop? @cwgoes @liamsi what's happening with that branch? Also, I can't run make build-linux on my mac; I get:

# github.com/tendermint/tendermint/vendor/github.com/zondax/ledger-goclient
vendor/github.com/zondax/ledger-goclient/ledger.go:76:18: undefined: hid.Devices
vendor/github.com/zondax/ledger-goclient/ledger.go:99:18: undefined: hid.Devices

cwgoes · 2018-06-25T23:17:24Z

Should we retarget this PR against develop?

Yes; my PR was dropped in favor of #1782 (which has most of the same changes).

The second error should also be fixed when this PR is retargeted; we're moving that Ledger package into the SDK.

melekes · 2018-06-26T07:37:12Z

Superseded by #1815

…ns (tendermint#1756)
* Fixes prepareProposal not to return oversized set of transactions
* Update test/e2e/app/app.go
* Fix linting error
* add changelog entry
* Avoid marshalling the tx twice
* removing unneeded changelog

…ns (backport tendermint#1756) (tendermint#1773)
* [e2e] Fixes prepareProposal not to return oversized set of transactions (tendermint#1756)
* Fixes prepareProposal not to return oversized set of transactions
* Update test/e2e/app/app.go
* Fix linting error
* add changelog entry
* Avoid marshalling the tx twice
* removing unneeded changelog
(cherry picked from commit 0bf3f0a)
# Conflicts:
# test/e2e/app/app.go
* Resolves conflict
---------
Co-authored-by: lasaro <lasaro@informal.systems>

…ns (backport tendermint#1756) (tendermint#1774)
* [e2e] Fixes prepareProposal not to return oversized set of transactions (tendermint#1756)
* Fixes prepareProposal not to return oversized set of transactions
* Update test/e2e/app/app.go
* Fix linting error
* add changelog entry
* Avoid marshalling the tx twice
* removing unneeded changelog
(cherry picked from commit 0bf3f0a)
# Conflicts:
# test/e2e/app/app.go
* Solve conflict
---------
Co-authored-by: lasaro <lasaro@informal.systems>

jaekwon requested review from ebuchman and melekes as code owners June 16, 2018 04:23

jaekwon mentioned this pull request Jun 16, 2018

WIP: Delaying a validator set update by 1 block. #1638

Closed

jaekwon added 2 commits June 15, 2018 21:27

Delay validator set changes by 1 block.

36f4e41

Refactor "lite" to handle delayed validator set changes.

140a3e7

Also, fix consensus liveness issue.

jaekwon force-pushed the jae/literefactor3 branch from d9ecf32 to 140a3e7 Compare June 16, 2018 04:27

jaekwon commented Jun 16, 2018

melekes reviewed Jun 17, 2018

jaekwon added 2 commits June 19, 2018 23:55

Fixes from review

17c2e3c

Garbage collect DBProvider (unoptimized); Certifier creation takes a …

c1e8997

…client

melekes mentioned this pull request Jun 21, 2018

If I wanna to add a validator to a living net in tendermint with four validators, What should I do #1291

Closed

ebuchman reviewed Jun 23, 2018

jaekwon mentioned this pull request Jun 26, 2018

ValidatorSet change delayed by 1 block, and lite refactor (#2) #1815

Merged

melekes closed this Jun 26, 2018

melekes deleted the jae/literefactor3 branch June 26, 2018 07:38
