ValidatorSet change delayed by 1 block, and lite refactor#1756
ValidatorSet change delayed by 1 block, and lite refactor#1756jaekwon wants to merge 4 commits intocwgoes/go-crypto-interface-changesfrom
Conversation
Also, fix consensus liveness issue.
d9ecf32 to
140a3e7
Compare
| cs.enterNewRound(height, vote.Round+1) | ||
| cs.enterNewRound(height, vote.Round) | ||
| cs.enterPrecommit(height, vote.Round) | ||
| cs.enterPrecommitWait(height, vote.Round) |
There was a problem hiding this comment.
This could be a separate PR but I made the change here so... LMK if you want me to fork it into a separate PR.
There was a problem hiding this comment.
Can you link to the issue ? maybe @milosevic can too
| propBlock := rs.ProposalBlock | ||
|
|
||
| <-voteCh // prevote | ||
| ensureVote(voteCh, h, r, types.VoteTypePrevote) |
There was a problem hiding this comment.
I only did this for this test case because it's the one that broke (awesome btw, tests are working as expected... pubsub or eventbus needs better debug tooling).
ensureVote etc instead of blindly receiving on the channel will help. For example, it would have helped me tremendously to know that a prevote was expected but a precommit came through.
| test: | ||
| @echo "--> Running go test" | ||
| @go test $(PACKAGES) | ||
| @GOCACHE=off go test -p 1 $(PACKAGES) |
There was a problem hiding this comment.
Is there a certain reason we're restricting tests to just 1 CPU?
There was a problem hiding this comment.
It doesn't restrict to 1 CPU afaik.
AFAIK it runs one test at a time, which mitigates issues around timeouts due to tests running too slow due to too many tests running simultaneously.
e.g. test make100 will succeed (except 1 minor issue) on my mac with this, whereas without, I see all kinds of nondeterministic test failures, which I'm sure we're seeing on Circle as well.
There was a problem hiding this comment.
too slow due to too many tests running simultaneously.
I thought by default it's number of cores available https://golang.org/pkg/runtime/#GOMAXPROCS. So if you have 2 cores, it should be 2 tests running simultaneously.
There was a problem hiding this comment.
You should be running with more cores than 1... and some tests test multiple reactors etc and have real-time timeouts (for lack of virtual deterministic time) that would be reasonable in a -p 1 scenario (utilizing potentially many cores if needed by the test), but not reasonable on Circle or my mac.
The target for Tendermint is for multicore machines. I see no practical problem with setting -p 1... the tests pass fairly quickly anyways. The deterministic success makes us waste less time trying to figure out why something broke.
There was a problem hiding this comment.
The deterministic success makes us waste less time trying to figure out why something broke.
fair argument. thank you
consensus/replay.go
Outdated
| // If appBlockHeight == 0 it means that we are at genesis and hence should send InitChain. | ||
| if appBlockHeight == 0 { | ||
| validators := types.TM2PB.Validators(state.Validators) | ||
| nvals := types.TM2PB.Validators(state.Validators) // state.Validators would work too. |
There was a problem hiding this comment.
what n in nvals stands for? the first thing coming to my mind is number. maybe we can come up with a better name, like pbVals or just vals? Maybe we can just inline this code (i.e. no variable at all)
lite/client/provider.go
Outdated
| if !bytes.Equal(hash, vhash) { | ||
| return fc, liteErr.ErrCommitNotFound() | ||
| if maxHeight != 0 && maxHeight < minHeight { | ||
| err = fmt.Errorf("need maxHeight == 0 or minHeight <= maxHeight, got %v and %v", |
There was a problem hiding this comment.
got %v and %v" => got min: %v and max: %v"
lite/commit.go
Outdated
| return 0 | ||
| SignedHeader: signedHeader, | ||
| Validators: valset, | ||
| NextValidators: nvalset, |
There was a problem hiding this comment.
nextValSet. n is too ambiguous
ebuchman
left a comment
There was a problem hiding this comment.
Nice work with the cleanup.
Note there's a number of related issues tagged under 'litecli' that we'll still need to follow up on: https://github.com/tendermint/tendermint/labels/litecli
| // If appBlockHeight == 0 it means that we are at genesis and hence should send InitChain. | ||
| if appBlockHeight == 0 { | ||
| validators := types.TM2PB.Validators(state.Validators) | ||
| nextVals := types.TM2PB.Validators(state.Validators) // state.Validators would work too. |
There was a problem hiding this comment.
state.Validators would work too ?
| cs.enterNewRound(height, vote.Round+1) | ||
| cs.enterNewRound(height, vote.Round) | ||
| cs.enterPrecommit(height, vote.Round) | ||
| cs.enterPrecommitWait(height, vote.Round) |
There was a problem hiding this comment.
Can you link to the issue ? maybe @milosevic can too
| ) | ||
| return fc, nil | ||
| valset = types.NewValidatorSet(res.Validators) | ||
| valset.TotalVotingPower() // to test deep equality. |
There was a problem hiding this comment.
not used, removing.
| fc.Commit = CommitFromResult(commit) | ||
| // This does no validation. | ||
| func (p *provider) fillFullCommit(signedHeader types.SignedHeader) (fc lite.FullCommit, err error) { | ||
| fc.SignedHeader = signedHeader |
There was a problem hiding this comment.
would be clearer if we just use NewFullCommit at the end of the function, and either remove the named return or use it for the error cases
| // the validator set which signed the commit, and the next validator set. The | ||
| // next validator set (which is proven from the block header) allows us to | ||
| // revert to block-by-block updating of lite certifier's latest validator set, | ||
| // even in the face of arbitrarily power changes. |
| } | ||
| // Maybe we have nothing to do. | ||
| if tfc.Height() == h { | ||
| return FullCommit{}, nil |
There was a problem hiding this comment.
empty ? the comment says Returns nil iff we successfully verify and persist a full commit. shouldn't we return the tfc here ?
There was a problem hiding this comment.
nil error... and yeah we could return tfc there.
| return liteErr.ErrNoPathFound() | ||
|
|
||
| // If sfc.Height() != h, we can't do it. | ||
| if sfc.Height() != h { |
There was a problem hiding this comment.
This is just a sanity check on LatestFullCommit, right ? since we set min and max to h already there ?
There was a problem hiding this comment.
yes. But not so much of a sanity check in the technical sense because it's not insane to assume that the source will give us insane responses.
| return sfc, nil | ||
| } else { | ||
| // Handle special case when err is ErrTooMuchChange. | ||
| if lerr.IsErrTooMuchChange(err) { |
There was a problem hiding this comment.
combine it with the else to be an else if and move the other return into an else
| // one level in the json output | ||
| types.SignedHeader | ||
| CanonicalCommit bool `json:"canonical"` | ||
| types.SignedHeader `json:"signed_header"` |
| Height: height, | ||
| Round: round, | ||
| Timestamp: time.Now().UTC(), | ||
| Timestamp: time.Now().Round(0).UTC(), |
There was a problem hiding this comment.
we should introduce a Now() function in this folder that we can call everywhere
|
Oh we should also make the linter happy. Should we retarget this PR against develop ? @cwgoes @liamsi whats happening with that branch. Also I can't run |
Yes; my PR was dropped in favor of #1782 (which has most of the same changes). The second error should also be fixed when this PR is retargeted; we're moving that Ledger package into the SDK. |
|
Superseded by #1815 |
…ns (tendermint#1756) * Fixes prepareProposal not to return oversized set of transactions * Update test/e2e/app/app.go * Fix linting error * add changelog entry * Avoid marshalling the tx twice * removing unneeded changelog
…ns (backport tendermint#1756) (tendermint#1773) * [e2e] Fixes prepareProposal not to return oversized set of transactions (tendermint#1756) * Fixes prepareProposal not to return oversized set of transactions * Update test/e2e/app/app.go * Fix linting error * add changelog entry * Avoid marshalling the tx twice * removing unneeded changelog (cherry picked from commit 0bf3f0a) # Conflicts: # test/e2e/app/app.go * Resolves conflict --------- Co-authored-by: lasaro <lasaro@informal.systems>
…ns (backport tendermint#1756) (tendermint#1774) * [e2e] Fixes prepareProposal not to return oversized set of transactions (tendermint#1756) * Fixes prepareProposal not to return oversized set of transactions * Update test/e2e/app/app.go * Fix linting error * add changelog entry * Avoid marshalling the tx twice * removing unneeded changelog (cherry picked from commit 0bf3f0a) # Conflicts: # test/e2e/app/app.go * Solve conflict --------- Co-authored-by: lasaro <lasaro@informal.systems>
The transactions in the LastBlock are sufficient for validators to know what the next validator should be, because it can update the validator set based on the validator updates returned by EndBlock. But light clients don't know that, so if the application wants to change the validator set completely 100%, a general purpose tendermint light client can't validate it because it doesn't know how to interpret the txs which caused the validator changes.
Delaying validator set updates by 1 block allows the next pending validator set to be included in the block header as "NextValidators", so light clients can know how the validator set changed for a blockchain, even with arbitrary validator set changes.
This PR includes updates for package "lite" as well.