cs/replay: execCommitBlock should not read from state.lastValidators#3067
cs/replay: execCommitBlock should not read from state.lastValidators#3067ebuchman merged 54 commits intotendermint:developfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #3067 +/- ##
===========================================
- Coverage 64.44% 64.39% -0.06%
===========================================
Files 213 213
Lines 17467 17507 +40
===========================================
+ Hits 11257 11274 +17
- Misses 5293 5308 +15
- Partials 917 925 +8
|
|
OK, I'll handle the test cases. |
ebuchman
left a comment
There was a problem hiding this comment.
Thanks @james-ray. Sorry for the delays. I think the core changes should be fine. Still a bit confused with the tests, and they can probably be much improved, but it doesn't need to block this.
Though it seems in the meantime the branch got out of sync. Would you mind merging in latest develop and addressing the conflicts? Then we can merge.
We can follow up on fixing the tests in #3600 and on the SaveState stuff in #1029
| cs := newConsensusState(state, privVals[0], NewCounterApplication()) | ||
| blockDB := dbm.NewMemDB() | ||
| cs := newConsensusStateWithConfigAndBlockStore(config, state, privVals[0], NewCounterApplication(), blockDB) | ||
| sm.SaveState(blockDB, state) |
There was a problem hiding this comment.
Why did this need to change at all ?
| cs := newConsensusState(state, privVals[0], app) | ||
| blockDB := dbm.NewMemDB() | ||
| cs := newConsensusStateWithConfigAndBlockStore(config, state, privVals[0], app, blockDB) | ||
| sm.SaveState(blockDB, state) |
There was a problem hiding this comment.
Why did this need to change?
| h.logger.Info("Applying block", "height", i) | ||
| block := h.store.LoadBlock(i) | ||
| appHash, err = sm.ExecCommitBlock(proxyApp.Consensus(), block, h.logger, state.LastValidators, h.stateDB) | ||
| appHash, err = sm.ExecCommitBlock(proxyApp.Consensus(), block, h.logger, h.stateDB) |
There was a problem hiding this comment.
I see. We already know what the validators are supposed to be in the Tendermint state, we just need to fetch that for the BeginBlock info.
| // Make ConsensusState | ||
| stateDB := dbm.NewMemDB() | ||
| stateDB := blockDB | ||
| sm.SaveState(stateDB, state) //for save height 1's validators info |
There was a problem hiding this comment.
not sure I follow. why can't we use stateDB := dbm.NewMemDB() here? Isn't the BlockExecutor supposed to use the stateDB?
| func TestSimulateValidatorsChange(t *testing.T) { | ||
| nPeers := 7 | ||
| nVals := 4 | ||
| css, genDoc, config, cleanup := randConsensusNetWithPeers(nVals, nPeers, "replay_test", newMockTickerFunc(true), newPersistentKVStoreWithPath) |
There was a problem hiding this comment.
No, not hard code, but it should be much easier to build a valid blockchain with the functions. No need to go through the entire consensus process like this!
# Conflicts: # CHANGELOG_PENDING.md # consensus/replay.go # consensus/replay_test.go
|
I see now there are 2 suggestions,
I think they are both good suggestions. Would you mind approve this first? I could work on them at another pr. |
libs/pubsub/pubsub.go
Outdated
| select { | ||
| case subscription.out <- Message{msg, tags}: | ||
| default: | ||
| case <-time.After(2 * time.Second): //give some time for the subscriber to handle |
There was a problem hiding this comment.
Why was this necessary?
I don't think we want this, especially not with a hard coded time. We just refactored this module to not block here.
|
Thank you for updating. I opened those two issues so we can track them without blocking this PR. This PR is basically good to merge, except for the new change to the pubsub. Is it possible to revert that? cc @melekes Also, would be great if you could clarify why the changes were necessary in the mempool_test.go |
|
As for mempool_test, do you mean these code below?
These code serve a similar purpose, to call the SaveState, otherwise I couldn't get stateDB. We could adopt your suggestion, add a function newStateDB(state) to do that, then stateDB would not have to be the same with blockDB. Why I modify pubsub is because TestStateLockPOLRelock in state_test has some probability to fail, you can try it. I debug and found
this code is the reason. It will delete the subscriber of vote event, the ensurePrecommit or ensurePrevote will timeout and fail. I think you cannot ensure the consumer is bound to handle a message prior to the next publish message, so I add a time interval to give it more chance. Or we could enlarge the capacity of the channel. Or We could set the capacity to 0 to make it block there. Anyway just because the consumer doesn't handle previous message in time we do the removal of it is not reasonable. As in the pubsub_test, I see 3 publish and 2 receive, I think the 3rd publish won't cause the state.remove(clientID, qStr, ErrOutOfCapacity) code above, so I add one more publish. |
|
I found an interesting phenomenon:
} This case, if we set voteCh := subscribeUnBuffered(..) , then there is a small probability that the test case will hang there because when the publish is executed, the ensurePrevote hasn't been executed, the publish will be forever blocked. I don't know why, the loop function in pubsub is executed in a go routine, it should not suppose to block the main routine. But if we set voteCh := subscribe(...), then there is also a small probability that the last code validateLastPrecommit(t, cs, vss[0], propBlockHash) could fail, because it may already go through several heights, once a time I ran it, it went to height 4, so this validation failed. |
|
--- FAIL: TestBadBlockStopsPeer (19.83s) Also 2 times the CI error on this. No hint of where or which variable the race is. |
|
|
||
| "github.com/go-kit/kit/log/term" | ||
|
|
||
| "path" |
There was a problem hiding this comment.
| "path" | |
| "filepath" |
path does not work on Windows
| } | ||
|
|
||
| reactors[i] = conRI | ||
| sm.SaveState(css[i].blockExec.DB(), css[i].state) //for save height 1's validators info |
There was a problem hiding this comment.
| sm.SaveState(css[i].blockExec.DB(), css[i].state) //for save height 1's validators info | |
| sm.SaveState(css[i].blockExec.DB(), css[i].state) // save height=1 validators info |
|
|
||
| func subscribeToVoter(cs *ConsensusState, addr []byte) <-chan tmpubsub.Message { | ||
| votesSub, err := cs.eventBus.Subscribe(context.Background(), testSubscriber, types.EventQueryVote) | ||
| votesSub, err := cs.eventBus.SubscribeUnbuffered(context.Background(), testSubscriber, types.EventQueryVote) |
There was a problem hiding this comment.
maybe we should leave a comment why SubscribeUnbuffered must be used here?
|
|
||
| // Make ConsensusState | ||
| stateDB := dbm.NewMemDB() | ||
| stateDB := blockDB |
There was a problem hiding this comment.
this is super confusing. please add a comment
There was a problem hiding this comment.
can't we add yet another argument to newConsensusStateWithConfigAndBlockStore function - stateDB dbm.DB
| // Make ConsensusState | ||
| stateDB := dbm.NewMemDB() | ||
| stateDB := blockDB | ||
| sm.SaveState(stateDB, state) //for save height 1's validators info |
There was a problem hiding this comment.
| sm.SaveState(stateDB, state) //for save height 1's validators info | |
| sm.SaveState(stateDB, state) // save height=1 validators info |
| } | ||
| stateDB := db.NewMemDB() | ||
| blockStoreDB := db.NewMemDB() | ||
| stateDB := blockStoreDB |
There was a problem hiding this comment.
this is super confusing. please add a comment at the very least
| assertReceive(t, "Quicksilver", subscription.Out()) | ||
| assertCancelled(t, subscription, pubsub.ErrOutOfCapacity) | ||
| case <-time.After(100 * time.Millisecond): | ||
| case <-time.After(3 * time.Second): |
| case <-published: | ||
| t.Fatal("Expected Publish(Darkhawk) to block") | ||
| case <-time.After(100 * time.Millisecond): | ||
| case <-time.After(3 * time.Second): |
| logger := log.NewNopLogger() | ||
| stateDB := dbm.NewMemDB() | ||
| blockDB := dbm.NewMemDB() | ||
| stateDB := blockDB |
There was a problem hiding this comment.
this is super confusing. please add a comment at the very least
|
|
||
| //------------------------------------------------------------------------------------------ | ||
| // Handshake Tests | ||
| type testSim struct { |
…endermint#3067) * execCommitBlock should not read from state.lastValidators * fix height 1 * fix blockchain/reactor_test * fix consensus/mempool_test * fix consensus/reactor_test * fix consensus/replay_test * add CHANGELOG * fix consensus/reactor_test * fix consensus/replay_test * add a test for replay validators change * fix mem_pool test * fix byzantine test * remove a redundant code * reduce validator change blocks to 6 * fix * return peer0 config * seperate testName * seperate testName 1 * seperate testName 2 * seperate app db path * seperate app db path 1 * add a lock before startNet * move the lock to reactor_test * simulate just once * try to find problem * handshake only saveState when app version changed * update gometalinter to 3.0.0 (tendermint#3233) in the attempt to fix https://circleci.com/gh/tendermint/tendermint/43165 also code is simplified by running gofmt -s . remove unused vars enable linters we're currently passing remove deprecated linters (cherry picked from commit d470945) * gofmt code * goimport code * change the bool name to testValidatorsChange * adjust receive kvstore.ProtocolVersion * adjust receive kvstore.ProtocolVersion 1 * adjust receive kvstore.ProtocolVersion 3 * fix merge execution.go * fix merge develop * fix merge develop 1 * fix run cleanupFunc * adjust code according to reviewers' opinion * modify the func name match the convention * simplify simulate a chain containing some validator change txs 1 * test CI error * Merge remote-tracking branch 'upstream/develop' into fixReplay 1 * fix pubsub_test * subscribeUnbuffered vote channel
…endermint#3067) * execCommitBlock should not read from state.lastValidators * fix height 1 * fix blockchain/reactor_test * fix consensus/mempool_test * fix consensus/reactor_test * fix consensus/replay_test * add CHANGELOG * fix consensus/reactor_test * fix consensus/replay_test * add a test for replay validators change * fix mem_pool test * fix byzantine test * remove a redundant code * reduce validator change blocks to 6 * fix * return peer0 config * seperate testName * seperate testName 1 * seperate testName 2 * seperate app db path * seperate app db path 1 * add a lock before startNet * move the lock to reactor_test * simulate just once * try to find problem * handshake only saveState when app version changed * update gometalinter to 3.0.0 (tendermint#3233) in the attempt to fix https://circleci.com/gh/tendermint/tendermint/43165 also code is simplified by running gofmt -s . remove unused vars enable linters we're currently passing remove deprecated linters (cherry picked from commit d470945) * gofmt code * goimport code * change the bool name to testValidatorsChange * adjust receive kvstore.ProtocolVersion * adjust receive kvstore.ProtocolVersion 1 * adjust receive kvstore.ProtocolVersion 3 * fix merge execution.go * fix merge develop * fix merge develop 1 * fix run cleanupFunc * adjust code according to reviewers' opinion * modify the func name match the convention * simplify simulate a chain containing some validator change txs 1 * test CI error * Merge remote-tracking branch 'upstream/develop' into fixReplay 1 * fix pubsub_test * subscribeUnbuffered vote channel
…endermint#3067) * execCommitBlock should not read from state.lastValidators * fix height 1 * fix blockchain/reactor_test * fix consensus/mempool_test * fix consensus/reactor_test * fix consensus/replay_test * add CHANGELOG * fix consensus/reactor_test * fix consensus/replay_test * add a test for replay validators change * fix mem_pool test * fix byzantine test * remove a redundant code * reduce validator change blocks to 6 * fix * return peer0 config * seperate testName * seperate testName 1 * seperate testName 2 * seperate app db path * seperate app db path 1 * add a lock before startNet * move the lock to reactor_test * simulate just once * try to find problem * handshake only saveState when app version changed * update gometalinter to 3.0.0 (tendermint#3233) in the attempt to fix https://circleci.com/gh/tendermint/tendermint/43165 also code is simplified by running gofmt -s . remove unused vars enable linters we're currently passing remove deprecated linters (cherry picked from commit d470945) * gofmt code * goimport code * change the bool name to testValidatorsChange * adjust receive kvstore.ProtocolVersion * adjust receive kvstore.ProtocolVersion 1 * adjust receive kvstore.ProtocolVersion 3 * fix merge execution.go * fix merge develop * fix merge develop 1 * fix run cleanupFunc * adjust code according to reviewers' opinion * modify the func name match the convention * simplify simulate a chain containing some validator change txs 1 * test CI error * Merge remote-tracking branch 'upstream/develop' into fixReplay 1 * fix pubsub_test * subscribeUnbuffered vote channel
Fixes #3054