Conversation
consensus/state.go
Outdated
| case *BlockPartMessage: | ||
| // if the proposal is complete, we'll enterPrevote or tryFinalizeCommit | ||
| _, err = cs.addProposalBlockPart(msg, peerID) | ||
| // TODO: At this point if added is true we have added new and useful block part so we can mark it in peer statistics. |
There was a problem hiding this comment.
To be able to update statistics at this point we need access to PeerState. One way to achieve this would be to pass Peer instead of p2p.ID in msgInfo. We still have issue how to call MarkGood method of AddrBook from ConsensusState. One option would be having a channel that is read by Switch in which we can write MarkGood messages. Channel can be obtained either through Peer or directly from ConsensusState. Any other idea?
consensus/state.go
Outdated
| // Attempt to add the vote. if its a duplicate signature, dupeout the validator | ||
| func (cs *ConsensusState) tryAddVote(vote *types.Vote, peerID p2p.ID) error { | ||
| _, err := cs.addVote(vote, peerID) | ||
| // TODO: at this point we should update statistics if addVote return value added = true. Maybe tryAddVote can return |
There was a problem hiding this comment.
Similar as above.
072ffcf to
845c367
Compare
845c367 to
98e1be5
Compare
Codecov Report
@@ Coverage Diff @@
## develop #2388 +/- ##
===========================================
- Coverage 61.72% 61.29% -0.43%
===========================================
Files 198 198
Lines 16292 16403 +111
===========================================
- Hits 10057 10055 -2
- Misses 5396 5515 +119
+ Partials 839 833 -6
|
xla
left a comment
There was a problem hiding this comment.
Really like the approach taken for the refactoring, Some inline comments that be good to see addressed.
consensus/state.go
Outdated
| // msgs sent to reactor to compute statistics on peer activity | ||
| type peerInfo struct { | ||
| MsgType MsgType `json:"msg_type"` | ||
| PeerID p2p.ID `json:"peer_key"` |
There was a problem hiding this comment.
Is this struct every serialised? If not we should omit struct tags as it indicates that.
consensus/reactor.go
Outdated
| case msg := <-conR.conS.statsMsgQueue: | ||
| // Get peer | ||
| peer := conR.Switch.Peers().Get(msg.PeerID) | ||
| if peer != nil { |
There was a problem hiding this comment.
What if the peer can't be found?
There was a problem hiding this comment.
Should we reset peer stats on disconnect?
There was a problem hiding this comment.
Good point. It could happen that at the point peerInfo arrives the peer is removed; I guess we can't do much in that case.
There was a problem hiding this comment.
I don't think we should reset in on disconnect as this is more consensus level stats. It does not care if underlying connection is temporarily broken or not; this could be important metric for peer quality but it's different concern.
There was a problem hiding this comment.
We should reverse the condition so we minimize the amount of code in the conditional. ie
if peer == nil {
log.Debug("Attempt to update stats for non-existent peer", "peer", peerID)
continue
}
consensus/reactor.go
Outdated
| conR.Switch.MarkPeerAsGood(peer) | ||
| } | ||
| case BlockPart: | ||
| if numParts := ps.RecordBlockPart(); numParts%blocksToContributeToBecomeGoodPeer == 0 { |
There was a problem hiding this comment.
Is parts equivalent to blocks? The constant indicates we expect a certain amount of blocks.
There was a problem hiding this comment.
Good point. I was also confused with this. My understanding is that a peer can send several parts for the same block and we will count all parts. @ebuchman Could you provide input here?
There was a problem hiding this comment.
Great point - it's about parts! At least in the current design, if the same peer sends us many unique parts for the same block, we count all of them. Could be good to fix the var name
consensus/reactor.go
Outdated
| ps := peer.Get(types.PeerStateKey).(*PeerState) | ||
| switch msg.MsgType { | ||
| case Vote: | ||
| if numVotes := ps.RecordVote(); numVotes%votesToContributeToBecomeGoodPeer == 0 { |
There was a problem hiding this comment.
Why modulo and not check if the number of votes is greater than the constant we check against?
There was a problem hiding this comment.
Because we want to mark peer as good every 10000 blocks/votes.
There was a problem hiding this comment.
And also because we have a sense of overall peer contribution. This info could potentially be used break ties between two peers in case we need to choose. If we reset every 10k we lose this information.
consensus/reactor.go
Outdated
| // It returns the total number of votes (1 per block). This essentially means | ||
| // the number of blocks for which peer has been sending us votes. | ||
| func (ps *PeerState) RecordVote(vote *types.Vote) int { | ||
| // RecordVote increments internal votes related statistics for this peer. It returns the total number of added votes. |
There was a problem hiding this comment.
Can we break long lines by the 80 char limit, please.
There was a problem hiding this comment.
Do we have guideline regarding line limit? 80 is maybe too small in my view. I though that it's 90 in my editor but it was actually 120. I normally prefer 90 but we are not consistent in the code regarding this. Would be happy to have strict guideline on this.
| "testing" | ||
| "time" | ||
|
|
||
| "github.com/stretchr/testify/require" |
There was a problem hiding this comment.
Imports should be separated in three different groups: stdlib, external packages, internal packages.
consensus/state_test.go
Outdated
| select { | ||
| case <-cs.statsMsgQueue: | ||
| t.Errorf("Should not output stats message after receiving the known block part!") | ||
| case <-time.After(1 * time.Second): |
There was a problem hiding this comment.
Can we lower this timeout? We already suffer from high timeouts in our test suite and we should be adamant to keep the test timings as tight as possible.
consensus/state_test.go
Outdated
| select { | ||
| case <-cs.statsMsgQueue: | ||
| t.Errorf("Should not output stats message after receiving the known vote or vote from bigger height") | ||
| case <-time.After(1 * time.Second): |
consensus/state.go
Outdated
| // Attempt to add the vote. if its a duplicate signature, dupeout the validator | ||
| func (cs *ConsensusState) tryAddVote(vote *types.Vote, peerID p2p.ID) error { | ||
| _, err := cs.addVote(vote, peerID) | ||
| func (cs *ConsensusState) tryAddVote(vote *types.Vote, peerID p2p.ID) (added bool, err error) { |
There was a problem hiding this comment.
Can we avoid named return values if possible? It doesn't seem to be needed here.
There was a problem hiding this comment.
Sure. Is there a best practice about using versus not named return values?
consensus/state.go
Outdated
| type MsgType int | ||
|
|
||
| const ( | ||
| Vote MsgType = iota |
There was a problem hiding this comment.
A defensive mechanism which helps increae the usability of "enums" in go is to either start of with iota+1 or have the first value be the an unkown. Reasoning being that the default value for int which is the underlying type is 0, which could lead to accidental use as Vote. Also those consts should be prefixed as such for readibilty and to reduce the likelihood of collision with other types, my proposal would be:
type MsgType int
const (
MsgTypeUnknown MsgType = iota
MsgTypeVote
MsgTypeBlockPart
)This would create symmetry with the solution to the String method on the same type you provided.
There was a problem hiding this comment.
Thanks a lot for this comment.
| for { | ||
| // Manage disconnects from self or peer. | ||
| if !conR.IsRunning() { | ||
| conR.Logger.Info("Stopping peerStatsRoutine") |
There was a problem hiding this comment.
Think this needs to be Debug or removed completely.
There was a problem hiding this comment.
Why's that ? Other "Stopping" logs are Info. This should only happen once, during shutdown
There was a problem hiding this comment.
What purpose does this line serve? If that is to inform developer (validators shouldn't care about it) about a successful exit, then I'd prefer to have leaktest to make sure go-routine exits instead of relying on human eye post-failure.
consensus/reactor.go
Outdated
| case msg := <-conR.conS.statsMsgQueue: | ||
| // Get peer | ||
| peer := conR.Switch.Peers().Get(msg.PeerID) | ||
| if peer != nil { |
There was a problem hiding this comment.
Should we reset peer stats on disconnect?
consensus/reactor.go
Outdated
| ps := peer.Get(types.PeerStateKey).(*PeerState) | ||
| switch msg.MsgType { | ||
| case Vote: | ||
| if numVotes := ps.RecordVote(); numVotes%votesToContributeToBecomeGoodPeer == 0 { |
There was a problem hiding this comment.
Because we want to mark peer as good every 10000 blocks/votes.
ebuchman
left a comment
There was a problem hiding this comment.
Nice work! LGTM but a few minor nits to address. Especially I think the new MsgType/peerInfo stuff is redundant with the existing msgInfo
consensus/reactor.go
Outdated
| case msg := <-conR.conS.statsMsgQueue: | ||
| // Get peer | ||
| peer := conR.Switch.Peers().Get(msg.PeerID) | ||
| if peer != nil { |
There was a problem hiding this comment.
We should reverse the condition so we minimize the amount of code in the conditional. ie
if peer == nil {
log.Debug("Attempt to update stats for non-existent peer", "peer", peerID)
continue
}
consensus/reactor.go
Outdated
|
|
||
| func (conR *ConsensusReactor) peerStatsRoutine() { | ||
| for { | ||
| // Manage disconnects from self or peer. |
There was a problem hiding this comment.
comment is incorrect for this routine - it applies to the per-peer routines but this routine is global to all peers, so this is about shutting down the whole system.
| for { | ||
| // Manage disconnects from self or peer. | ||
| if !conR.IsRunning() { | ||
| conR.Logger.Info("Stopping peerStatsRoutine") |
There was a problem hiding this comment.
Why's that ? Other "Stopping" logs are Info. This should only happen once, during shutdown
| } | ||
| } | ||
| } | ||
| case <-conR.conS.Quit(): |
There was a problem hiding this comment.
do we also need <-conR.Quit ? I don't think there's a way for the reactor to exit without the state exiting, but just noting.
There was a problem hiding this comment.
Was thinking the same. Probably not needed, but added <-conR.Quit to be more defensive.
| // RecordVote increments internal votes related statistics for this peer. | ||
| // It returns the total number of added votes. | ||
| func (ps *PeerState) RecordVote() int { | ||
| ps.mtx.Lock() |
There was a problem hiding this comment.
I was going to suggest we could remove these locks since we only access the Stats synchronously, but that's not true because they're included in an RPC output. Cool.
consensus/state.go
Outdated
| return fmt.Sprintf("%v ; %d/%d %v", ti.Duration, ti.Height, ti.Round, ti.Step) | ||
| } | ||
|
|
||
| type MsgType int |
There was a problem hiding this comment.
Maybe we can just use the existing msgInfo instead of introducing these new types?
There was a problem hiding this comment.
Good point. I will remove peerInfo for msgInfo.
consensus/state.go
Outdated
| timeoutTicker TimeoutTicker | ||
|
|
||
| // information about about added votes and block parts are written on this channel | ||
| // so statistics can be computed by switch |
consensus/state.go
Outdated
| _, err = cs.addProposalBlockPart(msg, peerID) | ||
| added, err := cs.addProposalBlockPart(msg, peerID) | ||
| if added { | ||
| cs.statsMsgQueue <- peerInfo{MsgTypeBlockPart, peerID} |
There was a problem hiding this comment.
note its not safe to do this on the internalMsgQueue because reading off it happens synchronously with execution here, so we could have deadlocks (hence https://github.com/tendermint/tendermint/blob/develop/consensus/state.go#L437). but since this channel has its own independent routine that doesnt involve logic over here, this won't cause deadlock.
| } | ||
|
|
||
| // Test we record block parts from other peers | ||
| func TestReactorRecordsBlockParts(t *testing.T) { |
There was a problem hiding this comment.
Is there a way to bring back the test in a simpler version like for the Votes ?
There was a problem hiding this comment.
It's very tricky in general to write tests for reactor side as I haven't found a simple way to add a peer so we can check if posting a message by a peer on statsMsgQueue trigger stats update. I will extend this test to include similar check for block part stats, but in general don't like this test.
consensus/reactor.go
Outdated
| func (conR *ConsensusReactor) OnStart() error { | ||
| conR.Logger.Info("ConsensusReactor ", "fastSync", conR.FastSync()) | ||
|
|
||
| // start routine that computes peer statistics |
There was a problem hiding this comment.
add for evaluating peer quality
9bdaabb to
36b6087
Compare
Refs #2048