make persistent prop independent of conn direction#3593
Conversation
Previously only outbound peers can be persistent. Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost, Tendermint will try to reconnect. Plus, seed won't disconnect from inbound peer if it's marked as persistent. Fixes #3362
Codecov Report
@@ Coverage Diff @@
## develop #3593 +/- ##
===========================================
+ Coverage 64.25% 64.59% +0.33%
===========================================
Files 213 213
Lines 17463 17507 +44
===========================================
+ Hits 11221 11308 +87
+ Misses 5308 5255 -53
- Partials 934 944 +10
|
liamsi
left a comment
There was a problem hiding this comment.
Just a comment about the interface method that returns []error. Will do a 2nd pass later today.
There was a problem hiding this comment.
Looks great! Left a few nits and merge conflicts need resolving.
Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.
Any reasons why this behaviour could be undesired?
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
No. malicious node could report different address (say address our node has in |
…SeedMode We don't need it actually.
ebuchman
left a comment
There was a problem hiding this comment.
Really nice change, thanks for this Anton, but seems we need a few fixes.
| } | ||
| } | ||
| // parsing errors are handled above by AddPersistentPeers | ||
| _ = n.sw.DialPeersAsync(splitAndTrimEmpty(n.config.P2P.PersistentPeers, ",", " ")) |
There was a problem hiding this comment.
We should still handle this anyways for completeness, no?
Also is there a way we can avoid having to use the splitAndTrimEmpty again?
There was a problem hiding this comment.
We should still handle this anyways for completeness, no?
ok
Also is there a way we can avoid having to use the splitAndTrimEmpty again?
A bit annoying, but I think we can live with it for now.
| addr, err = peer.NodeInfo().NetAddress() | ||
| if err != nil { | ||
| sw.Logger.Error("Wanted to reconnect to inbound peer, but self-reported address is wrong", | ||
| "peer", peer, "addr", addr, "err", err) |
There was a problem hiding this comment.
Won't we have addr = nil here?
| // Used to dial peers from config on startup or from unsafe-RPC (trusted sources). | ||
| // TODO: remove addrBook arg since it's now set on the switch | ||
| func (sw *Switch) DialPeersAsync(addrBook AddrBook, peers []string, persistent bool) error { | ||
| // It ignores ErrNetAddressLookup. However, if there are other errors, first |
There was a problem hiding this comment.
Imagine you're starting a k8s cluster and DNS does not work yet => lookup fails => Tendermint fails to start.
If we ignore ErrNetAddressLookup, we won't be able to dial peers immediately, but reconnect will connect us to them eventually (only to persistent peers though).
Maybe we should add a flag ignoreNetAddressLookupErrors
There was a problem hiding this comment.
but reconnect will connect us to them eventually (only to persistent peers though).
this won't work in the current implementation because we do not try to reconnect. hmmm
There was a problem hiding this comment.
I think NetAddress should resolve IP right before dialing + resolve it again when IP is no longer reachable.
| err := p2pPeers.DialPeersAsync(addrBook, peers, persistent) | ||
| if err != nil { | ||
| logger.Info("DialPeers", "peers", peers, "persistent", persistent) | ||
| if err := p2pPeers.AddPersistentPeers(peers); err != nil { |
There was a problem hiding this comment.
Shouldn't this only run if persistent == true ?
| return &ctypes.ResultDialPeers{}, err | ||
| } | ||
| // parsing errors are handled above by AddPersistentPeers | ||
| _ = p2pPeers.DialPeersAsync(peers) |
There was a problem hiding this comment.
We should still handle the error for completeness. And if persistent==false then the AddPersistentPeers shouldn't run and we'll need to check the errors here anyways.
|
|
||
| // 2. attemptDisconnects should not disconnect because the peer is persistent | ||
| pexR.attemptDisconnects() | ||
| assert.Equal(t, 1, sw.Peers().Size()) |
There was a problem hiding this comment.
This passes even if we comment out the AddPersistenPeers line. I think the attemptDisconnects isn't causing a disconnect because of the timing constraints. If we sleep before it, then it fails without the AddPersistentPeers, but maybe there's a better way ?
| // return first non-ErrNetAddressLookup error | ||
| for _, err := range errs { | ||
| if _, ok := err.(ErrNetAddressLookup); ok { | ||
| continue |
There was a problem hiding this comment.
Doesn't this mean we might include nil in the list? What is the value of ignoring this error here if we don't even get a netAddr out of it?
There was a problem hiding this comment.
I'll need to think about it. See my other comment #3593 (comment)
There was a problem hiding this comment.
Doesn't this mean we might include nil in the list?
no
https://github.com/tendermint/tendermint/blob/develop/p2p/netaddress.go#L134-L138
| return nil | ||
| } | ||
|
|
||
| func (sw *Switch) isPeerPersistentFn() func(*NetAddress) bool { |
There was a problem hiding this comment.
I wonder if this should just check the ID instead. Since for instance if there's DNS involved the IP address could change between when it got added to persistent peers and when we connected to it and then this would fail.
There was a problem hiding this comment.
Also why is this a closure?
There was a problem hiding this comment.
I wonder if this should just check the ID instead. Since for instance if there's DNS involved the IP address could change between when it got added to persistent peers and when we connected to it and then this would fail.
This deserves a separate issue. One argument is favor of IP addresses is that it's more static. Maybe there are some DNS attacks we're not considering.
There was a problem hiding this comment.
Also why is this a closure?
I thought it would be weird to have persistent_peers list in peerConfig.
| require.NotNil(sw.Peers().Get(rp.ID())) | ||
| err = sw.DialPeerWithAddress(rp.Addr()) | ||
| require.Nil(t, err) | ||
| time.Sleep(50 * time.Millisecond) |
There was a problem hiding this comment.
Can we do better than a sleep here?
| err = sw.addOutboundPeerWithConfig(rp.Addr(), conf, true) | ||
| require.NotNil(err) | ||
|
|
||
| err = sw.addOutboundPeerWithConfig(rp.Addr(), conf) |
There was a problem hiding this comment.
don't we need to make this peer persistent to preserve the test case?
There was a problem hiding this comment.
No. Because we'll reconnect anyway if TestDialFail is set
Line 653 in 5051a1f
I think we can address this by just using IDs for persistent peers. |
also - handle errors from DialPeersAsync - remove nil addr from log msg - fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode This is a follow-up from #3593 (review)
## Description
also
handle errors from DialPeersAsync
remove nil addr from log msg
fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
This is a follow-up from
#3593 (review)
Fixes most of the #3617, except #3593 (comment)
## Commits
* rpc: /dial_peers: only mark peers as persistent if flag is on
also
- handle errors from DialPeersAsync
- remove nil addr from log msg
- fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
This is a follow-up from
#3593 (review)
* remove a call to AddPersistentPeers
TestDialFail will trigger a reconnect
) ## Description Previously only outbound peers can be persistent. Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost, Tendermint will try to reconnect. This part is actually optional and can be reverted. Plus, seed won't disconnect from inbound peer if it's marked as persistent. Fixes tendermint#3362 ## Commits * make persistent prop independent of conn direction Previously only outbound peers can be persistent. Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost, Tendermint will try to reconnect. Plus, seed won't disconnect from inbound peer if it's marked as persistent. Fixes tendermint#3362 * fix TestPEXReactorDialPeer test * add a changelog entry * update changelog * add two tests * reformat code * test UnsafeDialPeers and UnsafeDialSeeds * add TestSwitchDialPeersAsync * spec: update p2p/config spec * fixes after Ismail's review * Apply suggestions from code review Co-Authored-By: melekes <anton.kalyaev@gmail.com> * fix merge conflict * remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode We don't need it actually.
) ## Description Previously only outbound peers can be persistent. Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost, Tendermint will try to reconnect. This part is actually optional and can be reverted. Plus, seed won't disconnect from inbound peer if it's marked as persistent. Fixes tendermint#3362 ## Commits * make persistent prop independent of conn direction Previously only outbound peers can be persistent. Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost, Tendermint will try to reconnect. Plus, seed won't disconnect from inbound peer if it's marked as persistent. Fixes tendermint#3362 * fix TestPEXReactorDialPeer test * add a changelog entry * update changelog * add two tests * reformat code * test UnsafeDialPeers and UnsafeDialSeeds * add TestSwitchDialPeersAsync * spec: update p2p/config spec * fixes after Ismail's review * Apply suggestions from code review Co-Authored-By: melekes <anton.kalyaev@gmail.com> * fix merge conflict * remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode We don't need it actually.
…mint#3620) ## Description also handle errors from DialPeersAsync remove nil addr from log msg fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode This is a follow-up from tendermint#3593 (review) Fixes most of the tendermint#3617, except tendermint#3593 (comment) ## Commits * rpc: /dial_peers: only mark peers as persistent if flag is on also - handle errors from DialPeersAsync - remove nil addr from log msg - fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode This is a follow-up from tendermint#3593 (review) * remove a call to AddPersistentPeers TestDialFail will trigger a reconnect
) ## Description Previously only outbound peers can be persistent. Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost, Tendermint will try to reconnect. This part is actually optional and can be reverted. Plus, seed won't disconnect from inbound peer if it's marked as persistent. Fixes tendermint#3362 ## Commits * make persistent prop independent of conn direction Previously only outbound peers can be persistent. Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost, Tendermint will try to reconnect. Plus, seed won't disconnect from inbound peer if it's marked as persistent. Fixes tendermint#3362 * fix TestPEXReactorDialPeer test * add a changelog entry * update changelog * add two tests * reformat code * test UnsafeDialPeers and UnsafeDialSeeds * add TestSwitchDialPeersAsync * spec: update p2p/config spec * fixes after Ismail's review * Apply suggestions from code review Co-Authored-By: melekes <anton.kalyaev@gmail.com> * fix merge conflict * remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode We don't need it actually.
Previously only outbound peers can be persistent.
Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.
Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes #3362