Skip to content

make persistent prop independent of conn direction#3593

Merged
melekes merged 15 commits intodevelopfrom
3362-persistent-inbound-peers
May 3, 2019
Merged

make persistent prop independent of conn direction#3593
melekes merged 15 commits intodevelopfrom
3362-persistent-inbound-peers

Conversation

@melekes
Copy link
Contributor

@melekes melekes commented Apr 25, 2019

Previously only outbound peers can be persistent.

Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes #3362

  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG_PENDING.md

melekes added 2 commits April 25, 2019 14:23
Previously only outbound peers can be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes #3362
@melekes melekes self-assigned this Apr 25, 2019
@codecov-io
Copy link

codecov-io commented Apr 25, 2019

Codecov Report

Merging #3593 into develop will increase coverage by 0.33%.
The diff coverage is 75.64%.

@@             Coverage Diff             @@
##           develop    #3593      +/-   ##
===========================================
+ Coverage    64.25%   64.59%   +0.33%     
===========================================
  Files          213      213              
  Lines        17463    17507      +44     
===========================================
+ Hits         11221    11308      +87     
+ Misses        5308     5255      -53     
- Partials       934      944      +10
Impacted Files Coverage Δ
rpc/core/pipe.go 28.84% <ø> (ø) ⬆️
rpc/core/net.go 41.66% <100%> (+41.66%) ⬆️
p2p/pex/pex_reactor.go 81.6% <100%> (-1.79%) ⬇️
p2p/transport.go 85.3% <100%> (+0.49%) ⬆️
node/node.go 64.44% <50%> (+0.37%) ⬆️
p2p/switch.go 68.73% <70.68%> (+4.66%) ⬆️
libs/db/debug_db.go 15.78% <0%> (-4.53%) ⬇️
libs/clist/clist.go 66.66% <0%> (-1.52%) ⬇️
state/errors.go 0% <0%> (ø) ⬆️
... and 9 more

@melekes melekes marked this pull request as ready for review April 25, 2019 13:00
@melekes melekes requested review from ebuchman and xla as code owners April 25, 2019 13:00
Copy link
Contributor

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment about the interface method that returns []error. Will do a 2nd pass later today.

Copy link
Contributor

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left a few nits and merge conflicts need resolving.

Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.

Any reasons why this behaviour could be undesired?

Co-Authored-By: melekes <anton.kalyaev@gmail.com>
@melekes
Copy link
Contributor Author

melekes commented Apr 30, 2019

Any reasons why this behaviour could be undesired?

No.

malicious node could report different address (say address our node has in persistent_peers), but this won't give it any advantages.

@melekes melekes merged commit 8711af6 into develop May 3, 2019
@melekes melekes deleted the 3362-persistent-inbound-peers branch May 3, 2019 13:22
Copy link
Contributor

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice change, thanks for this Anton, but seems we need a few fixes.

}
}
// parsing errors are handled above by AddPersistentPeers
_ = n.sw.DialPeersAsync(splitAndTrimEmpty(n.config.P2P.PersistentPeers, ",", " "))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still handle this anyways for completeness, no?

Also is there a way we can avoid having to use the splitAndTrimEmpty again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still handle this anyways for completeness, no?

ok

Also is there a way we can avoid having to use the splitAndTrimEmpty again?

A bit annoying, but I think we can live with it for now.

addr, err = peer.NodeInfo().NetAddress()
if err != nil {
sw.Logger.Error("Wanted to reconnect to inbound peer, but self-reported address is wrong",
"peer", peer, "addr", addr, "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't we have addr = nil here?

// Used to dial peers from config on startup or from unsafe-RPC (trusted sources).
// TODO: remove addrBook arg since it's now set on the switch
func (sw *Switch) DialPeersAsync(addrBook AddrBook, peers []string, persistent bool) error {
// It ignores ErrNetAddressLookup. However, if there are other errors, first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we ignore these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine you're starting a k8s cluster and DNS does not work yet => lookup fails => Tendermint fails to start.

If we ignore ErrNetAddressLookup, we won't be able to dial peers immediately, but reconnect will connect us to them eventually (only to persistent peers though).

Maybe we should add a flag ignoreNetAddressLookupErrors

Copy link
Contributor Author

@melekes melekes May 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but reconnect will connect us to them eventually (only to persistent peers though).

this won't work in the current implementation because we do not try to reconnect. hmmm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NetAddress should resolve IP right before dialing + resolve it again when IP is no longer reachable.

err := p2pPeers.DialPeersAsync(addrBook, peers, persistent)
if err != nil {
logger.Info("DialPeers", "peers", peers, "persistent", persistent)
if err := p2pPeers.AddPersistentPeers(peers); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this only run if persistent == true ?

return &ctypes.ResultDialPeers{}, err
}
// parsing errors are handled above by AddPersistentPeers
_ = p2pPeers.DialPeersAsync(peers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still handle the error for completeness. And if persistent==false then the AddPersistentPeers shouldn't run and we'll need to check the errors here anyways.


// 2. attemptDisconnects should not disconnect because the peer is persistent
pexR.attemptDisconnects()
assert.Equal(t, 1, sw.Peers().Size())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passes even if we comment out the AddPersistenPeers line. I think the attemptDisconnects isn't causing a disconnect because of the timing constraints. If we sleep before it, then it fails without the AddPersistentPeers, but maybe there's a better way ?

// return first non-ErrNetAddressLookup error
for _, err := range errs {
if _, ok := err.(ErrNetAddressLookup); ok {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean we might include nil in the list? What is the value of ignoring this error here if we don't even get a netAddr out of it?

Copy link
Contributor Author

@melekes melekes May 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to think about it. See my other comment #3593 (comment)

Copy link
Contributor Author

@melekes melekes May 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean we might include nil in the list?

no

https://github.com/tendermint/tendermint/blob/develop/p2p/netaddress.go#L134-L138

return nil
}

func (sw *Switch) isPeerPersistentFn() func(*NetAddress) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should just check the ID instead. Since for instance if there's DNS involved the IP address could change between when it got added to persistent peers and when we connected to it and then this would fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why is this a closure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should just check the ID instead. Since for instance if there's DNS involved the IP address could change between when it got added to persistent peers and when we connected to it and then this would fail.

This deserves a separate issue. One argument is favor of IP addresses is that it's more static. Maybe there are some DNS attacks we're not considering.

Copy link
Contributor Author

@melekes melekes May 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why is this a closure?

I thought it would be weird to have persistent_peers list in peerConfig.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require.NotNil(sw.Peers().Get(rp.ID()))
err = sw.DialPeerWithAddress(rp.Addr())
require.Nil(t, err)
time.Sleep(50 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do better than a sleep here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it

err = sw.addOutboundPeerWithConfig(rp.Addr(), conf, true)
require.NotNil(err)

err = sw.addOutboundPeerWithConfig(rp.Addr(), conf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need to make this peer persistent to preserve the test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Because we'll reconnect anyway if TestDialFail is set

go sw.reconnectToPeer(addr)

@ebuchman
Copy link
Contributor

ebuchman commented May 3, 2019

malicious node could report different address (say address our node has in persistent_peers), but this won't give it any advantages.

I think we can address this by just using IDs for persistent peers.

melekes added a commit that referenced this pull request May 4, 2019
also

- handle errors from DialPeersAsync
- remove nil addr from log msg
- fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

This is a follow-up from
#3593 (review)
melekes added a commit that referenced this pull request May 7, 2019
## Description

also

    handle errors from DialPeersAsync
    remove nil addr from log msg
    fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

This is a follow-up from
#3593 (review)

Fixes most of the #3617, except #3593 (comment)

## Commits

* rpc: /dial_peers: only mark peers as persistent if flag is on

also

- handle errors from DialPeersAsync
- remove nil addr from log msg
- fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

This is a follow-up from
#3593 (review)

* remove a call to AddPersistentPeers

TestDialFail will trigger a reconnect
@melekes melekes mentioned this pull request May 30, 2019
44 tasks
brapse pushed a commit to brapse/tendermint that referenced this pull request Jun 5, 2019
)

## Description

Previously only outbound peers can be persistent.

Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes tendermint#3362

## Commits

* make persistent prop independent of conn direction

Previously only outbound peers can be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes tendermint#3362

* fix TestPEXReactorDialPeer test

* add a changelog entry

* update changelog

* add two tests

* reformat code

* test UnsafeDialPeers and UnsafeDialSeeds

* add TestSwitchDialPeersAsync

* spec: update p2p/config spec

* fixes after Ismail's review

* Apply suggestions from code review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* fix merge conflict

* remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

We don't need it actually.
brapse pushed a commit to brapse/tendermint that referenced this pull request Jun 5, 2019
)

## Description

Previously only outbound peers can be persistent.

Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes tendermint#3362

## Commits

* make persistent prop independent of conn direction

Previously only outbound peers can be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes tendermint#3362

* fix TestPEXReactorDialPeer test

* add a changelog entry

* update changelog

* add two tests

* reformat code

* test UnsafeDialPeers and UnsafeDialSeeds

* add TestSwitchDialPeersAsync

* spec: update p2p/config spec

* fixes after Ismail's review

* Apply suggestions from code review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* fix merge conflict

* remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

We don't need it actually.
brapse pushed a commit to brapse/tendermint that referenced this pull request Jun 5, 2019
…mint#3620)

## Description

also

    handle errors from DialPeersAsync
    remove nil addr from log msg
    fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

This is a follow-up from
tendermint#3593 (review)

Fixes most of the tendermint#3617, except tendermint#3593 (comment)

## Commits

* rpc: /dial_peers: only mark peers as persistent if flag is on

also

- handle errors from DialPeersAsync
- remove nil addr from log msg
- fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

This is a follow-up from
tendermint#3593 (review)

* remove a call to AddPersistentPeers

TestDialFail will trigger a reconnect
brapse pushed a commit to brapse/tendermint that referenced this pull request Jun 5, 2019
)

## Description

Previously only outbound peers can be persistent.

Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes tendermint#3362

## Commits

* make persistent prop independent of conn direction

Previously only outbound peers can be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes tendermint#3362

* fix TestPEXReactorDialPeer test

* add a changelog entry

* update changelog

* add two tests

* reformat code

* test UnsafeDialPeers and UnsafeDialSeeds

* add TestSwitchDialPeersAsync

* spec: update p2p/config spec

* fixes after Ismail's review

* Apply suggestions from code review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* fix merge conflict

* remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

We don't need it actually.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants