Skip to content

p2p: add prototype peer lifecycle manager#5882

Merged
erikgrinaker merged 9 commits intomasterfrom
erik/p2p-peer-mgmt
Jan 18, 2021
Merged

p2p: add prototype peer lifecycle manager#5882
erikgrinaker merged 9 commits intomasterfrom
erik/p2p-peer-mgmt

Conversation

@erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Jan 9, 2021

This adds a prototype peer lifecycle manager, peerManager, which stores peer data in an internal peerStore. The overall idea here is to have methods for peer lifecycle events which exchange a very narrow subset of peer data, and to keep all of the peer metadata (i.e. the peerInfo struct) internal, to decouple this from the router and simplify concurrency control. See peerManager GoDoc for more information.

The router is still responsible for actually dialing and accepting peer connections, and routing messages across them, but the peer manager is responsible for determining which peers to dial next, preventing multiple connections being established for the same peer (e.g. both inbound and outbound), and making sure we don't dial the same peer several times in parallel. Later it will also track retries and exponential backoff, as well as peer and address quality. It also assumes responsibility for peer updates subscriptions.

It's a bit unclear to me whether we want the peer manager to take on the responsibility of actually dialing and accepting connections as well, or if it should only be tracking peer state for the router while the router is responsible for all transport concerns. Let's revisit this later.

@erikgrinaker erikgrinaker changed the title p2p: app prototype peer lifecycle manager p2p: add prototype peer lifecycle manager Jan 9, 2021
@codecov
Copy link

codecov bot commented Jan 9, 2021

Codecov Report

Merging #5882 (3f65e83) into master (5cbb826) will increase coverage by 0.11%.
The diff coverage is 64.96%.

@@            Coverage Diff             @@
##           master    #5882      +/-   ##
==========================================
+ Coverage   59.93%   60.04%   +0.11%     
==========================================
  Files         269      269              
  Lines       24548    24591      +43     
==========================================
+ Hits        14712    14765      +53     
+ Misses       8237     8229       -8     
+ Partials     1599     1597       -2     
Impacted Files Coverage Δ
config/config.go 76.26% <ø> (ø)
config/toml.go 60.86% <ø> (ø)
p2p/router.go 64.22% <53.57%> (-5.04%) ⬇️
p2p/peer.go 56.35% <71.28%> (+3.23%) ⬆️
libs/events/events.go 92.94% <0.00%> (-5.89%) ⬇️
statesync/snapshots.go 91.59% <0.00%> (-1.69%) ⬇️
mempool/reactor.go 80.83% <0.00%> (-1.67%) ⬇️
blockchain/v2/reactor.go 32.73% <0.00%> (-1.44%) ⬇️
proxy/multi_app_conn.go 48.05% <0.00%> (ø)
consensus/state.go 67.75% <0.00%> (+0.09%) ⬆️
... and 13 more

@erikgrinaker erikgrinaker reopened this Jan 9, 2021
Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

looks good to me, but I have questions about all those FIXME's

// done, since delivery is guaranteed and will block peer
// connection/disconnection otherwise.
//
// FIXME: Consider having callers just use peerManager.Subscribe() directly, if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

// is too expensive and also causes difficulties in tests where we may want to
// consume peer updates and send messages sequentially).
//
// FIXME: This possibly indicates an architectural problem. Should the peerManager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you expand on this?

Copy link
Contributor Author

@erikgrinaker erikgrinaker Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready() exists only because the flow has to be like this:

  1. Call peerManager.Connected() to check whether the connection is allowed (i.e. whether we already have a connection to this peer).
  2. Set up router queue.
  3. Call peerManager.Ready() to trigger peer update.

Otherwise, if we send the peer update during Connected() (which would be more logical), the reactor can send a message to the peer before the router has set up its queue, thus dropping that message. This detailed interplay between the manager and router suggests that the abstraction may be wrong, since they really shouldn't care too much about each other's internals. However, I don't think there's any way around it, and this approach is probably simpler to understand than e.g. introducing an onConnected() callback that the router could use for queue registration.

@alexanderbez
Copy link
Contributor

It's a bit unclear to me whether we want the peer manager to take on the responsibility of actually dialing and accepting connections as well, or if it should only be tracking peer state for the router while the router is responsible for all transport concerns. Let's revisit this later.

I'm inclined to stick with the latter. Currently reviewing ...

}
}

// Add adds a peer to the manager, given as an address. If the peer already
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the peer address is (attempted to be) added regardless if the peer exists in the store.

Copy link
Contributor Author

@erikgrinaker erikgrinaker Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is how we get new peers into the store in the first place. peerManager.Add() is used to add peers to the store, or to add an address to an existing peer, and the rest of Tendermint identifies peers by a peer address (e.g. in configuration files and via PEX).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, but I was referring to the last sentence in the godoc :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Yes, that's also accurate I believe -- if the peer exists, we try to add the address to it (in case it's a new address for an existing peer).

defer s.mtx.Unlock()
delete(s.claimed, id)
// List retrieves all peers.
func (s *peerStore) List() ([]*peerInfo, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to return references to peerInfo objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to avoid copying the struct all over the place. This needs to be revisited, since we have to make sure the caller can't inadvertently change data in an in-memory database or in other callers, but we'll need to explicitly copy in that case anyway since e.g. slices and maps aren't copied automatically.

@erikgrinaker erikgrinaker added the S:automerge Automatically merge PR when requirements pass label Jan 18, 2021
@erikgrinaker erikgrinaker merged commit 96215a0 into master Jan 18, 2021
@erikgrinaker erikgrinaker deleted the erik/p2p-peer-mgmt branch January 18, 2021 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S:automerge Automatically merge PR when requirements pass

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants