p2p: add prototype peer lifecycle manager#5882
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5882 +/- ##
==========================================
+ Coverage 59.93% 60.04% +0.11%
==========================================
Files 269 269
Lines 24548 24591 +43
==========================================
+ Hits 14712 14765 +53
+ Misses 8237 8229 -8
+ Partials 1599 1597 -2
|
melekes
left a comment
There was a problem hiding this comment.
👍
looks good to me, but I have questions about all those FIXME's
| // done, since delivery is guaranteed and will block peer | ||
| // connection/disconnection otherwise. | ||
| // | ||
| // FIXME: Consider having callers just use peerManager.Subscribe() directly, if |
| // is too expensive and also causes difficulties in tests where we may want to | ||
| // consume peer updates and send messages sequentially). | ||
| // | ||
| // FIXME: This possibly indicates an architectural problem. Should the peerManager |
There was a problem hiding this comment.
Ready() exists only because the flow has to be like this:
- Call
peerManager.Connected()to check whether the connection is allowed (i.e. whether we already have a connection to this peer). - Set up router queue.
- Call
peerManager.Ready()to trigger peer update.
Otherwise, if we send the peer update during Connected() (which would be more logical), the reactor can send a message to the peer before the router has set up its queue, thus dropping that message. This detailed interplay between the manager and router suggests that the abstraction may be wrong, since they really shouldn't care too much about each other's internals. However, I don't think there's any way around it, and this approach is probably simpler to understand than e.g. introducing an onConnected() callback that the router could use for queue registration.
I'm inclined to stick with the latter. Currently reviewing ... |
| } | ||
| } | ||
|
|
||
| // Add adds a peer to the manager, given as an address. If the peer already |
There was a problem hiding this comment.
Looks like the peer address is (attempted to be) added regardless if the peer exists in the store.
There was a problem hiding this comment.
Yes, this is how we get new peers into the store in the first place. peerManager.Add() is used to add peers to the store, or to add an address to an existing peer, and the rest of Tendermint identifies peers by a peer address (e.g. in configuration files and via PEX).
There was a problem hiding this comment.
Yup, but I was referring to the last sentence in the godoc :)
There was a problem hiding this comment.
Ah. Yes, that's also accurate I believe -- if the peer exists, we try to add the address to it (in case it's a new address for an existing peer).
| defer s.mtx.Unlock() | ||
| delete(s.claimed, id) | ||
| // List retrieves all peers. | ||
| func (s *peerStore) List() ([]*peerInfo, error) { |
There was a problem hiding this comment.
Is there a reason to return references to peerInfo objects?
There was a problem hiding this comment.
Just to avoid copying the struct all over the place. This needs to be revisited, since we have to make sure the caller can't inadvertently change data in an in-memory database or in other callers, but we'll need to explicitly copy in that case anyway since e.g. slices and maps aren't copied automatically.
This adds a prototype peer lifecycle manager,
peerManager, which stores peer data in an internalpeerStore. The overall idea here is to have methods for peer lifecycle events which exchange a very narrow subset of peer data, and to keep all of the peer metadata (i.e. thepeerInfostruct) internal, to decouple this from the router and simplify concurrency control. SeepeerManagerGoDoc for more information.The router is still responsible for actually dialing and accepting peer connections, and routing messages across them, but the peer manager is responsible for determining which peers to dial next, preventing multiple connections being established for the same peer (e.g. both inbound and outbound), and making sure we don't dial the same peer several times in parallel. Later it will also track retries and exponential backoff, as well as peer and address quality. It also assumes responsibility for peer updates subscriptions.
It's a bit unclear to me whether we want the peer manager to take on the responsibility of actually dialing and accepting connections as well, or if it should only be tracking peer state for the router while the router is responsible for all transport concerns. Let's revisit this later.