p2p refactor: state sync reactor #5671

Merged
alexanderbez merged 102 commits into master from bez/p2p-refactor-state-sync-reactor
Dec 9, 2020

Conversation

@alexanderbez
Contributor

@alexanderbez alexanderbez commented Nov 16, 2020

Description

Prepares the state-sync reactor for the newly designed p2p changes per ADR 062.

  • Introduce new p2p types and interfaces.
  • Rework the legacy Service implementation for state-sync: it still fulfills the Service interface, but now uses the p2p Channel semantics.
  • Clean up and refactor tests.
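The p2p Channel semantics mentioned above can be sketched roughly as below. This is a minimal illustration loosely following ADR 062; the type and field names are assumptions for the sketch, not the reactor's actual API.

```go
package main

import "fmt"

// PeerID identifies a peer; in the real stack this is a dedicated type.
type PeerID string

// Envelope pairs a message with its sender or recipient. Field names here
// are illustrative only.
type Envelope struct {
	From    PeerID // sender, set on inbound messages
	To      PeerID // recipient, set on outbound messages
	Message string // stands in for a proto.Message payload
}

// Channel gives a reactor In/Out envelope queues instead of direct Peer
// references, so the reactor never touches the underlying connections.
type Channel struct {
	ID  byte
	In  chan Envelope // inbound messages routed to this reactor
	Out chan Envelope // outbound messages routed out to peers
}

func main() {
	ch := Channel{ID: 0x60, In: make(chan Envelope, 1), Out: make(chan Envelope, 1)}
	ch.In <- Envelope{From: "peer1", Message: "snapshots/request"}
	e := <-ch.In
	fmt.Println(e.From, e.Message) // peer1 snapshots/request
}
```

The point of the indirection is that the reactor only ever sees envelopes, which is what allows the legacy Switch/Peer wiring to be hidden behind a shim.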

ref: #5670

@codecov

codecov bot commented Nov 16, 2020

Codecov Report

Merging #5671 (8ee0d8c) into master (89e908e) will increase coverage by 0.06%.
The diff coverage is 59.34%.

@@            Coverage Diff             @@
##           master    #5671      +/-   ##
==========================================
+ Coverage   60.50%   60.57%   +0.06%     
==========================================
  Files         259      261       +2     
  Lines       23345    23618     +273     
==========================================
+ Hits        14126    14306     +180     
- Misses       7743     7829      +86     
- Partials     1476     1483       +7     
Impacted Files Coverage Δ
p2p/shim.go 49.29% <49.29%> (ø)
statesync/chunks.go 86.36% <50.00%> (-0.18%) ⬇️
statesync/reactor.go 54.58% <50.00%> (+14.45%) ⬆️
proto/tendermint/statesync/message.go 72.34% <72.34%> (ø)
node/node.go 58.02% <76.47%> (+0.15%) ⬆️
p2p/peer.go 59.02% <80.00%> (+2.43%) ⬆️
statesync/snapshots.go 91.59% <87.50%> (ø)
statesync/syncer.go 78.96% <88.88%> (+0.48%) ⬆️
p2p/channel.go 93.75% <93.75%> (ø)
crypto/sr25519/pubkey.go 43.47% <0.00%> (-8.70%) ⬇️
... and 16 more

Contributor

@erikgrinaker erikgrinaker left a comment

Shim looks great overall! Left a few minor comments.

We'll have to look closer at the details here later, but this is great for now. E.g. we need to look into blocking/buffering/dropping and such - the shim is now dropping messages while I think the current P2P stack assumes it will block. We may want to block for now too, to preserve the current system behavior, until we design the router. But let's do a pass later for stuff like this.
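The blocking-vs-dropping distinction above comes down to how a send into a bounded queue is written. A minimal sketch of the two behaviors, using plain Go channels (function names are made up for illustration):

```go
package main

import "fmt"

// sendDrop is non-blocking: if the buffer is full the message is silently
// dropped, which is what the shim currently does.
func sendDrop(ch chan string, msg string) bool {
	select {
	case ch <- msg:
		return true
	default:
		return false // buffer full: message dropped
	}
}

// sendBlock preserves the legacy P2P behavior: the caller blocks until the
// consumer makes room in the buffer.
func sendBlock(ch chan string, msg string) {
	ch <- msg
}

func main() {
	ch := make(chan string, 1)
	fmt.Println(sendDrop(ch, "a")) // true: buffer had room
	fmt.Println(sendDrop(ch, "b")) // false: dropped, buffer was full
	<-ch                           // consumer drains the buffer
	sendBlock(ch, "c")             // would have blocked while the buffer was full
	fmt.Println(<-ch)              // c
}
```

Blocking applies backpressure to the sender, while dropping trades message loss for liveness; which is appropriate depends on the router design mentioned above.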

Contributor

@melekes melekes left a comment

👍

@erikgrinaker
Contributor

Occurred to me that we'll need better panic handling here too. In this PR, a panic during message processing causes the entire channel processing goroutine to terminate. In the current P2P stack, a panic causes the peer to be disconnected. We should probably disconnect the peer but otherwise continue message processing, similar to how a web server doesn't terminate if an HTTP request panics.

@alexanderbez
Contributor Author

> Occurred to me that we'll need better panic handling here too. In this PR, a panic during message processing causes the entire channel processing goroutine to terminate. In the current P2P stack, a panic causes the peer to be disconnected. We should probably disconnect the peer but otherwise continue message processing, similar to how a web server doesn't terminate if an HTTP request panics.

Agree, but this is dependent on the individual reactors. Specifically, for state-sync, we do not have any explicit panic invocations. However, that doesn't mean a panic can't happen (e.g. a nil pointer dereference). Are you suggesting we add a recover block? If so, we should devise a way where each reactor doesn't have to do this itself (if possible; maybe not).

@erikgrinaker
Contributor

erikgrinaker commented Dec 8, 2020

> Agree, but this is dependent on the individual reactors. Specifically, for state-sync, we do not have any explicit panic invocations. However, that doesn't mean a panic can't happen (e.g. a nil pointer dereference). Are you suggesting we add a recover block? If so, we should devise a way where each reactor doesn't have to do this itself (if possible; maybe not).

Yeah, it follows from using channels that it's the consumer's responsibility to handle panics and other errors during processing. Since we're receiving messages from random people on the Internet here, we need to be resistant to adversarial inputs, and that includes panic recovery.

There's a bunch of different options here, for example:

  1. Move message processing into a handleMessage(from p2p.PeerID, msg proto.Message) error method and do panic recovery there or in the caller (this also reduces rightward code drift and improves error handling).

  2. Add a convenience method like Channel.Process(context.Context, func (Channel, p2p.Envelope) error) error that consumes the channel and calls the callback for each message until the context (or closer or whatever) is cancelled, handling panics and other issues.

  3. Keep running the channel processing method in a loop until the reactor is shut down.

These are just a few ideas off the top of my head, there's probably other good options as well. I think this mostly depends on what's most convenient, understandable, and flexible so it's probably a good idea to experiment with a few approaches and see what works best.

@alexanderbez
Contributor Author

I went with something that is more like (1), because I do not want to make any premature abstractions or optimizations without knowing what all the reactors will look like. This is something we can clean up in the future in, most likely, a non-breaking manner.

So now we have a simple handleMessage that performs panic recovery and then dispatches the message accordingly.
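The shape of a handleMessage with built-in panic recovery is roughly the following. This is a hedged sketch with simplified types and a made-up message dispatch, not the reactor's actual code:

```go
package main

import "fmt"

// Envelope is a simplified stand-in for the p2p envelope type.
type Envelope struct {
	From    string
	Message string
}

// handleMessage dispatches a single envelope. The deferred recover converts
// a panic during processing into an ordinary error, so the channel
// processing goroutine survives adversarial or malformed input.
func handleMessage(e Envelope) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic while processing message from %s: %v", e.From, r)
		}
	}()

	switch e.Message {
	case "snapshots/request":
		return nil // handle the request here
	default:
		// A nil-pointer dereference or explicit panic lands in the recover above.
		panic(fmt.Sprintf("unknown message type %q", e.Message))
	}
}

func main() {
	fmt.Println(handleMessage(Envelope{From: "peer1", Message: "snapshots/request"})) // <nil>
	fmt.Println(handleMessage(Envelope{From: "peer2", Message: "garbage"}))           // recovered error
}
```

Because recovery happens inside handleMessage, the caller's receive loop can log the error (and eventually disconnect the peer) and keep processing subsequent envelopes.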

Contributor

@erikgrinaker erikgrinaker left a comment

Looks great! A couple of nits, otherwise I think this is good to go. 🚀

As we've discussed, the reactor ergonomics with panic handling and such isn't optimal, let's revisit this later. Also, there are probably still potential deadlocks, halts, and impedance mismatches with the current P2P stack, but we've covered most things that I can think of -- we should be on the lookout for these sorts of issues as we do the remaining reactors.

@alexanderbez alexanderbez merged commit a879eb4 into master Dec 9, 2020
@alexanderbez alexanderbez deleted the bez/p2p-refactor-state-sync-reactor branch December 9, 2020 14:31
@alexanderbez alexanderbez mentioned this pull request Dec 9, 2020

Labels

C:p2p Component: P2P pkg

4 participants