p2p: blockchain dismiss request channel delay#3459
p2p: blockchain dismiss request channel delay#3459melekes merged 2 commits intotendermint:developfrom
Conversation
|
There's a data race |
| for { | ||
| select { | ||
| case request := <-bcR.requestsCh: | ||
| go func() { |
There was a problem hiding this comment.
Not sure how extracting requestsCh into its own goroutine solves anything... but I don't understand the issue well enough
There was a problem hiding this comment.
The topic of the issue is that : write a BlockRequest int requestsCh channel will create an timer at the same time that stop the peer 15s later if no block have been received . But pop a BlockRequest from requestsCh and send it out may delay more than 15s later. So that the peer will be stopped for error("send nothing to us").
Extracting requestsCh into its own goroutine can make sure that every BlockRequest been handled timely.
|
Should we instead forbid any ABCI queries while we're in fast sync mode? |
Heavy ABCI queries is a condition to make this problem easily to be reproduced. If the time to apply a block increase, this issue is more easily to be exposed, but we can't control the time in tendermint layer. |
|
I will try to fix data-race issue tomorrow. thks for remind. |
|
updated @melekes . Let us see the CI result then. |
Codecov Report
@@ Coverage Diff @@
## develop #3459 +/- ##
===========================================
+ Coverage 64.2% 64.34% +0.14%
===========================================
Files 213 213
Lines 17362 17447 +85
===========================================
+ Hits 11147 11227 +80
+ Misses 5290 5287 -3
- Partials 925 933 +8
|
|
Instead of the requestsCh handling, we should probably pull the didProcessCh handling in a separate go routine since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause sending requests to invalid/ disconnected peers. |
you are right. The code is updated now, I put |
|
@guagualvcha could you please add a changelog entry ( |
Fixes tendermint#3457 The topic of the issue is that : write a BlockRequest int requestsCh channel will create an timer at the same time that stop the peer 15s later if no block have been received . But pop a BlockRequest from requestsCh and send it out may delay more than 15s later. So that the peer will be stopped for error("send nothing to us"). Extracting requestsCh into its own goroutine can make sure that every BlockRequest been handled timely. Instead of the requestsCh handling, we should probably pull the didProcessCh handling in a separate go routine since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause sending requests to invalid/ disconnected peers.
Fixes tendermint#3457 The topic of the issue is that : write a BlockRequest int requestsCh channel will create an timer at the same time that stop the peer 15s later if no block have been received . But pop a BlockRequest from requestsCh and send it out may delay more than 15s later. So that the peer will be stopped for error("send nothing to us"). Extracting requestsCh into its own goroutine can make sure that every BlockRequest been handled timely. Instead of the requestsCh handling, we should probably pull the didProcessCh handling in a separate go routine since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause sending requests to invalid/ disconnected peers.
Fix issue #3457