[WIP] Aleth now waits for 2 secs after sending disconnect packet #5650 (PR #5658)
twinstar26 wants to merge 1 commit into ethereum:master from twinstar26:master
Conversation
You're getting a build error:
@twinstar26 Can you please add a changelog entry? Please see item 2 here for details: https://github.com/ethereum/aleth/blob/master/CONTRIBUTING.md
@twinstar26 Can you please update the PR description to reference the issue you're fixing (so that the issue will be automatically updated with a link to the PR and closed when the PR is closed)? Be sure to include a hash char (#) before the issue ID.
libp2p/Session.cpp
Outdated
```cpp
void Session::send(bytes&& _msg)
{
    if(m_dropped)
```
(Nit) Formatting...can you please format your changes using git clang-format as described in item 3 here: https://github.com/ethereum/aleth/blob/master/CONTRIBUTING.md
libp2p/Session.cpp
Outdated
```cpp
{
    bytes const* out = nullptr;

    bool disconnectImminent = false;
```
This can be simplified to:
```cpp
bool disconnectImminent = (writeQueue[0][0] == DisconnectPacket);
```
Also, since the type of the disconnectImminent value is obvious, we can make it auto (and const, since we're not going to modify its value after initially setting it).
However, you can remove the disconnectImminent check completely, since @gumb0 brought up a good way to simplify the changes: start the timer in Session::disconnect after the sealAndSend call. The resulting wait won't be exactly 2 seconds after the disconnect packet is actually sent out, but it should be close enough.
libp2p/Session.cpp
Outdated
```cpp
m_writeQueue.pop_front();
if(m_writeQueue.empty() && disconnectImminent)
{
    ba::io_context ioc;
```
I think you should re-use the Host's io_context (Host::m_ioContext) - one way to do this would be to create the timer in Host::startPeerSession and pass it into the Session ctor. @gumb0 - thoughts?
Yes, the timer should use the io_context of Host.
Not sure about passing timer to constructor. Session has a pointer to Host, maybe better to introduce a public method in Host that will create a timer? This way we won't create it until it's really needed in disconnect.
@gumb0 : Good idea, this way we can also easily create timers in other classes if needed.
libp2p/Session.cpp
Outdated
```cpp
ba::steady_timer disconnectTimer(ioc);
disconnectTimer.expires_from_now(chrono::seconds(2));
disconnectTimer.async_wait([this](boost::system::error_code const&) {});
ioc.run();
```
You don't need to explicitly call run; Host already does this (see Host::doWork).
libp2p/Session.cpp
Outdated
```cpp
ba::io_context ioc;
ba::steady_timer disconnectTimer(ioc);
disconnectTimer.expires_from_now(chrono::seconds(2));
disconnectTimer.async_wait([this](boost::system::error_code const&) {});
```
You'll want to close the socket in the timer handler (since we want to close the socket after the timer has expired)...your code currently kicks off the timer wait but then immediately closes the socket (if it's already open).
Note that the timer wait is asynchronous, not synchronous, which means the async_wait call returns immediately. The way this works behind the scenes, at a high level: the wait operation is kicked off, and once the wait has completed the handler is placed into a completion handler queue; the handlers in the queue are executed on whichever thread called io_context::run, which in this case is the network thread. Note that the io_context::run call blocks until all asynchronous operations have completed and their associated handlers have been executed.
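The completion-queue behavior described here can be demonstrated with a minimal hand-rolled loop (a toy `EventLoop`, not Boost's io_context): posting a handler returns immediately, and the handler only fires once `run()` drains the queue on the calling thread.

```cpp
#include <functional>
#include <iostream>
#include <queue>

// Toy model of io_context: handlers are queued and only executed when run()
// drains the queue on the thread that calls it.
class EventLoop
{
public:
    void post(std::function<void()> handler) { queue_.push(std::move(handler)); }

    void run()  // blocks (conceptually) until the queue is empty
    {
        while (!queue_.empty())
        {
            auto h = std::move(queue_.front());
            queue_.pop();
            h();  // the handler runs here, not at post() time
        }
    }

private:
    std::queue<std::function<void()>> queue_;
};

int main()
{
    EventLoop loop;
    loop.post([] { std::cout << "handler\n"; });  // returns immediately
    std::cout << "after post\n";                  // printed before the handler runs
    loop.run();                                   // now the handler executes
}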
Oh that's what the anomaly was in my testing before and after changes. So stupid of me.
@twinstar26 No worries, accidents happen 😄
Okay, so I will be shifting the timer into Session::disconnect, and we will wait for 2 secs. We won't care whether the packet has been sent or not: just wait for 2 secs and let the async writing happen (in the worst case the timer may run out before the writing happens, and we'd close the socket in the middle of the operation). But I think we should make some kind of provision to let us know for sure that the operation has been completed. The async is really cool, but before closing we should make sure everything is fine, so that we'll have a good night's sleep. :-) Also, you mentioned that we should use Host::m_ioContext. I did think about that at first, but then I realized we'd be sharing it with other code: if some other code is using it for its async ops, will our use in the middle of that cause problems?
The Asio docs say "Distinct objects: Safe." for io_context, and Host does call restart() (Lines 206 to 214 in 007d89a). So I think we are better off with a new io_context. Let me know.
@twinstar26 We should reuse the Host's io_context (Host::m_ioContext).
@gumb0
@twinstar26 I would name the Host function which returns a new timer something more general, e.g. Also, you can take a look at
@gumb0 @halfalicious This is my log when using m_ioContext, except the socket closes immediately and doesn't wait for 2 secs.
Can you please push your latest changes so one of us can take a look and try to repro your issue locally? (I took a look at your branch and it doesn't have the changes you're referring to.) Take a look at this prototype I threw together; it only closes each socket once and waits 2 seconds after disconnect: 74b04c9
@halfalicious A possible solution might be to create another method inside
My branch is https://github.com/twinstar26/aleth/tree/aleth-test
log_wait_handler_not_executed.txt
Here in the logs there is no log
Codecov Report
```
@@            Coverage Diff             @@
##           master    #5658      +/-   ##
==========================================
- Coverage   62.97%   62.91%    -0.07%
==========================================
  Files         350      349        -1
  Lines       29991    29734      -257
  Branches     3361     3347       -14
==========================================
- Hits        18886    18706      -180
+ Misses       9885     9819       -66
+ Partials     1220     1209       -11
```
```cpp
else
    drop(_reason);
});
m_server->pollAsyncTimerHandler();
```
What happens if you don't do this?
@gumb0
Generally we poll when we are calling the function to which the handler belongs multiple times (in a recursion or a loop), so that any handlers ready to run will start executing.
The original motivation was that I knew the unit tests were calling disconnect multiple times, which you later confirmed: they create and destroy the Host multiple times.
But I realized that there are various parts of the code that call disconnect in a loop, e.g.
Lines 247 to 263 in 4946ed6
So I kept pollAsyncTimerHandler public in Host; its definition is here: https://github.com/twinstar26/aleth/blob/e83913941ace68e2b41597ddae7fa5ba8e6c3b6c/libp2p/Host.h#L266-L269
Also, I realize there is a bug: we should be calling drop(DisconnectReason) in here too, right?
https://github.com/twinstar26/aleth/blob/e83913941ace68e2b41597ddae7fa5ba8e6c3b6c/libp2p/Session.cpp#L312-L314
So that if the timer fails for some reason we will still issue a drop. What do you think @gumb0 ?
I actually don't think you need to call drop() there... I think the handler will only ever be called with a Boost error code if the timer is cancelled, but you never call cancel on the timer.
ToDo list

This should be
I don't think you need to check the error code in the timer handler; you can just check if the socket is open and, if so, close the socket. There's no risk of use-after-free since the handler only accesses Session state, and the Session instance is guaranteed to still be available since the handler captures shared_from_this.
Good call on the timer handlers not being executed when the timers are cancelled (on shutdown). I don't think we care about this when Aleth is shutting down, since I think the sockets which are left open will be reclaimed when the process exits, but I do think we care about this when running tests, since I think the same process is used to run all tests. I think the best way to ensure that the handlers are executed is to modify
This would prevent the disconnect peers loop from exiting until all of the
I've tested this change locally and the handlers are executed in the relevant network tests. The downside of this change is that some of the network tests will take a little longer to run, since they wait 2 seconds for the disconnect timers to expire, but there aren't many tests which do this, so I think that's okay (@gumb0 : please correct me if I'm wrong here).
@halfalicious Would this result in a 2-second (or more) wait at aleth exit? I don't think it's necessary to wait at exit. And the sockets will be closed anyway in
@gumb0 Yes, it would. Thanks for calling out that the sockets are closed in the dtor. @twinstar26, they are closed here: Line 44 in 5bf79e5. We also poll after calling disconnect in
Thanks for the reviews and suggestions!! @gumb0 and @halfalicious So to conclude:
After these changes aleth will not wait for 2 secs during tests, but the actual operation of aleth is quite different and the tests don't actually simulate that.
@twinstar26 Just checking in to see if you're still working on this? |
@halfalicious Sorry! Yes, I am still working on it. I was busy with my college stuff. I will push a commit in 3-4 hours from now.
No worries, just wanted to see if you were still interested in wrapping this up 😄 I've reviewed your new changes and left comments in the new PR.
Replaced by #5707 |
Fixes #5650