p2p: Fix error logging for connection stop#3644
p2p: Fix error logging for connection stop#3644defunctzombie wants to merge 4 commits intotendermint:masterfrom Polychain:fix-p2p-connection-stop
Conversation
| <-c.doneSendRoutine | ||
|
|
||
| // Send and flush all pending msgs. | ||
| // By now, IsRunning == false, |
There was a problem hiding this comment.
I deleted this comment because it is false. In my testing IsRunning was still true - that is why I needed to add the additional signaling. Maybe a better fix is to identify why IsRunning is still true.
There was a problem hiding this comment.
I looked closer and the reason is that FlushStop is outside of the Stop base service routine - this the base does not yet know that it is being stopped.
|
Thoughts on this change? While cosmetic - it does help cleanup log output which is useful for node and validator operators that scrape logs for errors and anomalies. |
p2p/conn/connection.go
Outdated
| } | ||
|
|
||
| // inform the recvRouting that we are shutting down | ||
| select { |
There was a problem hiding this comment.
does this need to be sent on here or can we just close it ?
There was a problem hiding this comment.
This needs to be sent here to inform the receive routine that it needs to stop. The is no mechanism in go (that I have found) to receive socket events therefore you need to roll your own notifications to the go routine for receive that was setup previously.
There was a problem hiding this comment.
But can the channel just be closed here? Or even later? It's not immediately clear why we'd need to send on the channel, and then close it shortly there after (note close(ch) has the effect of unblocking anything receiving on the channel).
There was a problem hiding this comment.
Ah - I did not know that about channels. I have removed this select that manually sent a bool in favor of closing the channel (already done lower in the same method). I have changed the channel to a struct{} type since nothing is sent on it anymore.
| } | ||
|
|
||
| // stopServices was invoked and we are shutting down | ||
| // receiving is excpected to fail since we will close the connection |
There was a problem hiding this comment.
So this happens when err != io.EOF ? What is the error here?
There was a problem hiding this comment.
This is to handle graceful shutdown via stopServices(). Go does not have a nice way to see that we have closed the socket - only when we go to read the socket and it fails do we need to see if we expected this failure or not. This logic queries the channel for the quit signal to notify us that we are expecting to exit and do not care about the error.
This changeset fixes two types of false-positive errors occurring during connection shutdown. The first occurs when the process invokes FlushStop() or Stop() on a connection. While the previous behavior did properly wait for the sendRoutine to finish, it did not notify the recvRoutine that the connection was shutting down. This would cause the recvRouting to receive and error when reading and log this error. The changeset fixes this by notifying the recvRoutine that the connection is shutting down. The second occurs when the connection is terminated (gracefully) by the other side. The recvRoutine would get an EOF error during the read, log it, and stop the connection with an error. The changeset detects EOF and gracefully shuts down the connection.
xla
left a comment
There was a problem hiding this comment.
This change looks good and isolated enough, that the root cause is with the layering of service abstractions is just another strong signal to fix peer lifecycle managment - another day.
👍
💃
| quitSendRoutine chan struct{} | ||
| doneSendRoutine chan struct{} | ||
|
|
||
| // Closing quitRecvRouting will cause the recvRouting to evnetually quit. |
There was a problem hiding this comment.
| // Closing quitRecvRouting will cause the recvRouting to evnetually quit. | |
| // Closing quitRecvRouting will cause the recvRouting to eventually quit. |
| // so we dont race on calling sendSomePacketMsgs | ||
| <-c.doneSendRoutine | ||
|
|
||
| // Send and flush all pending msgs. |
There was a problem hiding this comment.
why remove // Send and flush all pending msgs.? looks like we're still flushing the messages
There was a problem hiding this comment.
Yea - I will put this back. The IsRunning comment is the part that was wrong. I accidentally removed this when I should not have.
|
@defunctzombie will you have time to fix #3644 (comment) and #3644 (comment)? @ebuchman ok to merge? |
|
Replaced with #3824 |
This changeset fixes two types of false-positive errors occurring during
connection shutdown.
The first occurs when the process invokes FlushStop() or Stop() on a
connection. While the previous behavior did properly wait for the sendRoutine
to finish, it did not notify the recvRoutine that the connection was shutting
down. This would cause the recvRouting to receive and error when reading and
log this error. The changeset fixes this by notifying the recvRoutine that
the connection is shutting down.
The second occurs when the connection is terminated (gracefully) by the other side.
The recvRoutine would get an EOF error during the read, log it, and stop the connection
with an error. The changeset detects EOF and gracefully shuts down the connection.
I am not intimately familiar with idiomatic go practices so if there is a better approach for performing the signaling I did please let me know and I will update the PR. I used a channel buffer size of 1 to ensure the signal is received and non-blocking messaging to avoid breaking existing logic.