Extend TcpMspConnector idle timeout #3437
Merged
+8
−5
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Extend the timeout of the socket MSP connection from 2s to 10s. This is to prevent disconnects when the link is underutilized by the flight controller Configurator apps.
Fixes #3410? This can also be backported to 3.x.x-maintenance with just a 2 byte change in TCPSOCKET.
Details
In the 3.5.5 socket MSP system, the code would just memory leak the socket pointer if the connection was idle more than 2s. If new data would come in, the pointer would be reattached to the pump process. This meant there was effectively no timeout on incoming data as the connection would just sort of fix itself on the next packet.
When I refactored the code and fixed the memory leaks, this made some Configurators unreliable. In particular the test case of Rotorflight 4.5.1 with Configurator 2.2.1 has pauses you can drive a truck through. Any time the Configurator would stop loading mid-way through a tab, we'd drop the connection and force it closed. You can see these happening in Wireshark
The first one there was just 90ms of idle time away from us dropping the connection while we (x.x.x.151) waited on the next command request from the Configurator (x.x.x.145). The one further down was over a second waiting to get the MSP data from the FC to reply with.
This RF Configurator shows tab load times all over the place and I see a lot of redundant data being requested so it is likely this could be improved on the Configurator side to increase the performance. The upcoming betaflight PWA Android app loads things much more consistently and did not need any timeout extension. Here's RF tab load times before this PR
Quick Fix
I just upped the timeout from 2s to 10s and now can not reproduce the issue. The 2s was chosen because it matched the old leaky code, but it seems like 2s might also be the Rotorflight Configurator's internal timeout or something because I saw quite a few ~2s pauses during my testing. We can even make this longer, but because we don't maintain the currently active socket list I'd be wary of setting it too long and having multiple connections getting their data mixed somehow, as we only have one pipe to the flight controller. We also could force a connection closed if a new one comes in, but let's go with this for now.
Debug
I also returned some debug messages from before the 4.0 refactor and cleaned up the existing ones to be shorter and consistent.