-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Add OtaNonce to OtaCrcInitializer #3294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // For rates where the TOA is longer than half the packet period schedule the TOCK for rougly 1x TOA before | ||
| // the TX's end of the period so telemetry is received by the TX in the correct period. For all others, | ||
| // schedule TOCK to be PACKET_TO_TOCK_SLACK (us) after RX packet reception. | ||
| int32_t slack = std::max(ExpressLRS_currAirRate_Modparams->interval - 2 * ExpressLRS_currAirRate_RFperfParams->TOA, (int32_t)PACKET_TO_TOCK_SLACK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to check is how well this plays with the transmitters Rxnb timeout, which is set to 1 interval.
With this change and at close range the transmitter will have finished receiving tlm right at the end of the timeout, which is ok because the timeout is cancelled on preamble/sync. Add delay due to ToF and you have artificially caused a tlm loss due to the preamble not being received before the timeout.
I know I know... its F500 and who TF is flying at any kind of range. But this is the type of thing to bite us later. So it may be worthwhile looking at the transmitters timeout interval and having it cancel a period of TOA before the next packet Txnb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah damn, you're right about that, I had forgotten about the Rx timeout.
For F500 the transmit ends at ~500us, meaning there's a deadline of approximately 500us into the next cycle for a preamble to be received and cancel the timer. Given that the F500 telemetry RXdoneISR fires around 350us into the cycle (this PR) and subtracting the 389us TOA, that means the preamble started at roughly -40us which would be 540us of "range margin" before telemetry stopped working (~80km round trip for vacuum light).
It's been a while, but remind me what the RX timeout does for us? For telemetry it seems the timeout could only be a hindrance to achieving long range since it would always fire too early. It should be set for 2 * interval - 2 * TOA - 2 * Overhead for it to fire at the last possible start-of-reception in the telemetry packet period. For F500 that's 2*2000 - 2*389 - 2*100 = 3022 or more than 50% additional time left on the table just from the timeout. For the RX it seems we have other mechanisms of timing which would fire before the timeout as well. Is it just a safety mechanism to prevent stalling, and if so can we up it to 1.5x interval?
|
Nice addition of jamming nonce in the crc. Some SPI trances showing the IRQs would be great to confirm the changes around SLACK. |
|
There hasn't been any movement on this because I was unsure how we can calculate an appropriate timeout before the first packet is even transmitted. The timeout is strictly used on the TX side to take the RF out of receive mode, I believe for the purpose of reducing the turnaround time for the next period's transmit and keep it on a steady schedule. The complexity is that there's a relatively fixed amount of time between TOCK and starting the transmit, and possibly an LBA Clear Channel Assessment, and only then timeout starts at the end of the transmit. Table below shows measured timing of how far from the end of a period a typical telemetry packet is received using the existing
-- Deletes a bunch of other tables of calculated data and theory to get to the point --
ResultsFor rates where TOA is less than half the packet period, use the theoretical time remaining this period ( F500 - Timeout increased to 2000 -> 2611us. It now times out with 34% remaining compared to 65% on master (measured both), increasing the range margin from 540us (~80km) to 1160us (~173km) but still less than the roughly 1700us (~255km) achievable using master's system. That value is impossible by any fixed value logic in which the telemetry safely arrives in the current period. Just like the F500 change from the OP, all other rates are unchanged. |
|
Thanks for the help keeping this up to date, @mha1 ❤️ ! |
|
@CapnBry: Sure thing. Re-tested it after the merge commit and everything works fine. This PR could use some reviews and approvals. |
|
I was going to ask @CapnBry, is this ready to review and for merging? |
Yes it is. I've tested it six ways to Sunday on all the modes for SX127x, SX1280, and LR1121 (although I can't remember now why it was OK Team900 200Hz didn't work at original time of testing). All the timing was tested in software measuring from the TOCK ISR to the appropriate RF ISR and the only one that has any change beyond a handful of microseconds is F500. Funny that the whole point of this PR was just to put the OtaNonce in the CRC seed, which Mickey did pretty fast, and then it became dozens of hours of figuring out then shifting the F500 telemetry to be received in the correct period 🤣 EDIT: I haven't done a full run through of all the testing after the merges but if we're going to move on getting this merged I can make another pass to confirm! |
|
I think we will need to get the 3.x.x merge done to master, then merge master to this branch and test that LR1121 & LBT is still working correctly before merging this to master. |
|
Yeah that's a good idea. I'll wait for 3.x.x-maint to get rolled in, merge to this pr and run the full suite again. |
|
Thanks, PK! I'm out of town through Sunday but I'll have to run through the 800 different test cases again on Monday and report back |
Testing methodology
Results
Other than that, everything related to this PR checks out, so long as DEBUG_LOG isn't defined. With DEBUG_LOG, 2.4GHz K modes sometimes could get out of sync, but the CRC was correct on the packets received, which is baffling to me. Why just the 2.4 K modes only? Could not reproduce it without DEBUG_LOG. Test DetailsLR1121 Gemini 915/FCC (4 full tests)
LR1121 Gemini 868/LBT
SX1280 2.4 FCC
SX1280 2.4 LBT
SX127x 915 FCC
DEBUG_LOG diffdiff --git a/src/lib/CrsfProtocol/CRSFEndpoint.cpp b/src/lib/CrsfProtocol/CRSFEndpoint.cpp
index c6cf682b..f95e3b4a 100644
--- a/src/lib/CrsfProtocol/CRSFEndpoint.cpp
+++ b/src/lib/CrsfProtocol/CRSFEndpoint.cpp
@@ -304,9 +304,9 @@ void CRSFEndpoint::parameterUpdateReq(const crsf_addr_e origin, const bool isElr
switch (parameterType)
{
case CRSF_FRAMETYPE_PARAMETER_WRITE:
- DBGLN("Set parameter [%s]=%u", parameter->name, parameterArg);
if (parameterIndex < MAX_CRSF_PARAMETERS && paramCallbacks[parameterIndex])
{
+ DBGLN("Set parameter [%s]=%u", parameter->name, parameterArg);
// While the command is executing, the handset will send `WRITE state=lcsQuery`.
// paramCallbacks will set the value when nextStatusChunk == 0, or send any
// remaining chunks when nextStatusChunk != 0
diff --git a/src/lib/Handset/CRSFHandset.cpp b/src/lib/Handset/CRSFHandset.cpp
index 2313a6b7..6dbdf269 100644
--- a/src/lib/Handset/CRSFHandset.cpp
+++ b/src/lib/Handset/CRSFHandset.cpp
@@ -585,9 +585,9 @@ bool CRSFHandset::UARTwdt()
retval = true;
}
#ifdef DEBUG_OPENTX_SYNC
- if (abs((int)((1000000 / (ExpressLRS_currAirRate_Modparams->interval * ExpressLRS_currAirRate_Modparams->numOfSends)) - (int)GoodPktsCount)) > 1)
+ //if (abs((int)((1000000 / (ExpressLRS_currAirRate_Modparams->interval * ExpressLRS_currAirRate_Modparams->numOfSends)) - (int)GoodPktsCount)) > 1)
#endif
- DBGLN("UART STATS Bad:Good = %u:%u", BadPktsCount, GoodPktsCount);
+ //DBGLN("UART STATS Bad:Good = %u:%u", BadPktsCount, GoodPktsCount);
UARTwdtLastChecked = now;
if (retval)
diff --git a/src/src/tx_main.cpp b/src/src/tx_main.cpp
index 3563d0e8..efd811b2 100644
--- a/src/src/tx_main.cpp
+++ b/src/src/tx_main.cpp
@@ -743,11 +743,13 @@ void ICACHE_RAM_ATTR nonceAdvance()
}
}
+uint32_t tockUs;
/*
* Called as the TOCK timer ISR when there is a CRSF connection from the handset
*/
void ICACHE_RAM_ATTR timerCallback()
{
+ tockUs = micros();
/* If we are busy writing to EEPROM (committing config changes) then we just advance the nonces, i.e. no SPI traffic
*/
if (commitInProgress)
{
@@ -927,6 +929,13 @@ bool ICACHE_RAM_ATTR RXdoneISR(SX12xxDriverCommon::rx_status const status)
{
return false; // Already received tlm, do not run ProcessTLMpacket() again.
}
+ static uint32_t last;
+ uint32_t now = millis();
+ if (now - last >= 10)
+ {
+ DBGLN("REM %u LQ %u", (ExpressLRS_currAirRate_Modparams->interval - (micros() - tockUs)) * 100 / ExpressLRS_currAirRate_Modparams->interval, LQCalc.getLQ());
+ last = now;
+ }
bool packetSuccessful = ProcessTLMpacket(status);
#if defined(Regulatory_Domain_EU_CE_2400) |
|
Bit of a problem. Both Mickey and I experience different issues with this and LBT in long term testing, so hold off on this until we can determine where the problem lies. He experiences telemetry drop off that stays dropped off and may not recover, while the uplink remains steady. I experienced 3x instances of the receiver losing connection for ~2.6s as if the TX stopped transmission. Further testing and logging is underway. |
|
Two weeks of testing later, the issue was determined to be present in the RF drivers in master and 3.x.x-maint, so I'd like to move forward on this PR. Changes to resolve the connection loss and jitter forthcoming from my dedicated peers. |
* initial commit * Adjust RX TOCK to later in period for low duty cycle rates * Move all FHSS to TOCK only * Extend TXs telemetry timeout for short TOA packets * Remove incorrect comment --------- Co-authored-by: mha1 Co-authored-by: pkendall64
Changes OtaCrcInitializer to also include the OtaNonce to attempt to guarantee RX and TX sync. The purpose is to resolve "low LQ" connections where an RX is close but is only randomly receiving packets by chance due to the TX restarting and having a completely different OtaNonce.
Much thanks go to @mha1 for all the technical discussion, proof of concept testing, initial work, and finding the F500 telemetry issue.
Fixes #3258
Perhaps addresses #3157
Details
If a TX stops for a moment due to being power cycled, the user hitting Bind while connected, or some config commits, a receiver may not fully time out and go to the disconnected state, meaning that if it manages to receive packets by random chance it will continue to stay in a "connected" state but at a very low LQ. The receiver usually will not get a sync packet to resolve the issue as the RX and TX will never be on the sync channel at the same time. This PR aims to resolve this by initializing the packet's CRC using the OtaNonce. CRC errors effectively prevent those random packet receptions from keeping the link alive, and the RX will return to the disconnected state and listen for a new SYNC to restart.
Technical
OtaCrcInitializer = (UID[4] << 8) | UID[5]) ^ (OTA_VERSION_ID << 8) ^ OtaNonceThe OTA version has been shifted up into the high byte of the initializer (was in the low byte), which will allow for up to 16 OTA versions before there is a version collision on the standard CRC-14. The Nonce is then XORed into the low byte if the packet is not a SYNC-type packet. SYNC packets always use 0 as the Nonce for the purpose of CRC, as the receiver does not know the OtaNonce before SYNC is received. This "unprotected" SYNC packet does not break the purpose of the PR as the SYNC packet would resolve the low-LQ connection even without this additional check.
OTA changes
RX code changes
OtaNonce + 1in all its calculations.TOCK issue
There's a major architecture issue with F500 that caused telemetry to stop working once the OtaNonce integrity check was added. This is because the RX receives the packet roughly 512us into the 2000us packet period. The old code schedules the next TOCK for 200us after the packet reception, which meant that the F500 telemetry would actually be received in the previous packet period on the TX. Excuse the crudity of this drawing but note how for F500 the RX's next packet period begins so early that it is received in the previous packet period
To resolve this, the RX code now will schedule the TOCK roughly 1x TOA from where it thinks the end of the TX's packet period is. This estimate does not include the roughly 2x 100us of time required on the TX and the RX to begin a transmission, so this guarantees that the RX's telemetry is received by the TX in the appropriate packet period. Affected packet rates:
Tested
Tested on every available packet rate on each and every one of the following hardware combinations