Add OtaNonce to OtaCrcInitializer #3294

CapnBry · 2025-08-04T16:08:09Z

Changes OtaCrcInitializer to also include the OtaNonce to attempt to guarantee RX and TX sync. The purpose is to resolve "low LQ" connections where an RX is close but is only randomly receiving packets by chance due to the TX restarting and having a completely different OtaNonce.

Much thanks go to @mha1 for all the technical discussion, proof of concept testing, initial work, and finding the F500 telemetry issue.

Fixes #3258
Perhaps addresses #3157

Details

If a TX stops for a moment due to being power cycled, the user hitting Bind while connected, or some config commits, a receiver may not fully time out and go to the disconnected state, meaning that if it manages to receive packets by random chance it will continue to stay in a "connected" state but at a very low LQ. The receiver usually will not get a sync packet to resolve the issue as the RX and TX will never be on the sync channel at the same time. This PR aims to resolve this by initializing the packet's CRC using the OtaNonce. CRC errors effectively prevent those random packet receptions from keeping the link alive, and the RX will return to the disconnected state and listen for a new SYNC to restart.

Technical

OtaCrcInitializer = (UID[4] << 8) | UID[5]) ^ (OTA_VERSION_ID << 8) ^ OtaNonce

The OTA version has been shifted up into the high byte of the initializer (was in the low byte), which will allow for up to 16 OTA versions before there is a version collision on the standard CRC-14. The Nonce is then XORed into the low byte if the packet is not a SYNC-type packet. SYNC packets always use 0 as the Nonce for the purpose of CRC, as the receiver does not know the OtaNonce before SYNC is received. This "unprotected" SYNC packet does not break the purpose of the PR as the SYNC packet would resolve the low-LQ connection even without this additional check.

OTA changes

Wide packet mode no longer uses the FHSS index to attempt to guarantee RX / TX sync. This is redundant with these changes and while the Wide CRC init is helpful in preventing a single Nonce slip, it clearly did not work when the two were further apart.

RX code changes

Changed to move the OtaNonce advancement to at the start of the packet period, rather than doing it at the end and always using OtaNonce + 1 in all its calculations.
Always FHSS at the beginning of the packet period rather than maybe doing it after packet reception. There's no safety afforded by doing this in advance, as if a packet is missed it will have to do the FHSS before it can receive anyway so there's no point to having this redundant code.
Change scheduling of the RX's packet period start (TOCK) to much later for F500, and any other future low duty cycle packet rates.

TOCK issue

There's a major architecture issue with F500 that caused telemetry to stop working once the OtaNonce integrity check was added. This is because the RX receives the packet roughly 512us into the 2000us packet period. The old code schedules the next TOCK for 200us after the packet reception, which meant that the F500 telemetry would actually be received in the previous packet period on the TX. Excuse the crudity of this drawing but note how for F500 the RX's next packet period begins so early that it is received in the previous packet period

Rate	TOA (us)	TXdone (us)	Telem Arrives at TX (us)
50Hz	10798	10885	2460
100Hz Full	7605	7695	5875
150Hz	5871	5961	5718
250Hz	3300	3422	3188
333Hz Full	2374	2469	2256
500Hz	1507	1600	1492
F500	389	512	1335 in Nonce-1
F1000	389	512	331

To resolve this, the RX code now will schedule the TOCK roughly 1x TOA from where it thinks the end of the TX's packet period is. This estimate does not include the roughly 2x 100us of time required on the TX and the RX to begin a transmission, so this guarantees that the RX's telemetry is received by the TX in the appropriate packet period. Affected packet rates:

F500 - Telemetry received ~950us later, around 350us into proper period
F1000, D500, D250 - Telemetry received ~20us later, around 350us into proper period

Tested

Tested on every available packet rate on each and every one of the following hardware combinations

Ranger Micro + HappyModel EP2 (FCC)
Ranger Micro + HappyModel EP2 (LBT)
HappyModel ES900 TX + HappyModel ES900 RX
HappyModel ES900 TX + DAKEFPV 900 (fail on 200Hz)
BetaFPV SuperG Gemini + HappyModel EP Dual (in Gemini, Switch, Ant 2 modes)
RadioMaster TX15 + HappyModel EP2 (FCC)
RadioMaster TX15 + DAKEFPV 900 (fail on 200Hz Team900)

JyeSmith · 2025-08-04T22:43:40Z

src/src/rx_main.cpp

+    // For rates where the TOA is longer than half the packet period schedule the TOCK for rougly 1x TOA before
+    // the TX's end of the period so telemetry is received by the TX in the correct period. For all others,
+    // schedule TOCK to be PACKET_TO_TOCK_SLACK (us) after RX packet reception.
+    int32_t slack = std::max(ExpressLRS_currAirRate_Modparams->interval - 2 * ExpressLRS_currAirRate_RFperfParams->TOA, (int32_t)PACKET_TO_TOCK_SLACK);


Something to check is how well this plays with the transmitters Rxnb timeout, which is set to 1 interval.

With this change and at close range the transmitter will have finished receiving tlm right at the end of the timeout, which is ok because the timeout is cancelled on preamble/sync. Add delay due to ToF and you have artificially caused a tlm loss due to the preamble not being received before the timeout.

I know I know... its F500 and who TF is flying at any kind of range. But this is the type of thing to bite us later. So it may be worthwhile looking at the transmitters timeout interval and having it cancel a period of TOA before the next packet Txnb.

Ah damn, you're right about that, I had forgotten about the Rx timeout.

For F500 the transmit ends at ~500us, meaning there's a deadline of approximately 500us into the next cycle for a preamble to be received and cancel the timer. Given that the F500 telemetry RXdoneISR fires around 350us into the cycle (this PR) and subtracting the 389us TOA, that means the preamble started at roughly -40us which would be 540us of "range margin" before telemetry stopped working (~80km round trip for vacuum light).

It's been a while, but remind me what the RX timeout does for us? For telemetry it seems the timeout could only be a hindrance to achieving long range since it would always fire too early. It should be set for 2 * interval - 2 * TOA - 2 * Overhead for it to fire at the last possible start-of-reception in the telemetry packet period. For F500 that's 2*2000 - 2*389 - 2*100 = 3022 or more than 50% additional time left on the table just from the timeout. For the RX it seems we have other mechanisms of timing which would fire before the timeout as well. Is it just a safety mechanism to prevent stalling, and if so can we up it to 1.5x interval?

JyeSmith · 2025-08-04T22:44:59Z

Nice addition of jamming nonce in the crc.

Some SPI trances showing the IRQs would be great to confirm the changes around SLACK.

CapnBry · 2025-08-17T20:21:31Z

There hasn't been any movement on this because I was unsure how we can calculate an appropriate timeout before the first packet is even transmitted. The timeout is strictly used on the TX side to take the RF out of receive mode, I believe for the purpose of reducing the turnaround time for the next period's transmit and keep it on a steady schedule. The complexity is that there's a relatively fixed amount of time between TOCK and starting the transmit, and possibly an LBA Clear Channel Assessment, and only then timeout starts at the end of the transmit.

Table below shows measured timing of how far from the end of a period a typical telemetry packet is received using the existing interval timeout. Note that the F500 value is using the RX sync adjustment from this PR; without it its value is >100% due to arriving in the previous period. "Percentage of interval remaining" is the units used to allow the values from different packet rates to be compared. This was measured by turning on the ISR for RX_TX_TIMEOUT. LBT was not used for this measurement as LBT almost always throws a timeout during the CCA, making measurement more cumbersome.

Rate (non-LBT)	Telem Arrives (% rem)	Timeout (% rem)
50Hz	87	44
100Hz Full	40	21
150Hz	13	7
250Hz	19	9
333Hz Full	23	11
500Hz	23	10
F500	80	65
F1000	61	30

-- Deletes a bunch of other tables of calculated data and theory to get to the point --

We don't seem to be leaving any telemetry range performance on the the table with the existing timeout (F500 excluded). Great!
We're fine with the timeout firing way later in the period than it would be to safely receive telemetry, giving a lot of leeway in what's an appropriate value

Results

For rates where TOA is less than half the packet period, use the theoretical time remaining this period (interval - TOA) plus half packet period. This is at least 1 full interval, but has enough padding that it shouldn't get close to the next TOCK.

F500 - Timeout increased to 2000 -> 2611us. It now times out with 34% remaining compared to 65% on master (measured both), increasing the range margin from 540us (~80km) to 1160us (~173km) but still less than the roughly 1700us (~255km) achievable using master's system. That value is impossible by any fixed value logic in which the telemetry safely arrives in the current period.

Just like the F500 change from the OP, all other rates are unchanged.

CapnBry · 2025-09-15T17:27:24Z

Thanks for the help keeping this up to date, @mha1 ❤️ !

mha1 · 2025-09-15T18:57:40Z

@CapnBry: Sure thing.

Re-tested it after the merge commit and everything works fine.

This PR could use some reviews and approvals.

pkendall64 · 2025-09-15T20:35:12Z

I was going to ask @CapnBry, is this ready to review and for merging?

CapnBry · 2025-09-15T20:59:06Z

I was going to ask @CapnBry, is this ready to review and for merging?

Yes it is. I've tested it six ways to Sunday on all the modes for SX127x, SX1280, and LR1121 (although I can't remember now why it was OK Team900 200Hz didn't work at original time of testing). All the timing was tested in software measuring from the TOCK ISR to the appropriate RF ISR and the only one that has any change beyond a handful of microseconds is F500.

Funny that the whole point of this PR was just to put the OtaNonce in the CRC seed, which Mickey did pretty fast, and then it became dozens of hours of figuring out then shifting the F500 telemetry to be received in the correct period 🤣

EDIT: I haven't done a full run through of all the testing after the merges but if we're going to move on getting this merged I can make another pass to confirm!

pkendall64 · 2025-09-16T09:12:09Z

I think we will need to get the 3.x.x merge done to master, then merge master to this branch and test that LR1121 & LBT is still working correctly before merging this to master.

CapnBry · 2025-09-16T10:44:29Z

Yeah that's a good idea. I'll wait for 3.x.x-maint to get rolled in, merge to this pr and run the full suite again.

CapnBry · 2025-09-20T00:30:16Z

Thanks, PK! I'm out of town through Sunday but I'll have to run through the 800 different test cases again on Monday and report back

CapnBry · 2025-09-23T16:01:42Z

Testing methodology

RX connected to standalone_test.py to monitor RQly
TX DEBUG with timing info added / monitor TQly
Wide switch mode
Std telemetry ratio, some additionally tested at 1:2
At least 20 seconds per rate
- Verify 100% LQ on RX
- Verify 100% LQ on TX
- Note 1RSS/2RSS for anomalies
- Note telemetry arrival time

Results

2.4GHz LBT SX1280 shows sub 100 LQ on all rates. This seems far to consistent to be LBT doing its thing. bdc340b (random commit from early Aug) and 3.5.6 also affected so I suppose this is just LBT? But why only SX1280 and not LR1121
master issue "No more deferred function slots available" on rate switch 2.4GHz SX1280 LBT and lock up. Incredibly rare (3x in maybe 50 switches)
?!??!??! TX on bdc340 (4.0) can connect to 3.5.6 RX?

Other than that, everything related to this PR checks out, so long as DEBUG_LOG isn't defined. With DEBUG_LOG, 2.4GHz K modes sometimes could get out of sync, but the CRC was correct on the packets received, which is baffling to me. Why just the 2.4 K modes only? Could not reproduce it without DEBUG_LOG.

Test Details

LR1121 Gemini 915/FCC (4 full tests)

TX Radiomaster Nomad
RX BAYCK Dual Band Gemini
Also tested with LR1121 "Ant 1" with practically identical results
RX Test2 SX1280 HappyModel EP2 (Overlapping 2.4)
RX Test3 SX127x BetaFPV SuperP 900 (Overlapping 900)
Unable to reproduce 2.4 K mode issues with DEBUG_LOG off

Rate	Telem Arrives (% rem)	Note
900 D50	8
900 25Hz	45
900 50Hz	8
900 100Hz	14
900 100 Full	57
900 200Hz	7
900 200 Full	36
900 250Hz	31
900 K1000 Full	26	TLM CRC Error rate increased?
2.4 50Hz	86
2.4 100 Full	38
2.4 150Hz	10
2.4 250Hz	12
2.4 333 Full	14
2.4 500Hz	7
2.4 DK250	22	FAIL - Connect drops / partial sync
2.4 DK500	22	FAIL ^^
2.4 K1000	15	FAIL ^^
X 100 Full	31
X 150Hz	32

LR1121 Gemini 868/LBT

TX Radiomaster Nomad
RX BAYCK Dual Band Gemini
Frequent Non-TLM CRC Errors (Noise?) on 868 100Hz Full, 200Hz
RX: Guru Meditation Error: Core 1 panic once when changing rates on 868, repeated on 2.4, repeated on X. Various places between timer locked / matching tx power. Unable to reproduce with DEBUG_LOG off
Unable to reproduce 2.4 K mode issues with DEBUG_LOG off

Rate	Telem Arrives (% rem)	Note
868 D50	8
868 25Hz	45
868 50Hz	8
868 100Hz	14
868 100 Full	57
868 200Hz	7
868 200 Full	36
868 250Hz	31
868 K1000	26
2.4 50Hz	85
2.4 100 Full	38
2.4 150Hz	10
2.4 250Hz	11
2.4 333 Full	13
2.4 500Hz	7
2.4 DK250	19	FAIL - Connect drops / partial sync
2.4 DK500	18	FAIL ^^
2.4 K1000	13	FAIL ^^
X 100 Full	32
X 150Hz	33

SX1280 2.4 FCC

TX Radiomaster Ranger Micro
RX HappyModel EP2

Rate	Telem Arrives (% rem)	Note
50Hz	87
100 Full	40
150Hz	12
250Hz	17
333 Full	21
500Hz	19
D250	61
D500	61
F500	77
F1000	54

SX1280 2.4 LBT

TX Radiomaster Ranger Micro
RX HappyModel EP2
"No more deferred function slots available" a ton then locked up on switch 50Hz -> 100, 150Hz -> 250Hz, incredibly rare, rate doesn't matter

Rate	Telem Arrives (% rem)	Note
50Hz	86	94% RQly
100 Full	39	90s% RQly
150Hz	11	85-95% RQly
250Hz	15	90s% RQly
333 Full	18	90s% RQly
500Hz	15	90s% RQly
D250	52
D500	52
F500	72	95-99% RQly
F1000	44	97-100% RQly

SX127x 915 FCC

TX HappyModel ES900
RX BetaFPV SuperD 900

Rate	Telem Arrives (% rem)	Note
D50	9
25Hz	45
50Hz	8
100Hz	15
100 Full	58
200Hz	8

DEBUG_LOG diff

diff --git a/src/lib/CrsfProtocol/CRSFEndpoint.cpp b/src/lib/CrsfProtocol/CRSFEndpoint.cpp
index c6cf682b..f95e3b4a 100644
--- a/src/lib/CrsfProtocol/CRSFEndpoint.cpp
+++ b/src/lib/CrsfProtocol/CRSFEndpoint.cpp
@@ -304,9 +304,9 @@ void CRSFEndpoint::parameterUpdateReq(const crsf_addr_e origin, const bool isElr
     switch (parameterType)
     {
     case CRSF_FRAMETYPE_PARAMETER_WRITE:
-        DBGLN("Set parameter [%s]=%u", parameter->name, parameterArg);
         if (parameterIndex < MAX_CRSF_PARAMETERS && paramCallbacks[parameterIndex])
         {
+            DBGLN("Set parameter [%s]=%u", parameter->name, parameterArg);
             // While the command is executing, the handset will send `WRITE state=lcsQuery`.
             // paramCallbacks will set the value when nextStatusChunk == 0, or send any
             // remaining chunks when nextStatusChunk != 0
diff --git a/src/lib/Handset/CRSFHandset.cpp b/src/lib/Handset/CRSFHandset.cpp
index 2313a6b7..6dbdf269 100644
--- a/src/lib/Handset/CRSFHandset.cpp
+++ b/src/lib/Handset/CRSFHandset.cpp
@@ -585,9 +585,9 @@ bool CRSFHandset::UARTwdt()
             retval = true;
         }
 #ifdef DEBUG_OPENTX_SYNC
-        if (abs((int)((1000000 / (ExpressLRS_currAirRate_Modparams->interval * ExpressLRS_currAirRate_Modparams->numOfSends)) - (int)GoodPktsCount)) > 1)
+        //if (abs((int)((1000000 / (ExpressLRS_currAirRate_Modparams->interval * ExpressLRS_currAirRate_Modparams->numOfSends)) - (int)GoodPktsCount)) > 1)
 #endif
-            DBGLN("UART STATS Bad:Good = %u:%u", BadPktsCount, GoodPktsCount);
+            //DBGLN("UART STATS Bad:Good = %u:%u", BadPktsCount, GoodPktsCount);

         UARTwdtLastChecked = now;
         if (retval)
diff --git a/src/src/tx_main.cpp b/src/src/tx_main.cpp
index 3563d0e8..efd811b2 100644
--- a/src/src/tx_main.cpp
+++ b/src/src/tx_main.cpp
@@ -743,11 +743,13 @@ void ICACHE_RAM_ATTR nonceAdvance()
   }
 }

+uint32_t tockUs;
 /*
  * Called as the TOCK timer ISR when there is a CRSF connection from the handset
  */
 void ICACHE_RAM_ATTR timerCallback()
 {
+  tockUs = micros();
   /* If we are busy writing to EEPROM (committing config changes) then we just advance the nonces, i.e. no SPI traffic
*/
   if (commitInProgress)
   {
@@ -927,6 +929,13 @@ bool ICACHE_RAM_ATTR RXdoneISR(SX12xxDriverCommon::rx_status const status)
   {
     return false; // Already received tlm, do not run ProcessTLMpacket() again.
   }
+  static uint32_t last;
+  uint32_t now = millis();
+  if (now - last >= 10)
+  {
+    DBGLN("REM %u LQ %u", (ExpressLRS_currAirRate_Modparams->interval - (micros() - tockUs)) * 100 / ExpressLRS_currAirRate_Modparams->interval, LQCalc.getLQ());
+    last = now;
+  }

   bool packetSuccessful = ProcessTLMpacket(status);
 #if defined(Regulatory_Domain_EU_CE_2400)

CapnBry · 2025-09-23T21:43:17Z

Bit of a problem. Both Mickey and I experience different issues with this and LBT in long term testing, so hold off on this until we can determine where the problem lies.

He experiences telemetry drop off that stays dropped off and may not recover, while the uplink remains steady. I experienced 3x instances of the receiver losing connection for ~2.6s as if the TX stopped transmission. Further testing and logging is underway.

CapnBry · 2025-10-08T12:30:47Z

Two weeks of testing later, the issue was determined to be present in the RF drivers in master and 3.x.x-maint, so I'd like to move forward on this PR. Changes to resolve the connection loss and jitter forthcoming from my dedicated peers.

* initial commit * Adjust RX TOCK to later in period for low duty cycle rates * Move all FHSS to TOCK only * Extend TXs telemetry timeout for short TOA packets * Remove incorrect comment --------- Co-authored-by: mha1 Co-authored-by: pkendall64

mha1 and others added 3 commits July 29, 2025 21:25

initial commit

a7cb9c4

Adjust RX TOCK to later in period for low duty cycle rates

3ac22ff

Move all FHSS to TOCK only

23f9e6e

CapnBry added the V4.0 🍔 label Aug 4, 2025

JyeSmith reviewed Aug 4, 2025

View reviewed changes

mha1 and others added 2 commits August 17, 2025 12:07

Merge branch 'master' into Exp_Desync_C

f216dd5

Extend TXs telemetry timeout for short TOA packets

6267e20

mha1 and others added 2 commits September 7, 2025 19:12

Merge branch 'master' into Exp_Desync_C

b02fd40

Merge branch 'master' into Exp_Desync_C

8988d55

Merge branch 'master' into Exp_Desync_C

20fa45f

Merge branch 'master' into Exp_Desync_C

e6181ff

CapnBry added 2 commits October 7, 2025 09:08

Remove incorrect comment

7651d73

Merge branch 'master' into Exp_Desync_C

aa57cb1

mha1 approved these changes Oct 8, 2025

View reviewed changes

pkendall64 approved these changes Oct 8, 2025

View reviewed changes

CapnBry merged commit eb376dc into ExpressLRS:master Oct 8, 2025
25 checks passed

CapnBry deleted the Exp_Desync_C branch October 25, 2025 21:00

Uh oh!

Add OtaNonce to OtaCrcInitializer #3294

Add OtaNonce to OtaCrcInitializer #3294

Uh oh!

Conversation

CapnBry commented Aug 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Details

Technical

OTA changes

RX code changes

TOCK issue

Tested

Uh oh!

JyeSmith Aug 4, 2025

Choose a reason for hiding this comment

Uh oh!

CapnBry Aug 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JyeSmith commented Aug 4, 2025

Uh oh!

CapnBry commented Aug 17, 2025

Results

Uh oh!

CapnBry commented Sep 15, 2025

Uh oh!

mha1 commented Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pkendall64 commented Sep 15, 2025

Uh oh!

CapnBry commented Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pkendall64 commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CapnBry commented Sep 16, 2025

Uh oh!

CapnBry commented Sep 20, 2025

Uh oh!

CapnBry commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Testing methodology

Results

Test Details

LR1121 Gemini 915/FCC (4 full tests)

LR1121 Gemini 868/LBT

SX1280 2.4 FCC

SX1280 2.4 LBT

SX127x 915 FCC

DEBUG_LOG diff

Uh oh!

CapnBry commented Sep 23, 2025

Uh oh!

CapnBry commented Oct 8, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

CapnBry commented Aug 4, 2025 •

edited

Loading

CapnBry Aug 5, 2025 •

edited

Loading

mha1 commented Sep 15, 2025 •

edited

Loading

CapnBry commented Sep 15, 2025 •

edited

Loading

pkendall64 commented Sep 16, 2025 •

edited

Loading

CapnBry commented Sep 23, 2025 •

edited

Loading