@CapnBry
Member

@CapnBry CapnBry commented May 16, 2025

Attempts to resolve the issues present since #3210 with the Lua loading Other Devices, along with the underlying OTA issues that cause them when running in Wide switch mode at a 1:16 telemetry ratio or slower.

Issue 1: Too many syncs / too short TlmBoost

Changes to tx_main.cpp. TlmBoost has been activating correctly since the previous PR, but the logic that requests a SYNC packet (to signal the RX to change ratios) expected boost to go to a 1:2 ratio, and therefore requested a ratio change for each piece of data it wanted to send. The extra sync packets would then immediately disable boost mode and set the ratio back to the user-configured value as soon as the upload was complete, leaving the downlink in slow mode. This caused the DEVICE_INFO packets from the RX to take 20-30s to arrive at 1:128, and even longer for any flight controller's DEVICE_INFO.

Issue 2: Lua expects all DEVICE_INFO to arrive before it finishes loading

Changes to elrsv3.lua. If you wait long enough, "Other Devices" will appear in the Lua, but clicking on it shows 0 items in the list. This is because the list is only populated at the end of loading the main set of parameters, and the overly slowed remote info wouldn't arrive in time. I've rearranged the code so that the list is populated when the Other Devices folder is populated, as well as when a new DEVICE_INFO is received while in that folder.

The field re-request timeout has also been extended from 0.5s to 5.0s, as requesting every 0.5s at slow 50Hz 1:8 telemetry caused an abundance of repeated parameter chunks to be downloaded from the remote side.
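To get a rough feel for why a 0.5s re-request timer is too short at slow ratios, the downlink slot rate can be estimated from the packet rate and the telemetry ratio. This is a minimal sketch, not code from the PR: `slotsPerSecond`, `secondsToTransfer`, and the `bytesPerSlot` parameter are hypothetical names, the per-slot payload size is an assumption the caller supplies, and ACK overhead and lost packets are ignored.

```cpp
#include <cassert>
#include <cmath>

// One in every `ratio` OTA packets is a telemetry (downlink) slot.
double slotsPerSecond(double packetRateHz, int ratio)
{
    assert(ratio > 0);
    return packetRateHz / ratio;
}

// Best-case time to move `bytes` over the downlink: no loss, no resends.
// bytesPerSlot is an ASSUMED payload size per telemetry slot, for illustration only.
double secondsToTransfer(int bytes, int bytesPerSlot, double packetRateHz, int ratio)
{
    assert(bytesPerSlot > 0);
    const double slots = std::ceil(static_cast<double>(bytes) / bytesPerSlot);
    return slots / slotsPerSecond(packetRateHz, ratio);
}
```

For example, 50Hz at 1:8 gives 6.25 slots per second. If a slot carried 5 payload bytes (an assumed figure), a 64-byte chunk would need 13 slots, a little over 2 seconds, so a 0.5s timer would fire several times before a single chunk could arrive.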

These Lua changes are not strictly required: after the other fixes, the list almost always loads the Other Devices properly. However, without them loading the RX's Lua is roughly half the speed due to all the duplicated requests, and I believe the Other Devices list should populate more reliably with them.

Issue 3: Requesting next parameter before the previous has completed downloading

Changes to telemetry.cpp / lua.cpp. When stubborn_receiver completes a download from the RX, the data is forwarded to the Lua as soon as the transfer is completed. However, the RX still has the telemetry slot locked until it receives the final ACK. With Wide switch mode, the telemetry ACK bit is only sent every 8 channel packets, and it has a 50/50 chance of falling on the telemetry slot, which pushes it out to every 16 packets. Before that 16-packet timeframe has elapsed, the Lua will have requested the next parameter chunk. However, since the receiver still has the slot locked, there is nowhere to put it and the response is discarded. This leads to loading stalling until the parameter chunk is re-requested.

The receiver code has been updated to retry queueing the parameter every 4 packets until it is successful, or another request has come in (which will replace the pending request).
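The retry behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `TelemetrySlot` and `PendingParamRequest` are hypothetical types, and only the 4-packet retry cadence and the "newer request replaces the pending one" rule come from the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the RX telemetry slot; not the real ELRS type.
struct TelemetrySlot
{
    bool locked = false;     // stays true until the final ACK of the previous transfer
    uint8_t lastQueued = 0;  // parameter index most recently accepted

    bool tryQueue(uint8_t paramIndex)
    {
        if (locked)
            return false;    // nowhere to put the response yet
        lastQueued = paramIndex;
        locked = true;
        return true;
    }
};

class PendingParamRequest
{
public:
    static constexpr uint8_t RetryIntervalPackets = 4; // cadence from the description

    // A new request always replaces whatever was pending.
    void request(uint8_t paramIndex)
    {
        paramIndex_ = paramIndex;
        hasPending_ = true;
        packetsUntilRetry_ = 0; // attempt on the next packet
    }

    // Call once per OTA packet. Returns true when the pending request was queued.
    bool onPacket(TelemetrySlot &slot)
    {
        if (!hasPending_)
            return false;
        if (packetsUntilRetry_ > 0)
        {
            --packetsUntilRetry_;
            return false;
        }
        if (slot.tryQueue(paramIndex_))
        {
            hasPending_ = false;
            return true;
        }
        packetsUntilRetry_ = RetryIntervalPackets - 1; // wait, then try again
        return false;
    }

    bool hasPending() const { return hasPending_; }

private:
    uint8_t paramIndex_ = 0;
    bool hasPending_ = false;
    uint8_t packetsUntilRetry_ = 0;
};
```

Keeping a single pending entry rather than a queue matches the description: the Lua only ever wants the most recent chunk it asked for, so an older request that never got a slot can simply be overwritten.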

New Behavior

If the RX ever queues a PARAM_INFO message, all other telemetry is now paused (dropped) for 10 seconds. During this time, only DEVICE_INFO and PARAM_INFO type messages will be allowed into the queue. This is to speed up the loading of the remote Lua while under load coming from a connected flight controller (such as Rotorflight, which does a good job of using the entirety of the downstream bandwidth). While the remote Lua is loading, users will experience "sensor lost" messages from EdgeTX.
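A minimal sketch of such a gate, for illustration only. `MsgType` and `TelemetryGate` are hypothetical names, not the types used in telemetry.cpp, and whether each new PARAM_INFO restarts the 10 second window is an assumption made here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical message categories; the real code distinguishes CRSF frame types.
enum class MsgType { DeviceInfo, ParamInfo, Other };

class TelemetryGate
{
public:
    static constexpr uint32_t PauseMs = 10000; // 10 second pause from the description

    // Returns true if a message of this type may enter the queue at time nowMs.
    bool allow(MsgType type, uint32_t nowMs)
    {
        if (type == MsgType::ParamInfo)
        {
            paused_ = true;
            pauseStartMs_ = nowMs; // ASSUMPTION: each PARAM_INFO restarts the window
            return true;
        }
        // Unsigned subtraction keeps this correct across millisecond counter wrap.
        if (paused_ && (nowMs - pauseStartMs_) >= PauseMs)
            paused_ = false;
        if (type == MsgType::DeviceInfo)
            return true;   // always allowed, paused or not
        return !paused_;   // everything else is dropped while paused
    }

private:
    bool paused_ = false;
    uint32_t pauseStartMs_ = 0;
};
```

Dropping rather than deferring the other telemetry keeps the queue short, at the cost of the "sensor lost" messages mentioned above.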

Bugfixes

  • There was a stack overrun issue in sendCRSFparam() on the RX, where the buffer (paramInformation) is only DEVICE_INFORMATION_LENGTH (~32 bytes). This cannot hold the full 64 bytes needed for a full parameter chunk.
  • telemetry.cpp allocates a 64 byte slot for DEVICE_INFO which is not used. All DEVICE_INFO packets go into the 2-slot slots, with the RX forced to use slot 2, and the Flight Controller using whichever slot is open. The extra buffer has been removed.
  • Telemetry on the RX was using overlapping buffer areas for its queue, leading to the 2-slot buffer actually reusing the fixed slots. This caused corruption of DEVICE_INFO and PARAM_INFO packets from the RX, or corruption and dropping of standard telemetry items when the 2-slot was used. This overlap has been resolved.
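Both classes of bug in the list above (a copy into a too-small buffer, and queue regions that overlap) can be guarded against cheaply. The following is a generic sketch, not the fix applied in this PR: `copyChunk`, `Region`, `overlaps`, and the region offsets are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kDeviceInfoLen = 32; // approximate size from the description
constexpr std::size_t kParamChunkLen = 64; // full parameter chunk from the description

// Bounded copy: refuse instead of writing past the end of the destination.
bool copyChunk(uint8_t *dst, std::size_t dstSize, const uint8_t *src, std::size_t srcLen)
{
    if (srcLen > dstSize)
        return false;
    std::memcpy(dst, src, srcLen);
    return true;
}

// A byte range inside one shared backing buffer.
struct Region
{
    std::size_t offset;
    std::size_t length;
};

constexpr bool overlaps(Region a, Region b)
{
    return a.offset < b.offset + b.length && b.offset < a.offset + a.length;
}

// Hypothetical layout: fixed slots first, then the 2-slot area.
constexpr Region kFixedSlots{0, 96};
constexpr Region kTwoSlot{96, 128};
static_assert(!overlaps(kFixedSlots, kTwoSlot), "telemetry regions must not overlap");
```

With the layout expressed as constants, an accidental overlap becomes a compile error rather than corrupted DEVICE_INFO or PARAM_INFO packets at runtime.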

Going mad with power

This also adds a version number to the lua so users can tell what they are running. It appears in the EXIT line and we'll have to manually tag it in the source file.
[Image: IMG_20250518_125812]

@CapnBry CapnBry changed the title from "Boost18sync" to "Slow Other Devices loading if at all on 3.5.5" May 16, 2025
Collaborator

@pkendall64 pkendall64 left a comment

Certainly a lot slower loading 150Hz at 1:8 telemetry, but at least it works.

@Mick51

Mick51 commented May 18, 2025

On jumper T15 it didn't work at all. I hope it will work.

@mha1
Contributor

mha1 commented May 18, 2025

You talking about 3.5.5 or this PR?

@CapnBry
Member Author

CapnBry commented May 18, 2025

Pushed changes to the Lua to fix it behaving weirdly when selecting a remote device with 0 fields defined and trying to back out, and to fix it exiting the Other Devices display if a duplicate device info packet is received. Thanks Mickey for finding this.

Also added v15 to the menu to display version number

@Mick51

Mick51 commented May 18, 2025

3.5.5, I haven't tested this pr yet, I'll test it tomorrow. 😉

@mha1
Contributor

mha1 commented May 18, 2025

We know about the 3.5.5 issue. You should test this PR. Flash rx and module and don't forget to update the elrs lua script with the one this PR changed.

@Mick51

Mick51 commented May 18, 2025

Where can I find the new Lua?

@mha1
Contributor

mha1 commented May 18, 2025

https://github.com/ExpressLRS/ExpressLRS/blob/054859c1a104e38ec469dc776557f86333ca37b8/src/lua/elrsV3.lua

Hit the three dots to download

@CapnBry
Member Author

CapnBry commented May 21, 2025

Flagged Do Not Merge as we decide what the f we're going to do about the crumbling house of cards of a lossy telemetry system operating in a low speed mode.

@CapnBry
Member Author

CapnBry commented May 23, 2025

Pushed new changes

New Behavior

If the RX ever queues a PARAM_INFO message, all other telemetry is now paused (dropped) for 10 seconds. During this time, only DEVICE_INFO and PARAM_INFO type messages will be allowed into the queue. This is to speed up the loading of the remote Lua while under load coming from a connected flight controller (such as Rotorflight, which does a good job of using the entirety of the downstream bandwidth). While the remote Lua is loading, users will experience "sensor lost" messages from EdgeTX.

Note that anything already in the telemetry queue will be sent, so this means typically the first PARAM_INFO message will be delayed, but subsequent messages should be received much more quickly. Reduces load time of an EP2 from 35s to 10s on 150Hz Std if the receiver is being fully loaded with CRSF telemetry, and makes it actually work without timeouts if something is utilizing the 2-slot telemetry.

This mechanism is simply a stopgap: a refactor of this unit has already been performed by ❤️ PK ❤️, after which it may not be needed.

Bugfixes

  • MAJOR: Telemetry on the RX was using overlapping buffer areas for its queue, leading to the 2-slot buffer actually reusing the fixed slots. This caused corruption of DEVICE_INFO and PARAM_INFO packets from the RX, or corruption and dropping of standard telemetry items when the 2-slot was used. This overlap has been resolved.

Do Not Merge

Waiting to remove this tag until this can be retested by all relevant parties.

@CapnBry CapnBry merged commit ace7797 into ExpressLRS:3.x.x-maintenance May 26, 2025
48 checks passed
@pkendall64 pkendall64 mentioned this pull request May 26, 2025
@CapnBry CapnBry deleted the boost18sync branch June 3, 2025 12:20