Fix intermittent busyRx on Portduino SX1262 (stale preamble IRQ) #9939
NearlCrews wants to merge 1 commit into
@NearlCrews, welcome to Meshtastic! Thanks for opening your first pull request. We really appreciate it. We discuss work as a team in Discord; please join us in the #firmware channel. Welcome to the team 😄
58dc30a to afce676
Pull request overview
This PR addresses intermittent busyRx transmission blocks on Linux-native (Portduino) builds using SX1262 radios by preventing stale PREAMBLE_DETECTED IRQ flags from falsely indicating an active reception, and by avoiding SX1262 duty-cycle auto-receive mode on Portduino.
Changes:
- Switch Portduino SX1262 receive mode from duty-cycle auto-receive to continuous receive.
- In `isActivelyReceiving()` (Portduino only), clear and re-check `PREAMBLE_DETECTED` when it appears latched without `HEADER_VALID` to avoid blocking TX on stale IRQ state.
    constexpr uint32_t STALE_PREAMBLE_RECHECK_MS = 5;
    uint16_t irq = lora.getIrqFlags();
    if ((irq & RADIOLIB_SX126X_IRQ_PREAMBLE_DETECTED) && !(irq & RADIOLIB_SX126X_IRQ_HEADER_VALID)) {
        lora.clearIrqFlags(RADIOLIB_SX126X_IRQ_PREAMBLE_DETECTED);
        delay(STALE_PREAMBLE_RECHECK_MS);
        irq = lora.getIrqFlags();
You mention "Real receptions re-assert the flag within a few ms.". This may often be true, but I don't think it's guaranteed. What happens if you clear the IRQ just before the preamble ends and it continues decoding the LoRa header?
Even then, STALE_PREAMBLE_RECHECK_MS cannot be a constant; it will depend on preambleTimeMsec, which depends on the LoRa settings. For slow presets, 5 ms will not be enough to detect (part of) a preamble.
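To put rough numbers on the timing point, here is a minimal sketch using the simplified formula quoted in this PR's description, `preambleLength * 2^sf / bw`. The function name `preambleTimeMs` is hypothetical, and the preset parameters in the usage note (SF11 at 250 kHz for LongFast, SF12 at 125 kHz for LongSlow) are assumptions not stated in this thread; the 16-symbol preamble comes from the PR description. The exact LoRa preamble is a few symbols longer than this formula gives, which only widens the gap.

```cpp
#include <cassert>
#include <cstdint>

// Simplified preamble duration in milliseconds: preambleSymbols * 2^sf / bw.
// This follows the approximation used in the PR text, not a full time-on-air model.
constexpr double preambleTimeMs(uint32_t preambleSymbols, uint32_t sf, double bwHz)
{
    return preambleSymbols * static_cast<double>(1u << sf) / bwHz * 1000.0;
}

// True if a fixed recheck delay spans at least one whole preamble,
// i.e. long enough that a real preamble would have ended by the recheck.
constexpr bool recheckCoversPreamble(double recheckMs, double preambleMs)
{
    return recheckMs >= preambleMs;
}
```

Under those assumed preset values, `preambleTimeMs(16, 11, 250000.0)` comes out near 131 ms and `preambleTimeMs(16, 12, 125000.0)` near 524 ms, so `recheckCoversPreamble(5.0, ...)` is false for both.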
On Linux-native (Portduino) platforms, the SX1262 radio intermittently gets stuck in RX state with "Can not send yet, busyRx" errors followed by "Ignore false preamble detection". This prevents packet transmission and eventually fills the TX queue.

Root cause: the duty-cycle auto-receive mode periodically runs internal CAD (channel activity detection) checks, and each CAD cycle can latch a PREAMBLE_DETECTED IRQ flag in the SX1262 even when no real packet is arriving. On bare-metal MCUs these stale flags are cleared or overwritten within a few symbol times before anything reads them, because IRQ-line polling happens at nanosecond scale. On Linux, GPIO reads through gpiod take microseconds per access, so getIrqFlags() frequently catches those stale CAD-induced flags. isActivelyReceiving() then falsely reports the radio as busy and blocks TX until the existing 2 * preambleTimeMsec fallback in RadioLibInterface::receiveDetected() times out.

Fix: on Portduino only, use startReceive(RX_TIMEOUT_INF, ...) instead of startReceiveDutyCycleAuto(). Continuous receive has no sleep/CAD cycles, so the stale-flag source is eliminated at the root. Power saving from duty cycling is irrelevant on mains-powered Linux-class devices (Raspberry Pi, x86 gateways, etc.). No change to isActivelyReceiving() itself: the existing receiveDetected() logic (with its 2 * preambleTimeMsec stale-flag fallback) handles any residual cases correctly on all platforms.

Tested on Raspberry Pi 5 with SX1262/E22-900M30S (PiMesh 1W) across multiple configs including multi-channel setups with MQTT, boosted RX gain, and full module config. Before fix: busyRx on 60-80% of TX attempts under high radio activity, occasional hangs at low activity. After fix: no spurious busyRx observed; packet flow is stable.

No behavior change on non-Portduino (ESP32 / nRF52 / RP2040 / STM32WL) builds.
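The change this commit message describes amounts to choosing a receive mode per platform. The sketch below illustrates that choice only: `FakeRadio`, `startReceiveContinuous`, and `startRx` are hypothetical stand-ins rather than the RadioLib or firmware API, and the platform is passed as a runtime flag so the example can be exercised on its own (the firmware would more plausibly select this at compile time).

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the radio driver. It only records which receive mode was
// requested; the method names loosely mirror the calls named in the commit
// message, but the signatures are illustrative.
struct FakeRadio {
    enum class Mode { None, DutyCycleAuto, Continuous };
    Mode mode = Mode::None;

    int startReceiveDutyCycleAuto(uint16_t senderPreambleLength, uint16_t minSymbols)
    {
        (void)senderPreambleLength;
        (void)minSymbols;
        mode = Mode::DutyCycleAuto;
        return 0; // 0 = success, by convention in this sketch
    }

    int startReceiveContinuous()
    {
        mode = Mode::Continuous;
        return 0;
    }
};

// Hypothetical helper: continuous receive on Linux-native builds,
// duty-cycle auto-receive (8-symbol RX window) everywhere else.
inline int startRx(FakeRadio &radio, bool isLinuxNative, uint16_t preambleLength)
{
    if (isLinuxNative) {
        return radio.startReceiveContinuous();
    }
    return radio.startReceiveDutyCycleAuto(preambleLength, 8);
}
```

Calling `startRx(radio, true, 16)` leaves `radio.mode` at `Continuous`, while `startRx(radio, false, 16)` leaves it at `DutyCycleAuto`.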
afce676 to 797601f
@GUVWAF you are right on both counts, thanks for the careful read. I've pushed an update ( On your in-flight-reception concern: the SX1262 fires On the preset-dependent timing: right too. Why the remaining change is enough. The root cause is @Copilot: the 5 ms vs 10 ms mismatch is moot now, that code is gone. CLA is still unsigned; I'll sign before merge.
I don't see how not using the duty cycle receiving would solve the issue, since this only affects the radio and has nothing to do with handling the IRQ flag. Moreover, with our current preamble length, the I think the reason it works for you now is because #9895 is merged in the meantime.
You're right on both counts. We pass Plausible that #9895 explains why my soak stopped reproducing the busyRx. I haven't isolated that. Closing this.
Summary
On Linux-native (Portduino) builds, the SX1262 radio gets stuck in RX with `Can not send yet, busyRx` / `Ignore false preamble detection` errors, blocking TX until the existing stale-flag fallback in `RadioLibInterface::receiveDetected` times out (`2 * preambleTimeMsec`). Under high radio activity I measured this on 60-80% of TX attempts; at low activity it was sporadic.

Fixes #9933. Related: #9580 (same symptoms on RPi Zero 2W), #4298 (GPIO pin issues with SX1262 on RPi 5).
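The stale-flag fallback mentioned in the summary can be sketched roughly as follows. This is a simplified illustration and not the firmware's `receiveDetected` code: the `ReceiveDetector` type, the flag constants, and the injected `nowMs` clock are all hypothetical, and resetting the timer when no flag is set is a simplification of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits for illustration; real values come from the radio library.
constexpr uint16_t kPreambleDetected = 1u << 0;
constexpr uint16_t kHeaderValid = 1u << 1;

struct ReceiveDetector {
    uint32_t preambleTimeMs; // preamble duration for the current LoRa settings
    uint32_t startMs = 0;    // when the current detection was first seen
    bool hasStart = false;   // whether a detection is being tracked

    explicit ReceiveDetector(uint32_t preambleMs) : preambleTimeMs(preambleMs) {}

    // Returns true while a reception is plausibly in progress.
    bool receiveDetected(uint16_t irq, uint32_t nowMs)
    {
        const bool detected = (irq & (kPreambleDetected | kHeaderValid)) != 0;
        if (!detected) {
            hasStart = false; // nothing latched: forget any tracked detection
            return false;
        }
        if (!hasStart) {
            hasStart = true; // first sighting: start the stale-flag timer
            startMs = nowMs;
            return true;
        }
        const bool headerValid = (irq & kHeaderValid) != 0;
        // A real packet should show a valid header within two preamble times;
        // a preamble flag that outlives that window is treated as stale.
        if (!headerValid && (nowMs - startMs) > 2 * preambleTimeMs) {
            hasStart = false;
            return false;
        }
        return true;
    }
};
```

With `preambleTimeMs = 131`, a preamble-only flag first seen at t = 1000 ms still reports busy at t = 1200 ms and is dropped as stale at t = 1300 ms, which is the delay that gates TX in the scenario above.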
Root cause
`startReceiveDutyCycleAuto` puts the SX1262 into a sleep/CAD/sleep cycle. Every CAD pass can latch a `PREAMBLE_DETECTED` IRQ flag even when no real packet is arriving. On bare-metal MCUs those stale flags get cleared/overwritten within a few symbol times before anything sees them, because IRQ polling happens at nanosecond scale. On Linux, `gpiod` reads are microseconds each, so `getIrqFlags()` regularly catches a stale CAD-induced flag. `isActivelyReceiving()` then returns `true`, TX is gated, and we wait out the `2 * preambleTimeMsec` stale-flag fallback before unblocking, by which time `setTransmitDelay()` has already pushed the TX attempt into a new random window where the same race can recur.

Fix
On Portduino only, replace `startReceiveDutyCycleAuto(preambleLength, 8, ...)` with `startReceive(RADIOLIB_SX126X_RX_TIMEOUT_INF, ...)`. Continuous receive has no sleep/CAD cycles, so the stale-flag source is eliminated at the root. Power saving from duty cycling is irrelevant on mains-powered Linux-class devices (Raspberry Pi, x86 gateways, femtofox).

`isActivelyReceiving()` is unchanged: the existing `receiveDetected` logic with its `2 * preambleTimeMsec` fallback remains the correct place to handle any residual stale flags on any platform.

What this PR no longer does
An earlier revision also tried to accelerate stale-flag recovery inside `isActivelyReceiving()` on Portduino by clearing `PREAMBLE_DETECTED` and re-reading after 5 ms. That was wrong on two counts (thanks to @GUVWAF for catching it):
- The SX1262 does not re-assert `PREAMBLE_DETECTED` later in the same reception: preamble detection is a one-shot IRQ, after which the chip moves to sync word → header → payload. If the flag was cleared mid-reception (e.g. between end-of-preamble and `HEADER_VALID`), the recheck would see "no preamble," declare the reception stale, and TX could fire into an in-progress RX.
- Preamble time depends on the preset (`preambleLength * (2^sf) / bw`). LongFast is ~131 ms; LongSlow is ~524 ms. 5 ms was 25–100× too short to distinguish "real preamble, still decoding" from "stale flag."

That code is removed in the current commit. The continuous-receive change alone is sufficient on my hardware: I did not observe any residual `busyRx` after applying it.

Behavior change
- Portduino: `startReceiveDutyCycleAuto` (16-symbol preamble, 8-symbol RX window) becomes `startReceive(RX_TIMEOUT_INF, ...)`.
- Other platforms: `startReceiveDutyCycleAuto(...)`, unchanged.

No change to `isActivelyReceiving` on any platform. No change on battery-powered builds.

Test plan
Tested on Raspberry Pi 5 with SX1262 (E22-900M30S) HAT, full config (4 channels, MQTT, boosted RX gain, GPS, neighbor info, telemetry).
- Before: `busyRx` on 60-80% of TX attempts under heavy traffic (firmware 2.7.15 and 2.7.20)
- After: no `busyRx` across 10 consecutive 45-second test runs
- After: no `busyRx` in a ~24 h sustained soak with mixed traffic
- `Ignore false preamble detection` log no longer appears in steady state
- Radio init succeeds (`SX126x init result 0`)

CLA will be signed before merge.