fix(firmware): harden ESP32 OTA + nRF DFU update paths (hardware-validated)#5915
Merged
Conversation
The BLE OTA transport streamed fixed 512-byte chunks, but the Android BLE layer negotiates MTU 512 (requestMtu(512)) -> a 509-byte write payload. Each 512-byte chunk fragmented into two GATT writes ([509, 3]) and the loop waited for one ACK per fragment, while the device (esp32-unified-ota) is a byte stream that coalesces the writes and emits a single ACK per drain. The second waitForResponse then timed out (10s), failing BLE OTA in the common case. Clamp each streamed chunk to the negotiated write payload so one chunk is exactly one GATT write eliciting exactly one device response, and drive the loop on response type (ACK->continue, OK->complete on the last chunk, ERR->fail) instead of a fragment count. Success now requires an explicit terminal OK and any ERR fails the transfer, so a late device error can never be reported as success. WiFi OTA (no per-chunk ACK) and the nRF DFU paths are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Apply review findings on the BLE OTA streaming rewrite: - Throw TransferFailed on an OK received before the final chunk instead of silently treating it as an ACK (the device sends OK only at completion, so an early OK signals a size disagreement). - Mark isConnected @volatile: it is written from the connectionState collector and read by the streaming loop's connection-loss guard on a different dispatcher, matching the @volatile idiom in KableBleConnection. - Guard writeData against a non-positive negotiated write length so the write loop cannot spin without advancing (fall back to a single whole-buffer write). Add tests for the post-loop terminal path (ACK-then-OK success, ACK-then-ERR failure) and the premature-OK failure. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
FirmwareManifest declared `hwModel: String`, but published manifests emit it as an integer (the HardwareModel enum value), so every real manifest threw JsonDecodingException and FirmwareRetriever silently fell back to filename heuristics -- never using the authoritative app0 entry or its md5. Model only the consumed `files` array and let ignoreUnknownKeys drop the decorative scalar metadata, so an upstream scalar-type change can never break OTA resolution again. The test is rewritten against the real manifest shape (integer hwModel), which now regresses the bug. Verified on hardware (Pixel 6a -> Heltec V3): the 2.7.25 manifest now resolves "firmware-heltec-v3-2.7.25.104df5f.bin (app0, md5=...)" and the WiFi OTA completes -- device reboots into the new image and reconnects. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…P32 OTA
For pre-2.7.17 releases (which ship no .mt.json manifest), firmware-<target>-
<version>.bin is a *merged* image (bootloader + partition table + app), while
firmware-<target>-<version>-update.bin is the bare app image. The no-manifest
fallback tried the plain .bin first, so it flashed the merged image to the app0
partition: the bytes transferred intact (device SHA256 matched) but esp_ota_end
rejected the misaligned image ("OTA End failed").
Try -update.bin before the plain .bin in the fallback. Safe for 2.7.17+, which
resolve via the manifest and ship no -update.bin, so they fall through to the
plain .bin (= app0). Adds a regression test for the both-present case.
Found via hardware testing: flashing 2.7.15 to a Heltec V3 failed at esp_ota_end
with the merged .bin.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tloader notice Legacy DFU streamed fixed 20-byte packets unless the bootloader advertised the OTAFIX `_DFU` name. Gate the packet size on the negotiated ATT MTU instead (word-aligned to the bootloader's multiple-of-4 rule, capped at 244): a bootloader that can't take large writes won't grant a large MTU, so it self-gates back to 20. This unlocks ~12x faster DFU on high-MTU bootloaders (e.g. OTAFIX 2.1) and fixes a latent word-alignment bug in the old high-MTU path. Removes the dead OTAFIX_NAME_SUFFIX and an unverified "bricks the device" comment the Adafruit bootloader source contradicts. Stock RAK4631 (AdaDFU) negotiates only MTU 23, so it stays at 20-byte packets; when that slow path is taken the upload screen now shows a tip that flashing the OTAFIX bootloader enables much faster BLE updates (ProgressState.hint, surfaced via DfuUploadTransport.isLowSpeedTransfer). Verified on hardware (Pixel 6a + RAK4631): self-gates to 20-byte packets and the DFU completes; tests cover the high-MTU path and the 4-byte-alignment floor. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The three BLE transports each re-declared SCAN_RETRY_COUNT=3 / SCAN_RETRY_DELAY=2s and passed them explicitly to scanForBleDevice, whose defaults are already those values — rely on the defaults. Also remove the unused BUTTONLESS_WITH_BONDS (8EC90004) UUID; only BUTTONLESS_NO_BONDS is ever written. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…hint Compose Multiplatform string resources don't use Android's apostrophe escaping; the `\'` would render a literal backslash in the UI. Match the file's existing unescaped convention (e.g. firmware_update_save_dfu_file's "device's"). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
workpad.md holds the /craft pipeline's per-task working notes; it leaked into version control via an earlier run. Remove it and gitignore root-level workpad files so future runs stay untracked. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
7117020 to
4a648d8
Compare
This was referenced Jun 23, 2026
jeremiah-k
pushed a commit
to jeremiah-k/Meshtastic-Android
that referenced
this pull request
Jun 23, 2026
…audit (meshtastic#5916) Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
:feature:firmwarewas traced end-to-end against the firmware source, theAdminMessageOTA/DFU protocol, themeshtastic/esp32-unified-otaloader, and the Nordic DFU spec to confirm every update path actually works. That audit surfaced several latent, OTA-breaking bugs — all fixed here and confirmed on real hardware (Pixel 6a driving a Heltec V3 over Wi-Fi & BLE, and a RAK4631 / WisMesh Pocket over Nordic DFU).🐛 Bug Fixes
[509, 3]while the device sends one ACK per drained chunk → the per-fragment ACK wait timed out (10 s) and BLE OTA failed in the common case. Each chunk is now clamped to the negotiated write payload (1 chunk = 1 write = 1 ACK), and the streaming loop is response-type-driven (ACK→continue, OK→done, ERR→fail) so a late device error can never be mis-reported as success.FirmwareManifest.hwModelwas typedString, but published.mt.jsonmanifests emit it as an integer, so every manifest threw and the retriever silently fell back to filename heuristics. The model now parses the real manifest shape.firmware-<target>.binis a merged bootloader+app image; flashing it toapp0left it misaligned and the device'sesp_ota_endrejected it after a byte-perfect transfer. The fallback now prefers the bare-app-update.bin.@Volatileon the connection-loss guard, explicit failure on a prematureOK, and a guard against a non-positive negotiated write length.🌟 New
AdaDFUcapped at MTU 23 → 20-byte packets), the upload screen now shows a hint that flashing the OTAFIX bootloader enables much faster BLE updates.🛠️ Refactoring & Performance
scanForBleDevicedefaults) and an unused DFU UUID.🧹 Chores
Testing Performed
Unit tests (
feature/firmwarecommonTest):BleOtaTransportTest— chunk-clamp (1 write/1 ACK per chunk), final-chunkERR→ failure (not success), premature-OK→ failure, post-loop terminalOK/ERR.FirmwareManifestTest— real-shaped manifest (integerhwModel) parses + resolvesapp0.CommonFirmwareRetrieverTest— prefers-update.binover the plain.binwhen both exist.LegacyDfuTransportTest— high-MTU packet path + 4-byte-alignment floor.Full baseline green:
spotlessApply spotlessCheck detekt assembleDebug test allTests(651 tests).Hardware validation (Pixel 6a):
-update.bin→ successFollow-ups filed separately: a BLE-transport dedup refactor (touches these just-validated connect/stream paths) and an OTAFIX bootloader "how-to" link for the slow-DFU case.