mqtt: harden bridge lifecycle and packet queueing#3
Closed
robekl wants to merge 15 commits into
Closed
Conversation
Owner
|
Hi, sorry I didn't see this earlier. Looks like some good changes that were needed. I'm going to give it some testing over the next few days before merging. |
agessaman
pushed a commit
that referenced
this pull request
Mar 20, 2026
Default LNA enabled=true and fix the sleep order
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR aggregates reliability and low-memory hardening changes for MQTT bridge behavior on embedded targets, compared to
mqtt-bridge-implementation(dcbbe5a6).Focus areas:
Conceptual Changes
1) Queue packet snapshots instead of
Packet*referencesProblem
Queued MQTT work previously stored raw packet pointers that could outlive their owner lifecycle, creating stale pointer/use-after-free risk in async publish paths.
Change
Queue entries now store serialized packet bytes (
writeTo) and reconstruct temporary packets (readFrom) when processing.Why this fixes it
Queue ownership is now explicit and self-contained. Publish logic no longer depends on external packet lifetime guarantees, eliminating pointer lifetime races.
2) Make bridge begin/end lifecycle restart-safe on ESP
Problem
Restarting bridge/task paths could leak or retain stale state (event handlers, analyzer clients, buffers), and begin/end symmetry was incomplete.
Change
begin()with null checksend()end()Why this fixes it
Resource ownership and teardown are now consistent across start/stop cycles, reducing leaks and duplicate callback/resource state after bridge restarts.
3) Switch JWT JSON docs to static allocation, then harden overflow handling
Problem
JWT creation used dynamic JSON allocation, causing heap churn/fragmentation risk on constrained devices. Static capacity change introduced potential silent overflow risk if not checked.
Change
doc.overflowed()checks for header/payload builders384to512for safer optional-claim headroomWhy this fixes it
Removes heap churn while failing safely on capacity overflow. Increasing to
512reduces false failures and optional-claim truncation risk while keeping deterministic memory use.4) Remove large stack JSON fallback in packet/raw publish path
Problem
Fallback to large stack buffers under allocation failure increased stack pressure and risked instability on weak/embedded targets.
Change
Publish now requires PSRAM/heap buffer allocation and skips publish when allocation fails.
Why this fixes it
Keeps stack usage bounded and predictable in hot paths. Under pressure, behavior degrades by dropping publish rather than risking stack-related instability.
5) CLI timezone and memory command correctness fixes
Problem
get timezonematched beforeget timezone.offsetdue to prefix overlaptimezone.offsetparsing used permissive conversion (atoi-style behavior), allowing malformed input to be accepted as0memorycommand referenced ESP-only APIsChange
timezone.offsetbeforetimezonestrtol+ full-string validation and range check (-12..+14)Why this fixes it
Prevents wrong key matching and malformed config acceptance. Improves command correctness across both ESP and non-ESP builds.
6) Preserve user MQTT prefs on boot; restore safe fresh-install origin default
Problem
Boot-time logic was overwriting persisted
mqtt_origin/analyzer settings, but removing it entirely could leave fresh installs with empty origin.Change
mqtt_originis empty, set it tonode_nameand persist/mqtt_prefsWhy this fixes it
Keeps user-configured values intact while still giving first-boot devices a deterministic non-empty origin.
7) Harden broker and build-flag string handling
Problem
Potential null/termination edge cases when copying broker/build-flag strings.
Change
setBroker()against null input pointers for host/user/passwordWhy this fixes it
Prevents accidental unterminated strings and null pointer misuse in config plumbing.
8) Reduce queued-packet metadata footprint
Problem
Snapshot queue correctness introduced additional per-entry memory/copy cost.
Change
Trim queue struct fields and narrow types:
uint8_tfor packet lengthuint16_tfor raw lengthWhy this fixes it
Preserves snapshot correctness while slightly reducing queue item size and copy overhead.
9) Align comments with actual publish behavior
Problem
Comments still claimed stack fallback behavior after logic changed to skip publish on alloc failure.
Change
Update comments to document real behavior.
Why this fixes it
Prevents maintenance confusion and incorrect assumptions during future debugging/refactors.
10) Check
/mqtt_prefswrite resultProblem
Config persistence writes ignored byte-count result, masking partial writes or storage faults.
Change
Verify
file.write(...)byte count equals expected struct size and log partial-write failures.Why this fixes it
Improves reliability/diagnostics for flash/filesystem failure cases and prevents silent persistence mismatch.
Notes