fix: right-size warm tier for constrained platforms (#10746, #10705) #10759
…atchdog during NodeDB save
…ESP32 (classic/S2/C3) tiers
Pull request overview
Adjusts NodeDB warm-tier sizing and related caches on constrained platforms to avoid watchdog resets (RP2040/RP2350) and reduce heap pressure (ESP32 variants), improving stability when persisting large node databases.
Changes:
- Feed the RP2040 hardware watchdog between the `nodes.proto` save and the warm-tier `warm.dat` save in `NodeDB::saveNodeDatabaseToDisk()`.
- Refine `WARM_NODE_COUNT` per-platform (including adding explicit RP2040 and no-PSRAM ESP32-S3/C6/P4 sizing).
- Reduce `TRAFFIC_MANAGEMENT_CACHE_SIZE` on no-PSRAM ESP32-S3/C6/P4 and other ESP32 builds to reclaim heap.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/mesh/NodeDB.cpp | Adds RP2040 watchdog “feed” between back-to-back NodeDB and warm-tier persistence writes. |
| src/mesh/mesh-pb-constants.h | Refines per-platform warm-tier sizing and traffic-management cache sizing for constrained targets. |
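
A per-platform sizing cascade of the kind described for `src/mesh/mesh-pb-constants.h` might look roughly like the sketch below. Only `ARCH_RP2040` and the values 150, 320, 500 and 1000 appear in this PR; the ESP32 guard (`CONFIG_IDF_TARGET_ESP32S3` combined with `BOARD_HAS_PSRAM`), the choice of which targets receive 150/500, and the use of 1000 as the fallback cache size are assumptions made for illustration, not the actual header.

```cpp
// Hypothetical sketch of a per-platform sizing cascade (not the real header).
#if defined(ARCH_RP2040)
// RP2040/RP2350: explicit branch instead of the generic catch-all.
#define WARM_NODE_COUNT 150
#elif defined(CONFIG_IDF_TARGET_ESP32S3) && !defined(BOARD_HAS_PSRAM)
// Assumed guard for a no-PSRAM ESP32-S3 build: smaller tier and cache.
#define WARM_NODE_COUNT 150
#define TRAFFIC_MANAGEMENT_CACHE_SIZE 500
#else
// Generic catch-all, the value constrained targets previously inherited.
#define WARM_NODE_COUNT 320
#endif

// Assumed fallback: keep the pre-change cache size where no branch set one.
#ifndef TRAFFIC_MANAGEMENT_CACHE_SIZE
#define TRAFFIC_MANAGEMENT_CACHE_SIZE 1000
#endif

// The PR states the warm tier stays enabled everywhere, so no branch may yield 0.
static_assert(WARM_NODE_COUNT > 0, "warm tier must stay enabled");
```

Compiled on a host with none of these macros defined, the sketch falls through to the generic branch, which is the behaviour the PR removes for RP2040.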
Overall it seems eminently sensible. I've lost track of all the different ESP32 flavours that are supported.

The real fix is to feed the watchdog in between saves. BUT when researching this I stumbled across other crashes that really hit heap exhaustion. I think those values are sensible unless told otherwise :-)
…meshtastic#10705) (meshtastic#10759)
* fix: right-size warm tier for constrained platforms and feed RP2040 watchdog during NodeDB save
* fix: size warm tier and traffic cache per-MCU RAM, lowering no-PSRAM ESP32 (classic/S2/C3) tiers
* docs: document nRF52/RP2040 #else fall-through in warm tier and TM cache cascades
…ODE_COUNT=0)

Upstream meshtastic#10809 fixes the real root cause of the full-DB-save watchdog reboot: a nested spiLock deadlock in `WarmNodeStore::save()` (SafeFile's ctor/close re-acquire the non-recursive spiLock already held by the LockGuard).

With that merged into develop, the warm tier can run again on RP2350 via the `ARCH_RP2040` branch (=150, meshtastic#10759). Drop the `WARM_NODE_COUNT=0` override from both W5500 variants so they inherit the upstream fix instead of disabling the warm tier.

Validated on-hardware (wiznet_5500_evb_pico2_e22p @ .236): ~18h soak with the warm tier active, 208 eviction-at-full (120 nodes) events and 44 warm.dat writes (150 warm nodes), zero watchdog reboots. Build SUCCESS (2.8.0.b5c2a81).

See firmware#10809, firmware#10746.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
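
The nested-lock failure named in the commit message above can be illustrated with a toy model. Everything here (`ToyLock`, `nestedSave`, `sequentialSave`) is hypothetical and unrelated to the real spiLock, LockGuard or SafeFile classes; a try-style acquire is used so the example terminates instead of hanging. The second function shows one possible way to avoid the nesting and is not necessarily what meshtastic#10809 does.

```cpp
#include <cassert>

// Toy non-recursive lock: a second acquire by the current holder cannot succeed.
class ToyLock {
  public:
    bool tryAcquire()
    {
        if (held)
            return false; // a blocking lock would wait here forever
        held = true;
        return true;
    }
    void release() { held = false; }

  private:
    bool held = false;
};

// Deadlock-prone shape: the caller holds the lock while a helper asks for it again.
bool nestedSave(ToyLock &lock)
{
    if (!lock.tryAcquire())
        return false;               // outer guard could not take the lock
    bool inner = lock.tryAcquire(); // helper re-acquires the same lock: fails
    if (inner)
        lock.release();
    lock.release();                 // outer guard releases
    return inner;
}

// One way to avoid it: release the outer hold before the helper takes the lock.
bool sequentialSave(ToyLock &lock)
{
    if (!lock.tryAcquire())
        return false;
    lock.release();                 // outer work finished
    bool inner = lock.tryAcquire(); // helper can now acquire
    if (inner)
        lock.release();
    return inner;
}
```

With a blocking lock, the inner acquire in `nestedSave` never returns, which on a watchdog-supervised MCU would look like a reboot during the save.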
Constrained platforms fall into the generic `WARM_NODE_COUNT 320` catch-all, which is too large for them.

`nodes.proto` and `warm.dat` are written back-to-back in one blocking stretch, so the loop never feeds the 8 s HW watchdog and it resets on the second write. Fix: `watchdog_update()` between the two writes, plus an explicit `ARCH_RP2040` branch (150).

`WARM_NODE_COUNT` 320→150 and `TRAFFIC_MANAGEMENT_CACHE_SIZE` 1000→500 (~12 KB heap recovered).

Warm tier stays enabled on all platforms; generic ESP32 unchanged.
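
The watchdog part of the fix can be sketched in isolation. This is a minimal host-side model, not the code in `NodeDB::saveNodeDatabaseToDisk()`: `SaveSteps` and `saveNodeDatabase` are hypothetical names, and the three callbacks stand in for the `nodes.proto` write, the `warm.dat` write, and the Pico SDK's `watchdog_update()` call.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the two blocking writes and the watchdog feed.
struct SaveSteps {
    std::function<bool()> saveNodesProto; // writes nodes.proto
    std::function<bool()> saveWarmDat;    // writes warm.dat
    std::function<void()> feedWatchdog;   // on RP2040 this would call watchdog_update()
};

// Runs both writes and feeds the watchdog between them, so each write
// starts with a freshly reset watchdog countdown.
bool saveNodeDatabase(const SaveSteps &steps)
{
    assert(steps.saveNodesProto && steps.saveWarmDat && steps.feedWatchdog);
    bool ok = steps.saveNodesProto();
    steps.feedWatchdog();           // reset the countdown before the second write
    ok = steps.saveWarmDat() && ok; // still attempt warm.dat if the first write failed
    return ok;
}
```

Passing the steps as callbacks keeps the ordering testable off-target; on hardware the feed would only be compiled in for the RP2040 architecture.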