Pr1 nodedb warmstore #10705
Pull request overview
This PR reworks NodeDB into a tiered storage model by adding a warm (“long-tail”) node tier that preserves minimal identity (notably PKI public keys) for nodes evicted from the hot NodeInfoLite store, and tightens retention/eviction rules for protected nodes (favorite/ignored/verified). It also adds a raw-flash persistence backend for the warm tier on nRF52840 and caps satellite-map growth to bound memory and nodes.db size.
Changes:
- Add WarmNodeStore (RAM + persistence) and integrate it into NodeDB load/evict/cleanup flows, plus Router PKI key resolution via NodeDB::copyPublicKey().
- Enforce satellite-map caps (MAX_SATELLITE_NODES) and a protected-node cap (MAX_NUM_NODES - 2), with user-facing warnings in some admin/UI paths.
- Add unit tests for warm-tier policy and blocked/protected-node eviction/migration behavior; add nRF52840 linker/guard tooling to reserve flash for the warm ring.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| variants/nrf52840/nrf52840.ini | Force capped linker script for S140 v6 boards to keep the warm raw-flash region clear. |
| variants/nrf52840/nrf52.ini | Add post-link guard script to prevent images from overlapping the warm-store flash reservation. |
| extra_scripts/nrf52_warm_region.py | Post-link nm-based guard ensuring the firmware image ends below the reserved warm-store region. |
| src/platform/nrf52/nrf52840_s140_v6.ld | New linker script variant capping FLASH length to protect the warm-store ring area (S140 v6 layout). |
| src/platform/nrf52/nrf52840_s140_v7.ld | Cap FLASH length for S140 v7 layout to protect the warm-store ring area. |
| src/mesh/WarmNodeStore.h | Define warm-tier entry format, policies, and nRF52840 ring layout constants. |
| src/mesh/WarmNodeStore.cpp | Implement warm-tier admission/eviction, persistence (raw-flash ring vs /prefs/warm.dat), and replay. |
| src/mesh/NodeDB.h | Add warm-tier member + copyPublicKey(), setProtectedFlag(), and related helpers. |
| src/mesh/NodeDB.cpp | Integrate warm tier into eviction/migration/cleanup/reset flows; add satellite caps; add protected-cap enforcement. |
| src/mesh/Router.cpp | Resolve PKI keys via NodeDB::copyPublicKey() so long-tail nodes can still encrypt/decrypt DMs. |
| src/modules/AdminModule.cpp | Route favorite/ignore through setProtectedFlag(); allow blocking unknown nodes via getOrCreateMeshNode(). |
| src/graphics/draw/MenuHandler.cpp | Use setProtectedFlag() for ignore toggling (cap-aware). |
| src/mesh/mesh-pb-constants.h | Set new defaults for MAX_NUM_NODES, add MAX_SATELLITE_NODES, and define WARM_NODE_COUNT per platform. |
| src/mesh/generated/meshtastic/deviceonly.pb.h | Add persisted snr_q4 field to NodeInfoLite generated struct. |
| test/test_warm_store/test_main.cpp | New unit tests for warm-store admission/eviction/take/persistence behavior. |
| test/test_nodedb_blocked/test_main.cpp | New unit tests for hot-store migration and favorite/ignored retention + protected cap behavior. |
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/mesh/NodeDB.cpp:3252
- NodeDB::getOrCreateMeshNode() will happily create an entry for NodeNum 0. Across the codebase, node num 0 is used as a sentinel for “local” (e.g. MeshPacket.from == 0), so allowing 0 into the hot store can create a bogus protected/ignored node, consume a slot, and potentially trigger avoidable evictions (now that AdminModule can create nodes by ID). It should defensively reject n==0 up front.
    meshtastic_NodeInfoLite *NodeDB::getOrCreateMeshNode(NodeNum n)
    {
        meshtastic_NodeInfoLite *lite = getMeshNode(n);
        if (!lite) {
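The guard suggested in the suppressed comment can be sketched in isolation. This is a minimal illustration, not the NodeDB code: `TinyNodeStore`, `NodeEntry`, `get` and `getOrCreate` are hypothetical names, and the only behaviour taken from the comment is that node number 0 is rejected before a slot is allocated.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using NodeNum = uint32_t;

// Hypothetical stand-in for meshtastic_NodeInfoLite; only the field needed here.
struct NodeEntry {
    NodeNum num;
};

// Hypothetical miniature hot store, not the real NodeDB.
class TinyNodeStore
{
  public:
    // Returns the entry for n, or nullptr if it is not stored.
    NodeEntry *get(NodeNum n)
    {
        for (auto &e : entries)
            if (e.num == n)
                return &e;
        return nullptr;
    }

    // Returns the existing entry, creates one, or refuses node number 0.
    // Pointers are only valid until the next insertion (the vector may reallocate).
    NodeEntry *getOrCreate(NodeNum n)
    {
        if (n == 0)
            return nullptr; // 0 is the "local" sentinel; never store it
        if (NodeEntry *found = get(n))
            return found;
        entries.push_back(NodeEntry{n});
        return &entries.back();
    }

    std::size_t size() const { return entries.size(); }

  private:
    std::vector<NodeEntry> entries;
};
```

Returning `nullptr` for 0 means callers have to handle a failed create, which they would need anyway if the store can refuse for other reasons.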
…ty retention)
Introduces a tiered NodeDB so the device retains identity (public key,
last_heard) for far more nodes than fit in the full-record hot store,
without growing heap or the persisted nodes.proto unboundedly.
- Hot store: full NodeInfoLite, MAX_NUM_NODES (120 on nRF52).
- Satellite maps: position/telemetry/environment/status capped at
MAX_SATELLITE_NODES (40 freshest); eviction via enforceSatelliteCaps /
evictSatelliteOverCap.
- Warm tier (WarmNodeStore): 40 B {num,last_heard,public_key} records for
evicted nodes so DMs to/from long-tail nodes keep encrypting/decrypting.
Persisted to /prefs/warm.dat, or on nRF52840 a dedicated 12 KB raw-flash
record-ring below LittleFS (3x4 KB pages; see linker scripts + the
nrf52_warm_region.py post-link guard).
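The size figures in the list above can be checked with a small layout sketch. The field order, the type names and the 32-byte key length are assumptions chosen so the record comes to 40 B; the real WarmNodeStore entry format may differ, and any per-page header the ring uses is not modelled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical warm-tier record: {num, last_heard, public_key}.
struct WarmRecord {
    uint32_t num;           // node number
    uint32_t last_heard;    // last-heard timestamp
    uint8_t public_key[32]; // assumed 32-byte public key
};
static_assert(sizeof(WarmRecord) == 40, "4 + 4 + 32 bytes, no padding expected");

// Ring geometry as described: 3 pages of 4 KB each.
constexpr std::size_t kPageSize = 4096;
constexpr std::size_t kPageCount = 3;
constexpr std::size_t kRingBytes = kPageSize * kPageCount;
static_assert(kRingBytes == 12 * 1024, "3 x 4 KB pages make a 12 KB ring");

// Upper bound only: ignores any page header the real format may reserve.
constexpr std::size_t maxRecordsPerPage()
{
    return kPageSize / sizeof(WarmRecord);
}
```

Under these assumptions a page holds at most 102 records, so the usable capacity of the ring depends on how many pages are kept live while another is being erased.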
NodeDB::getOrCreateMeshNode now demotes evicted nodes into the warm tier and
re-admits them (restoring key/last_heard). Router PKI decrypt/encode resolve
the peer key via NodeDB::copyPublicKey (hot store, then warm tier).
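The "hot store, then warm tier" lookup order can be sketched as follows. `TwoTierKeys` and its maps are hypothetical, and the signature shown for `copyPublicKey` is an assumption for illustration; only the lookup order comes from the commit message.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <unordered_map>

using NodeNum = uint32_t;
using Key = std::array<uint8_t, 32>;

// Hypothetical two-tier key lookup, not the real NodeDB.
struct TwoTierKeys {
    std::unordered_map<NodeNum, Key> hot;  // nodes with a full record
    std::unordered_map<NodeNum, Key> warm; // evicted nodes, identity only

    // Copies the key for n into out and returns true, or returns false if
    // neither tier knows the node. The hot store is consulted first.
    bool copyPublicKey(NodeNum n, uint8_t out[32]) const
    {
        auto it = hot.find(n);
        if (it == hot.end()) {
            it = warm.find(n);
            if (it == warm.end())
                return false;
        }
        std::memcpy(out, it->second.data(), 32);
        return true;
    }
};
```

A caller such as the router only needs the boolean result and the copied bytes, so it does not have to know which tier answered.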
NodeInfoLite gains snr_q4 (sint32, Q4-encoded dB); the float snr is zeroed on
disk. NodeInfoLite grows 105 -> 112 B; backup 2432 -> 2468 B.
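A possible reading of "Q4-encoded dB" is a signed fixed-point value with four fractional bits, i.e. steps of 1/16 dB. That interpretation is an assumption, and the helper names below are hypothetical; the sketch only shows how such an encoding round-trips.

```cpp
#include <cmath>
#include <cstdint>

// Assumption: "Q4" means 4 fractional bits, so one unit of snr_q4 is 1/16 dB.
constexpr int32_t kQ4Scale = 16;

// Encode a float SNR in dB as a rounded Q4 integer.
int32_t snrToQ4(float snrDb)
{
    return static_cast<int32_t>(std::lround(snrDb * kQ4Scale));
}

// Decode a Q4 integer back to dB.
float q4ToSnr(int32_t q4)
{
    return static_cast<float>(q4) / kQ4Scale;
}
```

With this scale, values that are multiples of 1/16 dB survive the round trip exactly; anything finer is rounded to the nearest step.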
Note: the snr_q4 .proto change still needs to land in the protobufs submodule
(generated header is updated here; submodule pointer left at upstream).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hardens how ignored/favourite nodes are received over admin and retained,
closing paths where a block could be lost or accidentally cleared.
- Blocking keeps the node's public key (admin set_ignored_node and
addFromContact no longer zero it / drop the warm-tier key), so a blocked
peer stays a verifiable identity.
- set_ignored_node creates the node if absent, so a block by node ID sticks
even for a node we've never heard from (e.g. pushed by a remote admin) with
no NodeInfo or key.
- Eviction protection (favourite/ignored/manually-verified) now also applies to
the load-time hot-store migration and is never undone by cleanupMeshDB, which
previously purged ignored nodes that lacked user info.
- The hot-store migration leaves our own node (index 0) in place and prefers to
demote non-protected nodes, like the runtime eviction scan.
Caps the protected set (favourite + ignored + verified) at MAX_NUM_NODES-2 via
NodeDB::setProtectedFlag(), so at least two evictable slots always remain and
getOrCreateMeshNode can always make room, replacing the previous unconditional
append that could run off the end of the node vector when every node was
protected. A locally-set favourite/ignore that hits the cap reports back to the
phone via a ClientNotification.
Adds test_nodedb_blocked covering the migration, favourite/ignored eviction
protection, ignored-survives-cleanup, and the protected-node cap. The
maintenance methods stay private in production; the test reaches them through a
PIO_UNIT_TESTING-guarded friend shim.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
# src/mesh/NodeDB.h
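The protected-set cap described in this commit can be sketched as below. It is a simplified model, not NodeDB::setProtectedFlag() itself: the `Node` struct, the `Flag` enum and the free-function signature are hypothetical, and the only rule taken from the message is that the number of protected nodes never exceeds the store size minus two.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node record with the three "protected" flags.
struct Node {
    uint32_t num = 0;
    bool favorite = false;
    bool ignored = false;
    bool verified = false;
    bool isProtected() const { return favorite || ignored || verified; }
};

enum class Flag { Favorite, Ignored, Verified };

// Sets or clears a flag on nodes[index]. Refuses (returns false) only when the
// change would protect a new node and the protected count is already at
// maxNodes - 2. Clearing, or adding a flag to an already protected node,
// never raises the count and is always allowed.
bool setProtectedFlag(std::vector<Node> &nodes, std::size_t index, Flag flag, bool on, std::size_t maxNodes)
{
    Node &node = nodes.at(index);
    if (on && !node.isProtected()) {
        std::size_t protectedCount = 0;
        for (const Node &n : nodes)
            if (n.isProtected())
                ++protectedCount;
        if (protectedCount + 2 >= maxNodes)
            return false; // cap reached; the caller can warn the user
    }
    switch (flag) {
    case Flag::Favorite:
        node.favorite = on;
        break;
    case Flag::Ignored:
        node.ignored = on;
        break;
    case Flag::Verified:
        node.verified = on;
        break;
    }
    return true;
}
```

Because the cap leaves two slots unprotected, an eviction scan that skips protected nodes can always find a victim, which is the property the commit relies on.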
Zero-initialise `stranded[]` and `seqs[]/order[]` VLAs so cppcheck can
verify there are no unguarded reads of uninitialised memory (the guards
exist but are not visible to static analysis). Mark two local pointers
`const` where the pointed-to entry is never mutated after assignment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Style/cleanup pass over the branch (no behavior change except the noted
preprocessor simplifications, which are semantically identical):
- Comments: move function descriptions to the headers, cap in-function
comments at ~3-4 lines, drop leading-number step markers, label stacked
#endif blocks, de-decorate banner comments.
- dumpToLog: fully gate decl + definition + AdminModule call site behind
MESHTASTIC_NODEDB_MIGRATION_VERBOSE so it compiles out when disabled
(~1.2 KB when off).
- mesh-pb-constants: drop the dead nRF52832 WARM_NODE_COUNT branch and trim
the macro docs.
- WarmNodeStore: simplify the redundant `ARCH_NRF52 && NRF52840_XXAA` guards
to `NRF52840_XXAA`, add a kNoPage sentinel for the ring page state.
- Shorten the always-on LOG_WARN strings (~120 B flash).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: NomDeTom <116762865+NomDeTom@users.noreply.github.com>
…dedb)
- WarmNodeStore.h: default MIGRATION_VERBOSE to 0 (suppress info-level
chatter on production builds; opt in with =1)
- WarmNodeStore.cpp load(): move memset to top of function so all
failure paths (header-read fail, invalid header) leave entries clear
- WarmNodeStore.cpp save(): replace manual spiLock lock/unlock around
mkdir with LockGuard covering the full SafeFile sequence, matching
the lock discipline in load()
- Router.cpp: memcpy(&p->public_key.bytes, ...) -> memcpy(p->public_key.bytes,
...): pass decayed uint8_t* rather than pointer-to-array
- AdminModule.cpp: check setProtectedFlag return for PKC auto-favorite;
log cap-refusal warning instead of unconditional "auto-favoriting"
- nrf52_warm_region.py: error message references both v6.ld and v7.ld
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
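The Router.cpp item is a type-level cleanup rather than a behaviour change, which a short sketch can make concrete. `PublicKeyField` is a hypothetical stand-in for a fixed-size bytes field, and `setKey` / `sameAddress` are illustrative helpers, not functions from the firmware.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a fixed-size bytes field with a length prefix.
struct PublicKeyField {
    uint16_t size;
    uint8_t bytes[32];
};

// &field.bytes is a pointer-to-array; field.bytes decays to a pointer-to-element.
static_assert(std::is_same<decltype(&std::declval<PublicKeyField &>().bytes), uint8_t (*)[32]>::value,
              "address-of yields uint8_t (*)[32]");
static_assert(std::is_same<std::decay<decltype(PublicKeyField::bytes)>::type, uint8_t *>::value,
              "the array itself decays to uint8_t *");

// Copies a 32-byte key using the decayed pointer form.
void setKey(PublicKeyField &field, const uint8_t (&key)[32])
{
    std::memcpy(field.bytes, key, sizeof(field.bytes));
    field.size = sizeof(field.bytes);
}

// Both spellings point at the same first byte; only the static type differs.
bool sameAddress(const PublicKeyField &field)
{
    return static_cast<const void *>(&field.bytes) == static_cast<const void *>(field.bytes);
}
```

Since memcpy takes `void *`, both forms copy the same bytes; the decayed form simply states the intent more directly and avoids surprising readers or linters.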
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…NT=0)
The warmstore (meshtastic#10705) reboots RP2350/W5500 boards via the 8s hardware watchdog when a full NodeDB (120) save is followed by the extra warm.dat write. RP2350 has no dedicated branch in the per-platform WARM_NODE_COUNT selector (src/mesh/mesh-pb-constants.h) so it inherits the generic #else (320).
Disable the warm tier on both W5500 variants via WARM_NODE_COUNT=0 (compiles clean thanks to the new #if WARM_NODE_COUNT > 0 guards).
Validated on-hardware (wiznet_5500_evb_pico2_e22p): DB filled to 120, the exact eviction-at-full sequence fired, board survived with no watchdog reboot. See firmware#10746.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
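The `#if WARM_NODE_COUNT > 0` guard pattern mentioned above might look roughly like this. The storage array and the `warmAdmit` / `warmCapacity` helpers are hypothetical, and the fallback definition of the macro exists only so the sketch compiles on its own; in the firmware the value comes from the per-platform selector or a build flag.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: the build normally supplies WARM_NODE_COUNT
// (for example -DWARM_NODE_COUNT=0); default to 0 so the file stands alone.
#ifndef WARM_NODE_COUNT
#define WARM_NODE_COUNT 0
#endif

#if WARM_NODE_COUNT > 0
// Hypothetical storage; only compiled when the tier is enabled, so a
// disabled tier costs no RAM and never declares a zero-length array.
static uint32_t warmNodeNums[WARM_NODE_COUNT];
static std::size_t warmUsed = 0;
#endif

constexpr std::size_t warmCapacity()
{
    return WARM_NODE_COUNT;
}

// Tries to remember a node number; always fails when the tier is compiled out.
bool warmAdmit(uint32_t num)
{
#if WARM_NODE_COUNT > 0
    if (warmUsed == WARM_NODE_COUNT)
        return false; // full; a real store would evict instead
    warmNodeNums[warmUsed++] = num;
    return true;
#else
    (void)num;
    return false;
#endif
}
```

Keeping the public functions defined in both branches lets call sites stay free of `#if` blocks while the disabled build still drops the storage.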
Getting hard reboots from this PR on ESP32-S3 nodes with no PSRAM, like the Wireless Paper. The crash happens when connected to Bluetooth or attempting to connect. It's choking the heap. @NomDeTom
clanker suggests
…10759)
* fix: right-size warm tier for constrained platforms and feed RP2040 watchdog during NodeDB save
* fix: size warm tier and traffic cache per-MCU RAM, lowering no-PSRAM ESP32 (classic/S2/C3) tiers
* docs: document nRF52/RP2040 #else fall-through in warm tier and TM cache cascades
* NodeDB: 3-tier node store with persistent warm tier (long-tail identity retention)
* NodeDB: robust receive + retention for blocked (ignored) nodes
* fix copilot comments
* once again
* WarmNodeStore: fix cppcheck warnings (uninitvar, constVariablePointer)
* self-care added to assist 2.7 and 2.8 nodedb migration
* Tidy warm-store/self-care: comments, guards, log + flash cleanup
* more tidying up, aligning with docs and undoing other-arch regressions
* Update protobufs (meshtastic#19)
Co-authored-by: NomDeTom <116762865+NomDeTom@users.noreply.github.com>
* made the migration pathway clearer
* address copilot review
* fixed a copilot review on a downstream PR.
* Address Copilot review comments for PR meshtastic#10705 (warmstore/nodedb)
* NodeDB: formatting cleanup (blank lines after preprocessor blocks)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Lukewarm store
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ben Meadors <benmmeadors@gmail.com>
NodeDB: 3-tier node store with warm tier + blocked-node retention
Reworks the NodeDB into a tiered store so the device retains identity for far more nodes than fit in the full-record store, and hardens how blocked (ignored) nodes are received and kept.
What it does
Load-time migration
Blocked-node handling
Tests: test_warm_store, test_nodedb_blocked (migration, favourite/ignored eviction protection, protected-cap).
Note: the snr_q4 field needs the matching deviceonly.proto change in the protobufs submodule before merge (generated header is updated here).
An MCP server was used to confirm migration behaviour: 150 nodes become 120 cleanly after startup.
🤝 Attestations
@alecperkins has tested on a Heltec V4. Many thanks.