2.8: NodeDB shrink, decoupling, and restructuring #10413
Pull request overview
This PR refactors on-device node storage by shrinking meshtastic_NodeInfoLite (flattening identity fields + collapsing booleans into a bitfield) and moving larger per-node payloads (position/telemetry/environment/status) into “satellite” NodeDB maps, with a new PhoneAPI “gradient sync” mode that can replay those satellites as synthetic packets to the phone.
Changes:
- Refactor NodeInfoLite storage layout (flattened identity, bitfield flags, renamed public-key container) and update call sites accordingly.
- Add NodeDB satellite maps + v24→v25 migration logic and persistence for positions/telemetry/environment/status.
- Add PhoneAPI gradient sync + replay state machine; add/adjust unit tests for TypeConversions and NodeInfoLite flag behavior.
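
The bitfield collapse listed above can be pictured with a small sketch. The names `NODEINFO_BITFIELD_HAS_USER_MASK`, `nodeInfoLiteHasUser()` and `nodeInfoLiteIsMuted()` appear in this PR; the bit positions, the `SlimNode` struct, the muted mask name, and the `setFlag` helper are assumptions made for illustration, not the firmware's actual definitions.

```cpp
#include <cstdint>

// Hypothetical stand-in for the slimmed node header; the real struct is
// meshtastic_NodeInfoLite and its layout is not reproduced here.
struct SlimNode {
    uint32_t num = 0;
    uint32_t bitfield = 0; // one word replaces several separate bools
};

// The HAS_USER mask name follows the PR; the bit positions are assumed.
constexpr uint32_t NODEINFO_BITFIELD_HAS_USER_MASK = 1u << 0;
constexpr uint32_t NODEINFO_BITFIELD_IS_MUTED_MASK = 1u << 1; // assumed name

// Set or clear one flag without disturbing the others.
inline void setFlag(SlimNode &n, uint32_t mask, bool on)
{
    n.bitfield = on ? (n.bitfield | mask) : (n.bitfield & ~mask);
}

// Null-safe read helpers in the spirit of nodeInfoLiteHasUser().
inline bool nodeInfoLiteHasUser(const SlimNode *n)
{
    return n != nullptr && (n->bitfield & NODEINFO_BITFIELD_HAS_USER_MASK) != 0;
}

inline bool nodeInfoLiteIsMuted(const SlimNode *n)
{
    return n != nullptr && (n->bitfield & NODEINFO_BITFIELD_IS_MUTED_MASK) != 0;
}
```

Routing every read through a helper like this lets call sites stop testing individual struct members, which is presumably why so many files in the table below change only in how they check the favorite, muted, or has-user state.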
Reviewed changes
Copilot reviewed 51 out of 55 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| test/test_type_conversions/test_main.cpp | New unit tests for TypeConversions with slim NodeInfoLite + bitfield semantics. |
| test/test_traffic_management/test_main.cpp | Update tests to use NODEINFO_BITFIELD_HAS_USER_MASK + flattened name fields. |
| test/test_crypto/test_main.cpp | Update PKI key container type used in crypto tests. |
| src/serialization/MeshPacketSerializer.cpp | Update traceroute route-name emission to new NodeInfoLite fields/flags. |
| src/serialization/MeshPacketSerializer_nRF52.cpp | Same as above for nRF52 JSON serializer variant. |
| src/mqtt/MQTT.cpp | Switch “known node” checks to nodeInfoLiteHasUser() helpers. |
| src/modules/WaypointModule.cpp | Fetch own position via NodeDB satellite accessor. |
| src/modules/TrafficManagementModule.cpp | Update cached-user lookup to use new TypeConversions::ConvertToUser(node) path. |
| src/modules/TraceRouteModule.cpp | Use flattened short_name/long_name fields and HAS_USER bit. |
| src/modules/Telemetry/EnvironmentTelemetry.cpp | Cache received environment metrics into NodeDB satellite map. |
| src/modules/StatusMessageModule.h | Remove per-module recent-status FIFO; redirect to NodeDB status cache. |
| src/modules/StatusMessageModule.cpp | Store received status in NodeDB via setNodeStatus(). |
| src/modules/SerialModule.cpp | Use getNodePosition() + flattened names when emitting WPL/text sender names. |
| src/modules/RoutingModule.cpp | Simplify KNOWN_ONLY rebroadcast logic using nodeInfoLiteHasUser(). |
| src/modules/RangeTestModule.cpp | Update range-test CSV to use satellite position + flattened long name. |
| src/modules/PositionModule.cpp | Read cached self position via NodeDB satellite accessor; remove direct node->position usage. |
| src/modules/KeyVerificationModule.cpp | Use flattened long name and flattened public key storage. |
| src/modules/ExternalNotificationModule.cpp | Replace direct bit checks with nodeInfoLiteIsMuted() helper. |
| src/modules/CannedMessageModule.cpp | Update PKI key presence checks + name rendering to flattened fields/helpers. |
| src/modules/AdminModule.cpp | Convert favorite/ignored/muted operations to bitfield helpers + satellite erasure/updatePosition(). |
| src/mesh/TypeConversions.h | Add explicit-args ConvertToNodeInfo() overload + thin NodeInfo conversion; new user conversion helpers. |
| src/mesh/TypeConversions.cpp | Implement new NodeInfo/User conversion logic for slim headers + satellites. |
| src/mesh/Router.cpp | Update favorite/ignored/has_user checks and PKI pubkey access for slim headers. |
| src/mesh/ReliableRouter.cpp | Update PKI unknown-pubkey detection for slim header pubkey. |
| src/mesh/ProtobufModule.h | Update sender short-name lookup to flattened short_name. |
| src/mesh/PhoneAPI.h | Add gradient sync nonces + replay queue/state + helper APIs. |
| src/mesh/PhoneAPI.cpp | Implement gradient sync: thin NodeInfo emission + satellite replay state machine. |
| src/mesh/NodeDB.h | Add satellite maps + accessors + bitfield masks/helpers + version bump. |
| src/mesh/NodeDB.cpp | Implement satellite storage, migration, save/load projection, and updated node cleanup logic. |
| src/mesh/MeshService.cpp | Ensure a self position slot exists in position DB; update fixed-position read path. |
| src/mesh/mesh-pb-constants.h | Tighten NodeInfoLite size static_assert; add build flags to exclude satellite DBs on STM32WL. |
| src/mesh/http/ContentHandler.cpp | Update /nodes JSON endpoint to flattened fields + satellite position accessor. |
| src/mesh/generated/meshtastic/deviceonly.pb.h | Regenerated nanopb headers for new NodeInfoLite + NodeDatabase satellite arrays. |
| src/mesh/generated/meshtastic/deviceonly.pb.cpp | Bind nanopb descriptors for new satellite entry messages. |
| src/mesh/generated/meshtastic/deviceonly_legacy.pb.h | Add legacy nanopb descriptor used for v24→v25 migration decoding. |
| src/mesh/generated/meshtastic/deviceonly_legacy.pb.cpp | Bind nanopb descriptors for legacy migration messages. |
| src/mesh/CryptoEngine.h | Update PKI APIs to accept NodeInfoLite public_key container type. |
| src/mesh/CryptoEngine.cpp | Implement signature updates for new PKI key container type. |
| src/graphics/Screen.cpp | Update favorite-node frame selection + sender long-name access to new helpers/fields. |
| src/graphics/niche/InkHUD/Applets/User/Heard/HeardApplet.cpp | Use satellite position accessors when computing distances. |
| src/graphics/niche/InkHUD/Applets/User/FavoritesMap/FavoritesMapApplet.cpp | Replace favorite flag checks with helper + satellite position access. |
| src/graphics/niche/InkHUD/Applets/User/DM/DMApplet.cpp | Use HAS_USER helper + flattened long name. |
| src/graphics/niche/InkHUD/Applets/User/AllMessage/AllMessageApplet.cpp | Use HAS_USER helper + flattened long name. |
| src/graphics/niche/InkHUD/Applets/System/Notification/NotificationApplet.cpp | Use HAS_USER helper for sender display. |
| src/graphics/niche/InkHUD/Applets/System/Menu/MenuApplet.cpp | Use favorite/user helpers + flattened long name in recipient UI. |
| src/graphics/niche/InkHUD/Applets/Bases/NodeList/NodeListApplet.cpp | Use satellite position accessors + flattened long name. |
| src/graphics/niche/InkHUD/Applets/Bases/Map/MapApplet.cpp | Use satellite position accessors throughout map computations. |
| src/graphics/niche/InkHUD/Applet.cpp | Update short-name parsing to new HAS_USER helper + flattened short name. |
| src/graphics/draw/UIRenderer.cpp | Read status/metrics/position from NodeDB satellites; update favorite/user helpers. |
| src/graphics/draw/NotificationRenderer.cpp | Update node picker name selection to flattened fields + HAS_USER helper. |
| src/graphics/draw/NodeListRenderer.cpp | Use NodeDB status cache + satellite position accessors; update favorite/ignored/muted checks. |
| src/graphics/draw/MessageRenderer.cpp | Update sender/recipient naming logic to flattened fields + HAS_USER helper. |
| src/graphics/draw/MenuHandler.cpp | Update manage-node menus to use bitfield helpers + flattened long name. |
| src/GPSStatus.h | Read fixed-position coordinates from NodeDB satellite position accessor. |
@copilot apply changes based on the comments in this thread
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/6b1d6cf6-ed6b-43b6-95cb-8e141757664e Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Addressed in 796d455. I applied the requested follow-up fixes from the review thread, including the safe long-name copies, the RangeTest and sender-name null/fallback fixes, the KeyVerification null guards, the legacy NodeDatabase loadProto memset exemption, and NodeDB-backed snapshot/copy helpers for PhoneAPI replay instead of iterating the satellite maps directly.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@copilot apply changes based on the comments in this thread
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/c76b9a5a-7244-4fbc-9ef0-98091d8caaea Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Addressed in a0daaf6. I removed the remaining unsafe satellite-map pointer access pattern by switching callers over to the locked copy helpers, and I special-cased the local node in hasValidPosition() so local position-dependent behavior still works when PositionDB is compiled out.
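
A minimal sketch of the locked copy-helper pattern mentioned in that reply, assuming a plain `std::mutex` and `std::unordered_map` in place of the firmware's `concurrency::Lock` and satellite maps. `copyNodePosition` matches a name used in this PR; `SatelliteStore`, `setNodePosition` as written here, and the trimmed `PositionLite` struct are hypothetical.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical, trimmed-down position payload (the real type is
// meshtastic_PositionLite).
struct PositionLite {
    int32_t latitude_i = 0;
    int32_t longitude_i = 0;
};

class SatelliteStore
{
  public:
    // Insert or overwrite the cached position for a node.
    void setNodePosition(uint32_t nodeNum, const PositionLite &pos)
    {
        std::lock_guard<std::mutex> guard(satelliteMutex);
        positions[nodeNum] = pos;
    }

    // Copy the entry out while holding the lock. Callers never keep a pointer
    // into the map, so a concurrent erase or rehash cannot invalidate it.
    bool copyNodePosition(uint32_t nodeNum, PositionLite &out) const
    {
        std::lock_guard<std::mutex> guard(satelliteMutex);
        auto it = positions.find(nodeNum);
        if (it == positions.end())
            return false;
        out = it->second;
        return true;
    }

  private:
    mutable std::mutex satelliteMutex;
    std::unordered_map<uint32_t, PositionLite> positions;
};
```

The cost is one small struct copy per read, in exchange for callers that cannot dereference a stale map entry.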
      // Longitude is trickier
    - float lngNode = node->position.longitude_i * 1e-7;
    + float lngNode = pos->longitude_i * 1e-7;

    @@ -153,22 +154,100 @@ uint32_t get_st7789_id(uint8_t cs, uint8_t sck, uint8_t mosi, uint8_t dc, uint8_

    bool meshtastic_NodeDatabase_callback(pb_istream_t *istream, pb_ostream_t *ostream, const pb_field_iter_t *field)

    #include <pb_decode.h>
    #include <pb_encode.h>

    bool meshtastic_NodeDatabase_Legacy_callback(pb_istream_t *istream, pb_ostream_t *ostream, const pb_field_iter_t *field)

        LOG_DEBUG(src == RX_SRC_LOCAL ? "updateTelemetry LOCAL device" : "updateTelemetry REMOTE device node=0x%x", nodeId);
    #if !MESHTASTIC_EXCLUDE_TELEMETRYDB
        concurrency::LockGuard guard(&satelliteMutex);
        nodeTelemetry[nodeId] = t.variant.device_metrics;
    #endif
    } else if (t.which_variant == meshtastic_Telemetry_environment_metrics_tag) {
        LOG_DEBUG(src == RX_SRC_LOCAL ? "updateTelemetry LOCAL env" : "updateTelemetry REMOTE env node=0x%x", nodeId);

      int32_t getLatitude() const
      {
          if (config.position.fixed_position) {
    -         meshtastic_NodeInfoLite *node = nodeDB->getMeshNode(nodeDB->getNodeNum());
    -         return node->position.latitude_i;
    +         meshtastic_PositionLite pos;
    +         return nodeDB->copyNodePosition(nodeDB->getNodeNum(), pos) ? pos.latitude_i : 0;

@copilot apply changes based on the comments in this thread
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/5198da01-ec4c-4c16-8a09-68b8e6d5d410 Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Addressed in 3c569f6. I fixed the InkHUD stack-value typo, aligned both NodeDatabase callback definitions with the generated nanopb declarations, split the telemetry debug logs to avoid extra format args, and restored the fixed-position GPS getters to read from localPosition.
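
To illustrate the "split the telemetry debug logs" item: the reviewed code chose the format string with a ternary, so the LOCAL branch received an argument its format never consumes. A sketch of the split form follows, with `std::snprintf` standing in for the firmware's `LOG_DEBUG` macro; `telemetryLogLine` is a hypothetical helper that exists only so the result can be inspected.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats the message into a string instead of logging.
// Firmware code would call its logging macro in each branch.
std::string telemetryLogLine(bool isLocal, uint32_t nodeId)
{
    char buf[64];
    if (isLocal) {
        // No conversion specifier, so no argument is passed.
        std::snprintf(buf, sizeof(buf), "updateTelemetry LOCAL device");
    } else {
        // Exactly one specifier for exactly one argument.
        std::snprintf(buf, sizeof(buf), "updateTelemetry REMOTE device node=0x%x", static_cast<unsigned>(nodeId));
    }
    return std::string(buf);
}
```

With a literal format string in each branch, the compiler can check arguments against specifiers, which a ternary-selected format prevents.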
@copilot fix the failing style checks
Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/e60287ba-4ece-46e0-83d8-a6d89664c0bb Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
Addressed in c873833. I fixed the cppcheck style failures from the CI check jobs by making the affected node lookup temporaries const pointers and renaming the local favorite flag in NodeDB to avoid the shadowing warning.
Squash of testing-pre-squash onto fresh upstream develop (94bb21e). Carries everything that was in rc2 (16636125a) plus the seven post-rc2 commits:
- Add port-368 DualbootControlModule + net-OTA build wiring
- RawRecvModule: bump port-358 observation header to v2 with RF metadata
- AtakPluginModule: bounds-check detail clone, fix pb_encode buffer size
- pre-rc3: NodeDB one-time migration of legacy default long_name
- pre-rc3: add BUILD.md, boards/README.md, monitor/README.md
- test_atak: regression coverage for cloneTAKPacketData bounds-check
- RadioInterface: document the RxRfMeta single-RX-flow invariant

Rebase notes:
- src/mesh/NodeDB.cpp: upstream meshtastic#10413 flattened meshtastic_NodeInfoLite, removing the nested user. sub-struct and the has_user bool. Owner-recovery block and the long_name rebrand migration block both updated to use the new nodeInfoLiteHasUser() accessor and the promoted top-level long_name / short_name fields. Privacy regeneration branch (#ifdef RANDOM_NODE_ID) preserved on top of the new API.
- src/mesh/http/ContentHandler.cpp: trivial blank-line conflict at #endif.

Build verified: pio run -e native-macos succeeds.

PR meshtastic#10413 (NodeDB shrink) introduced 4 STATE_REPLAY_* states that each transition via `return getFromRadio(buf);`. When the client did not opt into gradient sync (the standard meshtastic-python CLI path), all four replay states are no-ops but still walked recursively, producing 5-6 nested getFromRadio() frames. Each frame allocates large protobuf locals (`meshtastic_NodeInfo` / `meshtastic_MeshPacket` ~250-400 B), so on platforms with a small task stack the function prologue of the next frame faults during register save → HardFault → watchdog reset.

Reproduced on RP2350 + arduino-pico FreeRTOS (CORE0 task stack hardcoded to 1024 words / 4 KB in cores/rp2040/freertos/freertos-main.cpp:149). Symptom: `meshtastic --info` (USB CDC and TCP) drops the connection immediately after the firmware logs `Done sending N of M nodeinfos`. `uxTaskGetStackHighWaterMark()` measured 292 words (1168 B) free at that point, so the next recursive frame overflows. Affects any board with BLE excluded that relies on the CLI for configuration.

Replace the six recursive `return getFromRadio(buf)` calls inside the state machine with `goto retry_state;` to a label placed just before the switch. The state machine now iterates within a single stack frame, preserving identical observable behavior for clients but eliminating unbounded stack growth.

Also scope the LockGuard in STATE_SEND_OTHER_NODEINFOS' done-branch to its own block so it is released before re-entering the switch (a strict no-op given non-recursive concurrency::Lock semantics, but cleaner and documents intent), and add an early skip in STATE_REPLAY_POSITIONS to jump directly to STATE_SEND_FILEMANIFEST when gradient sync is off, avoiding three needless loop iterations on the legacy-client path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
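
The recursion-to-iteration change described in this commit message can be sketched as follows. This is a reduced model, not the PhoneAPI code: the state names, counters, and return values are assumptions, and a `for (;;)` loop with `continue` stands in for the commit's `goto retry_state;`, which has the same effect of re-entering the switch in the same stack frame.

```cpp
#include <cstddef>

// Hypothetical, reduced set of states; the real PhoneAPI has more.
enum class SyncState { SendNodeInfos, ReplayPositions, ReplayTelemetry, SendFileManifest, Done };

struct SyncMachine {
    SyncState state = SyncState::SendNodeInfos;
    bool gradientSync = false; // client opted into replay
    int pendingPositions = 0;  // queued synthetic packets (assumed counters)
    int pendingTelemetry = 0;

    // Returns 1 when a frame was "emitted" and 0 when the sync is finished.
    // States with nothing to send re-enter the switch by looping, not by
    // recursing, so stack depth stays constant however many are skipped.
    size_t getFromRadio()
    {
        for (;;) {
            switch (state) {
            case SyncState::SendNodeInfos:
                state = SyncState::ReplayPositions;
                return 1; // pretend one nodeinfo frame was emitted
            case SyncState::ReplayPositions:
                if (!gradientSync) { // legacy client: skip all replay states
                    state = SyncState::SendFileManifest;
                    continue;
                }
                if (pendingPositions > 0) {
                    --pendingPositions;
                    return 1;
                }
                state = SyncState::ReplayTelemetry;
                continue;
            case SyncState::ReplayTelemetry:
                if (pendingTelemetry > 0) {
                    --pendingTelemetry;
                    return 1;
                }
                state = SyncState::SendFileManifest;
                continue;
            case SyncState::SendFileManifest:
                state = SyncState::Done;
                return 1;
            case SyncState::Done:
                return 0;
            }
        }
    }
};
```

In this model a legacy client goes from the nodeinfo frame straight to the file manifest in one call, while an opted-in client receives the queued replay frames in between.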

* 2.8: NodeDB refactor to decouple satellite entries and decrease size
* Regen
* Refactor node mute handling to use dedicated functions for clarity and consistency
* Develop ref
* Fix NodeDB review follow-ups
  Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/6b1d6cf6-ed6b-43b6-95cb-8e141757664e
  Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
* Address review validation nits
  Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/6b1d6cf6-ed6b-43b6-95cb-8e141757664e
  Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
* Trunk
* Potential fix for pull request finding
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Extract legacy NodeDatabase migration
* Fix remaining NodeDB review issues
  Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/c76b9a5a-7244-4fbc-9ef0-98091d8caaea
  Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
* Fixes
* Trunk
* Fix latest review compile follow-ups
  Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/5198da01-ec4c-4c16-8a09-68b8e6d5d410
  Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
* Fix cppcheck style warnings
  Agent-Logs-Url: https://github.com/meshtastic/firmware/sessions/e60287ba-4ece-46e0-83d8-a6d89664c0bb
  Co-authored-by: thebentern <9000580+thebentern@users.noreply.github.com>
* Change pointer type for mesh node in set_favorite function
* Change pointer types for mesh node references to const in multiple applets
* Add NodeDB layout v25 documentation and migration guidelines
* Remove tests for uninitialized PacketHistory state due to undefined behavior
* Fix code block formatting in copilot instructions

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

NodeDB shrink + gradient sync
Restructured meshtastic_NodeInfoLite from a 166-byte all-in-one record into a 105-byte identity/link-state header, with the optional payloads moved into per-NodeNum satellite maps on NodeDB: positions, device telemetry, environment telemetry, and status messages. Each map is gated by its own MESHTASTIC_EXCLUDE_*DB build flag (the exclusions all default to ON for STM32WL). UserLite was flattened into the header and trimmed: macaddr dropped, seven bools packed into a single bitfield, and long_name shrunk 40→25. The on-disk schema is bumped to v25, with a v24→v25 boot-time migration that decodes through a parallel NodeInfoLite_Legacy descriptor.
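
A rough sketch of what a fat-to-slim migration of this kind involves, assuming simplified record types. `LegacyNode`, `SlimNode`, `PositionLite`, `MigratedDb`, and `migrate` are all hypothetical; only the 40→25 long_name capacities and the idea of moving position into a per-node map come from the description above. The real migration decodes nanopb messages, which is not modeled here.

```cpp
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; field sets are assumed, not the real protobufs.
struct LegacyNode {
    uint32_t num;
    char long_name[40];
    bool has_position;
    int32_t latitude_i;
    int32_t longitude_i;
};

struct SlimNode {
    uint32_t num;
    char long_name[25];
};

struct PositionLite {
    int32_t latitude_i;
    int32_t longitude_i;
};

struct MigratedDb {
    std::vector<SlimNode> nodes;                          // slim headers
    std::unordered_map<uint32_t, PositionLite> positions; // satellite map
};

// Split each legacy record into a slim header plus an optional satellite entry.
MigratedDb migrate(const std::vector<LegacyNode> &legacy)
{
    MigratedDb db;
    for (const LegacyNode &old : legacy) {
        SlimNode slim{};
        slim.num = old.num;
        // Truncate to the smaller capacity and keep the result NUL-terminated.
        std::strncpy(slim.long_name, old.long_name, sizeof(slim.long_name) - 1);
        slim.long_name[sizeof(slim.long_name) - 1] = '\0';
        db.nodes.push_back(slim);
        // Only nodes that actually had a position get a satellite entry.
        if (old.has_position)
            db.positions[old.num] = PositionLite{old.latitude_i, old.longitude_i};
    }
    return db;
}
```

Nodes without a payload cost nothing beyond the header, which is where the size saving in a layout like this comes from.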
On the wire side, capable phones opt into thin sync via two new special nonces (SPECIAL_NONCE_GRADIENT_SYNC / _ONLY_NODES). The firmware then streams slim NodeInfo headers in the burst and afterward replays positions, telemetry, environment, and status as synthetic POSITION_APP / TELEMETRY_APP / NODE_STATUS_APP packets, which the phone routes through its existing per-portnum handlers. Legacy phones keep getting the bundled NodeInfo emission unchanged.
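
The replay step can be sketched as building a queue of synthetic packets from the satellite maps. Everything here is a simplified assumption: payloads are reduced to `int`, the `PortNum` enum omits the real numeric values, environment telemetry is left out, and `Satellites`, `SyntheticPacket`, and `buildReplay` are hypothetical names. Only the three port names come from the description above.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Port names from the PR description; numeric values intentionally omitted.
enum class PortNum { POSITION_APP, TELEMETRY_APP, NODE_STATUS_APP };

// Hypothetical synthetic packet: just a sender and a port.
struct SyntheticPacket {
    uint32_t from;
    PortNum port;
};

// Hypothetical satellite maps keyed by node number, payloads reduced to int.
struct Satellites {
    std::map<uint32_t, int> positions;
    std::map<uint32_t, int> telemetry;
    std::map<uint32_t, int> status;
};

// Build the replay queue sent after the slim NodeInfo burst.
std::vector<SyntheticPacket> buildReplay(const Satellites &s, bool gradientSync)
{
    std::vector<SyntheticPacket> out;
    if (!gradientSync)
        return out; // legacy client: payloads travel inside the bundled NodeInfo
    for (const auto &kv : s.positions)
        out.push_back({kv.first, PortNum::POSITION_APP});
    for (const auto &kv : s.telemetry)
        out.push_back({kv.first, PortNum::TELEMETRY_APP});
    for (const auto &kv : s.status)
        out.push_back({kv.first, PortNum::NODE_STATUS_APP});
    return out;
}
```

Because each replayed item looks like an ordinary packet on its port, a phone can process it with the handler it already uses for live traffic.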