Skip to content

fix(meshcore): node list collapsing to a single node with multiple sources#3539

Merged
Yeraze merged 1 commit into
mainfrom
fix/meshcore-node-list-shows-one
Jun 18, 2026
Merged

fix(meshcore): node list collapsing to a single node with multiple sources#3539
Yeraze merged 1 commit into
mainfrom
fix/meshcore-node-list-shows-one

Conversation

@Yeraze

@Yeraze Yeraze commented Jun 18, 2026

Copy link
Copy Markdown
Owner

Summary

With multiple MeshCore sources, one source's node list could intermittently collapse to just the local node (e.g. one source showing 95 nodes, another showing 1) even though the device had — and the database had persisted — dozens of contacts. This was a read-path bug, not a data/processing bug: both sources' meshcore_nodes rows were present and fresh in the DB.

Diagnosed against a live two-MeshCore-source instance: the DB held 94 and 95 nodes for the two sources respectively, both freshly written, yet the UI served 1 for one of them.

Root cause

Two independent issues combined:

  1. refreshContacts() wiped the in-memory contact map on a transient empty read. It called this.contacts.clear() on any response.success && Array.isArray(response.data) — including a successful-but-empty get_contacts, which a busy companion can return right after a reconnect or mid path-refresh (these refreshes are debounced off every PathUpdated push). The list then stayed empty until adverts slowly refilled it.
  2. getAllNodes() served only the volatile in-memory map (this.localNode + this.contacts), never the durable per-source DB rows. So a momentarily-empty map collapsed the served list to a single node.

Changes

  • MeshCoreManager.getAllNodes() is now async and merges the durable per-source meshcore_nodes rows with the live in-memory contacts: DB rows are the base (so the list survives in-memory churn), in-memory contacts overlay them for freshness (rssi/snr/position), and DB-only fields (battery/uptime/radio) are preserved through the merge. The live local node is listed first and de-duplicated against its persisted row.
  • refreshContacts() now only clears+replaces when get_contacts returns at least one entry — an empty response no longer wipes the known list.
  • New per-source repo read MeshCoreRepository.getNodesBySource(sourceId) (the existing getAllNodes() was global/unscoped).
  • Updated all callers of the now-async manager getAllNodes() to await: meshcoreRoutes (/nodes, /status), sourceRoutes (node count + map), unifiedRoutes (aggregate counts), and the server-side map aggregation in server.ts.

Issues Resolved

None filed — reported directly. Relates to the multi-source MeshCore work.

Documentation Updates

  • CHANGELOG.md — Unreleased → Bug Fixes entry.

Testing

  • Full unit suite passes (7381 tests, 0 failures)
  • TypeScript compiles cleanly
  • New src/server/meshcoreManager.getAllNodes.test.ts (6 tests): DB-backed list when in-memory empty; in-memory overlay without duplication preserving DB-only fields; live local node first + de-duped; DB-failure fallback to in-memory; empty get_contacts does not wipe; non-empty get_contacts still replaces.
  • Reviewer / maintainer: on a multi-MeshCore-source deployment, confirm each source's node list reflects its persisted meshcore_nodes count and no longer drops to 1 after a reconnect.

🤖 Generated with Claude Code

…urces

With multiple MeshCore sources, one source's node list could show only the
local node while the DB held all its contacts. Two causes:

- refreshContacts() cleared the in-memory contact map on ANY successful
  get_contacts response, including a transient empty read on a busy
  companion — wiping the known list until adverts refilled it.
- getAllNodes() served only that volatile in-memory map, so a momentarily
  empty map collapsed the served list to just the local node.

Fix:
- getAllNodes() now merges durable per-source meshcore_nodes rows with the
  live in-memory contacts (in-memory wins for freshness; DB-only fields like
  battery/uptime/radio preserved) and lists the live local node first.
- refreshContacts() only clears+replaces when get_contacts returns entries.
- Add per-source repo read getNodesBySource(sourceId).
- Callers of the now-async getAllNodes() updated to await
  (meshcore/source/unified routes, server map aggregation).

Adds regression tests for the empty-response and DB-merge paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGxkuD4Fwa2JGVim8ZeVj
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review: fix(meshcore): node list collapsing to a single node with multiple sources

Tasks:

  • Gather context and understand the PR
  • Read changed files
  • Review core logic in meshcoreManager.ts
  • Review new test file
  • Review repository and route changes
  • Post comprehensive review

Summary

This is a well-diagnosed and clearly written fix. The root cause analysis is accurate, the solution is appropriately layered (DB as durable base + in-memory overlay), and the test coverage directly targets the two failure modes. A few observations follow.


Core Logic (meshcoreManager.ts)

refreshContacts() guard (src/server/meshcoreManager.ts:1741) — the fix (response.data.length > 0) is correct. One subtle edge case worth noting: if a companion legitimately has zero contacts (brand-new device, or after a full reset), a real empty get_contacts response will permanently be ignored and the in-memory map will never be cleared. This is probably acceptable in practice since it only matters in that narrow window before any advert is received, and the DB would also be empty in that case. But it's worth a comment so future maintainers don't treat the guard as "never clear on empty" without understanding the trade-off.

getAllNodes() merge (src/server/meshcoreManager.ts:3456–3512) — the logic is solid:

  • DB rows form the durable base ✓
  • In-memory overlay wins for freshness ✓
  • DB-only fields (batteryMv, uptimeSecs, radio params) preserved via spread-then-overlay ✓
  • Local node deduplicated correctly ✓
  • DB failure falls back gracefully ✓

Minor observation (src/server/meshcoreManager.ts:3462): the local node is skipped from the DB base with if (n.isLocalNode) continue. This works, but it means if this.localNode is null at call time (e.g. before the initial handshake completes), the persisted local node row is silently dropped from the output entirely rather than being included as a fallback. Depending on the reconnect timing this could cause the local node to briefly disappear from the list. Consider including it from DB when this.localNode is null.

In-memory merge (src/server/meshcoreManager.ts:3492–3503): batteryMv and uptimeSecs are intentionally preserved from base (the DB row) since contacts don't carry them, but txPower, maxTxPower, and the radio params are NOT carried over from base in the in-memory overlay branch. They're set from DB in the first loop, then the spread-and-set in the overlay loop does ...base first, so they survive the spread — but only if the contact was already in the DB. If a fresh in-memory-only contact arrives that was never persisted, those fields are simply absent (undefined), which is correct. No bug here, but worth confirming the spread order is intentional.


Repository (src/db/repositories/meshcore.ts)

getNodesBySource() (src/db/repositories/meshcore.ts:168–176) — clean, properly scoped, matches the pattern of the existing getNodeByPublicKeyAndSource. Good addition.

getLocalNode() (src/db/repositories/meshcore.ts:216–224) — this existing method is not scoped by sourceId. It returns the first isLocalNode=true row it finds globally. With multiple MeshCore sources this would return an arbitrary source's local node. This is a pre-existing issue and out of scope for this PR, but worth flagging so it's tracked.


Tests (src/server/meshcoreManager.getAllNodes.test.ts)

The six tests are well-targeted and cover:

  1. DB-backed list when in-memory is empty ✓
  2. In-memory overlay without duplication + DB-only field preservation ✓
  3. Live local node first + de-duped from DB ✓
  4. DB failure falls back to in-memory ✓
  5. Empty get_contacts does not wipe ✓
  6. Non-empty get_contacts still replaces ✓

One gap: There is no test for the case where this.localNode is null but a isLocalNode=true row exists in the DB. Given the observation above about the local node being silently dropped in that case, a test documenting the current behavior (or the desired fallback) would be valuable.

flushMicrotasks (src/server/meshcoreManager.getAllNodes.test.ts:58–61) — the double-Promise.resolve() flush is a reasonable approach for settling async event handlers, but it's fragile if the contact persistence path chains more than two microtask hops. The tests pass currently, but this is worth a note if they become flaky.


Route callers

All four callers (meshcoreRoutes, sourceRoutes, unifiedRoutes, server.ts) correctly await the now-async getAllNodes(). No issues.

Performance note (src/server/routes/sourceRoutes.ts:819, src/server/routes/unifiedRoutes.ts:1245): getAllNodes() now issues a DB query on every call. The source-list endpoint calls it in a loop over all MeshCore sources to compute node counts. For deployments with many MeshCore sources (e.g. 10+) this becomes O(N) DB queries. Not a regression from the fix itself (the previous sync path was O(1)), but worth tracking. A getNodeCountBySource() query could make those count-only call sites cheaper if it ever becomes a bottleneck.


Overall

The fix is correct and well-tested. The two main observations (local node dropped when this.localNode is null; getLocalNode() is unscoped) are pre-existing or edge cases that are out of scope for this PR. Recommend addressing the local-node null case in a follow-up if reconnect behavior shows the node list flickering.

Verdict: Approve with the above non-blocking notes.

@Yeraze Yeraze merged commit 5773c6f into main Jun 18, 2026
19 checks passed
@Yeraze Yeraze deleted the fix/meshcore-node-list-shows-one branch June 18, 2026 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant