Skip to content

fix(meshcore): persist contact advType to meshcore_nodes (#3092)#3107

Merged
Yeraze merged 1 commit into
mainfrom
fix/meshcore-persist-contacts
May 20, 2026
Merged

fix(meshcore): persist contact advType to meshcore_nodes (#3092)#3107
Yeraze merged 1 commit into
mainfrom
fix/meshcore-persist-contacts

Conversation

@Yeraze

@Yeraze Yeraze commented May 20, 2026

Copy link
Copy Markdown
Owner

Summary

Closes the user-visible part of #3092 by making sure the remote-telemetry scheduler can actually classify a target as a Repeater or Room Server.

Root cause

MeshCore contacts learned via the companion's advert stream were never persisted to the meshcore_nodes SQL table. The only writer to that table was setNodeTelemetryConfig, which seeds a publicKey-only row with advType=NULL. Consequently the remote-telemetry scheduler read target.advType === null, computed isRepeaterLike === false (because typeof null !== 'number'), and routed every target through the legacy LPP GetTelemetryData path — silently skipping the SendStatusReq + guest-login paths added in #3094.

HougeDK's logs in #3092 confirmed this: every telemetry-request line printed (companion: LPP) even though the targets were repeaters.

Fix

Three complementary edits:

  1. meshcoreManager.handleBridgeEvent — call a new persistContact helper from the contact_advertised, contact_added, and contact_path_updated handlers so every advert update mirrors to meshcore_nodes via databaseService.meshcore.upsertNode.
  2. meshcoreManager.refreshContacts — bulk-persist all contacts when a full refresh runs. This backfills existing deployments without requiring users to re-toggle telemetry-retrieval on each node.
  3. meshcoreRoutes PATCH .../telemetry-config — look up the in-memory contact via the new MeshCoreManager.getContact getter and upsert advType/advName immediately, so a node that gets enabled before its first advert refresh still picks up the right classification on the very next scheduler tick.

Persistence at the event level is fire-and-forget (failures logged, never thrown) so a transient DB error on a single advert can't break the contact-event pipeline. The full-refresh path awaits so the "Refreshed N contacts" log stays accurate.

Tests

  • New meshcoreManager.contactPersistence.test.ts covers all three event types, the contact_path_updated preserves-advType invariant, the DB-failure swallow path, and the new getContact getter (5 tests).
  • New PATCH /api/sources/test-source/meshcore/nodes/:publicKey/telemetry-config block in meshcoreRoutes.test.ts covers both the contact-known backfill path AND the contact-unknown skip path (2 tests).
  • Full Vitest suite: 10142 pass / 1 pre-existing flake (mqttBrokerManager accept-client test fails identically on clean main).

Test plan

  • Pull this branch on a MeshCore-only deployment with at least one repeater enabled for telemetry.
  • Confirm scheduler logs (repeater: status + LPP) instead of (companion: LPP) for repeater targets on the next tick.
  • Confirm SendStatusReq status fields (battery, uptime, queue len, RSSI, etc.) start landing as mc_status_* telemetry rows.
  • Confirm a setNodeTelemetryConfig PATCH for a freshly-seen contact backfills advType in the same request (no need to wait for next refresh).
  • Confirm an existing deployment with advType=NULL rows gets backfilled automatically on the next refreshContacts cycle.

🤖 Generated with Claude Code

MeshCore contacts learned via the companion's advert stream were never
persisted to the `meshcore_nodes` SQL table. The only writer to that
table was `setNodeTelemetryConfig`, which seeds a publicKey-only row
with `advType=NULL`. As a result the remote-telemetry scheduler always
read `target.advType === null`, computed `isRepeaterLike === false`
(typeof null !== 'number'), and routed every target through the
legacy LPP `GetTelemetryData` path — silently skipping the
`SendStatusReq` + guest-login paths added in #3094.

In production this meant: enabling Telemetry Retrieval on a Repeater
or Room Server would only ever produce timeouts on a node whose
firmware doesn't anonymously answer the LPP request.

Three complementary edits make the table accurate enough for the
scheduler (and any future per-source consumer) to classify targets
correctly:

1. `meshcoreManager.handleBridgeEvent` — call a new `persistContact`
   helper from the `contact_advertised`, `contact_added`, and
   `contact_path_updated` handlers so every advert update mirrors to
   `meshcore_nodes` via `databaseService.meshcore.upsertNode`.
2. `meshcoreManager.refreshContacts` — bulk-persist all contacts when
   a full refresh runs. This backfills existing deployments without
   requiring the user to re-toggle telemetry-retrieval on each node.
3. `meshcoreRoutes` PATCH `.../telemetry-config` — look up the
   in-memory contact via the new `MeshCoreManager.getContact` getter
   and upsert advType/advName immediately, so a node that gets
   enabled before its first advert refresh still picks up the right
   classification on the very next scheduler tick.

Persistence is fire-and-forget at the event level (failures logged,
never thrown) so a transient DB error on a single advert can't break
the contact-event pipeline. The full-refresh path awaits to keep the
"Refreshed N contacts" log accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 20, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review: fix(meshcore): persist contact advType to meshcore_nodes

Review Progress:

  • Read and analyze implementation changes
  • Review test coverage and quality
  • Check for potential bugs and security issues
  • Evaluate performance considerations
  • Provide comprehensive feedback

Summary

This PR successfully addresses issue #3092 by implementing proper persistence of MeshCore contact metadata to the meshcore_nodes database table. The fix ensures that remote telemetry scheduling can correctly classify nodes as Repeaters or Room Servers instead of defaulting to the legacy LPP-only path.

Code Quality ✅

Excellent implementation with several strengths:

Implementation Design

  • Well-architected solution: Three complementary approaches (event-level persistence, bulk refresh, and route-side backfill) provide comprehensive coverage
  • Fire-and-forget pattern: The persistContact method properly logs errors without throwing, preventing contact pipeline disruption (meshcoreManager.ts:658-678)
  • Defensive programming: Extensive null-checking and fallback values throughout
  • Clear separation of concerns: Database operations isolated in dedicated helper methods

Error Handling

  • Robust error handling: persistContact catches and logs failures without breaking the contact event pipeline (meshcoreManager.ts:673-677)
  • Non-blocking design: Database failures don't prevent in-memory contact updates
  • Detailed logging: Error messages include truncated public keys for debugging

Code Style

  • Consistent patterns: Follows existing codebase conventions
  • Good documentation: Comprehensive JSDoc comments explaining the fix rationale
  • Proper async/await usage: Correct handling of Promise chains

Test Coverage ✅

Comprehensive test suite covering all critical paths:

Contact Persistence Tests (meshcoreManager.contactPersistence.test.ts)

  • ✅ Event-level persistence for contact_advertised and contact_added
  • ✅ Preservation of advType during contact_path_updated events
  • ✅ Error resilience (DB failures don't break contact updates)
  • ✅ New getContact() getter functionality
  • ✅ Proper async test patterns with microtask queue flushing

Route Tests (meshcoreRoutes.test.ts)

  • ✅ Contact backfill before telemetry config seeding
  • ✅ Correct call ordering verification
  • ✅ Graceful handling when contact not in memory
  • ✅ Proper mocking of database services

Security Considerations ✅

No security concerns identified:

  • Uses existing validated database service methods
  • Follows established authentication/authorization patterns
  • Input validation maintained for all user-facing endpoints
  • No new attack vectors introduced

Performance Considerations ✅

Well-optimized implementation:

Database Operations

  • Efficient upserts: Uses existing upsertNode method, avoiding unnecessary queries
  • Minimal overhead: Fire-and-forget pattern prevents blocking contact pipeline
  • Batch operations: refreshContacts uses Promise.all for parallel execution

Memory Usage

  • Reasonable caching: Contact data already cached in-memory for functionality
  • No memory leaks: Proper cleanup in disconnect scenarios

Network Impact

  • No additional RF operations: All changes are local database operations
  • Preserves existing throttling: Remote telemetry scheduling unchanged

Potential Issues (Minor) ⚠️

1. Race Condition (Low Risk)

Location: meshcoreRoutes.ts:887-907

const contact = manager.getContact(publicKey);
if (contact) {
  await databaseService.meshcore.upsertNode(/* ... */);
}
await databaseService.meshcore.setNodeTelemetryConfig(/* ... */);

Issue: Theoretical race where contact is removed between getContact and upsertNode
Impact: Minimal - would just skip the backfill, main operation continues
Mitigation: Acceptable as-is given low probability and non-critical impact

2. Bulk Operation Failure Handling

Location: meshcoreManager.ts:1126-1128
Issue: Promise.all means one contact failure could prevent others from persisting
Suggestion: Consider Promise.allSettled for more resilient bulk operations
Impact: Low - refreshContacts typically called on stable connections

Recommendations ✅

Code Quality

  1. Consider Promise.allSettled: For bulk contact persistence in refreshContacts to handle partial failures more gracefully
  2. Add metrics: Consider adding counters for persistence success/failure rates for monitoring

Documentation

  • The PR description excellently explains the root cause and fix
  • Code comments are comprehensive and helpful
  • Test descriptions clearly explain the regression being prevented

Technical Accuracy ✅

Implementation correctly addresses the root cause:

  • ✅ Identifies the real issue (contacts never persisted to SQL)
  • ✅ Implements comprehensive solution (event-level + bulk + route-side)
  • ✅ Maintains backward compatibility
  • ✅ Preserves existing error handling patterns
  • ✅ Follows multi-source architecture correctly

Verdict: APPROVED

This is a high-quality fix that:

  • Correctly identifies and addresses the root cause
  • Implements a robust, multi-layered solution
  • Includes comprehensive test coverage
  • Follows established code patterns and conventions
  • Has minimal performance impact
  • Introduces no security concerns

The implementation demonstrates excellent engineering practices with proper error handling, comprehensive testing, and clear documentation. The fix will resolve the telemetry scheduling classification issue while maintaining system reliability.

Job run

@Yeraze Yeraze merged commit 0a3a839 into main May 20, 2026
19 checks passed
@Yeraze Yeraze deleted the fix/meshcore-persist-contacts branch May 20, 2026 17:02
Yeraze added a commit that referenced this pull request May 20, 2026
* chore(release): bump version to 4.6.3

Five user-visible files bumped per the CLAUDE.md version recipe:
package.json, package-lock.json (regenerated), helm/meshmonitor/Chart.yaml,
desktop/src-tauri/tauri.conf.json, desktop/package.json. CLAUDE.md
banner line bumped to match.

CHANGELOG entry covers the five PRs since 4.6.2-1:
- #3105 unified tapback metadata fix
- #3106 docs: drop worktree restriction
- #3107 meshcore contact advType persistence
- #3108 MQTT channel permissions via channel_database
- #3109 hint banner Catppuccin restyle
- #3110 node.channel ingest + traceroute/neighbor channel gate

Companion blog post (docs/blog/2026-05-20-v4.6.3-permissions.md)
walks operators through the new Virtual Channel Permissions flow,
the map-visibility behavior change, and the floating-lines fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): regenerate desktop/package-lock.json for 4.6.3

The desktop sub-project carries its own lockfile and the bump to 4.6.3
left it pinned to 4.6.1. The Windows Desktop CI job runs `npm install`
without `--legacy-peer-deps` and fails on the package.json /
package-lock.json version mismatch. Regenerate to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant