fix(virtual-node): stop creating zombie nodes on map (#2602)#2615
Conversation
Three independent code paths in meshtasticManager were stamping a fresh `lastHeard` on synthetic / indirectly-discovered node rows, causing them to pass the activity-time filter in `getActiveNodes` and leak to connected Meshtastic clients via `sendNodeInfosFromDb`. The mobile app then rendered them on its node list and map as "zombies" the user could not delete (they don't exist in the physical node's NodeDB). Fixes by source: 1. `processTextMessageProtobuf` broadcast pseudo-node creation: keep the `!ffffffff` row (FK target for `messages.toNodeNum`) but never stamp `lastHeard`. The row now stays out of `getActiveNodes` entirely. 2. `processTracerouteMessage` intermediate hop loop: known hops are now skipped completely (no upsert at all) so we don't accidentally clobber their `longName`/`shortName` *or* update their `lastHeard` from a relay event. Unknown hops still get a stub row so future name lookups resolve, but with NULL `lastHeard`. This intentionally reverses the behavior from #2610 — surfacing relay-only hops on the dashboard was the wrong tradeoff once we saw the same row was leaking to virtual node clients. 3. `estimateIntermediatePositions`: same fix — intermediate-hop stub rows created here for FK satisfaction also lose their `lastHeard` stamp. 4. NeighborInfo neighbors-of-neighbors creation in `processNeighborInfo`: placeholder rows for nodes the reporter has heard but the local node has not no longer get a `lastHeard` stamp. 5. `virtualNodeServer.sendNodeInfosFromDb` now scopes its query to the manager's `sourceId` (multi-source isolation) and defensively filters out any row with nodeNum 0xFFFFFFFF or nodeId `!ffffffff` to handle pre-existing installs whose broadcast row is already stamped. 6. `virtualNodeServer.handleClientMessage` intercepts AdminMessage `removeByNodenum` from connected clients, fabricates a routing-ack with errorReason=NONE so the client UI doesn't hang, and never forwards the request to the physical node. Per the issue resolution, node deletion is a MeshMonitor-only operation regardless of `allowAdminCommands`. New helper `meshtasticProtobufService.createFakeRoutingAck` builds the FromRadio response. Tests: - New `virtualNodeServer.zombieFix.test.ts` (8 tests): broadcast filter, textual !ffffffff filter, sourceId scoping, default 24h fallback, removeByNodenum intercept under both `allowAdminCommands` modes, setFavoriteNode passthrough, decode-failure fallthrough. - Rewrote `meshtasticManager.traceroute-hops.test.ts` (6 tests) to assert the new behavior: known hops untouched, unknown hops stubbed WITHOUT lastHeard, reserved values still filtered. Mocks now track upsert state so downstream `estimateIntermediatePositions` calls resolve to the existing stub instead of duplicating it. Closes #2602. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Claude finished @Yeraze's task —— View job Code Review: Fix Virtual Node Zombies (#2602)✅ Overall Assessment: ExcellentThis is a well-architected fix for a complex multi-system issue. The root cause analysis is thorough, the solution is surgical, and the implementation demonstrates strong software engineering practices. 🔍 Code Quality & Best PracticesStrengths:
Minor observations:
🐛 Bug AnalysisNo bugs identified. The implementation correctly addresses all aspects of the zombie node issue:
Edge case handling:
🔒 Security AnalysisStrong security posture:
No security concerns identified. ⚡ Performance ConsiderationsPositive impact:
No performance regressions identified. 🧪 Test Coverage AnalysisExcellent coverage:
Rewritten tests properly document behavior change: 🎯 Specific Technical Feedback
🚀 RecommendationsThis PR is ready for merge. Consider these minor enhancements for future iterations:
📊 System Test Results AssessmentThe failing system tests (Quick Start, Security, Reverse Proxy) are correctly identified as unrelated to this PR's changes. The Virtual Node CLI test passed, which is the critical validation for this fix. Verdict: System test results support approval. ✅ Final Recommendation: APPROVEThis PR demonstrates excellent software engineering practices:
The zombie node issue is fully resolved without introducing regressions. |
System Test ResultsMeshMonitor System Test ResultsTest Run: 2026-04-09 14:23:40 EDT Test Summary
❌ Overall Result: FAILEDSome tests failed. Please review the failures above and fix before creating/updating PR. Failed Tests
|
System Test ResultsMeshMonitor System Test ResultsTest Run: 2026-04-09 14:42:44 EDT Test Summary
✅ Overall Result: PASSEDAll deployment configurations are working correctly! Test DetailsConfiguration Import:
Quick Start Test:
Security Test:
V1 API Test:
Reverse Proxy Test:
Reverse Proxy + OIDC Test:
Virtual Node CLI Test:
Backup & Restore Test:
Database Migration Test:
DB Backing Consistency Test:
|
…port is fresh (#3026) PR #2615 (zombie-node fix) intentionally leaves `lastHeard=NULL` on placeholder neighbor rows we've only learned about second-hand through a NeighborInfo packet. That was correct in isolation, but the read-side filter in both /api/neighbor-info (server.ts) and /api/sources/:id/neighbor-info (sourceRoutes.ts) rejects any row whose neighbor.lastHeard is NULL or older than the cutoff, so every indirect-neighbor link gets dropped on read. This change keeps the reporter freshness check (we must have directly heard from the reporting node), but falls back to the NeighborInfo record's own `timestamp` / firmware-supplied `lastRxTime` when the neighbor's `lastHeard` is missing or stale. `timestamp` is stored in ms (set via `Date.now()` in meshtasticManager), `lastRxTime` is firmware seconds — handled accordingly. Tests: - src/server/routes/sourceRoutes.neighbor-info.test.ts — fixture timestamp corrected to ms (matches real DB shape); existing staleness/null tests retargeted at the reporter side; new cases cover neighbor.lastHeard NULL + fresh NI report (kept), NULL + stale NI report (dropped), and NULL + fresh lastRxTime (kept). - src/server/server.neighbor-info-position.test.ts — inline shadow filter synced with the real handler so the test mirror does not drift. Refs #3025 Authored by Merlin 🪄 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Neighbors view (#3560) The live source map and the Map Analysis page disagreed on which neighbor links to draw. Both read the same neighbor_info table, but applied different freshness filters: - Map Analysis "Neighbors" (/api/analysis/neighbors) windows by the NeighborInfo record's own `timestamp` (report recency). - The live map (/api/sources/:id/neighbor-info and the legacy /api/neighbor-info) gated on the endpoint *nodes'* `lastHeard`, so a node's stale last-known neighbor list kept showing for as long as the node stayed otherwise active (still sending position/telemetry). That surfaced far more links than the analysis view and misrepresented how recently each RF link was actually observed. Switch the live-map endpoints to report-time freshness (record `timestamp` within maxNodeAge), matching the analysis definition. Because each NeighborInfo packet deletes-then-reinserts the reporter's full list with a fresh timestamp, a fresh record inherently means we heard the reporter within the window. Keying off the record timestamp (instead of node `lastHeard`) also naturally preserves indirect-neighbor links whose neighbor row has a null lastHeard (#3025/#2615), since lastHeard is no longer consulted. Firmware-supplied `lastRxTime` is also dropped from the predicate: it is subject to device clock skew and the analysis view does not use it, so it could otherwise reintroduce divergence. Tests updated to assert report-time semantics (stale report dropped even when endpoint nodes are still active; fresh report kept even when a node's lastHeard is stale/null). Claude-Session: https://claude.ai/code/session_01SVGxkuD4Fwa2JGVim8ZeVj Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Closes #2602.
Virtual Node Server (TCP :4404) was shipping synthetic / indirectly-discovered nodes to connected Meshtastic apps, where they showed up on the map/node list as "zombies" that could not be deleted — the apps' delete command targets the physical node's NodeDB, which never contained them in the first place.
Root cause: four independent code paths in
meshtasticManager.tswere stamping a freshlastHeard = Date.now()/1000on rows that represent nodes we have never directly heard from. That stamp made them pass thegetActiveNodesactivity filter and leak out throughvirtualNodeServer.sendNodeInfosFromDb.Changes
1. Stop stamping
lastHeardon indirect / synthetic nodes!ffffffff) — row is still created (required as FK target formessages.toNodeNum) but no longer carrieslastHeard.meshtasticManager.ts:~4230lastHeardfrom relay activity). Unknown hops still get a stub row so name lookups resolve, but with NULLlastHeard.meshtasticManager.ts:~5300estimateIntermediatePositions— FK-satisfaction stubs created here also no longer stamplastHeard.meshtasticManager.ts:~5996lastHeard.meshtasticManager.ts:~61902. Virtual Node Server hardening
sendNodeInfosFromDbnow scopes itsgetActiveNodesquery to the manager'ssourceId(multi-source isolation for 4.0)nodeNum === 0xFFFFFFFFornodeId === '!ffffffff', even if pre-existing installs still havelastHeardstamped on the broadcast row3. Ack-and-drop
removeByNodenumWhen a connected client sends an
AdminMessage { removeByNodenum }:allowAdminCommandsmeshtasticProtobufService.createFakeRoutingAck()builds aFromRadio { MeshPacket { Data { portnum=ROUTING_APP, Routing { errorReason=NONE }, requestId } } }This matches the project's existing precedent (issue comment): "Reset Node DB" is already disabled on the VN path because of the damage it can cause. Node deletion follows the same rule — it's a MeshMonitor-UI-only operation now.
Tests
New:
src/server/virtualNodeServer.zombieFix.test.ts(8 tests, all passing)sourceIdscoping ongetActiveNodesmaxNodeAgeHourssetting missingremoveByNodenumintercept underallowAdminCommands=falseAND=truesetFavoriteNode(another admin op) is left alone → passes through normallyRewritten:
src/server/meshtasticManager.traceroute-hops.test.ts(6 tests, all passing)lastHeardfieldestimateIntermediatePositionssees the existing stub instead of creating a duplicate (real DB behavior)Full vitest suite: 4179 passed, 0 failed (213 test files, ~4min)
TypeScript
tsc --noEmit: cleanSystem tests
./tests/system-tests.shresults:tests/test-quick-start.sh:393:Expected Channel 0 to be unnamed, got 'primary') — no channel-handling code was touched in this PRtests/system-tests.sh:211)No response received after 3 attemptsnginx/proxy layer — no HTTP routing code was touched in this PRThe Virtual Node CLI system test (the one that directly covers the changed surface) passed. The 3 failing tests exercise code paths that are not in this PR's diff (channel name parsing, nginx reverse proxy HTTP response). Flagging them here rather than silently proceeding — happy to investigate in a separate PR if desired.
Test plan
!ffffffff, NeighborInfo placeholders, or traceroute-relay-only stubslastHeardvia the normalprocessMeshPacketpath)lastHeardand do NOT appear in the VN's node list🤖 Generated with Claude Code