fix(virtual-node): stop creating zombie nodes on map (#2602) by Yeraze · Pull Request #2615 · Yeraze/meshmonitor

Yeraze · 2026-04-09T18:11:34Z

Summary

Closes #2602.

Virtual Node Server (TCP :4404) was shipping synthetic / indirectly-discovered nodes to connected Meshtastic apps, where they showed up on the map/node list as "zombies" that could not be deleted — the apps' delete command targets the physical node's NodeDB, which never contained them in the first place.

Root cause: four independent code paths in meshtasticManager.ts were stamping a fresh lastHeard = Date.now()/1000 on rows that represent nodes we have never directly heard from. That stamp made them pass the getActiveNodes activity filter and leak out through virtualNodeServer.sendNodeInfosFromDb.

Changes

1. Stop stamping `lastHeard` on indirect / synthetic nodes

Broadcast pseudo-node (!ffffffff) — row is still created (required as FK target for messages.toNodeNum) but no longer carries lastHeard. meshtasticManager.ts:~4230
Traceroute intermediate hops — known hops are now skipped completely (no upsert → can't clobber name OR update lastHeard from relay activity). Unknown hops still get a stub row so name lookups resolve, but with NULL lastHeard. meshtasticManager.ts:~5300
- This intentionally reverses [BUG] impossible to find node despite its presence in database. #2610's behavior of surfacing relay-only hops on the dashboard. Once the same row was leaking to virtual node clients, the tradeoff flipped: correctness on VN outweighs dashboard visibility for nodes we haven't directly heard from.
estimateIntermediatePositions — FK-satisfaction stubs created here also no longer stamp lastHeard. meshtasticManager.ts:~5996
NeighborInfo neighbors-of-neighbors — placeholder rows for nodes only the reporter has heard no longer stamp lastHeard. meshtasticManager.ts:~6190

2. Virtual Node Server hardening

sendNodeInfosFromDb now scopes its getActiveNodes query to the manager's sourceId (multi-source isolation for 4.0)
Defensive filter: never ship rows with nodeNum === 0xFFFFFFFF or nodeId === '!ffffffff', even if pre-existing installs still have lastHeard stamped on the broadcast row

3. Ack-and-drop `removeByNodenum`

When a connected client sends an AdminMessage { removeByNodenum }:

Intercepted before the existing allow/block flow, so it fires regardless of allowAdminCommands
New helper meshtasticProtobufService.createFakeRoutingAck() builds a FromRadio { MeshPacket { Data { portnum=ROUTING_APP, Routing { errorReason=NONE }, requestId } } }
Sent only to the requesting client (no broadcast ghost acks)
Never forwarded to the physical node

This matches the project's existing precedent (issue comment): "Reset Node DB" is already disabled on the VN path because of the damage it can cause. Node deletion follows the same rule — it's a MeshMonitor-UI-only operation now.

Tests

New: src/server/virtualNodeServer.zombieFix.test.ts (8 tests, all passing)
- Broadcast pseudo-node filter (by nodeNum AND nodeId)
- sourceId scoping on getActiveNodes
- Default 24h fallback when maxNodeAgeHours setting missing
- removeByNodenum intercept under allowAdminCommands=false AND =true
- setFavoriteNode (another admin op) is left alone → passes through normally
- Graceful fallthrough on AdminMessage decode failure
Rewritten: src/server/meshtasticManager.traceroute-hops.test.ts (6 tests, all passing)
- Header rewritten to document the [BUG] impossible to find node despite its presence in database. #2610 → [BUG] 3.12.0 - Virtual Node zombie nodes on map #2602 behavior reversal
- Asserts known hops are untouched (empty upsert call list)
- Asserts unknown hops get stub rows without lastHeard field
- Mocks now track upsert state so estimateIntermediatePositions sees the existing stub instead of creating a duplicate (real DB behavior)
Full vitest suite: 4179 passed, 0 failed (213 test files, ~4min)
TypeScript tsc --noEmit: clean

System tests

./tests/system-tests.sh results:

Test	Result	Related to this fix?
Configuration Import	✅	—
Quick Start	❌	No — channel 0 naming assertion (`tests/test-quick-start.sh:393`: `Expected Channel 0 to be unnamed, got 'primary'`) — no channel-handling code was touched in this PR
Security	❌	No — tied to Quick Start result (`tests/system-tests.sh:211`)
V1 API	✅	—
Reverse Proxy	❌	No — `No response received after 3 attempts` nginx/proxy layer — no HTTP routing code was touched in this PR
Reverse Proxy + OIDC	✅	—
Virtual Node CLI	✅	Yes — directly exercises this PR's code
Backup & Restore	✅	—
Database Migration	✅	—
DB Backing Consistency	✅	—
API Exercise (3 DBs: sqlite + postgres + mysql)	✅	—

The Virtual Node CLI system test (the one that directly covers the changed surface) passed. The 3 failing tests exercise code paths that are not in this PR's diff (channel name parsing, nginx reverse proxy HTTP response). Flagging them here rather than silently proceeding — happy to investigate in a separate PR if desired.

Test plan

Connect a Meshtastic app to the Virtual Node Server and confirm the map no longer contains !ffffffff, NeighborInfo placeholders, or traceroute-relay-only stubs
Attempt "delete node" from the connected app → verify the app's UI acks immediately and nothing changes on the physical device or in MeshMonitor
Confirm dashboard still shows real peers (direct packets continue to update lastHeard via the normal processMeshPacket path)
Trigger a traceroute → verify unknown hops appear in the DB with NULL lastHeard and do NOT appear in the VN's node list

🤖 Generated with Claude Code

Three independent code paths in meshtasticManager were stamping a fresh `lastHeard` on synthetic / indirectly-discovered node rows, causing them to pass the activity-time filter in `getActiveNodes` and leak to connected Meshtastic clients via `sendNodeInfosFromDb`. The mobile app then rendered them on its node list and map as "zombies" the user could not delete (they don't exist in the physical node's NodeDB). Fixes by source: 1. `processTextMessageProtobuf` broadcast pseudo-node creation: keep the `!ffffffff` row (FK target for `messages.toNodeNum`) but never stamp `lastHeard`. The row now stays out of `getActiveNodes` entirely. 2. `processTracerouteMessage` intermediate hop loop: known hops are now skipped completely (no upsert at all) so we don't accidentally clobber their `longName`/`shortName` *or* update their `lastHeard` from a relay event. Unknown hops still get a stub row so future name lookups resolve, but with NULL `lastHeard`. This intentionally reverses the behavior from #2610 — surfacing relay-only hops on the dashboard was the wrong tradeoff once we saw the same row was leaking to virtual node clients. 3. `estimateIntermediatePositions`: same fix — intermediate-hop stub rows created here for FK satisfaction also lose their `lastHeard` stamp. 4. NeighborInfo neighbors-of-neighbors creation in `processNeighborInfo`: placeholder rows for nodes the reporter has heard but the local node has not no longer get a `lastHeard` stamp. 5. `virtualNodeServer.sendNodeInfosFromDb` now scopes its query to the manager's `sourceId` (multi-source isolation) and defensively filters out any row with nodeNum 0xFFFFFFFF or nodeId `!ffffffff` to handle pre-existing installs whose broadcast row is already stamped. 6. `virtualNodeServer.handleClientMessage` intercepts AdminMessage `removeByNodenum` from connected clients, fabricates a routing-ack with errorReason=NONE so the client UI doesn't hang, and never forwards the request to the physical node. Per the issue resolution, node deletion is a MeshMonitor-only operation regardless of `allowAdminCommands`. New helper `meshtasticProtobufService.createFakeRoutingAck` builds the FromRadio response. Tests: - New `virtualNodeServer.zombieFix.test.ts` (8 tests): broadcast filter, textual !ffffffff filter, sourceId scoping, default 24h fallback, removeByNodenum intercept under both `allowAdminCommands` modes, setFavoriteNode passthrough, decode-failure fallthrough. - Rewrote `meshtasticManager.traceroute-hops.test.ts` (6 tests) to assert the new behavior: known hops untouched, unknown hops stubbed WITHOUT lastHeard, reserved values still filtered. Mocks now track upsert state so downstream `estimateIntermediatePositions` calls resolve to the existing stub instead of duplicating it. Closes #2602. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-09T18:11:52Z

Claude finished @Yeraze's task —— View job

Code Review: Fix Virtual Node Zombies (#2602)

✅ Overall Assessment: Excellent

This is a well-architected fix for a complex multi-system issue. The root cause analysis is thorough, the solution is surgical, and the implementation demonstrates strong software engineering practices.

🔍 Code Quality & Best Practices

Strengths:

Excellent documentation: Every code path includes detailed comments explaining the behavior change and rationale (e.g., meshtasticManager.ts:4234-4239)
Defensive programming: Multiple layers of protection in virtualNodeServer.ts:775-778 (both nodeNum === BROADCAST_NODE_NUM and nodeId === '!ffffffff')
Clean separation of concerns: Virtual Node hardening vs. data generation fixes are handled in appropriate modules
Comprehensive test coverage: New dedicated test file with 8 focused test cases plus rewritten existing tests

Minor observations:

Code consistently follows established patterns and error handling conventions
Variable naming is clear and intention-revealing (intermediateHops, BROADCAST_NODE_NUM)
Proper async/await usage throughout

🐛 Bug Analysis

No bugs identified. The implementation correctly addresses all aspects of the zombie node issue:

Root cause fixed: Four code paths no longer stamp lastHeard on synthetic nodes (meshtasticManager.ts:4248-4250, 5338-5343, 6002-6007, 6205-6211)
Defensive filtering: virtualNodeServer.ts:775-778 prevents any pre-existing zombie rows from leaking through
Source isolation: getActiveNodes now properly scoped to sourceId (virtualNodeServer.ts:765)

Edge case handling:

Gracefully handles existing installations that may have pre-stamped broadcast rows
Properly scopes to source ID for multi-source deployments
Handles AdminMessage decode failures gracefully

🔒 Security Analysis

Strong security posture:

Admin command filtering strengthened: removeByNodenum is now blocked regardless of allowAdminCommands setting
Fake ACK implementation: meshtasticProtobufService.ts:1340-1380 creates proper routing acknowledgments without exposing internal state
Input validation: AdminMessage decoding includes try/catch blocks with safe defaults
Principle of least privilege: Virtual Node Server continues to block dangerous operations by default

No security concerns identified.

⚡ Performance Considerations

Positive impact:

Reduced database load: Skipping known nodes in traceroute processing (meshtasticManager.ts:5329-5334) avoids unnecessary upserts
Efficient filtering: Broadcast node filter happens early in the loop (virtualNodeServer.ts:776-778)
Minimal overhead: Source ID scoping adds one parameter to existing query

No performance regressions identified.

🧪 Test Coverage Analysis

Excellent coverage:

New comprehensive test suite: virtualNodeServer.zombieFix.test.ts with 8 targeted test cases
Integration scenarios: Tests both allowAdminCommands=false and =true configurations
Edge case coverage: Broadcast node filtering, source ID scoping, decode failures
Behavior preservation: Ensures setFavoriteNode still works while removeByNodenum is blocked
Legacy compatibility: Tests handle existing installations with pre-stamped zombie data

Rewritten tests properly document behavior change: meshtasticManager.traceroute-hops.test.ts header clearly explains the #2610 → #2602 tradeoff decision.

🎯 Specific Technical Feedback

meshtasticManager.ts:5320-5334: The approach of skipping known hops entirely is smart - avoids unnecessary database churn while preserving the intent of creating stubs for unknown nodes.
virtualNodeServer.ts:378-408: The removeByNodenum interception logic is well-structured. Using createFakeRoutingAck maintains client compatibility while preventing harmful operations.
meshtasticProtobufService.ts:1340+: New createFakeRoutingAck method follows established patterns and properly constructs protobuf messages.
Test architecture: The mock setup in virtualNodeServer.zombieFix.test.ts is comprehensive and properly isolates the system under test.

🚀 Recommendations

This PR is ready for merge. Consider these minor enhancements for future iterations:

Metrics: Could add counters for blocked removeByNodenum attempts to help administrators understand client behavior
Configuration: The Virtual Node Server could expose zombie filtering as a configurable option (though the current behavior is correct by default)
Documentation: Consider adding this fix to architecture documentation as an example of multi-system issue resolution

📊 System Test Results Assessment

The failing system tests (Quick Start, Security, Reverse Proxy) are correctly identified as unrelated to this PR's changes. The Virtual Node CLI test passed, which is the critical validation for this fix.

Verdict: System test results support approval.

✅ Final Recommendation: APPROVE

This PR demonstrates excellent software engineering practices:

Thorough root cause analysis and surgical fixes
Comprehensive testing with clear behavior documentation
Strong security posture with defense-in-depth
Performance-conscious implementation
Clear documentation and maintainability

The zombie node issue is fully resolved without introducing regressions.

github-actions · 2026-04-09T18:27:53Z

System Test Results

MeshMonitor System Test Results

Test Run: 2026-04-09 14:23:40 EDT

Test Summary

Test Suite	Result
Configuration Import	✅ PASSED
Quick Start Test	❌ FAILED
Security Test	❌ FAILED
V1 API Test	✅ PASSED
Reverse Proxy Test	❌ FAILED
Reverse Proxy + OIDC	✅ PASSED
Virtual Node CLI Test	✅ PASSED
Backup & Restore Test	✅ PASSED
Database Migration Test	✅ PASSED
DB Backing Consistency	✅ PASSED
API Exercise (3 DBs)	✅ PASSED

❌ Overall Result: FAILED

Some tests failed. Please review the failures above and fix before creating/updating PR.

Failed Tests

Quick Start Test: Zero-config deployment test failed
Security Test: API endpoint security test failed
Reverse Proxy Test: Production HTTPS deployment test failed

github-actions · 2026-04-09T18:42:46Z

System Test Results

MeshMonitor System Test Results

Test Run: 2026-04-09 14:42:44 EDT

Test Summary

Test Suite	Result
Configuration Import	✅ PASSED
Quick Start Test	✅ PASSED
Security Test	✅ PASSED
V1 API Test	✅ PASSED
Reverse Proxy Test	✅ PASSED
Reverse Proxy + OIDC	✅ PASSED
Virtual Node CLI Test	✅ PASSED
Backup & Restore Test	✅ PASSED
Database Migration Test	✅ PASSED
DB Backing Consistency	✅ PASSED
API Exercise (3 DBs)	✅ PASSED

✅ Overall Result: PASSED

All deployment configurations are working correctly!

Test Details

Configuration Import:

Tests configuration import and device reboot cycle
Verifies channel roles, PSKs, and LoRa configuration
Note: Channel name verification skipped due to architectural limitation

Quick Start Test:

Zero-config deployment (no SESSION_SECRET or COOKIE_SECURE required)
HTTP access without HSTS
Auto-generated admin user with default credentials
Session cookies work over HTTP
Meshtastic node connection and message exchange verified

Security Test:

Verifies Node IP address hidden from anonymous users in API responses
Verifies MQTT configuration hidden from anonymous users
Verifies Node IP address visible to authenticated users
Verifies MQTT configuration visible to authenticated users
Verifies protected endpoints require authentication

V1 API Test:

Tests v1 REST API endpoints with Bearer token authentication
Verifies Bearer token requests bypass CSRF protection
Verifies POST/PUT/DELETE work without CSRF token when using Bearer auth
Verifies session-based requests still require CSRF token

Reverse Proxy Test:

Production deployment with COOKIE_SECURE=true
HTTPS-ready configuration
Trust proxy enabled for reverse proxy compatibility
CORS configured for HTTPS domain
Meshtastic node connection and message exchange verified

Reverse Proxy + OIDC Test:

OIDC authentication integration
Mock OIDC provider health checks
Authorization flow and session creation
Hybrid mode (OIDC + local auth)
Meshtastic node connection verified

Virtual Node CLI Test:

Virtual Node Server enabled on TCP port 4404
Meshtastic Python client successfully connects
Node data download and synchronization verified
Test message sent on gauntlet channel (index 3)
Message delivery confirmed via Web UI API
Virtual Node Server connection logging verified

Backup & Restore Test:

System backup created from running dev container
New container spun up with RESTORE_FROM_BACKUP env var
Data integrity verified (node count, message count, settings)
Restore event logged in audit log
Dev container unaffected by restore test

Database Migration Test:

SQLite to PostgreSQL migration verified
SQLite to MySQL migration verified
Data integrity confirmed for both target databases
Row counts match between source and target

DB Backing Consistency Test:

SQLite, PostgreSQL, and MySQL backends tested with same device
Node counts within ±10 across all three backends
Favorite counts identical across all backends
Key station verified as favorite on all backends

…port is fresh (#3026) PR #2615 (zombie-node fix) intentionally leaves `lastHeard=NULL` on placeholder neighbor rows we've only learned about second-hand through a NeighborInfo packet. That was correct in isolation, but the read-side filter in both /api/neighbor-info (server.ts) and /api/sources/:id/neighbor-info (sourceRoutes.ts) rejects any row whose neighbor.lastHeard is NULL or older than the cutoff, so every indirect-neighbor link gets dropped on read. This change keeps the reporter freshness check (we must have directly heard from the reporting node), but falls back to the NeighborInfo record's own `timestamp` / firmware-supplied `lastRxTime` when the neighbor's `lastHeard` is missing or stale. `timestamp` is stored in ms (set via `Date.now()` in meshtasticManager), `lastRxTime` is firmware seconds — handled accordingly. Tests: - src/server/routes/sourceRoutes.neighbor-info.test.ts — fixture timestamp corrected to ms (matches real DB shape); existing staleness/null tests retargeted at the reporter side; new cases cover neighbor.lastHeard NULL + fresh NI report (kept), NULL + stale NI report (dropped), and NULL + fresh lastRxTime (kept). - src/server/server.neighbor-info-position.test.ts — inline shadow filter synced with the real handler so the test mirror does not drift. Refs #3025 Authored by Merlin 🪄 Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…Neighbors view (#3560) The live source map and the Map Analysis page disagreed on which neighbor links to draw. Both read the same neighbor_info table, but applied different freshness filters: - Map Analysis "Neighbors" (/api/analysis/neighbors) windows by the NeighborInfo record's own `timestamp` (report recency). - The live map (/api/sources/:id/neighbor-info and the legacy /api/neighbor-info) gated on the endpoint *nodes'* `lastHeard`, so a node's stale last-known neighbor list kept showing for as long as the node stayed otherwise active (still sending position/telemetry). That surfaced far more links than the analysis view and misrepresented how recently each RF link was actually observed. Switch the live-map endpoints to report-time freshness (record `timestamp` within maxNodeAge), matching the analysis definition. Because each NeighborInfo packet deletes-then-reinserts the reporter's full list with a fresh timestamp, a fresh record inherently means we heard the reporter within the window. Keying off the record timestamp (instead of node `lastHeard`) also naturally preserves indirect-neighbor links whose neighbor row has a null lastHeard (#3025/#2615), since lastHeard is no longer consulted. Firmware-supplied `lastRxTime` is also dropped from the predicate: it is subject to device clock skew and the analysis view does not use it, so it could otherwise reintroduce divergence. Tests updated to assert report-time semantics (stale report dropped even when endpoint nodes are still active; fresh report kept even when a node's lastHeard is stale/null). Claude-Session: https://claude.ai/code/session_01SVGxkuD4Fwa2JGVim8ZeVj Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Yeraze merged commit c051ee4 into main Apr 9, 2026
18 of 19 checks passed

Yeraze mentioned this pull request Apr 9, 2026

fix(dashboard): source list doesn't refresh after add/edit/delete #2616

Merged

5 tasks

This was referenced May 15, 2026

[BUG] Meshtastic NeigborInfo: Response received but not shown in UI #3025

Closed

fix(neighbor-info): keep links when neighbor lastHeard is NULL but NI report is fresh #3026

Merged

Yeraze mentioned this pull request Jun 19, 2026

fix(neighbors): align live map "Show Neighbor Info" with the Map Analysis Neighbors view #3560

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(virtual-node): stop creating zombie nodes on map (#2602)#2615

fix(virtual-node): stop creating zombie nodes on map (#2602)#2615
Yeraze merged 1 commit into
mainfrom
fix/2602-virtual-node-zombies

Yeraze commented Apr 9, 2026

Uh oh!

github-actions Bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 9, 2026

MeshMonitor System Test Results

Test Summary

❌ Overall Result: FAILED

Failed Tests

Uh oh!

github-actions Bot commented Apr 9, 2026

MeshMonitor System Test Results

Test Summary

✅ Overall Result: PASSED

Test Details

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Yeraze commented Apr 9, 2026

Summary

Changes

1. Stop stamping lastHeard on indirect / synthetic nodes

2. Virtual Node Server hardening

3. Ack-and-drop removeByNodenum

Tests

System tests

Test plan

Uh oh!

github-actions Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Fix Virtual Node Zombies (#2602)

✅ Overall Assessment: Excellent

🔍 Code Quality & Best Practices

🐛 Bug Analysis

🔒 Security Analysis

⚡ Performance Considerations

🧪 Test Coverage Analysis

🎯 Specific Technical Feedback

🚀 Recommendations

📊 System Test Results Assessment

✅ Final Recommendation: APPROVE

Uh oh!

github-actions Bot commented Apr 9, 2026

MeshMonitor System Test Results

Test Summary

❌ Overall Result: FAILED

Failed Tests

Uh oh!

github-actions Bot commented Apr 9, 2026

MeshMonitor System Test Results

Test Summary

✅ Overall Result: PASSED

Test Details

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. Stop stamping `lastHeard` on indirect / synthetic nodes

3. Ack-and-drop `removeByNodenum`

github-actions Bot commented Apr 9, 2026 •

edited

Loading