Skip to content

fix(dm): skip PKI flag when keyMismatchDetected to prevent silent drop after firmware purge#3140

Merged
Yeraze merged 1 commit into
mainfrom
fix/dm-pki-block-when-firmware-purged
May 22, 2026
Merged

fix(dm): skip PKI flag when keyMismatchDetected to prevent silent drop after firmware purge#3140
Yeraze merged 1 commit into
mainfrom
fix/dm-pki-block-when-firmware-purged

Conversation

@Yeraze

@Yeraze Yeraze commented May 22, 2026

Copy link
Copy Markdown
Owner

Summary

  • sendTextMessage was setting pkiEncrypted=true whenever our nodes.publicKey column was non-null, but after the key-repair flow calls sendRemoveNode the firmware no longer holds a key for that destination — so the firmware silently drops the DM.
  • Add a keyMismatchDetected guard so we fall back to channel encryption while a mismatch is active, matching what the post-purge node-info-exchange path already does (meshtasticManager.ts:2174-2175, meshtasticManager.ts:5604-5605).
  • Regression test covers the five publicKey × keyMismatchDetected combinations.

Why this matters / how it surfaced

This was masked by — and revealed through — #3133. MQTT-first-seen nodes had publicKey stored as hex; every subsequent direct-radio NodeInfo string-compared unequal against the base64 form, tripping a false mismatch → auto-purge → firmware NodeDB no longer has the destination's key. With our DB still holding a (now-stale) hex key, sendTextMessage kept setting pkiEncrypted=true, the firmware kept dropping the DM, and users saw "messages send but never arrive."

The hex/base64 normalization in #3133 stopped the false positives, which is why DMs flowed again. The underlying "PKI requested but firmware has no key → silent drop" hazard remains for genuine key changes (lost device, fresh keypair, etc.), hence this guard.

The bug chain

  1. MQTT-first-seen node's publicKey stored as hex.
  2. Direct-radio NodeInfo arrives base64.
  3. Key-mismatch detector at meshtasticManager.ts:5572 does string-equality → false positive → keyMismatchDetected = true.
  4. processKeyRepairs (or the immediate-purge branch at line 5610) calls sendRemoveNode(node.nodeNum) — firmware purges the entry, including its public key.
  5. Our DB still has a publicKey value → sendTextMessage sets pkiEncrypted=true.
  6. Firmware can't PKI-encrypt without the dest key → drops the outbound DM. No error surfaces to the client.

After this PR, step 5 checks keyMismatchDetected and skips PKI. The DM goes out over channel encryption until a fresh NodeInfo re-establishes the key.

Test plan

  • npx vitest run src/server/meshtasticManager.test.ts — 111 passed, 0 failed
  • npx tsc --noEmit — clean
  • Manual: with a node that has keyMismatchDetected=true in the DB, send a DM and verify the log shows the "skipping PKI" message and the packet reaches the destination on channel encryption.
  • Manual: confirm normal-state DMs (no mismatch) still get pkiEncrypted=true.

Related

🤖 Generated with Claude Code

…ey after purge

When key-repair detects a mismatch and (via processKeyRepairs or the
immediate-purge path) calls sendRemoveNode, the firmware purges the
destination's entry — including its public key — from its NodeDB. Our
local nodes.publicKey column still holds the (now-stale) key, so
sendTextMessage was still setting pkiEncrypted=true. The firmware, with
no key to use, silently drops the outbound DM. Net effect: DMs to a
node with an active keyMismatchDetected appear to "send" from
MeshMonitor's side but never reach the destination.

This was masked by — and the symptom surfaced through — #3133 (MQTT
publicKey hex vs base64). MQTT-first-seen nodes had their key written
as hex; every subsequent direct-radio NodeInfo string-compared unequal
against the base64 form, triggering a false mismatch, triggering the
auto-purge, leaving the firmware keyless for that destination. Once
the hex/base64 normalization landed (2430152), the false positives
stopped and DMs flowed again. The underlying "PKI=true with no key →
silent drop" problem remains for genuine key changes, hence this
guard.

The fix consults targetNode.keyMismatchDetected before requesting PKI.
When the flag is set, we skip pkiEncrypted=true and fall back to
channel encryption — matching what the post-purge node-info-exchange
code already does (lines 2174-2175, 5604-5605).

Regression test: src/server/meshtasticManager.test.ts adds a "DM PKI
guard" describe block covering the five publicKey/keyMismatchDetected
combinations, mirroring the production decision logic so future
refactors of that branch fail loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review Complete

This is a well-crafted fix for a subtle but critical bug in PKI encryption handling. Here's my comprehensive review:

🎯 Summary

Excellent work! This PR addresses a genuine security/reliability issue where DMs were being silently dropped by firmware after key repair flows. The fix is minimal, targeted, and well-tested.

Strengths

Code Quality & Best Practices

  • Clear logic flow: The PKI decision logic in sendTextMessage:7789 is easy to follow
  • Excellent logging: Both debug and info level logging provide clear visibility into PKI decisions
  • Defensive programming: Proper error handling with try/catch around node lookups
  • Follows repo conventions: Adheres to CLAUDE.md guidelines around multi-source architecture and logging standards

Security & Correctness

  • Addresses real vulnerability: Prevents silent message drops that could cause user confusion
  • Proper security fallback: Falls back to channel encryption (PSK-based) when PKI is unreliable
  • Aligns with existing patterns: Matches the guard logic already used in other parts of the codebase (meshtasticManager.ts:2174-2175, 5604-5605)
  • No information leakage: Doesn't expose sensitive key material in logs

Test Coverage

  • Comprehensive edge case testing: The test suite covers all 5 combinations of publicKey × keyMismatchDetected
  • Clear test intent: Test descriptions explicitly explain the security implications
  • Good regression protection: The decidePkiEncrypted test function mirrors the actual implementation logic
  • Realistic test data: Uses proper base64 key format in test cases

🔍 Technical Analysis

Root Cause Understanding

The fix correctly identifies the core issue: after processKeyRepairs calls sendRemoveNode, the firmware's NodeDB no longer has the target's public key, but the application DB still stores it. This state mismatch caused pkiEncrypted=true requests to be silently dropped.

Implementation Approach

  • Minimal change: Only affects the PKI decision point, no broad architectural changes
  • Logical gate ordering: Checks keyMismatchDetected after confirming publicKey exists
  • Appropriate scope: Only affects DM routing (destination specified), channel messages unaffected

📋 Minor Suggestions

  1. Error handling consistency (meshtasticManager.ts:7795): The empty catch block is appropriate here, but consider adding a debug log for troubleshooting:

    } catch (error) {
      logger.debug(`Failed to lookup target node ${destination} for PKI decision:`, error);
      // If lookup fails, send without PKI — firmware will use channel encryption
    }
  2. Comment clarity (meshtasticManager.ts:7779-7784): The comment is excellent but could be slightly more concise:

    // Skip PKI when keyMismatchDetected: after firmware purge (processKeyRepairs),
    // our DB still has the stale key but firmware lacks it → silent drop.
    // Fall back to channel encryption until fresh NodeInfo resolves mismatch.

🚫 No Significant Issues Found

  • No performance concerns: Database lookup is already required for DM routing
  • No security regressions: Falling back to channel encryption is the correct safe default
  • No architectural violations: Follows existing multi-source patterns correctly
  • No edge cases missed: Test coverage is comprehensive

🧪 Test Plan Validation

The manual test steps in the PR description are appropriate:

  • ✅ Unit tests pass (meshtasticManager.test.ts)
  • ✅ TypeScript compilation clean
  • 📋 Manual verification steps are realistic and sufficient

🎯 Overall Assessment

LGTM with high confidence. This is a high-quality fix that:

  • Solves a real user-impacting bug
  • Uses minimal, surgical changes
  • Includes comprehensive test coverage
  • Follows established patterns and conventions
  • Has clear documentation of the problem and solution

The engineering approach demonstrates solid understanding of the Meshtastic protocol, firmware behavior, and the codebase architecture. This is exactly the kind of targeted fix that should be merged quickly.


Recommendation: Approve and merge

@Yeraze Yeraze merged commit 9498bd7 into main May 22, 2026
19 checks passed
@Yeraze Yeraze deleted the fix/dm-pki-block-when-firmware-purged branch May 22, 2026 15:07
Yeraze added a commit that referenced this pull request May 22, 2026
…ish (#3145)

Version bump and CHANGELOG entry for 4.6.5. Highlights:

- #3143 MQTT Bridge mirror dashboard (read-only Open button on bridge
  cards; suppresses every TX surface; MqttBridgeManager now exposes the
  Meshtastic-shaped methods the consolidated /api/poll + /api/connection
  endpoints expect).
- #3142 MeshCore Telemetry dashboard tab via source-agnostic Dashboard.
- #3141 MeshCore: preserve LPP channel byte in remote telemetry rows.
- #3140 DM: skip PKI flag when keyMismatchDetected.
- #3138 docs reorganization.
- #3137 unified: merge nodes across sources so labels show in the
  unified Nodes view.
- #3136 mqtt: allow standalone mqtt_bridge as client-proxy target.

CLAUDE.md version line updated to 4.6.5.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant