Skip to content

fix(meshcore): correlate buffered SNR/route to the matching packet for auto-ack {SNR}/{ROUTE} (#3589)#3592

Merged
Yeraze merged 1 commit into
mainfrom
fix/3589-meshcore-snr-route-tokens
Jun 21, 2026
Merged

fix(meshcore): correlate buffered SNR/route to the matching packet for auto-ack {SNR}/{ROUTE} (#3589)#3592
Yeraze merged 1 commit into
mainfrom
fix/3589-meshcore-snr-route-tokens

Conversation

@Yeraze

@Yeraze Yeraze commented Jun 21, 2026

Copy link
Copy Markdown
Owner

Problem

{SNR} and {ROUTE} in MeshCore companion auto-acknowledge templates were intermittently empty for both DMs and channel messages (issue #3589) — direct-hop messages dropped {SNR}, multi-hop dropped both, with no apparent pattern.

Root cause

The per-packet SNR and the relay-hash chain are not delivered on the MeshCore ContactMsgRecv / ChannelMsgRecv events (confirmed in meshcore.js: those events only carry pathLen, no SNR, no hashes). They arrive on a separate LogRxData push, which the firmware emits immediately before the matching txt-msg recv. The backend buffered that data in a single slot (pendingTxtMsgPath) and consumed it on the next recv.

That single-slot correlation was fragile and produced the intermittent loss:

  1. Room-post buffer leak — a room post is also a TXT_MSG, so the preceding LogRxData buffered a path for it, but the room-post handler returned early without consuming the buffer. The stale SNR/route then leaked onto the next unrelated DM/channel message.
  2. Stale / mis-correlated buffer — when a packet's LogRxData fired but its recv never arrived (a non-TXT packet, or raw logging gaps), the buffer lingered and could be attached to a later, unrelated message.

Fix

Introduce a shared consumePendingPath(msgPathLen) helper used by both recv handlers and the room-post path. It always clears the buffer (consume-once) and attaches the buffered SNR/hops only when both guards pass:

  1. Freshness — buffer must be younger than PENDING_PATH_MAX_AGE_MS (500 ms). LogRxData and its matching recv land on the same I/O tick; an older buffer belongs to a packet whose recv never arrived.
  2. pathLen correlation — the recv's packed pathLen byte must equal the buffered packet's rawPathLen (same wire byte for the same packet); a mismatch proves it's a different packet.

When the data genuinely isn't present, the tokens degrade as before ({SNR}, {ROUTE}direct/), but they no longer get a wrong value from an unrelated packet, and the leak that silently blanked them is eliminated.

Tests

Added to meshcoreNativeBackend.otaPacket.test.ts:

  • buffered path not attached when recv pathLen differs (different packet)
  • stale buffer (past the freshness window) not attached (fake timers)
  • consume-once: a second recv with no LogRxData gets nothing
  • room post consumes/discards its buffer so it can't leak onto the next message

Existing SNR/path correlation tests still pass.

Verification

  • Full Vitest suite: 7600 total, 7016 passed, 0 failed, 0 failed suites (success: true)
  • tsc --noEmit: no new errors introduced (pre-existing repo-wide frontend errors unchanged at 61)
  • Changed files add no new lint findings

Closes #3589

🤖 Generated with Claude Code

… packet (#3589)

The {SNR} and {ROUTE} auto-ack tokens for MeshCore companion messages
were intermittently empty because the per-packet SNR and relay-hash
chain are NOT carried on the ContactMsgRecv/ChannelMsgRecv events — they
arrive only on the separate LogRxData push, buffered in a single slot and
consumed by the next message recv.

That single-slot correlation was fragile:
- a room post (also a TXT_MSG) buffered a path but returned early WITHOUT
  consuming it, so its SNR/route leaked onto the next unrelated message;
- a stale buffer from a packet whose recv never arrived (e.g. a non-TXT
  packet, or LogRxData with no following recv) could be attached to a
  later message, or left dangling.

Add two correlation guards in a shared consumePendingPath() helper used by
both recv handlers and the room-post path:
1. Freshness — reject a buffer older than PENDING_PATH_MAX_AGE_MS (500ms);
   LogRxData fires on the same I/O tick as its matching recv.
2. pathLen match — the recv's packed pathLen byte must equal the buffered
   packet's rawPathLen; a mismatch proves it's a different packet.

The buffer is always cleared on consume (consume-once), and the room post
now consumes/discards its own buffer so it can't leak forward.

Adds tests for pathLen mismatch, staleness, consume-once, and the
room-post leak.

Closes #3589

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
@github-actions

github-actions Bot commented Jun 21, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review — fix(meshcore): correlate buffered SNR/route to the matching packet (#3589)

  • Gathering context
  • Reading changed files
  • Reviewing code quality, bugs, performance, security
  • Reviewing test coverage
  • Posting final review

Summary

This is a well-reasoned fix for a real protocol-level problem. The root cause analysis is accurate, the implementation is clean and testable, and the new tests directly exercise the failure modes that caused the original bug. Here's the detailed review:


Code Quality & Design

consumePendingPath helper (meshcoreNativeBackend.ts:371-390) — the extraction of a reusable consumePendingPath(msgPathLen) method is excellent. It cleanly enforces consume-once semantics in a single place, which is exactly the right abstraction here. The dual-guard design (freshness + pathLen correlation) is sound.

bufferedAt timestamp — storing Date.now() at buffer-write time and checking it at consume time is a clean way to detect stale buffers. The 500ms window (PENDING_PATH_MAX_AGE_MS) is well-justified in the comment and appropriate for the firmware's same-I/O-tick delivery guarantee.

Room post fix (meshcoreNativeBackend.ts:475) — this was a real leak: isRoomPost branch returned early without consuming the buffer, allowing the stale path/SNR to bleed onto the next unrelated message. The fix (calling consumePendingPath(msg.pathLen) before the early return) is correct.


Potential Issue: pathLen correlation edge case

consumePendingPath when msgPathLen is not a number (meshcoreNativeBackend.ts:381-388):

if (
  typeof msgPathLen === 'number' &&
  typeof buffered.rawPathLen === 'number' &&
  msgPathLen !== buffered.rawPathLen
) {
  // Buffer is from a different packet (packed pathLen bytes differ).
  return undefined;
}

When msgPathLen is undefined or not a number, the pathLen correlation check is skipped entirely, and a fresh buffer is always attached. This is explicitly intended as a fallback for firmware that doesn't provide pathLen. However, this fallback means a stale-but-fresh (within 500ms) buffer from a different packet will still be attached if the recv event has no pathLen. This is an acceptable trade-off given the freshness window, but worth noting since it's a subtle assumption.


Minor observations

  1. snr: undefined explicitly set on room posts (meshcoreNativeBackend.ts:487) — this is consistent with the intent (not surfacing SNR on room posts), but the field could simply be omitted since it's the same as not including it. Minor style point, no functional issue.

  2. bufferedAt field in the private type — the inline type definition of pendingTxtMsgPath (meshcoreNativeBackend.ts:220) includes bufferedAt: number which is set on write and checked on consume. Clean; no issues.

  3. The PENDING_PATH_MAX_AGE_MS is declared as a static readonly (meshcoreNativeBackend.ts:234). This is a good choice — it means tests can verify the constant's value directly if needed, even though current tests use vi.useFakeTimers() to advance time past it (which is the right approach).


Test Coverage

The 4 new test cases in meshcoreNativeBackend.otaPacket.test.ts directly and specifically test the failure modes:

Test Scenario covered
does NOT attach … when recv pathLen differs Guard 2 (pathLen mismatch)
does NOT attach a stale buffered path Guard 1 (freshness, using vi.useFakeTimers)
consumes the buffer once — a second recv gets nothing consume-once semantics
room post consumes (and discards) the buffer Room-post leak fix

The fake timer test correctly uses try/finally to restore real timers — good defensive testing practice.

Gap to consider: There's no test for a ChannelMsgRecv with a mismatched pathLen. The consumePendingPath helper is shared, so the contact_message tests implicitly cover it, but a symmetrical channel_message pathLen-mismatch test would make the coverage more explicit.


Security & Performance

No security concerns. No performance concerns — the single-slot buffer and lightweight timestamp check are negligible overhead.


Verdict

Approve. The fix is correct, minimal, well-documented, and the new tests are targeted at the exact failure modes. The consumePendingPath abstraction is clean and leaves the code in better shape than before. The edge case around msgPathLen not being a number is a known intentional trade-off.

@Yeraze Yeraze merged commit 2ae7bba into main Jun 21, 2026
19 checks passed
@Yeraze Yeraze deleted the fix/3589-meshcore-snr-route-tokens branch June 21, 2026 03:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

{SNR} and {ROUTE} tokens intermittently missing in MeshCore companion auto-acknowledge messages

1 participant