Skip to content

fix: prevent messages from vanishing in multi-source unified feed; fix region discovery timeout (#3719, #3734)#3736

Merged
Yeraze merged 3 commits into
mainfrom
claude/great-dijkstra-w20ja6
Jun 25, 2026
Merged

fix: prevent messages from vanishing in multi-source unified feed; fix region discovery timeout (#3719, #3734)#3736
Yeraze merged 3 commits into
mainfrom
claude/great-dijkstra-w20ja6

Conversation

@Yeraze

@Yeraze Yeraze commented Jun 25, 2026

Copy link
Copy Markdown
Owner

Fixes #3719, #3720, #3734.

This branch carries two independent bug fixes committed sequentially.


Fix 1 — Messages vanishing in multi-source unified feed (#3719 / #3720)

Two compounding bugs identified from code review and the per-source diagnostic data in #3719:

Bug 1 — fetchLimit starvation (primary cause of the ~60 s disappearance)

The unified messages endpoint fetched at most limit × 2 = 200 rows per source. A high-traffic MQTT bridge (e.g. the Canadaverse continental bridge at ~3 msg/s) pushes a message that is exclusive to that source past position 200 within ~60 s of new messages arriving. On the next 10-second poll the message is no longer in any source's result set and vanishes from the feed.

Fix: raise the per-source window to limit × 5 = 500 rows, giving ~167 s of headroom at 3 msg/s — far beyond the 10 s poll interval in all realistic deployments.

Bug 2 — _dbchan / _radio suffix breaks dedup key

meshtasticManager inserts server-decrypted copies of a message with IDs ending in _dbchan (channel-database decrypt) or _radio (radio-channel copy). extractPacketIdFromRowId returned null for these because the last underscore-segment was not numeric, causing those copies to get a text+timestamp fallback dedupKey instead of the correct packetId-based one. The copies appeared as separate entries in the unified feed rather than merging into the original message's receptions[] array.

Fix: strip known server-copy suffixes (dbchan, radio) before extracting the numeric packet ID.

Test plan (Fix 1)

  • extractPacketIdFromRowId unit tests extended with _dbchan and _radio suffix cases
  • fetchLimit assertions in existing tests updated to reflect the new limit × 5 multiplier
  • All 54 tests in unifiedRoutes.test.ts pass
  • Full Vitest suite: 7477 passed, 3 pre-existing protobuf/MQTT failures unrelated to these changes

Fix 2 — "Discover regions from repeaters" always times out (#3734)

Root cause: CMD_SEND_ANON_REQ (command 57), used for region discovery, works differently from SendBinaryReq (command 50). With SendBinaryReq, the repeater echoes the CRC32 from the Sent ack back in BinaryResponse.tag — so tag-matching is correct. With CMD_SEND_ANON_REQ, the repeater's BinaryResponse always carries tag = 0 rather than echoing the CRC.

With the pre-fix code the guard evaluated:

if (tag !== null && tag !== 0 && r?.tag !== tag) return;
// (0xcafe1234 !== null) && (0xcafe1234 !== 0) && (0 !== 0xcafe1234)
// = true → BinaryResponse silently discarded → 15-second timeout every time

Since all radio operations on a single connection are serialized by runExclusiveRadioOp, the first BinaryResponse after the Sent ack is guaranteed to be ours — no other radio op can run concurrently. Tag matching is therefore unnecessary. The guard has been removed and replaced with just the sentReceived check.

A regression test simulates the exact scenario (expectedAckCrc = 0xcafe1234 in Sent, tag = 0 in BinaryResponse) and verifies regions are parsed correctly instead of timing out.

Test plan (Fix 2)

  • New regression test: routed ANON_REQ with mismatched CRC/tag parses regions correctly
  • Full MeshCore backend/manager suite passes

🤖 Generated with Claude Code

)

Two compounding bugs caused messages to disappear from the unified channel
view ~60 seconds after arrival when multiple sources are connected.

1. fetchLimit starvation: the per-source window was limit×2 = 200 rows.
   A high-traffic MQTT bridge (e.g. a continental Canadaverse bridge at
   ~3 msg/s) pushes a message exclusive to that source beyond position 200
   in ~60 s, making it invisible to the unified endpoint on the next poll.
   Raised to limit×5 = 500, giving ~167 s of headroom at 3 msg/s — well
   above the 10 s poll interval in all realistic deployments.

2. _dbchan/_radio suffix breaks dedup key: meshtasticManager inserts
   server-decrypted copies of a message with IDs ending in `_dbchan` or
   `_radio`. extractPacketIdFromRowId returned null for these because the
   last underscore-segment was not numeric, so those copies got a
   text+timestamp dedupKey instead of the correct packetId-based key.
   They appeared as duplicate entries or were sorted separately rather than
   merging with the originals receptions array. Fixed by stripping known
   server-copy suffixes before extracting the numeric packet ID.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014DhSciCtki5JMrpUrKxwnj
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review: fix: prevent messages from vanishing in multi-source unified feed

  • Gathering context
  • Reading changed files
  • Analyzing code quality, bugs, performance, security, and test coverage
  • Posting review feedback

Summary

This PR fixes two compounding bugs causing messages to vanish from the unified multi-source feed. The fixes are well-targeted and the code quality is high. Here's my detailed review:


Bug 1 — fetchLimit starvation fix (limit × 5)

src/server/routes/unifiedRoutes.ts:458

const fetchLimit = limit * 5;

Assessment: ✅ Correct and well-reasoned.

The PR description explains the math clearly (~167 s headroom at 3 msg/s vs a 10 s poll interval). The comment in code mirrors the PR description well.

Minor observation: The /packets endpoint on line 1022 still uses limit * 2:

const fetchLimit = limit * 2;  // /packets endpoint, unchanged

This is intentional — that endpoint is an RF-level monitor with no dedup, so starvation works differently (each packet is unique per source). The asymmetry is defensible, but a brief comment at the packets-endpoint fetchLimit line explaining why it doesn't need 5x (no cross-source dedup) would help future readers avoid "fixing" it unnecessarily.


Bug 2 — _dbchan / _radio suffix stripping

src/server/routes/unifiedRoutes.ts:214-234

const SERVER_COPY_SUFFIXES = new Set(['dbchan', 'radio']);

export function extractPacketIdFromRowId(rowId: unknown): number | null {
  ...
  const tailIdx = SERVER_COPY_SUFFIXES.has(parts[parts.length - 1])
    ? parts.length - 2
    : parts.length - 1;
  if (tailIdx < 1) return null;
  ...
}

Assessment: ✅ Correct logic, good defensive bounds check.

The tailIdx < 1 guard correctly rejects degenerate cases like a_dbchan (only one non-suffix segment, which would be the source prefix, not a numeric packet ID).

Potential edge case (low risk): If a future suffix is added to SERVER_COPY_SUFFIXES and a row ID happens to have two consecutive known suffixes (e.g., src_123_456_radio_dbchan), only the outermost suffix is stripped and _radio would be treated as the packet ID segment, returning null. This is unlikely in practice but worth noting if the suffix list grows. A while loop stripping suffixes would be more robust, though overkill for the current two-value set.

Naming clarity: The constant SERVER_COPY_SUFFIXES is well-named and the JSDoc comment block clearly explains the contract and history.


Code Quality

  • ✅ The comment block before extractPacketIdFromRowId (lines 134–162) is excellent — it explains the key format contract, why this field is load-bearing, and exactly what happens if the format drifts.
  • ✅ No raw SQL introduced; the fix is entirely in routing/dedup logic.
  • ✅ The SERVER_COPY_SUFFIXES constant is defined at module scope (not inside the hot path), so Set.has() is O(1) per call with no allocation per message.
  • ✅ The existing JSDoc on extractPacketIdFromRowId was updated inline with the new behavior.

Test Coverage

src/server/routes/unifiedRoutes.test.ts

New tests added:

  • strips _dbchan suffix before extracting packetId (#3719) — ✅
  • strips _radio suffix before extracting packetId (#3719) — ✅
  • returns null when only a suffix segment remains after stripping — ✅ (covers the tailIdx < 1 guard)
  • caps limit at 500 (fetches 2500 per source for dedup headroom) — ✅ (updated from 2x → 5x assertion)

One gap: The channel filter path's fetchLimit assertion at line 641 checks for 250 (limit=50 × 5), which is correct. The no-filter path at line 592 checks for 2500 (limit capped to 500 × 5). Both are validated. ✅

Suggestion (non-blocking): There's no test for double-suffix stripping (e.g., src_a_1234_567890_radio_dbchan), but given the edge case is unlikely and the current behavior (returning null) is safe, this is not a blocker.


Performance

  • The suffix check is a single Set.has() on the last element of an already-split array — negligible overhead per message.
  • Raising fetchLimit from 2x to 5x increases DB row fetches by 2.5x for high-traffic deployments. This is an intentional tradeoff for correctness and is clearly documented. At the default limit of 100, this is 500 rows per source per 10-second poll — well within normal SQLite/Postgres capacity.

Security

No security concerns. The rowId input is validated for type, length, digit-only content, and numeric range before use. The DB fetch limit increase is bounded (capped at 500 * 5 = 2500).


Verdict

LGTM. Both fixes are correct, well-tested, and well-documented. The PR description's root-cause analysis is accurate and matches the code. No blocking issues found.


…ponse tag is unmatched (#3734)

The #3702 fix assumed sendToRadioFrame fires Sent with no expectedAckCrc
(tag = null → skip matching). In practice the firmware DOES populate
expectedAckCrc (CRC of the outgoing ANON_REQ packet), but the repeater's
BinaryResponse (0x8C) carries tag = 0 — a different scheme than SendBinaryReq
(50), which explicitly echoes the CRC. With tag = non-zero CRC and
BinaryResponse.tag = 0, the old guard:

  if (tag !== null && tag !== 0 && r?.tag !== tag) return;

evaluates to true and silently drops every response, causing the 15-second
inner timeout and "No regions found" for every user attempt (#3734).

Fix: remove the tag-matching guard entirely. CMD_SEND_ANON_REQ (57) does
not use CRC-echoing tag correlation; operations are serialized by
runExclusiveRadioOp so the first BinaryResponse after sentReceived is
guaranteed to be ours. The tag variable is also removed since it is no
longer used.

Updates: correct the stale runExclusiveRadioOp comment that claimed
sendToRadioFrame fires Sent with no expectedAckCrc (it does).

Test: add a regression test simulating the exact #3734 scenario —
expectedAckCrc = 0xcafe1234 in Sent, tag = 0 in BinaryResponse — and
verify the regions are resolved rather than timing out.

Fixes #3734

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014DhSciCtki5JMrpUrKxwnj
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review: fix: prevent messages from vanishing in multi-source unified feed

  • Gathering context
  • Reading changed files
  • Analyzing code quality, bugs, performance, security, and test coverage
  • Posting review feedback

Summary

This PR fixes two compounding bugs causing messages to vanish from the unified multi-source feed. Both fixes are well-targeted and the code quality is high. Here is my detailed review:


Bug 1 — fetchLimit starvation fix (limit × 5)

src/server/routes/unifiedRoutes.ts:458

const fetchLimit = limit * 5;

Assessment: ✅ Correct and well-reasoned.

The PR description explains the math clearly (~167 s headroom at 3 msg/s vs a 10 s poll interval). The comment in code mirrors the PR description well.

Minor observation: The /packets endpoint on line 1022 still uses limit * 2:

const fetchLimit = limit * 2;  // /packets endpoint, unchanged

This is intentional — that endpoint is an RF-level monitor with no dedup, so starvation works differently. However, a brief comment at that line explaining why it doesn't use 5x (no cross-source dedup needed) would help future readers avoid unnecessarily "fixing" it.


Bug 2 — _dbchan / _radio suffix stripping

src/server/routes/unifiedRoutes.ts:208-234

const SERVER_COPY_SUFFIXES = new Set(['dbchan', 'radio']);

export function extractPacketIdFromRowId(rowId: unknown): number | null {
  ...
  const tailIdx = SERVER_COPY_SUFFIXES.has(parts[parts.length - 1])
    ? parts.length - 2
    : parts.length - 1;
  if (tailIdx < 1) return null;
  ...
}

Assessment: ✅ Correct logic, good defensive bounds check.

The tailIdx < 1 guard correctly rejects degenerate cases like a_dbchan (only one non-suffix segment, which would be the source prefix, not a numeric packet ID). The Set.has() call is O(1) and the Set is module-level so there's no allocation in the hot path.

Low-risk edge case: If a future suffix is added to SERVER_COPY_SUFFIXES and a row ID happens to have two consecutive known suffixes (e.g., src_123_456_radio_dbchan), only the outermost suffix is stripped. This is unlikely in practice and the safe fallback (null → text+timestamp key) means it degrades gracefully rather than corrupting data.


MeshCoreNativeBackend changes

src/server/meshcoreNativeBackend.ts / meshcoreNativeBackend.discovery.test.ts

The diff shows the request_regions tag guard was dropped (matching commit d73539a) and the discovery tests were expanded to cover the full 0x8E ControlData frame-parsing path. These look correct and are well covered by the new test cases:

  • Frame layout ([55, 0x80, filter, tag(LE)]) is validated byte-for-byte in the test
  • Tag-mismatch guard is tested (non-matching tag → ignored)
  • Prefix-only short-key response correctly skips addOrUpdateContact

Code Quality

  • ✅ The comment block before extractPacketIdFromRowId is excellent — it explains the key format contract, the history, and exactly what happens if the format drifts.
  • ✅ No raw SQL introduced; the fix is entirely in routing/dedup logic.
  • SERVER_COPY_SUFFIXES is defined at module scope (not inside the hot path), so Set.has() is O(1) per call with no allocation per message.
  • ✅ JSDoc on extractPacketIdFromRowId was updated inline with the new behavior.

Test Coverage

New tests added in unifiedRoutes.test.ts:

  • strips _dbchan suffix before extracting packetId (#3719) — ✅
  • strips _radio suffix before extracting packetId (#3719) — ✅
  • returns null when only a suffix segment remains after stripping — ✅ (covers tailIdx < 1 guard)
  • caps limit at 500 (fetches 2500 per source for dedup headroom) — ✅ (updated from 2x → 5x assertion)
  • Channel filter path fetchLimit assertion at line 641 checks 250 (limit=50 × 5) — ✅

Discovery test coverage in meshcoreNativeBackend.discovery.test.ts:

  • Correct frame construction validated byte-by-byte — ✅
  • Tag-echo matching — ✅
  • Prefix-only short key → no auto-add — ✅

Performance

  • The suffix check is a single Set.has() on the last element of an already-split array — negligible overhead per message.
  • Raising fetchLimit from 2x to 5x increases DB row fetches by 2.5x for high-traffic deployments. This is an intentional correctness tradeoff. At the default limit of 100, this is 500 rows per source per 10-second poll — well within normal SQLite/PostgreSQL/MySQL capacity.

Security

No security concerns. The rowId input is validated for type, length, digit-only content, and numeric range before use. The DB fetch limit increase is bounded (capped at 500 × 5 = 2500).


Verdict

LGTM. Both fixes are correct, well-tested, and well-documented. The PR description's root-cause analysis is accurate and matches the code. No blocking issues found. The one optional suggestion (comment at the /packets fetchLimit line) is non-blocking.


@Yeraze Yeraze changed the title fix: prevent messages from vanishing in multi-source unified feed (#3719) fix: prevent messages from vanishing in multi-source unified feed; fix region discovery timeout (#3719, #3734) Jun 25, 2026
Yeraze added a commit that referenced this pull request Jun 25, 2026
…seen message (#3719) (#3738)

The unified feed rendered straight off the latest poll (data.pages), which is
a capped newest-N window. A message exclusive to a high-traffic source gets
pushed past the window within ~60s as newer traffic arrives, so the next poll
drops it from the view even though it's still in the DB — the #3719/#3720
disappearance.

Structural fix: fold every poll's pages into a persistent map keyed by dedupKey
and render from that, making the live feed append-only. Once a message is shown
it stays shown regardless of how the server window shifts; upserts prefer the
newest (most-complete) merged copy; sort stays by createdAt (#3122). Reset on
channel change; capped at 5000 (oldest dropped, still re-fetchable on scroll).

This decouples display persistence from server window size, so it holds at any
source traffic rate — unlike raising fetchLimit, which only buys proportional
headroom. Complements the server-side fixes in #3736.


Claude-Session: https://claude.ai/code/session_01SJPe6J5vKrcbwzt6vCdtrt

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…the window-widening

The limit*5 widening in this PR was a mitigation for the #3719 disappearance:
it kept a single-source-exclusive message inside the server window for ~167s.
The client-side append-only accumulator merged in #3738 now makes a displayed
message persist regardless of how the server window moves, at any traffic rate,
so the extra fetch headroom is redundant and only adds per-poll cost. Restore
the original 2x (same-packet dedup headroom). The _dbchan/_radio dedup-key fix
in this PR is a separate correctness bug and is kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SJPe6J5vKrcbwzt6vCdtrt
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review: fix: prevent messages vanishing in multi-source unified feed

  • Gathering context
  • Reading changed files
  • Analyzing code quality, bugs, performance, security, and test coverage
  • Posting review feedback

Summary

This PR carries three commits fixing two independent bugs. The analysis below covers the final state of the branch (including the mid-PR revert in commit 4713a21).


Fix 1 — _dbchan / _radio dedup key fix (unifiedRoutes.ts)

src/server/routes/unifiedRoutes.ts:208–234

const SERVER_COPY_SUFFIXES = new Set(['dbchan', 'radio']);

export function extractPacketIdFromRowId(rowId: unknown): number | null {
  ...
  const tailIdx = SERVER_COPY_SUFFIXES.has(parts[parts.length - 1])
    ? parts.length - 2
    : parts.length - 1;
  if (tailIdx < 1) return null;
  ...
}

Assessment: ✅ Correct logic with appropriate defensive guards.

  • The tailIdx < 1 guard correctly rejects degenerate single-segment cases like a_dbchan where there's no numeric packet ID after the suffix is stripped.
  • SERVER_COPY_SUFFIXES defined at module scope — Set.has() is O(1) with no per-message allocation in the hot path.
  • The comment block above the function is well-documented with historical context and the key format contract.

Minor edge case (low risk, non-blocking): If a row ID ever has two consecutive known suffixes (e.g., src_123_456_radio_dbchan), only the outermost is stripped. This degrades safely to null (fallback to text+timestamp dedupKey, causing duplicate entries rather than data corruption). Unlikely enough that adding a while loop would be overkill.

Naming: SERVER_COPY_SUFFIXES is well-named and immediately clear to future readers.


Fix 1 — fetchLimit evolution (two-step in this PR)

The original commit (d98f36c) raised fetchLimit from limit × 2 to limit × 5. A subsequent commit (4713a21) then reverted it back to limit × 2 with a clear rationale: the client-side append-only accumulator introduced in #3738 makes messages persist regardless of the server window, making the 5x multiplier redundant overhead.

Assessment: ✅ Well-handled. The revert commit message is explicit about the reasoning and preserves the _dbchan/_radio fix. The test comments were also updated to stay consistent (1000 not 2500).

One inconsistency to flag: The test description at line 586 reads:

it('caps limit at 500 (fetches 1000 per source for dedup headroom)', ...)

But the assertion that was reverted on line 593 left the comment // fetchLimit = limit * 2 = 1000 which is correct. However, the channel-filter test comment at line ~641 reads // fetchLimit = limit*2 but the value there is 100 (limit=50 × 2), which is consistent. ✅


Fix 2 — Region discovery timeout (meshcoreNativeBackend.ts)

src/server/meshcoreNativeBackend.tsrequest_regions handler

const onResp = (r: any) => {
  if (!sentReceived) return;
  // No tag check: CMD_SEND_ANON_REQ (57) uses sendToRadioFrame, and
  // the Sent ack's expectedAckCrc does NOT match the BinaryResponse's
  // tag (ANON_REQ uses a different tagging scheme than SendBinaryReq (50)).
  // The runExclusiveRadioOp lock serializes all radio ops so the first
  // BinaryResponse after Sent is guaranteed to be ours (#3734).
  cleanup();
  resolve((r?.responseData ?? new Uint8Array()) as Uint8Array);
};

Assessment: ✅ Correct fix with a sound security model.

The root cause analysis in the PR description is accurate: CMD_SEND_ANON_REQ (57) via sendToRadioFrame does not echo the CRC in BinaryResponse.tag unlike SendBinaryReq (50). Removing the tag check and relying on runExclusiveRadioOp for serialization is the correct solution since that lock already guarantees only one radio op listens at a time.

The existing sentReceived gate is correctly preserved — it prevents a BinaryResponse from a previous operation (arriving during our Sent window) from being accepted before our own Sent ack arrives.

Comment quality: The inline comments clearly document the why, including the cross-reference to #3734 and the comparison to SendBinaryReq (50). The runExclusiveRadioOp JSDoc block is comprehensive.


Test Coverage

unifiedRoutes.test.ts — new tests for the dedup-key fix:

  • strips _dbchan suffix before extracting packetId (#3719) — ✅
  • strips _radio suffix before extracting packetId (#3719) — ✅
  • returns null when only a suffix segment remains after stripping — ✅ (covers tailIdx < 1 guard)
  • caps limit at 500 (fetches 1000 per source for dedup headroom) — ✅ (reverted to 2x)

meshcoreNativeBackend.discovery.test.ts — new tests for the region fix:

  • request_regions sends [57, pubkey, 0x01, 0x00] and parses clock + region names — ✅ byte-by-byte frame validation
  • request_regions succeeds when Sent ack carries no expectedAckCrc — ✅ edge case for r=undefined path
  • request_regions resolves when expectedAckCrc from Sent does not match BinaryResponse tag (#3734) — ✅ exact regression scenario

The regression test at line 313 precisely models the firmware behavior described in the root cause: expectedAckCrc: 0xcafe1234 in Sent with tag: 0 in BinaryResponse. This is excellent regression test craftsmanship.


Code Quality

  • ✅ No raw SQL introduced anywhere.
  • ✅ All three backends (SQLite/PostgreSQL/MySQL) unaffected — fix is in routing/dedup and MeshCore backend only.
  • runExclusiveRadioOp docblock clearly explains the serialization contract and why tag-matching isn't needed.
  • r?.responseData ?? new Uint8Array() safely handles a missing responseData field in the BinaryResponse event.

Performance

  • Suffix check: single Set.has() per message — negligible.
  • fetchLimit is back to limit × 2 (reverted from 5x), keeping the per-poll DB load unchanged from pre-PR behavior.
  • No new blocking paths introduced in the request_regions handler; the change only removes a check rather than adding one.

Security

No concerns. The rowId input validation (type check, length cap, digit-only regex, 32-bit range check) remains intact. The tag-check removal in request_regions is safe within the serialized radio-op lock — the comment documents the reasoning clearly.


Verdict

LGTM. The two underlying bugs are correctly identified and fixed, the test coverage for both regressions is solid, and the mid-PR revert of the 5x window-widening is well-justified and cleanly executed. No blocking issues.

@Yeraze Yeraze merged commit 6412490 into main Jun 25, 2026
20 checks passed
@Yeraze Yeraze deleted the claude/great-dijkstra-w20ja6 branch June 25, 2026 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Messages disappear from channel view ~1 min after arrival in multi-source setup (dedup regression)

2 participants