Skip to content

fix(meshcore): Discover Regions always timed out due to missing Sent payload#3702

Merged
Yeraze merged 2 commits into
mainfrom
claude/great-dijkstra-y6fp31
Jun 24, 2026
Merged

fix(meshcore): Discover Regions always timed out due to missing Sent payload#3702
Yeraze merged 2 commits into
mainfrom
claude/great-dijkstra-y6fp31

Conversation

@Yeraze

@Yeraze Yeraze commented Jun 24, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes #3698 — "Discover regions from repeater" never returned any regions even with a working nearby repeater.

  • Root cause: sendToRadioFrame fires the Sent event without a payload. The request_regions handler expected expectedAckCrc in that payload (like sendTextMessage provides), leaving tag = null permanently. Every BinaryResponse was then discarded by the tag === null guard, the 15 s inner timeout fired silently, and all repeaters were skipped.
  • Fix: Track sentReceived as a separate boolean. When tag is null/0 (no usable CRC from the Sent ack), accept the first BinaryResponse after the Sent ack fires. When a non-zero tag is present, the original tag-matching logic is preserved.
  • Test: Added a regression test that simulates a payload-less Sent event and verifies the regions response is still resolved correctly.

Test plan

  • Run npm test — existing request_regions test still passes (tag-matching path exercised by the mock's expectedAckCrc: 0x99)
  • New test request_regions succeeds when Sent ack carries no expectedAckCrc passes
  • Manual verification: click "Discover regions from repeaters" with a nearby MeshCore repeater — regions should now appear (or "No regions found" if the repeater has no configured regions, which is correct behaviour)

🤖 Generated with Claude Code

https://claude.ai/code/session_017gqwwKYeyPEfZpMTWn5jX4


Generated by Claude Code

…ayload

`sendToRadioFrame` fires the `Sent` event without a payload, unlike
`sendTextMessage` which includes `expectedAckCrc`. The old `onSent`
handler tried to read `r?.expectedAckCrc` from this payload-less event,
leaving `tag = null`. The `onResp` guard `tag === null` then caused
every `BinaryResponse` to be silently discarded, so the 15 s inner
timer always fired and every repeater was skipped. Users saw "No regions
found" even with a working repeater nearby.

Fix: track `sentReceived` independently. When the Sent ack provides no
usable tag (null or 0), accept the first `BinaryResponse` that arrives
after the Sent ack — the requests are sequential so there is no
concurrent reply to misidentify. When a non-zero tag is present (future
firmware or sendBinaryRequest-style sends), the original tag-matching
logic is preserved.

Adds a regression test that simulates the payload-less Sent event and
verifies the response is still resolved correctly.

Fixes #3698

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017gqwwKYeyPEfZpMTWn5jX4
@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review

  • Read changed files
  • Review code quality and correctness
  • Check test coverage
  • Post final feedback

Summary

This PR correctly identifies and fixes a real bug: sendToRadioFrame fires the Sent event without a payload, which meant expectedAckCrc was never populated, tag stayed null, and the tag !== null && tag !== 0 guard in onResp always blocked the BinaryResponse from being accepted. The 15-second inner timeout would fire silently every time. The root cause analysis is accurate.


Code Review (src/server/meshcoreNativeBackend.ts)

The fix logic (lines 956–992):

let sentReceived = false;
let tag: number | null = null;
...
const onSent = (r: any) => {
  sentReceived = true;
  tag = (r?.expectedAckCrc ?? null);
};
const onResp = (r: any) => {
  if (!sentReceived) return;
  if (tag !== null && tag !== 0 && r?.tag !== tag) return;
  cleanup();
  resolve((r?.responseData ?? new Uint8Array()) as Uint8Array);
};

Correctness: The logic is sound. Accepting the first BinaryResponse after Sent when tag is null/0 is reasonable — request_regions calls are sequential (gated by the outer 30s sendCommand timeout and the resolvePublicKey lookup), so a stale BinaryResponse from a prior request is unlikely to race in. The guard if (!sentReceived) return prevents premature acceptance.

Minor observations:

  1. tag being null vs 0 conflation (line 976): The condition tag !== null && tag !== 0 treats both null (no tag) and 0 (tag present but zero-valued) as "tag-less mode". A zero expectedAckCrc from firmware is theoretically possible, though in practice CRC=0 is unlikely and the existing tag-matching path (if sendTextMessage-style) would also need it to match. This is an acceptable simplification with a comment explaining it — the PR description covers the rationale.

  2. c.on(K.ResponseCodes.Sent, onSent) uses persistent listener (line 989): Using on instead of once means if Sent fires more than once (e.g., firmware quirk), tag gets overwritten. once would be safer here since the first Sent event is the one we care about. The BinaryResponse handler already uses on correctly to allow multiple non-matching responses to pass through. Changing Sent to once would be a small hardening improvement.

  3. void c.sendToRadioFrame(frame) (line 992): Consistent with the existing pattern. The return value isn't used, void suppresses the linter warning correctly.


Test Coverage (src/server/meshcoreNativeBackend.discovery.test.ts)

The new test at line 282–316 correctly simulates the bug scenario:

  • Overrides sendToRadioFrame to emit Sent with no payload
  • Emits BinaryResponse with a firmwareTag value that wouldn't have matched before
  • Asserts res.success === true and that the regions are correctly parsed

Good: The test exercises the exact failure mode described in the PR and verifies the fix end-to-end.

One gap: There's no test for the timeout path — i.e., verifying that if neither Sent nor BinaryResponse ever fires, the command rejects after timeoutMs. This isn't a regression introduced by this PR (the timeout existed before), but the new sentReceived guard creates an additional way for the promise to hang (if Sent never fires but BinaryResponse does). Worth noting as a future improvement.

Existing test preserved: The original request_regions sends [57, pubkey, 0x01, 0x00] test (line 255) still passes the expectedAckCrc: 0x99 path, which keeps the tag-matching code exercised. Good.


Verdict

Approve with one minor suggestion: Consider changing c.on(K.ResponseCodes.Sent, onSent) to c.once(...) on line 989 so a duplicate Sent event can't overwrite the tag after BinaryResponse has already been matched. Not a blocker — the fix is correct and well-tested as written.

Addresses code-review suggestion: the Sent ack fires exactly once per
frame send, so `once` is correct. Using `on` left a window where a
spurious duplicate Sent event could overwrite `tag` between the first
ack and the BinaryResponse arriving. No behaviour change for the normal
(single-fire) path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017gqwwKYeyPEfZpMTWn5jX4
@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review

  • Read changed files
  • Review code quality and correctness
  • Check test coverage
  • Post final feedback

Summary

This PR correctly identifies and fixes a real bug: sendToRadioFrame fires the Sent event without a payload, so expectedAckCrc was never populated, tag stayed null, and the original guard if (tag === null || r?.tag !== tag) return always blocked the BinaryResponse from being accepted. The 15-second inner timeout would fire silently every time. The root cause analysis is accurate and the fix is well-targeted.


Code Review (src/server/meshcoreNativeBackend.ts)

The fix (lines 955–995):

let sentReceived = false;
let tag: number | null = null;
...
const onSent = (r: any) => {
  sentReceived = true;
  tag = (r?.expectedAckCrc ?? null);
};
const onResp = (r: any) => {
  if (!sentReceived) return;
  if (tag !== null && tag !== 0 && r?.tag !== tag) return;
  cleanup();
  resolve((r?.responseData ?? new Uint8Array()) as Uint8Array);
};
...
c.once(K.ResponseCodes.Sent, onSent);   // ← once, not on
c.on(K.PushCodes.BinaryResponse, onResp);
c.once(K.ResponseCodes.Err, onErr);

Correctness: The logic is sound. The two-variable approach (sentReceived + tag) cleanly separates "did we receive the Sent ack" from "do we have a usable CRC tag". Accepting the first BinaryResponse after Sent when tag is null/0 is correct for the sendToRadioFrame path — request_regions calls are sequential, so a stale BinaryResponse from a prior call is unlikely to race in, and the if (!sentReceived) return guard prevents premature acceptance.

Previous reviewer suggestion already applied: The c.on → c.once for Sent (line 992) has been correctly applied in this revision. This prevents a hypothetical duplicate Sent event from overwriting tag after the response has already been matched.

Minor observations (non-blocking):

  1. c.off(K.ResponseCodes.Sent, onSent) in cleanup() (line 983): Since Sent is now registered with once, calling c.off in cleanup() is a no-op if the once already fired and auto-removed the listener. This is harmless — EventEmitter.off is idempotent when the listener isn't registered — but worth knowing it's defensive dead code for the Sent listener only. The BinaryResponse and Err off-calls remain necessary.

  2. tag !== null && tag !== 0 condition (line 976): Treats both null (payload absent) and 0 (CRC explicitly zero) as "tag-less mode". CRC=0 from firmware is theoretically possible but extremely unlikely. The comment in the diff explains the rationale. Acceptable.

  3. onErr uses once (line 994): Correct — only one Err response expected. Consistent with the Sent listener.


Test Coverage (src/server/meshcoreNativeBackend.discovery.test.ts)

New test (lines 282–316) correctly simulates the exact bug scenario:

  • Overrides sendToRadioFrame to emit Sent without any payload
  • Emits BinaryResponse with tag: 0xdeadbeef (a non-zero firmware tag that wouldn't have matched before)
  • Asserts res.success === true and correct region/clock parsing

This is a good regression test — it would have caught the original bug.

Existing test preserved (line 255): The original request_regions sends [57, pubkey, 0x01, 0x00] test still exercises the tag-matching code path (the mock emits expectedAckCrc: 0x99). Both paths are now covered.

One potential gap (future improvement, not a blocker): No test for the case where Sent never fires but BinaryResponse arrives — with the new sentReceived guard this would silently hang until the outer timeout. This is a pre-existing limitation and not introduced by this PR, but worth a follow-up test.


Verdict

Approve. The fix is correct, the c.once suggestion from the prior review has been incorporated, the comments explain the rationale clearly, and the regression test covers the failure mode directly. No blocking issues.

@Yeraze Yeraze merged commit 3a03571 into main Jun 24, 2026
20 checks passed
@Yeraze Yeraze deleted the claude/great-dijkstra-y6fp31 branch June 24, 2026 11:32
Yeraze pushed a commit that referenced this pull request Jun 25, 2026
…ponse tag is unmatched (#3734)

The #3702 fix assumed sendToRadioFrame fires Sent with no expectedAckCrc
(tag = null → skip matching). In practice the firmware DOES populate
expectedAckCrc (CRC of the outgoing ANON_REQ packet), but the repeater's
BinaryResponse (0x8C) carries tag = 0 — a different scheme than SendBinaryReq
(50), which explicitly echoes the CRC. With tag = non-zero CRC and
BinaryResponse.tag = 0, the old guard:

  if (tag !== null && tag !== 0 && r?.tag !== tag) return;

evaluates to true and silently drops every response, causing the 15-second
inner timeout and "No regions found" for every user attempt (#3734).

Fix: remove the tag-matching guard entirely. CMD_SEND_ANON_REQ (57) does
not use CRC-echoing tag correlation; operations are serialized by
runExclusiveRadioOp so the first BinaryResponse after sentReceived is
guaranteed to be ours. The tag variable is also removed since it is no
longer used.

Updates: correct the stale runExclusiveRadioOp comment that claimed
sendToRadioFrame fires Sent with no expectedAckCrc (it does).

Test: add a regression test simulating the exact #3734 scenario —
expectedAckCrc = 0xcafe1234 in Sent, tag = 0 in BinaryResponse — and
verify the regions are resolved rather than timing out.

Fixes #3734

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014DhSciCtki5JMrpUrKxwnj
Yeraze added a commit that referenced this pull request Jun 25, 2026
…x region discovery timeout (#3719, #3734) (#3736)

* fix: prevent messages from vanishing in multi-source unified feed (#3719)

Two compounding bugs caused messages to disappear from the unified channel
view ~60 seconds after arrival when multiple sources are connected.

1. fetchLimit starvation: the per-source window was limit×2 = 200 rows.
   A high-traffic MQTT bridge (e.g. a continental Canadaverse bridge at
   ~3 msg/s) pushes a message exclusive to that source beyond position 200
   in ~60 s, making it invisible to the unified endpoint on the next poll.
   Raised to limit×5 = 500, giving ~167 s of headroom at 3 msg/s — well
   above the 10 s poll interval in all realistic deployments.

2. _dbchan/_radio suffix breaks dedup key: meshtasticManager inserts
   server-decrypted copies of a message with IDs ending in `_dbchan` or
   `_radio`. extractPacketIdFromRowId returned null for these because the
   last underscore-segment was not numeric, so those copies got a
   text+timestamp dedupKey instead of the correct packetId-based key.
   They appeared as duplicate entries or were sorted separately rather than
   merging with the originals receptions array. Fixed by stripping known
   server-copy suffixes before extracting the numeric packet ID.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014DhSciCtki5JMrpUrKxwnj

* fix(meshcore): drop tag guard in request_regions — ANON_REQ BinaryResponse tag is unmatched (#3734)

The #3702 fix assumed sendToRadioFrame fires Sent with no expectedAckCrc
(tag = null → skip matching). In practice the firmware DOES populate
expectedAckCrc (CRC of the outgoing ANON_REQ packet), but the repeater's
BinaryResponse (0x8C) carries tag = 0 — a different scheme than SendBinaryReq
(50), which explicitly echoes the CRC. With tag = non-zero CRC and
BinaryResponse.tag = 0, the old guard:

  if (tag !== null && tag !== 0 && r?.tag !== tag) return;

evaluates to true and silently drops every response, causing the 15-second
inner timeout and "No regions found" for every user attempt (#3734).

Fix: remove the tag-matching guard entirely. CMD_SEND_ANON_REQ (57) does
not use CRC-echoing tag correlation; operations are serialized by
runExclusiveRadioOp so the first BinaryResponse after sentReceived is
guaranteed to be ours. The tag variable is also removed since it is no
longer used.

Updates: correct the stale runExclusiveRadioOp comment that claimed
sendToRadioFrame fires Sent with no expectedAckCrc (it does).

Test: add a regression test simulating the exact #3734 scenario —
expectedAckCrc = 0xcafe1234 in Sent, tag = 0 in BinaryResponse — and
verify the regions are resolved rather than timing out.

Fixes #3734

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014DhSciCtki5JMrpUrKxwnj

* revert fetchLimit to 2x — client append-only feed (#3738) supersedes the window-widening

The limit*5 widening in this PR was a mitigation for the #3719 disappearance:
it kept a single-source-exclusive message inside the server window for ~167s.
The client-side append-only accumulator merged in #3738 now makes a displayed
message persist regardless of how the server window moves, at any traffic rate,
so the extra fetch headroom is redundant and only adds per-poll cost. Restore
the original 2x (same-packet dedup headroom). The _dbchan/_radio dedup-key fix
in this PR is a separate correctness bug and is kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SJPe6J5vKrcbwzt6vCdtrt

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] MeshCore: Discover regions from repeater not working

2 participants