Skip to content

feat(meshcore): query only 0-hop repeaters for region discovery + route the query direct (#3743)#3765

Merged
Yeraze merged 5 commits into
mainfrom
feat/meshcore-zerohop-region-discovery
Jun 25, 2026
Merged

feat(meshcore): query only 0-hop repeaters for region discovery + route the query direct (#3743)#3765
Yeraze merged 5 commits into
mainfrom
feat/meshcore-zerohop-region-discovery

Conversation

@Yeraze

@Yeraze Yeraze commented Jun 25, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #3743. Makes MeshCore region discovery actually return regions, in two parts:

  1. Query only current 0-hop repeaters (the original [FEAT] MeshCore: Improve region discovery by only querying current zero-hop repeaters #3743 ask): run a 0-hop discovery sweep first and query only the repeaters/room-servers that answered, in arrival order — instead of querying every repeater ever seen (each at a 20s timeout). Cuts a worst case from N × 20s (e.g. 62 known repeaters → ~21 min) to a single ~8s sweep plus quick replies from nearby repeaters.
  2. Route the regions request directly so the repeater actually answers it (the part that made "Discover Regions" return empty even for a correctly-configured repeater).

Root cause (firmware-verified against repeater-v1.15.0)

A MeshCore repeater answers a regions ANON_REQ only when it arrives via a direct routesimple_repeater's onAnonDataRecv gates the regions branch on packet->isRouteDirect() and silently drops flooded ones. (Login is the lone flood-exception, which is why remote admin/CLI worked but Discover Regions didn't.) The companion floods whenever the contact has no installed out_path (sendAnonReq: out_path_len == OUT_PATH_UNKNOWN). MeshMonitor's discover_path only produces the diagnostic path (push 0x8D) — it never installs the routing out_path — so the request kept flooding and timing out.

What changed

  • discoverRegions() now:
    • Runs a 0-hop sweep (discover_nodes, filter 0x0C = repeater + room-server), retries once if it finds nothing, and reports noZeroHopRepeaters instead of silently querying everyone (addresses @m0urs's retry request on the issue).
    • Queries only the swept repeaters, in arrival order.
    • Installs a zero-hop direct out_path (setContactOutPath) before each request_regions — since a 0-hop sweep only hears direct-range repeaters, each one is a direct neighbour — so the ANON_REQ routes direct and the repeater replies.
  • setContactOutPath() gains an optional timeout; a set_out_path ack timeout is now treated as benign (the CMD_ADD_UPDATE_CONTACT write lands even when meshcore.js loses its Ok ack in post-sweep radio chatter) — logged at debug instead of a misleading "check serial/TCP connection" warning.
  • discoverNodes() now returns the seen public-key set so the sweep result can drive selection.
  • Frontend: useMeshCore + MeshCoreSettingsView surface "no nearby (0-hop) repeaters found" vs "repeaters reported no regions".

Testing

  • New/updated unit tests in meshcoreManager.scope.test.ts (36 pass): 0x0C sweep, query-only-swept-repeaters, arrival order, retry-then-succeed, two-empty-sweeps → noZeroHopRepeaters, and an assertion that a zero-hop direct out_path is installed before querying.
  • Full Vitest suite green (0 failures); tsc clean.
  • Validated live against a real v1.15.0 repeater: Discover Regions now returns its configured region reliably across repeated runs, where it previously always timed out empty.

🤖 Generated with Claude Code

Yeraze and others added 4 commits June 25, 2026 18:06
Run a 0-hop discovery sweep first and query only the repeaters/room-servers
that answered, in arrival order. Retry the sweep once if it finds nothing;
surface 'no 0-hop repeaters found' rather than falling back to querying every
known repeater. Cuts region discovery from ~N x 20s (all known repeaters,
e.g. 62 -> ~21min) to a single ~8s sweep plus quick replies from nearby
repeaters.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
v1.15+ repeaters answer a regions ANON_REQ only when it arrives via a DIRECT
route (firmware simple_repeater onAnonDataRecv gates the REGIONS branch on
isRouteDirect(); flooded requests are silently dropped — login is the lone
flood-exception, which is why admin CLI worked but Discover Regions didn't).
The companion floods whenever the contact has no installed out_path. Since the
0-hop discovery sweep only hears direct-range repeaters, each swept repeater is
a direct neighbour, so install a zero-hop direct out_path before querying it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
…covery

The CMD_ADD_UPDATE_CONTACT write lands even when its Ok ack is lost in the
post-sweep radio chatter (meshcore.js resolves on Ok, so it reports a timeout
while the route is actually installed). Use a short 3s window and treat a missed
ack as non-fatal — the request_regions reply is the real success signal — so a
working install no longer waits 12s or logs a scary 'may flood' warning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
A set_out_path timeout means meshcore.js didn't see the CMD_ADD_UPDATE_CONTACT
Ok ack, not that the device is unreachable — the write applies regardless. Log
it at debug instead of a misleading 'check serial/TCP connection' warning;
genuine rejections still warn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review: feat(meshcore): query only 0-hop repeaters for region discovery + route the query direct (#3743)

Tasks

  • Gather context and understand the request
  • Read changed files
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Check security concerns
  • Assess test coverage
  • Post final review feedback

Overall Assessment

This is a well-executed fix for a subtle firmware-level routing issue. The root cause analysis is thorough, the fix is well-targeted, and the test coverage is comprehensive. The PR is in good shape to merge. I have a few observations, mostly minor:


Core Logic (meshcoreManager.ts)

discoverRegions() — Good

The new sweep-then-query flow is clean and the retry logic is appropriately conservative (one retry, not unbounded). Key observations:

  • sweepZeroHopRepeaters inner function (meshcoreManager.ts:2618): The function cross-references seen keys against this.contacts to resolve metadata. This is correct because discoverNodes only returns public keys. However, there is a subtle assumption: if a repeater responds to the sweep but isn't yet in this.contacts (e.g., it was brand new), it would be silently skipped. This seems acceptable given how the contact map is populated on discovery events, but worth noting.

  • setContactOutPath call with new Uint8Array(0) and a 3s timeout (meshcoreManager.ts:2661): The 3s timeout is intentionally short since the write is best-effort and the PR correctly explains that the Ok ack may be lost in radio chatter. The fallthrough behavior (logging at debug then proceeding to request_regions anyway) is correct — the real success signal is whether request_regions gets a reply.

  • Both loops iterate repeaters sequentially (meshcoreManager.ts:2645): Sequential loop is the right call here (as noted in the docstring — concurrent request_regions calls risk tag cross-matching). No concern.

discoverNodes() — Good

The addition of seen: [...this.activeDiscovery.seen] to the return value (meshcoreManager.ts:2513) is minimal and non-breaking. Pre-existing callers that destructure { returned, newCount } are unaffected.

setContactOutPath() — Minor concern

The timeout parameter was added correctly with a sensible default. The timeout detection logic at meshcoreManager.ts:2835 uses:

const isTimeout = response.error?.includes('timeout');

This is a string-matching heuristic on an error message from meshcore.js. It will work fine currently, but is fragile if the underlying library changes its error messaging. A comment noting this dependency on the exact error text from meshcore.js would help future maintainers (though this is a minor stylistic point).


Frontend (MeshCoreSettingsView.tsx + useMeshCore.ts)

handleDiscoverRegions (MeshCoreSettingsView.tsx:64): The noZeroHopRepeaters feedback path correctly gates on the server response, and the hint text at line 269 ('Queries known repeaters for the regions they serve. Run "Discover Repeaters" (above) first for the fullest list.') is now slightly misleading since the discover-regions button runs its own internal sweep. A copy update from "Run 'Discover Repeaters' first" to something like "Queries 0-hop (direct range) repeaters for the regions they serve." would better reflect the new behavior, but it's a documentation-level issue, not a code defect.

useMeshCore.ts:815: The noZeroHopRepeaters field is normalized with data.noZeroHopRepeaters === true (strict equality vs. truthy), which is the right defensiveness for an API boundary.


Test Coverage (meshcoreManager.scope.test.ts)

Test coverage is excellent. Specifically:

  • runs a repeater+room-server (0x0C) sweep before querying — verifies the filter bitmask
  • queries only the repeaters returned by the 0-hop sweep, not every known repeater — validates scope narrowing
  • queries in sweep arrival order, not contact-map order — validates the order Map sort
  • retries the sweep once when the first finds no 0-hop repeaters, then queries — validates retry success path
  • returns noZeroHopRepeaters after two empty sweeps, querying nobody — validates double-empty early exit
  • ✅ The setContactOutPath call assertion (meshcoreManager.scope.test.ts:465) confirms an empty Uint8Array is installed before the region query

One gap: there is no test for the case where a repeater is in the sweep seen set but is not in this.contacts (the "new contact not yet in map" edge case mentioned above). That said, this is a low-risk scenario in practice and the existing coverage is already thorough.


Summary

Area Status
Root cause fix (direct routing) ✅ Correct
Sweep → query → retry flow ✅ Clean
Timeout handling (best-effort) ✅ Appropriate
discoverNodes return type extension ✅ Non-breaking
Frontend UX feedback ✅ Good, minor copy note
Test coverage ✅ Comprehensive
Security ✅ No concerns
Performance ✅ Significant improvement (N×20s → ~8s sweep + fast replies)

Recommend merge. The only actionable suggestion is optionally updating the UI hint text in MeshCoreSettingsView.tsx:269 to reflect that the button now runs its own sweep internally.

…out-string dependency

- The UI hint no longer says 'Run Discover Repeaters first' — the button now
  runs its own 0-hop sweep internally.
- Comment the isTimeout string-match so future maintainers know it depends on
  meshcore.js's error wording.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review: feat(meshcore): query only 0-hop repeaters for region discovery + route the query direct (#3743)

Tasks

  • Gather context and understand the request
  • Read changed files
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Check security concerns
  • Assess test coverage
  • Post final review feedback

Overall Assessment

This is an excellent, well-reasoned fix that solves a subtle firmware routing requirement. The root cause is clearly documented, the implementation is surgical, and the test coverage is thorough. Recommend merge. A few observations below, all minor.


meshcoreManager.ts — Core Logic

discoverRegions() — Clean

The sweep-then-query flow is well-structured. Key observations:

  • Silent contact skip in sweepZeroHopRepeaters (meshcoreManager.ts:2622-2629): A repeater that responds to the 0-hop sweep but is not yet in this.contacts will be silently dropped via the order.has(c.publicKey) filter. In practice this is unlikely because discovery events populate contacts as they arrive, but since the discoverNodes() 8s window runs before the contact map join, there's a race. Worth a brief comment explaining the tradeoff (e.g. "key must already be in contacts — new arrivals during the sweep window may be missed") so future maintainers don't wonder.

  • order.has(c.publicKey) vs. order.get(c.publicKey) !== undefined (meshcoreManager.ts:2627): Both are correct here (no undefined keys in a new Map(...).set(k, i) where i >= 0). This is fine.

  • Sequential region queries (meshcoreManager.ts:2644-2678): Correct, as the PR notes, because concurrent request_regions calls could cross-match reply tags. No concern.

setContactOutPath() — Minor observation

The timeout-detection heuristic at meshcoreManager.ts:2837:

const isTimeout = response.error?.includes('timeout');

This is now documented in-code with a comment noting the meshcore.js dependency — good. The fallback behavior (log at debug and return false, then the caller proceeds to request_regions) is the right call here. The comment added to the function itself accurately describes the tradeoff.

discoverNodes() return type extension (meshcoreManager.ts:2507-2513):

Non-breaking addition. Pre-existing destructuring callers ({ returned, newCount }) are unaffected since seen is an additive field. No concern.


MeshCoreSettingsView.tsx — Frontend

handleDiscoverRegions (MeshCoreSettingsView.tsx:64-87): The three UX outcomes (!result, noZeroHopRepeaters, regions.length === 0) are handled cleanly and map well to the server's three-state response.

Hint text (MeshCoreSettingsView.tsx:269-271): The previous review noted the old hint was misleading ("Run Discover Repeaters first"). The updated text is now accurate:

'Sweeps for nearby (0-hop / direct-range) repeaters and asks each one which regions it serves.'

This is the right wording. ✅

discoveringRegions button disable state (MeshCoreSettingsView.tsx:262): Correctly gates on both discoveringRegions and discovering !== null (prevents running a manual discovery sweep and a region discovery sweep concurrently). No concern.


useMeshCore.ts

noZeroHopRepeaters: data.noZeroHopRepeaters === true (useMeshCore.ts:815): Strict equality at the API boundary is correct and defensive. ✅


Test Coverage (meshcoreManager.scope.test.ts)

The test harness design is clean — the zeroHop option in makeManager supporting both string[] (flat, every sweep gets the same set) and string[][] (per-attempt, drives the retry path) is elegant.

Coverage is comprehensive:

  • ✅ 0x0C bitmask filter
  • ✅ Only swept repeaters queried (not all known contacts)
  • ✅ Arrival order preserved
  • ✅ Retry-then-succeed path
  • ✅ Two empty sweeps → noZeroHopRepeaters
  • setContactOutPath called with empty Uint8Array before each query
  • ✅ Non-companion short-circuit
  • ✅ Timeout/failure fallthrough (skips to next repeater)

One gap (low risk): No test covers a repeater that is present in the seen set from the sweep but absent from this.contacts at resolution time (e.g. brand-new repeater whose contact entry hasn't arrived yet). This is a rare edge in practice, but a comment in the test file noting this omission would help future contributors understand the coverage boundary.


Summary Table

Area Status
Root cause fix (direct routing via set_out_path) ✅ Correct
Sweep → query → retry flow ✅ Clean
Timeout handling (best-effort, debug log) ✅ Appropriate
discoverNodes return type extension ✅ Non-breaking
Frontend UX feedback (3 states) ✅ Correct
Updated hint text ✅ Accurate now
Test coverage ✅ Comprehensive
Security ✅ No concerns
Performance ✅ Significant improvement (N×20s → ~8s sweep + fast replies)

Recommend merge. The observations above are all informational or minor style suggestions — none are blockers.

@Yeraze

Yeraze commented Jun 25, 2026

Copy link
Copy Markdown
Owner Author

Thanks for the review! Addressed in 2a2e561b:

  • UI hint text (MeshCoreSettingsView.tsx) — updated to reflect that the button runs its own 0-hop sweep internally, dropping the stale "Run Discover Repeaters first" guidance.
  • isTimeout string-match (meshcoreManager.ts) — added a comment flagging the dependency on meshcore.js's error wording.

On the "repeater in seen but not in this.contacts" edge case: leaving as-is intentionally. request_regions needs the contact's metadata (name) and setContactOutPath needs it in the device contact list, so a key we have no contact record for can't be queried meaningfully yet. In practice the discovery push populates this.contacts before the sweep window closes, and a node that's still missing will be picked up on the next run — preferable to surfacing a half-populated entry.

@Yeraze Yeraze merged commit 5d07652 into main Jun 25, 2026
20 checks passed
@Yeraze Yeraze deleted the feat/meshcore-zerohop-region-discovery branch June 26, 2026 00:04
Yeraze added a commit that referenced this pull request Jun 27, 2026
… OWNER (#3820)

"Discover Repeaters" left repeaters as "Unknown" because NODE_DISCOVER_RESP
carries no name and a zero-hop repeater may not advert for a long time
(observed >30 min on live hardware), so the advert-refresh path (prior commit)
could wait indefinitely.

Firmware (ripplebiz/MeshCore simple_repeater `handleAnonOwnerReq`) answers an
UNAUTHENTICATED ANON_REQ OWNER (CMD_SEND_ANON_REQ sub-type 0x02) with
"node_name\nowner_info" — the same anon-request transport, and the same
isRouteDirect() gating, as the region discovery shipped in #3743/#3765. So we
can pull a discovered repeater's real name without admin login.

- meshcoreNativeBackend: new `request_owner` bridge command — builds the
  [57][pubkey][0x02][0x00] frame, awaits the 0x8C BinaryResponse (serialized via
  runExclusiveRadioOp, mirroring request_regions), parses clock + first line of
  "node_name\nowner_info".
- meshcoreManager: new `fetchOwnerName()` — installs a zero-hop direct out_path
  (firmware drops flooded OWNER reqs), issues request_owner, writes the name onto
  the contact and broadcasts it. discoverNodes() gains an opt-in `fetchNames`
  flag that fetches names for nameless repeaters/room-servers after the sweep.
- meshcoreRoutes: the user-facing POST /discover opts in (fetchNames=true); the
  internal region-discovery sweep does not (it has its own per-repeater pass).

Verified live: delete "Yeraze Repeater" -> Discover Repeaters -> name populates
as "Yeraze Repeater" within the discovery window (~12 s), no admin login, no
advert required. The advert-refresh (Option A) remains as a zero-airtime
complement for repeaters that do advert.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
Yeraze added a commit that referenced this pull request Jun 27, 2026
… no admin login (#3820) (#3825)

* fix(meshcore): refresh known-but-nameless contacts so discovered repeaters get their name (#3820)

A repeater added via "Discover Repeaters" stays "Unknown" in the node list
until an unrelated refresh (e.g. opening its admin panel) happens to run —
even after it sends a zero-hop advert.

Root cause (firmware-verified): NODE_DISCOVER_RESP carries no name, so
discovery pre-creates a *nameless* device contact. The repeater's later
zero-hop advert DOES carry the name and the firmware DOES store it on the
device — but because the contact already exists, the firmware pushes a
pubkey-only 0x80 advert (not the full 0x8A new-advert record), so MeshMonitor's
contact_advertised event arrives with no adv_name. The device-record re-read
(schedulePathRefresh -> get_contacts) was gated behind `if (!wasKnown)`, so it
never ran for the already-known stub, leaving it "Unknown".

Fix: move the missing-name/type refresh trigger out of the `!wasKnown` branch so
it also fires for known-but-nameless contacts. Zero airtime — firmware drops
nameless adverts, so every contact_advertised event means the device just
stored a real name; the refresh is a local debounced get_contacts read and stops
re-firing once the name is pulled. #3756's `||`-not-`??` guard can only protect
a name already held; it cannot acquire one never pulled, so it was insufficient
for the fresh-discovery case.

Plan/analysis: docs/internal/dev-notes/MESHCORE_UNNAMED_NODE_3820_PLAN.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4

* feat(meshcore): actively fetch discovered repeater names via ANON_REQ OWNER (#3820)

"Discover Repeaters" left repeaters as "Unknown" because NODE_DISCOVER_RESP
carries no name and a zero-hop repeater may not advert for a long time
(observed >30 min on live hardware), so the advert-refresh path (prior commit)
could wait indefinitely.

Firmware (ripplebiz/MeshCore simple_repeater `handleAnonOwnerReq`) answers an
UNAUTHENTICATED ANON_REQ OWNER (CMD_SEND_ANON_REQ sub-type 0x02) with
"node_name\nowner_info" — the same anon-request transport, and the same
isRouteDirect() gating, as the region discovery shipped in #3743/#3765. So we
can pull a discovered repeater's real name without admin login.

- meshcoreNativeBackend: new `request_owner` bridge command — builds the
  [57][pubkey][0x02][0x00] frame, awaits the 0x8C BinaryResponse (serialized via
  runExclusiveRadioOp, mirroring request_regions), parses clock + first line of
  "node_name\nowner_info".
- meshcoreManager: new `fetchOwnerName()` — installs a zero-hop direct out_path
  (firmware drops flooded OWNER reqs), issues request_owner, writes the name onto
  the contact and broadcasts it. discoverNodes() gains an opt-in `fetchNames`
  flag that fetches names for nameless repeaters/room-servers after the sweep.
- meshcoreRoutes: the user-facing POST /discover opts in (fetchNames=true); the
  internal region-discovery sweep does not (it has its own per-repeater pass).

Verified live: delete "Yeraze Repeater" -> Discover Repeaters -> name populates
as "Yeraze Repeater" within the discovery window (~12 s), no admin login, no
advert required. The advert-refresh (Option A) remains as a zero-airtime
complement for repeaters that do advert.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEAT] MeshCore: Improve region discovery by only querying current zero-hop repeaters

1 participant