
Mesh-wide, coordinated zero-hop 'virtual node' broadcast with multi-mode & multi-frequency capability #7183

Draft
erayd wants to merge 1 commit into meshtastic:master from erayd:modeswitch

Conversation

@erayd

@erayd erayd commented Jul 1, 2025

Contributor

I have recently created a firmware feature intended to help new users on our mesh, and wanted to check whether there's any interest in having this (or some subset of it) merged upstream. If so, I'll tidy it up into a more general form suitable for merging (e.g. no static config); if not, I'll just maintain it as a separate patch. It's currently installed on a number of our high sites on a 2.6.11 base for testing.

Please note that the code in this PR as it exists currently is not intended to be directly merged; rather it's a starting point for testing & discussion about what aspects of this work, if any, are desirable upstream.

The Feature:

  • Nodes with this firmware advertise a second, 'virtual node', with a common node ID (i.e. all sites share the same secondary ID).
  • This virtual node is advertised on both the configured primary channel and on LF20.
  • If the node has a channel named "Tips" configured, any incoming text messages on that channel will be re-originated using the virtual node as the origin.
  • All re-originated packets are sent with a hop limit of zero.
  • If the message contains a radio setting prefix (e.g. #SF20 for SHORT_FAST slot 20), the node will reconfigure the LoRa settings & channel name before re-originating the packet. It will then immediately switch back to its usual config.
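
A minimal sketch of how such a prefix might be recognised, assuming the format is '#', a two-letter preset code, then a 1-3 digit frequency slot, followed by a space or the end of the message. Only the SF and LF codes appear in this thread; the remaining codes, and the Preset, RadioPrefix and parseRadioPrefix names, are hypothetical and not taken from the PR's code.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical subset of modem presets (names mirror Meshtastic's presets)
enum class Preset { LongFast, LongModerate, LongSlow, MediumFast, MediumSlow, ShortFast, ShortSlow, ShortTurbo };

struct RadioPrefix {
    Preset preset;
    unsigned slot;    // 1-based frequency slot, as in "SF20"
    std::string body; // message text after the prefix
};

// Assumed two-letter codes; only "LF" and "SF" are used in the thread
static std::optional<Preset> presetFromCode(const std::string &code)
{
    if (code == "LF") return Preset::LongFast;
    if (code == "LM") return Preset::LongModerate;
    if (code == "LS") return Preset::LongSlow;
    if (code == "MF") return Preset::MediumFast;
    if (code == "MS") return Preset::MediumSlow;
    if (code == "SF") return Preset::ShortFast;
    if (code == "SS") return Preset::ShortSlow;
    if (code == "ST") return Preset::ShortTurbo;
    return std::nullopt;
}

std::optional<RadioPrefix> parseRadioPrefix(const std::string &msg)
{
    // Shortest valid prefix is "#XX1": '#', two letters, one digit
    if (msg.size() < 4 || msg[0] != '#')
        return std::nullopt;
    auto preset = presetFromCode(msg.substr(1, 2));
    if (!preset)
        return std::nullopt;
    std::size_t i = 3;
    unsigned slot = 0;
    // Read at most three digits of slot number
    while (i < msg.size() && i < 6 && std::isdigit(static_cast<unsigned char>(msg[i]))) {
        slot = slot * 10 + static_cast<unsigned>(msg[i] - '0');
        ++i;
    }
    if (i == 3 || slot == 0)
        return std::nullopt; // no digits, or slot 0 (slots are 1-based)
    if (i < msg.size() && msg[i] != ' ')
        return std::nullopt; // prefix must end at a space or end of message
    while (i < msg.size() && msg[i] == ' ')
        ++i; // skip separating spaces
    return RadioPrefix{*preset, slot, msg.substr(i)};
}
```

With this sketch, "#SF20 Hello" yields SHORT_FAST, slot 20 and the body "Hello", while a message without a recognised prefix returns no value and can be re-originated on the node's usual settings. Whether the prefix is stripped before re-origination is also an assumption here.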

The Reasons:

  • To advise new users in our area that they need to change their radio settings to SF16, where the bulk of our mesh lives, and leverage our existing infrastructure to ensure that message is received across our coverage footprint.
  • To provide a different set of tips to users on LF20 vs users on our main SF16 mesh.
  • To quickly make region-wide announcements or run tests on radio modes other than our usual one.
  • To prevent tips from leaking outside of the radio mode & geographic areas they are intended for.
  • To allow multiple users to send messages using the "Tips Robot" identity, without needing to share a single device.

Sending messages as zero-hop prevents them from leaking - either to other neighbouring meshes, or to another radio mode across our SF/LF bridges. It means we can catch new users, and users from out of town, without spamming the rest of the mesh about stuff that they already know.

Is there any appetite to have this merged as a mainline feature, either in part or in full? Based on the earlier discord discussion, it seems like at least the mode switching part may be independently useful.

See also the discord thread here

Not currently implemented in this PR, but enabled by the mode-switch work: there's the potential to build an "I am over here" feature on top of it. Nodes could, e.g. every few hours, briefly jump over to the default LONG_FAST slot, send an "I am using these settings" packet, and then jump back. This would be a separate 'discovery' packet type, not a normal nodeinfo or text message. The upshot is that, assuming it's enabled by default, new users could automatically discover nodes in their area even if those nodes use different modem settings, and would know where to go without any central coordination needed to send them there.

🤝 Attestations

  • I have tested that my proposed changes behave as described.
  • I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • Other (please specify below)

Please note that this PR requires the following protobuf changes in order to work:

diff --git a/meshtastic/mesh.proto b/meshtastic/mesh.proto
index 03162d8..ec54c99 100644
--- a/meshtastic/mesh.proto
+++ b/meshtastic/mesh.proto
@@ -1393,6 +1393,21 @@ message MeshPacket {
    * Set by the firmware internally, clients are not supposed to set this.
    */
   uint32 tx_after = 20;
+
+  /*
+   * The modem preset to use for this packet
+   */
+  uint32 modem_preset = 21;
+
+  /*
+   * The frequency slot to use for this packet
+   */
+  uint32 frequency_slot = 22;
+
+  /*
+   * Whether the packet has a nonstandard radio config
+   */
+  bool nonstandard_radio_config = 23;
 }

 /*

Note that this commit has details hardcoded for the Wellington (NZ) mesh, and also requires the protobuf patch above.

@erayd

erayd commented Jul 1, 2025

Contributor Author

If this were generalised for merge, I can see it logically splitting into the following:

Preset / Frequency / Channel Switching

Integrated into the core, mostly in RadioInterface.cpp. Handles switching the radio config & channel name / key as needed, and deals with any queuing complications that may result from some packets being intended for a different mode.

May require some changes to the queuing behaviour for optimal performance (as opposed to my current version, which just treats any queuing delay as acceptable).
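
One way to guarantee the "switch back immediately" behaviour is a scope guard that applies the packet's settings and restores the usual ones when it goes out of scope, even on an early return. This is a sketch under assumptions: RadioSettings, Radio, ScopedRadioSwitch and transmitWith are hypothetical names that do not exist in the firmware, and the fake Radio only records settings rather than driving a LoRa chip. It deliberately does not model the queueing or CSMA/CA timing questions.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical per-packet radio settings
struct RadioSettings {
    uint8_t preset = 0;
    uint16_t slot = 0;
};

inline bool sameSettings(const RadioSettings &a, const RadioSettings &b)
{
    return a.preset == b.preset && a.slot == b.slot;
}

// Hypothetical radio: records the active settings and counts reconfigurations
class Radio {
  public:
    explicit Radio(const RadioSettings &home) : current_(home) {}
    void apply(const RadioSettings &s)
    {
        current_ = s; // a real driver would reprogram the LoRa chip here
        ++reconfigurations_;
    }
    const RadioSettings &current() const { return current_; }
    int reconfigurations() const { return reconfigurations_; }

  private:
    RadioSettings current_;
    int reconfigurations_ = 0;
};

// Switches to the target settings for its lifetime, then restores the previous ones
class ScopedRadioSwitch {
  public:
    ScopedRadioSwitch(Radio &radio, const RadioSettings &target)
        : radio_(radio), home_(radio.current()), switched_(!sameSettings(target, home_))
    {
        if (switched_)
            radio_.apply(target);
    }
    ~ScopedRadioSwitch()
    {
        if (switched_)
            radio_.apply(home_); // always return to the usual config
    }
    ScopedRadioSwitch(const ScopedRadioSwitch &) = delete;
    ScopedRadioSwitch &operator=(const ScopedRadioSwitch &) = delete;

  private:
    Radio &radio_;
    RadioSettings home_;
    bool switched_;
};

void transmitWith(Radio &radio, const RadioSettings &target, const std::function<void()> &send)
{
    ScopedRadioSwitch guard(radio, target);
    // A real implementation would wait here for the radio to settle and for CSMA/CA
    send();
}
```

A guard covers one transmission only; keeping the packet at the front of the queue unchanged while the switch is in progress would still need to be handled by the queueing code.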

Virtual Node

Independent module, with separate protobuf for config. Config options would be:

  • Whether the feature is enabled (default=no)
  • List of (modem config, channel name, channel key, TX interval) vectors on which NodeInfo should be sent.
  • Local channel number from which inbound text messages should be re-originated
  • Virtual node short name
  • Virtual node long name
  • Virtual node public key (virtual node ID to be automatically derived from this)

Should there be a configurable hop limit for this (perhaps in the vector list)? Or should it be forced to always zero?
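
The proposed config could be modelled roughly as follows. This is a sketch only: the struct and field names are hypothetical, the hopLimit field reflects the open question above (defaulting to zero, matching the current behaviour), and the ID derivation (32-bit FNV-1a over the public key, remapped away from 0 and the 0xFFFFFFFF broadcast address) is an assumed scheme that the thread does not specify.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// One (modem config, channel, interval) vector on which NodeInfo is sent
struct TxVector {
    uint8_t modemPreset = 0;
    uint16_t frequencySlot = 0;
    std::string channelName;
    std::vector<uint8_t> channelKey; // PSK bytes for the channel
    uint32_t nodeInfoIntervalSecs = 0;
    uint8_t hopLimit = 0; // open question: configurable, or always zero?
};

struct VirtualNodeConfig {
    bool enabled = false; // feature is off by default
    std::vector<TxVector> vectors;
    uint8_t sourceChannelIndex = 0; // local channel whose text messages are re-originated
    std::string shortName;
    std::string longName;
    std::array<uint8_t, 32> publicKey{};
};

// Assumed derivation: 32-bit FNV-1a over the key, avoiding 0 and broadcast
uint32_t deriveVirtualNodeId(const std::array<uint8_t, 32> &publicKey)
{
    uint32_t hash = 2166136261u; // FNV-1a offset basis
    for (uint8_t byte : publicKey) {
        hash ^= byte;
        hash *= 16777619u; // FNV-1a prime
    }
    if (hash == 0 || hash == 0xFFFFFFFFu)
        hash ^= 0x5A5A5A5Au; // remap reserved values
    return hash;
}
```

Because every site would be configured with the same public key, every site derives the same virtual node ID without further coordination.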

Config Beacon

Independent module, with a new app type & associated protobuf. At a configured interval, would switch to the default LONG_FAST slot and broadcast a zero-hop packet containing the following. After transmitting, the node would then return immediately to its normal radio settings.

  • Version [2 bits] // beacon packet version, increment on non-backwards-compatible format changes
  • Reserved [2 bits] // in case we need them later, should default to 0
  • Role hint: [2 bits]
    • MIGHT forward traffic (e.g. CLIENT type roles); or
    • WILL forward traffic (e.g. REPEATER, ROUTER, ROUTER_LATE); or
    • WILL NOT forward traffic (always this if rebroadcast mode is set to NONE)
  • Forwarding hint: [2 bits]
    • Will forward ALL traffic (rebroadcast mode ALL or ALL_SKIP_DECODING); or
    • Will forward only CORE traffic (rebroadcast mode CORE_PORTNUMS_ONLY); or
    • Will forward only KNOWN traffic (rebroadcast mode LOCAL_ONLY or KNOWN_ONLY); or
    • Will forward NO traffic
  • Modem settings [32 bits (F=24 + BW=3 + SF=3 + CR=2)]
    • Frequency: uint24 (kHz)
    • Bandwidth: uint3 enum
    • Spreading factor: uint3 enum
    • Coding Rate: uint2 (offset so 0=5)
  • Node ID [32 bits]
  • Primary channel hash (allows distinguishing channels with the same name, but different keys) [8 bits]
  • Primary channel name [96 bits]
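
As a worked check of the size: the four 2-bit hints make 8 bits, the modem settings 24 + 3 + 3 + 2 = 32 bits, the node ID 32 bits, the channel hash 8 bits and the channel name 96 bits, for 176 bits = 22 bytes. The sketch below packs that layout big-endian. Only the field widths come from the list above; the bit order within each byte, the SF offset (SF5-SF12 stored as 0-7), the bandwidth codes and all names are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

enum class RoleHint : uint8_t { MightForward = 0, WillForward = 1, WillNotForward = 2 };
enum class ForwardHint : uint8_t { All = 0, CoreOnly = 1, KnownOnly = 2, None = 3 };

struct ConfigBeacon {
    uint8_t version = 0; // 2 bits
    RoleHint role = RoleHint::MightForward;
    ForwardHint forwarding = ForwardHint::All;
    uint32_t frequencyKHz = 0;   // 24 bits
    uint8_t bandwidthCode = 0;   // 3 bits, index into an (assumed) bandwidth table
    uint8_t spreadingFactor = 7; // SF5-SF12, stored as SF - 5 (assumed offset)
    uint8_t codingRate = 5;      // 4/5-4/8, stored as CR - 5 (0 = 5, as proposed)
    uint32_t nodeId = 0;
    uint8_t channelHash = 0;
    std::string channelName; // truncated / zero-padded to 12 bytes
};

constexpr std::size_t kBeaconSize = 22;

std::array<uint8_t, kBeaconSize> packBeacon(const ConfigBeacon &b)
{
    std::array<uint8_t, kBeaconSize> out{};
    // Byte 0: version (2) | reserved (2, always 0) | role (2) | forwarding (2)
    out[0] = static_cast<uint8_t>(((b.version & 0x3) << 6) | ((static_cast<uint8_t>(b.role) & 0x3) << 2) |
                                  (static_cast<uint8_t>(b.forwarding) & 0x3));
    // Bytes 1-4: frequency (24) | bandwidth (3) | SF (3) | CR (2)
    const uint32_t modem = ((b.frequencyKHz & 0xFFFFFFu) << 8) | ((b.bandwidthCode & 0x7u) << 5) |
                           ((static_cast<uint32_t>(b.spreadingFactor - 5) & 0x7u) << 2) |
                           (static_cast<uint32_t>(b.codingRate - 5) & 0x3u);
    for (int i = 0; i < 4; ++i) {
        out[1 + i] = static_cast<uint8_t>(modem >> (24 - 8 * i));    // modem settings, big-endian
        out[5 + i] = static_cast<uint8_t>(b.nodeId >> (24 - 8 * i)); // node ID, big-endian
    }
    out[9] = b.channelHash;
    // Bytes 10-21: channel name, zero-padded
    std::memcpy(&out[10], b.channelName.data(), std::min<std::size_t>(b.channelName.size(), 12));
    return out;
}
```

Packing the LongFast example (919875 kHz, SF11, CR 4/5) this way gives a fixed 22-byte payload, before any port or header overhead.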

Without overhead, the example packet above would fit into 22 bytes. I'm unsure how much overhead protobuf encoding would add, as I'm not familiar enough with its wire format, but for this kind of payload raw encoding would be trivial if that overhead is a concern (is it?). IMO this is small enough that, with a sensible interval, it could scale to a very high number of nodes, even in a dense event mesh, especially as it's a zero-hop packet. Keeping it small does matter, because we don't have a good picture of the channel utilisation on the default preset (we may be sending beacons on a modem config that we aren't normally resident on), and therefore cannot auto-throttle.

Given that these are all fixed-length fields, backwards-compatibility can be maintained for raw encoding without a version bump in the event of future field changes by simply appending any new field data to the end of the existing packet (or using the reserved bits).
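
That compatibility rule can be sketched as a decoder that requires at least the 22 bytes defined so far, rejects unknown versions, and ignores anything appended after the known fields. The byte layout assumed here (big-endian fields in the order listed above, SF stored as SF - 5, CR as CR - 5, name zero-padded to 12 bytes) and all names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct DecodedBeacon {
    uint8_t version = 0;
    uint8_t role = 0;
    uint8_t forwarding = 0;
    uint32_t frequencyKHz = 0;
    uint8_t bandwidthCode = 0;
    uint8_t spreadingFactor = 0;
    uint8_t codingRate = 0;
    uint32_t nodeId = 0;
    uint8_t channelHash = 0;
    std::string channelName;
};

constexpr std::size_t kBeaconMinSize = 22; // fields defined so far
constexpr uint8_t kSupportedVersion = 0;

static uint32_t readBE32(const std::vector<uint8_t> &buf, std::size_t off)
{
    return (uint32_t{buf[off]} << 24) | (uint32_t{buf[off + 1]} << 16) | (uint32_t{buf[off + 2]} << 8) |
           uint32_t{buf[off + 3]};
}

std::optional<DecodedBeacon> decodeBeacon(const std::vector<uint8_t> &buf)
{
    if (buf.size() < kBeaconMinSize)
        return std::nullopt; // truncated packet
    DecodedBeacon b;
    b.version = static_cast<uint8_t>(buf[0] >> 6);
    if (b.version != kSupportedVersion)
        return std::nullopt; // incompatible format change
    b.role = static_cast<uint8_t>((buf[0] >> 2) & 0x3); // reserved bits 4-5 are ignored
    b.forwarding = static_cast<uint8_t>(buf[0] & 0x3);
    const uint32_t modem = readBE32(buf, 1);
    b.frequencyKHz = modem >> 8;
    b.bandwidthCode = static_cast<uint8_t>((modem >> 5) & 0x7);
    b.spreadingFactor = static_cast<uint8_t>(((modem >> 2) & 0x7) + 5);
    b.codingRate = static_cast<uint8_t>((modem & 0x3) + 5);
    b.nodeId = readBE32(buf, 5);
    b.channelHash = buf[9];
    std::size_t len = 0;
    while (len < 12 && buf[10 + len] != 0)
        ++len;
    b.channelName.assign(reinterpret_cast<const char *>(buf.data() + 10), len);
    // Bytes from offset 22 onward belong to newer fields and are ignored
    return b;
}
```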

Assuming it were enabled by default, which IMO would be a good idea, this would allow new users to automatically discover which settings are in use by other nearby nodes, without any separate coordination mechanism being required. This would require app support, but could allow a simple autoconfiguration mechanism that presents a list like the following, and allows users to tap one to join that particular mesh. If the hash of the tapped channel indicates a non-default key, the app could also prompt the user to enter the key.

  • LongFast (919.875MHz 250kHz SF11 CR5) - 4 nodes // default meshtastic config
  • ShortFast (915.000MHz 250kHz SF7 CR5) - 19 nodes // default SHORT_FAST preset on a nonstandard frequency
  • PrivateMesh (915.000MHz 250kHz SF7 CR5) - 2 nodes // private channel piggybacking on the above SF infrastructure
  • JoesSensors (925.375MHz 250kHz SF9 CR5) - 106 nodes // Joe has put a GPS tracker on every cow in his herd

Sending only the info above allows these packets to remain very small (and therefore imposing minimal airtime overhead on the default out-of-the-box LONG_FAST setup), and sending them zero-hop ensures that the user is presented with only the nodes that are legitimately within direct range.

Including the device role & rebroadcast hints means that the user can also see at a glance whether there is a nearby device that will forward traffic. In the example above, the user might see that the ShortFast item that is displayed includes three nodes that will forward traffic (one router, two clients), but JoesSensors does not have any.
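
On the app side, a list like the one above could be built by grouping received beacons by channel and modem settings, counting distinct node IDs (so repeated beacons from one node are counted once) and separately counting nodes whose hints say they may forward. The BeaconSummary fields and the mayForward rule (role hint is not WILL NOT forward and forwarding hint is not NONE) are assumptions for illustration.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical decoded beacon, reduced to what the list needs
struct BeaconSummary {
    uint32_t nodeId;
    std::string channelName;
    uint8_t channelHash;
    uint32_t frequencyKHz;
    uint8_t bandwidthCode;
    uint8_t spreadingFactor;
    uint8_t codingRate;
    bool mayForward; // role hint != WILL NOT forward && forwarding hint != NONE
};

// Channel name + hash + modem settings identify one entry in the list
using MeshKey = std::tuple<std::string, uint8_t, uint32_t, uint8_t, uint8_t, uint8_t>;

struct MeshEntry {
    std::set<uint32_t> nodes;      // distinct nodes heard
    std::set<uint32_t> forwarders; // distinct nodes that may forward traffic
};

std::map<MeshKey, MeshEntry> groupBeacons(const std::vector<BeaconSummary> &beacons)
{
    std::map<MeshKey, MeshEntry> meshes;
    for (const auto &b : beacons) {
        MeshKey key{b.channelName, b.channelHash, b.frequencyKHz, b.bandwidthCode, b.spreadingFactor, b.codingRate};
        MeshEntry &entry = meshes[key];
        entry.nodes.insert(b.nodeId); // repeated beacons from one node count once
        if (b.mayForward)
            entry.forwarders.insert(b.nodeId);
    }
    return meshes;
}
```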

@GUVWAF GUVWAF left a comment

Member

Interesting idea for sure. I left two comments.

#if !MESHTASTIC_EXCLUDE_TIPS
} else if (MeshTipsModule::configureRadioForPacket(this, txp)) {
// We just switched radio config, so wait to ensure the new channel is available
setTransmitDelay();

Member

This is probably fine for the "Tips" broadcast, but during the switching time and further delays from e.g. CSMA/CA, you'll be on the non-default frequency and hence you cannot receive from the default frequency. Have you measured how long the switching takes such that we can optimize the delay?

Furthermore, during this time packets in the queue can get reordered or canceled, so you might be switching around without actually transmitting on the other frequency. I think in this period we should avoid changing the packet in front of the queue (except for canceling it).

Contributor Author

During... you'll be on the non-default frequency & cannot receive.

Yes. This is an expected tradeoff of using the feature - the radio can only do one thing at a time.

Have you measured how long the switching takes such that we can optimize the delay?

No formal measurement. However, the delay is quite small (milliseconds), and IMO reasonable to consider part of the tradeoff. Would formal measurements be useful? Happy to spend some time benchmarking radio switching times if so.

I think in this period we should avoid changing the packet in front of the queue (except for canceling it).

This is an extremely good point, and I agree - thanks for the suggestion.

Member

the delay is quite small (milliseconds), and IMO reasonable to consider part of the tradeoff

Milliseconds is fine indeed, provided people are at least aware of this tradeoff.
If it's really so small, I'm wondering whether a plain delay() here would be better, to avoid any changes to the queue in the meantime?

Contributor Author

Possibly - I don't have any strong opinions about which mechanism is used, so long as there is enough time for it to listen and check whether the channel is busy before it starts trying to transmit.

meshtastic_MeshPacket *p_LF20 = packetPool.allocCopy(*p);
service->sendToMesh(p, RX_SRC_LOCAL, false);

p_LF20->frequency_slot = 20;

Member

Adding this and the modem preset on the MeshPacket will be rather costly as it's stored everywhere. I think it's better to only mark it as "non-standard" and then fetch the config from this module. If you want to support multiple configs, we could make it an enum, or maybe extend channel beyond the 8 logical channels.

@erayd erayd Jul 4, 2025

Contributor Author

Storing it in config is not IMO flexible enough; it's designed to support any arbitrary preset+slot combo that the message on the tips channel indicates. However, the per-packet overhead could be minimised by storing a lookup table for non-standard packets (instead of additional attributes on the MeshPacket protobuf), thus ensuring that the overhead exists only for those packets which have a nonstandard egress mode. Would this be a suitable compromise?

Member

Sorry, I'm not following, how would the lookup table method work?

However, I now understand better how this is intended. You would use this to instruct another node to rebroadcast it on the modem preset/frequency slot you indicate, right? I thought it was for instructing it from your phone as handleReceived() is also called for packets coming from the phone, but only for DMs currently:

handleReceived(p, src);

@erayd erayd Jul 5, 2025

Contributor Author

Sorry, I'm not following, how would the lookup table method work?

if (p->nonstandard_radio_config) {
    getPresetFromLookupTable();
    getFrequencySlotFromLookupTable();
    doTheThing();
}

The lookup table would just be an array the same size as the TX queue's maximum length, containing struct {packet_from, packet_id, preset, slot}. This avoids the memory overhead of storing a preset & slot on every packet, at the cost of very slightly increased CPU usage (only for nonstandard packets) from iterating through up to 16 entries to find the right settings.
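
A minimal sketch of that lookup table, assuming a fixed capacity equal to the TX queue length (16 in the description above) and entries keyed on the packet's sender and ID. Only packets with nonstandard_radio_config set would consult it, so the common path pays nothing extra. EgressOverride and EgressOverrideTable are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Nonstandard egress settings for one queued packet
struct EgressOverride {
    uint32_t from = 0;
    uint32_t id = 0;
    uint8_t preset = 0;
    uint16_t slot = 0;
    bool used = false;
};

template <std::size_t N> class EgressOverrideTable {
  public:
    // Record settings for a queued packet; returns false if the table is full
    bool add(uint32_t from, uint32_t id, uint8_t preset, uint16_t slot)
    {
        for (auto &e : entries_) {
            if (!e.used) {
                e = EgressOverride{from, id, preset, slot, true};
                return true;
            }
        }
        return false;
    }

    // Linear scan, only performed for packets flagged as nonstandard
    std::optional<EgressOverride> find(uint32_t from, uint32_t id) const
    {
        for (const auto &e : entries_)
            if (e.used && e.from == from && e.id == id)
                return e;
        return std::nullopt;
    }

    // Call when the packet is transmitted or dropped from the queue
    void remove(uint32_t from, uint32_t id)
    {
        for (auto &e : entries_)
            if (e.used && e.from == from && e.id == id)
                e.used = false;
    }

  private:
    std::array<EgressOverride, N> entries_{};
};
```

For example, EgressOverrideTable<16> costs a fixed 16 entries of memory, instead of growing every decoded MeshPacket held in every queue.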

You would use this to instruct another node to rebroadcast it on the modem preset/frequency slot you indicate, right?

Correct. The point is to simultaneously instruct all our high sites to do this, so that we can hit our entire coverage footprint with a zero-hop packet on an arbitrary preset / slot.

Currently, we are using it to periodically send the following message on the LONG_FAST default slot (20):

"Did you know that most of our mesh uses a different radio setting to yours? For best connectivity, set your modem preset to SHORT_FAST, frequency slot 16, channel name ShortFast."

Zero-hop means it cannot leak outside our coverage footprint, and will not result in any other nodes attempting to rebroadcast it. It also guarantees that anybody receiving it is almost certainly within range of one of our SF16 high sites. The only situation where they would not be, but would still receive that message, is if they are at the end of a very borderline signal path where a LONG_FAST signal can make it through, but a SHORT_FAST one can't.

The packet path goes:

  • $MY_CLIENT -> (SF16, Tips channel); then
  • $NEAREST_HIGH_SITE -> (SF16, Tips channel); then
  • $ALL_TIPS-CHANNEL_HIGH_SITES -> (LF20, LongFast channel, 0-hop); then
  • $ALL_LF20_NODES_IN_RANGE

Member

Thanks for the explanation, sounds good. Considering that this would be used rather rarely, I would go for an approach like this.
While the encoded protobuf will not contain unused fields, the decoded (i.e. as the MeshPacket struct) version, which is used in several queues in the firmware, will need to allocate memory for the fields.

@erayd

erayd commented Jul 15, 2025

Contributor Author

Definitely seems like a significant lack of feedback here, other than from @GUVWAF.

Is there any objection if I clean this up and generalise it for proper inclusion in the firmware? Or is it preferred for me to maintain this as an independent patchset for just our local mesh?

@nullrouten0

nullrouten0 commented Jul 16, 2025

This is fascinating and interesting to me, I like it.

Curious why a virtual node? If using the proper ID, then recipients can tell where they received the message from. Does it make the packets appear identical so they aren’t duplicates?

We have 3 presets in the Bay Area and I could see us using this feature.

I’m also pleased to see presets and frequencies being switched without a reboot; needing one has always frustrated me.

@erayd

erayd commented Jul 16, 2025

Contributor Author

Curious why a virtual node? If using the proper ID, then recipients can tell where they received the message from.

Three reasons:

  1. So that more than one user can send messages using that identity; and
  2. Because that way we only need to advertise nodeinfo for one node on a different frequency, rather than advertising many there; and
  3. Because by its very nature it cannot receive replies (due to listening on a different frequency / modem preset), and therefore it sends nodeinfo packets with the 'unmessageable' flag set to true. This avoids the misleading situation where a message may be received from a node identity that looks messageable (i.e. a normal personal node), but isn't.

Does it make the packets appear identical so they aren’t duplicates?

Yes. When a packet is broadcast using this mechanism, it is identical from every site that sends it.
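
This works because duplicate detection keys on the sender and packet ID (as far as we understand Meshtastic's packet history), so identical re-originations from several high sites collapse into one message at the receiver. A simplified model of that check is below; SeenPackets is a hypothetical name, and the unbounded set is purely illustrative, whereas real firmware history is bounded and expires old entries.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Simplified duplicate filter keyed on (sender node ID, packet ID)
class SeenPackets {
  public:
    // True the first time a (from, id) pair is seen, false for every repeat
    bool firstSighting(uint32_t from, uint32_t id) { return seen_.insert({from, id}).second; }

  private:
    std::set<std::pair<uint32_t, uint32_t>> seen_;
};
```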

@NomDeTom

Collaborator

@erayd have you had chance to progress with this at all?

@erayd

erayd commented Jul 27, 2025

Contributor Author

@erayd have you had chance to progress with this at all?

I'm waiting for feedback currently. It's still unclear to me whether the devs responsible for making the call actually want this functionality, and I don't want to sink a significant amount of work into generalising this feature for merge if it will never be accepted - hence why I'm asking that question here first. @GUVWAF's comments are useful from a technical standpoint, but don't answer that core query.

If there's interest in accepting this feature set (or parts of it), then I'll go ahead and implement it properly. Comments on it generally have been limited in number, but positive.

@NomDeTom

Collaborator

I raised this again in the discord (you can see what was written).

I think there's appetite for this, or at least parts of it, but it's a big PR as it stands, and each step requires some design choices. As I see it (and bear in mind I'm not a dev for this, and don't know all of the implications) it is made up of the following discrete parts:

  1. frequency shifting without reboot - has wider application, but needs work on message queue on changeover
  2. beacon messaging (motd) on a timescale as a function - needs tie-ins with the client apps, and some thought on how to prevent abuse and spam.
  3. remote control with the announcements on a PSK channel (which feels like the old Admin channel, tbh, and may come under the same criticism).

Have I got that right?

@erayd

erayd commented Jul 29, 2025

Contributor Author

I raised this again in the discord (you can see what was written).

I can't find the relevant post sorry - if you were wanting a response, could you link it please?

Have I got that right?

Not quite. The logical parts are:

  1. Frequency switching on a per-packet basis (fast, no reboot needed, has queueing implications)
  2. Virtual node (sends nodeinfo, re-originates messages received via a defined PSK channel)
  3. Fixed LONG_FAST beacon to advertise modem settings (to allow auto-discovery by users w/ default settings)

1 & 2 are implemented currently, and have been running on top of 2.6.11 on a number of our high sites for several weeks now. However, the current implementation is not suitable for merge upstream. If the interest is there, then I'm happy to do the work to implement it as a more general feature that can be upstreamed.

3 is not yet implemented, but I intend to implement it for upstreaming if (1) is accepted. Will require coordination with the client apps to be useful. I don't see it as a spam vector, as we can enforce a long interval on it, and it doesn't contain any user-configurable text (these packets need to be tiny in order to minimise airtime).
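
Enforcing a long interval could be as simple as clamping any configured value to a firmware minimum and refusing to send early. The three-hour floor below is purely a placeholder (the thread only mentions "every few hours"), and BeaconScheduler is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

constexpr uint32_t kMinBeaconIntervalSecs = 3 * 60 * 60; // placeholder floor

class BeaconScheduler {
  public:
    // Any requested interval shorter than the floor is raised to the floor
    explicit BeaconScheduler(uint32_t requestedIntervalSecs)
        : intervalSecs_(std::max(requestedIntervalSecs, kMinBeaconIntervalSecs))
    {
    }

    uint32_t intervalSecs() const { return intervalSecs_; }

    // Returns true (and records the send) only if a full interval has elapsed
    bool trySend(uint32_t nowSecs)
    {
        if (lastSentSecs_ && nowSecs - *lastSentSecs_ < intervalSecs_)
            return false;
        lastSentSecs_ = nowSecs;
        return true;
    }

  private:
    uint32_t intervalSecs_;
    std::optional<uint32_t> lastSentSecs_;
};
```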

@garthvh

garthvh commented Jul 29, 2025

Member

It is pretty complex so hard to just drop gut feedback, we often wait for @GUVWAF to go first on routing related stuff. I added @oseiler2 as a reviewer as there are potentially security issues, and there is a proposal from @geeksville that is somewhat related #7440

@Jorropo would be interested in your thoughts as well.

Probably hard for most of the core team to really dig in until after defcon and the 2.7 beta release.

@NomDeTom

NomDeTom commented Aug 4, 2025

Collaborator

I agree with @garthvh that this is very similar to #7440

@NomDeTom

Collaborator

Hey, @erayd how's this coming along? Have you had chance to look at where to draw the split lines?

@erayd

erayd commented Sep 14, 2025

Contributor Author

Hey, @erayd how's this coming along? Have you had chance to look at where to draw the split lines?

I'm still waiting for feedback - @garthvh indicated above that this would likely not be forthcoming until some point after defcon / 2.7 beta, so I'm in something of a holding pattern until I get a clear indication whether this kind of thing actually stands a chance of being merged.

Re the split lines - yes; I'm thinking the logical divisions are as per my first comment (so frequency switching, config beacon, and virtual node).

I've been working on a different feature in the meantime to address transport reliability - should have a (quite large unfortunately) PR for that this week sometime.

@erayd

erayd commented Oct 2, 2025

Contributor Author

Still all silent on the feedback front, but I've had enough people say that they want the radio switching to warrant me doing at least that part of it regardless.

I will implement the radio switching stuff independently as soon as I have the replay thing out the door, so likely in a couple of weeks. Will drop progress notes in this PR until I have a separate one ready to go for that.

@korbinianbauer

Contributor

I personally have no use for the virtual node concept. Or more accurately: The complexity of it and the amount of things that couldn't be taken for granted anymore feels overwhelming, to be honest.

However I'm convinced being able to change the radio settings (not only frequency, but also bandwidth and SF) on the fly is a great feature by itself.
It's also a prerequisite for some other good ideas that are already around.

@NomDeTom

NomDeTom commented Dec 9, 2025

Collaborator

@thebentern does this need a 2.8 tag?

@NomDeTom NomDeTom moved this from Done to In Progress in @NomDeTom's proper routing Dec 12, 2025
@github-actions github-actions Bot removed the Stale Issues that will be closed if not triaged. label Dec 14, 2025
@github-actions github-actions Bot added the Stale Issues that will be closed if not triaged. label Jan 31, 2026
@github-actions github-actions Bot closed this Feb 10, 2026
@github-project-automation github-project-automation Bot moved this from In Progress to Done in @NomDeTom's proper routing Feb 10, 2026
@erayd

erayd commented Feb 10, 2026

Contributor Author

Please reopen. This stuff is sleeping until I finish the replay feature, not dead.

@NomDeTom This appears to have been bot-marked as done in your list for some reason, and it's clearly not done...

@Komzpa

Komzpa commented May 17, 2026

Contributor

I traced the broad CI failures to two issues:

  • the branch used MeshPacket fields that are not present in the generated protobufs: nonstandard_radio_config, modem_preset, and frequency_slot
  • the current native compiler also rejects one test fixture because 7 * 1e7 / 3 * 1e7 narrows from double to int32_t
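
For context on the second failure: 1e7 is a double literal, so an expression like 7 * 1e7 is a double, and brace (aggregate or designated) initialization rejects the implicit double-to-int32_t narrowing. The sketch below assumes the fixture initialises two separate int32_t fields with 7 * 1e7 and 3 * 1e7; PositionFixture is a hypothetical stand-in for the real fixture type. If the expression were instead evaluated as one product, (7 * 1e7 / 3) * 1e7 is about 2.3e14, which does not fit in int32_t at all, so a cast would not be a valid fix in that case.

```cpp
#include <cstdint>

// Hypothetical stand-in for the fixture's position fields
struct PositionFixture {
    int32_t latitude_i;
    int32_t longitude_i;
};

// Ill-formed, because 1e7 is a double and brace initialization forbids
// implicit double -> int32_t narrowing:
//   PositionFixture bad{7 * 1e7, 3 * 1e7};

// Option 1: keep the arithmetic in integers
constexpr PositionFixture kFixtureInt{7 * 10000000, 3 * 10000000};

// Option 2: make the conversion explicit (the values are exact and in range)
constexpr PositionFixture kFixtureCast{static_cast<int32_t>(7 * 1e7), static_cast<int32_t>(3 * 1e7)};
```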

I opened a repair PR against the source branch here:

Local verification on the repair branch:

clang-format --style=file:.trunk/configs/.clang-format --dry-run --Werror src/mesh/RadioLibInterface.cpp src/modules/MeshTipsModule.cpp src/modules/MeshTipsModule.h test/test_mqtt/MQTT.cpp
git diff --check
platformio run -e coverage
platformio test -e coverage -v --junit-output-path testreport.xml

A representative heltec-v3 build could not get past local PlatformIO toolchain mirror downloads for espressif/toolchain-riscv32-esp before compilation, so I left the full embedded matrix to GitHub CI once the source branch is updated.

@NomDeTom NomDeTom mentioned this pull request Jun 3, 2026

Labels

Stale: Issues that will be closed if not triaged.
triaged: Reviewed by the team, has enough information and ready to work on now.

Projects

Status: In Progress

8 participants