Mesh-wide, coordinated zero-hop 'virtual node' broadcast with multi-mode & multi-frequency capability#7183
erayd wants to merge 1 commit into
Note that this commit has details hardcoded for the Wellington (NZ) mesh, and also requires the following patch to the protobufs:
-----
diff --git a/meshtastic/mesh.proto b/meshtastic/mesh.proto
index 03162d8..ec54c99 100644
--- a/meshtastic/mesh.proto
+++ b/meshtastic/mesh.proto
@@ -1393,6 +1393,21 @@ message MeshPacket {
    * Set by the firmware internally, clients are not supposed to set this.
    */
   uint32 tx_after = 20;
+
+  /*
+   * The modem preset to use for this packet
+   */
+  uint32 modem_preset = 21;
+
+  /*
+   * The frequency slot to use for this packet
+   */
+  uint32 frequency_slot = 22;
+
+  /*
+   * Whether the packet has a nonstandard radio config
+   */
+  bool nonstandard_radio_config = 23;
 }
 
 /*
-----
|
If this were generalised for merge, I can see it logically splitting into the following:

Preset / Frequency / Channel Switching
Integrated into the core, mostly in … May require some changes to the queuing behaviour for optimal performance (as opposed to my current version, which just treats any queuing delay as acceptable).

Virtual Node
Independent module, with separate protobuf for config. Config options would be:
Should there be a configurable hop limit for this (perhaps in the vector list)? Or should it be forced to always zero?

Config Beacon
Independent module, with a new app type & associated protobuf. At a configured interval, would switch to the default LONG_FAST slot and broadcast a zero-hop packet containing the following. After transmitting, the node would then return immediately to its normal radio settings.
Without overhead, the example packet above would fit into 21 bytes. I'm unsure how much overhead protobuf encoding would incur for the above - I'm not familiar enough with its wire format - but for this kind of payload raw encoding would be trivial if that encoding overhead is a concern (is it?).

IMO this is small enough that with a sensible interval, it could reasonably scale to a very high number of nodes, even in a dense event mesh - especially as it's a zero-hop packet. Keeping it small does matter, because we don't have a good picture of what the default channel util looks like (because we may be sending beacons to a modem config that we aren't normally resident on), and therefore cannot auto-throttle.

Given that these are all fixed-length fields, backwards-compatibility can be maintained for raw encoding without a version bump in the event of future field changes by simply appending any new field data to the end of the existing packet (or using the reserved bits).

Assuming it were enabled by default, which IMO would be a good idea, this would allow new users to automatically discover which settings are in use by other nearby nodes, without any separate coordination mechanism being required. This would require app support, but could allow a simple autoconfiguration mechanism that presents a list like the following, and allows users to tap one to join that particular mesh. If the hash of the tapped channel indicates a non-default key, the app could also prompt the user to enter the key.
Sending only the info above allows these packets to remain very small (and therefore to impose minimal airtime overhead on the default out-of-the-box LONG_FAST setup), and sending them zero-hop ensures that the user is presented with only the nodes that are legitimately within direct range. Including the device role & rebroadcast hints means that the user can also see at a glance whether there is a nearby device that will forward traffic. In the example above, the user might see that the ShortFast item that is displayed includes three nodes that will forward traffic (one router, two clients), but JoesSensors does not have any. |
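The append-only compatibility rule for raw encoding described above can be sketched roughly as follows. This is only an illustration, not the packet layout proposed in this PR: the `ConfigBeacon` struct, its four one-byte fields, and the `encodeBeacon` / `decodeBeacon` helpers are all hypothetical names chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical beacon fields, for illustration only (not the layout from this PR).
struct ConfigBeacon {
    uint8_t modemPreset;   // assumed: index of the modem preset in use
    uint8_t frequencySlot; // assumed: frequency slot number
    uint8_t channelHash;   // assumed: hash of the channel name + key
    uint8_t roleAndFlags;  // assumed: device role plus rebroadcast hint bits
};

// Number of bytes occupied by the fields this decoder knows about.
constexpr size_t kBeaconKnownSize = 4;

// Write the fields in a fixed order, one byte each.
inline std::array<uint8_t, kBeaconKnownSize> encodeBeacon(const ConfigBeacon &b)
{
    std::array<uint8_t, kBeaconKnownSize> out = {
        {b.modemPreset, b.frequencySlot, b.channelHash, b.roleAndFlags}};
    return out;
}

// Accept any payload at least as long as the known fields. Bytes appended by
// newer firmware are simply ignored, so no version field is required.
inline bool decodeBeacon(const uint8_t *buf, size_t len, ConfigBeacon &out)
{
    if (buf == nullptr || len < kBeaconKnownSize)
        return false;
    out.modemPreset = buf[0];
    out.frequencySlot = buf[1];
    out.channelHash = buf[2];
    out.roleAndFlags = buf[3];
    return true;
}
```

The decoder only rejects payloads that are too short, which is what lets older nodes keep parsing beacons after new fields are appended to the end.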
GUVWAF
left a comment
Interesting idea for sure. I left two comments.
| #if !MESHTASTIC_EXCLUDE_TIPS | ||
| } else if (MeshTipsModule::configureRadioForPacket(this, txp)) { | ||
| // We just switched radio config, so wait to ensure the new channel is available | ||
| setTransmitDelay(); |
This is probably fine for the "Tips" broadcast, but during the switching time and further delays from e.g. CSMA/CA, you'll be on the non-default frequency and hence you cannot receive from the default frequency. Have you measured how long the switching takes such that we can optimize the delay?
Furthermore, during this time packets in the queue can get reordered or canceled, so you might be switching around without actually transmitting on the other frequency. I think in this period we should avoid changing the packet in front of the queue (except for canceling it).
During... you'll be on the non-default frequency & cannot receive.
Yes. This is an expected tradeoff of using the feature - the radio can only do one thing at a time.
Have you measured how long the switching takes such that we can optimize the delay?
No formal measurement. However, the delay is quite small (milliseconds), and IMO reasonable to consider part of the tradeoff. Would formal measurements be useful? Happy to spend some time benchmarking radio times if so.
I think in this period we should avoid changing the packet in front of the queue (except for canceling it).
This is an extremely good point, and I agree - thanks for the suggestion.
the delay is quite small (milliseconds), and IMO reasonable to consider part of the tradeoff
Milliseconds is fine indeed, provided people are at least aware of this tradeoff.
If it's really so small, I'm wondering if adding a normal delay() here would be better then, to avoid any changes to the queue in the meantime?
Possibly - I don't have any strong opinions about which mechanism is used, so long as there is enough time for it to listen and check whether the channel is busy before it starts trying to transmit.
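One possible shape for the "don't change the packet in front of the queue, except for cancelling it" idea is sketched below. It is a standalone illustration under stated assumptions, not the firmware's actual TX queue: `QueuedPacket`, `PinnedHeadQueue`, and `pinHead()` are hypothetical names, and the priority ordering is simplified.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a queued packet; the firmware uses meshtastic_MeshPacket.
struct QueuedPacket {
    uint32_t id;
    uint8_t priority; // higher value is sent first
};

class PinnedHeadQueue
{
  public:
    // Insert by priority, but never ahead of a pinned head.
    void enqueue(const QueuedPacket &p)
    {
        size_t pos = (headPinned && !q.empty()) ? 1 : 0;
        while (pos < q.size() && q[pos].priority >= p.priority)
            ++pos;
        q.insert(q.begin() + static_cast<std::ptrdiff_t>(pos), p);
    }

    // Call once the radio has been retuned for the packet at the front.
    void pinHead() { headPinned = !q.empty(); }

    // Cancelling is still allowed, even for the pinned head.
    bool cancel(uint32_t id)
    {
        for (size_t i = 0; i < q.size(); ++i) {
            if (q[i].id == id) {
                if (i == 0)
                    headPinned = false; // caller would restore the default radio config here
                q.erase(q.begin() + static_cast<std::ptrdiff_t>(i));
                return true;
            }
        }
        return false;
    }

    // Remove the front packet for transmission and release the pin.
    bool dequeue(QueuedPacket &out)
    {
        if (q.empty())
            return false;
        out = q.front();
        q.pop_front();
        headPinned = false;
        return true;
    }

  private:
    std::deque<QueuedPacket> q;
    bool headPinned = false;
};
```

With the head pinned, a higher-priority packet arriving during the switch delay lands in second place, so the radio is never retuned for a packet that then gets displaced.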
| meshtastic_MeshPacket *p_LF20 = packetPool.allocCopy(*p); | ||
| service->sendToMesh(p, RX_SRC_LOCAL, false); | ||
|
|
||
| p_LF20->frequency_slot = 20; |
Adding this and the modem preset on the MeshPacket will be rather costly as it's stored everywhere. I think it's better to only mark it as "non-standard" and then fetch the config from this module. If you want to support multiple configs, we could make it an enum, or maybe extend channel beyond the 8 logical channels.
Storing it in config is not IMO flexible enough; it's designed to support any arbitrary preset+slot combo that the message on the tips channel indicates. However, the per-packet overhead could be minimised by storing a lookup table for non-standard packets (instead of additional attributes on the MeshPacket protobuf), thus ensuring that the overhead exists only for those packets which have a nonstandard egress mode. Would this be a suitable compromise?
Sorry, I'm not following, how would the lookup table method work?
However, I now understand better how this is intended. You would use this to instruct another node to rebroadcast it on the modem preset/frequency slot you indicate, right? I thought it was for instructing it from your phone as handleReceived() is also called for packets coming from the phone, but only for DMs currently:
Line 186 in 29893e0
Sorry, I'm not following, how would the lookup table method work?
if (p->nonstandard_radio_config) {
    getPresetFromLookupTable();
    getFrequencySlotFromLookupTable();
    doTheThing();
}
The lookup table would just be an array of the same size as the TX queue max length, containing struct {packet_from, packet_id, preset, slot}. It means not incurring the memory overhead for every packet needed to store a preset & slot, but very slightly increased CPU usage (only for nonstandard packets) due to doing an up-to-16-item iteration through the array to find the right settings.
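A minimal sketch of such a lookup table, assuming a 16-entry TX queue as mentioned above. The names `RadioOverride`, `RadioOverrideTable`, and `kMaxTxQueue` are hypothetical and do not exist in the firmware; the real version would key on the `from` and `id` fields of the queued MeshPacket.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed to match the maximum TX queue length (16, per the description above).
constexpr size_t kMaxTxQueue = 16;

struct RadioOverride {
    uint32_t from = 0;  // originating node number of the packet
    uint32_t id = 0;    // packet id
    uint8_t preset = 0; // modem preset to transmit with
    uint8_t slot = 0;   // frequency slot to transmit on
    bool used = false;  // whether this entry is occupied
};

class RadioOverrideTable
{
  public:
    // Record the settings for one queued packet; fails if the table is full.
    bool put(uint32_t from, uint32_t id, uint8_t preset, uint8_t slot)
    {
        for (auto &e : entries) {
            if (!e.used) {
                e.from = from;
                e.id = id;
                e.preset = preset;
                e.slot = slot;
                e.used = true;
                return true;
            }
        }
        return false;
    }

    // Linear scan, at most kMaxTxQueue comparisons; nullptr if not found.
    const RadioOverride *find(uint32_t from, uint32_t id) const
    {
        for (const auto &e : entries) {
            if (e.used && e.from == from && e.id == id)
                return &e;
        }
        return nullptr;
    }

    // Free the entry once the packet has been sent or cancelled.
    void remove(uint32_t from, uint32_t id)
    {
        for (auto &e : entries) {
            if (e.used && e.from == from && e.id == id)
                e.used = false;
        }
    }

  private:
    std::array<RadioOverride, kMaxTxQueue> entries{};
};
```

Only the single `nonstandard_radio_config` flag would then need to live on the packet itself; the preset and slot cost memory only for the entries in this fixed-size table.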
You would use this to instruct another node to rebroadcast it on the modem preset/frequency slot you indicate, right?
Correct. The point is to simultaneously instruct all our high sites to do this, so that we can hit our entire coverage footprint with a zero-hop packet on an arbitrary preset / slot.
Currently, we are using it to periodically send the following message on the LONG_FAST default slot (20):
"Did you know that most of our mesh uses a different radio setting to yours? For best connectivity, set your modem preset to SHORT_FAST, frequency slot 16, channel name ShortFast."
Zero-hop means it cannot leak outside our coverage footprint, and will not result in any other nodes attempting to rebroadcast it. It also guarantees that anybody receiving it is almost certainly within range of one of our SF16 high sites. The only situation where they would not be, but would still receive that message, is if they are at the end of a very borderline signal path where a LONG_FAST signal can make it through, but a SHORT_FAST one can't.
The packet path goes:
- $MY_CLIENT -> (SF16, Tips channel); then
- $NEAREST_HIGH_SITE -> (SF16, Tips channel); then
- $ALL_TIPS-CHANNEL_HIGH_SITES -> (LF20, LongFast channel, 0-hop); then
- $ALL_LF20_NODES_IN_RANGE
Thanks for the explanation, sounds good. Considering that this is rather uncommon to be used, I would go for an approach like this.
While the encoded protobuf will not contain unused fields, the decoded (i.e. as the MeshPacket struct) version, which is used in several queues in the firmware, will need to allocate memory for the fields.
|
Definitely seems like a significant lack of feedback here, other than from @GUVWAF. Is there any objection if I clean this up and generalise it for proper inclusion in the firmware? Or is it preferred for me to maintain this as an independent patchset for just our local mesh? |
|
This is fascinating and interesting to me, I like it. Curious why a virtual node, though? If using the proper ID then recipients can tell where they received the message from. Does it make the packets appear identical so they aren’t duplicates? We have 3 presets in the Bay Area and I could see us using this feature. I’m also pleased to see switching of presets and frequencies without a reboot, which has always frustrated me. |
Three reasons:
Yes. When a packet is broadcast using this mechanism, it is identical from every site that sends it. |
|
@erayd have you had a chance to progress with this at all? |
I'm waiting for feedback currently. It's still unclear to me whether the devs responsible for making the call actually want this functionality, and I don't want to sink a significant amount of work into generalising this feature for merge if it will never be accepted - hence why I'm asking that question here first. @GUVWAF's comments are useful from a technical standpoint, but don't answer that core query. If there's interest in accepting this feature set (or parts of it), then I'll go ahead and implement it properly. Comments on it generally have been limited in number, but positive. |
|
I raised this again in the discord (you can see what was written). I think there's appetite for this, or at least parts of it, but it's a big PR as it stands, and each step requires some design choices. As I see it (and bear in mind I'm not a dev for this, and don't know all of the implications) it is made up of the following discrete parts:
Have I got that right? |
I can't find the relevant post sorry - if you were wanting a response, could you link it please?
Not quite. The logical parts are:
1 & 2 are implemented currently, and have been running on top of 2.6.11 on a number of our high sites for several weeks now. However, the current implementation is not suitable for merge upstream. If the interest is there, then I'm happy to do the work to implement it as a more general feature that can be upstreamed.

3 is not yet implemented, but I intend to implement it for upstreaming if (1) is accepted. It will require coordination with the client apps to be useful. I don't see it as a spam vector, as we can enforce a long interval on it, and it doesn't contain any user-configurable text (these packets need to be tiny in order to minimise airtime). |
|
It is pretty complex so hard to just drop gut feedback, we often wait for @GUVWAF to go first on routing related stuff. I added @oseiler2 as a reviewer as there are potentially security issues, and there is a proposal from @geeksville that is somewhat related #7440 @Jorropo would be interested in your thoughts as well. Probably hard for most of the core team to really dig in until after defcon and the 2.7 beta release. |
|
Hey, @erayd how's this coming along? Have you had a chance to look at where to draw the split lines? |
I'm still waiting for feedback - @garthvh indicated above that this would likely not be forthcoming until some point after defcon / 2.7 beta, so I'm in something of a holding pattern until I get a clear indication whether this kind of thing actually stands a chance of being merged. Re the split lines - yes; I'm thinking the logical divisions are as per my first comment (so frequency switching, config beacon, and virtual node). I've been working on a different feature in the meantime to address transport reliability - should have a (quite large unfortunately) PR for that this week sometime. |
|
Still all silent on the feedback front, but I've had enough people say that they want the radio switching to warrant me doing at least that part of it regardless. I will implement the radio switching stuff independently as soon as I have the replay thing out the door - so likely in a couple of weeks from now. Will drop progress notes in this PR until I have a separate one ready to go for that. |
|
I personally have no use for the virtual node concept. Or more accurately: The complexity of it and the amount of things that couldn't be taken for granted anymore feels overwhelming, to be honest. However I'm convinced being able to change the radio settings (not only frequency, but also bandwidth and SF) on the fly is a great feature by itself. |
|
@thebentern does this need a 2.8 tag? |
|
Please reopen. This stuff is sleeping until I finish the replay feature, not dead. @NomDeTom This appears to have been bot-marked as done in your list for some reason, and it's clearly not done... |
|
I traced the broad CI failures to two issues:
I opened a repair PR against the source branch here:
Local verification on the repair branch: A representative |
I have recently created a firmware feature that is intended to help new users on our mesh, and wanted to check if there's any interest in having this (or some subset of it) merged upstream - is there? If so, I'll tidy it up into a more general form suitable for merging (e.g. no static config). If not, I'll just maintain it as a separate patch. It's currently installed on a number of our high sites on a 2.6.11 base for testing.
Please note that the code in this PR as it exists currently is not intended to be directly merged; rather it's a starting point for testing & discussion about what aspects of this work, if any, are desirable upstream.
The Feature:
The Reasons:
Sending messages as zero-hop prevents them from leaking - either to other neighbouring meshes, or to another radio mode across our SF/LF bridges. It means we can catch new users, and users from out of town, without spamming the rest of the mesh about stuff that they already know.
Is there any appetite to have this merged as a mainline feature, either in part or in full? Based on the earlier discord discussion, it seems like at least the mode switching part may be independently useful.
See also the discord thread here
Not currently implemented in this PR, but could be enabled by the mode-switch stuff: there's the potential to build an "I am over here" feature on top of it, so that nodes could e.g. every few hours briefly jump over to the default LONG_FAST slot, send an "I am using these settings" packet, and then jump back. It would be a different, 'discovery' packet type, not a normal nodeinfo or text message. The upshot would be that, assuming it's enabled by default, new users could then automatically discover nodes in their area even if those are using different modem settings, and would know where to go without any central coordination needed to send them there.
Please note that this PR requires the following protobuf changes in order to work: