Feature Request: alternative compression by replace some 2-byte symbols by 1-byte latin analogs (named Cyr2Lat) #400

Merged: zjs81 merged 11 commits into zjs81:dev from HDDen:dev on May 10, 2026

HDDen commented Apr 22, 2026

Contributor

Hello! This PR adds an alternative compression mode, which is useful when working with, for example, Cyrillic characters.

The idea is that Cyrillic characters are replaced with visually similar Latin ones, and the saving comes from the difference in encoded size: in UTF-8, Cyrillic characters take 2 bytes, while Latin characters take 1 byte. In practice, with the default dictionary, messages shrink by 18–25% with no visible difference. The freed-up space can be used either to reduce packet size or to fit more information within the limit.
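A minimal sketch of the substitution idea, assuming a small hypothetical lookalike table (the PR's actual default dictionary is not shown in this thread and may differ). The names `CYR2LAT`, `cyr2lat`, and `utf8_len` are illustrative only.

```python
# Hypothetical table: Cyrillic letter -> visually similar Latin letter.
# Keys are written as escapes so the two scripts cannot be confused.
CYR2LAT = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0443": "y",  # Cyrillic u
    "\u0445": "x",  # Cyrillic ha
}


def cyr2lat(text: str, table: dict[str, str] = CYR2LAT) -> str:
    """Replace every character found in the table; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


def utf8_len(text: str) -> int:
    """Size of the text on the wire, in UTF-8 bytes."""
    return len(text.encode("utf-8"))


original = "\u041f\u0440\u0438\u0432\u0435\u0442"  # the word "Privet" in Cyrillic
converted = cyr2lat(original)
print(utf8_len(original), utf8_len(converted))  # 12 10
```

Only two of the six letters have entries in this small table, so the word drops from 12 to 10 bytes; a larger table would save more.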

This compression mode is also fully backward-compatible with regular MeshCore users, whatever application they use. In effect, the information is not distorted.

Editing the substitution dictionary is supported: for example, you can extend support to other languages, or switch to a full-transliteration mode (when Cyrillic is transliterated entirely to Latin, compression increases to about 40%).

Cyr2Lat compression is implemented for channels and contacts. It can be enabled for each channel or contact in its settings, but when enabled, it disables SMAZ.

The PR has been reopened; dart format . and flutter analyze both pass, and the branch is based on the latest dev.

Screenshots 1 2 3

HDDen changed the title from "Feature: alternative compression by replace some 2-byte symbols by 1-byte latin analogs (named Cyr2Lat)" to "Feature Request: alternative compression by replace some 2-byte symbols by 1-byte latin analogs (named Cyr2Lat)" on Apr 22, 2026
446564 added the "Feedback Requested" label on Apr 22, 2026

gjelsoe commented Apr 22, 2026

SMAZ is most efficient with English; you might want to look at Unishox2, as it handles UTF-8 better as well.

I made a hardware implementation of it for MeshCore as a proof of concept. It can be accessed here:
meshcore-dev/MeshCore#1959

HDDen commented Apr 23, 2026

Contributor Author

> SMAZ is most efficient with English; you might want to look at Unishox2, as it handles UTF-8 better as well.

> I made a hardware implementation of it for MeshCore as a proof of concept. It can be accessed here: meshcore-dev/MeshCore#1959

In the method I propose, the changes made to the text do not distort the actual information being conveyed in any way (a client without cyr2lat support will see the original text/will still be able to read the message that you intended to send). Cyr2lat is also applicable to both private messages and channels (which, incidentally, in my experience, are used more frequently). In other words, the final message will be just as readable as before conversion – and this is extremely useful for the official Meshcore client and any third-party client. In fact, this 18–25% reduction in size is a free optimisation that complements other classic compression algorithms perfectly, as it provides less data to the archiver.

Some measurements/statistics/demonstrations:

Пpивeт! Этo тecтoвoe cooбщeниe, в кoтopoм пpимeнeнo cжaтиe киpиллицы чepeз ee пoдмeнy нa лaтиницy. - 144 bytes
Привет! Это тестовое сообщение, в котором применено сжатие кириллицы через её подмену на латиницу. - 180 bytes
Difference is 20%

Bизyaльнo, oтличий нe зaмeтнo, к тoмy жe, этo cжaтиe oбpaтнo coвмecтимo c oфициaльным клиeнтoм Meshcore - 151 bytes
Визуально, отличий не заметно, к тому же, это сжатие обратно совместимо с официальным клиентом Meshcore - 181 bytes
Difference is 16%

Koнeчнo, мoжнo nonыmamьcя дo6aвumь cuмвoлoв k cmaндapmным нacmpoйkaм, нo эmo 6yдem yжe дoвoльнo cuльнo зaмemнo... - 152 bytes
Конечно, можно попытаться добавить символов к стандартным настройкам, но это будет уже довольно сильно заметно... - 207 bytes
Difference is 26%

Libo mozhno polnostyu zamenit otpravlyaemyy tekst na translit, pri etom ego ne pridetsya nabirat vruchnuyu. Glavnoe, chtoby eto ne meshalo soobschestvu. - 152 bytes
Либо можно полностью заменить отправляемый текст на транслит, при этом его не придётся набирать вручную. Главное, чтобы это не мешало сообществу. - 266 bytes
Difference is 42%
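The percentages above can be checked from the reported byte counts. This is a small worked calculation, assuming the thread's figures are rounded down to whole percents (which matches all four samples); `savings_percent` is a hypothetical helper name.

```python
def savings_percent(original_bytes: int, converted_bytes: int) -> int:
    """Size reduction relative to the original, rounded down to a whole percent."""
    return (original_bytes - converted_bytes) * 100 // original_bytes


# (original bytes, converted bytes) as reported in the four samples above
samples = [(180, 144), (181, 151), (207, 152), (266, 152)]
results = [savings_percent(original, converted) for original, converted in samples]
print(results)  # [20, 16, 26, 42]
```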

ericszimmermann commented Apr 23, 2026

Contributor

I mean, why not? I just have a small comment to make:
In general, it is just a "search and replace before send" function, with a standard set for Cyrillic at the moment. It could be useful for other languages too.
So why not keep it more general, and change from:

if( smaz_enabled ) return smaz_if_smaller(text);
else if( cyr2lat_enabled ) return cyr2lat(text);
else return text;

to:

if( replace_enabled ) text = replace(text);
if( smaz_enabled ) text = smaz_if_smaller(text);
//else if( unishox2_enabled ) text = unishox2(text);
return text;

Best Regards Eric

HDDen commented Apr 23, 2026

Contributor Author

My view is that, since SMAZ is a dictionary-based compression method primarily designed for whole English words, replacing some characters with others will simply render it unusable for the resulting string. However, traditional compression methods (if we ever get them) will work.

I.e.:

if( smaz_enabled ) text = smaz_if_smaller(text);
else if( replace_enabled ) text = cyr2lat(text);
// if( unishox2_enabled ) text = unishox2(text);
return text;
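The two orderings discussed here can be compared side by side. This is only a sketch: `encode_exclusive` and `encode_chained` are hypothetical names, and `fake_smaz` and `fake_replace` are stand-ins that mark what was applied rather than real SMAZ or Cyr2Lat implementations (real SMAZ output would be bytes, not a string).

```python
from typing import Callable

Transform = Callable[[str], str]


def encode_exclusive(text: str, smaz_enabled: bool, replace_enabled: bool,
                     smaz: Transform, replace: Transform) -> str:
    """Either/or: SMAZ when enabled, otherwise substitution, otherwise raw text."""
    if smaz_enabled:
        return smaz(text)
    if replace_enabled:
        return replace(text)
    return text


def encode_chained(text: str, smaz_enabled: bool, replace_enabled: bool,
                   smaz: Transform, replace: Transform) -> str:
    """Chained: substitution first, then compression applied to its result."""
    if replace_enabled:
        text = replace(text)
    if smaz_enabled:
        text = smaz(text)
    return text


# Stand-ins for illustration only; they just make the applied steps visible.
def fake_smaz(text: str) -> str:
    return "<smaz:" + text + ">"


def fake_replace(text: str) -> str:
    return text.replace("\u043e", "o")  # Cyrillic o -> Latin o


sample = "\u043a\u043e\u0442"  # a three-letter Cyrillic word
print(encode_exclusive(sample, True, True, fake_smaz, fake_replace))
print(encode_chained(sample, True, True, fake_smaz, fake_replace))
```

With both options enabled, the exclusive variant hands the untouched text to the compressor, while the chained variant hands it the already substituted text, which is the case HDDen argues hurts a word-dictionary compressor.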

HDDen commented Apr 24, 2026

Contributor Author

Hello! I’ve synchronised my changes with the latest dev branch, and I’ve also added the ability to save custom dictionaries as profiles that can be assigned individually to each channel or contact.
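One way to picture per-channel and per-contact profiles is a store of named tables plus a mapping from targets to profile names. This sketch makes no claim about how the PR stores them: `SubstitutionProfiles`, the identifier strings, and both sample tables are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SubstitutionProfiles:
    # profile name -> substitution table (source character -> replacement string)
    profiles: dict[str, dict[str, str]] = field(default_factory=dict)
    # channel or contact identifier -> profile name
    assignments: dict[str, str] = field(default_factory=dict)

    def table_for(self, target_id: str) -> dict[str, str]:
        """Return the table assigned to a target, or an empty table if none."""
        name = self.assignments.get(target_id)
        if name is None:
            return {}
        return self.profiles.get(name, {})

    def apply(self, target_id: str, text: str) -> str:
        """Substitute character by character using the target's table."""
        table = self.table_for(target_id)
        return "".join(table.get(ch, ch) for ch in text)


store = SubstitutionProfiles()
store.profiles["lookalike"] = {"\u043e": "o", "\u0430": "a"}   # same-looking letters
store.profiles["translit"] = {"\u0436": "zh", "\u0448": "sh"}  # multi-letter output
store.assignments["channel:public"] = "lookalike"
store.assignments["contact:alice"] = "translit"
```

A target with no assignment gets its text back unchanged, which mirrors the feature being opt-in per channel or contact.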

Screenshots 1 2

zjs81 commented Apr 28, 2026

Owner

Hi, I'm currently reviewing this

zjs81 commented Apr 30, 2026

Owner

I've come to the conclusion that the best course of action is just to build a SMAZ dictionary for each language. I'm currently working on that. I tested the SMAZ method against your method, and SMAZ is 50% more efficient in transfer volume.

HDDen commented Apr 30, 2026

Contributor Author

> I've come to the conclusion that the best course of action is just to build a SMAZ dictionary for each language. I'm currently working on that. I tested the SMAZ method against your method, and SMAZ is 50% more efficient in transfer volume.

Hi! There’s one issue with smaz compression: clients that don’t support it (such as the official Meshcore) won’t be able to decode and view the original message. The proposed method of replacing characters, however, is backward-compatible, so the resulting message with the replaced characters remains readable.

zjs81 commented Apr 30, 2026

Owner

> I've come to the conclusion that the best course of action is just to build a SMAZ dictionary for each language. I'm currently working on that. I tested the SMAZ method against your method, and SMAZ is 50% more efficient in transfer volume.

> Hi! There’s one issue with smaz compression: clients that don’t support it (such as the official Meshcore) won’t be able to decode and view the original message. The proposed method of replacing characters, however, is backward-compatible, so the resulting message with the replaced characters remains readable.

This is acceptable, as it's the way it currently is. Also, soon official clients won't even see SMAZ text on their chat screen anymore: meshcore-dev/MeshCore#2392

We will be switching to sending these as binary blobs.

HDDen commented May 1, 2026

Contributor Author

Thank you for your comment!
Am I right in understanding that the following scenario is envisaged: if we want to compress our messages using the smaz method, the other participants in the channel we're posting to won't be able to see those messages? And in order to see them, the other users must also be using an app that supports smaz?

I'm afraid that in this case there will be a divide between participants in public channels, as many users use the official / terminal / custom apps for meshcore, and meshcore_open users will be writing "into the void" if they want to optimise their outgoing traffic. I think smaz is undoubtedly more efficient, but it's only applicable either if a local Meshcore community in a particular city is just getting started, or for private channels, or for private messages. In cities where the community is already established, however, it will be very difficult to optimise the network by promoting smaz – for example, in my city, most people use the official Meshcore client without support for any kind of optimisation, and it is much easier to offer them a client that allows them to stay in touch with everyone and enjoy the benefits of shorter outgoing packets.

The compression method I am proposing is an attempt at optimisation within the context of an established community. meshcore_open users will be able to send optimised messages to public channels without separating themselves from the rest of the chat participants. Perhaps there is some chance of implementing this functionality?

Update: Synchronised changes with the dev branch 🙏

zjs81 commented May 1, 2026

Owner

> Thank you for your comment! Am I right in understanding that the following scenario is envisaged: if we want to compress our messages using the smaz method, the other participants in the channel we're posting to won't be able to see those messages? And in order to see them, the other users must also be using an app that supports smaz?

> I'm afraid that in this case there will be a divide between participants in public channels, as many users use the official / terminal / custom apps for meshcore, and meshcore_open users will be writing "into the void" if they want to optimise their outgoing traffic. I think smaz is undoubtedly more efficient, but it's only applicable either if a local Meshcore community in a particular city is just getting started, or for private channels, or for private messages. In cities where the community is already established, however, it will be very difficult to optimise the network by promoting smaz – for example, in my city, most people use the official Meshcore client without support for any kind of optimisation, and it is much easier to offer them a client that allows them to stay in touch with everyone and enjoy the benefits of shorter outgoing packets.

> The compression method I am proposing is an attempt at optimisation within the context of an established community. meshcore_open users will be able to send optimised messages to public channels without separating themselves from the rest of the chat participants. Perhaps there is some chance of implementing this functionality?

> Update: Synchronised changes with the dev branch 🙏

My next goal is to add the word SMAZ to the top of the chat screen. When you click it, SMAZ is enabled and a message appears saying that only clients using meshcore_open, or apps that have implemented SMAZ, will be able to see these messages. It will show filled or unfilled to indicate whether it's on or off. I suppose that instead of the word SMAZ it could use an icon or something similar; then when you click it, a dialog appears with toggles for both, so you can enable SMAZ or enable this, and it could explain both.

Regular clients not being able to read SMAZ has been the case since the app's conception. The only new thing is that regular users wouldn't see the base64 anymore. Another cool thing is that using binary will make the SMAZ messages even smaller.

What do you think about what I proposed?

HDDen commented May 1, 2026

Contributor Author

I think SMAZ is a very good way to compress messages, but it's difficult to apply to public channels with a mixed audience (meshcore_open users on one side and users of all other applications on the other). This means that public channels will remain without any optimisation until either:

  1. Liam adds full compression at the firmware level, or
  2. SMAZ is included in the official (and subsequently other) apps.

One example of a fork of your app created in an attempt to optimise sent text by changing the encoding is «Repachat». This is an iOS client in which, when the character limit was exceeded, the encoding switched from UTF-8 to UCF, where all characters are 1 byte in size. This fork did not catch on in our city community: users of the standard apps saw unreadable messages, which caused discomfort for both readers and, ultimately, the sender, so people preferred to use the classic official version.

From this, we can conclude that people want to optimise their messages (if only to fit more information within the character limit), but at the same time they want other channel members to be able to read these messages as well.

In fact, the proposed method is intended for channels with a mixed audience, where one does not want to lose the connection between its segments. For private channels and DMs, SMAZ is recommended.

Personally, I use meshcore_open because it offers a more informative view of packet flow, clearer statistics, and, in some respects, more convenient control. And I wouldn’t want to lose contact with the rest of the network, just as I’d like to be able to optimise my own network load.

gjelsoe commented May 2, 2026

> I've come to the conclusion that the best course of action is just to build a SMAZ dictionary for each language. I'm currently working on that. I tested the SMAZ method against your method, and SMAZ is 50% more efficient in transfer volume.

Perhaps this will work better than SMAZ and Unishox2, but it will most likely need to be handled by the app.
Unishox2 can be handled by an ESP32 with no need for a dictionary.

https://github.com/dimapanov/mesh-compressor

zjs81 commented May 10, 2026

Owner

I think we can merge this for now and re-evaluate in the future.

zjs81 merged commit 3ec3b05 into zjs81:dev on May 10, 2026
6 checks passed