Skip to content

Enhancement - optional transparent homoglyph encoding of a few characters in certain languages for more compact and efficient text messages#4491

Merged
jamesarich merged 10 commits into
meshtastic:mainfrom
LuigiVampa92:enhancement/homoglyph-encoding-for-text-messages
Feb 7, 2026
Merged

Enhancement - optional transparent homoglyph encoding of a few characters in certain languages for more compact and efficient text messages#4491
jamesarich merged 10 commits into
meshtastic:mainfrom
LuigiVampa92:enhancement/homoglyph-encoding-for-text-messages

Conversation

@LuigiVampa92

Copy link
Copy Markdown
Contributor

Hi everyone. First of all - thank you for an awesome project, I have been using it for a while and it works great, fun to tinker with, and very useful to solve some of my real life problems. I express my great gratitude to all the contributors.

I would like to suggest a small enhancement to the project.

The general idea - we can reduce the binary size of the text message by replacing some 2 byte characters to 1 byte characters without actually losing anything in the process.


The basic problem this change solves is this:

The text message that can be transmitted by Meshtastic has a maximum size limit, depending on the preset. In case of usual LongFast preset it is 200 characters. This is a reasonable amount for a short message. The thing is - it not literally the symbol count that matters, it is its binary size, and this is where the things get tricky.

In UTF8 unicode the characters of the latin alphabet occupy exactly 1 byte each, because they fit into the single byte without its highest bit (less than 128), and that gives us the originally intended capacity of 200 symbols for a single message. However, in the various languages their national alphabets use different non latin characters, and in their case each character occupies at least 2 bytes, which reduces the effective maximum transmitted message size by half - only to 100 non-latin symbols per a single message. In practice it is around 115-120 * characters (200/2 = 100 + some characters are spaces and special signs that occupy only 1 byte), but still that is not too much, and sometimes that is not enough to combine a message with a single complete thought, so it would be nice to have some more available capacity for that.


So the idea I came with is this:

In various national alphabets there are letters that basically repeat latin letters, but they still have their own dedicated character codes in unicode.

For instance, in cyrillic alphabets (also in greek and a few others) many characters have exactly the same visual appearance as the latin characters, but due to usage of their dedicated character codes they occupy 2 bytes each, though such letters can be easily replaced with their latin analogues without any loss of their meaning or visual change.

Some example of such cyrillic letters: "А", "В", "С", "Е", etc. So they can naturally be replaced with corresponding latin "A", "B", "C", "E" and we could save one byte per each of their occasion in the message.

These are fairly common symbols in cyrillic alphabtes - half of the vowels and very frequently used consonants. According to statistics, they form about 20-25% of all letters in the average text. Replacing them with latin characters will save the user the same proportion of available message space and can fit more letters into the message or to deliver the same text message with higher reliability (because it will become shorter). The average transmitted message volume can then be increased from ~115-120 characters to ~140-145 and sometimes even more.


I gonna give you a couple of examples:

(1) Let's take a look at the cyrillic text "Мештастик - это проект с открытым исходным кодом" ("Meshtastic is an open source project"). The original byte size of this message is 88 bytes: 40 cyrillic letters, 2 bytes each, + 7 spaces, 1 byte each + 1 special character, also 1 byte). By replacing the cyrillic characters with their corresponding latin character homoglypghs we will get this: "Meштacтик - этo пpoeкт c oткpытым иcxoдным кoдoм". It looks absolutely the same, but now it is just 72 bytes, because we have replaced letters "M", "e", "a", "c", "o", "p", "x" to latin and now we have only 24 cyrillic letters, 2 bytes each, 16 latin letters, 1 byte each, and spaces and special characters remain the same - 7 and 1 respectfully. So as you can see we have saved 16 bytes by using 16 latin letters instead of cyrillic, and that is roughly 20% - not bad.

Let's look at another example:

(2) Let's take this cyrillic text this time - "Всему свое время" ("All in good time"). The original is 30 bytes: 14 cyrillic letters, 2 bytes each + 2 spaces. After the homoglyph transform: "Bceмy cвoe вpeмя" is now just 21 bytes: just 5 cyrillic letters, 2 bytes each, 9 latin letters and 2 spaces. We have replaced "Bce y c oe pe" this time, 9 bytes saved, which is 30% of the message binary size this time.


In short - it is great! I have been using this for a while with my personal node, it works flawlessly, and I thought that it is a good idea to contribute this to the main project code.

So this pull request contain the following changes:
(1) I have added a new toggle to the settings tab, to the "application" group. The feature is can be turned on and off there and is disabled by default.
(2) I have added a new prefs file that stores the current state of this setting - whether it is turned off or on
(3) I have added the object that has a static method that accepts a string as a parameters and returns the transformed string
(4) In the Message implementation I check whether this feature is enabled and if it is, then all the characters from predefined list are replaced by corresponding latin characters

For now the replaced characters are from the basic cyrillic block (U+0400-U+04FF), and are used mostly in the languages of the eastern Europe, Balkan, some central Asian countries, etc, but maybe there are something else that can be handled the same way. I have left all the links to unicode characters descriptions in the comments inside the transformer implementation.

Please note that only 100% "reliable", completely visually identical characters are replaced. The characters that look like latin but have minor differences like various descenders, hooks, strokes, etc are not replaced with "simplified" latin appearance and will remain 2 byte unicode, as usual (though maybe in future it might make sense to make some "aggressive" mode that will replace them as well)

Here are some screenshots of how it works:


settings_screen_update
(settings screen update)


And some examples of me typing messages with disabled and enabled feature respectfully


Example 1:

Before After
text_01_cyrillic text_01_homoglyph

Example 2:

Before After
text_02_cyrillic text_02_homoglyph

Example 3:

Before After
text_03_long_cyrillic text_03_long_homoglyph

I have added a basic test that covers this implementation. Spotless, Detekt and tests are all passed, everything is ok. Hope I didn't miss anything.

I hope that you will consider merging this enhancement, since it is a huge iprovement for those who use languages that use cyrillic alphabets.

Thank you again!

@CLAassistant

CLAassistant commented Feb 7, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@DaneEvans

Copy link
Copy Markdown
Collaborator

That is the best MR overview I've read in a long time.
I'll take a look at the code itself, but it sounds like a good idea

@codecov

codecov Bot commented Feb 7, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 60.37736% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 10.79%. Comparing base (cab3940) to head (f088814).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
.../meshtastic/core/prefs/homoglyph/HomoglyphPrefs.kt 0.00% 11 Missing ⚠️
...g/meshtastic/feature/messaging/MessageViewModel.kt 0.00% 7 Missing ⚠️
...tic/feature/settings/radio/RadioConfigViewModel.kt 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##            main    #4491      +/-   ##
=========================================
+ Coverage   9.21%   10.79%   +1.58%     
=========================================
  Files        427      429       +2     
  Lines      14351    14408      +57     
  Branches    2389     2389              
=========================================
+ Hits        1322     1555     +233     
+ Misses     12776    12558     -218     
- Partials     253      295      +42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@DaneEvans DaneEvans left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good implementation too,
A few minor changes, but this looks pretty good to merge.

Comment thread core/strings/src/commonMain/composeResources/values/strings.xml Outdated
…for strings with some amount of homoglyphs, only homoglyphs, no homoglyphs, only latin characters, and only non-latin characters that are impossible to be presented by latin homoglyphs (PR 4491)

@jamesarich jamesarich left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good on my end, thanks for the thorough writeup and clean submission.
I'll ask the robot to do a review pass, but it shouldn't come up with much 👍

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an optional, user-controlled “homoglyph encoding” feature intended to reduce UTF-8 byte size of Cyrillic text messages by substituting visually identical ASCII characters, improving effective message capacity within byte limits.

Changes:

  • Adds a new application settings toggle for “Compact encoding for Cyrillic” backed by a new HomoglyphPrefs SharedPreferences store.
  • Introduces a HomoglyphCharacterStringTransformer and wires preference state into messaging UI/view models.
  • Adds unit tests for the transformer and updates feature/messaging test dependencies.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
feature/settings/src/main/kotlin/org/meshtastic/feature/settings/radio/RadioConfigViewModel.kt Injects homoglyph prefs and exposes a Flow/toggle API for Settings UI.
feature/settings/src/main/kotlin/org/meshtastic/feature/settings/SettingsScreen.kt Adds a switch in App Settings to enable/disable homoglyph encoding.
feature/messaging/src/main/kotlin/org/meshtastic/feature/messaging/MessageViewModel.kt Injects homoglyph prefs and exposes an enabled Flow to the UI layer.
feature/messaging/src/main/kotlin/org/meshtastic/feature/messaging/Message.kt Uses the pref to alter byte-counting behavior in the message input UI.
feature/messaging/src/main/kotlin/org/meshtastic/feature/messaging/HomoglyphCharacterStringTransformer.kt New transformer implementing Cyrillic→ASCII substitutions.
feature/messaging/src/test/kotlin/org/meshtastic/feature/messaging/HomoglyphCharacterTransformTest.kt New unit tests covering transformer behavior and byte-size shrinkage claims.
feature/messaging/build.gradle.kts Adds JUnit/MockK test dependencies for the messaging module.
core/strings/src/commonMain/composeResources/values/strings.xml Adds the user-facing setting label string.
core/prefs/src/main/kotlin/org/meshtastic/core/prefs/homoglyph/HomoglyphPrefs.kt New preference interface + implementation for the toggle.
core/prefs/src/main/kotlin/org/meshtastic/core/prefs/di/PrefsModule.kt Registers homoglyph prefs + SharedPreferences provider in Hilt DI.
Comments suppressed due to low confidence (1)

feature/messaging/src/main/kotlin/org/meshtastic/feature/messaging/Message.kt:479

  • Homoglyph encoding is only used to compute canSend/byte length, but the text that gets sent is still messageInputState.text (raw). This can enable sending messages that exceed maxByteSize and the optimization won't actually be applied to transmitted content. Apply the same transformation to the string passed to SendMessage (or move the transformation into the send pipeline, e.g., MessageViewModel.sendMessage, so all send paths are covered).
            MessageInput(
                isEnabled = connectionState.isConnected(),
                isHomoglyphEncodingEnabled = homoglyphEncodingEnabled,
                textFieldState = messageInputState,
                onSendMessage = {
                    val messageText = messageInputState.text.toString().trim()
                    if (messageText.isNotEmpty()) {
                        onEvent(MessageScreenEvent.SendMessage(messageText, replyingToPacketId))
                    }

Comment thread feature/messaging/build.gradle.kts Outdated
@LuigiVampa92

Copy link
Copy Markdown
Contributor Author

All good on my end, thanks for the thorough writeup and clean submission. I'll ask the robot to do a review pass, but it shouldn't come up with much 👍

Hi, James, thanks for the reply! I see that Copilot has raised some fair points, like the extra de facto unused dependencies I have copy-pasted from other modules, and a few other things. I gonna fix that and reply here again

…ment of cyrillic capital letter "ze" with a digit (PR 4491)
…RadioConfigState data class. Made homoglyphEncodingEnabledFlow immutable (PR 4491)
…sure that the feature is effective and consistent across indirect approaches to sending messages as well (quick-chat, reply, etc.) (PR 4491)
@LuigiVampa92

Copy link
Copy Markdown
Contributor Author

@DaneEvans @jamesarich I went through the list of issues that Copilot highlighted and committed the fixes.

I would be grateful if you could take a look and give feedback when it is convenient for you. Thanks!

@jamesarich jamesarich left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks again @LuigiVampa92 ! 👍

@jamesarich jamesarich added this pull request to the merge queue Feb 7, 2026
Merged via the queue into meshtastic:main with commit 4303bfa Feb 7, 2026
8 checks passed
@LuigiVampa92

Copy link
Copy Markdown
Contributor Author

Looks good, thanks again @LuigiVampa92 ! 👍

It is a pleasure :) Thank you again for your work. I hope that I will come up with some more ideas in future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants