Enhancement - optional transparent homoglyph encoding of a few characters in certain languages for more compact and efficient text messages by LuigiVampa92 · Pull Request #4491 · meshtastic/Meshtastic-Android

LuigiVampa92 · 2026-02-07T01:24:02Z

Hi everyone. First of all - thank you for an awesome project, I have been using it for a while and it works great, fun to tinker with, and very useful to solve some of my real life problems. I express my great gratitude to all the contributors.

I would like to suggest a small enhancement to the project.

The general idea - we can reduce the binary size of the text message by replacing some 2 byte characters to 1 byte characters without actually losing anything in the process.

The basic problem this change solves is this:

The text message that can be transmitted by Meshtastic has a maximum size limit, depending on the preset. In case of usual LongFast preset it is 200 characters. This is a reasonable amount for a short message. The thing is - it not literally the symbol count that matters, it is its binary size, and this is where the things get tricky.

In UTF8 unicode the characters of the latin alphabet occupy exactly 1 byte each, because they fit into the single byte without its highest bit (less than 128), and that gives us the originally intended capacity of 200 symbols for a single message. However, in the various languages their national alphabets use different non latin characters, and in their case each character occupies at least 2 bytes, which reduces the effective maximum transmitted message size by half - only to 100 non-latin symbols per a single message. In practice it is around 115-120 * characters (200/2 = 100 + some characters are spaces and special signs that occupy only 1 byte), but still that is not too much, and sometimes that is not enough to combine a message with a single complete thought, so it would be nice to have some more available capacity for that.

So the idea I came with is this:

In various national alphabets there are letters that basically repeat latin letters, but they still have their own dedicated character codes in unicode.

For instance, in cyrillic alphabets (also in greek and a few others) many characters have exactly the same visual appearance as the latin characters, but due to usage of their dedicated character codes they occupy 2 bytes each, though such letters can be easily replaced with their latin analogues without any loss of their meaning or visual change.

Some example of such cyrillic letters: "А", "В", "С", "Е", etc. So they can naturally be replaced with corresponding latin "A", "B", "C", "E" and we could save one byte per each of their occasion in the message.

These are fairly common symbols in cyrillic alphabtes - half of the vowels and very frequently used consonants. According to statistics, they form about 20-25% of all letters in the average text. Replacing them with latin characters will save the user the same proportion of available message space and can fit more letters into the message or to deliver the same text message with higher reliability (because it will become shorter). The average transmitted message volume can then be increased from ~115-120 characters to ~140-145 and sometimes even more.

I gonna give you a couple of examples:

(1) Let's take a look at the cyrillic text "Мештастик - это проект с открытым исходным кодом" ("Meshtastic is an open source project"). The original byte size of this message is 88 bytes: 40 cyrillic letters, 2 bytes each, + 7 spaces, 1 byte each + 1 special character, also 1 byte). By replacing the cyrillic characters with their corresponding latin character homoglypghs we will get this: "Meштacтик - этo пpoeкт c oткpытым иcxoдным кoдoм". It looks absolutely the same, but now it is just 72 bytes, because we have replaced letters "M", "e", "a", "c", "o", "p", "x" to latin and now we have only 24 cyrillic letters, 2 bytes each, 16 latin letters, 1 byte each, and spaces and special characters remain the same - 7 and 1 respectfully. So as you can see we have saved 16 bytes by using 16 latin letters instead of cyrillic, and that is roughly 20% - not bad.

Let's look at another example:

(2) Let's take this cyrillic text this time - "Всему свое время" ("All in good time"). The original is 30 bytes: 14 cyrillic letters, 2 bytes each + 2 spaces. After the homoglyph transform: "Bceмy cвoe вpeмя" is now just 21 bytes: just 5 cyrillic letters, 2 bytes each, 9 latin letters and 2 spaces. We have replaced "Bce y c oe pe" this time, 9 bytes saved, which is 30% of the message binary size this time.

In short - it is great! I have been using this for a while with my personal node, it works flawlessly, and I thought that it is a good idea to contribute this to the main project code.

So this pull request contain the following changes:
(1) I have added a new toggle to the settings tab, to the "application" group. The feature is can be turned on and off there and is disabled by default.
(2) I have added a new prefs file that stores the current state of this setting - whether it is turned off or on
(3) I have added the object that has a static method that accepts a string as a parameters and returns the transformed string
(4) In the Message implementation I check whether this feature is enabled and if it is, then all the characters from predefined list are replaced by corresponding latin characters

For now the replaced characters are from the basic cyrillic block (U+0400-U+04FF), and are used mostly in the languages of the eastern Europe, Balkan, some central Asian countries, etc, but maybe there are something else that can be handled the same way. I have left all the links to unicode characters descriptions in the comments inside the transformer implementation.

Please note that only 100% "reliable", completely visually identical characters are replaced. The characters that look like latin but have minor differences like various descenders, hooks, strokes, etc are not replaced with "simplified" latin appearance and will remain 2 byte unicode, as usual (though maybe in future it might make sense to make some "aggressive" mode that will replace them as well)

Here are some screenshots of how it works:

(settings screen update)

And some examples of me typing messages with disabled and enabled feature respectfully

Example 1:

Before	After

Example 2:

Before	After

Example 3:

Before	After

I have added a basic test that covers this implementation. Spotless, Detekt and tests are all passed, everything is ok. Hope I didn't miss anything.

I hope that you will consider merging this enhancement, since it is a huge iprovement for those who use languages that use cyrillic alphabets.

Thank you again!

CLAassistant · 2026-02-07T01:24:08Z

All committers have signed the CLA.

CLAassistant · 2026-02-07T01:24:08Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

DaneEvans · 2026-02-07T02:08:47Z

That is the best MR overview I've read in a long time.
I'll take a look at the code itself, but it sounds like a good idea

codecov · 2026-02-07T02:19:51Z

Codecov Report

❌ Patch coverage is 60.37736% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 10.79%. Comparing base (cab3940) to head (f088814).
⚠️ Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
.../meshtastic/core/prefs/homoglyph/HomoglyphPrefs.kt	0.00%	11 Missing ⚠️
...g/meshtastic/feature/messaging/MessageViewModel.kt	0.00%	7 Missing ⚠️
...tic/feature/settings/radio/RadioConfigViewModel.kt	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##            main    #4491      +/-   ##
=========================================
+ Coverage   9.21%   10.79%   +1.58%     
=========================================
  Files        427      429       +2     
  Lines      14351    14408      +57     
  Branches    2389     2389              
=========================================
+ Hits        1322     1555     +233     
+ Misses     12776    12558     -218     
- Partials     253      295      +42

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

DaneEvans

Good implementation too,
A few minor changes, but this looks pretty good to merge.

…491)

…(PR 4491)

…for strings with some amount of homoglyphs, only homoglyphs, no homoglyphs, only latin characters, and only non-latin characters that are impossible to be presented by latin homoglyphs (PR 4491)

jamesarich

All good on my end, thanks for the thorough writeup and clean submission.
I'll ask the robot to do a review pass, but it shouldn't come up with much 👍

Copilot

Pull request overview

Adds an optional, user-controlled “homoglyph encoding” feature intended to reduce UTF-8 byte size of Cyrillic text messages by substituting visually identical ASCII characters, improving effective message capacity within byte limits.

Changes:

Adds a new application settings toggle for “Compact encoding for Cyrillic” backed by a new HomoglyphPrefs SharedPreferences store.
Introduces a HomoglyphCharacterStringTransformer and wires preference state into messaging UI/view models.
Adds unit tests for the transformer and updates feature/messaging test dependencies.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
feature/settings/src/main/kotlin/org/meshtastic/feature/settings/radio/RadioConfigViewModel.kt	Injects homoglyph prefs and exposes a Flow/toggle API for Settings UI.
feature/settings/src/main/kotlin/org/meshtastic/feature/settings/SettingsScreen.kt	Adds a switch in App Settings to enable/disable homoglyph encoding.
feature/messaging/src/main/kotlin/org/meshtastic/feature/messaging/MessageViewModel.kt	Injects homoglyph prefs and exposes an enabled Flow to the UI layer.
feature/messaging/src/main/kotlin/org/meshtastic/feature/messaging/Message.kt	Uses the pref to alter byte-counting behavior in the message input UI.
feature/messaging/src/main/kotlin/org/meshtastic/feature/messaging/HomoglyphCharacterStringTransformer.kt	New transformer implementing Cyrillic→ASCII substitutions.
feature/messaging/src/test/kotlin/org/meshtastic/feature/messaging/HomoglyphCharacterTransformTest.kt	New unit tests covering transformer behavior and byte-size shrinkage claims.
feature/messaging/build.gradle.kts	Adds JUnit/MockK test dependencies for the messaging module.
core/strings/src/commonMain/composeResources/values/strings.xml	Adds the user-facing setting label string.
core/prefs/src/main/kotlin/org/meshtastic/core/prefs/homoglyph/HomoglyphPrefs.kt	New preference interface + implementation for the toggle.
core/prefs/src/main/kotlin/org/meshtastic/core/prefs/di/PrefsModule.kt	Registers homoglyph prefs + SharedPreferences provider in Hilt DI.

Comments suppressed due to low confidence (1)

feature/messaging/src/main/kotlin/org/meshtastic/feature/messaging/Message.kt:479

Homoglyph encoding is only used to compute canSend/byte length, but the text that gets sent is still messageInputState.text (raw). This can enable sending messages that exceed maxByteSize and the optimization won't actually be applied to transmitted content. Apply the same transformation to the string passed to SendMessage (or move the transformation into the send pipeline, e.g., MessageViewModel.sendMessage, so all send paths are covered).

            MessageInput(
                isEnabled = connectionState.isConnected(),
                isHomoglyphEncodingEnabled = homoglyphEncodingEnabled,
                textFieldState = messageInputState,
                onSendMessage = {
                    val messageText = messageInputState.text.toString().trim()
                    if (messageText.isNotEmpty()) {
                        onEvent(MessageScreenEvent.SendMessage(messageText, replyingToPacketId))
                    }

LuigiVampa92 · 2026-02-07T15:37:54Z

All good on my end, thanks for the thorough writeup and clean submission. I'll ask the robot to do a review pass, but it shouldn't come up with much 👍

Hi, James, thanks for the reply! I see that Copilot has raised some fair points, like the extra de facto unused dependencies I have copy-pasted from other modules, and a few other things. I gonna fix that and reply here again

…dule (PR 4491)

…e feature.messaging module (PR 4491)

…ment of cyrillic capital letter "ze" with a digit (PR 4491)

…RadioConfigState data class. Made homoglyphEncodingEnabledFlow immutable (PR 4491)

…sure that the feature is effective and consistent across indirect approaches to sending messages as well (quick-chat, reply, etc.) (PR 4491)

LuigiVampa92 · 2026-02-07T17:04:02Z

@DaneEvans @jamesarich I went through the list of issues that Copilot highlighted and committed the fixes.

I would be grateful if you could take a look and give feedback when it is convenient for you. Thanks!

jamesarich

Looks good, thanks again @LuigiVampa92 ! 👍

LuigiVampa92 · 2026-02-07T19:47:18Z

Looks good, thanks again @LuigiVampa92 ! 👍

It is a pleasure :) Thank you again for your work. I hope that I will come up with some more ideas in future

LuigiVampa92 added 2 commits February 7, 2026 02:16

Added homoglyph encoding for transmitted text messages

b732da9

Added unit test for homoglyph encoding of transmitted text messages

285b522

DaneEvans approved these changes Feb 7, 2026

View reviewed changes

LuigiVampa92 added 3 commits February 7, 2026 14:20

Better description for homoglyph encoding feature settings item (PR 4…

e63b1c5

…491)

More simple and short homoglyph transformer class description in doc …

f48aa55

…(PR 4491)

Added unit test coverage for homoglyph transformer. Added test cases …

4405ed9

…for strings with some amount of homoglyphs, only homoglyphs, no homoglyphs, only latin characters, and only non-latin characters that are impossible to be presented by latin homoglyphs (PR 4491)

jamesarich approved these changes Feb 7, 2026

View reviewed changes

jamesarich requested a review from Copilot February 7, 2026 14:28

Copilot started reviewing on behalf of jamesarich February 7, 2026 14:28 View session

Copilot AI reviewed Feb 7, 2026

View reviewed changes

LuigiVampa92 added 5 commits February 7, 2026 19:22

Removed unused mockk dependency declaration from feature.messaging mo…

a2c6eac

…dule (PR 4491)

Fixed assert expressions in homoglyph transformation unit tests in th…

fa39cd4

…e feature.messaging module (PR 4491)

Added commentary with an explicit description about homoglyph replace…

0790231

…ment of cyrillic capital letter "ze" with a digit (PR 4491)

Removed unused parameter homoglyphCharactersEncodingEnabled from the …

ba146af

…RadioConfigState data class. Made homoglyphEncodingEnabledFlow immutable (PR 4491)

Applied homoglyph transformation to the sendMessage() method to make …

f088814

…sure that the feature is effective and consistent across indirect approaches to sending messages as well (quick-chat, reply, etc.) (PR 4491)

jamesarich approved these changes Feb 7, 2026

View reviewed changes

jamesarich added this pull request to the merge queue Feb 7, 2026

Merged via the queue into meshtastic:main with commit 4303bfa Feb 7, 2026
8 checks passed

jamesarich mentioned this pull request Feb 14, 2026

feat(settings): Only show homoglyph setting for Cyrillic locales #4559

Merged

denlenin mentioned this pull request Feb 23, 2026

[FEAT] homoglyph support Yeraze/meshmonitor#1997

Closed

Yeraze mentioned this pull request Feb 23, 2026

feat: add homoglyph message optimization setting Yeraze/meshmonitor#2000

Merged

6 tasks

skibidius mentioned this pull request Mar 12, 2026

Cyrillic compression meshcore-dev/MeshCore#2013

Open

Uh oh!

Uh oh!

Conversation

LuigiVampa92 commented Feb 7, 2026

Uh oh!

CLAassistant commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CLAassistant commented Feb 7, 2026

Uh oh!

DaneEvans commented Feb 7, 2026

Uh oh!

codecov Bot commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

DaneEvans left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jamesarich left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

LuigiVampa92 commented Feb 7, 2026

Uh oh!

LuigiVampa92 commented Feb 7, 2026

Uh oh!

jamesarich left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

LuigiVampa92 commented Feb 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CLAassistant commented Feb 7, 2026 •

edited

Loading

codecov Bot commented Feb 7, 2026 •

edited

Loading