-
The trick:
~> Unicode := an encoding system that allows computers to exchange information regardless of the language used
~> The Unicode Bidirectional Algorithm is used to define if we are writing Left-to-Right or Right-to-Left.
~> For example: The
RLOcharacter (U+202e in unicode) is designed to support languages that are written right to left, such as Arabic and Hebrew. Copy/paste it in your editor and you will see that you are now writing Right to Left -
Why it could be harmful?
~> In fact it is not a new discovery, as these bugs/tricks have been known for like 20 years but it has to be widely and publicly reported. It is why a recent research (2021) give the vulnerability a name, Trojan Source.
~> An attacker could use this characters to produce source code whose tokens are logically encoded in a different order from the one in which they are display. It is particularly dangerous, in a system based on human code review/validation, so basically all open source project. Indeed, a malicious contributor can add malicious code lines that will be seen as comments by reviewers, and thus divert theirs attentions.See example (from nickboucher repository)
- Github print a warning when file contains unicode bidirection algorithms characters. (does Gitlab detect it?)
- nickboucher/bidi-viewer: View and Analyze trojan source in a React-based webapp
- If your text editors have a “strip all comments” mode, enable it.
In the same category of risk we have homoglyph:
-
A homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar.
~> In fact we are speaking about homograph (word composed of homoglyph)
~> For example
агіагуis a homogram ofariary~> For an attacker who wants malicious code and lure the reviewers, homoglyph are a good weapon. For example he could write a function with a name which is an homogram of an already existing one and then call the homoglyph. In that way, we could think he is calling the original one. See example (from nickboucher repository)
-
Another example is the IDN homograph attack:
~> The internationalized domain name (IDN) homograph attack is a technique used by malicious people to impersonate a website. With the internationalization of domain names, it is now possible to register an address with homoglyphs like ɡіτһυЬ.ϲоⅿ (and therefore create a similar or different website)
In python:
а = True # in a naive world, everybody is admin
# some stuff
a = False # finally the world is mean, we don't want admin anymore
if а:
print("You are an admin!")
else:
print("You are not admin.")The output:
You are an admin!
This trick is effecient to make believe the value of a variable changes. Especially for languages where assignment and initialization steps are identical. In this example we fake changing the value of a
- nickboucher/trojan-source repo: Expose vulnerability and many many examples
- Whitepaper: present the vulnerability
- article: explanation & illustration of the vuln with javascript
- dcode: make your own homoglyph