There’s a disturbing trend I’ve been seeing for several years: People speak English and use the words “Slavic” and “Cyrillic” in senses that look very weird to me.
First, here are their true senses.
“Slavic” refers to two main things:
A family of culturally, geographically, and genetically related ethnic groups of Eastern, Central, and Southern Europe: Belarusians, Bosniaks, Bulgarians, Croatians, Czechs, Macedonians, Poles, Russians, Rusyns, Serbs, Slovaks, Slovenes, Ukrainians, and several others. (What exactly those relations are is a topic of endless debate, but most people agree that there are some relations between them.)
The family of languages that most people who belong to these ethnic groups, as well as many people of other ethnic groups who live in the same places, speak.
“Cyrillic” refers to one thing: it is an alphabet that was developed for the language that was spoken over a thousand years ago in Southern Europe, now known as “Old Church Slavonic”. Later it was adapted for writing the Slavic languages of Eastern and Southern Europe (Belarusian, Bulgarian, Macedonian, Russian, Carpathian and Pannonian Rusyn, Serbian, Ukrainian), and many non-Slavic languages of Russia (Tatar, Chechen, Udmurt, Sakha, and many others) and countries around it (Kazakh, Mongolian, and more).
Now, note:
Not all languages that are written in a Cyrillic-based alphabet are Slavic.
Not all Slavic languages are written in a Cyrillic alphabet. Croatian, Czech, Polish, Slovak, and Slovene are pretty much always written in the Latin alphabet; Serbian is written in both Cyrillic and Latin (it’s complicated); Belarusian is written mostly in Cyrillic, but occasionally in Latin.
Russia is the biggest country that uses the Cyrillic alphabet by area, and Russian is the biggest language that uses it by the number of speakers, but the Cyrillic alphabet does not “belong” to Russia. It began in Southern Europe, where Bulgaria and North Macedonia are today. It “belongs”, whatever that means, to everyone who uses it—Bulgarians, Kazakhs, Mongols, Russians, Serbians, Ukrainians, and so on.
Russians are the biggest Slavic ethnic group, but Slavic identity doesn’t belong exclusively to them, or to any other ethnic group.
Some countries and regions that use a Cyrillic-based alphabet were in the past part of the Russian Empire or Soviet Union, but not all of them. Bulgaria, North Macedonia, Serbia, Mongolia, and several other countries that use the Cyrillic alphabet were never a part of it.
Slavic languages that are written in the Cyrillic alphabet are related, but not mutually intelligible. A Russian speaker who tries reading Belarusian, Bulgarian, Serbian, or Ukrainian will have to invest considerable effort, and even then won’t understand everything. And languages that aren’t Slavic are completely unreadable to someone who didn’t actually learn them. It’s as if an English speaker tried to read Turkish or Hungarian without any learning.
Not all people who speak a Slavic language are Slavic. Many Avars, Jews, Maris, Tatars, Udmurts, Uzbeks, and people of other ethnic groups who live in the areas where Slavic languages are spoken or used to be spoken often or always speak Russian, Ukrainian, Serbian or other Slavic languages. (That said, I’m not making an essentialist or “genetic” argument here about who is truly Slavic or not. By default, those people are at least partly Slavic by culture and language, but they probably don’t call themselves Slavic by ethnic identity. Of course, every person is unique and ethnic identity is not rigid.)
Slavic people have some historical and cultural grudges, different interpretations of historical events, and even literal wars between them, but as far as I can tell, they all proudly call themselves Slavic, and they recognize each other as Slavic. For example, many Czechs are unhappy about the many years during which their country’s government was under the strong political influence of Russia (as the Soviet Union), but they still call themselves Slavic and recognize Russians as Slavic. And a war is still raging between Russia and Ukraine, but both Russians and Ukrainians call themselves and each other Slavic.
Now, finally, here are the details of the thing that bothers me: I repeatedly see people using those two words to mean… God knows what. I can only guess. Sometimes, “Cyrillic” or “Slavic” actually just means “Russian”. Sometimes, “Slavic” means “Russian or Ukrainian or Polish”, but not Serbian or Czech, and I know this because people say “Slavic or Czech”, which doesn’t make a lot of sense because Serbian and Czech people are Slavic. Sometimes people say “Cyrillic or Ukrainian”, which is complete nonsense.
Where do I see it? I mostly hear “Slavic” in memes and short online videos, where people make jokes about their Slavic girlfriends, boyfriends, or parents. And they can say “Slavic girlfriends behave like this, but Czech girlfriends behave like this”, even though Czech girlfriends are actually Slavic, too. And I see “Cyrillic” on websites, where people write things like “this part is localized for Cyrillic and Ukrainian”, which, again, is completely nonsensical because Cyrillic is a script and Ukrainian is a language. And sometimes, people speak about something like “the Cyrillic part of the Internet”, by which they usually mean “Russian”, and don’t include in it Ukrainian, Serbian, Kazakh, etc.
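For the localization case, here is a minimal Python sketch of why a script and a language are different kinds of things. The function name `scripts_used` and the sample strings are made up for illustration, and guessing a script from the first word of a character's Unicode name is a rough heuristic that happens to work for Latin and Cyrillic letters, not a proper script lookup.

```python
import unicodedata


def scripts_used(text):
    """Return the set of scripts seen in the letters of `text`.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name, e.g. "CYRILLIC SMALL LETTER A", names its script.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():  # skip spaces, digits, punctuation
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])
    return scripts


# Hypothetical sample strings, keyed by language tag.
samples = {
    "uk": "українська мова",   # Ukrainian: Slavic, Cyrillic
    "kk": "қазақ тілі",        # Kazakh: not Slavic, Cyrillic
    "sr-Cyrl": "српски",       # Serbian in Cyrillic
    "sr-Latn": "srpski",       # Serbian in Latin
    "pl": "język polski",      # Polish: Slavic, Latin
}

for tag, text in samples.items():
    print(tag, sorted(scripts_used(text)))
```

The script is a property of the characters, while the language is a label that has to be recorded separately, so a sensible localization note pairs a language with a script (as the tags `sr-Cyrl` and `sr-Latn` do) instead of listing “Cyrillic” next to “Ukrainian” as if they were two items of the same kind.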
Why does this happen? When did these words change their meaning to something nonsensical? Does it have a specific source or is it just general ignorance? Is it an attempt to be less repetitive and avoid the words “Russian” or “Ukrainian”?
I should mention another thing, which is possibly related, and which is very dark. In Russian, people sometimes write advertisements for apartment rentals or job offers “for Slavic people”. This is obviously racist: they will tolerate Russians, Belarusians, or Ukrainians, but they don’t want to offer this to Azerbaijanis, Georgians, Jews, Uzbeks, etc. I suspect that the usage of “Slavic” by English speakers is somehow related to this; probably not intentionally and maliciously, but related nevertheless. It’s only a suspicion, and if you have a better idea, let me know.
If you’ve known me for some time, I’m exactly the same person I’ve always been. There’s a wide consensus that people are born this way and remain this way for life. The only difference is that now you and I know that my kind of personality was described by some doctors in some books, and they gave it a name.
Stretching after shoveling snow for about five hours in the aftermath of the January 2026 snowstorm in North America.
I was formally diagnosed by a doctor of psychology in January 2026, which is also the month I turned forty-six. For a bunch of reasons that are too long for this post, I’ve suspected that this is the name for what I am since at least 2015. I became almost sure about it in the middle of 2025, which is when I also decided to get a formal diagnosis. Some friends whom I told about this ask me what led to it, and I’ll write about it separately someday.
Some people who know me may be very surprised to read that I’m autistic. Others will be surprised that it took me so long to figure it out. I understand both. When I read old posts in this blog, for example, I see how many of them are very typical autistic things to write, and I just wasn’t aware of it. Maybe I’ll make a list of those posts someday.
Humanity comprehends autism better these days than it did forty years ago. But not all people comprehend it well yet. I barely comprehend it well myself, as I’m only at the beginning of the journey to really grasp it. It’s quite possible that I’m writing some nonsense in this post! If you think that I’m wrong about something, do feel free to send me a correction as a comment or a private email.
Autistic people who are more similar to me are often told that they “don’t look autistic”. I don’t like hearing it, and the same is probably true for most of us, but I do understand why people think like that. Autism looks very different in different people. Some autistic people aren’t able to speak, and some are; some aren’t able to have families or jobs, and some are. And so on. That’s why it’s called a “spectrum” these days.
So what does it even mean? Autism is complex to describe. Compare it to left-handedness, for example: a consistent preference for using the left hand for writing and other fine motor tasks. That’s it, one short sentence. Autism is described as a much longer list of traits, and, very importantly, they must come as a bundle.
Described narrowly, and closely following the definition in the DSM, the guidebook that psychologists in the United States use to classify conditions, my kind of autism basically means the following seven things:
One: I have various difficulties with talking to people. They are not always huge, and perhaps if you talk to me, you won’t even notice them. Or perhaps you will. If you don’t notice them, please trust me that I do feel them constantly. Lots of people throughout my life, including people who love me, pointed out the unusual nature of my communication style to me, sometimes more kindly and constructively, and sometimes less so.
I often have great difficulty starting a conversation, especially when there are many people around. Or even when there’s just one person, but I’m not sure about something. And when I do speak, I sometimes say things that people get offended by, even though I absolutely didn’t mean to offend or patronize—I just meant to be direct or precise, which is supposed to be a good thing, but in that context, someone decided that it’s bad and misunderstood me. I completely fail to understand small talk in all languages (although perhaps it’s more related to item 2 or 3 in the list).
You may think that it’s just “shyness” or “awkwardness”, and in simple human language it’s kind of correct, but “autism” is more scientifically defined, and here’s the really important part: since it comes with a bunch of other traits, which are described later in this list, and which aren’t obviously related to “shyness”, it is, well, not just “shyness”. (Also, someone once described me as having “the opposite of stage fright”, and in some contexts this is a very good description, so I’m not always “shy”.)
Two: I have various difficulties understanding nonverbal communication. I usually understand spoken and written language well, often too well: I understand what people say literally, and I don’t easily “read between the lines”, whether written or spoken. It also repeatedly frustrates me that people read too much between the lines of what I said, which results in their “hearing” things I didn’t actually say or mean. I intensely crave harmony and coherence between what is said or written and what the reality is.
I’m also often bad at understanding facial expressions, hand gestures, and other elements of body language. It’s not like I don’t understand them at all, but throughout my life, people told me countless times that they tried to hint something to me, and I didn’t understand what they thought I should have. I also have trouble making gestures or facial expressions myself: people very often say that I have a weird smile or that they think that my face is angry, even though I’m totally not angry at that moment.
Related to this is also the fact that I cannot maintain eye contact for more than a split second with anyone except exactly three people: my spouse and two children. (Difficulty with eye contact is probably one of the best known autistic traits, but in the DSM, it’s a part of this wider trait.)
A selfie on Lilac Street in East Providence, Rhode Island, a place that is very meaningful and very random at the same time.
Three: I don’t entirely understand relationships, both professional and personal. Even with people I love the most. I have some friends, but not a lot. It’s not even necessarily bad, but it’s definitely noticeable. And if I wanted to make more friends, I wouldn’t totally know how; it happens according to some magic that I don’t get. It’s kind of easier for me to make friends based on shared interests (more on that later), and while having shared interests is probably helpful at making friends for all people, it’s much more acute for me. When I do get closer to a person, it’s hard for me to understand if they are a friend or just a good acquaintance with whom I have a shared interest. I also get fatigued after meeting with many people, for example, at family gatherings, or work and school events—not because I don’t like those people, but because being next to people, even people I love, quickly tires me.
Four: I often make all kinds of seemingly meaningless repetitive movements or sounds, and over the years people have told me many times that they are unusual or even disturbing. A few examples of repetitive things that I do are shaking my fingers and hands, especially the middle and ring fingers on the right hand; drumming with my teeth (if only I could record the amazing jazz, funk, and classic rock beats I make there!); twisting my facial hair; repeating weird words, usually when no one is listening; fidgeting with coins, guitar picks, nail clippers, or other small things. (If people tell me that those things are disturbing, I do my best to stop myself when I’m next to them. Autism is not a good excuse to disturb people if the autistic person can reasonably avoid it. But note that the word “reasonably” does a lot of work here: I can usually do it, and if I can’t, then I can usually just walk away. But some autistic people cannot, so please treat them with understanding, patience, and kindness.)
Five: I really love routines and certain ways of doing things, and I really hate being forced to change them without an exceptionally convincing reason. Example 1: I go to the same supermarket most of the time, and my shopping list is organized not just by the things I want to buy, but also by the sequence in which I’ll find them on my way from the entrance, through the aisles, and to the cashier, and I get horribly annoyed when a product I often buy is moved to another shelf. Example 2: I do most of the kitchen work at home, and I have a very specific way of organizing everything in the drawers, cupboards, and the dishwasher, and if something is not in its right place, I’ll get either horribly confused and dysfunctional, or very upset and possibly screaming (which is not good, but it may happen, and I cannot quite control it). Example 3: I hate moving to a new house or even moving furniture within the house. Those are just three examples out of dozens.
A photo with the Belarusian musician Lavon Volski, who has a song called “Nobody Man”, with the lyrics: “The Nobody Man knows everything much better than we all. The Nobody Man listened to Sonic Youth and read Albert Camus. The Nobody Man is me.” I didn’t read Albert Camus and I probably don’t know everything much better than everyone else, although some people sometimes say that I do. I do love Sonic Youth, though! Lavon got the reference immediately.
Six: I am very interested in certain things. Like, very. Some of those things are nearly lifelong, most notably languages, music, and public transit. Some are coming and going, like dog breeds (early 1990s), the history of Russian nationalism (from 1999 until 2004 or so, and occasionally coming back), Pink Floyd discography (coming and going every year or two), history of Scientology (coming and going from 1997 until 2014 or so), Free Software (since 1998), the Perl programming language (from 1999 until 2009), editing Wikipedia and related projects (since 2004), Belarus (since 2006, and still intensifying), Catalonia (since 2007), and various other things.
(Comment 1: To avoid any misunderstandings, it doesn’t mean that I am, or ever was, a Russian nationalist or a Scientologist. Comment 2: I don’t really know why some things become a special interest and others don’t. As far as I know, no one does. I think it’s one of the most interesting questions about autism.)
Seven: I experience sensory perception of some things that is different from the way most other people experience them. There are sounds that I hear well even though people next to me hear them very faintly or not at all. Sometimes those sounds greatly disturb me, even though they don’t disturb anyone around nearly as much. For example, the noise of aluminum snack packages and plastic bags makes me either unable to do anything or very irritated. And lately, as my son got into solving Rubik’s cubes, the sound of those things has been the absolute bane of my existence. Those things, which to most people are not much more than easy-to-ignore rustling or whirring, make my ears feel like they are being jackhammered. Headphones sometimes help with this a bit, but not always.
Another related issue is that lightbulbs above a certain color temperature and brightness (above 3000 K and 1000 lm) make me nearly blind and cause me great discomfort, even though others find them perfectly normal or even convenient. Strobe lights at concerts are a disaster, too: I love concerts, and most concert lighting is fine, but strobe lights make me unable to look at the stage. And the smell of some home or office cleaning supplies completely overwhelms my senses to the point that I can’t function very much, even though other people in the same place barely notice it.
I also easily notice wrong spelling, punctuation, or fonts in texts—I wrote about an example of this here a few weeks ago. This may sound unrelated to other things in this list item, but my psychologist told me that it is related, so I guess it is.
This is a photo of the Merriam-Webster’s Collegiate Dictionary, twelfth edition. The letter ə (Latin schwa) in the word “dən” is printed using a different font.
And that’s the end of the list.
See how I said that I’m describing it “narrowly”, and I still had to write a list of seven items, with many sentences in each of them? That’s what makes autism complex, and it’s just the tip of this iceberg. The list above goes according to the seven basic autism diagnostic criteria in the DSM, which is the mainstream scientific, academic, professional definition. Those seven criteria appear on the first page of the Autism Spectrum Disorder description in the DSM; there are ten more pages of details, a lot of which are very interesting, and to a lot of which I conform, too, but this post is already getting too long.
But I really should also mention that in addition to the formal academic definition, there’s also the autistic culture, or, more widely, the neurodivergent community culture. It has loosely defined its own informal, but pretty well-pronounced traits, such as wearing (or not wearing) certain clothes, eating (or not eating) certain foods, having certain relationship practices, etc. It also has its own jargon words, such as “catastrophizing”, “delayed processing”, “double empathy”, “monotropism”, “shutdown”, “spiky profile”, “stimming”, and many more. I can’t find any of these terms in the DSM (although maybe I didn’t search well), but they are making their way into academic articles on the topic, and some of them may become completely mainstream and scientific someday. (Here’s one glossary of this jargon, here’s another. I love glossaries! Maybe I’ll compile one myself.)
Hugging my daughter, which is the real meaning of life. Some books in the back are Even-Shoshan Dictionary, Merriam-Webster Synonyms Dictionary, American Heritage Dictionary, Thurston Moore autobiography (Sonic Youth again!), Eliezer Ben-Yehuda biography, and Yehudit Ravits song book, and these are quite meaningful, too. A few seconds before this hug, she told me in Hebrew: “Dad, I know all the things that you love: other languages, books, and music.” She understands me so well.
This culture has developed in the last few decades, as the autistic community came together online and in real life and started figuring out things about itself that mainstream scientists and therapists were too slow to get. While it definitely doesn’t mean that the informal autistic community is right about everything or that its members agree about everything, I do get the impression that even though most people in it are not professional psychologists or neurologists, it is remarkably robust at understanding itself. Discovering this online community in 2025 was one of the most empowering things that ever happened to me; I feel like I absolutely belong there.
Autism explains a lot about me.
My love for editing Wikipedia, for example: a broken link, a poorly organized category of articles, an incorrect reference, a typo, a missing article about a topic I am familiar with—I’ve always known that I have a heightened sensitivity to those things, and I just couldn’t give it a name. When I saw that wikis let me easily correct them, I started doing it, and couldn’t stop. I’m certainly not saying that one has to be autistic to edit Wikipedia, but I’ve heard lots and lots of people saying over the years that there is a disproportionate number of autistic people among Wikipedia editors, and many of them possibly aren’t aware of their autism, just like I wasn’t aware of mine. (A lot of these claims are hypothetical or anecdotal, but I could find two data-driven surveys that substantiate this: Dutch Wikipedia editors survey 2018 and German Wikipedia editors survey 2025; if you know about more research on this, please do tell me.)
I’m Jewish, and although my family is not religious, we do try to have a nice meal every Friday evening. One of the traditions of these meals is to have two loaves of bread, usually a challah. Usually we just buy them in a store, but I baked these myself. They are braided like challah, but they are without egg, and they are made of rye flour, whereas usual challah is made of white wheat flour. I love rye bread. I also love sourdough, but I never tried baking it myself. I can’t say that I love making weird smiles in photos, but I just don’t quite know how to make non-weird smiles.
The same goes for my enormous love for languages and letters and texts and books—I learned to read early (thanks, mom!), and reading and writing were a fantastic way to learn and communicate at my own pace, without having to synchronize with people who keep talking and saying unexpected things. Books—and later, websites—have always been wonderful for me because I can reread them if I didn’t understand something, and they won’t get tired of my clarification questions.
Language in general fascinates me because it is the infrastructure of people’s communication, and I love how it is completely arbitrary, yet systematic; studying Linguistics at the university explained it all so well to me. Different linguists have different reasons for going into this field, but for me, an easy explanation is that trying to understand something about this infrastructure is my overcompensation for having frequent misunderstandings with so many people. And foreign languages are wonderful, too, because I’ve always felt different from most people, and foreign languages are one of the most notable and beautiful ways in which people are different and diverse. Each foreign language is a puzzle that can be solved with some effort, and solving this puzzle is endlessly rewarding. Put those things together, and bam, I became the specialist on languages in Wikipedia.
Same for music. Music is a sensory delight, and I now understand that I probably experience it far more intensely than other people do. When it has any kind of rhythm, it stimulates my body. When it has no clear rhythm, it stimulates my thinking (my favorite example of such a piece of music is Piece for Jetsun Dolma by Thurston Moore, but there are many others). That’s why, for example, I love going to concerts, but I usually (albeit not always) prefer to do it alone: I’m there for the music itself, not for socializing. And that’s why music in general, and specific artists in particular (not only Sonic Youth and Pink Floyd, far from it), become my special interests and I easily learn their discographies, including full track lists, by heart. Is it any wonder that the first articles I edited in Wikipedia (in English, in Hebrew, and in Catalan) were about musicians?
The photos in this post mostly show Amir Aharoni, the point being that he is mostly just a dude who happens to be autistic. Neither of the very cool-looking dudes in this photo is Amir Aharoni. I don’t know who they are. If you are one of them, or if you know them, please tell me. I photographed them on the 1 train in the New York subway because they looked very Russian, which doesn’t necessarily mean that they actually are Russian, but which did make me fantasize for a moment that I am in the Moscow metro and not in New York. On that January day, I was at a Wikipedia event in Columbia University in the morning and at a Meshell Ndegeocello concert at the Blue Note in the evening, and I took a subway train to get from one point to the other. It was a day of absolute bliss because it included all my special interests. (Except the seating at the Blue Note. That club has mostly excellent music and mostly horrible seating arrangements. Like the two dudes in the photo, this probably doesn’t have much to do with autism.)
Same for public transportation systems. Those are systems, they are largely predictable, they aren’t chaotic like cars, their maps and schedules can be learned by heart. When I was eight or so in the late 1980s, I learned the map of the Moscow Metro with around 120 stations by heart. It wasn’t even intentional—I just wasn’t able not to learn it after taking the metro frequently and looking at this map. I could also take long bus rides in Moscow with my eyes closed and say exactly where the bus is at any time because I feel all the turns and stops. Like, I actually did it several times for fun, and my parents and friends were weirded out.
And the smell of subways! It’s more or less the same in the whole world. Some people don’t enjoy it, and I can understand why, but to me, it’s wonderful. When I moved to Israel, which didn’t have a working subway at all in 1991, I missed it, but when the Carmelit, the subway in Haifa, was reopened, I entered it and felt that wonderful aroma again. I’ve always known that it was not nostalgia for Moscow—it was the aroma of a system that I can appreciate. (Theoretically, I could put this special interest together with Wikipedia, too, but I don’t actually do it much. I only contribute a little to writing about subways and other public transit systems on Wikipedia. The people who do it are absolute heroes. I can’t tell for sure, of course, but it is quite possible that, um, some of them are autistic.)
Ironically, my great and prolonged interest in Wikipedia is perhaps a thing that delayed my realization that I’m likely neurodivergent. Being in the Wikipedia community and interacting with quite a lot of people who openly call themselves neurodivergent made me repeatedly wonder: “What’s special about them? Their description of how they experience the world is very similar to how I experience the world, and I’m not neurodivergent.” That was a mistake: I experience the world like that, and my neurodivergent friends experience the world like that, but most other people don’t. Which means that I am neurodivergent. I fully realized it only in 2025.
And one more thing. As I was reading the seventeen-page report that the psychologist gave me at the end of the diagnostic process, I found the part called “Behavioral Observations” particularly fun to read. It described how I behaved during the evaluation process in the psychologist’s office and how I filled out the online forms for it. Among other things, it said:
He used the word “curious” many times throughout the evaluation.
This is a very good description of me, because I love being curious! I love discovering things, being asked an interesting or relevant question, and enthusiastically and explicitly acknowledging that something is, as a matter of fact, curious. At least to me. Some people would also describe this as a “verbal stim” in the autistic community jargon, and it’s perhaps appropriate. However, verbal stims are sometimes meaningless. While I do say meaningless words sometimes, when I say that something is curious, I mean it. And that’s also the most central thing that Wikipedia is about: truly endless curiosity, wanting to learn things, adding pieces to the perpetually incomplete puzzle, and sincerely wanting to help other people to learn those things more easily and freely.
Occasionally, I enjoy craft beer. I could describe how it’s also a sensory delight for me as an autistic person, but I won’t. Not every great thing is necessarily a sensory delight for autistic people. Good craft beer is tasty, that’s it. If you consume any alcohol, please do it responsibly and don’t drink too much, no matter how delicious or fun it is. Narragansett is a brewery in Rhode Island, not far from where I live at the moment, and it’s named after the area’s native people. Tuletorn is a microbrewery in Tallinn; in Estonian, “tule” means “light” and “torn” means “tower”, so “tuletorn” means “lighthouse”. Have I mentioned that I love languages?
Am I going to write a lot about autism here now? At the moment, I don’t plan to start writing explicitly about autism a lot. I mostly plan to keep writing nerdy things about Wikipedia and languages and maybe music and maybe random things from my life. In a way, this blog has been mostly about autism all along, just without calling it by this name, because I didn’t know it myself. But go figure, now that I know that it’s an important part of my personality and identity, perhaps I’ll start writing specifically about it.
Am I happy that I got the diagnosis? Yes, I am. Perhaps someday humanity’s attitude to this will completely change, and the diagnosis will have a different name, or become completely unnecessary. But with the way we work now, I’m happy to understand my personality better and have a name for it.
How is this understanding going to change my life? I don’t know! At the moment, I just hope that the few more decades that I probably have in this universe will be easier to navigate now that I know all this stuff. And maybe it won’t be much easier, and that’s OK, too; I’ve learned something, and if you’ve read at least some of this post, you’ve learned something, too. If it makes you behave more kindly to autistic people or to learn something interesting about yourself, that’s already a good thing.
(I was also diagnosed with ADHD, but I don’t yet have an idea of how to write a blog post about it. Trust me, however, that it’s very meaningful, too.)
Wired: “Which four languages do you speak? You lived in three countries, so I assume you know at least three, but I said ‘four’ out of politeness.”
This actually happened.
I’m a linguist, and when people find out about it, they very often ask how many languages I speak. I hate it, and many other linguists told me the same. Different linguists may explain it differently, but for me, the main reason I hate it is that I cannot easily answer it with a simple number. “Speak” is a spectrum: I speak some languages perfectly or nearly perfectly (Russian, Hebrew, English); some others I can speak not so perfectly (Catalan, and on a good day maybe Italian, Spanish, and Polish); yet others I fantasize that I speak, but I have almost no practice (Belarusian, French, Portuguese, Esperanto); a few more I can read a bit, but not speak (Ukrainian, Swedish, Arabic, Lithuanian).
But if you ask me—or, well, some other linguist—the same question in a cooler way, like in the example above, maybe we won’t hate it.
If you are reading this post and thinking that it is not about my being a linguist, you are very cool.
I have several weird and mostly useless super-powers. Some of them are actually super-powers that don’t have a rational explanation; I’ll leave them for other posts. The one I’ll talk about here does have a rational explanation.
There is a technology called Teletext. It began in the 1970s, was somewhat popular in some countries in the 1990s, and perhaps it still exists, but it is largely superseded now by websites and by the on-screen text from cable and satellite set-top boxes. It worked together with broadcast television: some extra data was sent together with the TV signal, and if your TV set supported Teletext, you could push a button on your remote control, and the picture would be fully or partially replaced with some letters or crude pixel graphics.
When Teletext replaced the picture partially, it was used to show subtitles for translation or accessibility. When it replaced the picture fully, it could show news, TV programming schedules, weather forecasts, tourist guides, government information, trivia games, and other things for which websites and set-top boxes are used today.
We moved to Israel in 1991 and bought a TV a few months later. I quickly figured out that our new TV set supports Teletext and learned to use it. I loved it! It’s possible that I used Teletext more than actually watching TV! On Israeli TV, the content of Teletext was mostly in Hebrew, and it was very useful for me for learning the language. The way in which you navigate the pages there is by typing a three-digit page number, and I still remember lots of those page numbers. I remember that 100s were for news and TV guides. There was also a range for children with games and stories, probably 300s. And there was a range of pages in English, mostly with tourist guides.
I loved it for a couple of years. I learned a lot of Hebrew from it. Back then, Israeli broadcast TV had only one channel operated by the government, and that’s the only one that our antenna could receive where we lived for the first couple of years. So we only watched the Israeli Channel 1, but it was great for me, because it had Teletext.
Then in 1993 we moved and got cable TV.
Oh boy.
Cable TV had dozens of channels,¹ but the most important one, of course, was MTV. The MTV we had in Israel was not the one from the U.S., but MTV Europe,² which was broadcast from the U.K. The music³ there probably made a stronger cultural impact on my personality than any other thing ever… but that is not really the topic of this post.
The topic of this post is Teletext. MTV Europe had Teletext!
But the Teletext on MTV Europe was broken! It had occasional Latin letters, but almost everything was written in gibberish in Hebrew letters.
It annoyed me greatly: I had already fallen in love with Teletext on Israeli Channel 1, and I immediately fell in love with the music on MTV, and I really wanted to make them work together for me. Initially, I thought that it’s just a malfunction and that something gets garbled because it’s cable and not antenna or because the signal comes from far away.
Pretty quickly, however, I started noticing patterns. It looked like text, with words, spaces, punctuation, and sentences. And the crude pixel graphics looked OK. So I started trying to decipher it, and realized that the uppercase Latin letters just worked correctly, but the lowercase Latin letters were replaced with Hebrew letters!
The reason that it worked that way is that the people who set up Teletext on Israeli Channel 1 wanted to support both Hebrew and English, but the Teletext technology supported only 7-bit encodings. An “encoding” is a standard that assigns a number to each text symbol that appears on computers: letters, digits, punctuation marks, and so forth. There are many encodings. “7-bit” basically means that the computer can only understand 128 symbols (2⁷ = 128), which means that in a 7-bit encoding, there’s enough room for capital Latin letters, small Latin letters, digits, basic punctuation and math symbols, and not much more, certainly not a whole other alphabet. This was long before Unicode, an encoding with room for all the alphabets of the world, became widely available in the 2000s.
So the Israeli Broadcasting Authority probably got TV sets sold in Israel to show Hebrew letters instead of small Latin letters. Remember that I mentioned that the Israeli Teletext had a range of pages in English for tourists? It was all in capital letters! I actually did notice that it’s all-caps back when I started using it in 1991, but I didn’t pay attention to it, and I thought that it’s just a limitation of how Teletext works. It was, indeed, a limitation, but not the kind of limitation that I thought it was.
Luckily, the lowercase letters were replaced consistently according to a system that I figured out in a few minutes once I realized what was going on: ב was a, ג was b, and so forth. Now that I’m trying to reconstruct it, the symbol for א was probably not a letter, but ` or &.
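For fellow geeks, here is a minimal Python sketch of that substitution. It assumes that the Hebrew letters, in their usual Unicode order and including the final forms, sit exactly where ` and the lowercase letters a to z sit in ASCII; this matches “ב was a, ג was b” and my guess about א, but I haven’t checked it against a real Teletext decoder. The function names are made up for this example.

```python
# Assumption: the 27 Hebrew letters (alef through tav, final forms included)
# occupy the 7-bit codes 0x60-0x7A, i.e. the slots of "`" and "a".."z".
HEBREW_ALEF = 0x05D0   # Unicode code point of alef
HEBREW_TAV = 0x05EA    # Unicode code point of tav
ASCII_BACKTICK = 0x60  # the 7-bit slot that alef would take over


def hebrew_to_latin(text: str) -> str:
    """Turn Hebrew 'gibberish' back into the lowercase Latin text it stands for."""
    result = []
    for ch in text:
        code = ord(ch)
        if HEBREW_ALEF <= code <= HEBREW_TAV:
            result.append(chr(code - HEBREW_ALEF + ASCII_BACKTICK))
        else:
            # Uppercase letters, digits, and punctuation were shown correctly.
            result.append(ch)
    return "".join(result)


def latin_to_hebrew(text: str) -> str:
    """Show what such a TV set or printer would display for ASCII text."""
    result = []
    for ch in text:
        code = ord(ch)
        if ASCII_BACKTICK <= code <= ord("z"):
            result.append(chr(code - ASCII_BACKTICK + HEBREW_ALEF))
        else:
            result.append(ch)
    return "".join(result)


# The letters are given here in logical (typing) order, written as escapes
# so that right-to-left display does not get in the way.
garbled = "U\u05e3\u05d5\u05e2\u05de\u05d1\u05dd\u05d5"
print(hebrew_to_latin(garbled))  # prints "Username"
```

Doing this lookup in your head for every letter is exactly what I memorized after a few weeks of staring at the screen.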
So I made myself a table and started reading: music news, singles and albums charts in every European country, programming schedule, song playlists on some upcoming special broadcasts like MTV Unplugged, and even personal ads.⁴ At first, I was reading slowly because I had to peek at my list all the time, but after a few weeks, I memorized it and just read it fluently.
Like most families do, we had the TV set in our living room. My parents couldn’t understand why I was staring for hours at black screens with complete gibberish instead of staring at music videos with beautiful dancing people, but I guess that it didn’t bother them too much. I kept doing it for years.
Fast-forward to 1998. I started working as a sysadmin in a place with a lot of computers that were already very old back then, but since it was in Israel, they had to support the Hebrew language. On one of my first days there, I noticed a colleague being annoyed about a printout from a dot-matrix printer. I looked at it and saw that the first word was Username. At first, it didn’t even occur to me that Uוםבמעוף is what it actually says.⁵ My colleague was frustrated because he expected it to be in English, but it was in gibberish in Hebrew letters.
You can imagine where it goes from here: I was able to read that printout with zero effort because the ancient server that produced it used the same Hebrew encoding system that Teletext used, and that I had been practicing for years on MTV! My colleague was impressed.
After that, I had no more opportunities to actually use this super-power, but if I ever see such text again, I’d still remember how to read it.
Useless, but fun to tell fellow geeks of languages and computers about.
¹ It had several Russian channels, which had changed a lot since we left Russia in 1991. The Soviet Union and its censorship completely disappeared, TV became diverse and commercial (for better and worse), and a bunch of new channels were added. It also had Arabic, German, Spanish, French, and Turkish channels, which were not so useful to us because we didn’t know the languages, but since I do love learning languages, I occasionally watched them and tried to guess what words mean. I remember, for example, that I figured out myself that the words اليوم, heute, hoy, aujourd’hui, and bugün mean “today”: it’s a word that frequently appears on the screen in announcements of TV programs that will be shown later.
² We also had MTV Asia, which later became Channel [V], and which mostly showed Indian music.
³ On MTV Europe, it was the time of Eurodance (2 Unlimited, Dr. Alban, Haddaway), Britpop (Blur, Oasis, Suede), Grunge (Nirvana, Alice in Chains, Pearl Jam), other kinds of “Alternative” (R.E.M., Therapy?, Björk, dEUS), boy bands (Take That, Boyzone), vestiges of hair metal (Guns N’ Roses, Bon Jovi), and occasional hip hop and R&B (Coolio, Mariah Carey, Jodeci). Back then, I felt that the cool thing to do is to love Britpop, Grunge, and other “Alternative” things, and to hate Eurodance, boy bands, hair metal, and R&B. I changed my mind about it thanks to GusGus, Oliver Lake, and, believe it or not, my physical education teacher. But that’s a topic for a different post.
⁴ The personal ads were only for the U.K., if I recall correctly. That was also the first time I saw personal ads from gays and lesbians. I remember being pleasantly surprised by how casual and normal they were. I knew what gay and lesbian people are, but back then, they were almost never discussed on Russian TV, and not too much on Israeli TV either.
⁵ I had to type it backwards. Unfortunately, WordPress doesn’t allow me to use the <bdo> HTML tag or the unicode-bidi: bidi-override CSS rule. They are very rarely needed, but they would be appropriate here.
I’m a forty-five-year-old man, and I have realized only recently that a lot of grown-up and seemingly healthy people of various ages, genders, income levels, political persuasions, and professional and ethnic backgrounds often say and write words or whole sentences without having the slightest idea of what these words and sentences mean.
I am literally not able to do it.
I mean, it can definitely happen that I will say a word and what I mean by it is different from what a dictionary says. Or that I say a word and the person to whom I am speaking understands it differently from how I do.
But when I say something, I know what it means, even if what I know about it is different from what you know about it.
But I am realizing that there are people that say things without understanding them, and they really don’t mind. I am certain that they don’t know what they are saying. And yet they say it.
I just can’t do it. If I hear a word or a name or an expression, and I don’t understand what it means, I won’t use it in my own speech until I understand it by asking someone or looking it up in a dictionary.
I just can’t do it. I don’t understand how it is possible. And yet I somehow need to live on the same planet with people who do it regularly.
Well over 200 languages are spoken in the African country of Cameroon. Two of them are French and English, which are official because colonialism. Many people in Cameroon know them, but definitely not everyone. Two other big languages are spoken in Cameroon and in many other countries: Fula and Arabic.
All the other languages spoken in Cameroon, however, are unique to that country. And there is no edition of Wikipedia in any of them. There are very successful editions of Wikipedia in English, French, and Arabic, and there is also a (much) smaller one in Fula, but not everyone in Cameroon knows these big languages, so if you speak one of those other languages, and you don’t know English, French, Arabic, or Fula, and you want to read Wikipedia, well, tough luck.
Unless, that is, you want to write it yourself, in which case I’m here to help as much as I can.
There are other countries in Africa in which there are many languages, but there is some Wikipedia activity in some of them. In Ghana, there’s pretty good activity in Twi, Fante, Dagbani, Moore, and some other languages. In Nigeria, there’s pretty good activity in Hausa, Yoruba, Igbo, Fula, Tyap, and others. In Mozambique, there’s a serious attempt to start a Wikipedia in Makhuwa.
A dance competition in the city of Douala. African women dressed in matching black, red, and yellow outfits dancing on sand. Near them, men in costumes dancing and playing drums. Photo by Johnkekam from Wikimedia Commons, CC-BY-SA 4.0.
At the Wikimania conference, which took place in August 2024 in Katowice, Poland, I met Minette, a woman from Cameroon. As I usually do with people from all countries, I asked her which languages she speaks other than English, and since she was from Cameroon, I was especially excited. “Ngiemboon”, she answered. I had never heard about it before that, but that’s never a problem. I asked if she would be willing to try to start an edition of Wikipedia in her language, and she agreed, because why not.
So I added Ngiemboon as one of the languages that translatewiki supports. She started contributing translations of MediaWiki user interface strings in it, and then she told me that there’s another language in which she thinks there should be an edition of Wikipedia: Duala (also spelled Douala), the main local language of the city of the same name, which is the country’s largest city and economic capital. It’s not her own language, she said, but it’s one of the country’s most important languages, which many people learn as a second language, and she knows a few people who speak it and would be interested in helping the effort.
Besides, she loves songs in it. And isn’t that, like, one of the best reasons to want to help start a Wikipedia in a language?
So I configured Duala for translatewiki, too. And what do you know—her friends actually came in, contributed a lot of translations on translatewiki, and Duala became the first language of Cameroon to cross the threshold for inclusion in MediaWiki as a user interface language!
This is just the first achievement. The real road is still ahead: To create a full-fledged Wikipedia in their language, they need to start writing articles. They haven’t written any yet, but I am sure that with the energy they have, they can do it.
It doesn’t matter what your native language is: it has value, it can be written, it can be used on computers, and it can be a way to share your knowledge. Wikipedia doesn’t have to be written in a “big” language like English, French, or Russian. There can be a Wikipedia in your language, and if you want to invest some effort in writing it, I’ll do all I can to help you do it.
I sometimes see a bad mistake in a Hebrew translation of a message in MediaWiki, the software that runs Wikipedia. I check who made it, and see that it’s myself, a few years ago.
I post something like this on social networks every few weeks, but this time it’s special.
Here’s the timeline of a silly bug that took way, way too long to fix, and it’s 100% my fault:
March 21, 2017, morning: I translate the message mobile-frontend-joined-years, which says “Joined X years ago” next to the username on the mobile site. The message takes two parameters: $1 and $2. $1 is the gender of the user, so that it will be possible to write the verb “joined” in the correct grammatical gender. It’s irrelevant for English, but relevant for many languages. $2 is the number of years, so that it will be possible to write “year” or “years” correctly in “1 year”, “2 years”, “4 years”, etc. With Hebrew, the plural forms of years are a tad more complicated than in English: most numbers take the regular plural ending (“3 shaním”, “7 shaním”, etc.), but for the specific case of two years, a different ending is used, and the word is “shnatáyim” (the number is not written at all, because the ending already means that it’s 2). The way to do it in translations is to use the {{PLURAL}} syntax. In English, it looks like {{PLURAL:$2|$2 year|$2 years}}. In Hebrew (in transliteration into Latin letters), it would have to look like this: {{PLURAL:$2|shaná|shnatáyim|$2 shaná|$2 shaním}} (the first form is for 1 year, the second is for 2 years, the third is for some special cases where the singular ending is used even though the number is not 1, and the last form is for all other numbers).
March 21, 2017, midday: A few hours later, I notice that I made a mistake and used $1 everywhere instead of using $1 for gender and $2 for plural. The likely reason I noticed it at that time is that I checked what things have to be translated. Since I’ve tried to keep the translation into Hebrew at 100% most of the time since 2010, there are usually few of those. Even though I had translated it already a few hours earlier, this thing showed up because our translation auto-validation system noticed that the $2 parameter is missing from the translation, and messages with mistakes are shown together with messages that are not yet translated. And here’s where I made another mistake: I fixed the appearance of $1 to $2 within the “PLURAL” values, but I left it as $1 in the beginning. As a result, the algorithm was trying to determine the plural form based on the gender, which obviously cannot work.
And because now both $1 and $2 were used in the translation, our translation auto-validation system stopped noticing it as a mistake. I’m not blaming the system or the people who developed it. Auto-validation systems are built for catching the most obvious bugs. In theory, it can be improved to catch this bug, but it would require some work, and it’s probably not the most important thing to fix.
August 28, 2017: It came to my attention that “Joined 2 years ago” appears in Hebrew as “2 shaním” and not as “shnatáyim”. I checked the translation, but didn’t notice that it says “$1” instead of “$2” at the beginning of the PLURAL expression. Instead, I reported a bug. Two developers tried to look at it and help, but didn’t come to any conclusions.
July 23, 2020: A developer of the mobile web interface, who was probably going through old and forgotten bug reports, wondered whether this is still a bug. I said that it is.
July 24, 2020: The same developer noticed that the translation says PLURAL:$1 instead of PLURAL:$2 and wrote a comment in the bug report. I didn’t notice it, even though I reported the bug. I guess I can use “the summer of 2020 was all weird COVID days!!” as an excuse.
July 25, 2024: An anonymous Hebrew Wikipedia user noticed that issue again. I started investigating it, came upon my own old bug report, and finally noticed the correctly-identified problem in the translation.
Immediately after that, I fixed the translation.
Over those seven years, I had several opportunities to fix this bug, and I didn’t do it. Now, I finally did.
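To make the mistake easier to see, here is a rough Python sketch of how choosing a plural form goes wrong when it is driven by the wrong parameter. This is not MediaWiki’s actual code: the function names are invented, the rule for the third form (multiples of ten above ten) is only an assumption for illustration, and the way a non-numeric value falls through to the last form is a property of this sketch, not a claim about the real parser.

```python
# The four forms from the transliterated Hebrew example:
# 1 year, 2 years, the special singular-ending case, everything else.
FORMS = ("shaná", "shnatáyim", "$2 shaná", "$2 shaním")


def hebrew_plural_form(selector, forms):
    """Pick a form based on the value written right after 'PLURAL:'."""
    one, two, special, other = forms
    if selector == 1:
        return one
    if selector == 2:
        return two
    # Assumption: the special case is a multiple of ten above ten.
    if isinstance(selector, int) and selector > 10 and selector % 10 == 0:
        return special
    return other


def joined_years(selector, years):
    """Render the 'years' part of the message, filling in $2."""
    return hebrew_plural_form(selector, FORMS).replace("$2", str(years))


gender, years = "male", 2

# Correct: PLURAL:$2 chooses the form by the number of years.
print(joined_years(years, years))   # prints "shnatáyim"

# Buggy: PLURAL:$1 chooses the form by the gender, which is never 1 or 2,
# so the generic plural is used even for two years.
print(joined_years(gender, years))  # prints "2 shaním"
```

Both parameters appear in the buggy version, which is why a check that only looks for missing parameters has nothing to complain about.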
And you know, this is horribly embarrassing, but I am really happy that this Hebrew Wikipedia user reported this bug today. And I wonder why this doesn’t happen much more often and in many more languages. There are definitely more bugs like this: some of them are in Hebrew, and I just haven’t noticed them yet, and many of them are in other languages. For example, when I was fixing this bug in Hebrew, I noticed that the translation of the same message into Belarusian had the same problem, and fixed it. Every now and then, I notice bugs of this kind in all kinds of languages, and I fix them when it doesn’t require actually knowing the language. Why doesn’t it bother people more often that some things are incompletely or incorrectly translated?
It sometimes happens in people’s lives that someone tells them something that sounds true and obvious at the time. It turns out that it actually is objectively true, and it is also obvious, or at least sensible, to the person who hears it, but it’s not obvious to other people. But it was obvious to them, so they think that it is obvious to everyone else, even though it isn’t.
It happens to everyone, and we are probably all bad at consistently noticing it, remembering it, and reflecting on it.
This post is an attempt to reflect on one such occurrence in my life; there were many others.
(Comment: This whole post is just my opinion. It doesn’t represent anyone else. In particular, it doesn’t represent other translatewiki.net administrators, MediaWiki developers or localizers, Wikipedia editors, or the Wikimedia Foundation.)
There’s the translatewiki.net website, where the user interface of MediaWiki, the software that powers Wikipedia, as well as of some other Free Software projects, is translated to many languages. This kind of translation is also called “localization”. I mentioned it several times on this blog, most importantly at Amir Aharoni’s Quasi-Pro Tips for Translating the Software That Powers Wikipedia, 2020 Edition.
Siebrand Mazeland used to be the community manager for that website. Now he’s less active there, and, although it’s a bit weird to say it, and it’s not really official, these days I kind of act like one of its community managers.
In 2010 or so, Siebrand heard something about a bug in the support of Wikipedia for a certain language. I don’t remember which language it was or what the bug was. Maybe I myself reported something in the display of Hebrew user interface strings, or maybe it was somebody else complaining about something in another language. But I do remember what happened next. Siebrand examined the bug and, with his typical candor, said: “The fix is to complete the localization”.
What he meant is that one of the causes of that bug, and perhaps the only cause, was that the volunteers who were translating the user interface into that language didn’t translate all the strings for that feature (strings are also known as “messages” in MediaWiki developers’ and localizers’ jargon). So instead of rushing to complain about a bug, they should have completed the localization first.
To generalize it, the functionality of all software depends, among many other things, on the completeness of user interface strings. They are essentially a part of the algorithm. They are more presentation than logic, but the end user doesn’t care about those minor distinctions—the end user wants to get their job done.
Those strings are usually written in one language—often English, but occasionally Japanese, Russian, French, or another one. In some software products, they may be translated into other languages. If the translation is incomplete, then the product may work incorrectly in some ways. On the simplest level, users who want to use that product in one language will see the user interface strings in another language that they possibly can’t read. However, it may go beyond that: writing systems for some languages require special fonts, applying which to letters from another writing system may cause weird appearance; strings that are supposed to be shown from left to right will be shown from right to left or vice versa; text size that is good for one language can be wrong for another; and so forth.
In many cases, simply completing the translation may quietly fix all those bugs. Now, there are reasons why the translation is incomplete: it may be hard to find people who know both English and this language well; the potential translator is a volunteer who is busy with other stuff; the language lacks necessary technical terminology to make the translations, and while this is not a blocker (new terms can be coined along the way), this may slow things down; a potential translator has good will and wants to volunteer their time, but hasn’t had a chance to use the product and doesn’t understand the messages’ context well enough to make a translation; etc. But in theory, if there is a volunteer who has relevant knowledge and time, then completing the translation, by itself, fixes a lot of bugs.
Of course, it may also happen that the software actually has other bugs that completing the localization won’t fix, but that’s not the kind of bugs I’m talking about in this post. Or, going even further, software developers can go the extra mile and try to make their product work well even if the localization is incomplete. While this is usually commendable, it’s still better for the localizers to complete the localization. After all, it should be done anyway.
That’s one of the main things that motivate me to maintain the localization of MediaWiki and its extensions into Hebrew at 100%. From the perspective of the end users who speak Hebrew, they get a complete user experience in their language. And from my perspective, if there’s a bug in how something works in Wikipedia in Hebrew, then at least I can be sure that the reason for it is not that the translation is incomplete.
As one of the administrators of translatewiki, I try my best to make complete localization in all languages not just possible, but easy.¹ It directly flows out of Wikimedia’s famous vision statement:
Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.
I love this vision, and I take the words “Every single human being” and “all knowledge” seriously; they implicitly mean “all languages”, not just for the content, but also for the user interface of the software that people use to read and write this content.
If you speak Hindi, for example, and you need to search for something in the Hindi Wikipedia, but the search form works only in English, and you don’t know English, finding what you need will be somewhere between hard and impossible, even if the content is actually written in Hindi somewhere. (Comment #1: If you think that everyone who knows Hindi and uses computers also knows English, you are wrong. Comment #2: Hindi is just one example; the same applies to all languages.)
Granted, it’s not always actually easy to complete the localization. A few paragraphs above, I gave several general examples of why it can be hard in practice. In the particular case of translatewiki.net, there are several additional, specific reasons. For example, translatewiki.net was never properly adapted to mobile screens, and it’s increasingly a big problem. There are other examples, and all of them are, in essence, bugs. I can’t promise to fix them tomorrow, but I acknowledge them, and I hope that some day we’ll find the resources to fix them.
Many years have passed since I heard Siebrand Mazeland say that the fix is to complete the localization. Soon after I heard it, I started dedicating at least a few minutes every day to living by that principle, but only today did I bother to reflect on it and write this post. The reason I did it today is surprising: I tried to do something about my American health insurance (just a check-up, I’m well, thanks). I logged in to my dental insurance company’s website, and… OMFG:
What you can see here is that some things are in Hebrew, and some aren’t. If you don’t understand the Hebrew parts, that’s OK, because you aren’t supposed to: they are for Hebrew speakers. But you should note that some parts are in English, and they are all supposed to be in Hebrew.
For example, you can see that the exclamation point is at the wrong end of “Welcome, Amir!”. The comma is placed unusually, too. That’s because they oriented the direction of the page from right to left for Hebrew, but didn’t translate the word “Welcome” in the user interface.² If they had translated it, the bug wouldn’t be there: it would correctly appear as “ברוך בואך, Amir!”, and no fixes in the code would be necessary.
You can also see a wrong exclamation point at the end of “Thanks for being a Guardian member!”.
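The reason the exclamation point jumps to the other side is that, in Unicode’s bidirectional algorithm, punctuation has no direction of its own, and when it comes at the very end of a line, it follows the direction of the whole paragraph. Here is a small Python sketch that uses the standard unicodedata module to show the direction class of each character; the helper function name is made up, and the sketch only inspects the classes rather than running the full algorithm.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for a string."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Latin letters are strongly left-to-right ("L"), Hebrew letters are
# strongly right-to-left ("R"), and "!" is a neutral ("ON").
for ch, cls in bidi_classes("Amir!"):
    print(repr(ch), cls)

# U+05D0 is the Hebrew letter alef.
print(bidi_classes("\u05d0!"))
```

In a right-to-left page, a trailing neutral like “!” is placed on the left, which is the end of the line in that direction. After a translated Hebrew greeting, that is exactly where it belongs; after an untranslated English string, it looks broken.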
There are also less obvious bugs here. You can also see that in the word “WIKIMEDIA” under the “Group ID” dropdown, the letter “W” is only partly seen. That’s also a typical RTL bug: the menu may be too narrow for a long string, so the string can be visually truncated, but it should happen at the end of the string and not at the beginning. Because the software here thinks that the end is on the left, the beginning gets truncated instead. This is not exactly an issue that can be fixed just by completing the localization, but if the localization were complete, it would be easier to notice it.
There are even more issues that you don’t notice if you don’t know Hebrew. For example, there’s a button with a weird label at the top right. Most Hebrew speakers will understand that label as “a famous website”, which is probably not what it is supposed to say. It’s more likely that it’s supposed to say “published web page”, and the translator made a mistake. Completing the translation correctly would fix this mistake: a thorough translator would review their work, check all the usages of the relevant words, and likely come up with a correct translation. (And maybe the translation is not even made by a human but by machine translation software, in which case it’s the product manager’s mistake. Software should never, ever be released with user interface strings that were machine-translated and not checked by a human.)
Judging by the logo at the top, the dental insurance company used an off-the-shelf IBM product for managing clients’ info. If I ask IBM or the insurance company nicely, will they let me complete the localization of this product, fixing the existing translation mistakes, and filing the rest of the bugs in their bug tracking software, all without asking for anything in return? Maybe I’ll actually try to do it, but I strongly suspect that they will reject this proposal and think that I’m very weird. In case you wonder, I actually tried doing it with some companies, and that’s what happened most of the time.
And this attitude is a bug. It’s not a bug in code, but it is very much a problem in product management and attitude toward business.
If you want to tell me “Amir, why don’t you just switch to English and save yourself the hassle”, then I have two answers for you.
The first answer is described in detail in a blog post I wrote many years ago: The Software Localization Paradox. Briefly: Sure, I can save myself the hassle, but if I don’t notice it and speak about it, then who will?
The second answer is basically the same, but with more pathos. It’s a quote from Avot 1:14, one of the most famous and cited pieces of Jewish literature outside the Bible: If I am not for myself, who is for me? But if I am for my own self, what am I? And if not now, when? I’m sure that many cultures have proverbs that express similar ideas, but this particular proverb is ours.
And if you want to tell me, “Amir, what is wrong with you? Why does it even cross your mind to want to help not one, but two ultramegarich companies for free?”, then you are quite right, idealistically. But pragmatically, it’s more complicated.
Wikimedia understands the importance of localization and lets volunteers translate everything. So do many other Free Software projects. But experience and observation taught me that for-profit corporations don’t prioritize good support for languages unless regulation forces them to do it or they have exceptionally strong reasons to think that it will be good for their income or marketing.
It did happen a few times that corporations that develop non-Free software let volunteers localize it: Facebook, WhatsApp, and Waze are somewhat famous examples; Twitter used to do it (but stopped long ago); and Microsoft occasionally lets people do such things. Also, Quora reached out to me to review the localization before they launched in Hebrew and even incorporated some of my suggestions.³
Very often, however, corporations don’t want to do this at all, and when they do it, they often don’t do it very well. But people who don’t know English want—and often need!—to use their products. And I never get tired of reminding everyone that most people don’t know English.
So for the sake of most of humanity, someone has to make all software, including the non-Free products, better localized, and localizable. Of course, it’s not feasible or sustainable that I alone will do it as a volunteer, even for one language. I barely have time to do it for one language in one product (MediaWiki). But that’s why I am thinking of it: I would be not so much helping a rich corporation here as I would be helping people who don’t know English.
Something has to change in the software development world. It would, of course, be nice if all software became Freely-licensed, but if that doesn’t happen, it would be nice if non-Free software were more open to accepting localization from volunteers. I don’t know how this change will happen, but it is necessary.
If you bothered to read until here, thank you. I wanted to finish with two things:
To thank Siebrand Mazeland again for doing so much to lay the foundations of the MediaWiki localization and the translatewiki community, and for saying that the fix is to complete the localization. It may have been an off-hand remark at the time, but it turned out that there was much to elaborate on.
To ask you, the reader: If you know any language other than English, please use all apps, websites, and devices in this language as much as you can, bother to report bugs in its localization to that language, and invest some time and effort into volunteering to complete the localization of this software to your language. Localizing the software that runs Wikipedia would be great. Localizing OpenStreetMap is a good idea, too, and it’s done on the same website. Other projects that are good for humanity and that accept volunteer localization are Mozilla, Signal, WordPress, and BeMyEyes. There are many others.⁴ It’s one of the best things that you can do for the people who speak your language and for humanity in general.
¹ And here’s another acknowledgement and reflection: This sentence is based on the first chapter of one of the most classic books about software development in general and about Free Software in particular: Programming Perl by Larry Wall (with Randal L. Schwartz, Tom Christiansen, and Jon Orwant): “Computer languages differ not so much in what they make possible, but in what they make easy”. The same is true for software localization platforms. The sentence about the end user wanting to get their job done is inspired by that book, too.
² I don’t expect them to have my name translated. While it’s quite desirable, it’s understandably difficult, and there are almost no software products that can store people’s names in multiple languages. Facebook kind of tries, but does not totally succeed. Maybe it will work well some day.
³ Unfortunately, as far as I can tell, Quora abandoned the development of the version in Hebrew and in all other non-English languages in 2022, and in 2023, they abandoned the English version, too.
⁴ But please think twice before volunteering to localize blockchain or AI projects. I heard several times about volunteers who invested their time into such things, and I was sad that they wasted their volunteering time on this pointlessness. Almost all blockchain projects are pointless. With AI projects, it’s more complicated: some of them are actually useful, but many are not. So I’m not saying “don’t do it”, but I am saying “think twice”.
I am one of the people who implemented the language selector on Wikipedia, one of the World Wide Web’s most multilingual sites. Because of that, and because I’ve loved languages since I was five, I’m generally obsessed with language selection interfaces everywhere: websites, apps, self-service kiosks, airplane entertainment systems, cars, smart headphones, and so on.
So I was obviously thrilled to see a language selector as a minor plot device in the movie Atlas starring Jennifer Lopez. Most movie critics were quick to pan it, but I’m occasionally curious about “so bad it’s good” movies and like many other people these days, I’m curious about the portrayal of artificial “intelligence” in art, so I bothered to watch it. It is indeed not too brilliant: J.Lo’s acting is pretty OK, and the story has some sensible ideas about AI, but it also has ideas that are very silly and self-contradicting, as well as too much CGI, too many references to the Terminator, Alien, and Blade Runner franchises, and a generally lazily-written script. Though it’s mildly entertaining, you probably have better ways to spend two hours.
However, if I don’t write something about the language selector there, who will? So let’s go:
A screenshot from Netflix. A futuristic interface for selecting languages: three columns of buttons with names of languages and flags. At the top, a closed caption: “Francais [speaks French]”. The other details are described in the rest of the post.
This selector appears at about 33 minutes into the movie.
A few general comments first.
Representing languages using flags is common in language selection interfaces, but it’s a very bad practice. This interface has many examples of why it’s bad, which I’ll discuss in detail.
If my calculations are correct, the movie mostly takes place in the year 2071. The language selector is designed to look like something from that year, but it actually looks a lot like a language selector from a contemporary video game, for example Brawl Stars:
(I’m not much of a gamer, but I’ve got a feeling that there are games whose language selectors are even more similar to the one in Atlas. If you have an example, let me know.)
Some of the languages in the Atlas selector are unusual and don’t quite exist as separately-named languages today. Are the producers suggesting that they’ll exist as independent software user interface languages in 2071? Are those inside jokes by people in the production crew? Or are those just goofs? I don’t know, but I’ll try to add a few guesses along the way. Please remember that those are just guesses.
I am failing to find logic in the order of the languages. It’s not alphabetical by the original language name, not by the English language name, not by ISO language code. Maybe it’s just random. Maybe it’s based on some currently-existing software. I just don’t know.
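For the curious, here is a minimal sketch of the kind of ordering check I mean. It assumes the on-screen order of the first few entries as I describe them in this post; the English names and ISO 639 codes are my own labels, not something shown in the film, and plain string comparison stands in for proper locale-aware collation.

```python
# First entries of the selector, in the order described in this post.
# The English names and ISO 639 codes are labels added for illustration.
SELECTOR_ORDER = [
    ("Hejazi", "acw"),
    ("German", "de"),
    ("Spanish", "es"),
    ("Chinese", "zh"),
    ("Portuguese", "pt"),
    ("Turkish", "tr"),
    ("Tagalog", "tl"),
    ("Ukrainian", "uk"),
]


def is_sorted(values):
    """Return True if the values are already in ascending order."""
    values = list(values)
    return values == sorted(values)


# Check the two orderings that are easy to test without native names.
by_english_name = is_sorted(name for name, _ in SELECTOR_ORDER)
by_iso_code = is_sorted(code for _, code in SELECTOR_ORDER)

print("Sorted by English name:", by_english_name)
print("Sorted by ISO code:", by_iso_code)
```

Checking the order by native language name would need the exact labels from the screen and a real collation library, so it is left out of this sketch.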
And of course, it’s generally weird that any software in 2071 needs a manual language selector, especially in the context shown in the film: setting up a piece of electronic equipment after turning it on for the first time. Already today, automatic language detection works fairly well in both text and audio, so by 2071, manual selection should be completely unnecessary. Quite likely, the production designers wanted to poke fun at modern software instead of imagining what it may actually look like in 2071.
Now, let’s finally take a look at the languages themselves, going by columns from left to right.
Right at the top, we have something quite odd. The label says “Hejazi”. It’s written in broken Arabic because the designers, as very often happens, didn’t bother to ask native speakers to proofread. The letters appear disconnected and are written from left to right, and not from right to left. The flag is a bit similar to the Palestinian, Jordanian, and Sudanese flags, but with a different order of colors. According to Wikipedia, it was indeed used by the Kingdom of Hejaz, a short-lived country that existed for a few years after the First World War, and eventually merged with Saudi Arabia. Hejaz is a geographical region in the West of the Arabian Peninsula, and a particular variety of Arabic is spoken there, but to the best of my knowledge, the people who speak it mostly write in standard Arabic, which is treated separately here later.
Next we have German and Spanish, about which there isn’t much to add except that those languages are represented by the flags of Germany and Spain, even though both languages are spoken in multiple countries.
Chinese is also mostly uneventful—it uses the PRC flag and is just labeled “Chinese”, without “traditional” and “simplified” in parentheses.
Portuguese is represented by the flag of Brazil, even though it’s also spoken in Portugal, Angola, Mozambique, and several other countries.
Then we have Turkish, Tagalog, and Ukrainian, about which there’s not much to say, except that Ukrainian is present here, but Russian isn’t! Does it mean anything? No idea.
Not much to say about Czech and Italian.
English is represented by the United States flag and not by England, U.K., or India (which, depending on how you count, may be the nation with the largest number of English speakers). I don’t understand why Spanish is represented by a European country, while English and Portuguese are represented by American countries.
Not much to say about Korean, Swedish, and Japanese, but there’s a comment about Swedish later.
Finnish is labeled “Suomalainen”. This word describes a Finnish person, and is also used as the adjective “Finnish” for describing some things, but not the Finnish language.
Then we have “Arabic”. Like Hejazi, it’s written from left to right and in disconnected letters. The flag is fuzzy, but it’s probably the UAE one. Arabic is spoken in many countries, and over the years, I’ve seen lots of flags representing the Arabic language: Saudi Arabia’s Shahada flag, Palestinian, Jordanian, or UAE flags, the Arabic letter Ayin, etc.
“Bajan” is the Barbadian creole. Today it is spoken by many people, but not written much. Was there someone Barbadian in the filming crew? Does anyone suggest that it will be a big established language used in software user interfaces in 2071 or is it just a joke? (I didn’t know that “Bajan” is a word for describing the Barbadian culture before watching the film, and it’s probably the most useful thing I learned from it.)
Hausa is represented by the flag of Nigeria. This language is also spoken in Niger and in some other countries. It’s one of the world’s biggest languages, and it’s particularly important in all of Western Africa. Nigeria is a heavily multilingual country, however, and Hausa is just one of its four big languages, the other three being Yoruba, Igbo, and Fula. So it’s not a very good idea to use the Nigerian flag for this.
Catalan is represented by the Catalan independence activists’ flag, with the blue chevron and the star, known as Estelada. The official flag of the autonomous community of Catalonia is yellow with the four red stripes and without the chevron and the star, and it will probably remain its flag if it ever becomes independent. Are they hinting that Catalonia will achieve independence by 2071? Paying tribute to the fact that Catalan is heavily present in many websites and apps? Or just being sloppy?
“Kryuol” is the Jamaican English-based creole. I didn’t know that “Kryuol” is one of its names, but it looks like it appears on some websites, such as this, so it’s probably not a mistake. Like Barbadian, it’s not written much these days, but maybe it will be written more in the future.
“Sranan” is the language of Suriname, a creole based mostly on English and Dutch. There is a Wikipedia in it, but I haven’t seen it written elsewhere.
Next comes one of the oddest entries: “Åland”. Today, it is the name of an island, which is a Swedish-speaking self-administering territory of Finland. About thirty thousand people live there. There is an Åland Swedish dialect, and I cannot say how different it is from standard Swedish, which appears in this selector separately. Will it develop into an independent language by 2071? Maybe, but it’s still odd to see it in the list. Maybe Åland and Suriname will be revealed as the world centers of AI innovation in the sequel? (Netflix, if you’re producing a sequel and use this idea, consider giving me a lifetime ad-free subscription or something.)
And the last one is Azerbaijani. It’s written strangely. Like the names of other languages, its name is written in all-caps: “AZƎRBAYCANLI”. The third letter is Ǝ, which is the capital counterpart of ǝ. It is incorrect, because the name of this language must be written with the letter Ə, which is the capital counterpart of… ə! The small letters look the same, but the capital letters are different. It’s one of the most confusing things in the extended Latin alphabet, and the production designers fell into this trap. Also, the name of the language is usually written with the suffix -CA and not the suffix -LI. As with the name of the Finnish language on the same screen, this word is more appropriate for an Azerbaijani person than for the Azerbaijani language.
So there. Some of the issues are usual and common today: broken Arabic, and the wrong character for Azerbaijani.
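If you want to see the difference between the two capital letters for yourself, here is a small sketch using Python’s standard `unicodedata` module. The two strings are my own reconstruction of the label as shown in the film and of the same label with the correct letter; nothing else here comes from the movie.

```python
import unicodedata

# The letter used on screen (U+018E) and the letter Azerbaijani needs (U+018F).
SCREEN_LETTER = "\u018E"
CORRECT_LETTER = "\u018F"

for capital in (SCREEN_LETTER, CORRECT_LETTER):
    small = capital.lower()
    # Print the code point and official Unicode name of each capital
    # letter and of its lowercase counterpart.
    print(
        f"U+{ord(capital):04X} {unicodedata.name(capital)}"
        f" -> U+{ord(small):04X} {unicodedata.name(small)}"
    )

# The label as it appears in the film, and the same label with the correct letter.
label_in_film = "AZ\u018ERBAYCANLI"
label_with_schwa = "AZ\u018FRBAYCANLI"

# The small letters look alike to a human reader, but they are different
# characters, so the lowercased strings do not compare equal.
print(label_in_film.lower() == label_with_schwa.lower())
```

This is also why searching or sorting can quietly go wrong when the wrong one of the two letters sneaks into a text.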
The most surprising thing is probably the dialects or creoles that are minor or barely existing today: Åland, Hejazi, Bajan, Sranan, Kryuol. Not something that is seen often. Since some of them are Caribbean, perhaps it’s Lopez’s tribute to her Puerto Rican background? But then why aren’t Haitian Creole and Papiamento there, considering that they are much more prominent? I have no answer.
If you see a language selector in any other movies or in any other interesting place, please let me know!
Practically every point in that blog post is either a meaningless generality written in corporatespeak or a demonstrable lie. You don’t need specialized engineering knowledge or access to internal information to see it. You just need common sense.
User feedback shows that with AI Overviews, people have higher satisfaction with their search results…
Which people? Everyone? I don’t. I sharply reduced my use of Google search because I no longer trust it.
… and they’re asking longer, more complex questions that they know Google can now help with.
The word “help” is doing a lot of work here. Google can output a piece of text in response. Is this piece of text actually helpful?
AI Overviews work very differently than chatbots and other LLM products that people may have tried out.
No, they don’t. They work exactly the same. Both technologies automatically produce some text that was not written by a human.
They’re not simply generating an output based on training data.
No. They are, in fact, simply generating an output based on training data.
When AI Overviews get it wrong, it’s usually for other reasons: misinterpreting queries, misinterpreting a nuance of language on the web, or not having a lot of great information available. (These are challenges that occur with other Search features too.)
This is one of the few true things in this blog post, but it shows why this feature is completely pointless!
I mean, it’s nice that she doesn’t blame the users here for writing bad queries, but admits that the software that her team developed is bad at interpreting them.
And here’s an even more important thing: Despite the long-standing impression that “you can find everything on Google”, the “AI” innovations of the last couple of years help us realize that there are actually many topics about which there is not a lot of info online. And large language models are not going to solve this problem.
This approach is highly effective.
What does this even mean? “We are able to show more ads and improve our bottom line for the last quarter?”
Overall, our tests show that our accuracy rate for AI Overviews is on par with another popular feature in Search — featured snippets — which also uses AI systems to identify and show key info with links to web content.
This is probably the biggest lie of all in that whole post.
There is no comprehensive test or measure for accuracy! It is logically impossible to make one!
At most, there is some internal metric that middle managers present to senior managers, and it may show that the rate is “positive” according to internal company logic. However, it has absolutely nothing to do with what millions of web users actually need.
This is comparable to metrics of machine translation quality, such as BLEU and NIST. There are methodologies and formulas behind them, but they are only useful for discussions among researchers, developers, and product and project managers, and they have very limited usefulness in predicting the correctness of the translation of a text that hasn’t yet been tested. Developers have to use those metrics because project managers love metrics, but most of them admit that they are not very good, and such a metric can never become perfect.
In a small number of cases, we have seen AI Overviews misinterpret language on webpages and present inaccurate information.
Yes, thanks again for admitting that computers are not supposed to interpret language in the first place. Humans are supposed to do it.
I could go on, but I have better things to do, like publishing three longish blog posts of my own. One is coming very soon, and it’s going to be fun, at least for me.
In response to accusations of monopolistic behavior, Google has been saying for years that competition is just a click away. It’s true, and it’s good. My experience with DuckDuckGo in the last few days has been perfectly fine.
That said, Google should still be tried for monopolistic behavior. And I kind of wish that there was regulation that prevents the deliberate destruction of fundamental public goods operated by commercial companies, but I guess that it would be very hard to legislate.
In the meantime, let’s try not to be silent about Google’s lies, and let’s consider using the competitors.