February 8 CX Update: Fixed Infinite Loops, More Machine Translation Support, and Improved Suggestions

First of all, congratulations to all Content Translation users: There are now 50,000 published articles! In the Wikimedia Blog you can read more on that, along with a round-up of Content Translation’s first year.

Because of some technical issues, scheduled deployments of new features were again delayed for a few weeks. On February 4th they were finally resumed, and here are the most important updates:

  • If a user started a translation, deleted it, and then started it again, the translation interface would go into an “infinite loop” of loading, and become unusable. This is now fixed. (bug report)
  • Featured articles are now shown as suggestions only if there are no other useful suggestions to show. (bug report)
  • The link from the dashboard to the tool that shows articles that don’t exist in your language is removed, on the premise that the integrated suggestions are more useful.
  • Machine translation using Yandex is now available for Albanian, Armenian, Bashkir, Polish and Uzbek.
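
The featured-article fallback described in the list above can be pictured with a minimal sketch. This is not the Content Translation code: the function name, the parameters, and the limit of ten items are all hypothetical, and the sketch only illustrates the "featured articles as a last resort" rule.

```python
def pick_suggestions(other_suggestions, featured_articles, limit=10):
    """Return featured articles only when no other suggestions exist.

    All names here are illustrative assumptions, not the real CX API.
    """
    if other_suggestions:
        # Other suggestions take priority whenever any are available.
        return other_suggestions[:limit]
    # Fall back to featured articles only when nothing else can be shown.
    return featured_articles[:limit]


print(pick_suggestions(["Vatersay"], ["Pellegrino Turri"]))
print(pick_suggestions([], ["Pellegrino Turri"]))
```

The design choice is simply an ordered fallback: the featured list is never mixed into other suggestions, it only replaces an empty list.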

January 15 CX Update: Personalized Suggestions, More Machine Translations, and Other Fixes

Happy new year, happy birthday Wikipedia, and happy birthday ContentTranslation, which was deployed to the first eight languages a year ago!

ContentTranslation updates are back after a delay, during which the usual scheduled software deployments were suspended because of year-end holidays, fundraising and absences.

The most important recently released feature is Personalized Suggestions. The suggestions tab now shows automatically selected suggestions of articles to translate according to the user’s editing history. This feature was developed together with Leila Zia, Ellery Wulczyn and Robert West from the Wikimedia Foundation’s Research team, as well as Jure Leskovec, Robert’s advisor on the computer science faculty at Stanford.

Translation using the Apertium engine from Hindi to Urdu is now enabled by default. Translation using Yandex was enabled from English, Ukrainian and Belarusian into Russian. More languages may be added soon.

Failure to publish a translation because of AbuseFilter is now shown more clearly: the AbuseFilter message is displayed to the translator, so it will be much easier to fix the problem. We are researching other common publishing errors daily in order to get them fixed, too.

The colors of the alerts and the notifications at the top of the translation interface were updated, and now they are in shades of red for errors and shades of green for positive notifications.

Finally, a bug which was showing incorrect data for the last week of the year in the statistics chart was fixed.

The team is now working on improving the suggestions system further, monitoring and fixing errors in restoring and publishing translations, improving performance, and upgrading the translation storage in a way that will allow machine translation developers to improve their translation engines (the “Parallel Corpora” feature). Expect more details about this in future posts.

November 1 CX Update: Starred Suggestions and Translation Interface Bug Fixes

After a delay in the deployment of new features to Wikimedia sites for the last couple of weeks, this week we are back to the normal deployment schedule, and we have several significant updates.

The major new feature is the ability to mark suggested articles as something that you want to translate later by “starring” them (task description), as well as to discard suggestions in which you are not interested. This update is another step toward making sophisticated and personalized lists of articles to translate, which are designed to help translators be more efficient in completing the coverage of encyclopedic topics in their languages. For more details about the state of the Translation Suggestions, see the recently published Wikimedia Blog post: Article suggestions—a new feature for Content Translation.

Other than that, these bug fixes were deployed:

  • Images without captions were not properly published, which was confusing, because they appeared in the translation view, but not in the article. This is now fixed. (bug report)
  • Adding a red link was not, by itself, triggering auto-saving. Now it does. (bug report)
  • Long words in the titles of the columns in the translation interface were shown only partially. Now they are wrapped so that the whole title is visible. (bug report)

October 23 CX Update: Machine Translation and Suggestions in More Languages, and Other Fixes

Last week there was no automatic software deployment to Wikipedia sites for technical reasons, so there are relatively few CX updates this time. The usual update schedule is supposed to resume next week.

The following fixes were deployed recently:

  • While a translation in progress was being loaded, the translation column was empty. This was confusing. Now a loading indicator is shown at the top. (bug report)
  • The automatic selection of languages in which suggestions are shown is improved. (bug report)
  • Suggestions are now enabled in the following languages:
    • from English to all languages
    • from German, Hebrew, Italian, Polish, Swedish, Vietnamese, Finnish and Dutch to English
    • from Simple English to Gujarati and Hindi
    • from Swedish to Finnish
    • from Swedish and Norwegian Bokmål to Norwegian Nynorsk
  • New language pairs are added to Apertium machine translation:
    • Arabic to Maltese
    • Maltese to Arabic
    • Spanish to Italian
    • Italian to Spanish
    • Icelandic to Swedish
    • Swedish to Icelandic
    • Romanian to Spanish

October 8 CX Update: Suggestions in New Languages, Fixes in Interlanguage Links and RTL Images, and More

This week we were mostly working on new capabilities in the Suggestions feature, and we also fixed a few bugs:

  • When translating from a language written left-to-right to a language written right-to-left, images with explicit alignment were aligned incorrectly in the published article. This was fixed. (bug report)
  • Long names of languages could sometimes be badly displayed. This is improved now, even for smaller screens. (bug report)
  • The language and page selector was hidden when the window was resized. This is fixed now. (bug report)
  • The “interlanguage links” to pages that are not translated to these languages yet are supposed to appear in gray. This worked correctly in the Vector skin, but they appeared as red in the Monobook and Modern skins. They now appear as gray in all three skins. (bug report)
  • We were still getting errors related to the interlanguage links entry point updates, such as the appearance of irrelevant language variants (like Brazilian Portuguese) or broken gadgets. They should be fixed now. (bug report)
  • The second time you selected a language in the filter at the top of the dashboard, you had to click it twice. This was fixed. (bug report)

The suggestions feature was deployed to new languages: From English to Afrikaans, Asturian, Bengali, Galician, Gujarati, Malayalam, Tamil and Ukrainian, from Catalan to Occitan, and from Bulgarian to Macedonian.

October 1 CX Update: Translating From Any Namespace, Suggestions, Links and RTL Fixes, and more

There are a lot of updates in ContentTranslation this week!

The Suggestions feature is now enabled in more language pairs: from English to Arabic, Esperanto, Hindi, Dutch and Vietnamese, and also from Swedish to Danish. More languages coming soon! (task description)

In the languages in which the Suggestions feature is enabled, the suggestion list now has “infinite scroll”: as you scroll to its end, more suggestions will be loaded. (bug report)

The scrolling of the dashboard is now smoother and less jumpy when it has many items. This was especially relevant for the Suggestions tab, which is usually long. (bug report)

Though Content Translation is supposed to be able to translate wiki pages from any namespace and not just articles, it was sometimes impossible to load a page if it was not in the main namespace. For example, if you tried to translate a page in the User namespace in the English Wikipedia to Spanish, the source page wouldn’t be loaded. This is now fixed and you really can translate pages from any namespace. In particular, this should be useful for translating help pages and policy pages in the Help and Wikipedia namespaces, and also for the Medical Translation Project, in which particular stable versions of pages that are recommended for translation are kept in the Wikipedia namespace. (bug report)

The Universal Language Selector is now used for selecting the languages in the source and target language filters in the dashboard. Several issues with the suggestions feature were fixed as well, such as the “null” language that sometimes appeared in the language selector in the suggestions list. (bug report)

Translations in progress couldn’t be deleted in some cases, in particular if several people worked on the same title. This is now fixed. (bug report)

The graphs on the Content Translation statistics page are now grouped by topic, to make the page shorter and easier to read. The different data graphs are now accessible using tabs at the top of the charts. (bug report)

While translating, links in the source column pointed to an incorrect target: a page with the same title in the target wiki. Now they point correctly to a page in the source wiki. (bug report)
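
As a rough illustration of the link fix above, the sketch below builds a link for the source column from the source language rather than the target language. It assumes the standard Wikipedia article URL pattern (`https://<language code>.wikipedia.org/wiki/<Title>`); the function name and its parameters are hypothetical and are not taken from the Content Translation code.

```python
from urllib.parse import quote


def source_column_link(title, source_language):
    """Build an article URL on the source wiki (illustrative helper).

    The bug described above amounts to using the target language code
    here; the fix is to use the source language code.
    """
    # MediaWiki page URLs use underscores in place of spaces.
    page = quote(title.replace(" ", "_"))
    return f"https://{source_language}.wikipedia.org/wiki/{page}"


# A link seen while translating from English should stay on the English wiki.
print(source_column_link("Pellegrino Turri", "en"))
```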

Some paragraph alignment issues were fixed. (bug report)

In wikis in languages written from right to left, the top personal menu was displayed in the reverse direction. This was fixed. (bug report)


September 24 CX Update: Suggestions in New Languages, Fixes in Statistics and Link Adaptation, and More

We are getting to the end of the quarter and it comes with a big bunch of updates and bug fixes.

[Image: a sea beach with green hills in the background. Vatersay is an island in Scotland, and it’s the subject of the 3,000th published translation in the Catalan Wikipedia.]

The translation suggestions feature was deployed to more language pairs:
  • English → French
  • English → Spanish
  • English → Russian
  • English → Chinese
  • English → Turkish
  • English → Japanese
  • English → Italian
  • Spanish → Catalan
  • Spanish → English
  • French → English

These are the most popular language pairs for translation according to Content Translation statistics. There are still several open issues with the translation suggestions feature, such as improving the selection of default languages and changing the selected languages, which we plan to address very soon.

Two issues were addressed in the statistics page:

  1. If there were no translations to a language in the previous week, the statistics page would show a growth trend of “NaN%” (“NaN” means “not a number”). This was caused by a division-by-zero bug, and now it’s fixed and the number is shown correctly. (bug report)
  2. The titles and the labels of the charts on the Content Translation Statistics page, which was recently revamped, were updated to be more descriptive. Please update their translations to your language. (bug report)
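
The “NaN%” bug in the first item above is a classic case: in JavaScript, dividing zero by zero evaluates to NaN, which then gets formatted as text. The sketch below shows one way to guard against it. It is written in Python for illustration only; the function names and the "n/a" placeholder are assumptions, and the source does not say what the statistics page displays when the previous week had no translations.

```python
def weekly_growth_percent(previous_week, current_week):
    """Return week-over-week growth in percent, or None if undefined."""
    if previous_week == 0:
        # No baseline: growth relative to zero is undefined, so report
        # that explicitly instead of dividing by zero.
        return None
    return (current_week - previous_week) / previous_week * 100.0


def format_trend(previous_week, current_week):
    """Format the trend for display (the "n/a" text is a hypothetical choice)."""
    growth = weekly_growth_percent(previous_week, current_week)
    if growth is None:
        return "n/a"
    return f"{growth:+.0f}%"


print(format_trend(10, 15))
print(format_trend(0, 5))
```

Keeping the undefined case as an explicit `None` separates the arithmetic from the presentation, so the display layer decides what to show instead of leaking a NaN into the page.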

Other notable updates:

  • Clicking a link in the source language for which there is no corresponding page in the target language was adding a useless link to the translation. This was fixed. (bug report)
  • When using Content Translation in Norwegian Bokmål, the “Find articles missing in your language” (“Finn sider som mangler på språket ditt”) tool was broken because it used a wrong language code. Now it works correctly. (bug report)
  • The issue of the extra server requests that Content Translation makes when loading an article for reading should be solved now (bug report), and the change no longer causes irrelevant gray links.

Finally, we congratulate the Catalan Wikipedia upon publishing the 3,000th translated page: Vatersay, an island in Scotland.

Wrong Gray Interlanguage Links Bug Is Fixed

Last week we deployed a change that reduced the number of server requests that Content Translation makes when loading an article for reading (bug report). This introduced another bug, however: in some cases incorrect gray links with language codes such as “en-us” appeared in the list. This change was reverted, so wrong links don’t appear any longer, but the extra request is back as well. This should be fixed soon, hopefully without breaking other things.

September 17 CX Update: Translation Suggestions and Improved Statistics Page

Several major Content Translation software updates were deployed this week.

The first version of the new article suggestions feature is deployed on the Portuguese Wikipedia. It shows a “Suggestions” button in the translation dashboard, in addition to “In progress” and “Published”. In this first version, clicking the Suggestions button will show a list of featured articles in the English Wikipedia that don’t yet have a version in Portuguese. We plan to add more languages and more types of suggestions in the near future.

Several major updates were done to the Content Translation Statistics special page:

  • The number of pages that were published and later deleted is now shown. (task description)
  • The trend of translations per week is now shown in addition to the all-time tally. (task description)
  • The numbers of published translations and translations in progress were shown in separate charts. This was taking too much space, so now they are shown in one chart in different colors. (code change)
  • The numbers of translations to the current wiki’s language and to all languages were shown in the same chart, which made understanding the number for the current language hard, because by now it’s usually much smaller than the tally for all languages (you are translating a lot! It’s great!). Now these charts are separated, so you can clearly see the growth for your language separately. (code change)
  • Language names in the bar charts were sometimes overlapping other chart elements. The display was adjusted so that this shouldn’t happen anymore. (bug report)

Another issue of note: Every time an article was loaded for reading, Content Translation was loading extra information from the server in order to display the gray interlanguage link that helps you translate an article to your language. It is now possible to display this link without making this request, so we removed it and Content Translation will waste less time and bandwidth. Thanks to the tireless technical contributor Derk-Jan “TheDJ” Hartman for noticing this. This, however, introduced another problem: in some cases incorrect gray links with language codes such as “en-us” may appear in the list. We hope to fix this soon.

Pellegrino Turri is the 20,000th Article Created Using Content Translation

The article Pellegrino Turri, translated from English to Italian by user MassimoGuarnieri, is the 20,000th page published using Content Translation since the tool was first enabled as a beta feature in January 2015.

Turri was a 19th-century Italian inventor best known for building one of the first typewriters, which he made for a blind friend of his, Countess Carolina Fantoni da Fivizzano. Their story also inspired a novel, The Blind Contessa’s New Machine by Carey Wallace.

About 1,200 articles have been published every week in all languages since the Content Translation beta feature was enabled in Wikipedia in all languages in early July.

We are enormously thankful to each and every one of the many hundreds of people who are participating in this: new editors and veteran Wikipedians who translate articles, help others make translations better, file constructive bug reports, write translation guides adapted to their home wikis, make useful feature suggestions, and fix technical issues in their wikis that break ContentTranslation. We are humbled to see that our work is helping the editor community to fulfill Wikimedia’s famous mission statement: a world in which every single human being can freely share in the sum of all knowledge.

Stay tuned, as we are going to announce more updates that will make the translators’ and the editors’ work even more efficient and comfortable.