@TrystanLea
Member

@TrystanLea TrystanLea commented Jul 5, 2025

Proposal: Replace gettext with JSON-based i18n system

Reviewing pull requests I am reminded that I've left this topic of multilingual support hanging for far too long.

This is a proposal for simple, integrated multilingual support using JSON language files. It's not complete yet, just an initial crude implementation. The batch conversion of all .po files and all "_(" calls to "tr(" was relatively painless.

Convert all .po files:

php Lib/po2json.php
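
The conversion script itself is not shown in this thread. Purely as an illustration of the idea, a converter could read the simple msgid/msgstr pairs from a .po file and emit them as a flat JSON object. The function name `po_to_array` is hypothetical, and this sketch deliberately ignores plural forms and msgctxt; it is not the actual Lib/po2json.php.

```php
<?php
// Hypothetical helper, NOT the real Lib/po2json.php: extracts plain
// msgid/msgstr pairs from .po text. Plural entries and msgctxt are skipped.
function po_to_array(string $po): array
{
    $translations = [];
    $msgid = null;   // msgid of the entry being read
    $msgstr = null;  // msgstr of the entry being read
    $target = null;  // which of the two a continuation line extends

    $flush = function () use (&$translations, &$msgid, &$msgstr) {
        // Skip the header entry (empty msgid) and incomplete entries
        if ($msgid !== null && $msgid !== '' && $msgstr !== null) {
            $translations[$msgid] = $msgstr;
        }
        $msgid = null;
        $msgstr = null;
    };

    foreach (preg_split('/\R/', $po) as $line) {
        $line = trim($line);
        if (preg_match('/^msgid\s+"(.*)"$/', $line, $m)) {
            $flush();                        // a new entry starts here
            $msgid = stripcslashes($m[1]);   // decode \n, \" and similar
            $target = 'msgid';
        } elseif (preg_match('/^msgstr\s+"(.*)"$/', $line, $m)) {
            $msgstr = stripcslashes($m[1]);
            $target = 'msgstr';
        } elseif (preg_match('/^"(.*)"$/', $line, $m)) {
            // Continuation line of a multi-line string
            if ($target === 'msgid') $msgid .= stripcslashes($m[1]);
            if ($target === 'msgstr') $msgstr .= stripcslashes($m[1]);
        } else {
            $target = null;                  // blank line, comment, or unsupported keyword
        }
    }
    $flush();
    return $translations;
}

$po = <<<'PO'
msgid "Feeds"
msgstr "Flux"
PO;
echo json_encode(po_to_array($po), JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), "\n";
```

Entries with an empty msgstr are kept as empty strings, which fits a lookup that falls back to the source text when the translation is empty.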

Convert all "_(" functions:

find . -type f -name "*.php" -exec sed -i 's/_(/tr(/g' {} +

Still need to handle: dgettext functions

Why make this change?

  • Remove external dependencies: No more system-wide locale setup or gettext compilation.

What we're giving up

  • No complex pluralization support: Are we actually making use of gettext pluralization at the moment?
  • Translator tooling: Moving away from established .po file workflows
  • Memory efficiency: JSON parsing vs compiled .mo files (likely negligible for our use case; it seems fast enough)

Proposed approach

  • Simple JSON structure organized by domain/Module etc (matching current usage)

Questions for discussion

  1. Any concerns about losing complex pluralization support?
  2. Any concerns about losing gettext's context support (msgctxt)? I believe we are not using this at the moment.

I think there's a case for making this change even if we lose perfect translation. All languages will be available out of the box as standard, without system-wide locale generation, and that should encourage further translation as it will be more evident that locales are in use. I envisage adding both CLI tools for template language file generation and perhaps an integrated UI tool for modifying the JSON files. I wonder if it's possible to use translation APIs for first-pass translations?

@reedy @chaveiro @gablau @alexandrecuer

@TrystanLea
Member Author

Example language file:

https://github.com/emoncms/emoncms/blob/json_i18n/Modules/feed/locale/fr_FR.json

{
    "Tag": "Étiquette",
    "Feed ID": "Identifiant Flux",
    "Feed Interval": "Intervalle Flux",
    "Feed Start Time": "Temps de départ Flux",
    "Realtime": "Temps réel",
    "Daily": "Quotidien",
    "Feed API Help": "Aide de l'API Flux",
    "Feeds": "Flux",
    "Collapse": "Replier",
    "Expand": "Étendre",
    "Select all": "Tout sélectionner",
    "Unselect all": "Tout désélectionner",
    ...

@TrystanLea
Member Author

TrystanLea commented Jul 5, 2025

Crude core.php implementation so far (initial concept thanks to Copilot). It still needs a way to specify context, e.g. which module, a dgettext replacement, etc.

https://github.com/emoncms/emoncms/blob/json_i18n/core.php#L261

function load_language_files($path, $context = false)
{
    // Determine current language
    global $session;
    $lang = isset($session['lang']) ? $session['lang'] : 'en_GB';
    if ($lang == 'en') $lang = 'en_GB';

    //echo "Loading language files for $lang in $path with domain $context<br>";

    // Build path to JSON translation file
    $json_file = rtrim($path, '/')."/$lang.json";
    if (file_exists($json_file)) {
        $translations = json_decode(file_get_contents($json_file), true);
        if (is_array($translations)) {
            if (!$context) {
                // No context given: use these as the default (global) translations
                $GLOBALS['translations'] = $translations;
            } else {
                // For other context specific translations:
                if (!isset($GLOBALS['context_translations'])) {
                    $GLOBALS['context_translations'] = array();
                }
                $GLOBALS['context_translations'][$context] = $translations;
            }
        }
    }
}

function tr($text)
{
    return isset($GLOBALS['translations'][$text]) && $GLOBALS['translations'][$text] !== ''
        ? $GLOBALS['translations'][$text]
        : $text;
}

function ctx_tr($context, $text)
{
    // Use the context translation when it exists and is non-empty (as tr() does)
    if ($context && isset($GLOBALS['context_translations'][$context][$text])
        && $GLOBALS['context_translations'][$context][$text] !== '') {
        return $GLOBALS['context_translations'][$context][$text];
    }
    return $text;
}

@TrystanLea
Member Author

This now includes a script that generates the language files. Just run it, e.g. for Welsh :)

php Lib/gen_locale.php cy_GB

Then copy and paste the JSON into ChatGPT for a quick translation. It's pretty good in Welsh, to be fair (as a Welsh speaker) ;)
I had an issue with ' characters in the translations interfering with JavaScript string concatenation, and had to escape them in a few places.
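
One way to sidestep the apostrophe problem, sketched here as an assumption rather than what the branch actually does, is to let PHP's `json_encode` produce the JavaScript string literal instead of wrapping the translation in quotes by hand. The helper name `js_string` is hypothetical.

```php
<?php
// Hypothetical helper: returns a complete JavaScript string literal
// (including the surrounding double quotes) for use inside a <script> block.
// The JSON_HEX_* flags turn ', ", <, > and & into \uXXXX escapes.
function js_string(string $text): string
{
    return json_encode(
        $text,
        JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_TAG | JSON_HEX_AMP
            | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
}

$translated = "Aide de l'API Flux";
echo "var title = " . js_string($translated) . ";\n";
```

The caller no longer adds its own quotes around the value. If the literal is placed in an HTML attribute rather than a script block, it would additionally need `htmlspecialchars`.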

@TrystanLea
Member Author

I'm actually quite happy with this now; I've got the context translations working. @reedy @chaveiro @gablau @alexandrecuer, could one of you test whether this breaks anything? There are associated changes to the other non-core modules that I can push up if this is a workable direction.

@gablau
Contributor

gablau commented Jul 5, 2025

hi @TrystanLea,
I'll start by saying that I only read the code you wrote for this pull, so I didn't test the code.

However, the idea of switching to JSON as a structure for translations seems good to me, and the idea of abandoning gettext is even better.

Just a couple more things:

  1. how can we optimize translations for javascript files? see here
    Maybe by loading the json file of the selected language directly in the frontend?

  2. As I already did in my first attempt, we could use a Redis server to cache translations so that we don't have to load all the files on each HTTP request. Maybe it would be faster?

If I find some time I will try to run this new version of emoncms.
And of course I will give a complete review to the Italian translation

Thanks for your work
Gabriele

@TrystanLea
Member Author

TrystanLea commented Jul 6, 2025

Thanks @gablau, great that you think it could be a good approach!

I do notice a few things I've missed that the gettext version is translating but this new version is not. I will hopefully get those sorted shortly.

  1. Yes we could look at doing the js translations differently.
  2. Agreed, caching could help or optimisation around menu translations as we end up loading every language file just for the menu system..

@alexandrecuer
Contributor

alexandrecuer commented Jul 6, 2025

created a docker image with your version from the json_i18n branch

docker pull alexjunk/emoncms:alpine3.20_emoncms11.8.6

Did a quick test, and there were no errors during the build. There are some when you launch the container, probably more because of Mosquitto-PHP.
@TrystanLea: you should merge this:
openenergymonitor/Mosquitto-PHP#1

I switched from English to French without any problem in the admin module.
image

graph module stays in english but I guess it is still the gettext code (?)
image

@TrystanLea: the docker build does not install the symlinked modules, as it searches each module repo for a branch with the same name (json_i18n), which doesn't exist yet. When working on those modules, there should be no problem if you create a branch with exactly the same name: json_i18n

@TrystanLea
Member Author

Hmm, perhaps a script to automatically translate keys is not the best approach. Using the chat window in VS Code and adding context there works pretty well and allows for a more careful review of changes.

@TrystanLea
Member Author

Thanks @reedy, that's a useful reference, I don't really see the benefit tbh:

  • It makes the original view/template files harder to read as you're missing the source translation
  • It seems harder to track changes in the main language, the current approach provides a clear way to flag these changes as requiring translation attention.
  • It's a much larger migration effort as we'd have to generate all those keys..

@TrystanLea
Member Author

A quick look at optimisation potential, it looks like loading the json translation files for every module that has a menu item adds about 3-3.5 ms on my laptop. This takes the time it takes to load the menu system from around 5-5.5 ms up to ~8.5 ms.

Loading a text-heavy page such as the input/api page goes from around 21 ms up to 25 ms, i.e. about 4 ms extra in Welsh vs the base English.

Suggests if we do optimise something it's probably the way the menu system is translated.

The easiest approach is probably to translate the menu system via the theme translation files. This would be imperfect from a modularity perspective but could be a way to shave off some milliseconds, if it is a problem, which I'm not sure it is. Does 3 milliseconds matter?
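
For anyone wanting to reproduce this kind of measurement, a minimal timing sketch follows. It is not the method used for the figures above, which the thread does not describe. The function name `time_json_load` is hypothetical, and the glob pattern assumes the Modules/<module>/locale/<lang>.json layout of the example file linked earlier.

```php
<?php
// Hypothetical benchmark helper: average time in milliseconds to read and
// decode a set of JSON files, over $runs repetitions.
function time_json_load(array $files, int $runs = 100): float
{
    $start = hrtime(true);  // monotonic clock in nanoseconds (PHP 7.3+)
    for ($i = 0; $i < $runs; $i++) {
        foreach ($files as $file) {
            json_decode(file_get_contents($file), true);
        }
    }
    return (hrtime(true) - $start) / 1e6 / $runs;
}

// Assumed layout: one locale directory per module
$files = glob(__DIR__ . '/Modules/*/locale/cy_GB.json') ?: [];
printf("%d files, %.3f ms per pass\n", count($files), time_json_load($files));
```

Repeated runs mostly measure warm file-system cache behaviour, so the result is a lower bound compared with a cold first request.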

@alexandrecuer
Contributor

> Thanks @reedy, that's a useful reference, I don't really see the benefit tbh:
>
>   • It makes the original view/template files harder to read as you're missing the source translation
>   • It seems harder to track changes in the main language, the current approach provides a clear way to flag these changes as requiring translation attention.
>   • It's a much larger migration effort as we'd have to generate all those keys..

Sure, there is a price to pay (more time for the migration), but it is cleaner. Maybe something in between could be a solution?
The translation key could be:

  • the full English string for short messages; nobody will change those strings, which are short and precise
  • a description for longer texts, as there are not so many of them

If you change a description in the main language (English), you will not break the other translations. But maybe you prefer to fall back on the English words, and so see that you have to correct a translation? Not sure I'm being clear enough anyway...

@TrystanLea
Member Author

> But maybe you prefer to fall back on the english words and so to see that you have to correct a translation

Thanks @alexandrecuer. Yes, I do prefer to fall back on the English words. With the old translation logged for reference, we at least have a way to recover and, if needed, edit the translations that need fixing.

How about we use the current approach to start with? We could always review this in a year or so if small changes keep annoyingly breaking existing translations.
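
To make the fall-back-and-log idea concrete, here is a minimal sketch of how a locale file could be regenerated when source strings change. It is an assumption about the general shape, not the code in the branch, and `merge_locale` is a hypothetical name.

```php
<?php
// Hypothetical sketch: $keys are the source strings currently found in the
// code, $existing is the previous content of the locale JSON file.
// Returns [new locale array, old translations whose key no longer exists].
function merge_locale(array $keys, array $existing): array
{
    $locale = [];
    foreach ($keys as $key) {
        // Keep an existing translation; new or changed keys start empty so
        // that lookups fall back to the English source text.
        $locale[$key] = $existing[$key] ?? '';
    }
    // Translations that lost their key: worth logging for later recovery
    $obsolete = array_diff_key($existing, $locale);
    return [$locale, $obsolete];
}

[$locale, $obsolete] = merge_locale(
    ['Feeds', 'Select all feeds'],
    ['Feeds' => 'Flux', 'Select all' => 'Tout sélectionner']
);
echo json_encode($obsolete, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), "\n";
```

In this example the edited source string "Select all feeds" shows up untranslated, while the old "Select all" translation is returned separately so a translator can reuse it.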

@TrystanLea
Member Author

Added a bit of text to indicate the translation status of the selected language :) The status is built and saved to a JSON file when scripts/translation/status.php is run.

image
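
The thread does not show how the status is calculated. One plausible definition, given only as a sketch, is the share of keys in a language file that have a non-empty translation. The function name `translation_status` is hypothetical and this is not the actual scripts/translation/status.php.

```php
<?php
// Hypothetical status calculation: percentage of keys in a decoded locale
// file that have a non-empty string translation, rounded to one decimal.
function translation_status(array $translations): float
{
    $total = count($translations);
    if ($total === 0) {
        return 0.0;  // avoid division by zero for an empty file
    }
    $done = count(array_filter(
        $translations,
        fn($value) => is_string($value) && $value !== ''
    ));
    return round(100 * $done / $total, 1);
}

$cy = ['Feeds' => 'Ffrydiau', 'Daily' => '', 'Realtime' => '', 'Tag' => ''];
printf("cy_GB: %.1f%% translated\n", translation_status($cy));
```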

@TrystanLea
Member Author

TrystanLea commented Jul 17, 2025

Keen to merge this so that we can continue with other items without having to worry about merge conflicts :)

Rough plan for future development in a separate pull request / development effort at a later date:

  • Review and perhaps refactor the JavaScript translation implementation
  • Consider optimising the menu system translation, e.g. hold menu translation in Theme/locale to avoid loading all language files just for the menu system.
  • Remove existing gettext files

@alexandrecuer
Contributor

> Keen to merge this so that we can continue with other items without having to worry about merge conflicts :)
>
> Rough plan for future development in a separate pull request / development effort at a later date:
>
>   • Review and perhaps refactor the JavaScript translation implementation
>   • Consider optimising the menu system translation, e.g. hold menu translation in Theme/locale to avoid loading all language files just for the menu system.
>   • Remove existing gettext files

I will review the french translations for the input process descriptions after the merge

@TrystanLea TrystanLea merged commit c1663b4 into master Jul 17, 2025
12 checks passed
6 participants