Import translations from babel and cleveref #6852

Merged
laurmaedje merged 4 commits into typst:main from eltos:eltos-translations
Oct 3, 2025

@eltos
Contributor

@eltos eltos commented Sep 2, 2025

Imports all available translations from babel and cleveref.

Closes #6581 by following up on this comment:

This makes me think that we should probably have done that from the start. Just researching the LaTeX name data once and supporting all the languages they support immediately instead of having the terms decided by contributors one by one. I think that came up once in the past, but nobody ever followed up on it...

The PR has multiple commits:

  • 35d6a1f imports all translations, overwriting those that already existed in typst
  • 717ba1a reverts the overwritten terms, providing a clean diff which could be reviewed by a native speaker
    - term used by babel/cleveref
    + term used by typst as of today
  • 8595bb9 adds the translations to lang.rs
  • 4c27710 adds language constants and fixes a wrong assignment of Filipino/Tagalog

Overall, the PR does not change any of the existing translations; it only adds missing terms and locales.
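
The rule that existing typst terms win and imported terms only fill gaps can be sketched roughly as follows. This is an illustration rather than the script used in the PR: the function name `merge_terms` and the sample dictionaries are hypothetical.

```python
def merge_terms(existing: dict[str, str], imported: dict[str, str]):
    """Merge imported terms into existing ones without overwriting.

    Returns the merged mapping and a list of (key, imported, existing)
    conflicts that a native speaker could review.
    """
    merged = dict(imported)
    conflicts = []
    for key, current in existing.items():
        candidate = imported.get(key)
        if candidate is not None and candidate != current:
            conflicts.append((key, candidate, current))
        merged[key] = current  # the existing term always wins
    return merged, conflicts


# Hypothetical sample data, not taken from the actual translation files.
existing = {"figure": "Abbildung", "page": "Seite"}
imported = {"figure": "Abb.", "page": "Seite", "table": "Tabelle"}

merged, conflicts = merge_terms(existing, imported)
for key, theirs, ours in conflicts:
    print(f"- {key} = {theirs}")  # term used by babel/cleveref
    print(f"+ {key} = {ours}")    # term used by typst as of today
```

The printed lines mirror the diff legend of the revert commit: the `-` line is the imported term, the `+` line is the term that is kept.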

Import script

I used the following Python script to parse the babel and cleveref sources, extract the localized terms, and generate typst translations. Since these sources contain many more terms than typst currently uses (some including abbreviations and plurals), I am posting the script here in case it is useful in the future:

babel_cleveref_to_typst.py
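
For readers without access to the attached script, the general idea of extracting terms and emitting translation lines might look like the sketch below. It is not the attached script. It assumes the input uses cleveref's user-level `\crefname{type}{singular}{plural}` command (the real package sources may define terms differently), and both the `KEY_MAP` mapping and the `key = value` output format are assumptions made for illustration.

```python
import re

# Matches \crefname{type}{singular}{plural}, cleveref's user-level command.
CREFNAME = re.compile(r"\\crefname\{(\w+)\}\{([^{}]*)\}\{([^{}]*)\}")

# Hypothetical mapping from LaTeX reference types to typst term keys.
KEY_MAP = {"figure": "figure", "table": "table", "section": "heading"}


def extract_terms(tex: str) -> dict[str, str]:
    """Collect singular terms for the reference types listed in KEY_MAP."""
    terms = {}
    for kind, singular, _plural in CREFNAME.findall(tex):
        if kind in KEY_MAP:
            terms[KEY_MAP[kind]] = singular
    return terms


def to_translation_lines(terms: dict[str, str]) -> str:
    """Render terms as `key = value` lines, sorted for stable diffs."""
    return "\n".join(f"{key} = {value}" for key, value in sorted(terms.items()))


# Hypothetical input snippet in the style of a LaTeX preamble.
sample = r"""
\crefname{figure}{Abbildung}{Abbildungen}
\crefname{table}{Tabelle}{Tabellen}
\crefname{equation}{Gleichung}{Gleichungen}
\crefname{section}{Abschnitt}{Abschnitte}
"""

print(to_translation_lines(extract_terms(sample)))
```

Types without an entry in `KEY_MAP` (here `equation`) are skipped, and the plural form is parsed but discarded, which matches the observation that the sources contain more terms than typst currently uses.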

Notes

  • Not all translations in babel/cleveref are complete. I have also imported incomplete translations, provided at least one term is translated.
  • Capitalization of terms frequently differs between babel/cleveref and typst (page vs Page), as well as between locales. As stated above, existing terms are kept as-is, including capitalization. New terms are imported as-is, with whatever capitalization is used by babel/cleveref for the respective locale.
  • For some locales, cleveref provides abbreviations (Fig. vs. Figure), but I chose to import the form babel provides. I noticed some inconsistent use of long form vs. abbreviation in babel as well as in typst translations. My feeling is that this is more a question of editorial style, but I don't know whether some languages have a strong preference for either. Only a native speaker can answer that, so I just kept them as-is.
  • Both babel and cleveref are licensed under the LaTeX Project Public License. I believe there is no problem, as this is not a derived work and "[The LPPL] license places no restrictions on works that are unrelated to the Work". EDIT: A notice was added to the notice file. However, I am not qualified to make a legal statement about the use of the extracted terms under typst's Apache-2.0 license.
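
Two of the import rules in these notes (prefer babel's form where both sources have a term, and keep a locale as soon as it has at least one translated term) can be sketched as below. The function name `combine_sources`, the locale codes `xx` and `yy`, and all sample values are hypothetical. The actual script may implement this differently.

```python
def combine_sources(
    babel: dict[str, dict[str, str]],
    cleveref: dict[str, dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Combine per-locale terms, preferring babel's form where both exist."""
    combined = {}
    for locale in sorted(set(babel) | set(cleveref)):
        # Later dicts win in the merge, so babel goes last.
        terms = {**cleveref.get(locale, {}), **babel.get(locale, {})}
        # Drop empty values; keep the locale if at least one term remains.
        terms = {key: value for key, value in terms.items() if value}
        if terms:
            combined[locale] = terms
    return combined


# Hypothetical sample data; "xx" and "yy" are placeholder locale codes.
babel = {"de": {"figure": "Abbildung", "page": "Seite"}, "xx": {"figure": ""}}
cleveref = {"de": {"figure": "Abb.", "equation": "Gleichung"}, "yy": {"table": "T"}}

combined = combine_sources(babel, cleveref)
print(combined)
```

Here `de` keeps babel's long form for `figure`, `yy` is kept although it has only one term, and `xx` is dropped because none of its terms is translated.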

Future

The following follow-up features might be useful, but are out of scope for this PR:

  • Add singular/plural terms and natively combine multiple adjacent references in typst using the plural forms, similar to how multiple adjacent citations are combined
  • Provide a function to get translated terms in the different forms (long/abbreviation, singular/plural, capitalized/in-text) for use by templates or packages, such as https://typst.app/universe/package/smartaref
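
To make the first follow-up idea concrete, here is a minimal sketch of combining adjacent references under a plural supplement. It is written in Python purely as an illustration and does not reflect any existing typst API. The `TERMS` table, the function names, and the hard-coded English "and" are assumptions, and a real implementation would need to localize the conjunction as well.

```python
# Hypothetical term table: (singular, plural) per element kind.
TERMS = {"figure": ("Figure", "Figures"), "table": ("Table", "Tables")}


def join_numbers(numbers: list[str]) -> str:
    """Join reference numbers as '1', '1 and 2', or '1, 2 and 3'."""
    if len(numbers) <= 1:
        return "".join(numbers)
    return ", ".join(numbers[:-1]) + " and " + numbers[-1]


def combine_refs(refs: list[tuple[str, str]]) -> list[str]:
    """Group adjacent references of the same kind under one supplement."""
    groups: list[tuple[str, list[str]]] = []
    for kind, number in refs:
        if groups and groups[-1][0] == kind:
            groups[-1][1].append(number)
        else:
            groups.append((kind, [number]))
    out = []
    for kind, numbers in groups:
        singular, plural = TERMS[kind]
        supplement = singular if len(numbers) == 1 else plural
        out.append(f"{supplement} {join_numbers(numbers)}")
    return out


print(combine_refs([("figure", "1"), ("figure", "2"), ("table", "3")]))
```

Only adjacent references of the same kind are merged, so a figure reference that follows a table reference starts a new group.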

@eltos eltos force-pushed the eltos-translations branch from 036815e to 8354d2c September 2, 2025 17:49
@eltos eltos marked this pull request as ready for review September 2, 2025 17:58
@laurmaedje
Member

I believe there is no problem, as this is not a derived work

Is it not? But regardless, it should be good if we just add an entry to the NOTICE file. The hyphenation patterns Typst uses are also LPPL, and LPPL and Apache-2 are compatible.

@eltos
Contributor Author

eltos commented Sep 5, 2025

add an entry to the NOTICE file

Done

@mewmew
Contributor

mewmew commented Sep 23, 2025

Thanks for working on this issue @eltos! Really happy to see the machine-generated, consistent handling of translations <3

@laurmaedje
Member

laurmaedje commented Oct 3, 2025

I'd like to land this in 0.14, but #6619 introduced a few new terms and resulted in a few conflicts. If you have time to take a look, that would be appreciated! Otherwise, I'd do it myself.

I also plan to do rebase & merge (or fast-forward if possible) to keep the individual commits as they are very useful for posterity, so a force push would be ideal.

@eltos eltos force-pushed the eltos-translations branch from a9a35a5 to 9c340be October 3, 2025 11:57
@eltos
Contributor Author

eltos commented Oct 3, 2025

I have rebased onto main and added "footnote" translations while keeping the commit structure.
The terms "email" and "telephone" are unfortunately neither available in babel nor cleveref.

EDIT: NOTICE file edit squashed with import commit, PR description updated.

@eltos eltos force-pushed the eltos-translations branch from 9c340be to 4c27710 October 3, 2025 12:10
@laurmaedje laurmaedje merged commit 4c27710 into typst:main Oct 3, 2025
8 checks passed
@laurmaedje
Member

Thank you. This is great work!

@Andrew15-5
Contributor

Were all ISO 639-1/2/3 language codes included in the Lang impl? Does that add anything useful if most of them are unused (more than 200, IIUC)?

@Fevol Fevol mentioned this pull request Oct 5, 2025
@eltos eltos mentioned this pull request Oct 9, 2025
Development

Successfully merging this pull request may close these issues.

Consistent handling of i18n translations for Figure, Section, etc

4 participants