Skip to content

fix: Gettext formatter merges duplicate plural entries#2347

Merged
andrii-bodnar merged 22 commits intolingui:mainfrom
garikkh:gettext-plural-dups
Dec 10, 2025
Merged

fix: Gettext formatter merges duplicate plural entries#2347
andrii-bodnar merged 22 commits intolingui:mainfrom
garikkh:gettext-plural-dups

Conversation

@garikkh
Copy link
Contributor

@garikkh garikkh commented Oct 14, 2025

Description

Fixes #2346

When using po-gettext formatter, plural calls with identical strings but different variables:

<div>
  {plural(count, {
    one: "one book",
    other: "many books",
  })}
</div>
<div>
  {plural(anotherCount, {
    one: "one book",
    other: "many books",
  })}
</div>

generated duplicate msgid entries in PO files, which violates PO format rules.

#. js-lingui:icu=%7BanotherCount%2C+plural%2C+one+%7Bone+book%7D+other+%7Bmany+books%7D%7D&pluralize_on=anotherCount
#: src/App.tsx:68
msgid "one book"
msgid_plural "many books"
msgstr[0] "one book"
msgstr[1] "many books"

#. js-lingui:icu=%7Bcount%2C+plural%2C+one+%7Bone+book%7D+other+%7Bmany+books%7D%7D&pluralize_on=count
#: src/App.tsx:61
msgid "one book"
msgid_plural "many books"
msgstr[0] "one book"
msgstr[1] "many books"

Added pre-processing step that merges duplicate plural catalog entries before serialization.

  1. Groups catalog entries by their plural case content (ignoring variable names)
  2. Merges entries with identical plural strings into a single catalog entry
  3. Combines all source location references from merged entries
  4. Prevents duplicate msgid/msgid_plural entries in generated PO files

With this change the PO now looks like:

#. js-lingui:icu={$var, plural, one {one book} other {many books}}&pluralize_on=count,anotherCount
msgid "one book"
msgid_plural "many books"
msgstr[0] "one book"
msgstr[1] "many books"

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Examples update

Fixes #2346

Checklist

  • I have read the CONTRIBUTING and CODE_OF_CONDUCT docs
  • I have added tests that prove my fix is effective or that my feature works
  • I have added the necessary documentation (if appropriate) No relevant docs

@vercel
Copy link

vercel bot commented Oct 14, 2025

@garikkh is attempting to deploy a commit to the Crowdin Team on Vercel.

A member of the Team first needs to authorize it.

@garikkh garikkh changed the title bugfix: Gettext formatter merges duplicate plural entries fix: Gettext formatter merges duplicate plural entries Oct 14, 2025
@garikkh garikkh marked this pull request as draft October 14, 2025 23:58
@codecov
Copy link

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 90.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.70%. Comparing base (6bb8983) to head (7439a28).
⚠️ Report is 241 commits behind head on main.

Files with missing lines Patch % Lines
packages/format-po-gettext/src/po-gettext.ts 90.00% 4 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2347      +/-   ##
==========================================
- Coverage   77.05%   76.70%   -0.35%     
==========================================
  Files          84      100      +16     
  Lines        2157     2739     +582     
  Branches      555      713     +158     
==========================================
+ Hits         1662     2101     +439     
- Misses        382      509     +127     
- Partials      113      129      +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@garikkh
Copy link
Contributor Author

garikkh commented Oct 15, 2025

Ah, didn't work properly, need to continue working on it

@garikkh garikkh marked this pull request as ready for review October 15, 2025 23:22
@timofei-iatsenko
Copy link
Collaborator

timofei-iatsenko commented Oct 16, 2025

I think implementation is suboptimal, instead of introducing another type of comment, simply print the same comments many times

#. js-lingui:icu=%7BanotherCount%2C+plural%2C+one+%7Bone+book%7D+other+%7Bmany+books%7D%7D&pluralize_on=anotherCount
#. js-lingui:icu=%7Bcount%2C+plural%2C+one+%7Bone+book%7D+other+%7Bmany+books%7D%7D&pluralize_on=count
#: src/App.tsx:61
msgid "one book"
msgid_plural "many books"
msgstr[0] "one book"
msgstr[1] "many books"

When deserializing just take to consideration that there could be multiple comments, flat map 1 message + 1 comment and use existing methods to process that.

@garikkh
Copy link
Contributor Author

garikkh commented Oct 16, 2025

I considered that approach, but it seemed messier to me when you have many duplicates. In our codebase, we've only been using lingui a few months, but already have cases with ~7 duplicates. So now you'd have a PO entry with 7 ICU comments instead of 1 really long comment.

@garikkh
Copy link
Contributor Author

garikkh commented Oct 16, 2025

instead of introducing another type of comment

Also wanted to note, this isn't introducing another comment - it's just using the icu comment. the ICU comment is a URLSearchParams, so all I did is add another key here.

  const contextComment = item.extractedComments
    .find((comment) => comment.startsWith(ctxPrefix))
    ?.substring(ctxPrefix.length)
  const ctx = new URLSearchParams(contextComment)

This solution is a little more complex, but IMO is more elegant/consistent

@garikkh
Copy link
Contributor Author

garikkh commented Nov 10, 2025

bump @timofei-iatsenko

@timofei-iatsenko
Copy link
Collaborator

@garikkh i pushed a commit to your branch with a refactoring and some changes:

  • instead of extra other_pluralize_vars param it uses existing pluralize_on just listing all variables in it.
  • it doesn't use a first UCI message in a merged message, instead it replaces original value to some dummy var placeholder. There is no sense semntically to use a first occurance, it's confusing, so instead it make it generic.
  • i refactored code, removed duplication, added a typesafety for ctx object.
  • one extra change - i made an context comment more readable like so: js-lingui:icu={var, plural, one {one book} other {many books}}&pluralize_on=count,anotherCount,thirdCount,manyCount,superLongVariableNameIsOkayCount

I will update PR description and docs, so it could be used as release notes.

@timofei-iatsenko
Copy link
Collaborator

timofei-iatsenko commented Dec 10, 2025

Some TMS might not allow duplicate msgid, even with different comments, and should use mergePlurals option to merge these entries into a single PO entry during extraction.

From how I understand PO spec, the msgid should be uniq, so i'm not sure there are TMSes which would process duplicated msgid without errors. So i believe this mergePlurals should be enabled by default because disabling it leads to broken/invalid po file.

@vercel
Copy link

vercel bot commented Dec 10, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Updated (UTC)
js-lingui Ready Ready Preview Dec 10, 2025 3:40pm

@andrii-bodnar andrii-bodnar merged commit d72b23f into lingui:main Dec 10, 2025
13 checks passed
@garikkh
Copy link
Contributor Author

garikkh commented Dec 15, 2025

Thank you for the changes + merge!

Unfortunately the last change you mentioned breaks our TMS again

Same issue as in #2004 - if a comment gets too long, our TMS breaks it up into multiple lines, and multiple lines don't get parsed correctly in lingui

With this change, we have an entry:

#. jsi18nicu={hours, plural, one {{hours} hr} other {{hours} hrs}}&pluralize_on=hours
msgid "{hours} hr"
msgid_plural "{hours} hrs"
msgstr[0] "{hours} hr"
msgstr[1] "{hours} hrs"

That our TMS changes to:

#. jsi18nicu={hours, plural, one {{hours} hr} other {{hours}
#. hrs}}&pluralize_on=hours
msgid "{hours} hr"
msgid_plural "{hours} hrs"
msgstr[0] ""
msgstr[1] ""
msgstr[2] ""

And cannot be parsed when doing lingui compile step.

Looking for feedback here, either we revert the changes for having ICU readable, or change the compiling/PO parser to combine the comments. FWIW I attempted the latter, and its fairly involved.

@timofei-iatsenko
Copy link
Collaborator

@garikkh kchh 💩 what if we will add a way to override serialize/deserialize function for this context comment? So you would be able to override and do whatever you want with it.

@garikkh
Copy link
Contributor Author

garikkh commented Dec 17, 2025

@timofei-iatsenko how do you see that working?

Are you talking about an option to override these two functions: https://github.com/lingui/js-lingui/pull/2347/files#diff-b1e7f700a5d1ca961f671624ce8d6d9c4bbee951afafa93523db9393d0afdfd9R319-R365

Would you want to take it one step further and handle ctxPrefix in this step too? And remove the "customICUPrefix" config option?

@timofei-iatsenko
Copy link
Collaborator

@garikkh yes i'm speaking about these two functions.

Would you want to take it one step further and handle ctxPrefix in this step too?

I also have been thinking about this, but it seems it would be hard to find this comment from all others comment, and if we will change that - this will be a breaking change. But the idea is OK, i will investigate is it possible technically or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate PO entries when using gettext and plurals

3 participants