Skip to content

build: convert CLDR locale extraction from Gulp to Bazel tool#42230

Closed
devversion wants to merge 8 commits intoangular:masterfrom
devversion:build/cldr-bazel-2
Closed

build: convert CLDR locale extraction from Gulp to Bazel tool#42230
devversion wants to merge 8 commits intoangular:masterfrom
devversion:build/cldr-bazel-2

Conversation

@devversion
Copy link
Copy Markdown
Member

@devversion devversion commented May 21, 2021

See individual commits.

This PR initially contained the update to CLDR 39 too, but I've decided to switch back to CLDR 37 so that
we can land this (without having to wait for a new major). I've also compared old generated files with the new
generated files to make sure nothing is missing/changed in terms of the locale data.

@google-cla google-cla bot added the cla: yes label May 21, 2021
@zarend zarend added the area: build & ci Related the build and CI infrastructure of the project label May 21, 2021
@ngbot ngbot bot added this to the Backlog milestone May 21, 2021
@devversion devversion force-pushed the build/cldr-bazel-2 branch 7 times, most recently from ae7cff0 to 56fcaa6 Compare May 23, 2021 12:08
@devversion devversion changed the title [WIP] build: convert CLDR locale extraction from Gulp to Bazel tool build: convert CLDR locale extraction from Gulp to Bazel tool May 23, 2021
@devversion devversion requested a review from josephperrott May 23, 2021 12:41
@devversion devversion marked this pull request as ready for review May 23, 2021 12:41
@pullapprove pullapprove bot requested a review from AndrewKushnir May 23, 2021 12:41
@petebacondarwin petebacondarwin added action: review The PR is still awaiting reviews from at least one requested reviewer target: patch This PR is targeted for the next patch release labels May 23, 2021
Copy link
Copy Markdown
Contributor

@petebacondarwin petebacondarwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work @devversion! I love the way you can skip from TS to Skylark and back so fluently. Here is my initial feedback. No serious blockers that I can see. Mainly just a bunch of NITs and suggestions.

@devversion
Copy link
Copy Markdown
Member Author

@petebacondarwin Thanks for reviewing so thoroughly! I've addressed the feedback. While doing the requested cleanup of a few things within the old extract.js script, I also went ahead and simplified the base currency generation a bit.

Previously, we generated two objects of base currencies. Once for the currencies.ts file in common/src/i18n, and one just for comparisons. I've changed it so that we only have to generate one type of BaseCurrencies object. The old extract.js script generated the two different objects so that it could do comparisons using toString() (the locale currencies do not contain the fraction digits, while the base currencies does; -> this is the reason for the addDigits flag in generateBaseCurrencies)

Copy link
Copy Markdown
Contributor

@petebacondarwin petebacondarwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I really like the changes you made to the binaries.

Copy link
Copy Markdown
Contributor

@AndrewKushnir AndrewKushnir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devversion looks awesome! 👍

Thanks for documenting the logic, this is really helpful. While you have all the context, I was thinking that it might also be beneficial to add a README.md file (into the packages/common/locales/generate-locales-tool folder) to describe some aspects of the process and optmiizations from high-level perspective (while keeping the details in the code as well).

@devversion devversion force-pushed the build/cldr-bazel-2 branch from b2dcca8 to 6316245 Compare May 25, 2021 19:51
@devversion
Copy link
Copy Markdown
Member Author

@petebacondarwin Thanks! Addressed the feedback.
@AndrewKushnir I've added a little README. It's far from being very detailed, but might make some things more clear (and can serve as a foundation). Please have another look!

@devversion devversion added target: minor This PR is targeted for the next minor release and removed target: patch This PR is targeted for the next patch release labels May 25, 2021
@AndrewKushnir
Copy link
Copy Markdown
Contributor

@devversion great, thanks for adding the readme file 👍

alxhub pushed a commit that referenced this pull request Jun 14, 2021
Given that the locale files are now generated through
Bazel, the files are no longer checked-in and the
legacy TSC compilation fails due to imports resolving
to non-existent files. We fix this for the legacy
saucelabs job by copying the generated TS files into
the sources (which is acceptable for the isolated CI job)

PR Close #42230
alxhub pushed a commit that referenced this pull request Jun 14, 2021
The CLDR extraction tool has been reworked to run as part of Bazel.
This adds a initial readme explaining what the tool generates. It's
far from a detailed description but it can serve as foundation for more
detailed explanations.

PR Close #42230
alxhub pushed a commit that referenced this pull request Jun 14, 2021
In the past, the closure file has been generated so that all individual
locale files were imported individually. This resulted in a huge
slow-down in g3 due to the large amount of imports.

With 90bd984 this changed so that we
inline the locale data for the g3 closure locale file. Also the file
only contained data for locales being supported by Closure. For this a
list of locales has been extracted from Closure Compiler, as well as a
list of locale aliases.

This logic is prone to CLDR version updates, and also broke as part of
the Gulp -> Bazel migration where this logic has been slightly modified
but caused issues in G3. e.g. a locale `zh-Hant` was requested in g3,
but the locale data had the name of the alias locale that provided the
data at index zero (which represents the locale name). Note that the
locale names at index zero always could differentiate from the requested
`goog.LOCALE` due to the aliasing logic. This just didn't come up before.

We simplify this logic by generating a `goog.LOCALE` case for all
locales CLDR provides data for. We don't need to bother about aliasing
because with the refactorings to the CLDR generation tool, all locales
are built (which also captures the aliases), and we can generate the locale
file on the fly (which has not been done before).

PR Close #42230
alxhub pushed a commit that referenced this pull request Jun 14, 2021
Within Google, closure compiler is used for dealing with translations.
We generate a closure-compatible locale file that allows for
registration within Angular, so that Closure i18n works well together
with Angular applications. Closure compiler does not limit its
locales to BCP47-canonical locale identifiers. This commit updates
the generation logic so that we also support deprecated (but aliased)
locale identifiers, or other aliases which are likely used within
Closure. We use CLDR's alias supplemental data for this. It instructs
us to alias `iw` to `he` for example. `iw` is still supported in Closure.

Note that we do not manually extract all locales supported in Closure;
instead we only support the CLDR canonical locales (as done before) +
common aliases that CLDR provides data for. We are not aware of other
locale aliases within Closure that wouldn't be part of the CLDR aliases.
If there would be, then Angular/Closure would fail accordingly.

PR Close #42230
alxhub added a commit to alxhub/angular that referenced this pull request Jun 16, 2021
alxhub added a commit to alxhub/angular that referenced this pull request Jun 16, 2021
alxhub added a commit to alxhub/angular that referenced this pull request Jun 16, 2021
alxhub added a commit to alxhub/angular that referenced this pull request Jun 16, 2021
alxhub added a commit to alxhub/angular that referenced this pull request Jun 16, 2021
alxhub added a commit to alxhub/angular that referenced this pull request Jun 16, 2021
alxhub added a commit to alxhub/angular that referenced this pull request Jun 16, 2021
alxhub added a commit that referenced this pull request Jun 16, 2021
alxhub added a commit that referenced this pull request Jun 16, 2021
alxhub added a commit that referenced this pull request Jun 16, 2021
alxhub added a commit that referenced this pull request Jun 16, 2021
@devversion
Copy link
Copy Markdown
Member Author

devversion commented Jun 18, 2021

It seems like I was too optimistic about no longer hard-coding the locales and aliases used by Closure Library for i18n. I've updated the closure locale file generation so that the files matches with the file from the Gulp generation.

I've also re-compared the public locale data, plus this time the closure locale file. Looks like this time we should be really safe (also in terms of g3 now). devversion/cldr-build-comparison@f948034

@jessicajaniuk
Copy link
Copy Markdown
Contributor

@devversion looks like this needs a rebase and a finished presubmit.

@devversion
Copy link
Copy Markdown
Member Author

@jessicajaniuk thanks for the heads up. Rebased. yeah, I'm waiting for this change to be ready in g3 (we want to manually test the change in an app that previously failed due to this change)

@AndrewKushnir
Copy link
Copy Markdown
Contributor

@devversion it looks like there is a merge conflict with the main branch (after merging other changes earlier). Could you please resolve the conflict when you get a chance?

I've also added the "blocked" label for now since we'd need to perform extra testing in Google's codebase before landing the change (to avoid rollbacks).

@AndrewKushnir
Copy link
Copy Markdown
Contributor

FYI, removing from the merge queue for now (by removing the "merge" label), since some extra checks should be performed in g3 before we can land this change.

This is a pre-refactor commit allowing us to move
the CLDR locale generation to Bazel where files would
no longer be checked-in, except for the `closure-locale`
file that is synced into Google3.
Converts the CLDR locale extraction script to a Bazel tool.
This allows us to generate locale files within Bazel, so that
locales don't need to live as sources within the repo. Also
it allows us to get rid of the legacy Gulp tooling.

The migration of the Gulp script to a Bazel tool involved the
following things:

  1. Basic conversion of the `extract.js` script to TypeScript.
     This mostly was about adding explicit types. e.g. adding `locale:
     string` or `localeData: CldrStatic`.

  2. Split-up into separate files. Instead of keeping the large
     `extract.js` file, the tool has been split into separate files.
     The logic remains the same, just that code is more readable and
     maintainable.

  3. Introduction of a new `index.ts` file that is the entry-point
     for the Bazel tool. Previously the Gulp tool just generated
     all locale files, the default locale and base currency files
     at once. The new entry-point accepts a mode to be passed as
     first process argument. based on that argument, either locales
     are generated into a specified directory, or the default locale,
     base currencies or closure file is generated.

     This allows us to generate files with a Bazel genrule where
     we simply run the tool and specify the outputs. Note: It's
     necessary to have multiple modes because files live in separate
     locations. e.g. the default locale in `@angular/core`, but the
     rest in `@angular/common`.

  4. Removal of the `cldr-data-downloader` and custom CLDR resolution
     logic. Within Bazel we cannot run a downloader using network.

     We switch this to something more Bazel idiomatic with better
     caching. For this a new repository rule is introduced that
     downloads the CLDR JSON repository and extracts it. Within
     that rule we determine the supported locales so that they
     can be used to pre-declare outputs (for the locales) within
     Bazel analysis phase. This allows us to add the generated locale
     files to a `ts_library` (which we want to have for better testing,
     and consistent JS transpilation).

     Note that the removal of `cldr-data-downloader` also requires us to
     add logic for detecting locales without data. The CLDR data
     downloader overwrote the `availableLocales.json` file with a file
     that only lists locales that CLDR provides data for. We use the
     official `availableLocales` file CLDR provides, but filter out
     locales for which no data is available. This is needed until we
     update to CLDR 39 where data is available for all such locales
     listed in `availableLocales.json`.
Introduces a few Starlark macros for running the new Bazel
CLDR generation tool. Wires up the new tool so that locales
are generated properly. Also updates the existing
`closure-locale` file to match the new output generated by the Bazel tool.

This commit also re-adds a few locale files that aren't
generated by CLDR 37, but have been accidentally left in
the repository as the Gulp script never removed old locales
from previous CLDR versions. This problem is solved with the
Bazel generation of locale files, but for now we re-add these
old CLDR 33 locale files to not break developers relying on these
(even though the locale data indicies are incorrect; but there might
be users accessing the data directly)
Given that the locale files are now generated through
Bazel, the files are no longer checked-in and the
legacy TSC compilation fails due to imports resolving
to non-existent files. We fix this for the legacy
saucelabs job by copying the generated TS files into
the sources (which is acceptable for the isolated CI job)
The CLDR extraction tool has been reworked to run as part of Bazel.
This adds a initial readme explaining what the tool generates. It's
far from a detailed description but it can serve as foundation for more
detailed explanations.
In the past, the closure file has been generated so that all individual
locale files were imported individually. This resulted in a huge
slow-down in g3 due to the large amount of imports.

With 90bd984 this changed so that we
inline the locale data for the g3 closure locale file. Also the file
only contained data for locales being supported by Closure. For this a
list of locales has been extracted from Closure Compiler, as well as a
list of locale aliases.

This logic is prone to CLDR version updates, and also broke as part of
the Gulp -> Bazel migration where this logic has been slightly modified
but caused issues in G3. e.g. a locale `zh-Hant` was requested in g3,
but the locale data had the name of the alias locale that provided the
data at index zero (which represents the locale name). Note that the
locale names at index zero always could differentiate from the requested
`goog.LOCALE` due to the aliasing logic. This just didn't come up before.

We simplify this logic by generating a `goog.LOCALE` case for all
locales CLDR provides data for. We don't need to bother about aliasing
because with the refactorings to the CLDR generation tool, all locales
are built (which also captures the aliases), and we can generate the locale
file on the fly (which has not been done before).
Within Google, closure compiler is used for dealing with translations.
We generate a closure-compatible locale file that allows for
registration within Angular, so that Closure i18n works well together
with Angular applications. Closure compiler does not limit its
locales to BCP47-canonical locale identifiers. This commit updates
the generation logic so that we also support deprecated (but aliased)
locale identifiers, or other aliases which are likely used within
Closure. We use CLDR's alias supplemental data for this. It instructs
us to alias `iw` to `he` for example. `iw` is still supported in Closure.

Note that we do not manually extract all locales supported in Closure;
instead we only support the CLDR canonical locales (as done before) +
common aliases that CLDR provides data for. We are not aware of other
locale aliases within Closure that wouldn't be part of the CLDR aliases.
If there would be, then Angular/Closure would fail accordingly.
With the refactoring from a Gulp task to a Bazel too, we tried switching
away from the hard-coded list of locales and aliases for the Closure
Locale file generation. After multiple attempts of landing this, it
turned out that Closure Compiler/Closure Library relies on locale
identifiers CLDR does not capture within it's `availableLocales.json`
or `aliases.json` data.

Closure Library does not use any unknown locale identifiers here. The
locale identifiers can be resolved within CLDR using the bundle lookup
algorithm that is specified as part of CLDR; instead the problem is that
the locale identifiers do not follow any reasonable pattern and
therefore it's extremely difficult to generate them automatically (it's
almost like we'd need to build up _all_ possible combinations). Instead
of doing that, we just use the hard-coded locales and aliases from the
old Closure Locale generation script.
@angular-automatic-lock-bot
Copy link
Copy Markdown

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

action: merge The PR is ready for merge by the caretaker area: build & ci Related the build and CI infrastructure of the project cla: yes risk: high target: minor This PR is targeted for the next minor release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants