Skip to content

feat: support locale-specific customFormatters#403

Merged
eemeli merged 20 commits intomessageformat:mainfrom
cdaringe:feat/add-locale-specific-custom-formatters
Aug 3, 2023
Merged

feat: support locale-specific customFormatters#403
eemeli merged 20 commits intomessageformat:mainfrom
cdaringe:feat/add-locale-specific-custom-formatters

Conversation

@cdaringe
Copy link
Contributor

@cdaringe cdaringe commented Jun 13, 2023

Problem Statement

Given a locale & a message with a custom formatter requirement, I want my compiled output to use a formatter tailored to that locale, but cannot due to limitations in the current API.

Closes #402

Motivating Context

Presume that some custom formatter is expensive to load for all supported locales.

In the current API, my formatter must do something like this:

formatter: (value, locale) => 
  isUs(locale)
  ? formatUs(value)
  : isMx(locale)
  ? formatMx(value)
//  ...SNIP N locales ...
  : isFinland(value)
  ? formatFn(value)
  : value
// compiled-output.js
import { fmt } from "formatters/formatter-that-knows-about-all-locales-under-the-sun"
export const msg = d => fmt(d.foo)

Presume that each formatting function is expensive, say 1kb. Now my app must install (N-1) kb of wasted formatters on each application load.

Solution

Support locale-aware custom formatters.

// compiled-output.js
import { fmt as fmt_enUS } from "formatters/en-US"
export const msg = d => fmt_enUS(d.foo)
// compiled-output.js
import { fmt as fmt_esMX } from "formatters/en-MX"
export const msg = d => fmt_enMX(d.foo)

In the new state, customFormatters do not need to bundle all locale concerns into a single module.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jun 13, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

"Alex Sexton <alexsexton@gmail.com>",
"Eemeli Aro <eemeli@gmail.com>"
"Eemeli Aro <eemeli@gmail.com>",
"cdaringe<cdaringe@gmail.com>"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that bad form 😄 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really mind this, but please do at least match the format of the other entries.

@cdaringe cdaringe force-pushed the feat/add-locale-specific-custom-formatters branch from 0ff135e to 2d2aa7c Compare June 13, 2023 22:56
@cdaringe cdaringe force-pushed the feat/add-locale-specific-custom-formatters branch from 2d2aa7c to 5e6ca52 Compare June 13, 2023 23:02
setFormatter(key: string) {
if (this.runtimeIncludes(key, 'formatter')) return;
let cf = this.options.customFormatters[key];
setFormatter(runtimeFormatterKey: string, originalFormatterKey?: string) {
Copy link
Contributor Author

@cdaringe cdaringe Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. setFormatter("phone__en", "phone")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These arguments would be clearer for me with reversed order, i.e. "the key of the thing we're importing", and optionally "the alias we use for it internally". I'm also not a fan of really long names for local variables, esp. when combined with our 80 char width style.

/**
* @warn Provide the formatter implementation, but do not mutate the
* underlying function reference.
*/
Copy link
Contributor Author

@cdaringe cdaringe Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object.assign in the prior version was continuously writing to the formatting function. A function is an object, so of course this works. This behavior created some friction for me. In the prior implementation, the base formatter function-object was getting mutated via Object.assign on each call to setFormatter. In this new implementation, each formatting function/object gets its own protected memory. In this regard, the new locale specific formatters don't squash each others backing references within Object.assign. In other words, I've removed the mutability.

.bind() creates a new function... but doesn't bring the prototype. so, i have to splat in the prototype to make a copy. further, the runtime serializer downstream serializes the function reference to javascript with just a plain String(fn) call. if i do a .bind(), then the String(fn) turns into function() { [native code] }. That's why there is a custom toString() impl in there too.

TLDR, Object.assign was continuously crushing the formatter impl, but i needed each locale function protected/isolated, otherwise, the Object.assign would overwrite it.

This all works great, as demonstrated by the test suite :)

@cdaringe cdaringe changed the title feat: init locale-specific formatting modules feat: support locale-specific customFormatters Jun 13, 2023
formatter: LocaleModuleCustomFormatter;
arg?: 'string' | 'raw' | 'options';
id?: string;
module?: (locale: string) => string;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of these keys could be deduped, but they were small enough i just dupe'd em (e.g. arg, id)

Copy link
Member

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a quick glance, but I'll need to find time for a proper review. I'm a bit concerned about the expansion of state during the compilation; the plural arg is a bit badly named, but I would've thought that it was already tracking the current locale?

@cdaringe
Copy link
Contributor Author

cdaringe commented Jun 14, 2023

I'm a bit concerned about the expansion of state during the compilation

justifiably so! dig in at your convenience, let me know your thoughts

the plural arg is a bit badly named, but I would've thought that it was already tracking the current locale?

the plural/plurals doesn't track locale. it has a locale field, but it's not honest. it's really just the lang, disguised as the locale.

for recollection:

plurals = {
  // Record<LANG_STRING, PluralData>
  en: { cardinals: ..., lc: 'en', locale: 'en' }
}

Logically, this makes sense. A plural data-structure shouldn't know about a locale--plural rules are lang driven and not locale associative. If anything, a locale data-structure could have a reference to a plural ruleset, but not vice-versa.

Thus, i couldn't use the lc from plural. I need the actual/full locale code, e.g. en-US, not just the lang.

@cdaringe
Copy link
Contributor Author

cdaringe commented Jun 14, 2023

I've now also updated & fixed the docs.

@cdaringe
Copy link
Contributor Author

Hey @eemeli, i know i'm blowing up your mentions this week 😄 . I'll go silent in all other forums if it helps! Would you be willing to give me some 👀 on this anytime soon?

@cdaringe cdaringe requested a review from eemeli June 26, 2023 21:02
@eemeli
Copy link
Member

eemeli commented Jun 30, 2023

Hi, sorry for the delay here -- been on vacation (and still am...) & trying to stay afk. Back for a couple of days next week.

Reviewing this PR and how it fits in with the rest, I'm starting to get the sense that there's some premature caching happening with the '*' locale. This ends up being related to #361 as well. The current locale should be available within the plural object. See for example the usage here:

args.push(JSON.stringify(this.plural.locale));

Not sure if/when I'll have time for this myself, but for '*' the plurals array should probably be initialised not with all locales, but with the default locale and some marker indicating the wildcard. Then around here

const pl = (plurals && lc && plurals[lc]) || plural;

the lookup should set the locale in the plural object according to the key.

@cdaringe
Copy link
Contributor Author

cdaringe commented Jul 1, 2023

No worries. Please, ignore this, and enjoy your vacation :)

The current locale should be available within the plural object

Indeed! However, locale is not in actuality available. As I mention in my comment above, locale is not actually locale at all. It's pretty much just lang, which is too lossy.

@eemeli
Copy link
Member

eemeli commented Jul 2, 2023

plural.locale is only the primary subtag if you use * as the constructor locale; if you provide an array of locale identifiers there, they are used exactly.

This should probably be documented better.

@cdaringe
Copy link
Contributor Author

cdaringe commented Jul 12, 2023

Hm. Interesting. Shucks, I feel like that there may be a modeling issue in the existing plurals interface? I would ask "is a subtag a locale?". I'd suggest it is not. Either way, it's probably worth me learning OR us collaborating on what the correct nomenclature is for lang + region unambiguously. Perhaps PluralObject warrants some refinement?

array should probably be initialised not with all locales, but with the default locale ... the lookup should set the locale in the plural object according to the key

i'm ...kinda following, but not 100%. i'm not confident enough to send a patch on this yet. i think you're talking about in the Compiler::compile method, maybe creating a new PluralObject on the fly with narrowed stuff.

If something were to happen with the above, would it be a mandatory precursor to this PR?

@eemeli
Copy link
Member

eemeli commented Jul 17, 2023

Hm. Interesting. Shucks, I feel like that there may be a modeling issue in the existing plurals interface? I would ask "is a subtag a locale?". I'd suggest it is not. Either way, it's probably worth me learning OR us collaborating on what the correct nomenclature is for lang + region unambiguously. Perhaps PluralObject warrants some refinement?

Not entirely sure what you're asking for here. To clarify, plural.locale is a BCP 47 language tag identifying a locale. Such a locale identifier requires at least a language tag, but may also include other subtags, such as a region identifier.

The plural objects are created during the MF constructor call, in two different ways:

  1. If the given list of locales is explicit, we get an MF instance supporting exactly those locales.
  2. If the user asks for all locales with '*', plural objects are created for all supported languages, i.e. locales with only the primary tag (and pt-PT, but that's not relevant here).

The disparity in expectations arises when the second path is taken, and a user uses localeCodeFromKey to identify a supported locale code. If that key is something that looks like a BCP 47 locale identifier (e.g. en-CA as a key mapping to the en locale code), they may expect for that key to have some relevance for custom formatters, but it doesn't. Only the resolved locale code matters.

I don't think there's really a sensible way to get around this, and I'm not sure that it should be fixed. If you're doing complicated things, you should probably not use '*' but instead explicitly declare your supported locales.

array should probably be initialised not with all locales, but with the default locale ... the lookup should set the locale in the plural object according to the key

i'm ...kinda following, but not 100%. i'm not confident enough to send a patch on this yet. i think you're talking about in the Compiler::compile method, maybe creating a new PluralObject on the fly with narrowed stuff.

If something were to happen with the above, would it be a mandatory precursor to this PR?

No, that's optimisation that should be doable entirely separately. The next step with this PR is to get rid of the locale normalisation, additional state around that, and just use plural.locale. Also, please drop the docs updates; that gets pushed to the site from the main branch, so needs to wait for a version release.

Copy link
Member

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment.

@cdaringe
Copy link
Contributor Author

😵 Got it! Thanks for setting me straight. I now understand much better and see your perspective.

I've adapted the PR to your request, and it has shrunk up a bit 😄 !

@cdaringe cdaringe requested a review from eemeli July 18, 2023 22:02
Copy link
Member

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed and partial review; too many things on my to-do list atm...

Only got part of the way through the compiler; haven't looked yet at what's happening with the Object.assign/.bind() stuff or anything further than that.

Comment on lines +31 to +34
/* istanbul ignore next reason: impossible case */
if (typeof fn.module !== 'string') {
throw new Error(`expected string module name`);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type narrowing only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed! 72396da

Comment on lines +176 to +179
const formattingModuleRequest =
formatter && 'module' in formatter ? formatter.module : null;
const isLocaleSpecificFormattingModule =
typeof formattingModuleRequest === 'function';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The temporary variable is more confusing than not.

Suggested change
const formattingModuleRequest =
formatter && 'module' in formatter ? formatter.module : null;
const isLocaleSpecificFormattingModule =
typeof formattingModuleRequest === 'function';
const isLocaleSpecificFormattingModule = typeof formatter?.module === 'function';

Also, "isLocaleSpecificFormattingModule" is an excessively long name for a local variable. With changes below, you can probably drop it rather than renaming it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +190 to +192
if (!isLocaleSpecificFormattingModule) {
args.push(JSON.stringify(this.plural.locale));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change needs to be reverted. We still want the locale as an argument because we should not assume that each locale will have an independent implementation. It should still be possible to e.g. say that all English (en, en-US, en-CA, en-UK, ...) use the same implementation, but possibly with some region-specific considerations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setFormatter(key: string) {
if (this.runtimeIncludes(key, 'formatter')) return;
let cf = this.options.customFormatters[key];
setFormatter(runtimeFormatterKey: string, originalFormatterKey?: string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These arguments would be clearer for me with reversed order, i.e. "the key of the thing we're importing", and optionally "the alias we use for it internally". I'm also not a fan of really long names for local variables, esp. when combined with our 80 char width style.

@cdaringe
Copy link
Contributor Author

cdaringe commented Jul 24, 2023

  1. locale is now always a param in formatting fn invocation, per discussion
  2. long var names compacted, per discussion

[setFormatter] These arguments would be clearer for me with reversed order

This method is used in many places, and reversing the params would be a breaking change requiring a bulk edit in this module. Give me another review pass now that it's been through a first round of requested edits and assert that you still want this.

@cdaringe cdaringe requested a review from eemeli July 24, 2023 16:49
Copy link
Member

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally got through to the end. In general I think the approach is good, but we should be able to make the changes here more surgical; see inline comments.

Comment on lines 176 to 179
const formattingModuleRequest =
formatter && 'module' in formatter ? formatter.module : null;
const isModuleFn = typeof formattingModuleRequest === 'function';
if (!this.options.customFormatters[token.key]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const formattingModuleRequest =
formatter && 'module' in formatter ? formatter.module : null;
const isModuleFn = typeof formattingModuleRequest === 'function';
if (!this.options.customFormatters[token.key]) {
const isModuleFn = typeof formatter?.module === 'function';
if (!formatter) {

Copy link
Contributor Author

@cdaringe cdaringe Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wish we could do that, but TS doesn't allow you access an indexed field unless it's narrowed to an indexable type. string isn't indexable by string, so TS barfs. TS demands that you narrow to a form where 'module' is guaranteed on an object form indexable with formatter, so... no dice. nonetheless, in the spirit of your comment, i condensed it down to a single expression. a681a58

(...args: any[]): unknown;
id?: string | null;
module?: string | null;
module?: string | ((_: { locale: string }) => string) | null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should never be a function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. fixed, 72396da


this.runtime[key] = Object.assign(
cf.formatter,
cloneFunction(cfo.formatter),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see now why mutating the formatter can be a bad idea: When it's not a built-in formatter that we're providing, it means we're modifying a value that a user has provided.

So using .bind() here is appropriate. See below for why you ought to inline the code for that, as you had it previously.

} else if (isFormatterKey(key)) {
this.runtime[key] = Object.assign(
Formatters[key],
cloneFunction(Formatters[key]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, here the formatter is an internal one, so we don't need to clone it and can mutate it without ill effect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's your call. i am biased to not mutating the underlying fn ever, out of what I perceive to be general purpose best practices, but that is entirely subjective :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just mutate it, and inline the cloneFunction() above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, 9ee2bf1

? {
id: identifier(cf.id),
module:
typeof cf.module === 'function' && this.plural.locale
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking this.plural.locale is incorrect here. If it may be null, then either it needs to be coerced to an empty string, or the function needs to be able to deal with a null input.

Fixing this will let you drop the compile-module.ts change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, fixed 72396da

Comment on lines +55 to +57
export type CustomFormatter =
| DefaultCustomFormatter
| LocaleModuleCustomFormatter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we're now always including the locale code, am I right that the changes to this signature should be rolled back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, otherwise I’m in no better position than when I started without the PR. the function form is precisely what lets me generate locale based module imports. The param is runtime, versus compile time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function form of module allows that. This is the signature of the function that'll be called during the formatting. After #403 (comment), it'll always include the locale argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. i misinterpreted. my bad. addressed in a681a58

formatter: LocaleModuleCustomFormatter;
arg?: 'string' | 'raw' | 'options';
id?: string;
module?: (_: { locale: string }) => string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this expect the locale to be wrapped within an object? Wouldn't it be simpler to use (locale: string) => string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it that way originally, but i changed it to the object form so it could be extended without being a breaking change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locale is good to keep as a separate argument. For example, that's the way it is in the MessageFormat constructor. Adding more arguments can be done later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, i can get on board with that. refactored in ee31644

@eemeli
Copy link
Member

eemeli commented Jul 25, 2023

[setFormatter] These arguments would be clearer for me with reversed order

This method is used in many places, and reversing the params would be a breaking change requiring a bulk edit in this module. Give me another review pass now that it's been through a first round of requested edits and assert that you still want this.

Let's return to this after the current round of changes. My main concern is that with separate terms for the local variable name and the import name, those terms ought to correspond to the names used for the JS import { foo as bar } ... statement, where this is considered as "aliasing".

@cdaringe cdaringe requested a review from eemeli July 31, 2023 20:53
@cdaringe
Copy link
Contributor Author

Alrighty, pass #3, complete! Back to you!

formatter &&
'module' in formatter &&
typeof formatter.module === 'function';
if (!this.options.customFormatters[token.key]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!this.options.customFormatters[token.key]) {
if (!formatter) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearly there, I think. See inline replies.

"Alex Sexton <alexsexton@gmail.com>",
"Eemeli Aro <eemeli@gmail.com>"
"Eemeli Aro <eemeli@gmail.com>",
"cdaringe<cdaringe@gmail.com>"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really mind this, but please do at least match the format of the other entries.

} else if (isFormatterKey(key)) {
this.runtime[key] = Object.assign(
Formatters[key],
cloneFunction(Formatters[key]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just mutate it, and inline the cloneFunction() above.

@cdaringe
Copy link
Contributor Author

cdaringe commented Aug 1, 2023

pass 4 complete. back to you! thanks for the feedback. i think the PR is in much better shape that it was on first pass

@cdaringe cdaringe requested a review from eemeli August 1, 2023 22:49
Copy link
Member

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still a few code style issues in setFormatter(), but we can take care of them separately later.

The docs at docs/custom-formatters.md will need to get updated together with the next release. @cdaringe, would you be able/interested to submit a separate PR for that?

@cdaringe
Copy link
Contributor Author

cdaringe commented Aug 2, 2023

sound good. please see #408

@eemeli eemeli merged commit e0087bf into messageformat:main Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: Support customFormatter with locale specific modules

2 participants