This repository was archived by the owner on Jan 25, 2022. It is now read-only.

Using Unicode locale ID vs BCP 47 in our spec #63

@nciric

@littledan, this is a proposal we could work into our Locale spec if we can get the group to agree on the change.

The current spec (and most of the constructors) expects a BCP 47 language tag. A cleaner approach would be to use a Unicode locale ID; see here for the differences:

http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#BCP_47_Conformance

A Unicode locale ID does not allow for the full syntax of [BCP47]:

  • No irregular or BCP47 grandfathered tags are allowed
  • No extlang subtags are allowed
  • A tag must not start with the subtag "x". Thus a privateuse (eg x-abc) can only be after a language subtag like "und"

It allows for certain additions:

  • For field separator characters, the "_" character can be used as well as the "-" used in [BCP47].
  • "root" to indicate the generic locale used as the parent of all languages in the CLDR data model.
  • Certain codes that are private-use in BCP-47 and ISO are given semantics by LDML.
  • Each macrolanguage has an identified primary encompassed language. That encompassed language is treated as an alias for the macrolanguage, and thus is replaced when canonicalizing.
  • The language tag may begin with a script rather than a language (specialized use only).
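
The syntactic restrictions listed above can be sketched as a small check. This is only an illustration, assuming the grandfathered list from RFC 5646 and the rule that a three-letter alphabetic subtag directly after the language is an extlang; `unicodeLocaleIdProblems` is a hypothetical helper, not part of any spec or engine API, and it does not validate the rest of the tag.

```typescript
// Hypothetical helper for illustration only; not a spec or engine API.
// Grandfathered tags from RFC 5646 (irregular and regular), lowercased.
const GRANDFATHERED: Set<string> = new Set([
  "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
  "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao", "i-tay",
  "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
  "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu", "zh-hakka",
  "zh-min", "zh-min-nan", "zh-xiang",
]);

// Returns the reasons a tag falls outside the restrictions quoted above.
// An empty array means none of the three restrictions was triggered.
function unicodeLocaleIdProblems(tag: string): string[] {
  // Unicode locale IDs accept "_" as well as "-" as a separator.
  const subtags = tag.toLowerCase().split(/[-_]/);
  const [first = "", second = ""] = subtags;

  if (GRANDFATHERED.has(subtags.join("-"))) {
    return ["grandfathered tag"];
  }

  const problems: string[] = [];
  if (first === "x") {
    problems.push("starts with private-use subtag x");
  }
  // In BCP 47 a three-letter alphabetic subtag right after a 2-3 letter
  // language subtag is an extlang (scripts have 4 letters, regions 2).
  if (/^[a-z]{2,3}$/.test(first) && /^[a-z]{3}$/.test(second)) {
    problems.push("extlang subtag");
  }
  return problems;
}
```

With this sketch, `i-klingon`, `zh-yue`, and `x-abc` are each flagged, while `und-x-abc` and `sr_Latn_RS` pass these three checks.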

BCP 47 tags cause multiple problems, ranging from the slightly annoying grandfathered tags (the source of most Locale bugs in V8) to script mapping.

For example:
