@cmb69 cmb69 commented Jun 14, 2021

Canonicalization converts the locale to ICU format[1]. However, the
lookup described in RFC 4647, section 3.4, is about POSIX format. To
make that lookup work for ICU format, we also need to cater to keyword
separators.
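
To illustrate the point about keyword separators, here is a minimal sketch of RFC 4647 style truncation that also stops at the ICU keyword marker. It is not the php-src code: the helper names (`is_id_separator`, `is_keyword_separator`, `prev_separator_pos`, `lookup_sketch`) are hypothetical, matching is a plain case-sensitive comparison, and treating `@` as the only keyword separator is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Subtag separators: '-' (BCP 47 style) and '_' (ICU/POSIX style). */
static int is_id_separator(char c) { return c == '-' || c == '_'; }

/* Assumption: '@' introduces the ICU keyword list, e.g. "de_DE@collation=phonebook". */
static int is_keyword_separator(char c) { return c == '@'; }

/*
 * Return the index of the last separator before `pos`, or -1 if there is none.
 * If that separator is preceded by a single-character subtag (such as "-x"),
 * return the position before the singleton instead, as RFC 4647 section 3.4
 * asks for when truncating.
 */
static int prev_separator_pos(const char *tag, int pos)
{
    for (int i = pos - 1; i >= 0; i--) {
        if (is_id_separator(tag[i]) || is_keyword_separator(tag[i])) {
            if (i >= 2 && is_id_separator(tag[i - 2])) {
                return i - 2;
            }
            return i;
        }
    }
    return -1;
}

/*
 * Progressively truncate `range` and return the first entry of `tags`
 * that equals the truncated range, or NULL if nothing matches.
 */
static const char *lookup_sketch(const char *const *tags, size_t ntags,
                                 const char *range)
{
    assert(tags != NULL && range != NULL);

    int len = (int)strlen(range);
    while (len > 0) {
        for (size_t i = 0; i < ntags; i++) {
            if (strlen(tags[i]) == (size_t)len &&
                strncmp(tags[i], range, (size_t)len) == 0) {
                return tags[i];
            }
        }
        len = prev_separator_pos(range, len);
    }
    return NULL;
}
```

With the candidates `de` and `de_DE` and the range `de_DE@collation=phonebook`, the sketch first cuts at `@` and returns `de_DE`. If `@` were not treated as a separator, the first cut would land on the `_` and the lookup would fall straight through to `de`.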

The results are somewhat unexpected, but apparently canonical lookup is
explicitly supposed to return canonical language tags[2].

[1] https://unicode-org.github.io/icu/userguide/locale/#canonicalization
[2] https://github.com/php/php-src/blob/php-7.4.20/ext/intl/locale/locale_methods.c#L1504

@cmb69 cmb69 added the Bug label Jun 14, 2021
@nikic nikic left a comment

Not really familiar with the area, but looks reasonable.

@cmb69 cmb69 closed this in 0f1b17e Jun 16, 2021
@cmb69 cmb69 deleted the cmb/72809 branch June 16, 2021 08:40