Assign the same CJK width to canonically equivalent strings #52

Merged
Manishearth merged 2 commits into unicode-rs:master from Jules-Bertholet:canonically-equivalent-eaw
May 22, 2024

@Jules-Bertholet
Contributor

UAX #11 (East Asian Width):

Modern Rendering Practice. […] The set of characters with mappings to legacy character sets that have been assigned ambiguous width constitute a superset of the set of such characters that may be rendered as wide characters in a given context. In particular, an application might find it useful to treat characters from alphabetic scripts as narrow by default. Conversely, many of the symbols in the Unicode Standard have no mappings to legacy character sets, yet they may be rendered as “wide” characters if they appear in an East Asian context. An implementation might therefore elect to treat them as ambiguous even though they are classified as neutral here.

"Treat characters from alphabetic scripts as narrow by default" is the biggest change this PR makes. To achieve full canonical equivalence, we also need to adjust the width of a few mathematical symbols with diagonal strikethrough, and of U+0387 GREEK ANO TELEIA.
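To illustrate the kind of mismatch being fixed, here is a minimal, self-contained sketch. It does not use this crate's API: `toy_nfd`, `naive_cjk_char_width`, `naive_cjk_width`, and `same_width_as_decomposed` are hypothetical helpers covering only a handful of characters, and the "naive" rule (every East Asian Ambiguous character is 2 columns in a CJK context, combining marks are 0) is an assumption made for the example, not a description of the code before or after this PR.

```rust
/// Toy canonical decomposition for three characters only (hypothetical helper,
/// standing in for a real NFD implementation).
fn toy_nfd(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            // U+00E9 LATIN SMALL LETTER E WITH ACUTE -> e + U+0301 COMBINING ACUTE ACCENT
            '\u{00E9}' => out.push_str("e\u{0301}"),
            // U+2260 NOT EQUAL TO -> = + U+0338 COMBINING LONG SOLIDUS OVERLAY
            '\u{2260}' => out.push_str("=\u{0338}"),
            // U+0387 GREEK ANO TELEIA -> U+00B7 MIDDLE DOT
            '\u{0387}' => out.push('\u{00B7}'),
            other => out.push(other),
        }
    }
    out
}

/// Naive per-character CJK width over a tiny hand-written table (assumption:
/// ambiguous characters count as wide, combining marks as zero).
fn naive_cjk_char_width(c: char) -> usize {
    match c {
        // Combining marks take no column of their own.
        '\u{0301}' | '\u{0338}' => 0,
        // East_Asian_Width=Ambiguous characters in this toy table.
        '\u{00B7}' | '\u{00E9}' | '\u{2260}' => 2,
        // Everything else here (including the Neutral U+0387) is one column.
        _ => 1,
    }
}

/// Width of a string as the sum of its per-character widths.
fn naive_cjk_width(s: &str) -> usize {
    s.chars().map(naive_cjk_char_width).sum()
}

/// The invariant this PR is about: a string and its canonical
/// decomposition should report the same width.
fn same_width_as_decomposed(s: &str) -> bool {
    naive_cjk_width(s) == naive_cjk_width(&toy_nfd(s))
}

fn main() {
    for s in ["\u{00E9}", "\u{2260}", "\u{0387}", "e"] {
        println!(
            "{:?}: composed = {}, decomposed = {}, consistent = {}",
            s,
            naive_cjk_width(s),
            naive_cjk_width(&toy_nfd(s)),
            same_width_as_decomposed(s)
        );
    }
}
```

Under this naive rule, all three non-ASCII samples disagree with their decompositions: a precomposed accented letter, a struck-through mathematical symbol, and U+0387, which correspond to the three categories named above.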

@Manishearth Manishearth merged commit d00d357 into unicode-rs:master May 22, 2024
@Jules-Bertholet Jules-Bertholet deleted the canonically-equivalent-eaw branch May 22, 2024 21:33