Editorial: Remove special-casing of U+200C and U+200D #3074
gibson042
left a comment
This is excellent, and I have confirmed its consistency with the Unicode Standard, Version 15.1.0 draft. But we probably shouldn't merge until it is actually released.
Indeed. There is no “do not merge (yet)” issue label (should we add that?), so I’ve added “needs data” for now.
You could mark it as a draft? But the comment will do fine also.
  <p>|IdentifierName| and |ReservedWord| are tokens that are interpreted according to the Default Identifier Syntax given in Unicode Standard Annex #31, Identifier and Pattern Syntax, with some small modifications. |ReservedWord| is an enumerated subset of |IdentifierName|. The syntactic grammar defines |Identifier| as an |IdentifierName| that is not a |ReservedWord|. The Unicode identifier grammar is based on character properties specified by the Unicode Standard. The Unicode code points in the specified categories in the latest version of the Unicode Standard must be treated as in those categories by all conforming ECMAScript implementations. ECMAScript implementations may recognize identifier code points defined in later editions of the Unicode Standard.</p>
  <emu-note>
-   <p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|, and the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are permitted anywhere after the first code point of an |IdentifierName|.</p>
+   <p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|.</p>
Underscores are already a part of ID_Continue. Let's call out just our divergences here.
-   <p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|.</p>
+   <p>This standard specifies specific code point additions: U+005F (LOW LINE) is permitted as the first code point of an |IdentifierName| and U+0024 (DOLLAR SIGN) is permitted anywhere in an |IdentifierName|.</p>
That's not related to this PR, and shouldn't be part of it.
Separately, I think that's more confusing than the current wording (and also it is unrelated to this PR).
With the current wording, you might get misled about what's in Unicode, but you won't get misled about what's in this specification. Whereas with your suggested wording it's easy to read it as suggesting that LOW LINE is only permitted as the first code point in an IdentifierName, as opposed to being permitted anywhere.
And it's more important that this clearly convey what's in this specification than what's in Unicode.
We could probably find some other wording which is unambiguous about both but maybe not without adding more complexity than is warranted.
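For reference on the LOW LINE and DOLLAR SIGN point in this thread, here is a minimal sketch that asks a JavaScript engine's own Unicode data which of the two code points already carry the `ID_Start` and `ID_Continue` properties. The helper names `isIDStart` and `isIDContinue` are hypothetical and exist only for this illustration; they are not part of the specification.

```javascript
// Hypothetical helpers for illustration only.
// Unicode property escapes such as \p{ID_Start} require the `u` flag.
const isIDStart = (ch) => /^\p{ID_Start}$/u.test(ch);
const isIDContinue = (ch) => /^\p{ID_Continue}$/u.test(ch);

const report = {
  // U+005F LOW LINE: ID_Continue but not ID_Start, so only its use as
  // the first code point goes beyond what Unicode already allows.
  lowLine: { start: isIDStart("_"), cont: isIDContinue("_") },
  // U+0024 DOLLAR SIGN: in neither property, so it is an addition
  // at every position.
  dollar: { start: isIDStart("$"), cont: isIDContinue("$") },
};

console.log(report);
```

Both wordings of the note describe the same set of identifiers; they differ only in whether the note lists everything the specification permits or only what it permits beyond Unicode.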
Unicode 15.1.0 was released earlier this week. Marking this PR as officially ready for review.
Unicode v15.1.0 makes both U+200C and U+200D `ID_Continue` characters, meaning we no longer need to explicitly special-case them for them to match `IdentifierPart`. Issue: tc39#3073
4059ed8 to 467819a
Unicode v15.1.0 makes both U+200C and U+200D `ID_Continue` characters, meaning we no longer need to explicitly special-case them for them to match `IdentifierPart`.

Fixes #3073
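To make the effect of the change concrete, the following is a simplified sketch of a per-code-point `IdentifierPart` check before and after removing the special case. It assumes single code point inputs and ignores `\u` escape sequences; the function names `isIdentifierPartOld` and `isIdentifierPartNew` are hypothetical. Whether the two versions agree on U+200C and U+200D depends on the Unicode version the running engine ships, so the sketch reports that rather than assuming it.

```javascript
const ZWNJ = "\u200C"; // ZERO WIDTH NON-JOINER
const ZWJ = "\u200D";  // ZERO WIDTH JOINER

// Before: ID_Continue plus explicit additions for $, ZWNJ, and ZWJ.
const isIdentifierPartOld = (ch) =>
  ch === "$" || ch === ZWNJ || ch === ZWJ || /^\p{ID_Continue}$/u.test(ch);

// After: ID_Continue plus $ only; ZWNJ and ZWJ are expected to come
// from the Unicode data itself (v15.1.0 or later).
const isIdentifierPartNew = (ch) =>
  ch === "$" || /^\p{ID_Continue}$/u.test(ch);

// True only if this engine's Unicode data already treats both
// code points as ID_Continue.
const definitionsAgree = [ZWNJ, ZWJ].every(
  (ch) => isIdentifierPartOld(ch) === isIdentifierPartNew(ch)
);

console.log(
  definitionsAgree
    ? "Both definitions accept U+200C and U+200D on this engine."
    : "This engine's Unicode data predates the ID_Continue change."
);
```

On an engine with older Unicode data the second message appears, which is the situation the "needs data" label guarded against before Unicode 15.1.0 was released.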