Editorial: Remove special-casing of U+200C and U+200D by mathiasbynens · Pull Request #3074 · tc39/ecma262

mathiasbynens · 2023-05-25T09:52:44Z

Unicode v15.1.0 makes both U+200C and U+200D ID_Continue characters, meaning we no longer need to explicitly special-case them for them to match IdentifierPart.

Fixes #3073
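
As a rough illustration of the change described above, the following sketch compares an identifier matcher that lists U+200C and U+200D explicitly with one that relies on `ID_Continue` alone. The names `legacyIdentifierName`, `identifierName`, and `engineTreatsJoinersAsIdContinue` are hypothetical and do not appear in the specification. The sketch also assumes that Unicode escape sequences in identifiers and the |ReservedWord| exclusion can be ignored. Whether the two patterns agree depends on the Unicode version bundled with the engine running the code.

```javascript
// Pre-15.1 style: ZWNJ (U+200C) and ZWJ (U+200D) are listed explicitly
// as extra code points allowed after the first code point.
const legacyIdentifierName = /^[$_\p{ID_Start}][$\u200C\u200D\p{ID_Continue}]*$/u;

// Post-15.1 style: rely on ID_Continue (plus `$`) for everything after
// the first code point. `_` needs no mention here because it is already
// part of ID_Continue.
const identifierName = /^[$_\p{ID_Start}][$\p{ID_Continue}]*$/u;

// Feature check: does this engine's Unicode data already place both
// joiners in ID_Continue? The answer depends on the engine's Unicode version.
const engineTreatsJoinersAsIdContinue =
  /^\p{ID_Continue}$/u.test("\u200C") && /^\p{ID_Continue}$/u.test("\u200D");

console.log("joiners in ID_Continue:", engineTreatsJoinersAsIdContinue);
console.log("legacy pattern accepts a<ZWNJ>b:", legacyIdentifierName.test("a\u200Cb"));
console.log("new pattern accepts a<ZWNJ>b:", identifierName.test("a\u200Cb"));
```

On an engine whose Unicode data predates 15.1.0, the second pattern would reject names containing the joiners, which is one reason to hold the change until the release.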

gibson042

This is excellent, and I have confirmed its consistency with the Unicode Standard, Version 15.1.0 draft. But we probably shouldn't merge until it is actually released.

mathiasbynens · 2023-05-25T16:36:39Z

> This is excellent, and I have confirmed its consistency with the Unicode Standard, Version 15.1.0 draft. But we probably shouldn't merge until it is actually released.

Indeed.

There is no “do not merge (yet)” issue label (should we add that?), so I’ve added “needs data” for now.

bakkot · 2023-05-25T16:47:29Z

> There is no “do not merge (yet)” issue label (should we add that?)

You could mark it as a draft? But the comment will do fine also.

michaelficarra · 2023-05-25T17:56:05Z

spec.html

    <p>|IdentifierName| and |ReservedWord| are tokens that are interpreted according to the Default Identifier Syntax given in Unicode Standard Annex #31, Identifier and Pattern Syntax, with some small modifications. |ReservedWord| is an enumerated subset of |IdentifierName|. The syntactic grammar defines |Identifier| as an |IdentifierName| that is not a |ReservedWord|. The Unicode identifier grammar is based on character properties specified by the Unicode Standard. The Unicode code points in the specified categories in the latest version of the Unicode Standard must be treated as in those categories by all conforming ECMAScript implementations. ECMAScript implementations may recognize identifier code points defined in later editions of the Unicode Standard.</p>
    <emu-note>
-      <p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|, and the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are permitted anywhere after the first code point of an |IdentifierName|.</p>
+      <p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|.</p>


Underscores are already a part of ID_Continue. Let's call out just our divergences here.

Suggested change

-      <p>This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an |IdentifierName|.</p>
+      <p>This standard specifies specific code point additions: U+005F (LOW LINE) is permitted as the first code point of an |IdentifierName| and U+0024 (DOLLAR SIGN) is permitted anywhere in an |IdentifierName|.</p>

That's not related to this PR, and shouldn't be part of it.

Separately, I think that's more confusing than the current wording (and also it is unrelated to this PR).

With the current wording, you might get misled about what's in Unicode, but you won't get misled about what's in this specification. Whereas with your suggested wording it's easy to read it as suggesting that LOW LINE is only permitted as the first code point in an IdentifierName, as opposed to being permitted anywhere.

And it's more important that this clearly convey what's in this specification than what's in Unicode.

We could probably find some other wording that is unambiguous about both, but maybe not without adding more complexity than is warranted.
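
For reference, the membership facts behind this wording discussion can be checked with Unicode property escapes. This is a minimal sketch; the helper names `inIdStart` and `inIdContinue` are hypothetical. It shows that U+005F (LOW LINE) is in `ID_Continue` but not `ID_Start`, while U+0024 (DOLLAR SIGN) is in neither, so only the leading underscore and the dollar sign are true additions over Unicode's default identifier syntax.

```javascript
// Test a single code point against the two Unicode identifier properties.
const inIdStart = (ch) => /^\p{ID_Start}$/u.test(ch);
const inIdContinue = (ch) => /^\p{ID_Continue}$/u.test(ch);

// Tabulate the two code points named in the spec note.
const rows = ["$", "_"].map((ch) => ({
  codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  idStart: inIdStart(ch),
  idContinue: inIdContinue(ch),
}));

console.table(rows);
```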

mathiasbynens · 2023-09-15T06:27:46Z

Unicode 15.1.0 was released earlier this week. Marking this PR as officially ready for review.

gibson042 approved these changes May 25, 2023

mathiasbynens added the needs data label May 25, 2023

mathiasbynens mentioned this pull request May 25, 2023

Add Unicode v15.1.0-sensitive IdentifierPart tests tc39/test262#3833

Merged

michaelficarra reviewed May 25, 2023

ljharb marked this pull request as draft May 25, 2023 18:31

mathiasbynens marked this pull request as ready for review September 15, 2023 06:27

bakkot added the editor call label Sep 15, 2023

nicolo-ribaudo mentioned this pull request Sep 15, 2023

Remove special-casing of U+200C and U+200D babel/babel#15973

Merged

bakkot approved these changes Sep 19, 2023

michaelficarra removed the editor call label Sep 19, 2023

michaelficarra approved these changes Sep 19, 2023

michaelficarra removed the needs data label Sep 19, 2023

michaelficarra added the ready to merge label Feb 21, 2024

Editorial: Remove special-casing of U+200C and U+200D (tc39#3074)

467819a

Unicode v15.1.0 makes both U+200C and U+200D `ID_Continue` characters, meaning we no longer need to explicitly special-case them for them to match `IdentifierPart`. Issue: tc39#3073

ljharb force-pushed the unicode-15.1.0 branch from 4059ed8 to 467819a February 21, 2024 23:43

ljharb merged commit 467819a into tc39:main Feb 21, 2024

mathiasbynens deleted the unicode-15.1.0 branch February 22, 2024 07:11

leaysgur mentioned this pull request Nov 25, 2024

Update redundant regexp example for identifier name mdn/content#36956

Merged
