Skip to content

fix: skip ascii symbol in process#31

Merged
JLHwung merged 3 commits intomathiasbynens:mainfrom
JLHwung:skip-ascii-symbol-process
Sep 29, 2023
Merged

fix: skip ascii symbol in process#31
JLHwung merged 3 commits intomathiasbynens:mainfrom
JLHwung:skip-ascii-symbol-process

Conversation

@JLHwung
Copy link
Copy Markdown
Collaborator

@JLHwung JLHwung commented Sep 17, 2019

In this PR we skip the processItem when it is a symbol-kind and has code point within ASCII range.

This fix will prevent - transpiled as \\x2D when it is not in character class. Besides that, it should speed up a little bit since ASCII symbols are pretty common.

@nicolo-ribaudo
Copy link
Copy Markdown
Collaborator

It would be good to have a test with - inside a character class, e.g. /[--\\]/

@JLHwung
Copy link
Copy Markdown
Collaborator Author

JLHwung commented Sep 18, 2019

@nicolo-ribaudo Good question! This PR does not touch - in character class by design.

Currently, every items in CharacterClass is added to a single regenerate set so that we can perform possible code range concatenations. However, the regenerate lacks of the token information: i.e. the kind of token where - is a symbol and \- is an identifier. Therefore, regenerate has to output escaped \\x2D even if it may be redundant within the context.

@mathiasbynens
Copy link
Copy Markdown
Owner

Did you see #16?

@JLHwung
Copy link
Copy Markdown
Collaborator Author

JLHwung commented Sep 18, 2019

Sorry I didn't see #16 before.

What's happening in #16 is that - is passed to regenerate and regenerate.toString returns \-, which is invalid when unicode flag is enabled, so regenerate now returns \x2D.

However, when - is a symbol-kind and not in a character class, it should not be transpiled at all. Note that \- is an identifier-kind, which means \- will still pass into regenerate and output \x2D.

@mathiasbynens
Copy link
Copy Markdown
Owner

I'm okay with making - not be escaped outside of character classes. It’s correct that this logic needs to live in regexpu-core and not regenerate, since regenerate’s output must work in any context (in and outside character classes).

But let’s keep the behavior for other ASCII symbols (in particular, non-printable ones) as-is.

break;
case 'value':
const codePoint = item.codePoint;
if (item.kind === "symbol" && codePoint >= 0x20 && codePoint <= 0x7E) {
Copy link
Copy Markdown
Collaborator

@nicolo-ribaudo nicolo-ribaudo Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this as codePoint == 0x2D, since many ascii symbols must be escaped: [, {, |, +, ?, *, $, ...

Copy link
Copy Markdown
Collaborator Author

@JLHwung JLHwung Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escaped character, i.e. \[ is identifier kind instead of symbol kind. If regjsparser parses it safely as symbol, we don't have to escape it.

This is how /\{/ will be parsed:

{
  "type": "value",
  "kind": "identifier",
  "codePoint": 123,
  "range": [
    0,
    2
  ],
  "raw": "\\{"
}

@mathiasbynens mathiasbynens changed the base branch from master to main June 18, 2020 06:50
@JLHwung JLHwung force-pushed the skip-ascii-symbol-process branch from 9afeddb to a710f04 Compare September 18, 2023 15:22
@JLHwung
Copy link
Copy Markdown
Collaborator Author

JLHwung commented Sep 18, 2023

The CI error is fixed in #81.

@nicolo-ribaudo
Copy link
Copy Markdown
Collaborator

@JLHwung could you rebase?

@JLHwung JLHwung force-pushed the skip-ascii-symbol-process branch from a710f04 to 572314f Compare September 19, 2023 15:30
@JLHwung JLHwung force-pushed the skip-ascii-symbol-process branch from 572314f to e7739a1 Compare September 19, 2023 15:35
@JLHwung JLHwung merged commit 55f3e5e into mathiasbynens:main Sep 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants