
Faster identifier tokenizing #13262

Merged
JLHwung merged 2 commits into babel:main from JLHwung:faster-identifier-tokenizing on May 6, 2021

Conversation

@JLHwung
Contributor

@JLHwung JLHwung commented May 5, 2021

| Q | A |
| --- | --- |
| License | MIT |

This PR includes commits from #13256; see here for the real diff. Edit: already rebased.

This PR

  1. moves the Flow @@iterator parsing into the Flow plugin and simplifies the tokenizer state. It turns out we don't actually need this.state.isIterator as long as the Flow plugin has a dedicated code path for iterator identifiers, so state.isIterator is removed.

  2. passes the identifier start through to the word reader, so we don't have to re-read the input source and test it with isIdentifierChar.

Currently, when parsing the length-3 input ab;, this.input.codePointAt is called 5 times:

| Seq | Position | Character | Context |
| --- | --- | --- | --- |
| 1 | 0 | "a" | getTokenFromCode |
| 2 | 0 | "a" | readWord1 |
| 3 | 1 | "b" | readWord1 |
| 4 | 2 | ";" | readWord1 |
| 5 | 2 | ";" | getTokenFromCode |

This PR passes the code point 0x61 read in sequence 1 through to readWord1, so we can avoid reading it again in sequence 2. Now, when parsing ab;, only 4 codePointAt calls are issued (sequences 1, 3, 4, 5). As we can see, the gain diminishes for longer identifier names.
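The pass-through trick can be sketched as follows (a toy model with hypothetical names, not Babel's real readWord1): the caller hands the already-read first code point to the word reader, which skips past it instead of calling codePointAt on position 0 a second time.

```javascript
// Toy identifier-char predicate: ASCII letters only, enough for the sketch.
function isIdentifierChar(code) {
  return (code >= 0x61 && code <= 0x7a) || (code >= 0x41 && code <= 0x5a);
}

// Instrumented read so we can count codePointAt calls.
let calls = 0;
function codePointAt(input, pos) {
  calls++;
  return input.codePointAt(pos);
}

// `firstCode` is the code point the caller (getTokenFromCode) already read
// at `pos`; readWord1 skips it rather than re-reading it.
function readWord1(input, pos, firstCode) {
  const chunkStart = pos;
  pos += firstCode <= 0xffff ? 1 : 2; // advance past the char we were handed
  while (pos < input.length) {
    const ch = codePointAt(input, pos); // only positions after the first
    if (!isIdentifierChar(ch)) break;
    pos += ch <= 0xffff ? 1 : 2;
  }
  return input.slice(chunkStart, pos);
}

const input = "ab;";
const first = codePointAt(input, 0);     // sequence 1 (getTokenFromCode)
const word = readWord1(input, 0, first); // sequences 3 and 4, no re-read
console.log(word, calls); // "ab" 3
```

In the real parser the terminating ";" is read once more by getTokenFromCode, which is why the PR counts 4 calls rather than the 3 this reduced sketch shows.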

Although we could do the same for escaped identifiers, I don't think it is worth the effort because escaped identifiers are rare.

Benchmark results

Combining these two tricks, we see up to a 4.5% performance gain on length-1 identifiers (best case).

$ node --predictable ./benchmark/many-identifiers/1-length.bench.mjs
baseline 64 length-1 identifiers: 15836 ops/sec ±67.43% (0.063ms)
baseline 128 length-1 identifiers: 16041 ops/sec ±2.09% (0.062ms)
baseline 256 length-1 identifiers: 8463 ops/sec ±1.48% (0.118ms)
baseline 512 length-1 identifiers: 4261 ops/sec ±1.21% (0.235ms)
baseline 1024 length-1 identifiers: 2165 ops/sec ±1.03% (0.462ms)
current 64 length-1 identifiers: 20153 ops/sec ±81.04% (0.05ms)
current 128 length-1 identifiers: 17915 ops/sec ±0.48% (0.056ms)
current 256 length-1 identifiers: 8844 ops/sec ±0.96% (0.113ms)
current 512 length-1 identifiers: 4410 ops/sec ±1.42% (0.227ms)
current 1024 length-1 identifiers: 2191 ops/sec ±1.22% (0.456ms)

up to a 2.5% performance gain on length-2 identifiers

$ node --predictable ./benchmark/many-identifiers/2-length.bench.mjs
baseline 64 length-2 identifiers: 20141 ops/sec ±66.04% (0.05ms)
baseline 128 length-2 identifiers: 15641 ops/sec ±1.62% (0.064ms)
baseline 256 length-2 identifiers: 7981 ops/sec ±1.06% (0.125ms)
baseline 512 length-2 identifiers: 3959 ops/sec ±1.11% (0.253ms)
baseline 1024 length-2 identifiers: 1935 ops/sec ±1.47% (0.517ms)
current 64 length-2 identifiers: 21245 ops/sec ±63.76% (0.047ms)
current 128 length-2 identifiers: 15943 ops/sec ±1.32% (0.063ms)
current 256 length-2 identifiers: 8079 ops/sec ±1.23% (0.124ms)
current 512 length-2 identifiers: 4064 ops/sec ±0.99% (0.246ms)
current 1024 length-2 identifiers: 2051 ops/sec ±0.2% (0.488ms)

and no significant performance gain on length-20 identifiers. Note that in predictable mode, the first bench suite always shows significant variance and should not be taken into account.

baseline 64 length-20 identifiers: 14220 ops/sec ±69.35% (0.07ms)
baseline 128 length-20 identifiers: 11560 ops/sec ±0.96% (0.087ms)
baseline 256 length-20 identifiers: 5885 ops/sec ±0.54% (0.17ms)
baseline 512 length-20 identifiers: 2902 ops/sec ±1.38% (0.345ms)
baseline 1024 length-20 identifiers: 1429 ops/sec ±1.43% (0.7ms)
current 64 length-20 identifiers: 16010 ops/sec ±55.29% (0.062ms)
current 128 length-20 identifiers: 11621 ops/sec ±1.02% (0.086ms)
current 256 length-20 identifiers: 5901 ops/sec ±0.5% (0.169ms)
current 512 length-20 identifiers: 2906 ops/sec ±0.91% (0.344ms)
current 1024 length-20 identifiers: 1417 ops/sec ±2.47% (0.706ms)

@JLHwung JLHwung added pkg: parser PR: Performance 🏃‍♀️ A type of pull request used for our changelog categories labels May 5, 2021
@codesandbox-ci

codesandbox-ci bot commented May 5, 2021

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 00c42cc:

Sandbox Source
babel-repl-custom-plugin Configuration
babel-plugin-multi-config Configuration

@babel-bot
Collaborator

babel-bot commented May 5, 2021

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/45886/

// Allow @@iterator and @@asyncIterator as an identifier only inside type
if (!this.isIterator(word) || !this.state.inType) {
this.raise(this.state.pos, Errors.InvalidIdentifier, fullWord);
}
Contributor Author

@JLHwung JLHwung May 5, 2021


The implementation of readIterator was copied from the original readWord. It overlooks that an iterator identifier may contain escapes, which should not be allowed:

For example, Babel currently parses this successfully:

function foo(): { @@iter\u0061tor: () => string } {
  return (0: any);
}

This PR focuses on performance only and I will open a new PR for that after this PR gets merged.
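The follow-up fix the comment describes could look roughly like this (a hedged sketch with hypothetical helper names, not the eventual Babel patch): track whether the word contained a Unicode escape, and reject the iterator identifier when it did, even though the escaped form decodes to "iterator".

```javascript
// Decode \uXXXX escapes in a raw word, recording whether any were present
// (analogous to a containsEsc-style flag in a real tokenizer).
function readEscapedWord(raw) {
  let containsEsc = false;
  const word = raw.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) => {
    containsEsc = true;
    return String.fromCharCode(parseInt(hex, 16));
  });
  return { word, containsEsc };
}

// Hypothetical validator for the word after "@@": the name must match,
// and the escaped spelling must be rejected outright.
function checkIterator(raw) {
  const { word, containsEsc } = readEscapedWord(raw);
  if (word !== "iterator" && word !== "asyncIterator") {
    throw new SyntaxError("Invalid identifier @@" + word);
  }
  if (containsEsc) {
    throw new SyntaxError("Escape sequence in keyword @@" + word);
  }
  return word;
}

console.log(checkIterator("iterator")); // "iterator"
try {
  checkIterator("iter\\u0061tor"); // raw source text iter\u0061tor
} catch (e) {
  console.log(e.message); // rejected despite decoding to "iterator"
}
```

This mirrors how JavaScript itself treats escapes in keywords: `\u0069f` decodes to `if` but is still a syntax error as a keyword.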

Member

@nicolo-ribaudo nicolo-ribaudo left a comment


This is probably less than 1% when parsing real files, but I like that it moves Flow stuff to the Flow plugin.

JLHwung added 2 commits May 6, 2021 12:31
- Move iterator identifier parsing to the Flow plugin
- If the character is an identifier start, pass it to readWord1
@JLHwung JLHwung force-pushed the faster-identifier-tokenizing branch from a20aef9 to 00c42cc Compare May 6, 2021 16:31
@JLHwung JLHwung merged commit a8fea40 into babel:main May 6, 2021
@JLHwung JLHwung deleted the faster-identifier-tokenizing branch May 6, 2021 22:47
@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Aug 6, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 6, 2021