Skip to content

Extract string parsing to a separate package#14772

Merged
nicolo-ribaudo merged 6 commits intobabel:mainfrom
nicolo-ribaudo:parser-extract-string-parsing
Jul 20, 2022
Merged

Extract string parsing to a separate package#14772
nicolo-ribaudo merged 6 commits intobabel:mainfrom
nicolo-ribaudo:parser-extract-string-parsing

Conversation

@nicolo-ribaudo
Copy link
Copy Markdown
Member

@nicolo-ribaudo nicolo-ribaudo commented Jul 19, 2022

Q                       A
Fixed Issues?
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass? Yes
Documentation PR Link
Any Dependency Changes?
License MIT

This was hard, but it lets us avoid an external dependency with some logic that we already have in #14757.

@liuxingbaoyu you should be able to do something like this:

let unterminatedCalled = false;
try {
  const error = () => { throw new Error(); };
  cooked = parseStringContents("template", raw, 0, 0, 0, {
    unterminated() { unterminatedCalled = true; },
    strictNumericEscape: error,
    invalidEscapeSequence: error,
    numericSeparatorInEscapeSequence: error,
    unexpectedNumericSeparator: error,
    invalidDigit: error,
    invalidCodePoint: error,
  });
  if (!unterminatedCalled) cooked = null; // we must consume the whole raw string
} catch {}

This PR mostly moves code around, with one main change: instead of directly updating this.state.{pos,curLine,lineStart}, we keep track of them in variables and we update them only after returning from the helper parser functions.

@nicolo-ribaudo nicolo-ribaudo added PR: Internal 🏠 A type of pull request used for our changelog categories pkg: parser labels Jul 19, 2022
@JLHwung JLHwung self-requested a review July 20, 2022 00:17
@liuxingbaoyu
Copy link
Copy Markdown
Member

Amazing!

babel/babel.config.js

Lines 208 to 214 in 486d27a

{
test: [
"packages/babel-generator",
"packages/babel-plugin-proposal-decorators",
].map(normalize),
plugins: ["babel-plugin-transform-charcodes"],
},

Can you add it here?

@nicolo-ribaudo nicolo-ribaudo force-pushed the parser-extract-string-parsing branch from 43cc94a to 9971813 Compare July 20, 2022 11:15
@babel-bot
Copy link
Copy Markdown
Collaborator

babel-bot commented Jul 20, 2022

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/52568/

"rawValue": "\\u{}",
"raw": "\"\\u{}\"",
"expressionValue": "null"
"expressionValue": ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new behaviour here is better than current main. Should we set expressionValue to be null when the string can not be successfully parsed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I don't know why the behavior changed 😬
But yes, null would be even better.

Copy link
Copy Markdown
Member Author

@nicolo-ribaudo nicolo-ribaudo Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, expressionValue is a property of directives that represents the .value property of the originally parsed StringLiteral. I feat that making it null would break anything relying on the fact that StringLiteral.value is a string, so I prefer the behavior currently implemented by this PR of just replacing invalid escapes with "" (for example, ab\u{}cd is abcd).

@nicolo-ribaudo nicolo-ribaudo force-pushed the parser-extract-string-parsing branch from 700d6f0 to 23418cd Compare July 20, 2022 13:19
Copy link
Copy Markdown
Contributor

@JLHwung JLHwung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Copy link
Copy Markdown
Member

@liuxingbaoyu liuxingbaoyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a lax benchmark with no significant performance penalty.

current 1 jquery 3.6: 40.11 ops/sec ±5.28% (25ms)
current 4 jquery 3.6: 9.5 ops/sec ±3.78% (105ms)
current 16 jquery 3.6: 2.22 ops/sec ±7.14% (451ms)
current 64 jquery 3.6: 0.54 ops/sec ±2.83% (1861ms)
baseline 1 jquery 3.6: 41.42 ops/sec ±5.53% (24ms)
baseline 4 jquery 3.6: 9.89 ops/sec ±2.5% (101ms)
baseline 16 jquery 3.6: 2.36 ops/sec ±5% (424ms)
baseline 64 jquery 3.6: 0.55 ops/sec ±9.05% (1830ms)

@nicolo-ribaudo
Copy link
Copy Markdown
Member Author

Good! I believe rewriting all the tokenizer like this could actually improve performance, because we would skip all the this.state.* accesses and the this.... prototype lookup to call other tokenizer methods.

@nicolo-ribaudo nicolo-ribaudo merged commit ea5ff29 into babel:main Jul 20, 2022
@nicolo-ribaudo nicolo-ribaudo deleted the parser-extract-string-parsing branch July 20, 2022 15:42
Comment on lines +16 to +18
"devDependencies": {
"charcodes": "^0.2.0"
},
Copy link
Copy Markdown
Contributor

@merceyz merceyz Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if @babel/helper-string-parser will be bundled so asking to be on the safe side; shouldn't this be a normal dependency?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a plugin that inlines charcodes.*** into the corresponding codepoints: https://github.com/xtuc/charcodes/blob/master/packages/babel-plugin-transform-charcodes/README.md

@github-actions github-actions bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Oct 21, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: parser PR: Internal 🏠 A type of pull request used for our changelog categories

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants