perf(parser): use peek_token instead of checkpoint/rewind for single-token decisions #23056
Merging this PR will not alter performance
…token decisions (#23056)

## What

Two related changes:

1. **perf(parser): use `peek_token` instead of checkpoint/rewind for single-token decisions**: replace the `checkpoint` + `bump` + `rewind` pattern with a single cached lexer-level `peek_token()` in four hot statement-start paths where the parse decision depends only on the one token following the keyword:
   - `parse_let`: token after `let`
   - `parse_import_statement`: `import.meta` / `import()` vs import declaration
   - `parse_async_statement`: `async function` vs other
   - `for (let …`: token after `let`
2. **fix(parser): dedup irregular whitespaces recorded during lookahead**: prerequisite for the above (see below).

## Why

In each path the keyword was speculatively consumed only to inspect the next token, then rewound on the cold path. By peeking first and only `bump`-ing once we commit, the keyword is never speculatively consumed, so no rewind is needed. `peek_token()` is a cached lexer-level re-lex and is far cheaper than a parser-level `checkpoint`, which snapshots lexer state, the current token, the error position and the fatal-error slot.

This continues the existing migration of single-token lookaheads from `lookahead`/`checkpoint` to `peek_token` (see the note in `modifiers.rs`).

## The dedup fix

`peek_token`/`lookahead` lex past the current position and record any irregular whitespace they scan into `trivia_builder.irregular_whitespaces`; after rewinding, the committed path re-lexes the same span and records it again, producing duplicate `no-irregular-whitespace` diagnostics (e.g. `let<NBSP>x = 1`). `add_comment` already handles this exact rewind-duplicate case for comments by skipping re-inserts on an ordered vec (`start <= last.start`). The same guard is now applied to `add_irregular_whitespace`. This also fixes the latent duplicate for the pre-existing `lookahead`-based paths, and correctly keeps genuinely-distinct adjacent whitespaces (`let<NBSP><NBSP>x` still reports two).
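The guard described in the dedup fix can be sketched in isolation. This is a minimal illustration, not oxc's actual code: the `Span` and `TriviaBuilder` types below are hypothetical stand-ins, and only the field name `irregular_whitespaces`, the method name `add_irregular_whitespace` and the `start <= last.start` condition are taken from the description above. Byte offsets assume UTF-8, where NBSP (U+00A0) occupies two bytes.

```rust
/// Hypothetical stand-in for a source span; not oxc's real span type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: u32,
    end: u32,
}

/// Toy trivia collector. Only the field and method names come from the PR
/// description; the rest is an assumption made for illustration.
#[derive(Default)]
struct TriviaBuilder {
    irregular_whitespaces: Vec<Span>,
}

impl TriviaBuilder {
    /// Record a span unless it starts at or before the last recorded one.
    /// Spans arrive in source order, so `start <= last.start` can only occur
    /// when the same text is lexed a second time after a peek or rewind.
    fn add_irregular_whitespace(&mut self, start: u32, end: u32) {
        if let Some(last) = self.irregular_whitespaces.last() {
            if start <= last.start {
                return;
            }
        }
        self.irregular_whitespaces.push(Span { start, end });
    }
}

fn main() {
    // `let<NBSP>x = 1`: `let` is bytes 0..3, the NBSP is bytes 3..5.
    let mut trivia = TriviaBuilder::default();
    trivia.add_irregular_whitespace(3, 5); // scanned while peeking
    trivia.add_irregular_whitespace(3, 5); // scanned again on the committed path
    assert_eq!(trivia.irregular_whitespaces.len(), 1);

    // `let<NBSP><NBSP>x`: two distinct adjacent spans, both kept.
    let mut trivia = TriviaBuilder::default();
    trivia.add_irregular_whitespace(3, 5);
    trivia.add_irregular_whitespace(5, 7);
    trivia.add_irregular_whitespace(3, 5); // re-lex: skipped
    trivia.add_irregular_whitespace(5, 7); // re-lex: skipped
    assert_eq!(trivia.irregular_whitespaces.len(), 2);
    assert_eq!(trivia.irregular_whitespaces[1].end, 7);
}
```

The guard relies on the vec being ordered by position: it compares only against the last element, so it costs one comparison per insert rather than a search.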
## Conformance

No AST change. Verified by panic-isolated, per-suite comparison against `main`:

- estree (full AST + spans + token streams): byte-identical
- transformer: byte-identical

Irregular-whitespace counts verified directly: `let<NBSP>x = 1`, `for (let<NBSP>x of y){}`, `async<NBSP>function f(){}` each report exactly one span (was two); `let<NBSP><NBSP>x` reports two.
## What

Follow-up to #23056. Replace the parser-level `lookahead` (checkpoint + bump + rewind) in the TS `asserts` type-predicate path with a single cached lexer-level `peek_token()`. The check only inspects the one token after `asserts`:

```rust
// before
if self.lookahead(|parser| {
    parser.bump(Kind::Asserts);
    parser.is_token_identifier_or_keyword_on_same_line()
}) { ... }

// after
let next = self.lexer.peek_token();
if next.kind().is_identifier_name() && !next.is_on_new_line() { ... }
```

The now-unused `is_token_identifier_or_keyword_on_same_line` helper is removed.

## Why

`peek_token()` is a cached lexer-level re-lex, far cheaper than a parser-level `checkpoint` (which snapshots lexer state, the current token, the error position and the fatal-error slot). Same single-token migration as the four paths in #23056.

## Conformance

No AST change: estree (full AST + spans + token streams) byte-identical to `main`.
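To make the before/after equivalence concrete, here is a self-contained toy model. Everything in it is hypothetical: the `Token` and `Parser` types, the whitespace-based tokenizer and the simplified `is_identifier_name` are illustrative assumptions, not oxc's API. It only shows that peeking one token ahead yields the same decision as bump-inspect-rewind while never moving the cursor. It does not model cost, since in this toy a checkpoint is a single index, whereas the real one snapshots considerably more state.

```rust
/// Toy token: its text plus whether a line break precedes it.
#[derive(Debug)]
struct Token<'a> {
    text: &'a str,
    on_new_line: bool,
}

/// Toy parser over a pre-split token list (hypothetical, for illustration).
struct Parser<'a> {
    tokens: Vec<Token<'a>>,
    pos: usize, // index of the current token
}

/// Simplified identifier check; real identifier rules are broader.
fn is_identifier_name(text: &str) -> bool {
    let mut chars = text.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' || c == '$' => {
            chars.all(|c| c.is_alphanumeric() || c == '_' || c == '$')
        }
        _ => false,
    }
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        let mut tokens = Vec::new();
        for (line_idx, line) in src.lines().enumerate() {
            for (word_idx, word) in line.split_whitespace().enumerate() {
                let on_new_line = line_idx > 0 && word_idx == 0;
                tokens.push(Token { text: word, on_new_line });
            }
        }
        Parser { tokens, pos: 0 }
    }

    fn cur(&self) -> Option<&Token<'a>> {
        self.tokens.get(self.pos)
    }

    /// Look at the token after the current one without moving the cursor.
    fn peek_token(&self) -> Option<&Token<'a>> {
        self.tokens.get(self.pos + 1)
    }

    fn bump(&mut self) {
        self.pos += 1;
    }

    /// "Before" shape: consume the keyword, inspect, then rewind.
    fn is_asserts_predicate_speculative(&mut self) -> bool {
        let checkpoint = self.pos;
        self.bump();
        let result = self
            .cur()
            .map_or(false, |t| is_identifier_name(t.text) && !t.on_new_line);
        self.pos = checkpoint; // rewind
        result
    }

    /// "After" shape: peek only; nothing is consumed, nothing to undo.
    fn is_asserts_predicate_peek(&self) -> bool {
        self.peek_token()
            .map_or(false, |t| is_identifier_name(t.text) && !t.on_new_line)
    }
}

fn main() {
    for src in ["asserts x is string", "asserts\nx", "asserts ;", "asserts"] {
        let mut parser = Parser::new(src);
        let speculative = parser.is_asserts_predicate_speculative();
        let peeked = parser.is_asserts_predicate_peek();
        assert_eq!(speculative, peeked);
        assert_eq!(parser.pos, 0); // cursor untouched in both cases
        println!("{:?}: {}", src, peeked);
    }
}
```

The peek variant takes `&self` rather than `&mut self`, which reflects the point of the change: the decision is made without mutating parser state, so there is no state to restore on the cold path.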
### 💥 BREAKING CHANGES

- ee4dc73 ast: [**BREAKING**] Add `#[non_exhaustive]` to AST nodes (#23046) (overlookmotel)
- 4c35362 ast: [**BREAKING**] Add `AstBuilder::template_element_escape_raw` and `template_element_escape_raw_with_lone_surrogates` methods (#23047) (overlookmotel)

### 🚀 Features

- b846ab2 react_compiler: Integrate the Rust port of the React Compiler (#22942) (Boshen)
- 5b8dd68 parser: Report TS1255 for invalid class definite assertions (#22917) (camc314)
- 85efabf semantic: Make building the class table optional, off by default (#22862) (Boshen)

### 🐛 Bug Fixes

- 556acdc codegen: Parenthesize TS-cast assignment targets (#23112) (Boshen)
- 37169ff codegen: Don't emit space between postfix `--` and `>` when minifying (#23036) (Boshen)
- a4b1bf7 codegen: Drop redundant whitespace in minified TypeScript output (#23038) (Boshen)
- cf53285 parser: Report reserved type-declaration names in the parser (#23035) (Boshen)
- 4e44969 ast: Fix UB in `escape_template_element_raw` (#23052) (overlookmotel)
- c543154 parser: Report comma operator in JSX expression in the parser (#23030) (Boshen)
- 325c94f codegen: Tighten conditional-type and constructor-type whitespace when minifying (#23033) (Boshen)
- 95dd3a2 parser: Report `import type` alias to a non-external reference in the parser (#23032) (Boshen)
- 90180b8 codegen: Drop space after `:` in function return type when minifying (#23028) (Boshen)
- 6da876e parser: Report `abstract` private class field in the parser (#23029) (Boshen)
- 28467ce codegen: Don't emit space before a postfix update operand when minifying (#23027) (Boshen)
- cb29926 codegen: Drop redundant space after `export default` when minifying (#23024) (Boshen)
- 62965ae codegen: Drop redundant space after `else` when minifying (#23025) (Boshen)
- 989230a parser: Report compound assignment to non-simple target in the parser (#23022) (Boshen)
- 06f367c parser: Report `super.#field` private access in the parser (#23014) (Boshen)
- 184edef codegen: Print space before `const`/`declare` enum modifier (#23013) (Boshen)
- 4d722e0 parser: Report duplicate switch `default` clause in the parser (#23012) (Boshen)
- 597ed85 codegen: Parenthesize `let`/`async` for-of head target (#23008) (Boshen)
- 8b631bf codegen: Remove stray space before mapped type value colon (#23010) (Boshen)
- c08407e codegen: Don't over-parenthesize `in` inside an arrow in a for-init (#23009) (Boshen)
- 600cd6f codegen: Parenthesize lower-precedence `TSInstantiationExpression` operand (#23007) (Boshen)
- 187e1a5 codegen: Don't leak space after comment-only JSX expression container (#23006) (Boshen)
- 294c473 codegen: Don't over-parenthesize `TSTypeAssertion` operand (#23004) (Boshen)
- 786d96f codegen: Give `TSTypeAssertion` unary precedence (#23002) (Boshen)
- 1295882 parser: Report `new.target` and `import.meta` syntax errors in the parser (#23003) (Boshen)
- d727b6b codegen: Parenthesize `await` expression as base of `**` (#23001) (Boshen)
- 67dfa08 codegen: Keep parentheses around `new` callees containing a call (#22997) (Boshen)
- 17e7cf3 parser: Disallow unerasable `as`/`satisfies` assertions (#22986) (Boshen)
- beb46d3 parser: Commit to module goal on decorated exports (#22941) (Boshen)
- 49e63f7 isolated-declarations: Require annotations for satisfies initializers (#22898) (camc314)
- 8c93601 isolated-declarations: Allow unknown enum initializer in non-const enum (#22900) (camc314)

### ⚡ Performance

- 7d89909 parser: Peek instead of lookahead for yield disambiguation (#23071) (Boshen)
- bf872f0 parser: Skip arrow lookahead for a parenthesized literal (#23070) (Boshen)
- d19fc54 parser: Guard type-argument speculation behind an angle-token check (#23069) (Boshen)
- 8eb5507 parser: Skip redundant member-rest re-scan on call entry (#23068) (Boshen)
- 883dfc1 parser: Skip parse_call_expression_rest when no call follows (#23063) (Boshen)
- b171153 parser: Peek before the await-using lookahead (#23059) (Boshen)
- 56f21bd parser: Use peek_token for the TS `asserts` type predicate (#23058) (Boshen)
- 68805ac parser: Use peek_token instead of checkpoint/rewind for single-token decisions (#23056) (Boshen)
- 1f9d8eb ast: `AstBuilder::template_element_escape_raw` avoid allocation if no escape required (#23053) (overlookmotel)
- 502b04d semantic: Move cold function redeclaration handling into `#[cold]` function (#22973) (overlookmotel)

### 📚 Documentation

- 275d318 napi/minifier: Point `target` to oxc docs (#23102) (camc314)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>