
perf(parser): use peek_token instead of checkpoint/rewind for single-token decisions#23056

Merged
graphite-app[bot] merged 1 commit into
mainfrom
perf/parser-peek-instead-of-checkpoint
Jun 7, 2026


Boshen (Member) commented Jun 7, 2026

What

Two related changes:

  1. perf(parser): use peek_token instead of checkpoint/rewind for single-token decisions — replace the checkpoint + bump + rewind pattern with a single cached lexer-level peek_token() in four hot statement-start paths where the parse decision depends only on the one token following the keyword:

    • parse_let: token after let
    • parse_import_statement: import.meta / import() vs import declaration
    • parse_async_statement: async function vs other
    • the for (let … head: token after let
  2. fix(parser): dedup irregular whitespaces recorded during lookahead — prerequisite for the above (see below).
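Each of these decisions needs only the single token after the keyword. The sketch below models two of them with a hypothetical `Token`/`Kind` pair (not oxc's real types). The `import` split follows the `import.meta` / `import()` vs declaration cases listed above. The `async` check applies the ECMAScript rule that no line terminator may separate `async` from `function`. Whether oxc's `parse_async_statement` is structured exactly this way is an assumption.

```rust
// Hypothetical token model for illustration; oxc's real `Token`/`Kind` differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind { Function, Dot, LParen, LBrace, Star, Str, Ident }

#[derive(Clone, Copy)]
struct Token { kind: Kind, on_new_line: bool }

#[derive(Debug, PartialEq)]
enum ImportStart { Expression, Declaration }

/// Decide what `import` starts, looking only at the token after it.
fn classify_import(next: Token) -> ImportStart {
    match next.kind {
        // `import.meta` or `import(...)`: an expression statement
        Kind::Dot | Kind::LParen => ImportStart::Expression,
        // `import x from`, `import {`, `import *`, `import "m"`: a declaration
        _ => ImportStart::Declaration,
    }
}

/// `async function` is an async function declaration only when no line
/// terminator separates the two keywords.
fn is_async_function(next: Token) -> bool {
    next.kind == Kind::Function && !next.on_new_line
}

/// Helper: a token on the same line as the keyword.
fn tok(kind: Kind) -> Token {
    Token { kind, on_new_line: false }
}
```

Neither function moves any cursor, so a caller can peek, decide, and only then consume the keyword.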

Why

In each path the keyword was speculatively consumed only to inspect the next token, then rewound on the cold path. By peeking first and only bump-ing once we commit, the keyword is never speculatively consumed, so no rewind is needed. peek_token() is a cached lexer-level re-lex and is far cheaper than a parser-level checkpoint, which snapshots lexer state, the current token, the error position and the fatal-error slot.
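A minimal sketch of the two shapes, assuming a toy parser over a pre-lexed token vector. `Parser`, `Checkpoint`, `parse_let_checkpoint`, `parse_let_peek` and the simplified `let` rule are all hypothetical, not oxc's API. Both functions leave the parser at the same position; the peek version simply never consumes `let` until it has decided.

```rust
// Hypothetical toy parser; oxc's real Parser/Lexer/checkpoint API differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind { Let, Ident, LBracket, LBrace, Eq, Num, Eof }

#[derive(Debug, PartialEq)]
enum Branch { Declaration, Expression }

/// State saved by `checkpoint` (a real parser saves much more).
#[derive(Clone, Copy)]
struct Checkpoint { pos: usize, errors_len: usize }

struct Parser {
    tokens: Vec<Kind>,                 // pre-lexed input stands in for a lexer
    pos: usize,                        // index of the current token
    peek_cache: Option<(usize, Kind)>, // (position, token) of the last peek
    errors: Vec<String>,
}

impl Parser {
    fn new(tokens: Vec<Kind>) -> Self {
        Self { tokens, pos: 0, peek_cache: None, errors: Vec::new() }
    }

    fn cur(&self) -> Kind {
        self.tokens.get(self.pos).copied().unwrap_or(Kind::Eof)
    }

    fn bump(&mut self) {
        self.pos += 1;
    }

    /// Return the token after the current one without consuming anything.
    fn peek_token(&mut self) -> Kind {
        if let Some((at, kind)) = self.peek_cache {
            if at == self.pos {
                return kind; // cache hit for this position
            }
        }
        let kind = self.tokens.get(self.pos + 1).copied().unwrap_or(Kind::Eof);
        self.peek_cache = Some((self.pos, kind));
        kind
    }

    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { pos: self.pos, errors_len: self.errors.len() }
    }

    fn rewind(&mut self, cp: Checkpoint) {
        self.pos = cp.pos;
        self.errors.truncate(cp.errors_len);
    }

    /// Old shape: consume `let` speculatively, inspect, rewind on the cold path.
    fn parse_let_checkpoint(&mut self) -> Branch {
        let cp = self.checkpoint();
        self.bump(); // speculative consume
        if starts_binding(self.cur()) {
            Branch::Declaration
        } else {
            self.rewind(cp); // undo the speculative bump
            Branch::Expression
        }
    }

    /// New shape: peek first, bump only once committed; nothing to rewind.
    fn parse_let_peek(&mut self) -> Branch {
        if starts_binding(self.peek_token()) {
            self.bump();
            Branch::Declaration
        } else {
            Branch::Expression
        }
    }
}

/// Simplified rule assumed for illustration: `let` begins a lexical
/// declaration when followed by an identifier, `[` or `{`.
fn starts_binding(next: Kind) -> bool {
    matches!(next, Kind::Ident | Kind::LBracket | Kind::LBrace)
}
```

In this toy the checkpoint is two integers; the real one described above also carries lexer state and the current token, which is what the peek path avoids copying.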

This continues the existing migration of single-token lookaheads from lookahead/checkpoint to peek_token (see the note in modifiers.rs).

The dedup fix

peek_token/lookahead lex past the current position and record any irregular whitespace they scan into trivia_builder.irregular_whitespaces; after rewinding, the committed path re-lexes the same span and records it again, producing duplicate no-irregular-whitespace diagnostics (e.g. let<NBSP>x = 1).

add_comment already handles this exact rewind-duplicate case for comments by skipping re-inserts on an ordered vec (start <= last.start). The same guard is now applied to add_irregular_whitespace. This also fixes the latent duplicate for the pre-existing lookahead-based paths, and correctly keeps genuinely-distinct adjacent whitespaces (let<NBSP><NBSP>x still reports two).
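The guard can be sketched as follows, assuming a hypothetical `TriviaBuilder` that keeps spans sorted by `start` and a toy `skip_whitespace` that records every U+00A0 it passes (oxc's real lexer and trivia types differ). Scanning the same range twice, once for the peek and once for the committed re-lex, leaves exactly one span per NBSP.

```rust
/// Hypothetical half-open byte range, standing in for oxc's `Span`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    start: u32,
    end: u32,
}

/// Hypothetical trivia store; `irregular_whitespaces` stays sorted by `start`.
#[derive(Default)]
struct TriviaBuilder {
    irregular_whitespaces: Vec<Span>,
}

impl TriviaBuilder {
    fn add_irregular_whitespace(&mut self, start: u32, end: u32) {
        // A re-lex after a peek or rewind replays spans already recorded, so a
        // span whose start is not past the last recorded start is a duplicate.
        if let Some(last) = self.irregular_whitespaces.last() {
            if start <= last.start {
                return;
            }
        }
        self.irregular_whitespaces.push(Span { start, end });
    }
}

/// Toy lexer step: skip whitespace from byte `from`, recording each NBSP
/// (U+00A0) as irregular. Returns the byte offset of the next token.
fn skip_whitespace(trivia: &mut TriviaBuilder, src: &str, from: usize) -> usize {
    let mut pos = from;
    for ch in src[from..].chars() {
        if ch == '\u{a0}' {
            trivia.add_irregular_whitespace(pos as u32, (pos + ch.len_utf8()) as u32);
        } else if !ch.is_whitespace() {
            break;
        }
        pos += ch.len_utf8();
    }
    pos
}
```

Comparing against only the last entry is enough because spans are appended in source order: a replayed span always starts at or before the most recent one, while a genuinely new adjacent span (the second NBSP in `let<NBSP><NBSP>x`) starts strictly after it.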

Conformance

No AST change. Verified by panic-isolated, per-suite comparison against main:

  • estree (full AST + spans + token streams): byte-identical
  • transformer: byte-identical

Irregular-whitespace counts verified directly: let<NBSP>x = 1, for (let<NBSP>x of y){}, async<NBSP>function f(){} each report exactly one span (was two); let<NBSP><NBSP>x reports two.

github-actions Bot added the A-parser (Area - Parser) label Jun 7, 2026

chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d903c49f08


Comment thread crates/oxc_parser/src/js/declaration.rs

codspeed-hq Bot commented Jun 7, 2026


Merging this PR will not alter performance

✅ 57 untouched benchmarks
⏩ 9 skipped benchmarks[1]


Comparing perf/parser-peek-instead-of-checkpoint (1e0d3d0) with main (37169ff)[2]


Footnotes

  1. 9 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

  2. No successful run was found on main (dc0e174) during the generation of this report, so 37169ff was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Boshen force-pushed the perf/parser-peek-instead-of-checkpoint branch from 75ff726 to 1e0d3d0 on June 7, 2026 05:06
Boshen added the 0-merge (Merge with Graphite Merge Queue) label Jun 7, 2026

Boshen (Member, Author) commented Jun 7, 2026

Merge activity

graphite-app Bot force-pushed the perf/parser-peek-instead-of-checkpoint branch from 1e0d3d0 to 68805ac on June 7, 2026 05:18
graphite-app Bot merged commit 68805ac into main Jun 7, 2026
30 checks passed
graphite-app Bot removed the 0-merge (Merge with Graphite Merge Queue) label Jun 7, 2026
graphite-app Bot deleted the perf/parser-peek-instead-of-checkpoint branch June 7, 2026 05:21
graphite-app Bot pushed a commit that referenced this pull request Jun 7, 2026
)

## What

Follow-up to #23056. Replace the parser-level `lookahead` (checkpoint + bump + rewind) in the TS `asserts` type-predicate path with a single cached lexer-level `peek_token()`.

The check only inspects the one token after `asserts`:

```rust
// before
if self.lookahead(|parser| {
    parser.bump(Kind::Asserts);
    parser.is_token_identifier_or_keyword_on_same_line()
}) { ... }

// after
let next = self.lexer.peek_token();
if next.kind().is_identifier_name() && !next.is_on_new_line() { ... }
```

The now-unused `is_token_identifier_or_keyword_on_same_line` helper is removed.
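The new condition can be modeled in isolation. This is a sketch with a hypothetical `Token`/`Kind`: the accessor names mirror the snippet above, but the types do not come from oxc. `asserts` begins an assertion predicate only when the next token is an identifier name (keywords such as `this` included) on the same line.

```rust
// Hypothetical token model; oxc's real `Token`/`Kind` types differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind { Ident, This, Semicolon }

impl Kind {
    /// Identifiers and keywords both count as identifier names.
    fn is_identifier_name(self) -> bool {
        matches!(self, Kind::Ident | Kind::This)
    }
}

#[derive(Clone, Copy)]
struct Token { kind: Kind, on_new_line: bool }

impl Token {
    fn kind(&self) -> Kind { self.kind }
    fn is_on_new_line(&self) -> bool { self.on_new_line }
}

/// True for `asserts x` or `asserts this`; false when the next token is not
/// an identifier name or sits on a later line.
fn is_asserts_predicate(next: Token) -> bool {
    next.kind().is_identifier_name() && !next.is_on_new_line()
}
```

The check reads one token, which is why a cached `peek_token()` can replace the closure-based `lookahead`.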

## Why

`peek_token()` is a cached lexer-level re-lex, far cheaper than a parser-level `checkpoint` (which snapshots lexer state, the current token, the error position and the fatal-error slot). Same single-token migration as the four paths in #23056.

## Conformance

No AST change — estree (full AST + spans + token streams) byte-identical to `main`.
Boshen added a commit that referenced this pull request Jun 8, 2026
### 💥 BREAKING CHANGES

- ee4dc73 ast: [**BREAKING**] Add `#[non_exhaustive]` to AST nodes
(#23046) (overlookmotel)
- 4c35362 ast: [**BREAKING**] Add
`AstBuilder::template_element_escape_raw` and
`template_element_escape_raw_with_lone_surrogates` methods (#23047)
(overlookmotel)

### 🚀 Features

- b846ab2 react_compiler: Integrate the Rust port of the React Compiler
(#22942) (Boshen)
- 5b8dd68 parser: Report TS1255 for invalid class definite assertions
(#22917) (camc314)
- 85efabf semantic: Make building the class table optional, off by
default (#22862) (Boshen)

### 🐛 Bug Fixes

- 556acdc codegen: Parenthesize TS-cast assignment targets (#23112)
(Boshen)
- 37169ff codegen: Don't emit space between postfix `--` and `>` when
minifying (#23036) (Boshen)
- a4b1bf7 codegen: Drop redundant whitespace in minified TypeScript
output (#23038) (Boshen)
- cf53285 parser: Report reserved type-declaration names in the parser
(#23035) (Boshen)
- 4e44969 ast: Fix UB in `escape_template_element_raw` (#23052)
(overlookmotel)
- c543154 parser: Report comma operator in JSX expression in the parser
(#23030) (Boshen)
- 325c94f codegen: Tighten conditional-type and constructor-type
whitespace when minifying (#23033) (Boshen)
- 95dd3a2 parser: Report `import type` alias to a non-external reference
in the parser (#23032) (Boshen)
- 90180b8 codegen: Drop space after `:` in function return type when
minifying (#23028) (Boshen)
- 6da876e parser: Report `abstract` private class field in the parser
(#23029) (Boshen)
- 28467ce codegen: Don't emit space before a postfix update operand when
minifying (#23027) (Boshen)
- cb29926 codegen: Drop redundant space after `export default` when
minifying (#23024) (Boshen)
- 62965ae codegen: Drop redundant space after `else` when minifying
(#23025) (Boshen)
- 989230a parser: Report compound assignment to non-simple target in the
parser (#23022) (Boshen)
- 06f367c parser: Report `super.#field` private access in the parser
(#23014) (Boshen)
- 184edef codegen: Print space before `const`/`declare` enum modifier
(#23013) (Boshen)
- 4d722e0 parser: Report duplicate switch `default` clause in the parser
(#23012) (Boshen)
- 597ed85 codegen: Parenthesize `let`/`async` for-of head target
(#23008) (Boshen)
- 8b631bf codegen: Remove stray space before mapped type value colon
(#23010) (Boshen)
- c08407e codegen: Don't over-parenthesize `in` inside an arrow in a
for-init (#23009) (Boshen)
- 600cd6f codegen: Parenthesize lower-precedence
`TSInstantiationExpression` operand (#23007) (Boshen)
- 187e1a5 codegen: Don't leak space after comment-only JSX expression
container (#23006) (Boshen)
- 294c473 codegen: Don't over-parenthesize `TSTypeAssertion` operand
(#23004) (Boshen)
- 786d96f codegen: Give `TSTypeAssertion` unary precedence (#23002)
(Boshen)
- 1295882 parser: Report `new.target` and `import.meta` syntax errors in
the parser (#23003) (Boshen)
- d727b6b codegen: Parenthesize `await` expression as base of `**`
(#23001) (Boshen)
- 67dfa08 codegen: Keep parentheses around `new` callees containing a
call (#22997) (Boshen)
- 17e7cf3 parser: Disallow unerasable `as`/`satisfies` assertions
(#22986) (Boshen)
- beb46d3 parser: Commit to module goal on decorated exports (#22941)
(Boshen)
- 49e63f7 isolated-declarations: Require annotations for satisfies
initializers (#22898) (camc314)
- 8c93601 isolated-declarations: Allow unknown enum initializer in
non-const enum (#22900) (camc314)

### ⚡ Performance

- 7d89909 parser: Peek instead of lookahead for yield disambiguation
(#23071) (Boshen)
- bf872f0 parser: Skip arrow lookahead for a parenthesized literal
(#23070) (Boshen)
- d19fc54 parser: Guard type-argument speculation behind an angle-token
check (#23069) (Boshen)
- 8eb5507 parser: Skip redundant member-rest re-scan on call entry
(#23068) (Boshen)
- 883dfc1 parser: Skip parse_call_expression_rest when no call follows
(#23063) (Boshen)
- b171153 parser: Peek before the await-using lookahead (#23059)
(Boshen)
- 56f21bd parser: Use peek_token for the TS `asserts` type predicate
(#23058) (Boshen)
- 68805ac parser: Use peek_token instead of checkpoint/rewind for
single-token decisions (#23056) (Boshen)
- 1f9d8eb ast: `AstBuilder::template_element_escape_raw` avoid
allocation if no escape required (#23053) (overlookmotel)
- 502b04d semantic: Move cold function redeclaration handling into
`#[cold]` function (#22973) (overlookmotel)

### 📚 Documentation

- 275d318 napi/minifier: Point `target` to oxc docs (#23102) (camc314)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>

Labels

A-parser Area - Parser


1 participant