Skip to content

feat(es): expand swc_es_parser structured for-head and TS type parsing#11593

Merged
kdy1 merged 4 commits intomainfrom
kdy1/swc-es-parser-phase2-structured-nodes
Feb 27, 2026
Merged

feat(es): expand swc_es_parser structured for-head and TS type parsing#11593
kdy1 merged 4 commits intomainfrom
kdy1/swc-es-parser-phase2-structured-nodes

Conversation

@kdy1
Copy link
Copy Markdown
Member

@kdy1 kdy1 commented Feb 27, 2026

Summary

This PR continues the new swc_es_parser implementation by replacing placeholder parsing paths with structured AST construction for for heads and TypeScript types.

What changed

  • Added structured for head AST in swc_es_ast:
    • ForHead::{Classic, In, Of}
    • ForClassicHead, ForInit, ForInHead, ForOfHead, ForBinding
  • Parser now builds real for head nodes instead of skipping the entire header.
  • Expanded TypeScript type AST + parser support:
    • union/intersection/array/tuple/parenthesized/type-literal/function type nodes
    • type reference generic arguments (TsTypeRef.type_args)
    • type alias generic params (TsTypeAliasDecl.type_params)
  • Improved JSX name parsing:
    • identifier vs qualified name parsing
    • opening/closing tag mismatch diagnostics
  • Updated visitors to traverse new for head and TS type structures.
  • Extended parser tests and fixture harness rendering for the new nodes.

Validation

  • cargo fmt --all
  • cargo check -p swc_es_ast -p swc_es_visit -p swc_es_parser
  • cargo test -p swc_es_ast
  • cargo test -p swc_es_visit
  • cargo test -p swc_es_parser
  • cargo clippy -p swc_es_ast -p swc_es_visit -p swc_es_parser --all-targets -- -D warnings

Notes

  • Full workspace clippy (cargo clippy --all --all-targets -- -D warnings) still intermittently failed in this environment due network DNS resolution to static.crates.io.

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Feb 27, 2026

⚠️ No Changeset found

Latest commit: 2e2f8a0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@socket-security
Copy link
Copy Markdown

socket-security Bot commented Feb 27, 2026

No dependency changes detected. Learn more about Socket for GitHub.

👍 No dependency changes detected in pull request

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 27, 2026

Binary Sizes

File Size
swc.linux-x64-gnu.node 28M (28742088 bytes)

Commit: 24bdc8b

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Feb 27, 2026

Merging this PR will not alter performance

✅ 188 untouched benchmarks
🆕 2 new benchmarks

Performance Changes

Benchmark BASE HEAD Efficiency
🆕 es_parser/bootstrap/typescript N/A 43.8 µs N/A
🆕 es_parser/bootstrap/angular N/A 26.7 µs N/A

Comparing kdy1/swc-es-parser-phase2-structured-nodes (2e2f8a0) with main (80b4be8)

Open in CodSpeed

@kdy1 kdy1 marked this pull request as ready for review February 27, 2026 03:53
@kdy1 kdy1 requested a review from a team as a code owner February 27, 2026 03:53
Copilot AI review requested due to automatic review settings February 27, 2026 03:53
@kdy1 kdy1 enabled auto-merge (squash) February 27, 2026 03:53
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Feb 27, 2026

PR Review: feat(es): expand swc_es_parser structured for-head and TS type parsing

This is a substantial and well-structured phase-2 PR that continues building out the new swc_es_parser. The overall architecture (arena-backed AST, hand-written recursive descent parser, typed generational IDs) is sound and aligns well with the project's performance goals. Below is detailed feedback.


Positive Highlights

  • Clean arena-backed AST design: The AstStore + typed Id<T> approach is a good fit for a high-performance parser. The generational IDs prevent use-after-free bugs.
  • Well-documented code: Every AST node, struct field, and public API has doc comments.
  • Good test coverage for new features: The for head variants (classic/in/of), JSX qualified name parsing, tag mismatch diagnostics, and TS type parsing all have dedicated unit tests.
  • Fixture harness reuse: Reusing swc_ecma_parser test fixtures is a smart approach for incremental validation.
  • Feature-gated serde: The serde-impl and encoding-impl features keep the core AST lean.

Bugs / Correctness Issues

  1. ?? (nullish coalescing) mapped to LogicalOr (parser.rs, parse_logical_or_expr):

    TokenKind::Nullish => BinaryOp::LogicalOr,

    This maps ?? to BinaryOp::LogicalOr instead of a distinct nullish coalescing operator. The AST's BinaryOp enum is missing a NullishCoalescing variant. This will produce incorrect AST for a ?? b (it'll look identical to a || b).

  2. last_pos() returns self.cur.span.hi instead of the previous token's end (parser.rs:4837):

    fn last_pos(&self) -> swc_common::BytePos {
        self.cur.span.hi
    }

    After bump(), self.cur is the next token. So last_pos() should return the end position of the previous token, not the current one. This will cause all spans to extend to the end of the following token rather than the end of the actual node. You likely need a prev_hi: BytePos field updated on each bump().

  3. Slash token collides with regex literal / line comment recovery: In binary_or_assign for b'/', the lexer always treats / as an operator. When / starts a regex literal or appears after certain tokens (like return), this will mislex. This is a known hard problem and acceptable for a bootstrap parser, but worth a TODO comment.

  4. Block comment lexer skips non-ASCII bytes (lexer.rs, skip_space_and_comments):
    The block comment loop only matches cur_as_ascii(), so if a block comment contains non-ASCII characters (e.g., /* こんにちは */), the lexer will break out of the block comment loop on the first non-ASCII byte and leave the comment unterminated.

  5. BindingIdent is missing from Pat::Ident: The Pat enum uses Pat::Ident(BindingIdent) but BindingIdent (which appears to just be a type alias for Ident) doesn't carry type annotation info. For TS mode, parameter patterns like x: number won't be able to attach the type annotation. Worth noting as a known gap for the bootstrap.


Performance Considerations

  1. ts_paren_followed_by_arrow clones the entire Lexer (parser.rs:4501):

    let mut lexer = self.lexer.clone();

    This clones the lexer (including its Vec<Error> and StringInput) for every ( in a TS type context. This could be expensive for large files. Consider using the LexerCheckpoint save/restore mechanism instead, which is already implemented and designed for this purpose.

  2. store.expr(id).cloned() in hot paths (parser.rs, multiple locations like for_init_to_binding, expr_to_assign_pat, parse_decl):
    Cloning AST nodes from the store to match on them then immediately dropping the clone is wasteful. Consider matching on store.expr(id) references directly. Some nodes contain Vec fields making the clone non-trivial.

  3. Lexer is_ident_start / is_ident_continue use char::is_alphabetic() / is_alphanumeric(): These are correct for Unicode but are slow compared to a lookup table. For a parser targeting high performance, consider a fast-path table for ASCII and only falling back to Unicode checks for non-ASCII.


Code Quality / Design

  1. #[allow(unused_imports)] in operator.rs:

    #[allow(unused_imports)]
    pub use crate::expr::{AssignOp, BinaryOp, UnaryOp};

    If these re-exports are unused, they should either be used or removed rather than suppressed. If they're for public API convenience, remove the allow.

  2. AstStore accessor pattern is repetitive (store.rs): Each node type requires 3 methods (alloc_*, *, *_mut) written manually. This is a good candidate for a macro to reduce boilerplate and prevent copy-paste errors as more node types are added.

  3. parse_decl allocates a stmt then immediately destructures it (parser.rs:2859-2918):
    The parse_decl method calls parse_var_decl_stmt() which allocates a Stmt in the arena, then immediately reads it back and extracts the inner DeclId. This wastes an arena slot. Consider adding a parse_var_decl helper that returns a DeclId directly, and have parse_var_decl_stmt be a thin wrapper around it.

  4. lib.rs module-level doc comment is in Korean (swc_es_parser/src/lib.rs:2478):

    //! `swc_es_ast` 기반 ECMAScript parser.
    

    Per CLAUDE.md: "Write comments and documentations in English."

  5. Missing bitwise operators: BinaryOp has Amp, Pipe, Caret in the token kinds but the BinaryOp AST enum lacks BitAnd, BitOr, BitXor, LeftShift, RightShift, UnsignedRightShift. The parser doesn't have a bitwise precedence level either. This means expressions like a & b won't parse correctly.

  6. Missing Slash in TokenKind display for error messages: Consider implementing Display for TokenKind to improve error messages. Currently format!("{:?}", self.cur.kind) produces debug output like Keyword(Let) rather than user-friendly 'let'.


Security

  1. unsafe usage in lexer (lexer.rs:2156, 2261, 2354, 2364): The unsafe blocks for slice_str and bump_bytes appear sound as they rely on StringInput's invariants, but each should have a // SAFETY: comment explaining why the preconditions are met, per Rust best practices.

Test Coverage Gaps

  1. No tests for using / await using declarations: The parser handles explicit resource management syntax but there are no unit tests for it.
  2. No tests for nested JSX elements: Only self-closing and simple opening/closing tags are tested.
  3. No tests for for await (... of ...) syntax: ForOfHead.is_await is set but never tested.
  4. No tests for error recovery: The recover_stmt method is untested. A test that verifies the parser can recover from a syntax error and continue parsing subsequent statements would be valuable.

Summary

This is solid foundational work for the new parser. The arena-backed architecture is well-suited for performance, and the incremental approach of expanding coverage across multiple PRs is sensible. The main items to address before merging are:

  • P0 (correctness): Fix ??LogicalOr mapping, fix last_pos() to track previous token end
  • P1 (correctness): Fix block comment lexer for non-ASCII content
  • P1 (performance): Replace lexer clone with checkpoint in ts_paren_followed_by_arrow
  • P2 (quality): Translate Korean doc comment, add // SAFETY: comments on unsafe blocks

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e2f8a0090

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +980 to +981
TokenKind::OrOr => BinaryOp::LogicalOr,
TokenKind::Nullish => BinaryOp::LogicalOr,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve nullish coalescing operator kind

In parse_logical_or_expr, TokenKind::Nullish is currently lowered to BinaryOp::LogicalOr, which makes a ?? b indistinguishable from a || b in the AST. Downstream transforms/codegen that depend on operator identity will therefore change behavior for valid nullish cases (for example 0 ?? 1 should stay 0, but 0 || 1 is 1).

Useful? React with 👍 / 👎.

Comment on lines +458 to +460
if matches!(self.cur.kind, TokenKind::PlusPlus | TokenKind::MinusMinus) {
self.bump();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep postfix update operators in for-head AST

The classic for update parser consumes ++/-- but stores only the pre-update expression in ForClassicHead.update. For for (;; i++) this records the same AST as for (;; i), dropping the increment side effect and causing incorrect behavior for any consumer that re-emits or evaluates the parsed tree.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR continues bootstrapping swc_es_parser by expanding swc_es_ast with structured for heads and TypeScript type nodes, updating the parser to build those nodes (plus improved JSX name handling), and adding visitors + fixture-based test coverage to exercise the new structures.

Changes:

  • Added new swc_es_ast node groups (module decls, JSX, classes/functions, for heads, TS types) and corresponding arena storage/IDs.
  • Implemented parsing for structured for statement heads, richer TS type syntax, and improved JSX tag name parsing/mismatch diagnostics.
  • Added swc_es_visit visitor/walker crate and extended parser tests + fixture snapshot harness.

Reviewed changes

Copilot reviewed 37 out of 38 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
docs/swc-es-parser-design.md Adds a short design note describing AST/error model and parser strategy for the bootstrap parser.
crates/swc_es_visit/Cargo.toml Introduces the new swc_es_visit crate package metadata and dependencies.
crates/swc_es_visit/src/lib.rs Adds visitor traits and default AST walkers, including traversal for new for heads and TS types.
crates/swc_es_visit/tests/visit.rs Adds a basic traversal test to validate visitor walking/counting behavior.
crates/swc_es_parser/Cargo.toml Adds the swc_es_parser crate manifest, features, dev-deps, and bench configuration.
crates/swc_es_parser/README.md Documents parser goals, current status, fixture harness usage, and API sketch.
crates/swc_es_parser/src/lib.rs Exposes the parser public API and parse_file_as_* helpers that collect recoverable errors.
crates/swc_es_parser/src/context.rs Adds bitflags-based parser context tracking.
crates/swc_es_parser/src/error.rs Adds a crate-local error model with conversion to swc_common diagnostics.
crates/swc_es_parser/src/token.rs Defines token kinds, keyword set, and token payload representation.
crates/swc_es_parser/src/syntax.rs Defines Syntax, EsSyntax, and TsSyntax options (serde-friendly).
crates/swc_es_parser/src/lexer.rs Implements the bootstrap lexer producing the new token model.
crates/swc_es_parser/src/parser.rs Implements the bootstrap recursive-descent parser, including for heads, TS type parsing, and JSX parsing.
crates/swc_es_parser/tests/smoke.rs Adds smoke tests that reuse upstream fixtures and sanity-check core APIs.
crates/swc_es_parser/tests/fixture_harness.rs Adds snapshot-based fixture harness and rendering for new AST node shapes.
crates/swc_es_parser/examples/parse.rs Adds a small example program showing how to parse and inspect the AST.
crates/swc_es_parser/benches/parser.rs Adds criterion benchmarks for parsing representative ES/TS sources.
crates/swc_es_ast/Cargo.toml Adds serde-impl / encoding-impl feature gates and dev-dependency for serde tests.
crates/swc_es_ast/src/lib.rs Re-exports new AST modules/types and adds new typed IDs (module decls, JSX, TS types, etc.).
crates/swc_es_ast/src/store.rs Extends AstStore with arenas/allocators/accessors for new node kinds.
crates/swc_es_ast/src/stmt.rs Adds Stmt::For, Stmt::ModuleDecl, and structured ForHead / binding types.
crates/swc_es_ast/src/decl.rs Adds Decl::TsTypeAlias and associated node struct.
crates/swc_es_ast/src/expr.rs Adds expressions for arrays/objects/classes/functions/JSX/TS-as and extends BinaryOp.
crates/swc_es_ast/src/pat.rs Adds Pat::Expr and serde gating to pattern nodes.
crates/swc_es_ast/src/program.rs Adds serde gating to program nodes.
crates/swc_es_ast/src/lit.rs Adds serde gating to literal nodes.
crates/swc_es_ast/src/prop.rs Adds property-name and key-value prop node definitions used by object/class parsing.
crates/swc_es_ast/src/module_decl.rs Adds module declaration node definitions used by module parsing.
crates/swc_es_ast/src/jsx.rs Adds JSX node definitions used by JSX parsing.
crates/swc_es_ast/src/function.rs Adds Function and Param node definitions used by function expressions and visitors.
crates/swc_es_ast/src/class.rs Adds class/class-member node definitions used by class parsing and visitors.
crates/swc_es_ast/src/typescript.rs Adds structured TS type nodes used by TS parsing and visitors.
crates/swc_es_ast/src/operator.rs Adds UpdateOp plus operator re-exports.
crates/swc_es_ast/src/ident.rs Adds serde gating to identifiers.
crates/swc_es_ast/tests/serde.rs Adds a serde roundtrip test for store IDs and atoms under serde-impl.
crates/swc_arena/Cargo.toml Adds a serde-impl feature gate for arena ID serialization support.
crates/swc_arena/src/lib.rs Implements Serialize/Deserialize for Id<T> behind serde-impl.
Cargo.lock Updates lockfile for newly introduced crates/features/dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +454 to +460
let update = if self.cur.kind == TokenKind::RParen {
None
} else {
let expr = self.parse_expr()?;
if matches!(self.cur.kind, TokenKind::PlusPlus | TokenKind::MinusMinus) {
self.bump();
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the for classic head update clause, postfix ++/-- is being consumed and then discarded (i++ becomes just Ident(i) in the AST). This makes the produced AST incorrect for common for loops. Consider adding an UpdateExpr node (the AST already defines UpdateOp) and parsing postfix update operators in parse_postfix_expr / here instead of dropping them.

Copilot uses AI. Check for mistakes.
Comment on lines +504 to +508
let expr = self.parse_expr()?;
if matches!(self.cur.kind, TokenKind::PlusPlus | TokenKind::MinusMinus) {
self.bump();
}
Some(expr)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above for the update clause when an initializer is present: i++/i-- tokens are consumed but not represented in the AST, producing an incorrect tree. This should be represented as an update expression node rather than dropped.

Suggested change
let expr = self.parse_expr()?;
if matches!(self.cur.kind, TokenKind::PlusPlus | TokenKind::MinusMinus) {
self.bump();
}
Some(expr)
Some(self.parse_expr()?)

Copilot uses AI. Check for mistakes.
Comment on lines +2140 to +2143
matches!(
self.cur.kind,
TokenKind::Keyword(Keyword::Import | Keyword::Export)
)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_module_start treats any import token as the start of a module declaration. This will misclassify expression forms like import('x') and import.meta as module items, causing incorrect parsing and program kind detection. This check should look ahead to distinguish import declarations from import(...) / import.meta and only treat the declaration forms as module starts.

Suggested change
matches!(
self.cur.kind,
TokenKind::Keyword(Keyword::Import | Keyword::Export)
)
match self.cur.kind {
// `export` always starts a module declaration.
TokenKind::Keyword(Keyword::Export) => true,
// `import` only starts a module declaration if it is *not* followed by
// `(` (dynamic import) or `.` (`import.meta`), which are expression forms.
TokenKind::Keyword(Keyword::Import) => {
let next = self.peek();
!matches!(next.kind, TokenKind::LParen | TokenKind::Dot)
}
_ => false,
}

Copilot uses AI. Check for mistakes.
Comment on lines +2289 to +2291
fn last_pos(&self) -> swc_common::BytePos {
self.cur.span.hi
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last_pos() returns self.cur.span.hi, but self.cur is the next unconsumed token after most parsing routines (because expect()/bump() advance). As a result, many node spans will extend past the actual end of the node into the following token. Consider tracking the end position of the last consumed token (e.g. storing prev_span_hi on bump) and using that for span construction.

Copilot uses AI. Check for mistakes.
let start = self.cur.span.lo;
let op = match self.cur.kind {
TokenKind::OrOr => BinaryOp::LogicalOr,
TokenKind::Nullish => BinaryOp::LogicalOr,
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?? tokens are being parsed as BinaryOp::LogicalOr, which loses the nullish-coalescing operator semantics. This will produce an incorrect AST for expressions like a ?? b (and also makes it impossible to distinguish || vs ?? downstream). Add a dedicated binary op (e.g. NullishCoalescing) and map TokenKind::Nullish to it, with the appropriate precedence rules.

Suggested change
TokenKind::Nullish => BinaryOp::LogicalOr,
TokenKind::Nullish => BinaryOp::NullishCoalescing,

Copilot uses AI. Check for mistakes.
TokenKind::MinusEq => Some(AssignOp::SubAssign),
TokenKind::StarEq => Some(AssignOp::MulAssign),
TokenKind::SlashEq => Some(AssignOp::DivAssign),
TokenKind::PercentEq => Some(AssignOp::ModAssign),
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lexer produces many assignment-like tokens (e.g. &=, |=, ^=, &&=, ||=, ??=), but parse_assignment_expr only recognizes =, +=, -=, *=, /=, %=. Any of the other tokens will currently be left unconsumed and likely break parsing. Either handle the remaining assignment operators here or remove/feature-gate their tokenization until supported.

Suggested change
TokenKind::PercentEq => Some(AssignOp::ModAssign),
TokenKind::PercentEq => Some(AssignOp::ModAssign),
TokenKind::AmpEq => Some(AssignOp::BitAndAssign),
TokenKind::PipeEq => Some(AssignOp::BitOrAssign),
TokenKind::CaretEq => Some(AssignOp::BitXorAssign),
TokenKind::AmpAmpEq => Some(AssignOp::LogicalAndAssign),
TokenKind::PipePipeEq => Some(AssignOp::LogicalOrAssign),
TokenKind::QuestionQuestionEq => Some(AssignOp::NullishAssign),

Copilot uses AI. Check for mistakes.
Comment on lines +2055 to +2077
fn parse_ts_type_name(&mut self) -> Ident {
let start = self.cur.span.lo;
let mut name = String::new();

loop {
let Some(segment) = self.cur_name_atom() else {
break;
};
if !name.is_empty() {
name.push('.');
}
name.push_str(segment.as_ref());
self.bump();

if self.cur.kind != TokenKind::Dot {
break;
}
self.bump();
}

if name.is_empty() {
name.push('_');
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_ts_type_name silently falls back to "_" when it can't read any segments. In cases where the current token is TokenKind::Ident without a payload (e.g. from lexer error recovery), cur_name_atom() returns None and this function will not consume any input, which can lead to non-progress / infinite loops in type parsing. Consider returning PResult<Ident> and producing a recoverable error while consuming at least one token to guarantee progress.

Copilot uses AI. Check for mistakes.
@kdy1 kdy1 merged commit f11fd70 into main Feb 27, 2026
203 checks passed
@kdy1 kdy1 deleted the kdy1/swc-es-parser-phase2-structured-nodes branch February 27, 2026 04:01
@github-actions github-actions Bot modified the milestones: Planned, 1.15.18 Feb 27, 2026
@swc-project swc-project locked as resolved and limited conversation to collaborators Mar 31, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants