feat(es): expand swc_es_parser structured for-head and TS type parsing#11593
feat(es): expand swc_es_parser structured for-head and TS type parsing#11593
Conversation
|
|
No dependency changes detected. Learn more about Socket for GitHub. 👍 No dependency changes detected in pull request |
Binary Sizes
Commit: 24bdc8b |
Merging this PR will not alter performance
Performance Changes
Comparing |
PR Review:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2e2f8a0090
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| TokenKind::OrOr => BinaryOp::LogicalOr, | ||
| TokenKind::Nullish => BinaryOp::LogicalOr, |
There was a problem hiding this comment.
Preserve nullish coalescing operator kind
In parse_logical_or_expr, TokenKind::Nullish is currently lowered to BinaryOp::LogicalOr, which makes a ?? b indistinguishable from a || b in the AST. Downstream transforms/codegen that depend on operator identity will therefore change behavior for valid nullish cases (for example 0 ?? 1 should stay 0, but 0 || 1 is 1).
Useful? React with 👍 / 👎.
| if matches!(self.cur.kind, TokenKind::PlusPlus | TokenKind::MinusMinus) { | ||
| self.bump(); | ||
| } |
There was a problem hiding this comment.
Keep postfix update operators in for-head AST
The classic for update parser consumes ++/-- but stores only the pre-update expression in ForClassicHead.update. For for (;; i++) this records the same AST as for (;; i), dropping the increment side effect and causing incorrect behavior for any consumer that re-emits or evaluates the parsed tree.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR continues bootstrapping swc_es_parser by expanding swc_es_ast with structured for heads and TypeScript type nodes, updating the parser to build those nodes (plus improved JSX name handling), and adding visitors + fixture-based test coverage to exercise the new structures.
Changes:
- Added new
swc_es_astnode groups (module decls, JSX, classes/functions,forheads, TS types) and corresponding arena storage/IDs. - Implemented parsing for structured
forstatement heads, richer TS type syntax, and improved JSX tag name parsing/mismatch diagnostics. - Added
swc_es_visitvisitor/walker crate and extended parser tests + fixture snapshot harness.
Reviewed changes
Copilot reviewed 37 out of 38 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/swc-es-parser-design.md | Adds a short design note describing AST/error model and parser strategy for the bootstrap parser. |
| crates/swc_es_visit/Cargo.toml | Introduces the new swc_es_visit crate package metadata and dependencies. |
| crates/swc_es_visit/src/lib.rs | Adds visitor traits and default AST walkers, including traversal for new for heads and TS types. |
| crates/swc_es_visit/tests/visit.rs | Adds a basic traversal test to validate visitor walking/counting behavior. |
| crates/swc_es_parser/Cargo.toml | Adds the swc_es_parser crate manifest, features, dev-deps, and bench configuration. |
| crates/swc_es_parser/README.md | Documents parser goals, current status, fixture harness usage, and API sketch. |
| crates/swc_es_parser/src/lib.rs | Exposes the parser public API and parse_file_as_* helpers that collect recoverable errors. |
| crates/swc_es_parser/src/context.rs | Adds bitflags-based parser context tracking. |
| crates/swc_es_parser/src/error.rs | Adds a crate-local error model with conversion to swc_common diagnostics. |
| crates/swc_es_parser/src/token.rs | Defines token kinds, keyword set, and token payload representation. |
| crates/swc_es_parser/src/syntax.rs | Defines Syntax, EsSyntax, and TsSyntax options (serde-friendly). |
| crates/swc_es_parser/src/lexer.rs | Implements the bootstrap lexer producing the new token model. |
| crates/swc_es_parser/src/parser.rs | Implements the bootstrap recursive-descent parser, including for heads, TS type parsing, and JSX parsing. |
| crates/swc_es_parser/tests/smoke.rs | Adds smoke tests that reuse upstream fixtures and sanity-check core APIs. |
| crates/swc_es_parser/tests/fixture_harness.rs | Adds snapshot-based fixture harness and rendering for new AST node shapes. |
| crates/swc_es_parser/examples/parse.rs | Adds a small example program showing how to parse and inspect the AST. |
| crates/swc_es_parser/benches/parser.rs | Adds criterion benchmarks for parsing representative ES/TS sources. |
| crates/swc_es_ast/Cargo.toml | Adds serde-impl / encoding-impl feature gates and dev-dependency for serde tests. |
| crates/swc_es_ast/src/lib.rs | Re-exports new AST modules/types and adds new typed IDs (module decls, JSX, TS types, etc.). |
| crates/swc_es_ast/src/store.rs | Extends AstStore with arenas/allocators/accessors for new node kinds. |
| crates/swc_es_ast/src/stmt.rs | Adds Stmt::For, Stmt::ModuleDecl, and structured ForHead / binding types. |
| crates/swc_es_ast/src/decl.rs | Adds Decl::TsTypeAlias and associated node struct. |
| crates/swc_es_ast/src/expr.rs | Adds expressions for arrays/objects/classes/functions/JSX/TS-as and extends BinaryOp. |
| crates/swc_es_ast/src/pat.rs | Adds Pat::Expr and serde gating to pattern nodes. |
| crates/swc_es_ast/src/program.rs | Adds serde gating to program nodes. |
| crates/swc_es_ast/src/lit.rs | Adds serde gating to literal nodes. |
| crates/swc_es_ast/src/prop.rs | Adds property-name and key-value prop node definitions used by object/class parsing. |
| crates/swc_es_ast/src/module_decl.rs | Adds module declaration node definitions used by module parsing. |
| crates/swc_es_ast/src/jsx.rs | Adds JSX node definitions used by JSX parsing. |
| crates/swc_es_ast/src/function.rs | Adds Function and Param node definitions used by function expressions and visitors. |
| crates/swc_es_ast/src/class.rs | Adds class/class-member node definitions used by class parsing and visitors. |
| crates/swc_es_ast/src/typescript.rs | Adds structured TS type nodes used by TS parsing and visitors. |
| crates/swc_es_ast/src/operator.rs | Adds UpdateOp plus operator re-exports. |
| crates/swc_es_ast/src/ident.rs | Adds serde gating to identifiers. |
| crates/swc_es_ast/tests/serde.rs | Adds a serde roundtrip test for store IDs and atoms under serde-impl. |
| crates/swc_arena/Cargo.toml | Adds a serde-impl feature gate for arena ID serialization support. |
| crates/swc_arena/src/lib.rs | Implements Serialize/Deserialize for Id<T> behind serde-impl. |
| Cargo.lock | Updates lockfile for newly introduced crates/features/dependencies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let update = if self.cur.kind == TokenKind::RParen { | ||
| None | ||
| } else { | ||
| let expr = self.parse_expr()?; | ||
| if matches!(self.cur.kind, TokenKind::PlusPlus | TokenKind::MinusMinus) { | ||
| self.bump(); | ||
| } |
There was a problem hiding this comment.
In the for classic head update clause, postfix ++/-- is being consumed and then discarded (i++ becomes just Ident(i) in the AST). This makes the produced AST incorrect for common for loops. Consider adding an UpdateExpr node (the AST already defines UpdateOp) and parsing postfix update operators in parse_postfix_expr / here instead of dropping them.
| let expr = self.parse_expr()?; | ||
| if matches!(self.cur.kind, TokenKind::PlusPlus | TokenKind::MinusMinus) { | ||
| self.bump(); | ||
| } | ||
| Some(expr) |
There was a problem hiding this comment.
Same issue as above for the update clause when an initializer is present: i++/i-- tokens are consumed but not represented in the AST, producing an incorrect tree. This should be represented as an update expression node rather than dropped.
| let expr = self.parse_expr()?; | |
| if matches!(self.cur.kind, TokenKind::PlusPlus | TokenKind::MinusMinus) { | |
| self.bump(); | |
| } | |
| Some(expr) | |
| Some(self.parse_expr()?) |
| matches!( | ||
| self.cur.kind, | ||
| TokenKind::Keyword(Keyword::Import | Keyword::Export) | ||
| ) |
There was a problem hiding this comment.
is_module_start treats any import token as the start of a module declaration. This will misclassify expression forms like import('x') and import.meta as module items, causing incorrect parsing and program kind detection. This check should look ahead to distinguish import declarations from import(...) / import.meta and only treat the declaration forms as module starts.
| matches!( | |
| self.cur.kind, | |
| TokenKind::Keyword(Keyword::Import | Keyword::Export) | |
| ) | |
| match self.cur.kind { | |
| // `export` always starts a module declaration. | |
| TokenKind::Keyword(Keyword::Export) => true, | |
| // `import` only starts a module declaration if it is *not* followed by | |
| // `(` (dynamic import) or `.` (`import.meta`), which are expression forms. | |
| TokenKind::Keyword(Keyword::Import) => { | |
| let next = self.peek(); | |
| !matches!(next.kind, TokenKind::LParen | TokenKind::Dot) | |
| } | |
| _ => false, | |
| } |
| fn last_pos(&self) -> swc_common::BytePos { | ||
| self.cur.span.hi | ||
| } |
There was a problem hiding this comment.
last_pos() returns self.cur.span.hi, but self.cur is the next unconsumed token after most parsing routines (because expect()/bump() advance). As a result, many node spans will extend past the actual end of the node into the following token. Consider tracking the end position of the last consumed token (e.g. storing prev_span_hi on bump) and using that for span construction.
| let start = self.cur.span.lo; | ||
| let op = match self.cur.kind { | ||
| TokenKind::OrOr => BinaryOp::LogicalOr, | ||
| TokenKind::Nullish => BinaryOp::LogicalOr, |
There was a problem hiding this comment.
?? tokens are being parsed as BinaryOp::LogicalOr, which loses the nullish-coalescing operator semantics. This will produce an incorrect AST for expressions like a ?? b (and also makes it impossible to distinguish || vs ?? downstream). Add a dedicated binary op (e.g. NullishCoalescing) and map TokenKind::Nullish to it, with the appropriate precedence rules.
| TokenKind::Nullish => BinaryOp::LogicalOr, | |
| TokenKind::Nullish => BinaryOp::NullishCoalescing, |
| TokenKind::MinusEq => Some(AssignOp::SubAssign), | ||
| TokenKind::StarEq => Some(AssignOp::MulAssign), | ||
| TokenKind::SlashEq => Some(AssignOp::DivAssign), | ||
| TokenKind::PercentEq => Some(AssignOp::ModAssign), |
There was a problem hiding this comment.
The lexer produces many assignment-like tokens (e.g. &=, |=, ^=, &&=, ||=, ??=), but parse_assignment_expr only recognizes =, +=, -=, *=, /=, %=. Any of the other tokens will currently be left unconsumed and likely break parsing. Either handle the remaining assignment operators here or remove/feature-gate their tokenization until supported.
| TokenKind::PercentEq => Some(AssignOp::ModAssign), | |
| TokenKind::PercentEq => Some(AssignOp::ModAssign), | |
| TokenKind::AmpEq => Some(AssignOp::BitAndAssign), | |
| TokenKind::PipeEq => Some(AssignOp::BitOrAssign), | |
| TokenKind::CaretEq => Some(AssignOp::BitXorAssign), | |
| TokenKind::AmpAmpEq => Some(AssignOp::LogicalAndAssign), | |
| TokenKind::PipePipeEq => Some(AssignOp::LogicalOrAssign), | |
| TokenKind::QuestionQuestionEq => Some(AssignOp::NullishAssign), |
| fn parse_ts_type_name(&mut self) -> Ident { | ||
| let start = self.cur.span.lo; | ||
| let mut name = String::new(); | ||
|
|
||
| loop { | ||
| let Some(segment) = self.cur_name_atom() else { | ||
| break; | ||
| }; | ||
| if !name.is_empty() { | ||
| name.push('.'); | ||
| } | ||
| name.push_str(segment.as_ref()); | ||
| self.bump(); | ||
|
|
||
| if self.cur.kind != TokenKind::Dot { | ||
| break; | ||
| } | ||
| self.bump(); | ||
| } | ||
|
|
||
| if name.is_empty() { | ||
| name.push('_'); | ||
| } |
There was a problem hiding this comment.
parse_ts_type_name silently falls back to "_" when it can't read any segments. In cases where the current token is TokenKind::Ident without a payload (e.g. from lexer error recovery), cur_name_atom() returns None and this function will not consume any input, which can lead to non-progress / infinite loops in type parsing. Consider returning PResult<Ident> and producing a recoverable error while consuming at least one token to guarantee progress.
Summary
This PR continues the new
swc_es_parserimplementation by replacing placeholder parsing paths with structured AST construction forforheads and TypeScript types.What changed
forhead AST inswc_es_ast:ForHead::{Classic, In, Of}ForClassicHead,ForInit,ForInHead,ForOfHead,ForBindingforhead nodes instead of skipping the entire header.TsTypeRef.type_args)TsTypeAliasDecl.type_params)forhead and TS type structures.Validation
cargo fmt --allcargo check -p swc_es_ast -p swc_es_visit -p swc_es_parsercargo test -p swc_es_astcargo test -p swc_es_visitcargo test -p swc_es_parsercargo clippy -p swc_es_ast -p swc_es_visit -p swc_es_parser --all-targets -- -D warningsNotes
cargo clippy --all --all-targets -- -D warnings) still intermittently failed in this environment due network DNS resolution tostatic.crates.io.