Skip to content

perf(swc_ecma_parser): reduce JSX identifier rescan allocations#11671

Merged
kdy1 merged 2 commits intomainfrom
kdy1/fix-11650-jsx-rescan-alloc
Mar 12, 2026
Merged

perf(swc_ecma_parser): reduce JSX identifier rescan allocations#11671
kdy1 merged 2 commits intomainfrom
kdy1/fix-11650-jsx-rescan-alloc

Conversation

@kdy1
Copy link
Copy Markdown
Member

@kdy1 kdy1 commented Mar 12, 2026

Summary

  • reduce allocations in JSX identifier rescanning by removing format!/to_string() assembly in scan_jsx_identifier
  • refactor identifier-part scanning to append into a caller-owned buffer via scan_identifier_parts_into
  • add an ASCII run fast path for identifier-part scanning to append contiguous ASCII identifier bytes in one slice operation
  • preserve unicode escape and non-ASCII behavior while avoiding extra intermediate String allocations

Tests

  • git submodule update --init --recursive
  • UPDATE=1 cargo test -p swc_ecma_parser --features verify --test jsx error -- --ignored
  • cargo test -p swc_ecma_parser --features verify --test jsx error -- --ignored
  • cargo test -p swc_ecma_parser
  • cargo fmt --all
  • cargo clippy --all --all-targets -- -D warnings

Closes #11650

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Mar 12, 2026

⚠️ No Changeset found

Latest commit: 9555c5b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 12, 2026

Binary Sizes

File Size
swc.linux-x64-gnu.node 28M (28754632 bytes)

Commit: 98bf428

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Mar 12, 2026

Merging this PR will not alter performance

✅ 218 untouched benchmarks


Comparing kdy1/fix-11650-jsx-rescan-alloc (9555c5b) with main (f62bfa9)

Open in CodSpeed

@kdy1 kdy1 marked this pull request as ready for review March 12, 2026 01:06
@kdy1 kdy1 requested a review from a team as a code owner March 12, 2026 01:06
Copilot AI review requested due to automatic review settings March 12, 2026 01:06
@claude

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Performance optimization for JSX identifier rescanning that reduces String allocations by appending into a caller-owned buffer and adding an ASCII fast path for contiguous identifier bytes.

Changes:

  • Refactored scan_identifier_partsscan_identifier_parts_into to append into a caller-provided String buffer instead of allocating a new one
  • Replaced format!/to_string() assembly in scan_jsx_identifier with direct input_slice_str + push_str to avoid intermediate allocations
  • Added ASCII run fast path that batches contiguous ASCII identifier bytes into a single slice append

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
crates/swc_ecma_parser/src/lexer/state.rs Core refactor: scan_identifier_parts_into, allocation-free prefix building, ASCII fast path
crates/swc_ecma_parser/tests/jsx/errors/issue-11650/input.js New test input with unicode escape in JSX identifier
crates/swc_ecma_parser/tests/jsx/errors/issue-11650/input.js.swc-stderr Expected error output for the new test

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +280 to +300
let mut value = String::new();
if token.is_known_ident() || token.is_keyword() {
let prefix = unsafe {
// Safety: start and end are valid position because we got them from
// `self.input`
self.input_slice_str(start, prefix_end)
};
value.reserve(prefix.len() + v.len());
value.push_str(prefix);
} else if let Some(TokenValue::Word(prefix)) = self.state.token_value.take() {
value.reserve(prefix.len() + v.len());
value.push_str(&prefix);
} else {
format!("{token}{v}")
};
self.atom(v)
let prefix = unsafe {
// Safety: start and end are valid position because we got them from
// `self.input`
self.input_slice_str(start, prefix_end)
};
value.reserve(prefix.len() + v.len());
value.push_str(prefix);
}
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_known_ident() || is_keyword() branch (lines 282-288) and the else fallback branch (lines 293-299) contain identical code for building the prefix via input_slice_str. Consider extracting this into a single code path to reduce duplication — for example, merging them with a single if that handles both cases, falling through to the TokenValue::Word case only when applicable.

Copilot uses AI. Check for mistakes.
@kdy1 kdy1 enabled auto-merge (squash) March 12, 2026 01:24
@kdy1 kdy1 disabled auto-merge March 12, 2026 01:25
@kdy1 kdy1 merged commit f9214fe into main Mar 12, 2026
19 checks passed
@kdy1 kdy1 deleted the kdy1/fix-11650-jsx-rescan-alloc branch March 12, 2026 01:25
@github-actions github-actions Bot added this to the Planned milestone Mar 12, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 12, 2026

PR Review: perf(swc_ecma_parser): reduce JSX identifier rescan allocations

Summary

Good performance optimization that eliminates intermediate String allocations in JSX identifier rescanning by:

  1. Refactoring scan_identifier_partsscan_identifier_parts_into to append into a caller-owned buffer
  2. Adding an ASCII fast path that collects contiguous ASCII identifier bytes in one slice operation
  3. Replacing format!/to_string() with direct input_slice_str lookups for the prefix

Bug Fix (nice catch)

The removal of self.bump(1); // bump 'u' before self.read_unicode_escape() at line 602 is a correctness fix, not just a refactor. read_unicode_escape() already bumps u internally (line 1464 of mod.rs), so the old scan_identifier_parts was double-bumping past u, which would have caused incorrect parsing of \u escapes in identifier parts. The new test case <if-\u{2F804}> exercises this path and correctly expects an error. Good to have this covered.

Code Quality

  • The unsafe blocks for input_slice_str have appropriate safety comments and the invariants are sound — start and prefix_end/self.cur_pos() are all obtained from self.input.
  • The ASCII fast path inner loop is clean — check byte, bump if ASCII ident part, then slice the whole chunk at once.

Minor Observations

  1. Duplicated branches in scan_jsx_identifier (lines 281-299): The is_known_ident() || is_keyword() branch and the final else branch have identical bodies — both do input_slice_str(start, prefix_end). Consider extracting the common path:

    let v = if !v.is_empty() {
        let mut value = String::new();
        if let Some(TokenValue::Word(prefix)) = self.state.token_value.take() {
            value.reserve(prefix.len() + v.len());
            value.push_str(&prefix);
        } else {
            let prefix = unsafe { self.input_slice_str(start, prefix_end) };
            value.reserve(prefix.len() + v.len());
            value.push_str(prefix);
        }
        value.push_str(&v);
        self.atom(value)
    } ...

    This would eliminate the duplicated unsafe block. Though this is minor and the current form mirrors the existing pattern, so up to you.

  2. String::new() followed by reserve() (line 280): String::with_capacity(prefix.len() + v.len()) would be slightly cleaner, but this would require computing the prefix first. Current approach is fine.

Performance

The optimization is well-targeted: hot-path JSX identifier scanning no longer allocates intermediate Strings or uses format!. The ASCII fast path avoids per-byte push calls in favor of a single push_str of a slice. This should be a measurable improvement for JSX-heavy codebases.

Test Coverage

  • New fixture test for the unicode escape bug fix (issue-11650).
  • The error output .swc-stderr file is included.
  • Existing parser tests provide regression coverage.

Verdict

LGTM. Clean performance optimization with a nice correctness fix for the \u double-bump bug. The duplicated branches are a minor nit.

@github-actions github-actions Bot modified the milestones: Planned, 1.15.21 Mar 22, 2026
@swc-project swc-project locked as resolved and limited conversation to collaborators Apr 22, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Development

Successfully merging this pull request may close these issues.

perf(swc_ecma_parser): reduce allocations in JSX identifier rescanning

2 participants