perf(html): speed up the experimental HTML parser and reduce its memory usage #21130
walkHtmlTokens: replace the per-code-point ASCII predicate chains with a single packed Uint8Array char-class lookup table, hoist the two closure-local predicates to module scope, and slice the named-character-reference candidate run from the input once instead of re-slicing per prefix length. buildHtmlAst: move per-token tag-name membership tests off freshly-allocated array literals (Array#includes) onto shared module-level Sets, hoist the per-call Sets (implied-end-tags, cell close, whitespace) to module scope, collapse inTableScope array arguments to a string/Set fast path, guard the CR and NULL text rewrites behind a cheap presence check, and dispatch token end-offset tracking on type instead of the megamorphic in operator.
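A minimal sketch of the packed char-class table idea, assuming three illustrative classes. The bit names (`CC_WHITESPACE`, `CC_ALPHA`, `CC_DIGIT`) and the `hasClass` helper are hypothetical; the actual table layout in `walkHtmlTokens` is not shown in this PR text.

```javascript
"use strict";

// Hypothetical class bits, one bit per character class.
const CC_WHITESPACE = 1 << 0; // tab, LF, FF, CR, space
const CC_ALPHA = 1 << 1; // A-Z, a-z
const CC_DIGIT = 1 << 2; // 0-9

// One byte per ASCII code point, filled once at module load.
const CHAR_CLASS = new Uint8Array(128);
for (const cp of [0x09, 0x0a, 0x0c, 0x0d, 0x20]) CHAR_CLASS[cp] |= CC_WHITESPACE;
for (let cp = 0x41; cp <= 0x5a; cp++) {
	CHAR_CLASS[cp] |= CC_ALPHA; // upper case
	CHAR_CLASS[cp + 0x20] |= CC_ALPHA; // matching lower case
}
for (let cp = 0x30; cp <= 0x39; cp++) CHAR_CLASS[cp] |= CC_DIGIT;

/**
 * @param {number} cp code point
 * @param {number} mask one or more CC_* bits
 * @returns {boolean} true when cp is ASCII and belongs to any class in mask
 */
const hasClass = (cp, mask) => cp < 128 && (CHAR_CLASS[cp] & mask) !== 0;
```

A single indexed load plus a bit test replaces a chain of range comparisons, and several classes can be tested at once by OR-ing their bits into the mask.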
The tree builder allocated one token object (plus a nested pos) per
tokenizer callback and one {parent,beforeNode} per inserted node, and
the token union's varying shapes made the per-token t.* reads
megamorphic. Funnel every callback through one reused MutableToken with a
fixed shape (pos reused too) and return insertion places via one shared
object; both are consumed synchronously and never retained, and the only
buffered tokens (inTableText) are snapshotted into fresh objects. Cuts
minor GCs by ~24% on a tag-heavy document with no behaviour change
(full html5lib suite still green).
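The reuse-plus-snapshot pattern described above can be sketched as follows. The field names and the `fillToken` / `snapshotToken` helpers are hypothetical; the real `MutableToken` shape is not shown in this PR text.

```javascript
"use strict";

// Hypothetical fixed-shape token: every field exists from creation, so the
// object keeps one shape no matter which kind of token it carries.
const sharedToken = { type: "", name: "", data: "", pos: { start: 0, end: 0 } };

// Overwrite the shared token in place instead of allocating a new one.
const fillToken = (type, name, data, start, end) => {
	sharedToken.type = type;
	sharedToken.name = name;
	sharedToken.data = data;
	sharedToken.pos.start = start;
	sharedToken.pos.end = end;
	return sharedToken;
};

// Anything that must outlive the callback is copied into fresh objects.
const snapshotToken = (t) => ({
	type: t.type,
	name: t.name,
	data: t.data,
	pos: { start: t.pos.start, end: t.pos.end }
});
```

The trade-off is that a consumer may never hold on to the shared token; the snapshot is the explicit escape hatch for the buffered case.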
The data / RCDATA / RAWTEXT / script-data / PLAINTEXT and quoted attribute-value states advanced one code point at a time, re-entering the 80-case state switch for every ordinary character. Fast-forward over the run of insignificant code points in a tight inner loop that stops at the state's delimiters (NULL included so per-character error reporting is preserved). ~45% faster tokenizing on text-heavy input; no behaviour change (full html5lib suite green).
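A hedged sketch of the fast-forward loop for the data state only, where the delimiters are `&`, `<` and NULL. The `scanDataRun` name is hypothetical, and the sketch reads UTF-16 code units rather than code points, which is safe here because none of the delimiters can be part of a surrogate pair.

```javascript
"use strict";

/**
 * Hypothetical helper: skip the run of ordinary characters in the data state.
 * @param {string} input source text
 * @param {number} pos index to start scanning from
 * @returns {number} index of the next delimiter, or input.length
 */
const scanDataRun = (input, pos) => {
	const len = input.length;
	while (pos < len) {
		const cc = input.charCodeAt(pos);
		// "&" starts a character reference, "<" starts a tag, and NULL must
		// reach the state machine so its parse error is still reported.
		if (cc === 0x26 || cc === 0x3c || cc === 0x00) break;
		pos++;
	}
	return pos;
};
```

The state switch is then re-entered only at the returned index, so a long text run costs one dispatch instead of one per character.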
…osure walkHtmlTokens recorded the last open tag's lowercased name on every start tag, but it is only consulted by the RAWTEXT/RCDATA/script end-tag states. Match the content mode against the raw tag-name range (case-insensitive, no slice) and materialize the lowercased lastOpenTagName only when a special content mode is actually entered, so ordinary tags allocate nothing. buildHtmlAst's attribute callback now dedupes with a plain loop instead of an Array#some closure allocated per attribute. ~16% faster tokenizing on attribute-heavy input.
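Matching a tag name against the raw input range without slicing could look roughly like this. The `rangeEqualsIgnoreCase` helper is hypothetical and only folds ASCII upper case, which is all HTML tag-name matching needs.

```javascript
"use strict";

/**
 * Hypothetical helper: compare input[start, end) with an already-lowercased
 * name, ASCII case-insensitively, without allocating a substring.
 * @param {string} input source text
 * @param {number} start inclusive start of the tag name
 * @param {number} end exclusive end of the tag name
 * @param {string} lowerName lowercased name to compare against
 * @returns {boolean} true when the range spells lowerName
 */
const rangeEqualsIgnoreCase = (input, start, end, lowerName) => {
	if (end - start !== lowerName.length) return false;
	for (let i = 0; i < lowerName.length; i++) {
		let cc = input.charCodeAt(start + i);
		if (cc >= 0x41 && cc <= 0x5a) cc += 0x20; // ASCII upper -> lower
		if (cc !== lowerName.charCodeAt(i)) return false;
	}
	return true;
};
```

Only when such a check succeeds for a special tag does the lowercased name need to be materialized as a string.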
Initialize templateContent on every element (only <template> fills it) and serializedName on every attribute (only foreign content fills it) at creation instead of adding the property later, so each keeps a single monomorphic hidden class for the open-stack/scope walks and the AST consumers. No behaviour change.
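As an illustration of the shape-stability point, a sketch with hypothetical factory functions. The `null` placeholder is an assumption; the PR text does not say which initial value is used.

```javascript
"use strict";

// Hypothetical factories: every property is present from creation, so all
// elements (and all attributes) are created with the same property list.
const createElement = (name) => ({
	name,
	children: [],
	templateContent: null // assumed placeholder; only <template> fills it
});

const createAttribute = (name, value) => ({
	name,
	value,
	serializedName: null // assumed placeholder; only foreign content fills it
});
```

Assigning `templateContent` later on a `<template>` element then overwrites an existing slot instead of adding a property, which would otherwise give that element a different hidden class from its siblings.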
Cut intermediate objects that were built and immediately discarded while constructing the AST (the output nodes themselves are irreducible):
- insertCharacters merges a run into the adjacent text sibling by appending the string directly, instead of always allocating a text node and letting insertAtPlace discard it on merge.
- sameAttrs compares attribute lists with a nested scan instead of building a Map (+ array) per formatting-element comparison.
- adjustForeignAttrs / adjustMathmlAttrs fork the attribute array lazily and reuse the original objects when nothing needs adjusting, instead of mapping to a fresh array + object per attribute on every foreign element (~11% faster building SVG-heavy input).
No behaviour change; full html5lib suite green.
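The lazy fork used for attribute adjustment can be sketched as a copy-on-write loop. The `adjustAttrs` function and the two-entry table are hypothetical stand-ins; the real adjustment tables are much larger.

```javascript
"use strict";

// Hypothetical excerpt of an SVG attribute-name adjustment table.
const ADJUSTED_NAMES = new Map([
	["viewbox", "viewBox"],
	["attributename", "attributeName"]
]);

/**
 * @param {{ name: string, value: string }[]} attrs parsed attributes
 * @returns {{ name: string, value: string }[]} attrs itself when nothing
 * needs adjusting, otherwise a forked array with only changed entries replaced
 */
const adjustAttrs = (attrs) => {
	let out = attrs; // stays the input array until a change is needed
	for (let i = 0; i < attrs.length; i++) {
		const adjusted = ADJUSTED_NAMES.get(attrs[i].name);
		if (adjusted === undefined) continue;
		if (out === attrs) out = attrs.slice(); // fork once, on first change
		out[i] = { ...attrs[i], name: adjusted };
	}
	return out;
};
```

In the common case where no attribute name matches, the function allocates nothing and returns its argument.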
…ments Most elements have no attributes, yet each was given its own empty attributes array (plus a fresh empty pendingAttrs buffer per tag). Only <html>/<body> ever receive merged attributes and are always built with their own mutable array, so every other attributeless element can share one frozen EMPTY_ATTRS. The tokenizer callbacks now reuse the empty pendingAttrs buffer instead of reallocating it, and synthesized elements pass EMPTY_ATTRS. ~12% fewer minor GCs on attributeless-heavy input (tables/lists/formatted text). No behaviour change; html5lib + html configCases green.
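A minimal sketch of sharing one frozen empty array, assuming a hypothetical `createElement` factory (the real constructor and its parameters are not shown in this PR text):

```javascript
"use strict";

// One shared, immutable empty list for every attributeless element.
const EMPTY_ATTRS = Object.freeze([]);

/**
 * Hypothetical factory.
 * @param {string} name tag name
 * @param {{ name: string, value: string }[]} attrs parsed attributes
 * @returns {{ name: string, attributes: readonly object[] }} element node
 */
const createElement = (name, attrs) => ({
	name,
	attributes: attrs.length === 0 ? EMPTY_ATTRS : attrs
});
```

Freezing matters here: an accidental `push` on the shared array throws a `TypeError` instead of silently adding an attribute to every attributeless element.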
…lookup process() ran modes[mode](t) per token — a megamorphic keyed load over the ~21 insertion-mode strings. Route the four dispatch sites through a runMode() switch (cases ordered by frequency, default falling back to the keyed load), turning the hot per-token dispatch into monomorphic direct calls. ~3-4% faster building a large realistic document; no behaviour change (full html5lib + html configCases green).
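The dispatch change can be sketched with a handful of stand-in handlers. The handler bodies and the case order below are placeholders; only the `runMode` / `modes` names come from the commit message.

```javascript
"use strict";

// Placeholder handlers standing in for the real insertion-mode functions.
const inBody = (t) => `inBody:${t}`;
const text = (t) => `text:${t}`;
const inTable = (t) => `inTable:${t}`;
const afterBody = (t) => `afterBody:${t}`;

const modes = { inBody, text, inTable, afterBody };

/**
 * @param {string} mode current insertion mode
 * @param {string} t token (a plain string in this sketch)
 * @returns {string} handler result
 */
const runMode = (mode, t) => {
	switch (mode) {
		// Hot modes first: each case is a direct call to one known function.
		case "inBody":
			return inBody(t);
		case "text":
			return text(t);
		case "inTable":
			return inTable(t);
		// Rare modes keep the generic keyed lookup.
		default:
			return modes[mode](t);
	}
};
```

Each `case` gives the engine a call site that only ever sees one target function, whereas the single `modes[mode](t)` site sees all of them.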
contentModeAfterOpenTag ran the isForeign callback (which calls adjustedCurrent) after every open tag, but isForeign only ever vetoes a switch *into* a special content mode — it can't turn a data-state tag into a special one. Resolve the (allocation-free) tag-name range first and only consult isForeign when the tag would actually enter RAWTEXT/RCDATA/script, so ordinary tags skip the callback. Behaviour identical; full html5lib suite green.
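The reordering can be sketched like this. For brevity the sketch takes the tag name as a lowercased string and uses a small `Map`, whereas the commit describes matching the raw tag-name range; the table is a partial, illustrative one.

```javascript
"use strict";

// Partial table of tags that switch the tokenizer's content mode.
const SPECIAL_MODES = new Map([
	["script", "script-data"],
	["style", "RAWTEXT"],
	["textarea", "RCDATA"],
	["title", "RCDATA"]
]);

/**
 * @param {string} tagName lowercased tag name
 * @param {() => boolean} isForeign true when the tag is in foreign content
 * @returns {string} content mode to continue tokenizing in
 */
const contentModeAfterOpenTag = (tagName, isForeign) => {
	const mode = SPECIAL_MODES.get(tagName);
	// Ordinary tags stay in the data state; isForeign cannot change that.
	if (mode === undefined) return "data";
	// Only now pay for the callback: foreign content vetoes the switch.
	return isForeign() ? "data" : mode;
};
```

Because the callback can only turn a special mode back into data, checking the cheap condition first gives the same result with fewer calls.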
🦋 Changeset detected
Latest commit: 3af9ea1
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
This PR is packaged and the instant preview is available (cd45931). Install it locally:
npm i -D webpack@https://pkg.pr.new/webpack@cd45931
yarn add -D webpack@https://pkg.pr.new/webpack@cd45931
pnpm add -D webpack@https://pkg.pr.new/webpack@cd45931
Codecov Report
❌ Your patch check has failed because the patch coverage (84.07%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #21130 +/- ##
==========================================
+ Coverage 92.17% 92.32% +0.14%
==========================================
Files 581 581
Lines 62946 63179 +233
Branches 17422 17467 +45
==========================================
+ Hits 58023 58331 +308
+ Misses 4923 4848 -75
Object.freeze([]) is readonly never[], which TS won't narrow directly to the mutable HtmlAttribute[]; cast through unknown (lint:types only runs in CI, not the pre-commit hooks).
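In a JSDoc-typed JavaScript file, the double cast described above could be written roughly as follows. The `HtmlAttribute` typedef here is a hypothetical stand-in for the real type.

```javascript
"use strict";

/** @typedef {{ name: string, value: string }} HtmlAttribute */

// Object.freeze([]) is typed `readonly never[]`; asserting it straight to the
// mutable HtmlAttribute[] is rejected, so the cast goes through `unknown`.
const EMPTY_ATTRS = /** @type {HtmlAttribute[]} */ (
	/** @type {unknown} */ (Object.freeze([]))
);
```

The cast only affects type checking; at runtime the array is still frozen, so the type promises mutability that callers must not rely on.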
Merging this PR will improve performance by 97.69%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Memory | benchmark "asset-modules-bytes", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 858.9 KB | 320.5 KB | ×2.7 |
| ⚡ | Memory | benchmark "react", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 332.6 KB | 156.8 KB | ×2.1 |
| ⚡ | Memory | benchmark "side-effects-reexport", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 1,186.9 KB | 873.1 KB | +35.95% |
Comparing perf/html-parser-optimizations (3af9ea1) with main (d39efba)
Footnotes
- 18 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
Summary
The experimental HTML parser (walkHtmlTokens + buildHtmlAst, introduced in #21116) ends up on per-module hot paths, so its constant factors matter for build time and peak heap. This PR applies the same kind of low-level work the CSS tokenizer recently got, with no change to parsing behaviour:
- Skips the isForeign check for tags that can't switch content mode.
- switch-based insertion-mode dispatch in place of a megamorphic keyed lookup.

Measured on a ~3 MB realistic document: tokenizing ~18% faster, full parse ~28% faster, ~37% fewer minor GCs; on text/prose-heavy input up to ~44–45% faster. No linked issue.
What kind of change does this PR introduce?
perf
Did you add tests for your changes?
No new tests. These are behaviour-preserving performance changes, verified to be byte-for-byte equivalent against the existing suites: the full html5lib tokenizer + tree-construction corpus (test/html5lib.spectest.js, 15k+ cases), test/walkHtmlTokens.unittest.js, test/buildHtmlAst.unittest.js, test/HtmlParser.unittest.js, and the HTML configCases.
Does this PR introduce a breaking change?
No.
If relevant, what needs to be documented once your changes are merged or what have you already documented?
n/a
Use of AI
Yes — these changes were written with Claude Code: it profiled the parser, proposed and implemented the optimizations, and measured before/after. Every change was validated against the full html5lib corpus and the HTML test suites, and candidates that did not show a measurable, regression-free win were discarded. All output was reviewed.
Generated by Claude Code