
perf(html): speed up the experimental HTML parser and reduce its memory usage#21130

Merged
alexander-akait merged 11 commits into main from perf/html-parser-optimizations
Jun 8, 2026


@alexander-akait

Member

Summary

The experimental HTML parser (walkHtmlTokens + buildHtmlAst, introduced in #21116) ends up on per-module hot paths, so its constant factors matter for build time and peak heap. This PR applies the same kind of low-level work the CSS tokenizer recently got, with no change to parsing behaviour:

  • Tokenizer: an ASCII char-class lookup table, bulk-scanning runs of ordinary text/RAWTEXT/RCDATA/script/PLAINTEXT and quoted attribute values instead of one code point per switch turn, lazy lowercasing of open-tag names, and skipping the isForeign check for tags that can't switch content mode.
  • Tree construction: a single reused mutable token (and reused insertion-place) instead of one object per tokenizer callback, stable hidden classes for elements/attributes, a shared frozen empty attribute list for attributeless elements, fewer throwaway allocations during construction, and a switch-based insertion-mode dispatch in place of a megamorphic keyed lookup.

Measured on a ~3 MB realistic document: tokenizing ~18% faster, full parse ~28% faster, ~37% fewer minor GCs; on text/prose-heavy input up to ~44–45% faster. No linked issue.

What kind of change does this PR introduce?

perf

Did you add tests for your changes?

No new tests — these are behaviour-preserving performance changes, verified to be byte-for-byte equivalent against the existing suites: the full html5lib tokenizer + tree-construction corpus (test/html5lib.spectest.js, 15k+ cases), test/walkHtmlTokens.unittest.js, test/buildHtmlAst.unittest.js, test/HtmlParser.unittest.js, and the HTML configCases.

Does this PR introduce a breaking change?

No.

If relevant, what needs to be documented once your changes are merged or what have you already documented?

n/a

Use of AI

Yes — these changes were written with Claude Code: it profiled the parser, proposed and implemented the optimizations, and measured before/after. Every change was validated against the full html5lib corpus and the HTML test suites, and candidates that did not show a measurable, regression-free win were discarded. All output was reviewed.


Generated by Claude Code

walkHtmlTokens: replace the per-code-point ASCII predicate chains with a
single packed Uint8Array char-class lookup table, hoist the two
closure-local predicates to module scope, and slice the
named-character-reference candidate run from the input once instead of
re-slicing per prefix length.
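A minimal sketch of the char-class table idea, assuming a bit-per-class layout. The class names, the bit values, and the `hasClass` helper are hypothetical; the actual table in walkHtmlTokens is not shown in this PR text.

```javascript
// Hypothetical class bits, chosen for illustration only.
const CC_WHITESPACE = 1;
const CC_ALPHA = 2;
const CC_DIGIT = 4;

// One byte per ASCII code point, filled once at module load.
const CHAR_CLASS = new Uint8Array(128);
for (const cc of [0x09, 0x0a, 0x0c, 0x20]) CHAR_CLASS[cc] |= CC_WHITESPACE;
for (let cc = 0x41; cc <= 0x5a; cc++) CHAR_CLASS[cc] |= CC_ALPHA; // A-Z
for (let cc = 0x61; cc <= 0x7a; cc++) CHAR_CLASS[cc] |= CC_ALPHA; // a-z
for (let cc = 0x30; cc <= 0x39; cc++) CHAR_CLASS[cc] |= CC_DIGIT; // 0-9

// Module-scope predicate: one range check, one table load, one mask.
const hasClass = (cc, mask) => cc < 128 && (CHAR_CLASS[cc] & mask) !== 0;
```

A combined test such as `hasClass(cc, CC_ALPHA | CC_DIGIT)` costs the same single load as a one-class test, which is where a table can beat a chain of range comparisons.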

buildHtmlAst: move per-token tag-name membership tests off
freshly-allocated array literals (Array#includes) onto shared
module-level Sets, hoist the per-call Sets (implied-end-tags, cell
close, whitespace) to module scope, collapse inTableScope array
arguments to a string/Set fast path, guard the CR and NULL text rewrites
behind a cheap presence check, and dispatch token end-offset tracking on
type instead of the megamorphic in operator.
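The membership-test and presence-check changes can be sketched roughly as follows. The tag group, the function names, and the exact CR rewrite are assumptions for illustration, not code from buildHtmlAst.

```javascript
// Hypothetical tag group, allocated once at module scope.
const TABLE_SECTION_TAGS = new Set(["tbody", "tfoot", "thead"]);

// Before: the array literal may be allocated again on every call.
const isTableSectionSlow = (name) => ["tbody", "tfoot", "thead"].includes(name);

// After: a shared Set, no per-token allocation.
const isTableSection = (name) => TABLE_SECTION_TAGS.has(name);

// Presence check first, so text without CR skips the regex replace entirely.
const normalizeNewlines = (text) =>
  text.includes("\r") ? text.replace(/\r\n?/g, "\n") : text;
```

Most text chunks contain no CR, so the common case returns the original string untouched.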

The tree builder allocated one token object (plus a nested pos) per
tokenizer callback and one {parent,beforeNode} per inserted node, and
the token union's varying shapes made the per-token t.* reads
megamorphic. Funnel every callback through one reused MutableToken with a
fixed shape (pos reused too) and return insertion places via one shared
object; both are consumed synchronously and never retained, and the only
buffered tokens (inTableText) are snapshotted into fresh objects. Cuts
minor GCs by ~24% on a tag-heavy document with no behaviour change
(full html5lib suite still green).
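A rough sketch of the reused-token pattern, assuming a simplified token shape. The field names and the `setToken` and `snapshot` helpers are hypothetical; the real MutableToken is not shown here.

```javascript
// Hypothetical fixed-shape token; every field exists from construction on.
const token = { type: "", name: "", data: "", pos: { start: 0, end: 0 } };

// Fill the shared token in place instead of allocating one per callback.
const setToken = (type, name, data, start, end) => {
  token.type = type;
  token.name = name;
  token.data = data;
  token.pos.start = start;
  token.pos.end = end;
  return token;
};

// A consumer that buffers tokens must copy them, because the shared
// object is overwritten by the next callback.
const snapshot = (t) => ({
  type: t.type,
  name: t.name,
  data: t.data,
  pos: { start: t.pos.start, end: t.pos.end }
});
```

The pattern is only safe when consumers read the token synchronously and never keep a reference, which is why buffered tokens need a copy.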

The data / RCDATA / RAWTEXT / script-data / PLAINTEXT and quoted
attribute-value states advanced one code point at a time, re-entering the
80-case state switch for every ordinary character. Fast-forward over the
run of insignificant code points in a tight inner loop that stops at the
state's delimiters (NULL included so per-character error reporting is
preserved). ~45% faster tokenizing on text-heavy input; no behaviour
change (full html5lib suite green).
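A minimal sketch of the fast-forward loop for the data state, whose delimiters in the HTML tokenizer are `<`, `&`, and NULL. The function name `skipDataRun` is hypothetical.

```javascript
const LT = 0x3c; // "<"
const AMP = 0x26; // "&"
const NUL = 0x00;

// Return the index of the first delimiter at or after `pos`,
// or input.length if the rest of the input is ordinary text.
const skipDataRun = (input, pos) => {
  const len = input.length;
  while (pos < len) {
    const cc = input.charCodeAt(pos);
    if (cc === LT || cc === AMP || cc === NUL) break;
    pos++;
  }
  return pos;
};
```

The caller handles the delimiter through the normal state switch, so the state machine is re-entered once per run rather than once per character. Scanning UTF-16 code units is enough here because none of the delimiters can appear inside a surrogate pair.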
…osure

walkHtmlTokens recorded the last open tag's lowercased name on every
start tag, but it is only consulted by the RAWTEXT/RCDATA/script end-tag
states. Match the content mode against the raw tag-name range
(case-insensitive, no slice) and materialize the lowercased
lastOpenTagName only when a special content mode is actually entered, so
ordinary tags allocate nothing. buildHtmlAst's attribute callback now
dedupes with a plain loop instead of an Array#some closure allocated per
attribute. ~16% faster tokenizing on attribute-heavy input.
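One way to match a tag name against the raw input range without slicing or lowercasing, sketched under assumptions: the helper names and the short list of special tags are hypothetical and only illustrate the approach.

```javascript
// Compare input[start, end) with an all-lowercase `name`,
// ASCII case-insensitively, without creating a substring.
const rangeEqualsIgnoreCase = (input, start, end, name) => {
  if (end - start !== name.length) return false;
  for (let i = 0; i < name.length; i++) {
    let cc = input.charCodeAt(start + i);
    if (cc >= 0x41 && cc <= 0x5a) cc |= 0x20; // fold A-Z to a-z
    if (cc !== name.charCodeAt(i)) return false;
  }
  return true;
};

// Hypothetical subset of tags that switch the tokenizer's content mode.
const SPECIAL = ["script", "style", "textarea", "title"];

// Return the lowercased name only for a special tag; for ordinary
// tags nothing is allocated and undefined is returned.
const specialTagName = (input, start, end) => {
  for (let i = 0; i < SPECIAL.length; i++) {
    if (rangeEqualsIgnoreCase(input, start, end, SPECIAL[i])) return SPECIAL[i];
  }
  return undefined;
};
```

Because the returned name is one of the shared constants, even the special-tag path avoids a `slice().toLowerCase()` in this sketch.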

Initialize templateContent on every element (only <template> fills it)
and serializedName on every attribute (only foreign content fills it) at
creation instead of adding the property later, so each keeps a single
monomorphic hidden class for the open-stack/scope walks and the AST
consumers. No behaviour change.
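A sketch of the creation-time initialization, assuming a simplified element shape. The `createElement` factory, its field list, and the `undefined` placeholder are hypothetical.

```javascript
// Every element gets the same properties in the same order at creation,
// so engines that use hidden classes can keep one shape for all of them.
const createElement = (tagName, attributes) => ({
  type: "element",
  tagName,
  attributes,
  children: [],
  templateContent: undefined // only <template> assigns a value later
});
```

Assigning `templateContent` later then only changes a value; it does not add a property, so the object's layout stays the same as every other element's.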

Cut intermediate objects that were built and immediately discarded while
constructing the AST (the output nodes themselves are irreducible):

- insertCharacters merges a run into the adjacent text sibling by
  appending the string directly, instead of always allocating a text
  node and letting insertAtPlace discard it on merge.
- sameAttrs compares attribute lists with a nested scan instead of
  building a Map (+ array) per formatting-element comparison.
- adjustForeignAttrs / adjustMathmlAttrs fork the attribute array lazily
  and reuse the original objects when nothing needs adjusting, instead of
  mapping to a fresh array + object per attribute on every foreign
  element (~11% faster building SVG-heavy input).

No behaviour change; full html5lib suite green.
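The lazy fork in the last bullet above is a copy-on-write pattern. A minimal sketch, assuming `{ name, value }` attribute objects; the `adjustAttrs` name is hypothetical and the map holds only two of the SVG attribute adjustments from the HTML specification.

```javascript
// Hypothetical subset of the SVG attribute-name adjustments.
const SVG_ATTR_FIXES = new Map([
  ["viewbox", "viewBox"],
  ["preserveaspectratio", "preserveAspectRatio"]
]);

// Return the original array when nothing changes; copy the array only
// when the first adjusted entry is found.
const adjustAttrs = (attrs) => {
  let out = attrs;
  for (let i = 0; i < attrs.length; i++) {
    const fixed = SVG_ATTR_FIXES.get(attrs[i].name);
    if (fixed === undefined) continue;
    if (out === attrs) out = attrs.slice(); // fork lazily
    out[i] = { name: fixed, value: attrs[i].value };
  }
  return out;
};
```

Untouched attributes keep their original objects in the forked array, so the only new allocations are the copy itself and one object per adjusted attribute.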
…ments

Most elements have no attributes, yet each was given its own empty
attributes array (plus a fresh empty pendingAttrs buffer per tag). Only
<html>/<body> ever receive merged attributes and are always built with
their own mutable array, so every other attributeless element can share
one frozen EMPTY_ATTRS. The tokenizer callbacks now reuse the empty
pendingAttrs buffer instead of reallocating it, and synthesized elements
pass EMPTY_ATTRS. ~12% fewer minor GCs on attributeless-heavy input
(tables/lists/formatted text). No behaviour change; html5lib + html
configCases green.
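A sketch of the shared empty list, assuming a hypothetical `attrsFor` helper and a boolean flag standing in for the `<html>`/`<body>` case.

```javascript
// One frozen, shared list for every element that has no attributes.
const EMPTY_ATTRS = Object.freeze([]);

// Hypothetical helper: elements that may receive merged attributes later
// always get their own mutable array; other attributeless elements share
// the frozen one.
const attrsFor = (pending, mayReceiveMerged) => {
  if (mayReceiveMerged) return pending.slice();
  return pending.length === 0 ? EMPTY_ATTRS : pending.slice();
};
```

Freezing the shared array means an accidental `push` throws a TypeError instead of silently adding an attribute to every attributeless element.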
…lookup

process() ran modes[mode](t) per token — a megamorphic keyed load over
the ~21 insertion-mode strings. Route the four dispatch sites through a
runMode() switch (cases ordered by frequency, default falling back to the
keyed load), turning the hot per-token dispatch into monomorphic direct
calls. ~3-4% faster building a large realistic document; no behaviour
change (full html5lib + html configCases green).
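A sketch of the switch-based dispatch with a keyed fallback. The handlers, their return values, and the case order are placeholders; the real frequency order is not given in this PR text.

```javascript
// Hypothetical handlers for a few insertion modes.
const inBody = (t) => `inBody:${t.type}`;
const inTable = (t) => `inTable:${t.type}`;
const text = (t) => `text:${t.type}`;
const initial = (t) => `initial:${t.type}`;

// Keyed table, kept as the fallback for rarely used modes.
const modes = { inBody, inTable, text, initial };

// Hot modes become direct calls; everything else uses the keyed load.
const runMode = (mode, t) => {
  switch (mode) {
    case "inBody":
      return inBody(t);
    case "text":
      return text(t);
    case "inTable":
      return inTable(t);
    default:
      return modes[mode](t);
  }
};
```

Each `case` contains a call to one fixed function, so every call site has a single target, while `modes[mode](t)` has one call site that sees every handler.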

contentModeAfterOpenTag ran the isForeign callback (which calls
adjustedCurrent) after every open tag, but isForeign only ever vetoes a
switch *into* a special content mode — it can't turn a data-state tag
into a special one. Resolve the (allocation-free) tag-name range first
and only consult isForeign when the tag would actually enter
RAWTEXT/RCDATA/script, so ordinary tags skip the callback. Behaviour
identical; full html5lib suite green.
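The reordering can be sketched as follows. `contentModeSketch`, the mode strings, and the tag table are hypothetical stand-ins and take an already-lowercased name for brevity.

```javascript
// Hypothetical content-mode table keyed by lowercased tag name.
const CONTENT_MODE = new Map([
  ["script", "script"],
  ["style", "rawtext"],
  ["textarea", "rcdata"],
  ["title", "rcdata"]
]);

// Decide the tokenizer mode after an open tag. `isForeign` is consulted
// only when the tag could actually enter a special content mode.
const contentModeSketch = (tagName, isForeign) => {
  const mode = CONTENT_MODE.get(tagName);
  if (mode === undefined) return "data"; // ordinary tag: skip the callback
  return isForeign() ? "data" : mode;
};
```

The result is the same as calling `isForeign` first, because a veto on a tag that was going to stay in the data state changes nothing.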
@changeset-bot

changeset-bot Bot commented Jun 8, 2026


🦋 Changeset detected

Latest commit: 3af9ea1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name | Type
webpack | Patch


@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

This PR is packaged and the instant preview is available (cd45931).

Install it locally:

  • npm
npm i -D webpack@https://pkg.pr.new/webpack@cd45931
  • yarn
yarn add -D webpack@https://pkg.pr.new/webpack@cd45931
  • pnpm
pnpm add -D webpack@https://pkg.pr.new/webpack@cd45931

@codecov

codecov Bot commented Jun 8, 2026


Codecov Report

❌ Patch coverage is 98.89381% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.32%. Comparing base (5e599a1) to head (3af9ea1).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
lib/html/buildHtmlAst.js | 98.95% | 3 Missing ⚠️
lib/html/walkHtmlTokens.js | 98.78% | 2 Missing ⚠️

❌ Your patch check has failed because the patch coverage (84.07%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21130      +/-   ##
==========================================
+ Coverage   92.17%   92.32%   +0.14%     
==========================================
  Files         581      581              
  Lines       62946    63179     +233     
  Branches    17422    17467      +45     
==========================================
+ Hits        58023    58331     +308     
+ Misses       4923     4848      -75     
Flag | Coverage Δ
css-parsing | 28.69% <ø> (+<0.01%) ⬆️
html5lib | 31.07% <98.67%> (+0.30%) ⬆️
integration | 88.49% <66.81%> (+0.05%) ⬆️
test262 | 45.30% <ø> (?)
unit | 41.11% <78.53%> (+0.16%) ⬆️


Object.freeze([]) is readonly never[], which TS won't narrow directly to
the mutable HtmlAttribute[]; cast through unknown (lint:types only runs
in CI, not the pre-commit hooks).
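The double cast described above looks roughly like this in JSDoc-typed JavaScript. The `HtmlAttribute` fields and the constant name are assumptions for illustration.

```javascript
/**
 * Hypothetical attribute shape, for illustration only.
 * @typedef {object} HtmlAttribute
 * @property {string} name
 * @property {string} value
 */

// Object.freeze([]) is inferred as `readonly never[]`; casting through
// `unknown` first lets it be typed as the mutable HtmlAttribute[].
const SHARED_EMPTY_ATTRS = /** @type {HtmlAttribute[]} */ (
  /** @type {unknown} */ (Object.freeze([]))
);
```

The casts only affect type checking; at runtime the value is still a frozen empty array.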
@codspeed-hq

codspeed-hq Bot commented Jun 8, 2026


Merging this PR will improve performance by 97.69%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 3 improved benchmarks
✅ 123 untouched benchmarks
⏩ 18 skipped benchmarks (see footnote 1)

Performance Changes

Mode | Benchmark | BASE | HEAD | Efficiency
Memory | benchmark "asset-modules-bytes", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 858.9 KB | 320.5 KB | ×2.7
Memory | benchmark "react", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 332.6 KB | 156.8 KB | ×2.1
Memory | benchmark "side-effects-reexport", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 1,186.9 KB | 873.1 KB | +35.95%



Comparing perf/html-parser-optimizations (3af9ea1) with main (d39efba)


Footnotes

  1. 18 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Types Coverage

Coverage after merging perf/html-parser-optimizations into main will be 99.00%
Coverage Report
File | Stmts | Branches | Funcs | Lines | Uncovered Lines
bin
   webpack.js | 98.77% | 100% | 100% | 98.77% | 91
examples
   build-common.js | 100% | 100% | 100% | 100% |
   buildAll.js | 100% | 100% | 100% | 100% |
   examples.js | 100% | 100% | 100% | 100% |
   template-common.js | 98.21% | 100% | 100% | 98.21% | 72
examples/custom-javascript-parser
   test.filter.js | 100% | 100% | 100% | 100% |
examples/custom-javascript-parser/internals
   acorn-parse.js | 100% | 100% | 100% | 100% |
   meriyah-parse.js | 100% | 100% | 100% | 100% |
   oxc-parse.js | 91.30% | 100% | 100% | 91.30% | 140, 142–143, 145, 147, 153–154, 161, 168, 90
examples/markdown
   webpack.config.mjs | 100% | 100% | 100% | 100% |
examples/typescript
   test.filter.js | 100% | 100% | 100% | 100% |
examples/typescript-non-erasable
   test.filter.js | 50% | 100% | 100% | 50% | 5
examples/virtual-modules
   test.filter.js | 100% | 100% | 100% | 100% |
examples/wasm-bindgen-esm
   test.filter.js | 100% | 100% | 100% | 100% |
examples/wasm-complex
   test.filter.js | 100% | 100% | 100% | 100% |
examples/wasm-simple
   test.filter.js | 100% | 100% | 100% | 100% |
examples/wasm-simple-source-phase
   test.filter.js | 100% | 100% | 100% | 100% |
lib
   APIPlugin.js | 100% | 100% | 100% | 100% |
   AsyncDependenciesBlock.js | 100% | 100% | 100% | 100% |
   AutomaticPrefetchPlugin.js | 100% | 100% | 100% | 100% |
   BannerPlugin.js | 100% | 100% | 100% | 100% |
   Cache.js | 98.21% | 100% | 100% | 98.21% | 101
   CacheFacade.js | 100% | 100% | 100% | 100% |
   Chunk.js | 99.72% | 100% | 100% | 99.72% | 39
   ChunkGraph.js | 100% | 100% | 100% | 100% |
   ChunkGroup.js | 100% | 100% | 100% | 100% |
   ChunkTemplate.js | 100% | 100% | 100% | 100% |
   CleanPlugin.js | 99.15% | 100% | 100% | 99.15% | 206, 226
   CodeGenerationResults.js | 100% | 100% | 100% | 100% |
   CompatibilityPlugin.js | 100% | 100% | 100% | 100% |
   Compilation.js | 98.49% | 100% | 100% | 98.49% | 1577, 1873, 1880, 1888, 1910, 2806, 3249, 3924, 3954, 4007–4008, 4012, 4017, 4033–4034, 4048–4049, 4054–4055, 4532, 4558, 512, 517, 5366, 5398, 5415, 5431, 5447, 5462, 5487–5488, 5490, 5818, 5823, 5829, 5832, 5844, 5846, 5850, 5866, 5881, 5913, 5967, 5991, 6105, 731–732
   Compiler.js | 99.56% | 100% | 100% | 99.56% | 1135–1136, 1144
   ConcatenationScope.js | 98.59% | 100% | 100% | 98.59% | 189
   ConditionalInitFragment.js | 100% | 100% | 100% | 100% |
   ConstPlugin.js | 100% | 100% | 100% | 100% |
   ContextExclusionPlugin.js | 100% | 100% | 100% | 100% |
   ContextModule.js | 100% | 100% | 100% | 100% |
   ContextModuleFactory.js | 97.40% | 100% | 100% | 97.40% | 258, 395, 418, 420, 424, 433–434
   ContextReplacementPlugin.js | 100% | 100% | 100% | 100% |
   DefinePlugin.js | 99% | 100% | 100% | 99% | 170–171, 187, 206, 280
   DependenciesBlock.js | 100% | 100% | 100% | 100% |
   Dependency.js | 98.15% | 100% | 100% | 98.15% | 379, 425
   DependencyTemplate.js | 100% | 100% | 100% | 100% |
   DependencyTemplates.js | 100% | 100% | 100% | 100% |
   DotenvPlugin.js | 98.41% | 100% | 100% | 98.41% | 378, 391–392
   DynamicEntryPlugin.js | 100% | 100% | 100% | 100% |
   EntryOptionPlugin.js | 100% | 100% | 100% | 100% |
   EntryPlugin.js | 100% | 100% | 100% | 100% |
   Entrypoint.js | 100% | 100% | 100% | 100% |
   EnvironmentPlugin.js | 97.14% | 100% | 100% | 97.14% | 49
   ErrorHelpers.js | 100% | 100% | 100% | 100% |
   EvalDevToolModulePlugin.js | 100% | 100% | 100% | 100% |
   EvalSourceMapDevToolPlugin.js | 100% | 100% | 100% | 100% |
   ExportsInfo.js | 100% | 100% | 100% | 100% |
   ExportsInfoApiPlugin.js | 100% | 100% | 100% | 100% |
   ExternalModule.js | 98.97% | 100% | 100% | 98.97% | 425–429, 577
   ExternalModuleFactoryPlugin.js | 100% | 100% | 100% | 100% |
   ExternalsPlugin.js | 100% | 100% | 100% | 100% |
   FileSystemInfo.js | 99.50% | 100% | 100% | 99.50% | 182, 2252–2253, 2256, 2267, 2278, 2289, 278, 3693, 3708, 3732
   FlagAllModulesAsUsedPlugin.js | 100% | 100% | 100% | 100% |
   FlagDependencyExportsPlugin.js | 98.85% | 100% | 100% | 98.85% | 434, 436, 440
   FlagDependencyUsagePlugin.js | 100% | 100% | 100% | 100% |
   FlagEntryExportAsUsedPlugin.js | 100% | 100% | 100% | 100% |
   Generator.js | 100% | 100% | 100% | 100% |
   HotModuleReplacementPlugin.js | 100% | 100% | 100% | 100% |
   HotUpdateChunk.js | 100% | 100% | 100% | 100% |
   IgnorePlugin.js | 100% | 100% | 100% | 100% |
   IgnoreWarningsPlugin.js | 100% | 100% | 100% | 100% |
   InitFragment.js | 100% | 100% | 100% | 100% |
   JavascriptMetaInfoPlugin.js | 100% | 100% | 100% | 100% |
   LibraryTemplatePlugin.js | 100% | 100% | 100% | 100% |
   LoaderOptionsPlugin.js | 100% | 100% | 100% | 100% |
   LoaderTargetPlugin.js | 100% | 100% | 100% | 100% |
   MainTemplate.js | 100% | 100% | 100% | 100% |
   ManifestPlugin.js | 100% | 100% | 100% | 100% |
   Module.js | 98.50% | 100% | 100% | 98.50% | 1312, 1317, 1377, 1391, 1453, 1462
   ModuleFactory.js | 100% | 100% | 100% | 100% |
   ModuleFilenameHelpers.js | 98.85% | 100% | 100% | 98.85% | 106, 108
   ModuleGraph.js | 99.73% | 100% | 100% | 99.73% | 1005
   ModuleGraphConnection.js | 100% | 100% | 100% | 100% |
   ModuleInfoHeaderPlugin.js | 100% | 100% | 100% | 100% |
   ModuleNotFoundError.js | 100% | 100% | 100% | 100% |
   ModuleProfile.js | 100% | 100% | 100% | 100% |
   ModuleSourceTypeConstants.js | 100% | 100% | 100% | 100% |
   ModuleTemplate.js | 100% | 100% | 100% | 100% |
   ModuleTypeConstants.js | 100% | 100% | 100% | 100% |
   MultiCompiler.js | 99.69% | 100% | 100% | 99.69% | 659
   MultiStats.js | 100% | 100% | 100% | 100% |
   MultiWatching.js | 100% | 100% | 100% | 100% |
   NoEmitOnErrorsPlugin.js | 100% | 100% | 100% | 100% |
   NodeStuffPlugin.js | 100% | 100% | 100% | 100% |
   NormalModule.js | 98.15% | 100% | 100% | 98.15% | 1212, 1215, 1232, 1249, 1496, 1530, 1546, 1633, 2288, 2293–2303, 569
   NormalModuleFactory.js | 99.47% | 100% | 100% | 99.47% | 1083, 1392, 486, 498
   NormalModuleReplacementPlugin.js | 100% | 100% | 100% | 100% |
   NullFactory.js | 100% | 100% | 100% | 100% |
   OptimizationStages.js | 100% | 100% | 100% | 100% |
   OptionsApply.js | 100% | 100% | 100% | 100% |
   Parser.js | 100% | 100% | 100% | 100% |
   PlatformPlugin.js | 100% | 100% | 100% | 100% |
   PrefetchPlugin.js | 100% | 100% | 100% | 100% |
   ProgressPlugin.js | 98.85% | 100% | 100% | 98.85% | 519–520, 525, 527, 591
   ProvidePlugin.js | 100% | 100% | 100% | 100% |
   RawModule.js | 100% | 100% | 100% | 100% |
   RecordIdsPlugin.js | 100% | 100% | 100% | 100% |
   RequestShortener.js | 100% | 100% | 100% | 100% |
   ResolverFactory.js | 100% | 100% | 100% | 100% |
   RuntimeGlobals.js | 100% | 100% | 100% | 100% |
   RuntimeModule.js | 100% | 100% | 100% | 100% |
   RuntimePlugin.js | 100% | 100% | 100% | 100% |
   RuntimeTemplate.js | 100% | 100% | 100% | 100% |
   SelfModuleFactory.js | 100% | 100% | 100% | 100% |
   SingleEntryPlugin.js | 100% | 100% | 100% | 100% |
   SourceMapDevToolModuleOptionsPlugin.js | 100% | 100% | 100% | 100% |
   SourceMapDevToolPlugin.js | 98.62% | 100% | 100% | 98.62% | 220, 224, 226, 419, 430, 891
   Stats.js | 100% | 100% | 100% | 100% |
   Template.js | 100% | 100% | 100% | 100% |
   TemplatedPathPlugin.js | 99.13% | 100% | 100% | 99.13% | 176–177
   UseStrictPlugin.js | 100% | 100% | 100% | 100% |
   WarnCaseSensitiveModulesPlugin.js | 100% | 100% | 100% | 100% |
   WarnDeprecatedOptionPlugin.js | 100% | 100% | 100% | 100% |
   WarnNoModeSetPlugin.js | 100% | 100% | 100% | 100% |
   WatchIgnorePlugin.js | 100% | 100% | 100% | 100% |
   Watching.js | 100% | 100% | 100% | 100% |
   WebpackError.js | 100% | 100% | 100% | 100% |
   WebpackIsIncludedPlugin.js | 100% | 100% | 100% | 100% |
   WebpackOptionsApply.js | 100% | 100% | 100% | 100% |
   WebpackOptionsDefaulter.js | 100% | 100% | 100% | 100% |
   buildChunkGraph.js | 99.87% | 100% | 100% | 99.87% | 326
   cli.js | 98.62% | 100% | 100% | 98.62% | 10, 119, 545, 577, 627, 897
   index.js | 99.72% | 100% | 100% | 99.72% | 165
   validateSchema.js | 94.67% | 100% | 100% | 94.67% | 100, 87, 89, 98
   webpack.js | 96.33% | 100% | 100% | 96.33% | 10, 198, 220, 222
lib/asset
   AssetBytesGenerator.js | 100% | 100% | 100% | 100% |
   AssetBytesParser.js | 100% | 100% | 100% | 100% |
   AssetGenerator.js | 100% | 100% | 100% | 100% |
   AssetModulesPlugin.js | 97.32% | 100% | 100% | 97.32% | 283, 307, 310, 36, 362, 41
   AssetParser.js | 100% | 100% | 100% | 100% |
   AssetSourceGenerator.js | 100% | 100% | 100% | 100% |
   AssetSourceParser.js | 100% | 100% | 100% | 100% |
   RawDataUrlModule.js | 100% | 100% | 100% | 100% |
lib/async-modules
   AsyncModuleHelpers.js | 100% | 100% | 100% | 100% |
   AwaitDependenciesInitFragment.js | 100% | 100% | 100% | 100% |
   InferAsyncModulesPlugin.js | 100% | 100% | 100% | 100% |
lib/cache
   AddBuildDependenciesPlugin.js | 100% | 100% | 100% | 100% |
   AddManagedPathsPlugin.js | 100% | 100% | 100% | 100% |
   IdleFileCachePlugin.js | 97.92% | 100% | 100% | 97.92% | 71, 83, 91
   MemoryCachePlugin.js | 95.83% | 100% | 100% | 95.83% | 33
   MemoryWithGcCachePlugin.js | 93.15% | 100% | 100% | 93.15% | 106, 113–114, 122, 89
   PackFileCacheStrategy.js | 96.40% | 100% | 100% | 96.40% | 1250, 1350, 1354, 1416, 628, 647, 657–659, 661, 677–678, 683, 686, 688, 693, 698, 722, 728, 762, 768, 774, 779, 790, 799, 804–805, 807, 824, 830–831, 833
   ResolverCachePlugin.js | 100% | 100% | 100% | 100% |
   getLazyHashedEtag.js | 100% | 100% | 100% | 100% |
   mergeEtags.js | 100% | 100% | 100% | 100% |
lib/config
   browserslistTargetHandler.js | 100% | 100% | 100% | 100% |
   defaults.js | 99.30% | 100% | 100% | 99.30% | 1428–1430, 1438, 273, 276, 281, 285
   normalization.js | 99.01% | 100% | 100% | 99.01% | 191–192, 258, 273
   target.js | 100% | 100% | 100% | 100% |
lib/container
   ContainerEntryDependency.js | 100% | 100% | 100% | 100% |
   ContainerEntryModule.js | 100% | 100% | 100% | 100% |
   ContainerEntryModuleFactory.js | 100% | 100% | 100% | 100% |
   ContainerExposedDependency.js | 100% | 100% | 100% | 100% |
   ContainerPlugin.js | 100% | 100% | 100% | 100% |
   ContainerReferencePlugin.js | 100% | 100% | 100% | 100% |
   FallbackDependency.js | 100% | 100% | 100% | 100% |
   

@alexander-akait alexander-akait merged commit cd45931 into main Jun 8, 2026
63 of 66 checks passed
@alexander-akait alexander-akait deleted the perf/html-parser-optimizations branch June 8, 2026 17:23