perf(html): speed up the experimental HTML parser and reduce its memory usage by alexander-akait · Pull Request #21130 · webpack/webpack

alexander-akait · 2026-06-08T15:39:46Z

Summary

The experimental HTML parser (walkHtmlTokens + buildHtmlAst, introduced in #21116) ends up on per-module hot paths, so its constant factors matter for build time and peak heap. This PR applies the same kind of low-level work the CSS tokenizer recently got, with no change to parsing behaviour:

Tokenizer: an ASCII char-class lookup table, bulk-scanning runs of ordinary text/RAWTEXT/RCDATA/script/PLAINTEXT and quoted attribute values instead of one code point per switch turn, lazy lowercasing of open-tag names, and skipping the isForeign check for tags that can't switch content mode.
Tree construction: a single reused mutable token (and reused insertion-place) instead of one object per tokenizer callback, stable hidden classes for elements/attributes, a shared frozen empty attribute list for attributeless elements, fewer throwaway allocations during construction, and a switch-based insertion-mode dispatch in place of a megamorphic keyed lookup.

Measured on a ~3 MB realistic document: tokenizing ~18% faster, full parse ~28% faster, ~37% fewer minor GCs; on text/prose-heavy input up to ~44–45% faster. No linked issue.

What kind of change does this PR introduce?

perf

Did you add tests for your changes?

No new tests — these are behaviour-preserving performance changes, verified to be byte-for-byte equivalent against the existing suites: the full html5lib tokenizer + tree-construction corpus (test/html5lib.spectest.js, 15k+ cases), test/walkHtmlTokens.unittest.js, test/buildHtmlAst.unittest.js, test/HtmlParser.unittest.js, and the HTML configCases.

Does this PR introduce a breaking change?

No.

If relevant, what needs to be documented once your changes are merged or what have you already documented?

n/a

Use of AI

Yes — these changes were written with Claude Code: it profiled the parser, proposed and implemented the optimizations, and measured before/after. Every change was validated against the full html5lib corpus and the HTML test suites, and candidates that did not show a measurable, regression-free win were discarded. All output was reviewed.

Generated by Claude Code

walkHtmlTokens: replace the per-code-point ASCII predicate chains with a single packed Uint8Array char-class lookup table, hoist the two closure-local predicates to module scope, and slice the named-character-reference candidate run from the input once instead of re-slicing per prefix length. buildHtmlAst: move per-token tag-name membership tests off freshly-allocated array literals (Array#includes) onto shared module-level Sets, hoist the per-call Sets (implied-end-tags, cell close, whitespace) to module scope, collapse inTableScope array arguments to a string/Set fast path, guard the CR and NULL text rewrites behind a cheap presence check, and dispatch token end-offset tracking on type instead of the megamorphic in operator.
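
As an illustration of the packed char-class table described above, here is a minimal sketch. The class bits and helper names (`CC_WHITESPACE`, `classOf`, `isAsciiAlpha`, and so on) are hypothetical and chosen for this example; the actual table layout in walkHtmlTokens may differ.

```javascript
// Hypothetical class bits; one bit per character class, packed into a byte.
const CC_WHITESPACE = 1 << 0;
const CC_ALPHA_UPPER = 1 << 1;
const CC_ALPHA_LOWER = 1 << 2;
const CC_DIGIT = 1 << 3;

// One byte per ASCII code point, built once at module load.
const CHAR_CLASS = new Uint8Array(128);
// ASCII whitespace: tab, LF, FF, CR, space.
for (const cc of [0x09, 0x0a, 0x0c, 0x0d, 0x20]) CHAR_CLASS[cc] |= CC_WHITESPACE;
for (let cc = 0x41; cc <= 0x5a; cc++) CHAR_CLASS[cc] |= CC_ALPHA_UPPER;
for (let cc = 0x61; cc <= 0x7a; cc++) CHAR_CLASS[cc] |= CC_ALPHA_LOWER;
for (let cc = 0x30; cc <= 0x39; cc++) CHAR_CLASS[cc] |= CC_DIGIT;

// Non-ASCII code points belong to no class.
const classOf = (cc) => (cc < 128 ? CHAR_CLASS[cc] : 0);

// Each predicate is a single table load plus a mask test,
// replacing a chain of range comparisons.
const isWhitespace = (cc) => (classOf(cc) & CC_WHITESPACE) !== 0;
const isAsciiAlpha = (cc) =>
	(classOf(cc) & (CC_ALPHA_UPPER | CC_ALPHA_LOWER)) !== 0;
const isAsciiAlphanumeric = (cc) =>
	(classOf(cc) & (CC_ALPHA_UPPER | CC_ALPHA_LOWER | CC_DIGIT)) !== 0;
```

Because several classes share one byte, a combined predicate such as `isAsciiAlphanumeric` costs the same as a single-class one.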

The tree builder allocated one token object (plus a nested pos) per tokenizer callback and one {parent,beforeNode} per inserted node, and the token union's varying shapes made the per-token t.* reads megamorphic. Funnel every callback through one reused MutableToken with a fixed shape (pos reused too) and return insertion places via one shared object; both are consumed synchronously and never retained, and the only buffered tokens (inTableText) are snapshotted into fresh objects. Cuts minor GCs by ~24% on a tag-heavy document with no behaviour change (full html5lib suite still green).
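
The reuse-and-snapshot pattern can be sketched roughly as follows. The field names and the helpers `createMutableToken`, `setCharacterToken` and `snapshotToken` are assumptions made for illustration, not the real MutableToken shape.

```javascript
// Hypothetical fixed-shape token: every field exists from creation,
// so the object keeps one shape for its whole lifetime.
const createMutableToken = () => ({
	type: "",
	name: "",
	data: "",
	pos: { start: 0, end: 0 }
});

// A single instance shared by every tokenizer callback.
const token = createMutableToken();

// Fill the shared token in place instead of allocating a new one.
const setCharacterToken = (data, start, end) => {
	token.type = "character";
	token.name = "";
	token.data = data;
	token.pos.start = start;
	token.pos.end = end;
	return token;
};

// Anything that must outlive the callback is copied into a fresh object.
const snapshotToken = (t) => ({
	type: t.type,
	name: t.name,
	data: t.data,
	pos: { start: t.pos.start, end: t.pos.end }
});

// Buffered tokens (as with inTableText) keep a snapshot, not the shared token.
const pending = [];
pending.push(snapshotToken(setCharacterToken("a", 0, 1)));
setCharacterToken("b", 1, 2); // overwrites the shared token only
```

The snapshot step is what makes the reuse safe: a consumer that stored `token` itself would see its contents change on the next callback.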

The data / RCDATA / RAWTEXT / script-data / PLAINTEXT and quoted attribute-value states advanced one code point at a time, re-entering the 80-case state switch for every ordinary character. Fast-forward over the run of insignificant code points in a tight inner loop that stops at the state's delimiters (NULL included so per-character error reporting is preserved). ~45% faster tokenizing on text-heavy input; no behaviour change (full html5lib suite green).
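
A minimal sketch of such a fast-forward loop for the data state, assuming a hypothetical helper `skipDataRun`; the real tokenizer inlines its own loops per state with that state's delimiter set.

```javascript
const CC_NULL = 0x00;
const CC_AMPERSAND = 0x26;
const CC_LESS_THAN = 0x3c;

// Returns the index of the first code unit at or after `pos` that the
// data state must handle individually ("<", "&" or NULL), or the input
// length when the rest of the input is ordinary text.
const skipDataRun = (input, pos) => {
	const len = input.length;
	while (pos < len) {
		const cc = input.charCodeAt(pos);
		if (cc === CC_LESS_THAN || cc === CC_AMPERSAND || cc === CC_NULL) break;
		pos++;
	}
	return pos;
};
```

The state machine then handles the delimiter at the returned index through its normal switch, so NULL still reaches the per-character error path.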

…osure walkHtmlTokens recorded the last open tag's lowercased name on every start tag, but it is only consulted by the RAWTEXT/RCDATA/script end-tag states. Match the content mode against the raw tag-name range (case-insensitive, no slice) and materialize the lowercased lastOpenTagName only when a special content mode is actually entered, so ordinary tags allocate nothing. buildHtmlAst's attribute callback now dedupes with a plain loop instead of an Array#some closure allocated per attribute. ~16% faster tokenizing on attribute-heavy input.
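
One way to match a tag name against the raw input range without slicing or lowercasing is sketched below. The helpers `rangeEqualsIgnoreAsciiCase` and `specialTagName` are hypothetical, and the name list is an illustrative subset rather than the parser's full set.

```javascript
// Compare input[start, end) with a lowercase ASCII name, folding A-Z
// on the fly. No substring is created.
const rangeEqualsIgnoreAsciiCase = (input, start, end, lowerName) => {
	if (end - start !== lowerName.length) return false;
	for (let i = 0; i < lowerName.length; i++) {
		let cc = input.charCodeAt(start + i);
		if (cc >= 0x41 && cc <= 0x5a) cc |= 0x20; // fold A-Z to a-z
		if (cc !== lowerName.charCodeAt(i)) return false;
	}
	return true;
};

// Illustrative subset of tags that switch the tokenizer's content mode.
const SPECIAL_NAMES = ["script", "style", "textarea", "title"];

// Returns the shared lowercase constant for a special tag, or undefined
// for an ordinary tag; neither path allocates a string.
const specialTagName = (input, start, end) => {
	for (let i = 0; i < SPECIAL_NAMES.length; i++) {
		const name = SPECIAL_NAMES[i];
		if (rangeEqualsIgnoreAsciiCase(input, start, end, name)) return name;
	}
	return undefined;
};
```

Returning the constant from `SPECIAL_NAMES` means the lowercased name only needs to be recorded when a special mode is entered.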

Initialize templateContent on every element (only <template> fills it) and serializedName on every attribute (only foreign content fills it) at creation instead of adding the property later, so each keeps a single monomorphic hidden class for the open-stack/scope walks and the AST consumers. No behaviour change.
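
A small sketch of the idea, using hypothetical factory functions. The exact field lists and the use of `null` as the initial value are assumptions; the source only states that both properties are initialized at creation.

```javascript
// Every element gets the same property set in the same order, so
// template and non-template elements have identical layouts.
const createElement = (tagName, attributes) => ({
	type: "element",
	tagName,
	attributes,
	children: [],
	parent: null,
	templateContent: null // only filled for <template>
});

// Likewise for attributes: serializedName exists up front even though
// only foreign content ever assigns it.
const createAttribute = (name, value) => ({
	name,
	value,
	serializedName: null
});

const div = createElement("div", []);
const tpl = createElement("template", []);
tpl.templateContent = { type: "fragment", children: [] }; // assignment, not a new property
```

Assigning to an existing property does not add a property, which is what lets an engine keep one layout for both objects.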

Cut intermediate objects that were built and immediately discarded while constructing the AST (the output nodes themselves are irreducible):
- insertCharacters merges a run into the adjacent text sibling by appending the string directly, instead of always allocating a text node and letting insertAtPlace discard it on merge.
- sameAttrs compares attribute lists with a nested scan instead of building a Map (+ array) per formatting-element comparison.
- adjustForeignAttrs / adjustMathmlAttrs fork the attribute array lazily and reuse the original objects when nothing needs adjusting, instead of mapping to a fresh array + object per attribute on every foreign element (~11% faster building SVG-heavy input).

No behaviour change; full html5lib suite green.
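
The nested-scan comparison and the lazy fork could look roughly like this. Both functions are simplified sketches: this `sameAttrs` ignores namespaces and assumes attribute names are unique within a list, and `adjustAttrs` with its `renames` map is a hypothetical stand-in for the real adjustForeignAttrs / adjustMathmlAttrs.

```javascript
// Allocation-free comparison: with equal lengths and unique names,
// "every attribute of a is in b" implies the lists hold the same set.
const sameAttrs = (a, b) => {
	if (a.length !== b.length) return false;
	for (let i = 0; i < a.length; i++) {
		const attr = a[i];
		let found = false;
		for (let j = 0; j < b.length; j++) {
			const other = b[j];
			if (other.name === attr.name && other.value === attr.value) {
				found = true;
				break;
			}
		}
		if (!found) return false;
	}
	return true;
};

// Lazy fork: return the input array untouched unless an attribute
// actually needs renaming; copy the array only on the first change.
const adjustAttrs = (attrs, renames) => {
	let out = attrs;
	for (let i = 0; i < attrs.length; i++) {
		const replacement = renames.get(attrs[i].name);
		if (replacement === undefined) continue;
		if (out === attrs) out = attrs.slice();
		out[i] = { ...attrs[i], name: replacement };
	}
	return out;
};

// Illustrative rename table (one of the SVG attribute-case adjustments).
const SVG_RENAMES = new Map([["viewbox", "viewBox"]]);
```

Attribute lists are typically short, so the quadratic scan tends to be cheaper than building and discarding a Map per comparison.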

…ments Most elements have no attributes, yet each was given its own empty attributes array (plus a fresh empty pendingAttrs buffer per tag). Only <html>/<body> ever receive merged attributes and are always built with their own mutable array, so every other attributeless element can share one frozen EMPTY_ATTRS. The tokenizer callbacks now reuse the empty pendingAttrs buffer instead of reallocating it, and synthesized elements pass EMPTY_ATTRS. ~12% fewer minor GCs on attributeless-heavy input (tables/lists/formatted text). No behaviour change; html5lib + html configCases green.
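
A sketch of the shared empty list, assuming a hypothetical `attrsFor` helper that decides which array an element receives; the real code paths in buildHtmlAst are organised differently.

```javascript
// One frozen empty array shared by every attributeless element.
/** @type {ReadonlyArray<{ name: string, value: string }>} */
const EMPTY_ATTRS = Object.freeze([]);

// Hypothetical helper: pick the attribute list for a new element.
const attrsFor = (tagName, pendingAttrs) => {
	if (pendingAttrs.length > 0) return pendingAttrs.slice();
	// <html>/<body> can receive merged attributes later, so they keep
	// a private mutable array even when they start out empty.
	if (tagName === "html" || tagName === "body") return [];
	return EMPTY_ATTRS;
};
```

Freezing the shared array turns an accidental write into an immediate `TypeError` rather than a silent change visible on every attributeless element.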

…lookup process() ran modes[mode](t) per token — a megamorphic keyed load over the ~21 insertion-mode strings. Route the four dispatch sites through a runMode() switch (cases ordered by frequency, default falling back to the keyed load), turning the hot per-token dispatch into monomorphic direct calls. ~3-4% faster building a large realistic document; no behaviour change (full html5lib + html configCases green).
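
The dispatch change can be sketched as below. The handler bodies are placeholders, and only a few of the insertion modes are shown; which modes are actually hottest is an assumption here.

```javascript
// Placeholder insertion-mode handlers.
const inBody = (t) => `inBody:${t.type}`;
const text = (t) => `text:${t.type}`;
const inTable = (t) => `inTable:${t.type}`;
const afterBody = (t) => `afterBody:${t.type}`;

// The keyed table stays available for the rare modes.
const modes = { inBody, text, inTable, afterBody };

// Hot modes become direct calls at fixed call sites; anything not
// listed falls back to the original keyed load.
const runMode = (mode, t) => {
	switch (mode) {
		case "inBody":
			return inBody(t);
		case "text":
			return text(t);
		case "inTable":
			return inTable(t);
		default:
			return modes[mode](t);
	}
};
```

Each `case` calls one fixed function, so every call site inside the switch sees a single target, while the default branch preserves behaviour for modes not listed.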

contentModeAfterOpenTag ran the isForeign callback (which calls adjustedCurrent) after every open tag, but isForeign only ever vetoes a switch *into* a special content mode — it can't turn a data-state tag into a special one. Resolve the (allocation-free) tag-name range first and only consult isForeign when the tag would actually enter RAWTEXT/RCDATA/script, so ordinary tags skip the callback. Behaviour identical; full html5lib suite green.
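
The reordering amounts to checking the cheap condition first. In this sketch the function takes an already-lowercased tag name for brevity, whereas the real contentModeAfterOpenTag works on the raw tag-name range; the mode strings and the `SPECIAL_MODES` map are illustrative.

```javascript
// Illustrative subset of tags that switch the tokenizer's content mode.
const SPECIAL_MODES = new Map([
	["script", "script-data"],
	["style", "rawtext"],
	["textarea", "rcdata"],
	["title", "rcdata"]
]);

// Resolve the tag first; consult the (more expensive) isForeign
// callback only when the tag could actually leave the data state.
const contentModeAfterOpenTag = (tagName, isForeign) => {
	const mode = SPECIAL_MODES.get(tagName);
	if (mode === undefined) return "data"; // ordinary tag: callback skipped
	return isForeign() ? "data" : mode; // foreign content vetoes the switch
};
```

Since `isForeign` can only veto a switch, skipping it for ordinary tags cannot change the result.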

changeset-bot · 2026-06-08T15:39:52Z

🦋 Changeset detected

Latest commit: 3af9ea1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
webpack	Patch

github-actions · 2026-06-08T15:43:42Z

This PR is packaged and the instant preview is available (cd45931).

Install it locally:

npm

npm i -D webpack@https://pkg.pr.new/webpack@cd45931

yarn

yarn add -D webpack@https://pkg.pr.new/webpack@cd45931

pnpm

pnpm add -D webpack@https://pkg.pr.new/webpack@cd45931

codecov · 2026-06-08T15:45:02Z

Codecov Report

❌ Patch coverage is 98.89381% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.32%. Comparing base (5e599a1) to head (3af9ea1).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
lib/html/buildHtmlAst.js	98.95%	3 Missing ⚠️
lib/html/walkHtmlTokens.js	98.78%	2 Missing ⚠️

❌ Your patch check has failed because the patch coverage (84.07%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #21130      +/-   ##
==========================================
+ Coverage   92.17%   92.32%   +0.14%     
==========================================
  Files         581      581              
  Lines       62946    63179     +233     
  Branches    17422    17467      +45     
==========================================
+ Hits        58023    58331     +308     
+ Misses       4923     4848      -75

Flag	Coverage Δ
css-parsing	`28.69% <ø> (+<0.01%)`	⬆️
html5lib	`31.07% <98.67%> (+0.30%)`	⬆️
integration	`88.49% <66.81%> (+0.05%)`	⬆️
test262	`45.30% <ø> (?)`
unit	`41.11% <78.53%> (+0.16%)`	⬆️

codspeed-hq · 2026-06-08T15:47:37Z

Merging this PR will improve performance by 97.69%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

⚡ 3 improved benchmarks
✅ 123 untouched benchmarks
⏩ 18 skipped benchmarks¹

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	Memory	`benchmark "asset-modules-bytes", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}'`	858.9 KB	320.5 KB	×2.7
⚡	Memory	`benchmark "react", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}'`	332.6 KB	156.8 KB	×2.1
⚡	Memory	`benchmark "side-effects-reexport", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}'`	1,186.9 KB	873.1 KB	+35.95%

Comparing perf/html-parser-optimizations (3af9ea1) with main (d39efba)

¹ 18 benchmarks were skipped, so the baseline results were used instead.

github-actions · 2026-06-08T15:48:01Z

Types Coverage

Coverage after merging perf/html-parser-optimizations into main will be 99.00%.

Coverage Report

File	Stmts	Branches	Funcs	Lines	Uncovered Lines
bin
webpack.js	98.77%	100%	100%	98.77%	91
examples
build-common.js	100%	100%	100%	100%
buildAll.js	100%	100%	100%	100%
examples.js	100%	100%	100%	100%
template-common.js	98.21%	100%	100%	98.21%	72
examples/custom-javascript-parser
test.filter.js	100%	100%	100%	100%
examples/custom-javascript-parser/internals
acorn-parse.js	100%	100%	100%	100%
meriyah-parse.js	100%	100%	100%	100%
oxc-parse.js	91.30%	100%	100%	91.30%	140, 142–143, 145, 147, 153–154, 161, 168, 90
examples/markdown
webpack.config.mjs	100%	100%	100%	100%
examples/typescript
test.filter.js	100%	100%	100%	100%
examples/typescript-non-erasable
test.filter.js	50%	100%	100%	50%	5
examples/virtual-modules
test.filter.js	100%	100%	100%	100%
examples/wasm-bindgen-esm
test.filter.js	100%	100%	100%	100%
examples/wasm-complex
test.filter.js	100%	100%	100%	100%
examples/wasm-simple
test.filter.js	100%	100%	100%	100%
examples/wasm-simple-source-phase
test.filter.js	100%	100%	100%	100%
lib
APIPlugin.js	100%	100%	100%	100%
AsyncDependenciesBlock.js	100%	100%	100%	100%
AutomaticPrefetchPlugin.js	100%	100%	100%	100%
BannerPlugin.js	100%	100%	100%	100%
Cache.js	98.21%	100%	100%	98.21%	101
CacheFacade.js	100%	100%	100%	100%
Chunk.js	99.72%	100%	100%	99.72%	39
ChunkGraph.js	100%	100%	100%	100%
ChunkGroup.js	100%	100%	100%	100%
ChunkTemplate.js	100%	100%	100%	100%
CleanPlugin.js	99.15%	100%	100%	99.15%	206, 226
CodeGenerationResults.js	100%	100%	100%	100%
CompatibilityPlugin.js	100%	100%	100%	100%
Compilation.js	98.49%	100%	100%	98.49%	1577, 1873, 1880, 1888, 1910, 2806, 3249, 3924, 3954, 4007–4008, 4012, 4017, 4033–4034, 4048–4049, 4054–4055, 4532, 4558, 512, 517, 5366, 5398, 5415, 5431, 5447, 5462, 5487–5488, 5490, 5818, 5823, 5829, 5832, 5844, 5846, 5850, 5866, 5881, 5913, 5967, 5991, 6105, 731–732
Compiler.js	99.56%	100%	100%	99.56%	1135–1136, 1144
ConcatenationScope.js	98.59%	100%	100%	98.59%	189
ConditionalInitFragment.js	100%	100%	100%	100%
ConstPlugin.js	100%	100%	100%	100%
ContextExclusionPlugin.js	100%	100%	100%	100%
ContextModule.js	100%	100%	100%	100%
ContextModuleFactory.js	97.40%	100%	100%	97.40%	258, 395, 418, 420, 424, 433–434
ContextReplacementPlugin.js	100%	100%	100%	100%
DefinePlugin.js	99%	100%	100%	99%	170–171, 187, 206, 280
DependenciesBlock.js	100%	100%	100%	100%
Dependency.js	98.15%	100%	100%	98.15%	379, 425
DependencyTemplate.js	100%	100%	100%	100%
DependencyTemplates.js	100%	100%	100%	100%
DotenvPlugin.js	98.41%	100%	100%	98.41%	378, 391–392
DynamicEntryPlugin.js	100%	100%	100%	100%
EntryOptionPlugin.js	100%	100%	100%	100%
EntryPlugin.js	100%	100%	100%	100%
Entrypoint.js	100%	100%	100%	100%
EnvironmentPlugin.js	97.14%	100%	100%	97.14%	49
ErrorHelpers.js	100%	100%	100%	100%
EvalDevToolModulePlugin.js	100%	100%	100%	100%
EvalSourceMapDevToolPlugin.js	100%	100%	100%	100%
ExportsInfo.js	100%	100%	100%	100%
ExportsInfoApiPlugin.js	100%	100%	100%	100%
ExternalModule.js	98.97%	100%	100%	98.97%	425–429, 577
ExternalModuleFactoryPlugin.js	100%	100%	100%	100%
ExternalsPlugin.js	100%	100%	100%	100%
FileSystemInfo.js	99.50%	100%	100%	99.50%	182, 2252–2253, 2256, 2267, 2278, 2289, 278, 3693, 3708, 3732
FlagAllModulesAsUsedPlugin.js	100%	100%	100%	100%
FlagDependencyExportsPlugin.js	98.85%	100%	100%	98.85%	434, 436, 440
FlagDependencyUsagePlugin.js	100%	100%	100%	100%
FlagEntryExportAsUsedPlugin.js	100%	100%	100%	100%
Generator.js	100%	100%	100%	100%
HotModuleReplacementPlugin.js	100%	100%	100%	100%
HotUpdateChunk.js	100%	100%	100%	100%
IgnorePlugin.js	100%	100%	100%	100%
IgnoreWarningsPlugin.js	100%	100%	100%	100%
InitFragment.js	100%	100%	100%	100%
JavascriptMetaInfoPlugin.js	100%	100%	100%	100%
LibraryTemplatePlugin.js	100%	100%	100%	100%
LoaderOptionsPlugin.js	100%	100%	100%	100%
LoaderTargetPlugin.js	100%	100%	100%	100%
MainTemplate.js	100%	100%	100%	100%
ManifestPlugin.js	100%	100%	100%	100%
Module.js	98.50%	100%	100%	98.50%	1312, 1317, 1377, 1391, 1453, 1462
ModuleFactory.js	100%	100%	100%	100%
ModuleFilenameHelpers.js	98.85%	100%	100%	98.85%	106, 108
ModuleGraph.js	99.73%	100%	100%	99.73%	1005
ModuleGraphConnection.js	100%	100%	100%	100%
ModuleInfoHeaderPlugin.js	100%	100%	100%	100%
ModuleNotFoundError.js	100%	100%	100%	100%
ModuleProfile.js	100%	100%	100%	100%
ModuleSourceTypeConstants.js	100%	100%	100%	100%
ModuleTemplate.js	100%	100%	100%	100%
ModuleTypeConstants.js	100%	100%	100%	100%
MultiCompiler.js	99.69%	100%	100%	99.69%	659
MultiStats.js	100%	100%	100%	100%
MultiWatching.js	100%	100%	100%	100%
NoEmitOnErrorsPlugin.js	100%	100%	100%	100%
NodeStuffPlugin.js	100%	100%	100%	100%
NormalModule.js	98.15%	100%	100%	98.15%	1212, 1215, 1232, 1249, 1496, 1530, 1546, 1633, 2288, 2293–2303, 569
NormalModuleFactory.js	99.47%	100%	100%	99.47%	1083, 1392, 486, 498
NormalModuleReplacementPlugin.js	100%	100%	100%	100%
NullFactory.js	100%	100%	100%	100%
OptimizationStages.js	100%	100%	100%	100%
OptionsApply.js	100%	100%	100%	100%
Parser.js	100%	100%	100%	100%
PlatformPlugin.js	100%	100%	100%	100%
PrefetchPlugin.js	100%	100%	100%	100%
ProgressPlugin.js	98.85%	100%	100%	98.85%	519–520, 525, 527, 591
ProvidePlugin.js	100%	100%	100%	100%
RawModule.js	100%	100%	100%	100%
RecordIdsPlugin.js	100%	100%	100%	100%
RequestShortener.js	100%	100%	100%	100%
ResolverFactory.js	100%	100%	100%	100%
RuntimeGlobals.js	100%	100%	100%	100%
RuntimeModule.js	100%	100%	100%	100%
RuntimePlugin.js	100%	100%	100%	100%
RuntimeTemplate.js	100%	100%	100%	100%
SelfModuleFactory.js	100%	100%	100%	100%
SingleEntryPlugin.js	100%	100%	100%	100%
SourceMapDevToolModuleOptionsPlugin.js	100%	100%	100%	100%
SourceMapDevToolPlugin.js	98.62%	100%	100%	98.62%	220, 224, 226, 419, 430, 891
Stats.js	100%	100%	100%	100%
Template.js	100%	100%	100%	100%
TemplatedPathPlugin.js	99.13%	100%	100%	99.13%	176–177
UseStrictPlugin.js	100%	100%	100%	100%
WarnCaseSensitiveModulesPlugin.js	100%	100%	100%	100%
WarnDeprecatedOptionPlugin.js	100%	100%	100%	100%
WarnNoModeSetPlugin.js	100%	100%	100%	100%
WatchIgnorePlugin.js	100%	100%	100%	100%
Watching.js	100%	100%	100%	100%
WebpackError.js	100%	100%	100%	100%
WebpackIsIncludedPlugin.js	100%	100%	100%	100%
WebpackOptionsApply.js	100%	100%	100%	100%
WebpackOptionsDefaulter.js	100%	100%	100%	100%
buildChunkGraph.js	99.87%	100%	100%	99.87%	326
cli.js	98.62%	100%	100%	98.62%	10, 119, 545, 577, 627, 897
index.js	99.72%	100%	100%	99.72%	165
validateSchema.js	94.67%	100%	100%	94.67%	100, 87, 89, 98
webpack.js	96.33%	100%	100%	96.33%	10, 198, 220, 222
lib/asset
AssetBytesGenerator.js	100%	100%	100%	100%
AssetBytesParser.js	100%	100%	100%	100%
AssetGenerator.js	100%	100%	100%	100%
AssetModulesPlugin.js	97.32%	100%	100%	97.32%	283, 307, 310, 36, 362, 41
AssetParser.js	100%	100%	100%	100%
AssetSourceGenerator.js	100%	100%	100%	100%
AssetSourceParser.js	100%	100%	100%	100%
RawDataUrlModule.js	100%	100%	100%	100%
lib/async-modules
AsyncModuleHelpers.js	100%	100%	100%	100%
AwaitDependenciesInitFragment.js	100%	100%	100%	100%
InferAsyncModulesPlugin.js	100%	100%	100%	100%
lib/cache
AddBuildDependenciesPlugin.js	100%	100%	100%	100%
AddManagedPathsPlugin.js	100%	100%	100%	100%
IdleFileCachePlugin.js	97.92%	100%	100%	97.92%	71, 83, 91
MemoryCachePlugin.js	95.83%	100%	100%	95.83%	33
MemoryWithGcCachePlugin.js	93.15%	100%	100%	93.15%	106, 113–114, 122, 89
PackFileCacheStrategy.js	96.40%	100%	100%	96.40%	1250, 1350, 1354, 1416, 628, 647, 657–659, 661, 677–678, 683, 686, 688, 693, 698, 722, 728, 762, 768, 774, 779, 790, 799, 804–805, 807, 824, 830–831, 833
ResolverCachePlugin.js	100%	100%	100%	100%
getLazyHashedEtag.js	100%	100%	100%	100%
mergeEtags.js	100%	100%	100%	100%
lib/config
browserslistTargetHandler.js	100%	100%	100%	100%
defaults.js	99.30%	100%	100%	99.30%	1428–1430, 1438, 273, 276, 281, 285
normalization.js	99.01%	100%	100%	99.01%	191–192, 258, 273
target.js	100%	100%	100%	100%
lib/container
ContainerEntryDependency.js	100%	100%	100%	100%
ContainerEntryModule.js	100%	100%	100%	100%
ContainerEntryModuleFactory.js	100%	100%	100%	100%
ContainerExposedDependency.js	100%	100%	100%	100%
ContainerPlugin.js	100%	100%	100%	100%
ContainerReferencePlugin.js	100%	100%	100%	100%
FallbackDependency.js	100%	100%	100%	100%

alexander-akait added 10 commits June 8, 2026 13:35

chore: broaden HTML parser perf changeset description

4558afc

fix: cast EMPTY_ATTRS through unknown for tsc

3af9ea1

Object.freeze([]) is readonly never[], which TS won't narrow directly to the mutable HtmlAttribute[]; cast through unknown (lint:types only runs in CI, not the pre-commit hooks).

alexander-akait merged commit cd45931 into main Jun 8, 2026
63 of 66 checks passed

alexander-akait deleted the perf/html-parser-optimizations branch June 8, 2026 17:23

github-actions Bot mentioned this pull request Jun 8, 2026

chore(release): new release #21037

Merged

alexander-akait merged 11 commits into main from perf/html-parser-optimizations
