fix(html): report missing-semicolon-after-character-reference for named references by alexander-akait · Pull Request #21102 · webpack/webpack

alexander-akait · 2026-06-04T14:40:07Z

The named character reference state matched legacy bare-form entities
(e.g. &amp, &copy) without emitting the WHATWG
missing-semicolon-after-character-reference parse error, even though the
numeric reference path already does. Emit it for named references too,
honoring the spec's historical attribute rule (no error when consumed in
an attribute value and followed by = or an ASCII alphanumeric).
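
The rule described above can be sketched roughly as follows. This is a minimal illustration of the WHATWG behaviour, not the code in lib/html/walkHtmlTokens.js; the function name `namedReferenceOutcome`, its parameters, and its return shape are hypothetical.

```javascript
// Hypothetical helper: decides what happens once a named reference such as
// "&amp" or "&copy;" has been matched. Names and shape are illustrative only.
const isAsciiAlphanumeric = (ch) => /^[0-9A-Za-z]$/.test(ch);

function namedReferenceOutcome(matched, nextChar, inAttribute) {
	const endsWithSemicolon = matched.endsWith(";");
	// Historical rule: inside an attribute value, a bare match followed by
	// "=" or an ASCII alphanumeric stays literal text and reports no error.
	if (
		inAttribute &&
		!endsWithSemicolon &&
		(nextChar === "=" || isAsciiAlphanumeric(nextChar))
	) {
		return { decoded: false, error: null };
	}
	return {
		decoded: true,
		error: endsWithSemicolon
			? null
			: "missing-semicolon-after-character-reference"
	};
}
```

Under these assumptions, `&copy` followed by a space in text content reports the error, while `&copy` followed by `=` inside an attribute value is left alone.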

changeset-bot · 2026-06-04T14:40:18Z

🦋 Changeset detected

Latest commit: a929f06

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
webpack	Patch


github-actions · 2026-06-04T14:40:59Z

This PR is packaged and the instant preview is available (028c549).

Install it locally:

npm

npm i -D webpack@https://pkg.pr.new/webpack@028c549

yarn

yarn add -D webpack@https://pkg.pr.new/webpack@028c549

pnpm

pnpm add -D webpack@https://pkg.pr.new/webpack@028c549

codecov · 2026-06-04T14:42:01Z

Codecov Report

❌ Patch coverage is 92.80576% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.95%. Comparing base (faee810) to head (a929f06).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
lib/html/walkHtmlTokens.js	92.59%	10 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #21102      +/-   ##
==========================================
- Coverage   91.96%   91.95%   -0.01%     
==========================================
  Files         581      581              
  Lines       61259    61380     +121     
  Branches    16700    16766      +66     
==========================================
+ Hits        56335    56444     +109     
- Misses       4924     4936      +12

Flag	Coverage Δ
css-parsing	`28.71% <100.00%> (?)`
html5lib	`27.87% <79.85%> (?)`
integration	`89.44% <58.27%> (-0.11%)`	⬇️
test262	`45.29% <0.00%> (-0.01%)`	⬇️
unit	`39.60% <89.92%> (+0.11%)`	⬆️

codspeed-hq · 2026-06-04T14:44:14Z

Merging this PR will improve performance by ×2.1

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 142 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Memory	`benchmark "future-defaults", scenario '{"name":"mode-production","mode":"production"}'`	8.6 MB	11 MB	-21.79%
⚡	Memory	`benchmark "asset-modules-inline", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}'`	1,232.8 KB	216.2 KB	×5.7

Comparing claude/walkHtmlTokens-spec-review-XTMUy (a929f06) with main (faee810)

…ed references

The named character reference state matched legacy bare-form entities (e.g. `&amp`, `&copy`) without emitting the WHATWG missing-semicolon-after-character-reference parse error, even though the numeric reference path already does. Emit it for named references too, honoring the spec's historical attribute rule (no error when consumed in an attribute value and followed by `=` or an ASCII alphanumeric).

Close the remaining parse-error gaps so the tokenizer fully matches the WHATWG spec within the offset-scanner architecture:

- unexpected-null-character across all 25 states that define it (the DOCTYPE states already had the branch but never reported it).
- unexpected-character-in-attribute-name (double quote, apostrophe, <).
- unexpected-character-in-unquoted-attribute-value (quote, apostrophe, <, =, backtick).
- Numeric character reference validation (null-character-reference, character-reference-outside-unicode-range, surrogate-character-reference, noncharacter-character-reference, control-character-reference) by accumulating the code point during the hex/decimal states.

duplicate-attribute and cdata-in-html-content remain unreported by design (they need per-tag state / tree-construction context the scanner does not keep); documented inline. Token offsets are unchanged.
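
As a rough illustration of the numeric character reference validation mentioned above, the sketch below accumulates a code point digit by digit and then classifies it using the WHATWG error codes. The function names `accumulate` and `numericReferenceError` are hypothetical, and the cap on the accumulated value is an assumption made here to keep long digit runs from overflowing; the actual scanner may structure this differently.

```javascript
// Hypothetical: accumulate the digits of "&#...;" or "&#x...;" into a code
// point, capping the value just above the Unicode range.
function accumulate(digits, radix) {
	let codePoint = 0;
	for (const d of digits) {
		codePoint = Math.min(codePoint * radix + parseInt(d, radix), 0x110000);
	}
	return codePoint;
}

// Hypothetical: return the WHATWG parse-error code for a numeric reference,
// or null when the code point is acceptable.
function numericReferenceError(codePoint) {
	if (codePoint === 0x00) return "null-character-reference";
	if (codePoint > 0x10ffff) return "character-reference-outside-unicode-range";
	if (codePoint >= 0xd800 && codePoint <= 0xdfff) {
		return "surrogate-character-reference";
	}
	const isNoncharacter =
		(codePoint >= 0xfdd0 && codePoint <= 0xfdef) ||
		(codePoint & 0xfffe) === 0xfffe;
	if (isNoncharacter) return "noncharacter-character-reference";
	const isControl =
		codePoint <= 0x1f || (codePoint >= 0x7f && codePoint <= 0x9f);
	const isAsciiWhitespace =
		codePoint === 0x09 ||
		codePoint === 0x0a ||
		codePoint === 0x0c ||
		codePoint === 0x0d ||
		codePoint === 0x20;
	// U+000D is reported even though it is ASCII whitespace.
	if (codePoint === 0x0d || (isControl && !isAsciiWhitespace)) {
		return "control-character-reference";
	}
	return null;
}
```

For example, `numericReferenceError(accumulate("D800", 16))` yields the surrogate error, while a reference to U+000A yields no error because line feed is ASCII whitespace.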

…ce suite

Validated walkHtmlTokens against the official html5lib-tests tokenizer suite (6738 cases) and fixed every divergence:

- Restore isAsciiLowerAlpha (a prior edit dropped it, breaking script-data double-escape on ASCII-alpha input).
- Run numeric-reference-end validation (and absence-of-digits / missing-semicolon) when a character reference ends exactly at EOF; previously the loop exited before the end state ran.
- Do not report eof-in-doctype for EOF in a bogus DOCTYPE (spec emits the token with no error, like bogus comments).
- EOF right after `<!` is incorrectly-opened-comment, not eof-in-comment.
- Treat CR as whitespace to emulate the spec's CR->LF input-stream preprocessing (the scanner keeps original offsets).
- Reconsume (not consume) in comment-end-dash / comment-end / comment-end-bang so NULL and `<` are handled by the comment state.
- Report end-tag-with-trailing-solidus for self-closing end tags.

Result: 6738/6738 conformance cases match, excluding only the documented offset-scanner omissions (duplicate-attribute, cdata-in-html-content, *-in-input-stream). Token offsets unchanged.

Add the official html5lib-tests tokenizer suite as a git submodule (test/html5lib-tests, like test262-cases) with a runner (test/html5lib.spectest.js, `yarn test:html5lib`) that checks every case's parse-error codes and input roundtrip against walkHtmlTokens. Running the suite uncovered a real bug: RCDATA (title/textarea) must process character references, but STATE_RCDATA did not handle `&`, so entity parse errors inside those elements were never reported. Fixed (offset output is unchanged; references stay within the text span). All cases pass except one documented, unit-tested deliberate deviation (partial tag emitted at EOF) and the parse errors the offset scanner intentionally omits (duplicate-attribute, cdata-in-html-content, *-in-input-stream).

Add an `html5lib` CI job (needs: basic, submodules: true) that runs `yarn cover:html5lib`, and narrow `test:test262`/`cover:test262` to test262.spectest.js so the two conformance suites run in their own jobs instead of the test262 job globbing every *.spectest.js.

Four CSS Syntax tokenizer bugs surfaced by the css-parsing-tests corpus:

- A literal U+0080 looped forever: isIdentStartCodePoint used >= 0x80 but the internal _isIdentStartCodePointCC used > 0x80, so the dispatch entered ident consumption that then consumed zero code points.
- A backslash at EOF inside url(...) looped forever: consumeAnEscapedCodePoint advanced past EOF, so the url loop's end-of-input guard never matched.
- An unterminated comment at EOF was dropped (bytes lost from the token stream); now the comment token is emitted to EOF.
- A string with a trailing backslash at EOF was dropped; now the string token is emitted to EOF.

Added regression unit tests for each in walkCssTokens.unittest.js.
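
The U+0080 hang above comes from the dispatch and the consumer disagreeing on one boundary value. A minimal sketch of the safer shape, assuming a much simplified scanner (no escapes, numbers, or other token types) with hypothetical names `isIdentStart`, `consumeIdentRun`, and `scan`: both sides share one predicate, and the loop refuses to continue without progress.

```javascript
// Hypothetical predicates, loosely following CSS Syntax Level 3, where a
// non-ASCII code point is anything at or above U+0080.
const isNonAscii = (cc) => cc >= 0x80;
const isIdentStart = (cc) =>
	(cc >= 0x41 && cc <= 0x5a) || // A-Z
	(cc >= 0x61 && cc <= 0x7a) || // a-z
	cc === 0x5f || // _
	isNonAscii(cc);
const isIdentChar = (cc) =>
	isIdentStart(cc) || (cc >= 0x30 && cc <= 0x39) || cc === 0x2d;

// Consume an ident-like run starting at `pos`; returns the end offset.
function consumeIdentRun(input, pos) {
	let end = pos;
	while (end < input.length && isIdentChar(input.charCodeAt(end))) end++;
	return end;
}

// Split the input into [start, end) spans: ident runs or single code units.
function scan(input) {
	const spans = [];
	let pos = 0;
	while (pos < input.length) {
		const cc = input.charCodeAt(pos);
		const end = isIdentStart(cc) ? consumeIdentRun(input, pos) : pos + 1;
		// Guard: a zero-length token would otherwise loop forever.
		if (end <= pos) throw new Error(`scanner made no progress at ${pos}`);
		spans.push([pos, end]);
		pos = end;
	}
	return spans;
}
```

Because `consumeIdentRun` accepts everything `isIdentStart` accepts, a literal U+0080 is consumed as part of an ident run instead of producing an empty token.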

Add the official css-parsing-tests corpus as a git submodule (test/css-parsing-tests, like test262-cases / html5lib-tests) with a runner (test/cssParsing.spectest.js, `yarn test:css-parsing`) and a dedicated `css-parsing` CI job. The suite encodes an older CSS Syntax draft (combined match tokens, the removed <urange> token, NUL->U+FFFD preprocessing), so it is used as a large real-world/adversarial corpus rather than for AST equality: each input must round-trip through the tokenizer and every entry point must terminate without throwing. This corpus surfaced the tokenizer fixes in the previous commit.
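
The round-trip requirement above can be pictured with a small check like the one below. It assumes, hypothetically, that a tokenizer run is summarized as an array of `[start, end)` offset pairs; the name `roundTrips` and that representation are illustrative and not taken from the runner.

```javascript
// Hypothetical check: `spans` are [start, end) offsets into `input`, in order.
// The input round-trips when the spans tile it exactly, with no gaps,
// overlaps, or empty tokens.
function roundTrips(input, spans) {
	let pos = 0;
	let rebuilt = "";
	for (const [start, end] of spans) {
		if (start !== pos || end <= start) return false;
		rebuilt += input.slice(start, end);
		pos = end;
	}
	return pos === input.length && rebuilt === input;
}
```

A check of this kind would flag bytes dropped from the token stream, such as a token that stops short of EOF: `roundTrips("/* x", [[0, 2]])` is false.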

Add webpack integration spectests that compile every html5lib-tests and css-parsing-tests input as an HTML/CSS entry (experiments.html/css, with url/import extraction disabled). This exercises the full pipeline — parse, AST, handle, generate — on the same adversarial corpora, asserting webpack never crashes/hangs and that any emitted error/warning is graceful, not an internal exception. Each corpus input is its own test (a plain `for` loop registers one `it` per input) for a granular report; the builds run once in beforeAll, batched into shared in-memory compilations (400 entries each). The two spectest files are self-contained and identical except for fixture loading. Run in the existing html5lib / css-parsing CI jobs via a `*.spectest.js` glob (test:html5lib / test:css-parsing).
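
The batching described above (many corpus inputs, one test per input, a few shared compilations) might look roughly like this. The helper names `toBatches` and `locate` are hypothetical, and the sketch leaves out the webpack and Jest wiring entirely; only the batch size of 400 comes from the commit message.

```javascript
// Hypothetical helper: split corpus inputs into fixed-size batches so each
// batch can be compiled once while every input keeps its own test case.
function toBatches(inputs, batchSize = 400) {
	const batches = [];
	for (let i = 0; i < inputs.length; i += batchSize) {
		batches.push(inputs.slice(i, i + batchSize));
	}
	return batches;
}

// Hypothetical helper: map an input's index back to its batch and its
// position within that batch, so a per-input test can find its result.
const locate = (index, batchSize = 400) => ({
	batch: Math.floor(index / batchSize),
	entry: index % batchSize
});
```

With 1000 inputs this gives three batches (400, 400, 200), and input 801 is found at entry 1 of batch 2.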

Remove the tokenizer-level spectests (html5lib.spectest.js, cssParsing.spectest.js); the html5lib-tests and css-parsing-tests corpora are exercised only through real webpack builds. Point the test:html5lib / test:css-parsing scripts at the remaining webpack spectests.

github-actions · 2026-06-05T11:23:32Z

Types Coverage

Coverage after merging claude/walkHtmlTokens-spec-review-XTMUy into main will be

98.99%

Coverage Report

File	Stmts	Branches	Funcs	Lines	Uncovered Lines
bin
webpack.js	98.77%	100%	100%	98.77%	91
examples
build-common.js	100%	100%	100%	100%
buildAll.js	100%	100%	100%	100%
examples.js	100%	100%	100%	100%
template-common.js	98.21%	100%	100%	98.21%	72
examples/custom-javascript-parser
test.filter.js	100%	100%	100%	100%
examples/custom-javascript-parser/internals
acorn-parse.js	100%	100%	100%	100%
meriyah-parse.js	100%	100%	100%	100%
oxc-parse.js	91.30%	100%	100%	91.30%	140, 142–143, 145, 147, 153–154, 161, 168, 90
examples/markdown
webpack.config.mjs	100%	100%	100%	100%
examples/typescript
test.filter.js	100%	100%	100%	100%
examples/typescript-non-erasable
test.filter.js	50%	100%	100%	50%	5
examples/virtual-modules
test.filter.js	100%	100%	100%	100%
examples/wasm-bindgen-esm
test.filter.js	100%	100%	100%	100%
examples/wasm-complex
test.filter.js	100%	100%	100%	100%
examples/wasm-simple
test.filter.js	100%	100%	100%	100%
examples/wasm-simple-source-phase
test.filter.js	100%	100%	100%	100%
lib
APIPlugin.js	100%	100%	100%	100%
AsyncDependenciesBlock.js	100%	100%	100%	100%
AutomaticPrefetchPlugin.js	100%	100%	100%	100%
BannerPlugin.js	100%	100%	100%	100%
Cache.js	98.21%	100%	100%	98.21%	101
CacheFacade.js	100%	100%	100%	100%
Chunk.js	99.72%	100%	100%	99.72%	39
ChunkGraph.js	100%	100%	100%	100%
ChunkGroup.js	100%	100%	100%	100%
ChunkTemplate.js	100%	100%	100%	100%
CleanPlugin.js	99.15%	100%	100%	99.15%	206, 226
CodeGenerationResults.js	100%	100%	100%	100%
CompatibilityPlugin.js	100%	100%	100%	100%
Compilation.js	98.48%	100%	100%	98.48%	1577, 1873, 1880, 1888, 1910, 2806, 3247, 3911, 3941, 3994–3995, 3999, 4004, 4020–4021, 4035–4036, 4041–4042, 4519, 4545, 512, 517, 5353, 5385, 5402, 5418, 5434, 5449, 5474–5475, 5477, 5805, 5810, 5816, 5819, 5831, 5833, 5837, 5853, 5868, 5900, 5954, 5978, 6092, 731–732
Compiler.js	99.56%	100%	100%	99.56%	1135–1136, 1144
ConcatenationScope.js	98.59%	100%	100%	98.59%	189
ConditionalInitFragment.js	100%	100%	100%	100%
ConstPlugin.js	100%	100%	100%	100%
ContextExclusionPlugin.js	100%	100%	100%	100%
ContextModule.js	100%	100%	100%	100%
ContextModuleFactory.js	97.40%	100%	100%	97.40%	258, 395, 418, 420, 424, 433–434
ContextReplacementPlugin.js	100%	100%	100%	100%
DefinePlugin.js	99%	100%	100%	99%	170–171, 187, 206, 280
DependenciesBlock.js	100%	100%	100%	100%
Dependency.js	98.20%	100%	100%	98.20%	384, 430
DependencyTemplate.js	100%	100%	100%	100%
DependencyTemplates.js	100%	100%	100%	100%
DotenvPlugin.js	98.41%	100%	100%	98.41%	378, 391–392
DynamicEntryPlugin.js	100%	100%	100%	100%
EntryOptionPlugin.js	100%	100%	100%	100%
EntryPlugin.js	100%	100%	100%	100%
Entrypoint.js	100%	100%	100%	100%
EnvironmentPlugin.js	97.14%	100%	100%	97.14%	49
ErrorHelpers.js	100%	100%	100%	100%
EvalDevToolModulePlugin.js	100%	100%	100%	100%
EvalSourceMapDevToolPlugin.js	100%	100%	100%	100%
ExportsInfo.js	100%	100%	100%	100%
ExportsInfoApiPlugin.js	100%	100%	100%	100%
ExternalModule.js	98.97%	100%	100%	98.97%	425–429, 577
ExternalModuleFactoryPlugin.js	100%	100%	100%	100%
ExternalsPlugin.js	100%	100%	100%	100%
FileSystemInfo.js	99.50%	100%	100%	99.50%	182, 2252–2253, 2256, 2267, 2278, 2289, 278, 3693, 3708, 3732
FlagAllModulesAsUsedPlugin.js	100%	100%	100%	100%
FlagDependencyExportsPlugin.js	98.85%	100%	100%	98.85%	434, 436, 440
FlagDependencyUsagePlugin.js	100%	100%	100%	100%
FlagEntryExportAsUsedPlugin.js	100%	100%	100%	100%
Generator.js	100%	100%	100%	100%
HotModuleReplacementPlugin.js	100%	100%	100%	100%
HotUpdateChunk.js	100%	100%	100%	100%
IgnorePlugin.js	100%	100%	100%	100%
IgnoreWarningsPlugin.js	100%	100%	100%	100%
InitFragment.js	100%	100%	100%	100%
JavascriptMetaInfoPlugin.js	100%	100%	100%	100%
LibraryTemplatePlugin.js	100%	100%	100%	100%
LoaderOptionsPlugin.js	100%	100%	100%	100%
LoaderTargetPlugin.js	100%	100%	100%	100%
MainTemplate.js	100%	100%	100%	100%
ManifestPlugin.js	100%	100%	100%	100%
Module.js	98.50%	100%	100%	98.50%	1312, 1317, 1377, 1391, 1453, 1462
ModuleFactory.js	100%	100%	100%	100%
ModuleFilenameHelpers.js	98.85%	100%	100%	98.85%	106, 108
ModuleGraph.js	99.73%	100%	100%	99.73%	1004
ModuleGraphConnection.js	100%	100%	100%	100%
ModuleInfoHeaderPlugin.js	100%	100%	100%	100%
ModuleNotFoundError.js	100%	100%	100%	100%
ModuleProfile.js	100%	100%	100%	100%
ModuleSourceTypeConstants.js	100%	100%	100%	100%
ModuleTemplate.js	100%	100%	100%	100%
ModuleTypeConstants.js	100%	100%	100%	100%
MultiCompiler.js	99.69%	100%	100%	99.69%	659
MultiStats.js	100%	100%	100%	100%
MultiWatching.js	100%	100%	100%	100%
NoEmitOnErrorsPlugin.js	100%	100%	100%	100%
NodeStuffPlugin.js	100%	100%	100%	100%
NormalModule.js	98.15%	100%	100%	98.15%	1212, 1215, 1232, 1249, 1496, 1530, 1546, 1633, 2288, 2293–2303, 569
NormalModuleFactory.js	99.47%	100%	100%	99.47%	1083, 1392, 486, 498
NormalModuleReplacementPlugin.js	100%	100%	100%	100%
NullFactory.js	100%	100%	100%	100%
OptimizationStages.js	100%	100%	100%	100%
OptionsApply.js	100%	100%	100%	100%
Parser.js	100%	100%	100%	100%
PlatformPlugin.js	100%	100%	100%	100%
PrefetchPlugin.js	100%	100%	100%	100%
ProgressPlugin.js	98.85%	100%	100%	98.85%	519–520, 525, 527, 591
ProvidePlugin.js	100%	100%	100%	100%
RawModule.js	100%	100%	100%	100%
RecordIdsPlugin.js	100%	100%	100%	100%
RequestShortener.js	100%	100%	100%	100%
ResolverFactory.js	100%	100%	100%	100%
RuntimeGlobals.js	100%	100%	100%	100%
RuntimeModule.js	100%	100%	100%	100%
RuntimePlugin.js	100%	100%	100%	100%
RuntimeTemplate.js	100%	100%	100%	100%
SelfModuleFactory.js	100%	100%	100%	100%
SingleEntryPlugin.js	100%	100%	100%	100%
SourceMapDevToolModuleOptionsPlugin.js	100%	100%	100%	100%
SourceMapDevToolPlugin.js	98.62%	100%	100%	98.62%	220, 224, 226, 419, 430, 891
Stats.js	100%	100%	100%	100%
Template.js	100%	100%	100%	100%
TemplatedPathPlugin.js	98.86%	100%	100%	98.86%	136–137
UseStrictPlugin.js	100%	100%	100%	100%
WarnCaseSensitiveModulesPlugin.js	100%	100%	100%	100%
WarnDeprecatedOptionPlugin.js	100%	100%	100%	100%
WarnNoModeSetPlugin.js	100%	100%	100%	100%
WatchIgnorePlugin.js	100%	100%	100%	100%
Watching.js	100%	100%	100%	100%
WebpackError.js	100%	100%	100%	100%
WebpackIsIncludedPlugin.js	100%	100%	100%	100%
WebpackOptionsApply.js	100%	100%	100%	100%
WebpackOptionsDefaulter.js	100%	100%	100%	100%
buildChunkGraph.js	99.87%	100%	100%	99.87%	326
cli.js	98.62%	100%	100%	98.62%	10, 119, 545, 577, 627, 897
index.js	99.72%	100%	100%	99.72%	165
validateSchema.js	94.67%	100%	100%	94.67%	100, 87, 89, 98
webpack.js	96.33%	100%	100%	96.33%	10, 198, 220, 222
lib/asset
AssetBytesGenerator.js	100%	100%	100%	100%
AssetBytesParser.js	100%	100%	100%	100%
AssetGenerator.js	100%	100%	100%	100%
AssetModulesPlugin.js	97.32%	100%	100%	97.32%	283, 307, 310, 36, 362, 41
AssetParser.js	100%	100%	100%	100%
AssetSourceGenerator.js	100%	100%	100%	100%
AssetSourceParser.js	100%	100%	100%	100%
RawDataUrlModule.js	100%	100%	100%	100%
lib/async-modules
AsyncModuleHelpers.js	100%	100%	100%	100%
AwaitDependenciesInitFragment.js	100%	100%	100%	100%
InferAsyncModulesPlugin.js	100%	100%	100%	100%
lib/cache
AddBuildDependenciesPlugin.js	100%	100%	100%	100%
AddManagedPathsPlugin.js	100%	100%	100%	100%
IdleFileCachePlugin.js	97.92%	100%	100%	97.92%	71, 83, 91
MemoryCachePlugin.js	95.83%	100%	100%	95.83%	33
MemoryWithGcCachePlugin.js	93.15%	100%	100%	93.15%	106, 113–114, 122, 89
PackFileCacheStrategy.js	96.40%	100%	100%	96.40%	1250, 1350, 1354, 1416, 628, 647, 657–659, 661, 677–678, 683, 686, 688, 693, 698, 722, 728, 762, 768, 774, 779, 790, 799, 804–805, 807, 824, 830–831, 833
ResolverCachePlugin.js	100%	100%	100%	100%
getLazyHashedEtag.js	100%	100%	100%	100%
mergeEtags.js	100%	100%	100%	100%
lib/config
browserslistTargetHandler.js	100%	100%	100%	100%
defaults.js	99.30%	100%	100%	99.30%	1428–1430, 1438, 273, 276, 281, 285
normalization.js	99.01%	100%	100%	99.01%	191–192, 258, 273
target.js	100%	100%	100%	100%
lib/container
ContainerEntryDependency.js	100%	100%	100%	100%
ContainerEntryModule.js	100%	100%	100%	100%
ContainerEntryModuleFactory.js	100%	100%	100%	100%
ContainerExposedDependency.js	100%	100%	100%	100%
ContainerPlugin.js	100%	100%	100%	100%
ContainerReferencePlugin.js	100%	100%	100%	100%
FallbackDependency.js	100%	100%	100%	100%

alexander-akait force-pushed the claude/walkHtmlTokens-spec-review-XTMUy branch 7 times, most recently from 51df90b to b8d1837 · June 5, 2026 09:40

alexander-akait added 10 commits June 5, 2026 11:17

style: condense multi-line comments to single lines

d69375b

alexander-akait force-pushed the claude/walkHtmlTokens-spec-review-XTMUy branch from b8d1837 to a929f06 · June 5, 2026 11:22

alexander-akait merged commit 028c549 into main Jun 5, 2026
63 of 66 checks passed

alexander-akait deleted the claude/walkHtmlTokens-spec-review-XTMUy branch June 5, 2026 13:17

github-actions Bot mentioned this pull request Jun 5, 2026

chore(release): new release #21037

Merged
