
fix(html): report missing-semicolon-after-character-reference for named references #21102

Merged
alexander-akait merged 10 commits into main from claude/walkHtmlTokens-spec-review-XTMUy on Jun 5, 2026
Conversation

@alexander-akait

Member

The named character reference state matched legacy bare-form entities
(e.g. `&amp`, `&copy`) without emitting the WHATWG
missing-semicolon-after-character-reference parse error, even though the
numeric reference path already does. Emit it for named references too,
honoring the spec's historical attribute rule (no error when consumed in
an attribute value and followed by `=` or an ASCII alphanumeric).
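
A minimal sketch of the decision described above. The helper name `shouldReportMissingSemicolon` and its parameters are assumptions made for illustration; they are not taken from `lib/html/walkHtmlTokens.js`, which may structure this check differently.

```javascript
// Hypothetical helper (not webpack's actual code): decides whether a matched
// named reference should report missing-semicolon-after-character-reference.
const isAsciiAlphanumeric = (cc) =>
	(cc >= 0x30 && cc <= 0x39) || // 0-9
	(cc >= 0x41 && cc <= 0x5a) || // A-Z
	(cc >= 0x61 && cc <= 0x7a); // a-z

/**
 * @param {string} input full source text
 * @param {number} end offset just past the matched reference name
 * @param {boolean} inAttribute true when consumed inside an attribute value
 * @returns {boolean} true when the parse error should be reported
 */
function shouldReportMissingSemicolon(input, end, inAttribute) {
	// A match that ends in ";" is well-formed.
	if (input.charCodeAt(end - 1) === 0x3b) return false;
	if (inAttribute && end < input.length) {
		const next = input.charCodeAt(end);
		// Historical attribute rule: "&copy=" or "&copyx" inside an attribute
		// value is left as plain text, so no error is reported.
		if (next === 0x3d || isAsciiAlphanumeric(next)) return false;
	}
	return true;
}
```

Under these assumptions, `&amp` in text content reports the error, while `?a=1&copy=2` inside an attribute value does not.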

@changeset-bot

changeset-bot Bot commented Jun 4, 2026


🦋 Changeset detected

Latest commit: a929f06

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| webpack | Patch |


@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

This PR is packaged and the instant preview is available (028c549).

Install it locally:

  • npm
npm i -D webpack@https://pkg.pr.new/webpack@028c549
  • yarn
yarn add -D webpack@https://pkg.pr.new/webpack@028c549
  • pnpm
pnpm add -D webpack@https://pkg.pr.new/webpack@028c549

@codecov

codecov Bot commented Jun 4, 2026


Codecov Report

❌ Patch coverage is 92.80576% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.95%. Comparing base (faee810) to head (a929f06).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lib/html/walkHtmlTokens.js | 92.59% | 10 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21102      +/-   ##
==========================================
- Coverage   91.96%   91.95%   -0.01%     
==========================================
  Files         581      581              
  Lines       61259    61380     +121     
  Branches    16700    16766      +66     
==========================================
+ Hits        56335    56444     +109     
- Misses       4924     4936      +12     
| Flag | Coverage Δ |
| --- | --- |
| css-parsing | 28.71% <100.00%> (?) |
| html5lib | 27.87% <79.85%> (?) |
| integration | 89.44% <58.27%> (-0.11%) ⬇️ |
| test262 | 45.29% <0.00%> (-0.01%) ⬇️ |
| unit | 39.60% <89.92%> (+0.11%) ⬆️ |


@codspeed-hq

codspeed-hq Bot commented Jun 4, 2026


Merging this PR will improve performance by ×2.1

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.


⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 142 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Memory | benchmark "future-defaults", scenario '{"name":"mode-production","mode":"production"}' | 8.6 MB | 11 MB | -21.79% |
| Memory | benchmark "asset-modules-inline", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 1,232.8 KB | 216.2 KB | ×5.7 |


Comparing claude/walkHtmlTokens-spec-review-XTMUy (a929f06) with main (faee810)


@alexander-akait alexander-akait force-pushed the claude/walkHtmlTokens-spec-review-XTMUy branch 7 times, most recently from 51df90b to b8d1837 Compare June 5, 2026 09:40
…ed references

The named character reference state matched legacy bare-form entities
(e.g. `&amp`, `&copy`) without emitting the WHATWG
missing-semicolon-after-character-reference parse error, even though the
numeric reference path already does. Emit it for named references too,
honoring the spec's historical attribute rule (no error when consumed in
an attribute value and followed by `=` or an ASCII alphanumeric).

Close the remaining parse-error gaps so the tokenizer fully matches the
WHATWG spec within the offset-scanner architecture:

- unexpected-null-character across all 25 states that define it (the
  DOCTYPE states already had the branch but never reported it).
- unexpected-character-in-attribute-name (double quote, apostrophe, <).
- unexpected-character-in-unquoted-attribute-value (quote, apostrophe,
  <, =, backtick).
- Numeric character reference validation (null-character-reference,
  character-reference-outside-unicode-range, surrogate-character-reference,
  noncharacter-character-reference, control-character-reference) by
  accumulating the code point during the hex/decimal states.
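
The numeric-reference item above can be sketched as follows. This is an illustration written from the WHATWG error definitions, not the code in `walkHtmlTokens.js`; the function names `accumulate` and `classifyNumericReference` are hypothetical, and `accumulate` assumes its input contains only valid digits for the given base.

```javascript
// Hypothetical sketch: accumulate the code point digit by digit, then map it
// to the WHATWG parse-error code it triggers (or undefined when it is valid).
function accumulate(digits, base) {
	let code = 0;
	for (const ch of digits) {
		// Stop growing once the value is out of range, so very long digit
		// runs cannot lose precision or overflow.
		if (code <= 0x10ffff) code = code * base + parseInt(ch, base);
	}
	return code;
}

function classifyNumericReference(code) {
	if (code === 0) return "null-character-reference";
	if (code > 0x10ffff) return "character-reference-outside-unicode-range";
	if (code >= 0xd800 && code <= 0xdfff) return "surrogate-character-reference";
	// Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane.
	if ((code >= 0xfdd0 && code <= 0xfdef) || (code & 0xfffe) === 0xfffe) {
		return "noncharacter-character-reference";
	}
	const isControl = code <= 0x1f || (code >= 0x7f && code <= 0x9f);
	const isAsciiWhitespace =
		code === 0x09 || code === 0x0a || code === 0x0c || code === 0x0d || code === 0x20;
	// U+000D is reported even though it is ASCII whitespace.
	if (code === 0x0d || (isControl && !isAsciiWhitespace)) {
		return "control-character-reference";
	}
	return undefined;
}
```

For example, `classifyNumericReference(accumulate("80", 16))` yields `"control-character-reference"`, matching what `&#x80;` should report.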

duplicate-attribute and cdata-in-html-content remain unreported by
design (they need per-tag state / tree-construction context the scanner
does not keep); documented inline. Token offsets are unchanged.

…ce suite

Validated walkHtmlTokens against the official html5lib-tests tokenizer
suite (6738 cases) and fixed every divergence:

- Restore isAsciiLowerAlpha (a prior edit dropped it, breaking
  script-data double-escape on ASCII-alpha input).
- Run numeric-reference-end validation (and absence-of-digits /
  missing-semicolon) when a character reference ends exactly at EOF;
  previously the loop exited before the end state ran.
- Do not report eof-in-doctype for EOF in a bogus DOCTYPE (spec emits
  the token with no error, like bogus comments).
- EOF right after `<!` is incorrectly-opened-comment, not eof-in-comment.
- Treat CR as whitespace to emulate the spec's CR->LF input-stream
  preprocessing (the scanner keeps original offsets).
- Reconsume (not consume) in comment-end-dash / comment-end /
  comment-end-bang so NULL and `<` are handled by the comment state.
- Report end-tag-with-trailing-solidus for self-closing end tags.
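
The EOF item in the list above describes a loop that exited before the end state could run. A simplified sketch of one way to avoid that, assuming a hypothetical mini-scanner `scanDecimalReference` that starts just past `&#` (the real scanner is a larger state machine and is not shown in this PR page):

```javascript
// Hypothetical mini-scanner for "&#<digits>", starting at `pos` just past "&#".
function scanDecimalReference(input, pos) {
	const errors = [];
	const start = pos;
	// Note "<=": the extra iteration at pos === input.length lets the end
	// logic run when the reference stops exactly at EOF.
	for (; pos <= input.length; pos++) {
		const cc = pos < input.length ? input.charCodeAt(pos) : -1; // -1 stands for EOF
		if (cc >= 0x30 && cc <= 0x39) continue; // still inside the digit run
		if (pos === start) {
			errors.push("absence-of-digits-in-numeric-character-reference");
		} else if (cc !== 0x3b) {
			errors.push("missing-semicolon-after-character-reference");
		} else {
			pos++; // consume the terminating ";"
		}
		break;
	}
	return { end: pos, errors };
}
```

With a plain `pos < input.length` loop condition, an input such as `&#65` would leave the loop without reporting anything; here it reports the missing semicolon.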

Result: 6738/6738 conformance cases match, excluding only the
documented offset-scanner omissions (duplicate-attribute,
cdata-in-html-content, *-in-input-stream). Token offsets unchanged.

Add the official html5lib-tests tokenizer suite as a git submodule
(test/html5lib-tests, like test262-cases) with a runner
(test/html5lib.spectest.js, `yarn test:html5lib`) that checks every case's
parse-error codes and input roundtrip against walkHtmlTokens.
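
The "input roundtrip" check mentioned above can be pictured like this. The token shape (`{ start, end }` offsets into the input) and the function name `roundTrips` are assumptions for illustration; the runner's actual assertion is not shown on this page.

```javascript
// Hypothetical round-trip check: the token spans must tile the input exactly,
// so concatenating their slices reproduces the original source.
function roundTrips(input, tokens) {
	let pos = 0;
	let out = "";
	for (const { start, end } of tokens) {
		// Reject gaps, overlaps, and inverted spans.
		if (start !== pos || end < start) return false;
		out += input.slice(start, end);
		pos = end;
	}
	return pos === input.length && out === input;
}
```

A check of this kind catches dropped bytes, for instance a token that is never emitted when the input ends mid-construct.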

Running the suite uncovered a real bug: RCDATA (title/textarea) must
process character references, but STATE_RCDATA did not handle `&`, so
entity parse errors inside those elements were never reported. Fixed
(offset output is unchanged; references stay within the text span).

All cases pass except one documented, unit-tested deliberate deviation
(partial tag emitted at EOF) and the parse errors the offset scanner
intentionally omits (duplicate-attribute, cdata-in-html-content,
*-in-input-stream).

Add an `html5lib` CI job (needs: basic, submodules: true) that runs
`yarn cover:html5lib`, and narrow `test:test262`/`cover:test262` to
test262.spectest.js so the two conformance suites run in their own jobs
instead of the test262 job globbing every *.spectest.js.

Four CSS Syntax tokenizer bugs surfaced by the css-parsing-tests corpus:

- A literal U+0080 looped forever: isIdentStartCodePoint used >= 0x80 but
  the internal _isIdentStartCodePointCC used > 0x80, so the dispatch
  entered ident consumption that then consumed zero code points.
- A backslash at EOF inside url(...) looped forever: consumeAnEscapedCodePoint
  advanced past EOF, so the url loop's end-of-input guard never matched.
- An unterminated comment at EOF was dropped (bytes lost from the token
  stream); now the comment token is emitted to EOF.
- A string with a trailing backslash at EOF was dropped; now the string
  token is emitted to EOF.
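
The first bug in the list above comes from two predicates disagreeing about U+0080. A minimal sketch of the safer shape, assuming simplified hypothetical helpers (escapes are ignored, and the `>= 0x80` rule is the one quoted in the commit message): the dispatch and the consumer share one predicate, and the loop asserts that every step advances.

```javascript
// Hypothetical simplified tokenizer loop, not webpack's walkCssTokens.
const isIdentStart = (cc) =>
	(cc >= 0x41 && cc <= 0x5a) || // A-Z
	(cc >= 0x61 && cc <= 0x7a) || // a-z
	cc === 0x5f || // _
	cc >= 0x80; // non-ASCII, including U+0080 itself

const isIdentChar = (cc) =>
	isIdentStart(cc) || (cc >= 0x30 && cc <= 0x39) || cc === 0x2d; // digits and "-"

function consumeIdent(input, pos) {
	while (pos < input.length && isIdentChar(input.charCodeAt(pos))) pos++;
	return pos;
}

function countTokens(input) {
	let pos = 0;
	let count = 0;
	while (pos < input.length) {
		// Dispatch uses the same predicate the consumer is built on, so an
		// ident is never entered without consuming at least one code unit.
		const next = isIdentStart(input.charCodeAt(pos))
			? consumeIdent(input, pos)
			: pos + 1;
		// Progress guard: fail loudly instead of looping forever.
		if (next <= pos) throw new Error(`no progress at offset ${pos}`);
		pos = next;
		count++;
	}
	return count;
}
```

If the consumer used `> 0x80` while the dispatch used `>= 0x80`, the guard would throw on a literal U+0080 rather than hang.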

Added regression unit tests for each in walkCssTokens.unittest.js.

Add the official css-parsing-tests corpus as a git submodule
(test/css-parsing-tests, like test262-cases / html5lib-tests) with a
runner (test/cssParsing.spectest.js, `yarn test:css-parsing`) and a
dedicated `css-parsing` CI job.

The suite encodes an older CSS Syntax draft (combined match tokens,
the removed <urange> token, NUL->U+FFFD preprocessing), so it is used as
a large real-world/adversarial corpus rather than for AST equality: each
input must round-trip through the tokenizer and every entry point must
terminate without throwing. This corpus surfaced the tokenizer fixes in
the previous commit.

Add webpack integration spectests that compile every html5lib-tests and
css-parsing-tests input as an HTML/CSS entry (experiments.html/css, with
url/import extraction disabled). This exercises the full pipeline — parse,
AST, handle, generate — on the same adversarial corpora, asserting webpack
never crashes/hangs and that any emitted error/warning is graceful, not an
internal exception.

Each corpus input is its own test (a plain `for` loop registers one `it`
per input) for a granular report; the builds run once in beforeAll, batched
into shared in-memory compilations (400 entries each). The two spectest
files are self-contained and identical except for fixture loading.
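
The batching described above can be sketched as below. The helper names `chunk` and `toEntries` and the `case-N` entry naming are hypothetical; only the batch size of 400 comes from the commit message.

```javascript
// Hypothetical helpers: split corpus inputs into fixed-size batches and turn
// each batch into a webpack-style `entry` object.
function chunk(items, size) {
	if (!Number.isInteger(size) || size <= 0) {
		throw new RangeError("size must be a positive integer");
	}
	const batches = [];
	for (let i = 0; i < items.length; i += size) {
		batches.push(items.slice(i, i + size));
	}
	return batches;
}

function toEntries(batch, offset) {
	// One entry per corpus input, keyed by its global index so each test
	// can later look up the result of its own input.
	const entry = {};
	batch.forEach((file, i) => {
		entry[`case-${offset + i}`] = file;
	});
	return entry;
}
```

Under this sketch, 1000 inputs with a batch size of 400 would produce three compilations (400, 400, and 200 entries), each built once before the per-input tests read their results.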

Run in the existing html5lib / css-parsing CI jobs via a `*.spectest.js`
glob (test:html5lib / test:css-parsing).

Remove the tokenizer-level spectests (html5lib.spectest.js,
cssParsing.spectest.js); the html5lib-tests and css-parsing-tests corpora
are exercised only through real webpack builds. Point the test:html5lib /
test:css-parsing scripts at the remaining webpack spectests.
@alexander-akait alexander-akait force-pushed the claude/walkHtmlTokens-spec-review-XTMUy branch from b8d1837 to a929f06 Compare June 5, 2026 11:22
@github-actions

github-actions Bot commented Jun 5, 2026

Contributor

Types Coverage

Coverage after merging claude/walkHtmlTokens-spec-review-XTMUy into main will be 98.99%

@alexander-akait alexander-akait merged commit 028c549 into main Jun 5, 2026
63 of 66 checks passed
@alexander-akait alexander-akait deleted the claude/walkHtmlTokens-spec-review-XTMUy branch June 5, 2026 13:17