
test: add full WHATWG lexer integration tests #20974

Merged
alexander-akait merged 6 commits into
webpack:main from
aryanraj45:test/full-lexer-integration
May 19, 2026

Conversation

@aryanraj45

Contributor

Summary

This PR introduces a comprehensive end-to-end integration test (test/configCases/html/full-lexer-integration/) to validate the new HTML Tokenizer.

While previous PRs implemented isolated unit tests for individual states, this PR proves that Webpack's compiler can successfully parse a complex, real-world HTML document utilizing all 80 WHATWG lexer states without crashing, dropping characters, or failing AST compilation.

Key highlights:

  • Full State Coverage: The page.html fixture acts as a "kitchen-sink" stress test, explicitly triggering <!DOCTYPE>, <![CDATA[...]]>, deeply nested structures, self-closing void elements, and multi-line comments.
  • Advanced Mode Verification: Validates RAWTEXT, RCDATA, and SCRIPT_DATA states by successfully parsing <style>, <textarea>, and <script> blocks.
  • Entity Integration: Proves end-to-end integrity for Character References (decimal, hex, bare ampersands, and missing-semicolon entities) nested inside both raw text and complex single/double/unquoted attributes.
  • Isolated Lexer Validation: Configured <script> and <style> tags with type="text/plain" to safely bypass Webpack's downstream JS/CSS chunk extraction pipeline, guaranteeing we strictly test the HTML lexer's stability without triggering circular dependency errors in the AST.
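The character-reference forms named in the Entity Integration bullet can be illustrated with a small decoder. This is a minimal sketch for text content only, not webpack's implementation: `decodeCharRefs` and the `NAMED` table are hypothetical names, only four named references are covered, and the attribute-specific rules and the C1 control remapping from the WHATWG spec are left out.

```javascript
"use strict";

// Hypothetical helper, for illustration only. Covers four named references
// that HTML also accepts without a trailing semicolon.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"' };

function decodeCharRefs(text) {
  return text.replace(
    /&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|(amp|lt|gt|quot));?/g,
    (match, hex, dec, name) => {
      if (name) return NAMED[name];
      const codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
      // Null, surrogates, and out-of-range values decode to U+FFFD.
      const invalid =
        codePoint === 0 ||
        codePoint > 0x10ffff ||
        (codePoint >= 0xd800 && codePoint <= 0xdfff);
      return invalid ? "\uFFFD" : String.fromCodePoint(codePoint);
    }
  );
}
```

A bare ampersand such as the one in `a & b` does not match the pattern and is left untouched, which is the behaviour the fixture is meant to confirm end to end.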

part of #536

What kind of change does this PR introduce?

test

Did you add tests for your changes?

Yes. Added a new Webpack config case test/configCases/html/full-lexer-integration/ containing the integration harness, the page.html fixture, and the resulting Jest snapshot.

Does this PR introduce a breaking change?

No.

If relevant, what needs to be documented once your changes are merged or what have you already documented?

n/a

Use of AI

partial

Copilot AI review requested due to automatic review settings May 18, 2026 09:21
@changeset-bot

changeset-bot Bot commented May 18, 2026


⚠️ No Changeset found

Latest commit: 2b9c9f9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Copilot AI left a comment

Contributor

Pull request overview

Adds a new configCases integration test under test/configCases/html/full-lexer-integration/ to exercise the HTML module pipeline with a “kitchen sink” HTML fixture and verify the emitted HTML string via Jest snapshots.

Changes:

  • Added a new HTML config case enabling experiments.html.
  • Added a large page.html fixture intended to hit many tokenizer/parser states (comments, entities, RCDATA/RAWTEXT/SCRIPT_DATA, CDATA marker, etc.).
  • Added a Jest snapshot asserting the exported HTML module output.
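For orientation, a config case that turns on the HTML experiment might look like the sketch below. Only the `experiments.html` option name comes from the summary above; the value `true` and the absence of any other options are assumptions, and the real `webpack.config.js` in the PR may differ.

```javascript
"use strict";

// Minimal sketch of a config case enabling the HTML experiment.
// Assumption: the flag is a plain boolean; other options are omitted.
const config = {
  experiments: {
    html: true
  }
};

module.exports = config;
```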

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| test/configCases/html/full-lexer-integration/webpack.config.js | Enables HTML experiment for the new config case. |
| test/configCases/html/full-lexer-integration/index.js | Imports the HTML module and snapshots the exported string. |
| test/configCases/html/full-lexer-integration/page.html | “Kitchen sink” HTML fixture used by the integration test. |
| test/configCases/html/full-lexer-integration/snapshots/ConfigTest.snap | Snapshot for the non-cache test suite. |
| test/configCases/html/full-lexer-integration/simple.html | Fixture referenced by links in page.html (currently empty). |
| test/configCases/html/full-lexer-integration/image.png | Fixture referenced by <img> tags (currently empty). |

Comment thread test/configCases/html/full-lexer-integration/page.html
Comment thread test/configCases/html/full-lexer-integration/index.js
@codecov

codecov Bot commented May 18, 2026


Codecov Report

❌ Patch coverage is 64.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.90%. Comparing base (526d638) to head (2b9c9f9).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lib/html/walkHtmlTokens.js | 64.00% | 9 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (64.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #20974      +/-   ##
==========================================
- Coverage   90.92%   90.90%   -0.02%     
==========================================
  Files         573      573              
  Lines       58610    58654      +44     
  Branches    15762    15784      +22     
==========================================
+ Hits        53290    53321      +31     
- Misses       5320     5333      +13     
| Flag | Coverage Δ |
| --- | --- |
| integration | 89.67% <64.00%> (+0.05%) ⬆️ |
| test262 | 45.37% <ø> (-0.06%) ⬇️ |
| unit | 36.60% <64.00%> (+0.03%) ⬆️ |


@codspeed-hq

codspeed-hq Bot commented May 18, 2026


Merging this PR will degrade performance by 41.5%

❌ 5 regressed benchmarks
✅ 139 untouched benchmarks
⏩ 72 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Memory | benchmark "future-defaults", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 146.1 KB | 311.8 KB | -53.15% |
| Memory | benchmark "many-chunks-commonjs", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 352.5 KB | 452.1 KB | -22.04% |
| Memory | benchmark "asset-modules-bytes", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 131.2 KB | 323.6 KB | -59.47% |
| Memory | benchmark "concatenate-modules", scenario '{"name":"mode-development","mode":"development"}' | 782.2 KB | 1,112.2 KB | -29.67% |
| Memory | benchmark "asset-modules-resource", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 345.6 KB | 525.3 KB | -34.2% |


Comparing aryanraj45:test/full-lexer-integration (2b9c9f9) with main (865c051)

Footnotes

  1. 72 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@aryanraj45

Contributor Author

Hi @alexander-akait, I have updated the snapshots so the basic test passes. Can you please re-run it? Thanks :))

@alexander-akait

Member

Can you take a look at copilot review, especially img src resolution

@aryanraj45

Contributor Author

> Can you take a look at copilot review, especially img src resolution

Sorry @alexander-akait, I didn't look deeply at this earlier. The bug was in the CDATA/RAWTEXT handling inside walkHtmlTokens.js. The lexer was not returning cleanly to DATA after parser-consumed tag bodies, which broke image src resolution after CDATA/RAWTEXT sections. The integration test caught exactly this. I have fixed it and run the basic test, and everything is passing now. Thanks ;))
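The class of bug described above can be shown with a toy scanner. This is a sketch under stated assumptions, not the code in walkHtmlTokens.js: `collectImgSrcs` and `TEXT_ONLY_TAGS` are hypothetical names, comments and CDATA are ignored, and attributes are matched with a simple pattern. The point is only that after skipping a raw-text body the scanner has to resume normal (DATA) scanning right after the matching end tag, so that a later `<img src>` is still seen.

```javascript
"use strict";

// Elements whose bodies are treated as text rather than markup
// (RAWTEXT, RCDATA, and script data in WHATWG terms). Hypothetical subset.
const TEXT_ONLY_TAGS = new Set(["style", "script", "textarea", "title"]);

function collectImgSrcs(html) {
  const srcs = [];
  let pos = 0;
  while (pos < html.length) {
    const open = html.indexOf("<", pos);
    if (open === -1) break;
    const tag = /^<([a-zA-Z][a-zA-Z0-9]*)([^>]*)>/.exec(html.slice(open));
    if (!tag) {
      // Not a start tag (end tag, comment, stray "<"): keep scanning.
      pos = open + 1;
      continue;
    }
    const name = tag[1].toLowerCase();
    pos = open + tag[0].length;
    if (name === "img") {
      const src = /\ssrc\s*=\s*"([^"]*)"/i.exec(tag[2]);
      if (src) srcs.push(src[1]);
    } else if (TEXT_ONLY_TAGS.has(name)) {
      // Skip the body without tokenizing it, then return to normal
      // scanning immediately after the matching end tag.
      const closeRe = new RegExp(`</${name}[\\s/>]`, "gi");
      closeRe.lastIndex = pos;
      const close = closeRe.exec(html);
      if (!close) break; // unterminated body swallows the rest
      const end = html.indexOf(">", close.index);
      pos = end === -1 ? html.length : end + 1;
    }
  }
  return srcs;
}
```

If the scanner failed to leave the text-only mode after `</style>`, every image following the first `<style>` block would be missed, which matches the symptom reported in this thread.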

@alexander-akait

Member

basic test failed, we need to fix it

@aryanraj45

Contributor Author

@alexander-akait I ran yarn test:base ./test/ConfigCacheTestCases.basictest.js -t "style-tag|full-lexer-integration" -u to generate the new snapshot so the failing basic test passes. Thanks :))

@aryanraj45

Contributor Author

@alexander-akait All the required CI are green now thanks :))

@alexander-akait alexander-akait merged commit 490684a into webpack:main May 19, 2026
53 of 56 checks passed
@github-actions

Contributor

This PR is packaged and the instant preview is available (490684a).

Install it locally:

  • npm
npm i -D webpack@https://pkg.pr.new/webpack@490684a
  • yarn
yarn add -D webpack@https://pkg.pr.new/webpack@490684a
  • pnpm
pnpm add -D webpack@https://pkg.pr.new/webpack@490684a

3 participants