fix: expand HTML parser tag and attribute coverage #21159
🦋 Changeset detected. Latest commit: d4a816e. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
This PR is packaged and the instant preview is available (f6e52d0). Install it locally:
npm i -D webpack@https://pkg.pr.new/webpack@f6e52d0
yarn add -D webpack@https://pkg.pr.new/webpack@f6e52d0
pnpm add -D webpack@https://pkg.pr.new/webpack@f6e52d0
Codecov Report: ✅ All modified and coverable lines are covered by tests.

| Coverage diff | main | #21159 | +/- |
|---|---|---|---|
| Coverage | 92.36% | 92.57% | +0.21% |
| Files | 581 | 587 | +6 |
| Lines | 63411 | 63640 | +229 |
| Branches | 17544 | 17629 | +85 |
| Hits | 58567 | 58917 | +350 |
| Misses | 4844 | 4723 | -121 |
Merging this PR will improve performance by 76.81%.
Warning: Please fix the performance issues or acknowledge them on CodSpeed.
- walk <template> content fragments so assets inside templates are processed
- handle SVG <script href> and <script xlink:href> as script entries
- handle <feImage href|xlink:href>
- parse icon-uri from msapplication-task meta and add twitter:image:src
- split link rel on any ASCII whitespace, consistent with other rel checks

https://claude.ai/code/session_016NpnTnhr7hLt3E4wof2XYi
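As a rough illustration of the last bullet: HTML splits `rel` on ASCII whitespace (TAB, LF, FF, CR, SPACE), which is narrower than what JavaScript's `/\s/` matches (for example, `/\s/` also matches NBSP). This is a sketch, not webpack's code; `splitRelTokens` and `hasRelToken` are hypothetical helpers.

```typescript
// ASCII whitespace per the WHATWG Infra standard: TAB, LF, FF, CR, SPACE.
// Unlike /\s/, this does not match NBSP (U+00A0) or other Unicode spaces.
const ASCII_WHITESPACE = /[\t\n\f\r ]+/;

// rel keywords are ASCII case-insensitive, so only A-Z is lowercased.
function asciiLowercase(value: string): string {
  return value.replace(/[A-Z]/g, (ch) => ch.toLowerCase());
}

// Hypothetical helper: split a rel attribute value into keyword tokens,
// dropping the empty strings produced by leading or trailing whitespace.
function splitRelTokens(rel: string): string[] {
  return rel
    .split(ASCII_WHITESPACE)
    .filter((token) => token !== "")
    .map((token) => asciiLowercase(token));
}

// Hypothetical helper: does the rel value contain the given keyword?
function hasRelToken(rel: string, keyword: string): boolean {
  return splitRelTokens(rel).indexOf(asciiLowercase(keyword)) !== -1;
}
```

With this approach `rel="shortcut\ticon"` yields the tokens `shortcut` and `icon`, while a value joined by NBSP stays a single token, as it does in browsers.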
…ta attributes

- register user-configured sources entries under the adjusted camelCase tag name (e.g. feimage -> feImage) so they match the AST
- filterMetaContent now checks every present name/property/itemprop attribute instead of returning on the first one
- make the tree builder's SVG/foreign adjustment tables null-prototype so markup-controlled names like <constructor> can't resolve to inherited object members
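The first and last bullets can be sketched together. The HTML tokenizer lowercases tag names, and the tree builder restores camelCase names such as `feImage` for SVG elements from a fixed table. Building that table with `Object.create(null)` means a markup-controlled key like `constructor` finds nothing, instead of `Object.prototype.constructor`. `svgTagAdjustments`, `adjustSvgTagName` and `normalizeSourceTags` are hypothetical names, and the table below is only a small subset of the spec's list.

```typescript
// Subset of the HTML spec's SVG tag-name adjustment table. Object.create(null)
// gives it no prototype, so inherited keys like "constructor" are absent.
const svgTagAdjustments: Record<string, string> = Object.assign(
  Object.create(null),
  {
    clippath: "clipPath",
    feimage: "feImage",
    lineargradient: "linearGradient",
    textpath: "textPath",
  },
);

// Map a lowercased tag name to the name the AST uses for it. This is safe
// only because the table has no prototype: with a plain {} table,
// "constructor" would resolve to Object.prototype.constructor (a function).
function adjustSvgTagName(lowercased: string): string {
  return svgTagAdjustments[lowercased] ?? lowercased;
}

// Hypothetical: normalize user-configured `sources` tag names the same way,
// so an entry written as "feimage" or "FEIMAGE" matches an AST "feImage" node.
function normalizeSourceTags(tags: string[]): Set<string> {
  return new Set(tags.map((tag) => adjustSvgTagName(tag.toLowerCase())));
}
```

Registering configured names through the same adjustment as the AST keeps the two sides from drifting apart, rather than requiring users to guess which casing the parser produces.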
Tokenization and srcset splitting still see the raw attribute text (so dependency spans keep raw source offsets), but each extracted URL is decoded with attribute-mode WHATWG semantics and normalized like raw values before module resolution. Inline style attribute CSS is decoded too.
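In "attribute-mode WHATWG semantics", the main difference from text content is a legacy rule: a named reference without its trailing `;` that is followed by `=` or an ASCII alphanumeric is left undecoded, so query strings like `?a=1&copy=2` survive. The sketch below assumes a tiny hypothetical table (`NAMED_REFS`) instead of the full spec table and omits the Windows-1252 remapping of numeric references in the 0x80 to 0x9F range. `decodeAttributeValue` is not webpack's function.

```typescript
type NamedRef = { value: string; legacy: boolean };

// Hypothetical, tiny subset of the named character reference table.
// `legacy: true` marks names the spec also matches without a trailing ";".
const NAMED_REFS: Record<string, NamedRef> = Object.assign(Object.create(null), {
  amp: { value: "&", legacy: true },
  apos: { value: "'", legacy: false },
  copy: { value: "\u00a9", legacy: true },
  gt: { value: ">", legacy: true },
  lt: { value: "<", legacy: true },
  quot: { value: '"', legacy: true },
});

const NUMERIC_REF = /^&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/;

function decodeAttributeValue(raw: string): string {
  let out = "";
  let i = 0;
  while (i < raw.length) {
    const amp = raw.indexOf("&", i);
    if (amp === -1) {
      out += raw.slice(i);
      break;
    }
    out += raw.slice(i, amp);
    const rest = raw.slice(amp);

    // Numeric references decode in both modes; the ";" is optional.
    const numeric = NUMERIC_REF.exec(rest);
    if (numeric) {
      const hex = numeric[1];
      const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(numeric[2] ?? "", 10);
      const ok = cp > 0 && cp <= 0x10ffff && (cp < 0xd800 || cp > 0xdfff);
      out += ok ? String.fromCodePoint(cp) : "\ufffd"; // invalid -> U+FFFD
      i = amp + numeric[0].length;
      continue;
    }

    // Longest named match, with or without its trailing ";".
    let best: { len: number; value: string; semicolon: boolean } | null = null;
    for (const name in NAMED_REFS) {
      const ref = NAMED_REFS[name];
      if (ref === undefined) continue;
      if (rest.startsWith(name + ";", 1) && (!best || name.length + 2 > best.len)) {
        best = { len: name.length + 2, value: ref.value, semicolon: true };
      } else if (ref.legacy && rest.startsWith(name, 1) && (!best || name.length + 1 > best.len)) {
        best = { len: name.length + 1, value: ref.value, semicolon: false };
      }
    }

    if (best !== null) {
      // Attribute-mode legacy rule: an unterminated reference followed by "="
      // or an ASCII alphanumeric stays literal, e.g. "?a=1&copy=2".
      const next = rest.charAt(best.len);
      const keepLiteral = !best.semicolon && (next === "=" || /^[A-Za-z0-9]$/.test(next));
      if (!keepLiteral) {
        out += best.value;
        i = amp + best.len;
        continue;
      }
    }
    // No usable reference: emit the "&" literally and keep scanning after it.
    out += "&";
    i = amp + 1;
  }
  return out;
}
```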
The CSS pipeline now sees the entity-decoded attribute value (so url(a&b.png) resolves), and HtmlInlineStyleDependency re-escapes the processed CSS when its range is an attribute value, keeping the rewritten markup valid in any quoting context. <style> content stays raw both ways (rawtext has no character references).
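A minimal sketch of the write-back step, assuming a hypothetical `escapeForAttribute` that encodes every character that could end or corrupt an attribute value in double-quoted, single-quoted or unquoted form. `rewriteStyleAttribute` and its `processCss` callback are stand-ins for the CSS pipeline, not webpack APIs.

```typescript
// Characters that can break an attribute value in at least one quoting
// context: "&" would start a character reference, the quotes end quoted
// values, and whitespace, "=", "<", ">" and "`" are invalid in unquoted ones.
const UNSAFE_IN_ATTRIBUTE = /[&"'<>=`\t\n\f\r ]/g;

// Encode each unsafe character as a decimal numeric character reference.
function escapeForAttribute(value: string): string {
  return value.replace(UNSAFE_IN_ATTRIBUTE, (ch) => `&#${ch.charCodeAt(0)};`);
}

// Hypothetical write-back: the CSS pipeline sees the decoded attribute value,
// and its output is re-escaped before it replaces the raw attribute text.
function rewriteStyleAttribute(
  decodedCss: string,
  processCss: (css: string) => string,
): string {
  return escapeForAttribute(processCss(decodedCss));
}
```

Escaping for every quoting context at once is the simplest way to stay valid without inspecting the surrounding quote, at the cost of verbose output such as `color:&#32;red`. An implementation could instead choose escapes based on the quote character actually used.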
- srcset/src/msapplication-task parsers now run on the entity-decoded value (entity-encoded ASCII whitespace separates candidates, like in browsers); a decoded->raw boundary map from the new decodeHtmlEntitiesWithMap keeps rewrite spans on raw source offsets
- the attributes map used by filters and type resolvers carries decoded values, so e.g. rel="icon" is recognized as icon
- the style attribute url() pre-filter tests the decoded CSS text
- regenerate srcset errors.js for decoded parse-error messages
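The decoded-to-raw boundary map can be sketched as follows. `decodeWithMap` is a hypothetical stand-in for `decodeHtmlEntitiesWithMap`, whose real signature is not shown here: `map[k]` is the raw offset where decoded UTF-16 unit `k` starts, plus a final entry equal to the raw length. For brevity it only decodes semicolon-terminated references from a small table, and `srcsetUrlSpans` is a simplified candidate splitter that ignores parenthesized descriptors.

```typescript
// Small subset of named character references; only the ";" forms are handled.
const NAMED: Record<string, string> = Object.assign(Object.create(null), {
  amp: "&", apos: "'", gt: ">", lt: "<", nbsp: "\u00a0", quot: '"',
});
const REF = /^&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));/;

// Decode references and record, for every decoded UTF-16 unit, the raw offset
// it came from; a final entry equal to raw.length closes the last span.
function decodeWithMap(raw: string): { decoded: string; map: number[] } {
  let decoded = "";
  const map: number[] = [];
  let i = 0;
  while (i < raw.length) {
    const m = raw.charAt(i) === "&" ? REF.exec(raw.slice(i)) : null;
    let replacement: string | undefined;
    if (m) {
      if (m[3] !== undefined) {
        replacement = NAMED[m[3]]; // undefined for unknown names
      } else {
        const cp = m[1] !== undefined ? parseInt(m[1], 16) : parseInt(m[2] ?? "", 10);
        const ok = cp > 0 && cp <= 0x10ffff && (cp < 0xd800 || cp > 0xdfff);
        replacement = ok ? String.fromCodePoint(cp) : "\ufffd";
      }
    }
    if (m && replacement !== undefined) {
      for (let k = 0; k < replacement.length; k++) map.push(i);
      decoded += replacement;
      i += m[0].length;
    } else {
      map.push(i); // plain character (or unknown reference): copied as-is
      decoded += raw.charAt(i);
      i += 1;
    }
  }
  map.push(raw.length);
  return { decoded, map };
}

const isAsciiWhitespace = (ch: string): boolean => /^[\t\n\f\r ]$/.test(ch);

// Simplified srcset scan over decoded text: [start, end) span of each URL.
function srcsetUrlSpans(value: string): Array<[number, number]> {
  const spans: Array<[number, number]> = [];
  let i = 0;
  while (i < value.length) {
    while (i < value.length && (isAsciiWhitespace(value.charAt(i)) || value.charAt(i) === ",")) i++;
    if (i >= value.length) break;
    const start = i;
    while (i < value.length && !isAsciiWhitespace(value.charAt(i))) i++;
    let end = i;
    let endedByComma = false; // commas glued to the URL end the candidate
    while (end > start && value.charAt(end - 1) === ",") {
      end--;
      endedByComma = true;
    }
    if (end > start) spans.push([start, end]);
    if (!endedByComma) {
      while (i < value.length && value.charAt(i) !== ",") i++; // skip descriptors
    }
  }
  return spans;
}

// Parse on decoded text, but report spans in raw source offsets.
function srcsetRawUrlSpans(raw: string): Array<[number, number]> {
  const { decoded, map } = decodeWithMap(raw);
  return srcsetUrlSpans(decoded).map(
    ([start, end]): [number, number] => [map[start]!, map[end]!],
  );
}
```

For a raw value like `a.png&#32;1x,&#x20;b.png 2x`, splitting the raw text on whitespace would treat `a.png&#32;1x,&#x20;b.png` as one URL. Splitting the decoded text finds `a.png` and `b.png`, and the map turns their decoded spans back into the raw ranges a rewrite must replace.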
- <iframe srcdoc> content is entity-decoded, parsed as HTML and walked with offsets composed back to raw source positions (recursively for nested srcdoc); every reference stays a plain asset URL since the iframe is a separate browsing context that can't join the outer page's entry graph
- SVG presentation attributes (fill, stroke, filter, clip-path, mask, marker-*, cursor) resolve external url(file.svg#id) references via a new url() extractor; same-document fragments stay untouched
- textPath and mpath href/xlink:href join the defaults table
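The url() extractor idea from the second bullet (backed out again in the following commit) could look roughly like this sketch. It assumes a hypothetical `externalUrlReferences` helper and a simplified `url()` grammar without CSS escapes or comments: external references such as `url(icons.svg#arrow)` are returned for resolution, while same-document ones such as `url(#gradient)` are skipped.

```typescript
// Simplified url() matcher: double-quoted, single-quoted or bare argument.
// CSS escapes and comments are not handled in this sketch.
const URL_FUNCTION = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s'"]*))\s*\)/gi;

// Hypothetical helper: collect url() targets in an SVG presentation attribute
// value (fill, stroke, filter, clip-path, mask, marker-*, cursor) that point
// outside the current document.
function externalUrlReferences(value: string): string[] {
  const refs: string[] = [];
  URL_FUNCTION.lastIndex = 0; // the regex is global, so reset before scanning
  let m: RegExpExecArray | null;
  while ((m = URL_FUNCTION.exec(value)) !== null) {
    const url = (m[1] ?? m[2] ?? m[3] ?? "").trim();
    // "url(#id)" targets an element in the same document: nothing to resolve.
    if (url === "" || url.startsWith("#")) continue;
    refs.push(url);
  }
  return refs;
}
```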
…ling

Back out the srcdoc restricted walk and the CSS url() extractor for SVG presentation attributes, leaving TODO markers to revisit both. The simple textPath/mpath href/xlink:href handling stays.
…dges

Covers the patch lines Codecov flagged: icon-uri trailing-whitespace trimming, an absent or empty icon-uri, an empty style attribute, and a URL that decodes to whitespace only.
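The icon-uri edge cases above can be illustrated with a small parser for `msapplication-task` content, which is a `;`-separated list of `key=value` pairs. `findIconUri` is a hypothetical helper, not webpack's parser: it returns the trimmed value plus its offset within `content` (so a rewrite could target just the URL), and `null` when `icon-uri` is absent or empty.

```typescript
// Hypothetical parser for msapplication-task meta content such as
// "name=Tasks;action-uri=/go;icon-uri=/favicon.ico;window-type=tab".
function findIconUri(content: string): { value: string; start: number } | null {
  let offset = 0; // offset of the current ";"-separated part within content
  for (const part of content.split(";")) {
    const eq = part.indexOf("=");
    if (eq !== -1 && part.slice(0, eq).trim().toLowerCase() === "icon-uri") {
      const rawValue = part.slice(eq + 1);
      const value = rawValue.trim(); // drops leading and trailing whitespace
      if (value === "") return null; // present but empty: nothing to resolve
      const leading = rawValue.length - rawValue.replace(/^\s+/, "").length;
      return { value, start: offset + eq + 1 + leading };
    }
    offset += part.length + 1; // +1 for the ";" separator
  }
  return null; // no icon-uri key at all
}
```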
Summary
Expand the experimental HTML parser's tag/attribute coverage (`<template>` content, SVG `script` `href`/`feImage`/`textPath`/`mpath`, more meta names) and decode HTML character references in attribute values (URLs, `srcset` candidates, filters, `style` attributes with re-escaping on write-back), fixing several matching bugs (camelCase SVG tags in `sources`, meta filter short-circuit, prototype-named tag lookups) along the way.
What kind of change does this PR introduce?
fix
Did you add tests for your changes?
Yes: new config cases under `test/configCases/html/` (`template-content`, `svg-script-href`, `entities`, `parser-sources-svg-tag`, `svg-references`), extensions to the `sources`, `srcset` and `style-attribute` cases, and unit tests in `buildHtmlAst.unittest.js`/`walkHtmlTokens.unittest.js`.
Does this PR introduce a breaking change?
No.
If relevant, what needs to be documented once your changes are merged or what have you already documented?
n/a
Use of AI
This PR was developed with Claude Code under my direction: I specified each change and reviewed and tested the results; tests were written first for the bug fixes.