perf: speed up CSS and HTML parsing and code generation #21181
- CssParser: cache the normalized structural name so localGlobalActive / icssActive / composesAnchorSkip stop re-running a regex replace + toLowerCase per value token; add a stripBackslashes fast path; hoist the getComments comparator and skip its work when there are no comments; share the empty localIdentifiers/composesFiles containers in non-modules mode instead of copying them per rule.
- css/syntax: test the source bytes for the '--' custom-property prefix instead of forcing the lazy token value slice.
- CssGenerator: prefix-test the module type in getTypes instead of allocating a split() array per incoming connection.
- HtmlGenerator: hoist the chunk-URL sentinel regex to module scope.
- HtmlModulesPlugin: hoist createHash/nonNumericOnlyHash requires.
- HtmlParser: reuse syntax.js findAttr instead of allocating an Array#find closure per script element.
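Two of the items above (the prefix test in getTypes and the `--` byte check) follow the same pattern: answer the question from data already in hand instead of allocating. A minimal sketch of that pattern, assuming hypothetical helper names that are not webpack's actual functions:

```javascript
// Hypothetical illustration of the allocation-free checks described above.
// None of these function names come from webpack's source.

// Before: split() allocates a new array on every call just to read segment 0.
function isCssTypeSplit(moduleType) {
  return moduleType.split("/")[0] === "css";
}

// After: an equality check plus a prefix test allocates nothing.
function isCssTypePrefix(moduleType) {
  return moduleType === "css" || moduleType.startsWith("css/");
}

// Custom-property check on the raw source by char code, so no substring
// has to be sliced out of the source just to compare two characters.
const DASH = 45; // "-".charCodeAt(0)
function isCustomPropertyAt(source, start) {
  return (
    source.charCodeAt(start) === DASH &&
    source.charCodeAt(start + 1) === DASH
  );
}
```

Both versions of the type check agree on inputs such as `css`, `css/auto`, and `javascript/auto`; the prefix form simply avoids the temporary array.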
- CssIcssExportDependency.getLocalIdent: hoist the 4 ident-sanitizing regexes + the prepareId closure to module scope; drop the redundant /[local]/.test guard before the replace.
- CssUrlDependency.cssEscapeString: hoist the 3 escape regexes + shared replacer.
- HtmlScriptSrcDependency: precompile the copyable-attribute matchers and the native-tag check instead of building RegExp from strings per call; gate the integrity strip behind includes(); precompute the CSS chunk tie-break sort key once per chunk; hoist the CSS source-type pair.
- HtmlInlineStyleDependency: fold the 3 sequential attribute-escape passes into one regexp pass (byte-identical output).
- CssModule.identifier: build the inheritance segment with a loop instead of map().join() (byte-identical).
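Folding sequential escape passes into one is only byte-identical when no pass re-escapes the output of an earlier one. A minimal sketch, assuming the three escaped characters are `&`, `"`, and `<` (an assumption for illustration; the commit message does not list them):

```javascript
// Sketch only: the escaped character set below is assumed, not taken from
// HtmlInlineStyleDependency.

// Before: three full passes over the string, "&" first so later entities
// are not double-escaped.
function escapeAttrSequential(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// After: regex, lookup table, and replacer hoisted to module scope and
// applied in a single pass.
const ATTR_ESCAPE_RE = /[&"<]/g;
const ATTR_ESCAPES = { "&": "&amp;", '"': "&quot;", "<": "&lt;" };
const replaceAttrChar = (ch) => ATTR_ESCAPES[ch];

function escapeAttrSinglePass(value) {
  return value.replace(ATTR_ESCAPE_RE, replaceAttrChar);
}
```

The single pass matches the sequential result here because each source character is replaced exactly once and the `&` pass in the sequential version runs before any entity is introduced.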
HtmlScriptSrcDependency's template re-parsed the cloned tag's source text with a per-apply RegExp (isNativeTagForKind) to tell a native <script>/<link> from a custom element mapped to a source type. The parser already knows the tag name, so compute the native flag there and thread it through the dependency (tagIsNative); the template reads the boolean instead of re-parsing. Byte-identical output (verified against the html config + cache snapshots).
buildStylesheetLink / buildScriptTag re-parsed the originating tag's nonce/crossorigin/referrerpolicy out of the source text with three RegExps each time a sibling <link>/<script> was synthesized. The parser already has those attributes with byte offsets, so capture their exact source spans there (copyableAttrsText) and pass them through the dependency; the template just concatenates the precomputed text. Removes the COPYABLE_ATTR_REGEXPS scan from code-gen. Byte-identical (verified against the html config + cache snapshots, incl. css-imported-from-js which copies all three attrs).
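The parse-time capture described above can be sketched roughly as follows. The attribute record shape (`name`, `start`, `end`) and both function bodies are assumptions for illustration; only the names copyableAttrsText and buildStylesheetLink appear in the commit message.

```javascript
// Sketch under assumptions: `attrs` is a hypothetical parser output where
// each attribute carries the offsets of its full source span.
const COPYABLE_ATTRS = new Set(["nonce", "crossorigin", "referrerpolicy"]);

// Parse step: slice the exact source text of each copyable attribute once.
function captureCopyableAttrsText(source, attrs) {
  let text = "";
  for (const attr of attrs) {
    if (COPYABLE_ATTRS.has(attr.name.toLowerCase())) {
      text += " " + source.slice(attr.start, attr.end);
    }
  }
  return text;
}

// Code-gen step: plain concatenation, no regex scan of the original tag.
function buildStylesheetLink(href, copyableAttrsText) {
  return `<link rel="stylesheet" href="${href}"${copyableAttrsText}>`;
}

// Usage: offsets are computed with indexOf here only to keep the sketch short.
const tagSource = '<script src="a.js" nonce="abc" crossorigin defer></script>';
const span = (text) => {
  const start = tagSource.indexOf(text);
  return { start, end: start + text.length };
};
const parsedAttrs = [
  { name: "src", ...span('src="a.js"') },
  { name: "nonce", ...span('nonce="abc"') },
  { name: "crossorigin", ...span("crossorigin") },
  { name: "defer", ...span("defer") }
];
const copyableAttrsText = captureCopyableAttrsText(tagSource, parsedAttrs);
const siblingLink = buildStylesheetLink("a.css", copyableAttrsText);
```

Because the text is sliced from the source rather than re-serialized, quoting style and casing of the original attributes carry over unchanged.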
…work
- CssParser: memoize getKnownProperties on the parser instance (it depends only on the fixed option flags and the instance is reused across modules), instead of rebuilding the Map + ~10 record spreads per module.
- CssParser: resolve the module resource once (parseResource is uncached without a cache object) and reuse it for the auto-mode check + self-reference resolution.
- CssParser: getReexport short-circuits when there are no ICSS/@value definitions, skipping the `--`-key concat + Map probe per export.
- CssParser: the url-in-declaration skip check reads the already-normalized currentStructuralName instead of re-running the vendor-prefix regex + toLowerCase per url() token.
- HtmlParser: hoist the allowed <link rel> list to a module-level Set so filterLinkHref does set lookups instead of allocating a 10-element array + .some closure per <link>.

All byte-identical (css + html config snapshots unchanged).
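The instance-level memoization and the module-level Set above share one idea: build a lookup structure once and reuse it. A minimal sketch, assuming a hypothetical parser class and illustrative table entries (none of these values are webpack's):

```javascript
// Hypothetical stand-in for a parser whose property table depends only on
// constructor options, so it can be built once per instance.
class SketchCssParser {
  constructor(options) {
    this.options = options;
    this._knownProperties = undefined;
    this.tableBuilds = 0; // instrumentation for this sketch only
  }

  getKnownProperties() {
    if (this._knownProperties === undefined) {
      this.tableBuilds++;
      const table = new Map();
      table.set("animation-name", "keyframes"); // illustrative entries
      if (this.options.grid) table.set("grid-area", "grid");
      this._knownProperties = table;
    }
    return this._knownProperties;
  }
}

// Module-level Set: built once, constant-time lookups, no per-call array or
// closure. The rel values listed are illustrative, not webpack's actual list.
const SKETCH_ALLOWED_RELS = new Set(["stylesheet", "icon", "preload"]);

function isAllowedRel(rel) {
  return SKETCH_ALLOWED_RELS.has(rel.toLowerCase());
}
```

Memoizing on the instance is safe in this sketch only because the options never change after construction, which mirrors the condition the commit message states.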
… fast paths
Profiling a url()-heavy stylesheet showed emitUrlFunction dominating, via eager loc computation and unconditional regex replaces:
- webpackIgnored computed the invalid-webpackIgnore warning loc eagerly for every checked url() / @import, though it's only used in that rare warning. Pass the warn range and compute rangeLoc lazily inside the branch.
- normalizeUrl ran STRING_MULTILINE / TRIM_WHITE_SPACES / UNESCAPE replaces unconditionally; gate each on a cheap charCode/includes check so the common escape-free, edge-whitespace-free URL skips the regex engine.

~6% faster on a url-in-every-rule fixture; the url path no longer shows in the profile. Byte-identical (css config + spec snapshots unchanged).
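Both fast paths can be sketched in a few lines. The regexes and function names below are simplified assumptions, not the actual STRING_MULTILINE / TRIM_WHITE_SPACES / UNESCAPE patterns or the real webpackIgnored signature:

```javascript
// Simplified stand-ins for the real patterns (assumptions for this sketch).
const TRIM_EDGES = /^[ \t\n\r\f]+|[ \t\n\r\f]+$/g;
const UNESCAPE_SIMPLE = /\\(.)/g;

function isCssWhitespaceCode(code) {
  return code === 32 || code === 9 || code === 10 || code === 13 || code === 12;
}

// Before: every URL goes through the regex engine twice.
function normalizeUrlAlways(url) {
  return url.replace(TRIM_EDGES, "").replace(UNESCAPE_SIMPLE, "$1");
}

// After: each replace runs only when a cheap check says it could match.
function normalizeUrlGated(url) {
  const len = url.length;
  if (
    len > 0 &&
    (isCssWhitespaceCode(url.charCodeAt(0)) ||
      isCssWhitespaceCode(url.charCodeAt(len - 1)))
  ) {
    url = url.replace(TRIM_EDGES, "");
  }
  if (url.includes("\\")) {
    url = url.replace(UNESCAPE_SIMPLE, "$1");
  }
  return url;
}

// Lazy loc: the caller passes the raw range plus a function, and the
// location is computed only inside the rare warning branch.
function checkIgnoreFlag(value, range, computeLoc, warnings) {
  if (typeof value !== "boolean") {
    warnings.push({ message: "invalid ignore flag", loc: computeLoc(range) });
    return false;
  }
  return value;
}
```

The gates are exact for these simplified patterns: the trim regex can only match when the first or last character is whitespace, and the unescape regex can only match when a backslash is present, so skipping the call cannot change the result.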
This PR is packaged and the instant preview is available (b83041a). Install it locally:
npm i -D webpack@https://pkg.pr.new/webpack@b83041a
yarn add -D webpack@https://pkg.pr.new/webpack@b83041a
pnpm add -D webpack@https://pkg.pr.new/webpack@b83041a
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #21181 +/- ##
=======================================
Coverage 92.68% 92.68%
=======================================
Files 587 587
Lines 63962 64005 +43
Branches 17726 17748 +22
=======================================
+ Hits 59280 59322 +42
- Misses 4682 4683 +1
Merging this PR will degrade performance by 44.75%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Memory | benchmark "lodash", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 126.1 KB | 858.1 KB | -85.31% |
| ❌ | Memory | benchmark "side-effects-reexport", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 761.8 KB | 1,188.9 KB | -35.92% |
| ❌ | Memory | benchmark "many-chunks-esm", scenario '{"name":"mode-production","mode":"production"}' | 7.2 MB | 9.5 MB | -23.49% |
| ⚡ | Memory | benchmark "many-modules-esm", scenario '{"name":"mode-production","mode":"production"}' | 9.6 MB | 7.4 MB | +29.4% |
Comparing perf/css-html-parser-codegen (48b56c5) with main (0bf661b)
…the tokenizer (#21196)
* refactor: decompose CSS-Modules parser and move escape handling into the tokenizer

Decompose the CSS-Modules logic in lib/css/CssParser.js into focused helpers and a node-type switch (walkSelectorList, emitComposesWithAnchor, the Declaration visitor), and move CSS escape resolution into the tokenizer as lazy accessors (Token#unescaped, Node#unescapedName). This supersedes backslash-only stripping with full unescaping, which also fixes hex escapes (e.g. \75 rl -> url). Add CssIcssExportDependency codegen perf (resolveReferences context-hoist and a transient composes-name index) plus assorted parser allocation cuts. Rebased on #21181: keeps its orthogonal optimizations and supersedes its stripBackslashes / property-name caching with the equivalents here.

* Add changeset for CSS escape resolution fix
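The difference between backslash-only stripping and full unescaping is easiest to see on the `\75 rl` case mentioned above. A minimal sketch of CSS escape resolution, following the general escape rules of CSS Syntax Level 3 but written independently of the tokenizer's actual Token#unescaped implementation (the function names are hypothetical, and a `\r\n` pair after a hex escape is not treated as one whitespace here):

```javascript
// Backslash followed by 1-6 hex digits and one optional whitespace, or a
// backslash followed by any single non-newline character.
const CSS_ESCAPE_RE = /\\(?:([0-9a-fA-F]{1,6})[ \t\n\r\f]?|([^\n\r\f]))/g;

function resolveCssEscape(match, hex, ch) {
  if (hex !== undefined) {
    const codePoint = parseInt(hex, 16);
    // Zero, surrogates, and out-of-range values map to U+FFFD.
    if (
      codePoint === 0 ||
      codePoint > 0x10ffff ||
      (codePoint >= 0xd800 && codePoint <= 0xdfff)
    ) {
      return "\uFFFD";
    }
    return String.fromCodePoint(codePoint);
  }
  return ch;
}

function unescapeCss(text) {
  // Fast path: most identifiers contain no backslash at all.
  if (!text.includes("\\")) return text;
  return text.replace(CSS_ESCAPE_RE, resolveCssEscape);
}

// The older approach for comparison: drops backslashes, keeps hex digits.
function stripBackslashesOnly(text) {
  return text.replace(/\\/g, "");
}
```

With these definitions, `unescapeCss("\\75 rl")` yields `url`, while `stripBackslashesOnly("\\75 rl")` leaves `75 rl`, which is the hex-escape bug the commit message refers to.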
Summary
Constant-factor performance work across the CSS and HTML parsers, their code generators, and the related dependency templates, with no behavior change. Highlights: memoize the CSS parser's per-module `getKnownProperties` setup and dedupe `parseResource`; drop per-token/per-export `replace`/`toLowerCase`/Map work (reuse the already-normalized declaration name, short-circuit empty ICSS lookups); hoist per-call regexes/closures in the CSS/HTML dependency templates; move HTML sibling-tag native-tag detection and CSP attribute capture to the parse step instead of re-parsing tag text with regex at code-gen; and a profiler-guided `url()` fast path (lazy `webpackIgnore` warning loc + guarded `normalizeUrl` replaces). Profiling-driven: a parse-isolated benchmark shows ~20% faster CSS parse on a `url()`-heavy stylesheet, and emitted output is byte-identical.

What kind of change does this PR introduce?
perf
Did you add tests for your changes?
Yes. `test/configCases/html/script-src-sibling-attrs` covers the new parse-time CSP-attribute capture (quoted/bare/unquoted source forms copied byte-exact onto a synthesized sibling `<link>`). Otherwise these are behavior-preserving optimizations already covered by the existing CSS/HTML `ConfigTestCases`/`ConfigCacheTestCases`, the `cssParsing`/`html5lib` spec suites, and the `HtmlParser`/`CssIcssExportDependency` unit tests, all passing with unchanged snapshots.

Does this PR introduce a breaking change?
No.
If relevant, what needs to be documented once your changes are merged or what have you already documented?
n/a
Use of AI
Yes — I used Claude (Anthropic) to profile the parsers, identify the hot paths, implement these optimizations, and validate them against the full CSS/HTML test suites and a parse-isolated benchmark. All changes were reviewed for correctness and behavior-identical output.