perf(npm-resolver): normalize non-abbreviated registry metadata before caching#12766
astegmaier wants to merge 1 commit
…e caching Some registries (e.g. Azure DevOps Artifacts) ignore the abbreviated metadata Accept header (application/vnd.npm.install-v1+json) and return the full package document. pnpm stored that response verbatim in the abbreviated metadata cache, so every later resolution re-read and re-parsed fields the resolver never uses (scripts, exports, devDependencies of transitive deps, readme, custom fields). On a large workspace whose feed serves full documents, a single package's metadata was ~40 MB (of which ~88% was unused) instead of a few MB. Apply the existing clearMeta normalization to the abbreviated slot too (it was previously only applied for filterMetadata). clearMeta keeps every abbreviated field (and top-level time), so resolution output is unchanged; it drops the fields the resolver never reads. This roughly halves `dedupe --offline` wall time and lowers peak memory on such a workspace, with a byte-identical lockfile. Scoped to plain abbreviated results: full-metadata and minimumReleaseAge-upgraded documents are left untouched. clearMeta is also made null-safe for packages with no versions (unpublished), which the abbreviated path can now reach.
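To make the normalization in this commit message concrete, here is a minimal sketch of a `clearMeta`-style function. It is not pnpm's actual implementation: the `PackageDocument` type, the `stripToAbbreviated` name, and the exact field list are assumptions for illustration. The list follows the fields named in the PR description, plus `name` and `version`, which are assumed to be kept.

```typescript
// Hypothetical, simplified types; pnpm's real metadata types differ.
type VersionManifest = Record<string, unknown>

interface PackageDocument {
  name: string
  'dist-tags'?: Record<string, string>
  versions?: Record<string, VersionManifest> | null
  time?: Record<string, string>
  modified?: string
  [extra: string]: unknown
}

// Per-version fields to keep. Assumption: mirrors the list in the PR
// description, with `name` and `version` added for illustration.
const KEPT_VERSION_FIELDS = [
  'name', 'version',
  'dependencies', 'optionalDependencies',
  'peerDependencies', 'peerDependenciesMeta',
  'dist', 'engines', 'cpu', 'os', 'libc', 'bin',
  'bundleDependencies', 'deprecated', 'hasInstallScript', '_npmUser',
] as const

function stripToAbbreviated (doc: PackageDocument): PackageDocument {
  // Null-safe: an unpublished package may have no `versions` at all.
  const source: Record<string, VersionManifest> = doc.versions ?? {}
  const versions: Record<string, VersionManifest> = {}
  for (const [version, manifest] of Object.entries(source)) {
    const kept: VersionManifest = {}
    for (const field of KEPT_VERSION_FIELDS) {
      if (manifest[field] !== undefined) kept[field] = manifest[field]
    }
    versions[version] = kept
  }
  // Top-level fields outside this short list (readme, custom fields) are dropped.
  const result: PackageDocument = { name: doc.name, versions }
  if (doc['dist-tags'] !== undefined) result['dist-tags'] = doc['dist-tags']
  if (doc.time !== undefined) result.time = doc.time
  if (doc.modified !== undefined) result.modified = doc.modified
  return result
}
```

Because the function builds a new object from an allow-list rather than deleting known-bad keys, arbitrary custom `package.json` fields fall away without having to be enumerated.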
…e network layer pnpm requests abbreviated package metadata (Accept: application/vnd.npm.install-v1+json). Some registries (e.g. Azure DevOps Artifacts) ignore that header and return the full CouchDB document, including per-version fields the installer never reads (scripts, exports, devDependencies, readme, custom package.json fields). pnpm stored that response verbatim in the abbreviated metadata cache, so every later resolution re-read and re-parsed those unused fields — on a large workspace whose feed serves full documents, a single package's metadata was ~40 MB, ~88% of it unused. Fix the response at the boundary where it enters pnpm: fetchMetadataFromFromRegistry inspects the response Content-Type. When an abbreviated request comes back with anything other than application/vnd.npm.install-v1+json, the registry ignored the header and served the full document, so the fetch layer strips it to the abbreviated field set (via the existing clearMeta helper) and re-serializes. Registries that honor the header (the npm registry echoes the abbreviated Content-Type) are detected as already-abbreviated and take an early return: no clearMeta, no re-serialization — the happy path pays nothing. clearMeta is extracted from pickPackage.ts into its own module so the network layer can use it without a circular import, and is made null-safe on versions so it can run on an unpublished package. pickPackage's abbreviated slot no longer needs any special-casing: the fetch layer now guarantees abbreviated shape, so the resolver keeps only its pre-existing filterMetadata (full-slot) narrowing. This is an alternative to pnpm#12766, which applied the same normalization one layer up in pickPackage. Both produce a byte-identical on-disk cache and byte-identical resolution output; this variant additionally skips the work entirely for spec-compliant registries and removes the special-casing from the resolver. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
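The boundary check described in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `isAbbreviatedContentType` and `normalizeAtBoundary` are hypothetical names, the `strip` callback stands in for `clearMeta`, and tolerating media-type parameters such as `; charset=utf-8` is an assumption about how strict the comparison should be.

```typescript
// Matches the abbreviated media type, case-insensitively, with optional
// parameters after it (assumption: parameters should be tolerated).
const ABBREVIATED_MEDIA_TYPE = /^\s*application\/vnd\.npm\.install-v1\+json\s*(;|$)/i

function isAbbreviatedContentType (contentType: string | null | undefined): boolean {
  if (contentType == null) return false
  return ABBREVIATED_MEDIA_TYPE.test(contentType)
}

// `strip` is a stand-in for the real normalization helper.
function normalizeAtBoundary (
  body: string,
  contentType: string | null,
  strip: (doc: Record<string, unknown>) => Record<string, unknown>
): string {
  // Happy path: the registry honored the Accept header, so the body is
  // returned untouched with no parse and no re-serialization.
  if (isAbbreviatedContentType(contentType)) return body
  // Otherwise treat the response as a full document: strip it and re-serialize.
  const doc = JSON.parse(body) as Record<string, unknown>
  return JSON.stringify(strip(doc))
}
```

The early return is the point of the design: a spec-compliant registry costs one header comparison, while a registry that ignores the header pays for one extra parse and serialize at fetch time instead of on every later cache read.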
Summary
pnpm requests abbreviated package metadata (`Accept: application/vnd.npm.install-v1+json`), the slim document with only the fields needed to install. Some registries ignore that header and return the full document (npm's abbreviated format is a vendor convention, not something every registry implements). Azure DevOps Artifacts is one such registry: it always returns the full CouchDB document, including per-version `scripts`, `exports`, `devDependencies`, `readme`, and arbitrary custom `package.json` fields.

pnpm stored that response verbatim in the abbreviated metadata cache, so every later resolution re-read and re-parsed all of those unused fields.
This PR normalizes the abbreviated slot down to the abbreviated field set before caching, by reusing the existing `clearMeta` helper (previously applied only for `filterMetadata`). The resolver already ignores the dropped fields, so resolution output is byte-identical; the cache just gets much smaller and faster to parse.

The problem, concretely
On a large monorepo at Microsoft whose workspace resolves against an Azure DevOps Artifacts feed, one internal package's abbreviated metadata was 39.7 MB — of which only ~12% is install-relevant:
- `scripts`
- `devDependencies`
- `exports`
- `dependencies` (needed)
- (`_id`, `readme`, dev-tooling config, …)
- `dist` (needed)

The same package on the public npm registry (which honors the abbreviated header) is 0.52 MB. I confirmed the registry behavior directly: requesting that package from Azure DevOps with `Accept: application/vnd.npm.install-v1+json` returns byte-for-byte the same response as `Accept: application/json`; it does no content negotiation. The public npm registry returns 3.5× less for the abbreviated header.

Because dedupe/resolution parses each package's metadata (potentially once per dependent), the extra ~88% of every document dominates wall time and heap.
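For readers who want to reproduce this kind of breakdown on their own feed, here is a rough sketch of how the install-relevant share of a document could be estimated. It is not the script used for the numbers above: the function names are hypothetical, and serialized length in characters is used as a proxy for bytes.

```typescript
// Serialized length in characters, as a rough proxy for size on disk.
function serializedLength (value: unknown): number {
  return (JSON.stringify(value) ?? '').length
}

// Sums the serialized size of each per-version field across all versions.
function fieldBreakdown (
  versions: Record<string, Record<string, unknown>>
): Map<string, number> {
  const totals = new Map<string, number>()
  for (const manifest of Object.values(versions)) {
    for (const [field, value] of Object.entries(manifest)) {
      totals.set(field, (totals.get(field) ?? 0) + serializedLength(value))
    }
  }
  return totals
}

// Fraction of the measured size that belongs to fields in `needed`.
function installRelevantShare (
  totals: Map<string, number>,
  needed: ReadonlySet<string>
): number {
  let all = 0
  let kept = 0
  totals.forEach((size, field) => {
    all += size
    if (needed.has(field)) kept += size
  })
  return all === 0 ? 0 : kept / all
}
```

Passing the `versions` object of a cached document to `fieldBreakdown`, and then a set such as `new Set(['dependencies', 'dist'])` to `installRelevantShare`, gives a quick estimate of how much of the document the resolver can actually use.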
The fix
`clearMeta` already reduces a document to the abbreviated version object field set (plus `libc`, and top-level `time`/`dist-tags`/`modified`). It was only applied on the `filterMetadata` (full-slot) path. This PR applies it to the abbreviated slot as well, so anything cached there conforms to the abbreviated shape regardless of what the registry actually sent.

The change is scoped to plain abbreviated results: full-metadata and `minimumReleaseAge`-upgraded documents are untouched. `clearMeta` is made null-safe for packages with no versions (unpublished), which the abbreviated path can now reach.

Why resolution is unchanged
`clearMeta` keeps every field the resolver reads (`dependencies`/`optionalDependencies`/`peerDependencies*`/`dist`/`engines`/`cpu`/`os`/`libc`/`bin`/`bundleDependencies`/`deprecated`/`hasInstallScript`/`_npmUser`) and the top-level `time` (so `minimumReleaseAge` still works). The only registry-manifest read of a dropped field is `resolveDependencies.ts` reading `pkg.scripts?.prepare`, which is guarded by `resolvedVia === 'git-repository'` and so is never reached for registry packages (git dependencies get their manifest from the cloned repo, not the packument).

Validated end-to-end: `pnpm dedupe --offline` on the affected monorepo produces a byte-identical `pnpm-lock.yaml` whether the cached metadata is full or normalized.

Performance
Warm-cache `pnpm dedupe --lockfile-only --offline` on that monorepo (~5,000 external packages), best of repeated runs:

~2.1× faster and ~29% less memory, from ~55% smaller abbreviated documents overall (the large full-document packages shrink to ~12%; already-abbreviated public packages are unchanged).
Testing
- `metaCache.test.ts`: a registry that returns the full document (with `scripts`, `exports`, `readme`, etc.) results in a normalized cached document (unused fields dropped, install fields kept). Fails on `main`, passes here.
- `@pnpm/resolving.npm-resolver`: full suite passes (234 tests), compile + lint clean.
- `--filter=...[origin/main]`): passes (one `deps.inspection.commands` `outdated` test is environment-flaky and fails identically on `main`, unrelated to this change).

Notes
- `clearMeta` keeps exactly the fields they already send.
- `filterMetadata` resolvers, which would also shrink the `metadata-full` cache; left out here to keep the change focused.