perf: node_modules-anchor fast paths#1189
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1189 +/- ##
==========================================
+ Coverage 93.75% 93.78% +0.03%
==========================================
Files 22 22
Lines 4229 4282 +53
==========================================
+ Hits 3965 4016 +51
- Misses 264 266 +2 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81f5c0eeb3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if path.inside_node_modules | ||
| && let Some(anchor) = self.pkg_anchor(path) | ||
| && anchor.path() != path.path() | ||
| { | ||
| let anchor_canonical = self.canonicalize_with_visited(&anchor, visited)?; | ||
| // `anchor` is a genuine path-ancestor node of `path`, so the prefix always strips and | ||
| // the remainder is non-empty (guaranteed by the `anchor != path` guard above). | ||
| let rest = path.path().strip_prefix(anchor.path()).unwrap(); | ||
| let canonical = anchor_canonical.normalize_with(rest, self); | ||
| let _ = | ||
| path.canonicalized.set((Arc::downgrade(&canonical.0), canonical.0.path.clone())); | ||
| return Ok(canonical); |
There was a problem hiding this comment.
Preserve symlink resolution below package anchors
When the resolved path is inside node_modules/<pkg> and any component below that package root is itself a symlink (for example node_modules/pkg/lib -> ../../shared and resolving pkg/lib/index.js), this fast path canonicalizes only the package anchor and appends the remaining suffix without checking lib. With the default ResolveOptions::symlinks = true, load_realpath should return the symlink target as the previous per-component walk did, but this now returns and caches the unresolved node_modules/pkg/lib/index.js path.
Useful? React with 👍 / 👎.
Merging this PR will degrade performance by 6.83%
Warning Please fix the performance issues or acknowledge them on CodSpeed. Performance Changes
Tip Investigate this regression by commenting Comparing Footnotes
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a7c5991b5f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // `anchor` is a genuine path-ancestor node of `path`, so the prefix always strips and | ||
| // the remainder is non-empty (guaranteed by the `anchor != path` guard above). | ||
| let rest = path.path().strip_prefix(anchor.path()).unwrap(); | ||
| let canonical = anchor_canonical.normalize_with(rest, self); |
There was a problem hiding this comment.
Keep walking suffixes that escape package anchors
When the cached path contains .. components after the package segment, this suffix is not necessarily “strictly below” the anchor. For example, resolving from an absolute directory like /repo/node_modules/a/.. to ./b/index.js produces a path whose anchor is node_modules/a and whose rest is ../b/index.js; appending that to the canonicalized a skips the normal per-component walk over sibling b. If node_modules/b is a symlink and symlinks is enabled, this now returns the unresolved sibling path instead of the symlink target, whereas the old walk would lstat and follow b.
Useful? React with 👍 / 👎.
a7c5991 to
9080c54
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9080c548d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if ctx.missing_dependencies.is_none() | ||
| && let Some(anchor) = self.cache.pkg_anchor(cached_path) | ||
| { | ||
| return self.cache.get_package_json(&anchor, &self.options, ctx); |
There was a problem hiding this comment.
Fall back when the anchor lacks package.json
When resolving a file under node_modules/<pkg>/sub/... where the package anchor has no package.json but a nested directory does, this fast path returns None immediately for the normal resolve()/resolve_file() APIs. The previous directory walk (and the new slow path still used by resolve_with_context) would keep probing up to the node_modules boundary and return that nested package.json, so Resolution::package_json() now differs between APIs and regresses packages that are resolved by explicit subpaths even though their top-level directory has no manifest.
Useful? React with 👍 / 👎.
87ad871 to
50143c3
Compare
Adds a `node_modules` canonicalization test suite + committed symlink fixtures that assert the resolver's canonicalization equals `std::fs::canonicalize` across the layout families: - **`canonicalize_matches_os_for_all_node_modules`** — walks every path in each installed `fixtures/bench-pm/installs/<combo>/node_modules` tree (npm/pnpm/yarn/bun × flat/isolated/hoisted/pnp); skipped when the bench fixtures aren't installed. - **`symlinked_package_anchor_walks_suffix_symlinks`** — a symlinked workspace anchor with a symlink in the suffix below it. - **`real_package_anchor_walks_internal_symlinks`** — a real `node_modules/<pkg>` anchor that ships an internal directory/file symlink (`lib -> dist`, a re-export file). - **`nested_monorepo_canonicalize_matches_os`** — a monorepo version conflict where a workspace package nests its own dependency version via an isolated-store symlink (root resolves `dep@2`, `packages/ui` nests `dep@1`). These all pass on `main` — they pin canonicalization correctness independent of any resolver optimization, so they can land on their own (and guard #1189). 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Two fast paths for resolving inside `node_modules`, sharing one allocation-free `Cache::pkg_anchor` (a parent-chain walk with no syscalls) to the deepest `<...>/node_modules/<pkg>` directory (the "anchor"). 1. Canonicalize the anchor, not every component. `canonicalize_with_visited` canonicalizes the real anchor — recursively, so a symlinked project root is still resolved — and appends the suffix below it without a per-component `read_link` walk. A symlinked anchor (a linked workspace package, or the top-level link of an isolated layout) takes the normal walk. The suffix is appended only after `suffix_below_anchor_is_symlink_free` confirms every component strictly below the anchor is a real, non-symlink entry — reusing the cached `lstat`, so it is free for components already stat'd while resolving the file. An internal package symlink or a `..` suffix falls back to the normal walk. Correct for any layout because it verifies the suffix rather than assuming it from a detected layout. 2. Jump to the package's `package.json`. `find_package_json_for_a_package` jumps straight to the anchor's `package.json` instead of probing every intermediate directory, falling back to the directory walk when dependency tracking is enabled or when the anchor has no manifest of its own. Syscalls on the 18-request `package_managers` workload vs `main`: flat 108->86, isolated 143->120, yarn-isolated 135->112, yarn-pnp 174->134 (-16% to -23%).
…kspace) `packages/ui` depends on `react@17.0.2` while the app uses `react@18.3.1`, so the conflict un-hoists a nested `packages/ui/node_modules/react`; `packages/utils` links `@bench/ui` (workspace-to-workspace). Two `package_managers` workload requests resolve `react` from `packages/ui` (the nested 17) and `@bench/ui` from `packages/utils`; lockfiles regenerated for all combos.
65b36ce to
7cc5b97
Compare
|
Too hard, I give up. |
Summary
Two fast paths for resolving inside
node_modules, built on one allocation-free helper —Cache::pkg_anchor, which walks the cached parent chain (no syscalls) to the deepest<...>/node_modules/<pkg>directory (the "anchor").1. Canonicalize the anchor, not every component
canonicalize_with_visitedcanonicalizes the real anchor and appends the suffix below it without a per-componentread_linkwalk:/tmp→/private/tmp, a symlinked$HOME/checkout, a Windows junction) is still resolved.suffix_below_anchor_is_symlink_freeconfirms every component strictly below the anchor is a real, non-symlink entry. Thatlstatreuses the cache (CachedMeta), so components already stat'd while resolving the file cost nothing — and any internal package symlink (lib -> dist, a re-export file) or..suffix falls back to the normal walk.This is correct for any layout — flat, isolated, PnP, workspace, nested, or a hand-built tree — because it verifies the suffix rather than assuming it from a detected layout. No layout detection, no markers, no trust gate.
2. Jump to the package's
package.jsonfind_package_json_for_a_packagejumps straight to the anchor'spackage.jsoninstead of probing every intermediate directory, falling back to the directory walk when dependency tracking is enabled (resolve_with_context, somissing_dependenciesstays byte-identical) or when the anchor has no manifest of its own.Measured syscall reduction
FileSystem-call counts for the 18-request
package_managersworkload (which includes monorepo cases — a nestedreact@17under a workspace and a workspace-to-workspace dep), this branch vsmain, measured with one counting-FileSystemharness:The reduction comes from
read(skipped intermediatepackage.jsonENOENT probes) and the eliminated per-component canonicalize walk; the suffix check adds no syscalls in real installs because the lstats are cache hits.Tests
node_modules_canonicalize.rswalks every path (~12.4k) under each installedfixtures/bench-pm/installs/<combo>/node_modulestree (all npm/pnpm/yarn/bun × flat/isolated/hoisted/pnp combos) and asserts the resolver's canonicalization equalsstd::fs::canonicalize— Windows-aware — exercising the symlinks into virtual stores, the.binshim directory, and the monorepo layouts (a nestedreact@17underpackages/ui, a workspace-to-workspace link).internal-symlink) and a symlinked workspace anchor with a suffix symlink (symlinked-workspace); each test fails without the suffix check.--deny warnings, fmt.🤖 Generated with Claude Code