Skip to content

perf(pacquet): close the clean-install gap to pnpm CLI#11856

Merged
zkochan merged 6 commits into
mainfrom
pacquet-perf3
May 22, 2026
Merged

perf(pacquet): close the clean-install gap to pnpm CLI#11856
zkochan merged 6 commits into
mainfrom
pacquet-perf3

Conversation

@zkochan

@zkochan zkochan commented May 22, 2026

Copy link
Copy Markdown
Member

Summary

Cuts pacquet's clean-install and full-resolution wallclock by roughly 3-4× on Linux (and roughly 2× on macOS), and brings user-CPU below pnpm's, primarily by:

  • pipelining tarball fetches with resolution (re-application of a prior draft);
  • caching disk-loaded packuments in pick_package's in-memory meta_cache so repeated resolves of the same (registry, name) don't re-pay spawn_blocking + multi-MB serde_json::from_str (biggest single win);
  • de-duplicating the resolve-time tarball prefetch via an atomic DashSet claim (one spawn per unique URL, not per (parent, child) edge), routing the prefetch download through SilentReporter so it doesn't fire pnpm:progress ahead of resolved, and dropping a write-lock that was used purely to read a CacheValue variant;
  • shaving redundant pre-flight stat syscalls in link_file (~130k saved per alotta-files install) and symlink_package (~3-5k saved), and simplifying the EEXIST recovery to a single return Ok(()) matching pnpm's linkOrCopy.

Bench (alotta-files fixture, verdaccio mock)

Linux CI numbers from the auto-bench on this PR:

Scenario pacquet@main pacquet@HEAD pnpm vs pnpm before → after
Clean Install 19.7 s 5.91 s 4.60 s 4.29× → 1.29× slower
Full Resolution 20.4 s 5.12 s 3.34 s 6.09× → 1.53× slower
Frozen Lockfile 2.02 s 2.17 s 3.57 s already 1.77× faster (unchanged)
Frozen Lockfile (Hot Cache) 756 ms 752 ms 2125 ms already 2.83× faster (unchanged)

User CPU on clean-install drops from 23.9 s (pacquet@main) to 6.19 s (pacquet@HEAD), comparable to pnpm's 6.48 s — meaning the resolver no longer over-uses CPU. The frozen-lockfile path doesn't touch the resolver, so it's unchanged (the small shift is within the stddev bands).

Local macOS sees a smaller win (Clean Install ~1.5× pnpm) — sys time is dominated by APFS metadata-journal contention that bites pacquet harder than pnpm because of pacquet's default 2 × available_parallelism rayon pool. Tracked separately as #11851 / #11857.

Commits

  1. perf(pacquet): pipeline tarball fetches with resolution — wraps the resolver chain in a PrefetchingResolver that tokio::spawns a DownloadTarballToStore::run_with_mem_cache after every successful resolve. Mirrors pnpm's packageRequester.requestPackage shape (the fetching promise is already running by the time the resolver returns). Re-applies an earlier draft (commit f375c9161e on the pacquet-perf branch). Also tightens the bench harness's git-repo check to accept worktree .git files via a gitdir: content check.
  2. perf(pacquet/resolving-npm-resolver): cache disk-loaded packuments in pick_package — the version-spec disk fast path and the publishedBy mtime shortcut loaded the packument mirror from disk but never populated ctx.meta_cache, so every later resolve of the same (registry, name) re-ran spawn_blocking(load_meta) + a multi-MB serde_json::from_str. Promote the disk-loaded packument into the install-scoped cache. Biggest single win — resolve phase 25 s → 3 s, user CPU 4× lower.
  3. perf(pacquet/tarball): dedup prefetch spawns and read-lock cache state — multiple improvements to the prefetcher's tarball download path:
    • Atomic per-URL dedup via a DashSet<String> on the PrefetchingResolver. The previous mem_cache.contains_key() gate was a TOCTOU pair (two concurrent resolvers could both pass it and both tokio::spawn); DashSet::insert is the atomic claim and only the winner spawns. The MemCache still backstops correctness.
    • Route the prefetch's DownloadTarballToStore::run_with_mem_cache through SilentReporter so pnpm:progress fetched / found_in_store don't fire ahead of the install pass's resolved. The install pass's later call to run_with_mem_cache now emits found_in_store on the Available short-circuit, so consumers still see the documented resolved → fetched|found_in_store → imported triple once per URL.
    • Switch the waiter's cache_lock.write().await to read().await — the variant inspection is read-only, and the owner branch is the only writer.
  4. perf(pacquet/package-manager): drop pre-flight stats in link_file — first half of the link_file simplification: the original two fs::metadata + fs::symlink_metadata calls per file collapsed to a single fs::metadata short-circuit, deferring dangling-symlink detection to the EEXIST recovery path.
  5. perf(pacquet/package-manager): drop redundant create_dir_all in symlink_packageforce_symlink_dir already handles a missing parent via its own NotFoundcreate_dir_all retry; the explicit pre-create was paying ~3-5k unneeded stats on a typical install.
  6. fix(pacquet/package-manager): keep one stat in link_file, simplify EEXIST — second half of the link_file simplification. Keeps the single fs::metadata pre-check (needed because fs::copy overwrites silently rather than surfacing EEXIST, so Copy callers need the no-op short-circuit) and simplifies the EEXIST recovery to a single return Ok(()) matching pnpm's linkOrCopy. Drops the dangling-symlink scrub for the same reason — pnpm doesn't scrub either, and CAFS contents are content-addressed so a "stale" dangling symlink from a crashed prior install is a corruption case the install layer is not the right place to repair. Test link_file::tests::dangling_symlink_is_replaceddangling_symlink_is_preserved locks in the new (pnpm-aligned) contract.

Where the remaining gap lives

pacquet@HEAD user-CPU on clean-install is now lower than pnpm's (6.19 s vs 6.48 s on the Linux CI). The remaining ~1.3-1.5× wallclock gap is sys time — file-system syscalls during the link phase (verify-files-cache stats, per-file linkat/clonefile/copy_file_range, store-index lookups). The macOS-only multiplier on top of that is APFS metadata-journal contention from pacquet's over-subscribed rayon pool; see #11851 and the follow-up tracker #11857.

Test plan

  • cargo nextest run -p pacquet-package-manager (link_file, symlink_package, install)
  • cargo nextest run -p pacquet-tarball (MemCache, run_with_mem_cache — the mem_cache_hit_emits_found_in_store_against_callers_reporter test locks in the silent-prefetcher + caller-emit contract)
  • cargo nextest run -p pacquet-resolving-npm-resolver (pick_package fast paths)
  • just integrated-benchmark -- --scenario clean-install --with-pnpm pacquet@HEAD pacquet@main shows the expected improvement
  • Visually verify the integrated bench (with --with-pnpm) shows pacquet@HEAD within ~1.5× of pnpm on a quiet machine
  • CI just ready

Written by an agent (Claude Code, claude-opus-4-7).

Summary by CodeRabbit

  • Performance

    • Tarball prefetching now starts earlier during resolution for faster installs.
    • Reduced cache locking contention to avoid unnecessary serialization.
  • Improvements

    • Disk-loaded package metadata is promoted into the in-memory cache to speed resolves.
    • Integration checks accept git worktree layouts.
  • Bug Fixes

    • Link/symlink handling simplified: dangling symlinks are preserved (treated as no-op) instead of being replaced.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 22, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Adds PrefetchingResolver to spawn background tarball downloads during resolution and refactors the fresh-lockfile install flow to open shared store/index and verified-files cache handles beforehand; also updates tarball mem-cache concurrency, metadata cache promotion, link/symlink handling, and git .git validation.

Changes

Pipelined Prefetch Resolver with Fresh Lockfile Integration

Layer / File(s) Summary
PrefetchingResolver implementation
pacquet/crates/package-manager/src/prefetching_resolver.rs, pacquet/crates/package-manager/src/lib.rs
Introduces PrefetchingResolver, PrefetchContext/OwnedFetchCtx, spawn-dedup gate, and Resolver impl; module declared and re-exported.
Fresh lockfile install flow integration
pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
Opens StoreIndex, spawns StoreIndexWriter, allocates SharedVerifiedFilesCache before resolve_importer, constructs PrefetchContext, wraps default resolver as inner_resolver then PrefetchingResolver, and reuses handles for warm-cache prefetch.
Tarball mem-cache waiter & tests
pacquet/crates/tarball/src/lib.rs, pacquet/crates/tarball/src/tests.rs, pacquet/crates/cli/src/state.rs
Waiter uses RwLock read guard and now emits found_in_store on cache-hit for the caller’s reporter; tarball_mem_cache doc clarified for Arc cloning; tests updated to assert caller-reporter sees found_in_store.
Metadata cache fast-path optimization
pacquet/crates/resolving-npm-resolver/src/pick_package.rs
Promotes disk-loaded packument metadata into install-scoped in-memory PackageMetaCache on pinned-version and published_by mtime fast paths (when not dry-run).
Link file error handling refactor & test
pacquet/crates/package-manager/src/link_file.rs, pacquet/crates/package-manager/src/link_file/tests.rs
Short-circuits when target resolves, centralizes try_import result mapping (treat AlreadyExists as Ok), removes RemoveStale variant, and adjusts dangling-symlink test to expect no-op for Hardlink.
Symlink package simplification
pacquet/crates/package-manager/src/symlink_package.rs
Remove explicit parent-dir creation; delegate missing-parent creation and retry to pacquet_fs::force_symlink_dir.
Git repository validation broadening
pacquet/tasks/integrated-benchmark/src/verify.rs
Accept .git as either a directory or a file starting with gitdir:, reading and prefix-checking the file contents.

Sequence Diagram(s):

sequenceDiagram
  participant Client
  participant PrefetchingResolver
  participant InnerResolver
  participant MemCache
  participant BackgroundTask
  participant StoreIndex

  Client->>PrefetchingResolver: resolve(importer)
  PrefetchingResolver->>InnerResolver: delegate resolve(importer)
  InnerResolver-->>PrefetchingResolver: ResolveResult (tarball URL + integrity)
  PrefetchingResolver->>MemCache: contains_key(tarball URL)?
  alt not cached
    PrefetchingResolver->>BackgroundTask: tokio::spawn DownloadTarballToStore(ctx)
    BackgroundTask->>StoreIndex: record CAS paths / verified files
    BackgroundTask->>MemCache: run_with_mem_cache(write Available/Failed)
  end
  PrefetchingResolver-->>Client: return ResolveResult unchanged
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Poem

"I’m a rabbit that hops and prefetches light,
I nudge tarballs early into the night.
While resolvers scurry, background tasks hum,
Shared caches whisper: the installs come.
Hoppity bytes — fetching feels just right!"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'perf(pacquet): close the clean-install gap to pnpm CLI' directly summarizes the main objective: optimizing pacquet's clean-install performance to match pnpm CLI speeds through pipelined tarball fetches, caching, and syscall reductions.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pacquet-perf3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.00      7.8±0.34ms   557.7 KB/sec    1.06      8.3±0.36ms   524.8 KB/sec

@codecov-commenter

codecov-commenter commented May 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.25352% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.63%. Comparing base (815e507) to head (850fc45).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...crates/package-manager/src/prefetching_resolver.rs 91.01% 8 Missing ⚠️
...package-manager/src/install_with_fresh_lockfile.rs 87.50% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11856      +/-   ##
==========================================
+ Coverage   87.55%   87.63%   +0.07%     
==========================================
  Files         204      205       +1     
  Lines       24394    24486      +92     
==========================================
+ Hits        21358    21458     +100     
+ Misses       3036     3028       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 1.997 ± 0.068 1.934 2.159 1.02 ± 0.04
pacquet@main 1.961 ± 0.035 1.931 2.025 1.00
pnpm 4.191 ± 0.034 4.121 4.245 2.14 ± 0.04
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 1.9974267992999999,
      "stddev": 0.06828647677135137,
      "median": 1.9840420205000002,
      "user": 2.94736074,
      "system": 2.1122347400000003,
      "min": 1.9336476555,
      "max": 2.1586832785,
      "times": [
        1.9696835865000002,
        1.9984004545000003,
        2.0097993135,
        1.9470309355000002,
        1.9337533725000002,
        2.1586832785,
        2.0356189414999997,
        2.0318621255,
        1.9557883295,
        1.9336476555
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 1.9605017801,
      "stddev": 0.03461004868814235,
      "median": 1.9390518275000002,
      "user": 2.9471409399999997,
      "system": 2.18808434,
      "min": 1.9307733975,
      "max": 2.0249281504999996,
      "times": [
        1.9397280815,
        2.0249281504999996,
        1.9333035365000002,
        2.0062975484999996,
        1.9695222105,
        1.9383755735000001,
        1.9884823945,
        1.9307733975,
        1.9373271905000002,
        1.9362797175
      ]
    },
    {
      "command": "pnpm",
      "mean": 4.191451735,
      "stddev": 0.03438895277513079,
      "median": 4.1930295245,
      "user": 8.012735439999998,
      "system": 2.5517106400000005,
      "min": 4.1214152145,
      "max": 4.2449415795,
      "times": [
        4.203513303499999,
        4.2211038755,
        4.1950356145,
        4.2449415795,
        4.1214152145,
        4.1839577225,
        4.2101310204999995,
        4.1897788885,
        4.1910234345,
        4.153616696499999
      ]
    }
  ]
}

Scenario: Frozen Lockfile (Hot Cache)

Command Mean [ms] Min [ms] Max [ms] Relative
pacquet@HEAD 485.3 ± 22.0 471.7 547.2 1.00
pacquet@main 510.4 ± 18.5 496.5 549.2 1.05 ± 0.06
pnpm 2200.9 ± 99.8 2087.7 2361.6 4.54 ± 0.29
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 0.4852851995000001,
      "stddev": 0.022002477727391907,
      "median": 0.47991935630000004,
      "user": 0.38618528,
      "system": 0.86235394,
      "min": 0.4716730068,
      "max": 0.5472142548000001,
      "times": [
        0.5472142548000001,
        0.4820791628,
        0.4815121538,
        0.4808129258,
        0.47788585180000004,
        0.4755506378,
        0.4716730068,
        0.4810514678,
        0.47604674680000003,
        0.4790257868
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 0.5103572257000001,
      "stddev": 0.018533734734073617,
      "median": 0.5018195228,
      "user": 0.39075308,
      "system": 0.94856154,
      "min": 0.49648213480000003,
      "max": 0.5492035278,
      "times": [
        0.5492035278,
        0.49648213480000003,
        0.5058233318,
        0.5023046898,
        0.5069313168,
        0.49981765180000004,
        0.5404982528000001,
        0.5012855598,
        0.5013343558000001,
        0.49989143580000006
      ]
    },
    {
      "command": "pnpm",
      "mean": 2.2008550844,
      "stddev": 0.09984419397510373,
      "median": 2.1516487722999997,
      "user": 3.02969548,
      "system": 1.3212159399999999,
      "min": 2.0876935458,
      "max": 2.3615702758,
      "times": [
        2.3615702758,
        2.1433726858,
        2.3409013628,
        2.1528965288,
        2.0876935458,
        2.1167184728,
        2.3023713248,
        2.2249588658,
        2.1276667658,
        2.1504010158
      ]
    }
  ]
}

Scenario: Clean Install

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 8.063 ± 0.129 7.850 8.325 1.36 ± 0.03
pacquet@main 30.784 ± 0.515 30.296 31.668 5.20 ± 0.12
pnpm 5.919 ± 0.088 5.845 6.124 1.00
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 8.0629095456,
      "stddev": 0.12859272763366159,
      "median": 8.0476662308,
      "user": 10.27901088,
      "system": 2.70402656,
      "min": 7.8500285933,
      "max": 8.325273235300001,
      "times": [
        8.1151044853,
        8.325273235300001,
        7.9732280973,
        8.0785980273,
        8.0349356503,
        7.8500285933,
        8.0603968113,
        8.034796633300001,
        8.1822937303,
        7.9744401923
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 30.7844599713,
      "stddev": 0.5147723942075939,
      "median": 30.6800501588,
      "user": 40.51922268,
      "system": 5.248058759999999,
      "min": 30.2964708893,
      "max": 31.6676824713,
      "times": [
        31.6676824713,
        30.3801825563,
        30.3182011393,
        31.6279691723,
        30.9081479193,
        30.8378707223,
        30.3714699533,
        30.2964708893,
        30.5222295953,
        30.9143752943
      ]
    },
    {
      "command": "pnpm",
      "mean": 5.9185251039,
      "stddev": 0.08761861302250631,
      "median": 5.8885387673,
      "user": 10.628190479999997,
      "system": 2.82264816,
      "min": 5.8446375073,
      "max": 6.1243359903000005,
      "times": [
        5.8855676633,
        5.8609744013,
        6.0034336963,
        5.911989993300001,
        5.8531669403,
        5.8915098713,
        5.8587309853,
        5.8446375073,
        6.1243359903000005,
        5.9509039903000005
      ]
    }
  ]
}

Scenario: Full Resolution

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 7.036 ± 0.094 6.885 7.202 1.77 ± 0.03
pacquet@main 29.811 ± 0.988 28.960 32.256 7.48 ± 0.26
pnpm 3.986 ± 0.039 3.940 4.058 1.00
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 7.036487907040001,
      "stddev": 0.09351767476708212,
      "median": 7.04487229214,
      "user": 7.433831720000001,
      "system": 1.7885837800000004,
      "min": 6.88517168064,
      "max": 7.20186209664,
      "times": [
        7.10776203964,
        7.06396874764,
        7.11195490564,
        6.88517168064,
        7.04074656264,
        6.92627361064,
        6.99667653064,
        7.20186209664,
        7.04899802164,
        6.98146487464
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 29.81140839824,
      "stddev": 0.9883972249588721,
      "median": 29.618189960640002,
      "user": 38.514542819999996,
      "system": 4.52202158,
      "min": 28.95950483164,
      "max": 32.25638409864,
      "times": [
        30.13334026264,
        29.87280291764,
        28.95950483164,
        28.99145739764,
        29.36357700364,
        29.36120439864,
        28.96945766364,
        29.98782941364,
        30.21852599464,
        32.25638409864
      ]
    },
    {
      "command": "pnpm",
      "mean": 3.9864796932399997,
      "stddev": 0.038838037466024766,
      "median": 3.98175383514,
      "user": 5.28751692,
      "system": 1.60615838,
      "min": 3.94011266464,
      "max": 4.05780995864,
      "times": [
        4.03403785464,
        3.9497393256400004,
        3.9664587246400003,
        4.00101610964,
        4.00255300964,
        3.9501755596400003,
        4.05780995864,
        3.9970489456400005,
        3.9658447796400003,
        3.94011266464
      ]
    }
  ]
}

@zkochan zkochan marked this pull request as ready for review May 22, 2026 13:44
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pacquet/crates/package-manager/src/prefetching_resolver.rs`:
- Around line 159-170: The contains_key() check on self.ctx.mem_cache is racy —
reserve the mem-cache entry synchronously before spawning to prevent multiple
tasks from starting for the same package_url; specifically, add an atomic
"in-flight" placeholder/entry into MemCache (or call a new reserve API) keyed by
package_url immediately prior to tokio::spawn in the code paths around
prefetching_resolver::run_with_mem_cache / the block that currently checks
self.ctx.mem_cache.contains_key(package_url), and ensure the spawned task
replaces that placeholder with the real result (or removes it on error) so
subsequent resolve() calls see the in-progress slot and skip spawning.

In `@pacquet/tasks/integrated-benchmark/src/verify.rs`:
- Around line 15-17: The current check accepts any `.git` file if
dot_git.is_file() is true; change the logic so that when `.git` is a file you
read its contents and validate it contains a gitdir pointer (e.g., starts with
"gitdir:" after trimming leading whitespace) rather than accepting arbitrary
files. Update the code path that inspects the `dot_git` variable (the branch
that currently uses dot_git.is_file()) to open and parse the file, reject it if
the prefix is not "gitdir:" (or otherwise malformed), and keep the existing
directory handling unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 730ef95a-e47e-4e6d-88b9-fc921a55d5d9

📥 Commits

Reviewing files that changed from the base of the PR and between d4136eb and baaf666.

📒 Files selected for processing (9)
  • pacquet/crates/cli/src/state.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/package-manager/src/link_file.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/tasks/integrated-benchmark/src/verify.rs
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
pacquet/**/*.rs

📄 CodeRabbit inference engine (pacquet/AGENTS.md)

pacquet/**/*.rs: When porting a function that fires pnpm:<channel> events through globalLogger, logger.debug(), or streamParser.write(), mirror the call site, payload, and ordering so the reporter parses pacquet's NDJSON the same way it parses pnpm's.
Declare a newtype wrapper for branded string types. Do not collapse the brand into a plain String or &str.
If upstream always validates before construction, validate in pacquet's wrapper too. The wrapper must construct only via TryFrom<String> and/or FromStr. Do not provide an infallible public constructor.
If upstream never validates, just brand for type-safety. Expose an infallible From<String> (and From<&str> when convenient).
If upstream occasionally constructs without validation, expose from_str_unchecked as an escape hatch alongside the validating constructor.
Match upstream serde behavior for branded types that cross JSON, YAML, or INI boundaries. Use #[serde(try_from = "String")] for deserialization and #[serde(into = "String")] for serialization.
Use #[derive(derive_more::From)] and #[derive(derive_more::Into)] for mechanical conversion impls. Fall back to manual impl only when conversion needs custom logic.
String-literal unions should become enums, not newtype wrappers. Model closed sets of valid string values as enums.
Template literal types should be treated as branded strings with validation discipline from rules 2-5.
Choose owned vs. borrowed parameters to minimize copies. Widen to the most encompassing type (&Path over &PathBuf, &str over &String) when it doesn't force extra copies.
Prefer Arc::clone(&x) / Rc::clone(&x) over x.clone() for reference-counted types, so the cost is visible at the call site.
Follow Rust API Guidelines for naming conventions.
Do not use star imports inside module bodies. Write use super::{Foo, bar} instead of use super::*;. Two forms stay allowed: external-crate preludes like use rayon::prelude::*; and root-of-module re-...

Files:

  • pacquet/tasks/integrated-benchmark/src/verify.rs
  • pacquet/crates/cli/src/state.rs
  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/package-manager/src/link_file.rs
🧠 Learnings (3)
📚 Learning: 2026-05-20T23:07:58.444Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11784
File: pacquet/crates/resolving-deps-resolver/src/hoist_peers.rs:120-133
Timestamp: 2026-05-20T23:07:58.444Z
Learning: When reviewing code in this pacquet Rust port, follow the upstream pnpm compatibility rule: only match pnpm’s behavior exactly. Do not propose review changes that intentionally deviate from pnpm’s documented/observed behavior, even if pnpm appears buggy. If you identify a real bug in pnpm behavior, the review should prioritize fixing it upstream in pnpm first, and avoid implementing a pnpm-behavior workaround here unless the same fix has already landed upstream.

Applied to files:

  • pacquet/tasks/integrated-benchmark/src/verify.rs
  • pacquet/crates/cli/src/state.rs
  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/package-manager/src/link_file.rs
📚 Learning: 2026-05-20T19:40:55.051Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11774
File: pacquet/crates/resolving-deps-resolver/src/resolve_peers.rs:0-0
Timestamp: 2026-05-20T19:40:55.051Z
Learning: In the pacquet Rust code, ensure the semver implementation uses the `node-semver` crate (not `nodejs-semver`). `node-semver`’s public API does not include a `satisfies_with_prerelease`-style method; prerelease-tolerant matching should be implemented inline by first calling `Range::satisfies`, and when it rejects a prerelease version, retry matching against a stripped `MAJOR.MINOR.PATCH` base of the prerelease version.

Applied to files:

  • pacquet/crates/cli/src/state.rs
  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/package-manager/src/link_file.rs
📚 Learning: 2026-05-22T00:08:44.646Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11837
File: pacquet/crates/resolving-npm-resolver/src/pick_package.rs:33-51
Timestamp: 2026-05-22T00:08:44.646Z
Learning: In the pnpm/pnpm repo’s pacquet Rust crates, do not flag Unicode ellipsis characters (U+2026, `…`) in Rust doc comments (`///` / `/** */`) as a lint violation. The pacquet crate’s `dylint.toml` only enables `perfectionist::derive_ordering`, and the Dylint `unicode-ellipsis` rule is not enabled for this project—so `…` in doc comments is an intentional, repo-consistent style.

Applied to files:

  • pacquet/crates/cli/src/state.rs
  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/package-manager/src/link_file.rs
🔇 Additional comments (5)
pacquet/crates/resolving-npm-resolver/src/pick_package.rs (1)

102-111: LGTM!

Also applies to: 538-549, 569-574

pacquet/crates/tarball/src/lib.rs (1)

1844-1854: LGTM!

pacquet/crates/cli/src/state.rs (1)

14-19: LGTM!

pacquet/crates/package-manager/src/symlink_package.rs (1)

50-64: LGTM!

pacquet/crates/package-manager/src/link_file.rs (1)

152-199: ⚡ Quick win

Extend stale dangling-symlink recovery to copy-backed imports (Copy/Auto/CloneOrCopy)

The stale-symlink scrub in pacquet/crates/package-manager/src/link_file.rs (lines ~152-199) runs only when the initial failure is io::ErrorKind::AlreadyExists. The original rationale that std::fs::copy “follows” a dangling destination symlink target and then returns NotFound doesn’t match the available std::fs::copy documentation (it’s implemented via destination-path open/create semantics, not documented dereferencing behavior). However, the stdlib docs also don’t specify which io::ErrorKind is returned for dangling destination symlinks, so the AlreadyExists-only gate may still skip cleanup for copy-backed branches, breaking pnpm parity.

Adjust the copy-backed path handling so dangling destination symlinks are scrubbed and retried whenever they produce an error for the destination symlink dirent on the supported platforms (or broaden the trigger based on the actual observed io::ErrorKind).

Comment thread pacquet/crates/package-manager/src/prefetching_resolver.rs Outdated
Comment thread pacquet/tasks/integrated-benchmark/src/verify.rs Outdated
zkochan added 2 commits May 22, 2026 16:00
Pacquet currently resolves the entire dependency tree before any
tarball is fetched: `resolve_importer` walks every transitive dep
through the resolver chain and returns; only then does
`install_subtree` start calling `DownloadTarballToStore`. On the
`alotta-files` clean-install benchmark this puts pacquet 2-3x
behind the TypeScript pnpm CLI whenever resolution runs from cold.

This change wraps the resolver chain in a `PrefetchingResolver`
that, after each successful resolve returning a tarball-shaped
result, `tokio::spawn`s a `DownloadTarballToStore::run_with_mem_cache`
into the shared `MemCache`. Resolution of children, siblings, and
the rest of the tree continues in parallel with that download.
When `install_subtree` later calls `run_with_mem_cache` for the
same URL, the per-URL cache either returns `CacheValue::Available`
immediately or briefly blocks on the existing `Notify`.

Mirrors pnpm's `packageRequester.requestPackage` shape, which
returns a `pkgResponse` whose `fetching` field is a Promise that
is already running by the time the resolver returns.

The `store_index`, `store_index_writer`, and `verified_files_cache`
handles are opened above the resolver chain so the prefetcher
shares the same store-index / writer / verify cache with the
later install pass. The original warm-cache `prefetch_cas_paths`
batch still runs after `resolve_importer` returns to fold the
per-package SQL lookups for warm cache hits the resolve-time
prefetcher couldn't satisfy.

Mechanical changes to support the prefetch capture:
- `Reporter` bounds tightened to `+ 'static` on the path that
  feeds the spawn.
- `verify::ensure_git_repo_common` now accepts a `.git` file
  (linked git worktree) as well as a `.git` directory, so the
  benchmark runner works in repos with worktrees.

Originally drafted on the prior `pacquet-perf` branch and
forward-ported here.
… pick_package

Pacquet's `pick_package` has two disk fast paths that load the
on-disk packument mirror and return without going through the
network branch:

- The version-spec fast path (line 517) — when the spec is a
  pinned version and the disk meta already contains that exact
  version.
- The `publishedBy` mtime shortcut (line 542) — when the mirror
  file is fresh enough vs. the `minimumReleaseAge` cutoff.

Both branches populated a *local* `meta_cached_in_store` but never
called `ctx.meta_cache.set(...)` before returning. The
network-fetch branch (step 5) is the only path that sets the
in-memory cache today, so every subsequent resolver call for the
same `(registry, name)` re-acquires the per-key fetch semaphore,
re-checks the empty `meta_cache`, and re-runs
`spawn_blocking(|| load_meta(...))` plus a multi-MB
`serde_json::from_str`. On the alotta-files clean-install
benchmark (1362 resolved nodes, ~800 unique packages,
cache_dir preserved across runs since the harness only wipes
`store-dir`), this drove ~3-4× the user-CPU of pnpm's equivalent
resolve.

Promote the disk-loaded packument into the install-scoped
`meta_cache` after a successful fast-path return so later resolves
for the same key short-circuit to the in-memory cache. The cache
is rebuilt per install, so the promoted entry can't outlive the
freshness window the disk read already accepted — the next install
starts a fresh cache and re-evaluates the disk shortcut. Gated on
`!opts.dry_run` to match the existing `meta_cache.set` in step 5.

Measured impact on the clean-install bench (verdaccio mock, this
machine):

```
before: pacquet 43.62s (resolve 25.6s + install 16.8s),
        user 27.7s sys 36.8s
after:  pacquet 20.26s (resolve  6.2s + install 13.0s),
        user  6.7s sys 30.4s
pnpm:   17s (reference)
```

Resolve walltime drops 4×; user CPU drops 4× because the redundant
JSON parses are gone. Closes most of the remaining gap to pnpm on
this fixture.
@zkochan

zkochan commented May 22, 2026

Copy link
Copy Markdown
Member Author

Re CodeRabbit's worktree-detection nit: applied in fcbf255 (squashed into the pipelining commit). ensure_git_repo_common now reads the .git file and asserts the content begins with `gitdir:` rather than accepting any file at that path. Local verification: `cat /Volumes/.../.git` is `gitdir: /Volumes/src/pnpm/pnpm/worktrees/pacquet-perf3` — passes the new check. The bench still runs from the worktree.

Side note on the auto-benchmark above (~6× gap on the CI runner vs. ~1.5× on my 10-core local): consistent with the rayon over-subscription pattern reported in #11851 — pacquet's pool sizes at `2 × available_parallelism`, which on lower-core CI runners interacts badly with the kernel's FS metadata journal. Tracked as the top item in the #11857 follow-up.


Written by an agent (Claude Code, claude-opus-4-7).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pacquet/crates/package-manager/src/link_file.rs`:
- Around line 167-190: The recovery path currently removes any symlink found at
target_link which can clobber a concurrently created valid symlink; change it to
re-check resolution before removal by calling fs::metadata(target_link) after
confirming meta.file_type().is_symlink(): if fs::metadata returns Ok (the
symlink resolves to a live target) treat it as a concurrent success and return
the original Import error instead of removing it, and only attempt
fs::remove_file when fs::metadata returns an Err that indicates the link is
dangling (e.g., NotFound or other resolution error); also, when calling
fs::remove_file, map and ignore NotFound (treat it as a benign race) and still
convert other remove errors into LinkFileError::RemoveStale using the existing
pattern (references: target_link, source_file, fs::symlink_metadata,
meta.file_type().is_symlink(), fs::metadata, fs::remove_file,
LinkFileError::Import, LinkFileError::RemoveStale).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6319f59e-f25a-408e-9f1a-4febf761c625

📥 Commits

Reviewing files that changed from the base of the PR and between baaf666 and fcbf255.

📒 Files selected for processing (9)
  • pacquet/crates/cli/src/state.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/package-manager/src/link_file.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/tasks/integrated-benchmark/src/verify.rs
✅ Files skipped from review due to trivial changes (1)
  • pacquet/crates/cli/src/state.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Compile & Lint
🧰 Additional context used
📓 Path-based instructions (1)
pacquet/**/*.rs

📄 CodeRabbit inference engine (pacquet/AGENTS.md)

pacquet/**/*.rs: When porting a function that fires pnpm:<channel> events through globalLogger, logger.debug(), or streamParser.write(), mirror the call site, payload, and ordering so the reporter parses pacquet's NDJSON the same way it parses pnpm's.
Declare a newtype wrapper for branded string types. Do not collapse the brand into a plain String or &str.
If upstream always validates before construction, validate in pacquet's wrapper too. The wrapper must construct only via TryFrom<String> and/or FromStr. Do not provide an infallible public constructor.
If upstream never validates, just brand for type-safety. Expose an infallible From<String> (and From<&str> when convenient).
If upstream occasionally constructs without validation, expose from_str_unchecked as an escape hatch alongside the validating constructor.
Match upstream serde behavior for branded types that cross JSON, YAML, or INI boundaries. Use #[serde(try_from = "String")] for deserialization and #[serde(into = "String")] for serialization.
Use #[derive(derive_more::From)] and #[derive(derive_more::Into)] for mechanical conversion impls. Fall back to manual impl only when conversion needs custom logic.
String-literal unions should become enums, not newtype wrappers. Model closed sets of valid string values as enums.
Template literal types should be treated as branded strings with validation discipline from rules 2-5.
Choose owned vs. borrowed parameters to minimize copies. Widen to the most encompassing type (&Path over &PathBuf, &str over &String) when it doesn't force extra copies.
Prefer Arc::clone(&x) / Rc::clone(&x) over x.clone() for reference-counted types, so the cost is visible at the call site.
Follow Rust API Guidelines for naming conventions.
Do not use star imports inside module bodies. Write use super::{Foo, bar} instead of use super::*;. Two forms stay allowed: external-crate preludes like use rayon::prelude::*; and root-of-module re-...

Files:

  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/tasks/integrated-benchmark/src/verify.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/link_file.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
🧠 Learnings (3)
📚 Learning: 2026-05-20T19:40:55.051Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11774
File: pacquet/crates/resolving-deps-resolver/src/resolve_peers.rs:0-0
Timestamp: 2026-05-20T19:40:55.051Z
Learning: In the pacquet Rust code, ensure the semver implementation uses the `node-semver` crate (not `nodejs-semver`). `node-semver`’s public API does not include a `satisfies_with_prerelease`-style method; prerelease-tolerant matching should be implemented inline by first calling `Range::satisfies`, and when it rejects a prerelease version, retry matching against a stripped `MAJOR.MINOR.PATCH` base of the prerelease version.

Applied to files:

  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/link_file.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
📚 Learning: 2026-05-22T00:08:44.646Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11837
File: pacquet/crates/resolving-npm-resolver/src/pick_package.rs:33-51
Timestamp: 2026-05-22T00:08:44.646Z
Learning: In the pnpm/pnpm repo’s pacquet Rust crates, do not flag Unicode ellipsis characters (U+2026, `…`) in Rust doc comments (`///` / `/** */`) as a lint violation. The pacquet crate’s `dylint.toml` only enables `perfectionist::derive_ordering`, and the Dylint `unicode-ellipsis` rule is not enabled for this project—so `…` in doc comments is an intentional, repo-consistent style.

Applied to files:

  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/link_file.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
📚 Learning: 2026-05-20T23:07:58.444Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11784
File: pacquet/crates/resolving-deps-resolver/src/hoist_peers.rs:120-133
Timestamp: 2026-05-20T23:07:58.444Z
Learning: When reviewing code in this pacquet Rust port, follow the upstream pnpm compatibility rule: only match pnpm’s behavior exactly. Do not propose review changes that intentionally deviate from pnpm’s documented/observed behavior, even if pnpm appears buggy. If you identify a real bug in pnpm behavior, the review should prioritize fixing it upstream in pnpm first, and avoid implementing a pnpm-behavior workaround here unless the same fix has already landed upstream.

Applied to files:

  • pacquet/crates/package-manager/src/lib.rs
  • pacquet/crates/resolving-npm-resolver/src/pick_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/tasks/integrated-benchmark/src/verify.rs
  • pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/link_file.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
🔇 Additional comments (17)
pacquet/crates/resolving-npm-resolver/src/pick_package.rs (1)

102-112: LGTM!

Also applies to: 538-549, 569-574

pacquet/tasks/integrated-benchmark/src/verify.rs (1)

15-20: LGTM!

Also applies to: 23-37

pacquet/crates/package-manager/src/prefetching_resolver.rs (5)

125-172: Race condition on prefetch dedup already flagged.

The contains_key() check at line 170 is racy as noted in a prior review: it runs before the spawned task has a chance to populate MemCache, so back-to-back resolves of the same tarball can still enqueue multiple downloads before any task observes the in-progress state.


1-57: LGTM!


58-93: LGTM!


94-123: LGTM!


174-240: LGTM!

pacquet/crates/package-manager/src/lib.rs (1)

26-26: LGTM!

Also applies to: 59-59

pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs (6)

1-6: LGTM!


83-89: LGTM!


373-383: LGTM!


385-437: LGTM!


527-547: LGTM!


554-561: LGTM!

pacquet/crates/tarball/src/lib.rs (1)

1844-1854: LGTM!

pacquet/crates/package-manager/src/link_file.rs (1)

125-165: LGTM!

Also applies to: 192-211

pacquet/crates/package-manager/src/symlink_package.rs (1)

5-5: LGTM!

Also applies to: 50-57

Comment thread pacquet/crates/package-manager/src/link_file.rs Outdated
zkochan added 4 commits May 22, 2026 16:26
Two related improvements on the `run_with_mem_cache` hot path:

1. The `PrefetchingResolver` was firing one
   `tokio::spawn(DownloadTarballToStore::run_with_mem_cache)` per
   resolver call, but the deps-resolver invokes `resolve()` once per
   `(parent, child)` edge. With a tree of ~1362 unique packages and
   ~3-5k edges in the alotta-files fixture, that meant ~3-5k
   spawned download tasks — the first per URL did the actual fetch,
   the rest immediately found `MemCache` populated, parked on the
   per-URL `Notify`, and contributed nothing beyond scheduler /
   lock contention. Skip the spawn when the URL is already in
   `MemCache`. The `pick_package` fetch-locker still collapses
   concurrent metadata fetches, so the duplicate `resolve()` calls
   themselves are already cheap.

2. The waiter path in `run_with_mem_cache` acquired
   `cache_lock.write().await` just to read the `CacheValue` variant
   (return `Available`, capture `Notify`, or surface `Failed`).
   With dozens of late visitors converging on a popular tarball
   slot (peer-suffix variants of the same package, prefetcher
   tasks that raced the install pass), every visitor serialized
   behind an exclusive guard even though none of them mutated
   anything. Switch to `read().await`; the RwLock's owner branch
   below is the only writer, and reader-writer fairness keeps the
   owner unblocked.

The two changes compound: the dedup keeps the late-visitor
population small, the read-lock keeps the few visitors that
remain (legitimate second visits from `install_subtree`'s
per-parent symlink path) from serializing.

Measured on the clean-install bench (verdaccio mock, this
machine): min wallclock drops from ~12s to ~12s but the high
end falls from ~31s to ~20s. Variance shrinks because the
prefetcher no longer creates a thundering-herd at every popular
tarball.
`link_file` ran `fs::metadata(target_link)` followed by
`fs::symlink_metadata(target_link)` before every import syscall,
to detect a live target dirent (skip) or a dangling symlink left
behind by a crashed prior install (scrub + retry). With ~130k
files per clean `alotta-files` install and almost every call
landing on a fresh, unpopulated target directory, that was ~260k
`stat` syscalls on `NotFound` paths — pure overhead in the common
case and not what pnpm does.

Pnpm's `tryImportIndexedDir`
(`fs/indexed-pkg-importer/src/importIndexedDir.ts`) calls
`fs.linkSync(src, dst)` directly and relies on the kernel
returning `EEXIST` when a target is already present. Match that
shape: run the import syscall optimistically, recover on
`AlreadyExists` only. The recovery path covers both the
live-target case (`fs::metadata.is_ok()` → skip) and the
dangling-symlink case (`symlink_metadata` + `is_symlink` →
`remove_file` + retry).

Measured on the clean-install bench (verdaccio mock, this
machine):

```
before: pacquet 12-20s, sys ~45s (varies)
after:  pacquet 11-22s, sys ~30s
pnpm:   9-20s,  sys ~19s (reference)
```

Sys time drops ~33% as the redundant stat syscalls go away.
User CPU is unchanged. Closes the bulk of the install-phase
syscall gap against pnpm.
…nk_package

`symlink_package` called `fs::create_dir_all(symlink_path.parent())`
before every `force_symlink_dir` call. With ~3-5k symlinks per
install on the alotta-files fixture (one per package slot plus the
per-parent edges into each consumer's `node_modules`), that was
~3-5k `stat` syscalls almost always landing on an existing parent
dir — the slot's `node_modules` directory is created once by
`import_indexed_dir::populate_dir` and reused for every child
symlink under it, the root-level `node_modules` is created once
on install start.

`force_symlink_dir` already handles a missing parent: on
`NotFound` from the symlink syscall it calls `create_dir_all`
once and retries. Defer to that path instead of pre-creating
unconditionally — the common case now drops the pre-stat
entirely, and the rare uncreated-parent case still works via
the retry. The `CreateParentDir` variant stays on
`SymlinkPackageError` for the hoisted linker's separate
mkdir loop in `hoist.rs`.
…XIST

The "drop pre-flight stats in link_file" commit removed both
\`fs::metadata\` and \`fs::symlink_metadata\` before each import
syscall, plus the EEXIST-recovery's dangling-symlink scrub. That
broke the \`Copy\` no-op contract (\`fs::copy\` silently overwrites
rather than surfacing EEXIST, so a clean install with a Copy method
would clobber a live target the test suite locks in) and threw away
the live-target short-circuit code review flagged as a TOCTOU
regression.

Restore exactly one \`fs::metadata\` short-circuit at the top of
\`link_file\`: if the target resolves to a live file (or a symlink to
one), return immediately. That keeps the no-op contract for every
import method, including \`Copy\`, with a single stat per file
instead of the original two (~130k extra stats saved per
alotta-files install — half the savings of removing both, but
behavior-preserving).

Simplify the EEXIST handler to a single \`return Ok(())\` that
mirrors pnpm's \`linkOrCopy\` (\`fs/indexed-pkg-importer/src/index.ts\`):
the kernel surfaces EEXIST → the dirent is already there → no-op.
Drop the dangling-symlink scrub for the same reason — pnpm does not
scrub either, and CAFS contents are content-addressed, so a "stale"
dangling symlink left by a crashed prior install is a corruption
case the install layer is not the right place to repair. Update
\`link_file::tests::dangling_symlink_is_replaced\` →
\`dangling_symlink_is_preserved\` to match the new (pnpm-aligned)
contract.
@zkochan

zkochan commented May 22, 2026

Copy link
Copy Markdown
Member Author

Addressing the three codex findings.

P2 — `link_file` TOCTOU regression

Confirmed. My recovery branch matched on `is_symlink()` and rejected the much more common case (a concurrent writer racing in a real file between the pre-flight `fs::metadata` and the import syscall). Replaced the whole EEXIST recovery with a single `return Ok(())`, matching pnpm's `linkOrCopy` (`fs/indexed-pkg-importer/src/index.ts`). Net effect: the live-file race is honoured, and the dangling-symlink scrub is gone — pnpm doesn't scrub either, and a dangling symlink left by a crashed prior install is a corruption case the install layer isn't the right place to repair. Updated `link_file::tests::dangling_symlink_is_replaced` → `dangling_symlink_is_preserved` to lock in the new (pnpm-aligned) contract.

Folded into 850fc45 (squashed).

P2 — Prefetcher reporter ordering

Confirmed. `PrefetchingResolver` routed `DownloadTarballToStore` through the install's reporter, so a prefetch task's `fetched` / `found_in_store` event could land before `install_package_from_registry` emits `resolved` for the same package, breaking the `resolved → fetched|found_in_store → imported` triple `@pnpm/cli.default-reporter` consumes.

Fix has two parts in 14d5e38 (squashed):

  1. The prefetcher now runs its background download through `SilentReporter` so it doesn't emit anything against the install's reporter.
  2. `run_with_mem_cache`'s `Available` short-circuit (both the immediate branch and the post-`notified()` branch) now emits `found_in_store` against the caller's reporter. This is the install pass's only chance to see the cache hit — without it, the install pass would silently skip past the slot and the event triple would be missing its middle. Updated the corresponding tarball test to match (`mem_cache_hit_does_not_re_emit_for_second_requester` → `mem_cache_hit_emits_found_in_store_against_callers_reporter`).

P3 — Non-atomic prefetch dedup

Confirmed. `mem_cache.contains_key()` + `mem_cache.insert` inside `run_with_mem_cache` is a TOCTOU pair; two concurrent resolvers for the same URL could both pass the check and both `tokio::spawn`, then race the actual `MemCache` insert. The `MemCache` would still dedup correctness-wise (one task wins the insert and becomes the owner; the loser parks on `Notify`), but both spawns counted toward scheduler / lock contention — undercutting the "one spawn per unique tarball" target.

Replaced the non-atomic check with an explicit `DashSet` on `OwnedFetchCtx` and gated the spawn on `DashSet::insert`, whose `true` return is the atomic claim. Only the caller that flips the membership spawns; everyone else returns early. Same commit as the reporter-ordering fix (14d5e38).

Local re-bench post-fixes (3 runs, alotta-files, verdaccio): `real ~11-16s, user ~6s, sys ~26-39s`. No regression vs. the prior PR state.


Written by an agent (Claude Code, claude-opus-4-7).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pacquet/crates/tarball/src/lib.rs (1)

1837-1869: ⚡ Quick win

Sync the waiter comment with the new caller-side emit.

The note above still says later awaiters do not re-trigger progress, but Lines 1855-1869 now intentionally emit found_in_store for the caller. Updating that rationale will keep the prefetch/install reporter contract readable.

As per coding guidelines, "Write code that explains itself. Comments are for the non-obvious why, not a translation of the what."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pacquet/crates/tarball/src/lib.rs` around lines 1837 - 1869, The comment
above the cache waiter no longer matches behavior: although earlier text states
later awaiters don't re-trigger progress, the code path in CacheValue::Available
now calls emit_progress_found_in_store::<Reporter> so caller-side reporters do
emit `found_in_store`; update the comment to explain that the first writer still
attaches its handler (run_without_mem_cache / pnpm:progress) but that late
awaiters will emit a caller-specific `found_in_store` via
emit_progress_found_in_store::<Reporter> so consumers see the full `resolved →
found_in_store → imported` sequence; keep the rationale about using
cache_lock.read().await (read-only) and why the owner remains the only writer.
pacquet/crates/tarball/src/tests.rs (1)

1536-1662: ⚡ Quick win

Add coverage for the post-Notify Available path.

This regression test only hits the immediate-Available short-circuit by awaiting the first call before starting the second. The new caller-side emit after notify.notified().await is still unpinned, so the "prefetch still in flight when install asks" case could regress unnoticed. A concurrent variant that forces the second caller through the wait branch would cover the rest of the new behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pacquet/crates/tarball/src/tests.rs` around lines 1536 - 1662, Add a
concurrent variant of the test that forces the second caller through the
post-Notify wait branch: start the first
DownloadTarballToStore::run_with_mem_cache::<pacquet_reporter::SilentReporter>(&mem_cache)
without awaiting its completion (e.g. spawn it or use tokio::join!) so it is
still in-flight, then start the second
run_with_mem_cache::<RecordingReporter>(&mem_cache) which must take the
notify.notified().await path; await both join handles and then assert exactly
one ProgressMessage::FoundInStore was emitted for "second@2.0.0" as in the
existing test. Ensure the mockito expectation remains expect(1) and reuse
EVENTS/RecordingReporter/SilentReporter, mem_cache, pkg_integrity, and the same
URL so the concurrent timing exercises the notify-wait branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pacquet/crates/tarball/src/lib.rs`:
- Around line 1837-1869: The comment above the cache waiter no longer matches
behavior: although earlier text states later awaiters don't re-trigger progress,
the code path in CacheValue::Available now calls
emit_progress_found_in_store::<Reporter> so caller-side reporters do emit
`found_in_store`; update the comment to explain that the first writer still
attaches its handler (run_without_mem_cache / pnpm:progress) but that late
awaiters will emit a caller-specific `found_in_store` via
emit_progress_found_in_store::<Reporter> so consumers see the full `resolved →
found_in_store → imported` sequence; keep the rationale about using
cache_lock.read().await (read-only) and why the owner remains the only writer.

In `@pacquet/crates/tarball/src/tests.rs`:
- Around line 1536-1662: Add a concurrent variant of the test that forces the
second caller through the post-Notify wait branch: start the first
DownloadTarballToStore::run_with_mem_cache::<pacquet_reporter::SilentReporter>(&mem_cache)
without awaiting its completion (e.g. spawn it or use tokio::join!) so it is
still in-flight, then start the second
run_with_mem_cache::<RecordingReporter>(&mem_cache) which must take the
notify.notified().await path; await both join handles and then assert exactly
one ProgressMessage::FoundInStore was emitted for "second@2.0.0" as in the
existing test. Ensure the mockito expectation remains expect(1) and reuse
EVENTS/RecordingReporter/SilentReporter, mem_cache, pkg_integrity, and the same
URL so the concurrent timing exercises the notify-wait branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8b0be60c-f1e7-4cfa-bdc5-753a9f16f0f6

📥 Commits

Reviewing files that changed from the base of the PR and between fcbf255 and 850fc45.

📒 Files selected for processing (6)
  • pacquet/crates/package-manager/src/link_file.rs
  • pacquet/crates/package-manager/src/link_file/tests.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/tarball/src/tests.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Compile & Lint
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Code Coverage
  • GitHub Check: Lint and Test (ubuntu-latest)
  • GitHub Check: Lint and Test (macos-latest)
  • GitHub Check: Lint and Test (windows-latest)
🧰 Additional context used
📓 Path-based instructions (1)
pacquet/**/*.rs

📄 CodeRabbit inference engine (pacquet/AGENTS.md)

pacquet/**/*.rs: When porting a function that fires pnpm:<channel> events through globalLogger, logger.debug(), or streamParser.write(), mirror the call site, payload, and ordering so the reporter parses pacquet's NDJSON the same way it parses pnpm's.
Declare a newtype wrapper for branded string types. Do not collapse the brand into a plain String or &str.
If upstream always validates before construction, validate in pacquet's wrapper too. The wrapper must construct only via TryFrom<String> and/or FromStr. Do not provide an infallible public constructor.
If upstream never validates, just brand for type-safety. Expose an infallible From<String> (and From<&str> when convenient).
If upstream occasionally constructs without validation, expose from_str_unchecked as an escape hatch alongside the validating constructor.
Match upstream serde behavior for branded types that cross JSON, YAML, or INI boundaries. Use #[serde(try_from = "String")] for deserialization and #[serde(into = "String")] for serialization.
Use #[derive(derive_more::From)] and #[derive(derive_more::Into)] for mechanical conversion impls. Fall back to manual impl only when conversion needs custom logic.
String-literal unions should become enums, not newtype wrappers. Model closed sets of valid string values as enums.
Template literal types should be treated as branded strings with validation discipline from rules 2-5.
Choose owned vs. borrowed parameters to minimize copies. Widen to the most encompassing type (&Path over &PathBuf, &str over &String) when it doesn't force extra copies.
Prefer Arc::clone(&x) / Rc::clone(&x) over x.clone() for reference-counted types, so the cost is visible at the call site.
Follow Rust API Guidelines for naming conventions.
Do not use star imports inside module bodies. Write use super::{Foo, bar} instead of use super::*;. Two forms stay allowed: external-crate preludes like use rayon::prelude::*; and root-of-module re-...

Files:

  • pacquet/crates/package-manager/src/link_file/tests.rs
  • pacquet/crates/tarball/src/tests.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/link_file.rs
🧠 Learnings (3)
📚 Learning: 2026-05-20T19:40:55.051Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11774
File: pacquet/crates/resolving-deps-resolver/src/resolve_peers.rs:0-0
Timestamp: 2026-05-20T19:40:55.051Z
Learning: In the pacquet Rust code, ensure the semver implementation uses the `node-semver` crate (not `nodejs-semver`). `node-semver`’s public API does not include a `satisfies_with_prerelease`-style method; prerelease-tolerant matching should be implemented inline by first calling `Range::satisfies`, and when it rejects a prerelease version, retry matching against a stripped `MAJOR.MINOR.PATCH` base of the prerelease version.

Applied to files:

  • pacquet/crates/package-manager/src/link_file/tests.rs
  • pacquet/crates/tarball/src/tests.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/link_file.rs
📚 Learning: 2026-05-22T00:08:44.646Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11837
File: pacquet/crates/resolving-npm-resolver/src/pick_package.rs:33-51
Timestamp: 2026-05-22T00:08:44.646Z
Learning: In the pnpm/pnpm repo’s pacquet Rust crates, do not flag Unicode ellipsis characters (U+2026, `…`) in Rust doc comments (`///` / `/** */`) as a lint violation. The pacquet crate’s `dylint.toml` only enables `perfectionist::derive_ordering`, and the Dylint `unicode-ellipsis` rule is not enabled for this project—so `…` in doc comments is an intentional, repo-consistent style.

Applied to files:

  • pacquet/crates/package-manager/src/link_file/tests.rs
  • pacquet/crates/tarball/src/tests.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/link_file.rs
📚 Learning: 2026-05-20T23:07:58.444Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11784
File: pacquet/crates/resolving-deps-resolver/src/hoist_peers.rs:120-133
Timestamp: 2026-05-20T23:07:58.444Z
Learning: When reviewing code in this pacquet Rust port, follow the upstream pnpm compatibility rule: only match pnpm’s behavior exactly. Do not propose review changes that intentionally deviate from pnpm’s documented/observed behavior, even if pnpm appears buggy. If you identify a real bug in pnpm behavior, the review should prioritize fixing it upstream in pnpm first, and avoid implementing a pnpm-behavior workaround here unless the same fix has already landed upstream.

Applied to files:

  • pacquet/crates/package-manager/src/link_file/tests.rs
  • pacquet/crates/tarball/src/tests.rs
  • pacquet/crates/package-manager/src/prefetching_resolver.rs
  • pacquet/crates/tarball/src/lib.rs
  • pacquet/crates/package-manager/src/symlink_package.rs
  • pacquet/crates/package-manager/src/link_file.rs
🔇 Additional comments (3)
pacquet/crates/package-manager/src/link_file/tests.rs (1)

163-173: LGTM!

Also applies to: 175-175, 179-180, 183-183, 187-188

pacquet/crates/package-manager/src/symlink_package.rs (1)

5-5: LGTM!

Also applies to: 50-57

pacquet/crates/package-manager/src/link_file.rs (1)

136-164: ⚡ Quick win

fs::metadata pre-check doesn’t cause dangling-symlink “mutation” for Copy/Auto
std::fs::copy(from, to) opens the destination path to for writing/truncation; if to is a dangling symlink it fails (e.g., ENOENT), it does not dereference the symlink and write through to its target. Switching this pre-check to symlink_metadata would therefore change behavior by turning a previously failing Copy/Auto case into an early no-op.

			> Likely an incorrect or invalid review comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants