Skip to content

perf(pacquet): parse packuments off the reactor and release the network permit before CPU work#12318

Merged
zkochan merged 1 commit into
mainfrom
clean-install-syscall-churn
Jun 10, 2026
Merged

perf(pacquet): parse packuments off the reactor and release the network permit before CPU work#12318
zkochan merged 1 commit into
mainfrom
clean-install-syscall-churn

Conversation

@zkochan

@zkochan zkochan commented Jun 10, 2026

Copy link
Copy Markdown
Member

Why

Follow-up to #12315's investigation: the one variation where pacquet still trailed pnpm on benchmarks.vlt.sh was clean babylon (34.3 s vs 30.1 s on the 2026-06-10 run).

A Linux-side profile (strace + download/linkat timelines in a container) localized the gap to the resolution phase, not store-write/linking as initially suspected:

  • Cold --lockfile-only resolves: pacquet 44 s vs pnpm 17.7 s, with pacquet at ~30% of one core — network-throughput-bound, not CPU (warm-metadata resolves: pacquet 9.8 s vs pnpm 18.7 s, i.e. pacquet's CPU side is already 2× faster).
  • Same bytes on both sides (~520–600 MB of packuments, ~1,480 docs), similar socket counts and syscall counts — yet a third of pnpm's effective throughput, with a 15-second dead zone in the download timeline while the @fluentui/* subtree resolved.

Root cause: fetch_full_metadata_cached parsed every packument body inline on the tokio worker that buffered it — while still holding its network-concurrency permit — and then wrote the multi-MB metadata mirror to disk synchronously on the same worker. High-release-cadence packages ship multi-megabyte packuments; a burst of them pinned every reactor worker in serde for hundreds of milliseconds at a time, stalling the sockets those workers pump and parking the concurrency semaphore's permits on CPU work. The tarball pipeline already avoids exactly this ("body fully buffered; release the network permit before the CPU-bound work" + spawn_blocking); the packument path never got the same treatment.

What

In both metadata fetchers (fetch_full_metadata_cached, fetch_full_metadata):

  • drop the request guard as soon as the body is buffered, so the semaphore goes back to bounding sockets;
  • deserialize the packument and persist the mirror inside one spawn_blocking, keeping the reactor free to pump the remaining transfers;
  • a 304 releases the guard before the mirror disk read for the same reason.

New FetchMetadataError::ParseTask variant carries a JoinError from the blocking task.

Validation

  • Linux container A/B (cold babylon full installs over WAN, interleaved): median ~38.0 s (main) → ~32.7 s (patched), patched winning 5 of 6 paired rounds; cold --lockfile-only 44 s → 24 s in the back-to-back pair.
  • Real vlt.sh harness on a GitHub ubuntu-24.04-arm runner, clean variation, pnpm and patched pacquet in the same hyperfine session (the runner's network was degraded during the run — both tools ~2× slower than the official numbers — which makes the relative result the meaningful one): babylon: pacquet 54.8 s vs pnpm 59.2 s — flipped from −14 % behind (official run) to ahead under identical conditions. astro clean stayed within noise (pacquet already led it officially). All hyperfine runs exited 0.
  • cargo nextest run for pacquet-resolving-npm-resolver (234), pacquet-package-manager + pacquet-cli (740) all pass; clippy -D warnings, dylint, fmt clean.

Profile notes for future work (not addressed here)

The extract+link phase still shows heavy scheduler churn relative to pnpm (~100 k involuntary context switches vs ~750; sched_yield storms from the rayon pools): tokio workers + the dedicated CAS-write pool + the global rayon pool + up to 2×cores blocking threads can leave ~5×cores runnable threads on the box. Wall-clock impact is currently hidden behind network time, but it's the next candidate if the gap reopens on faster runners.


Written by an agent (Claude Code, claude-fable-5).

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling when parsing npm package metadata fails, with a new error type to capture failures during deserialization.
  • Performance

    • Optimized metadata fetching to release network resources earlier, preventing resource exhaustion when processing large package metadata by offloading JSON parsing to a separate task queue.

…lease the network permit first

A cold resolve parsed every packument body inline on the tokio worker
that buffered it — while still holding its network-concurrency permit
— and then wrote the multi-megabyte metadata mirror to disk
synchronously on the same worker. High-release-cadence packages
(`@fluentui/*`, `@types/node`) ship packuments of several megabytes,
so a burst of them pinned every reactor worker in serde for hundreds
of milliseconds at a time, stalling the sockets those workers pump
and parking concurrency permits on CPU work. On a cold babylon
resolve (1,476 packuments, ~520 MB) the metadata phase ran at a third
of pnpm's effective throughput and the tarball pipeline sat idle
behind it (a 15-second dead zone in the download timeline while the
`@fluentui` subtree parsed).

Both metadata fetchers now drop the request guard as soon as the body
is buffered — the semaphore goes back to bounding sockets, matching
the buffer-then-release shape the tarball pipeline already uses — and
deserialize + persist the mirror inside one `spawn_blocking`, keeping
the reactor free to pump the remaining transfers. A 304 releases the
guard before the mirror disk read for the same reason.

Cold babylon full installs in a Linux container improved from a ~38 s
median to ~32.7 s (pnpm: ~27 s), winning 5 of 6 interleaved A/B
rounds; cold `--lockfile-only` resolves dropped from 44 s to 24 s in
the back-to-back pair.
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 10, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0)

Grey Divider


Remediation recommended

1. Socket permit undercount on 304 🐞 Bug ☼ Reliability
Description
In fetch_full_metadata_cached, the 304 branch drops the ThrottledClientGuard but keeps the
reqwest::Response in scope across load_meta_async(...).await, so the HTTP connection can remain
open while the concurrency permit is already released. This can let the semaphore undercount live
sockets during the mirror read and increase peak concurrent connections under load.
Code

pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs[R142-148]

    if response.status() == StatusCode::NOT_MODIFIED {
+        // No body to stream — release the connection and its
+        // network-concurrency permit before the mirror disk read.
+        drop(client);
        let Some(path) = mirror_path else {
            // 304 without an existing cache to fall back on — the
            // registry over-reached on `If-None-Match: <stale>`.
Evidence
The 304 branch awaits load_meta_async while response is still in scope; since send_with_retry
pairs the response with a guard specifically to bound concurrent sockets, dropping the guard without
also dropping the response can leave a live connection unaccounted for during the await.

pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs[142-156]
pacquet/crates/network/src/retry.rs[108-114]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
In the `StatusCode::NOT_MODIFIED` (304) branch of `fetch_full_metadata_cached`, the code calls `drop(client)` but leaves `response` alive across an `.await` (`load_meta_async`). In async Rust, locals that remain in scope are held across `.await`, so the underlying connection can stay open while the permit is already released.

### Issue Context
`send_with_retry` returns `(ThrottledClientGuard, Response)` and documents that the guard must be kept alive through body streaming to bound concurrent sockets; releasing the guard while the response/connection is still alive defeats that bounding.

### Fix Focus Areas
- pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs[142-156]

### Suggested change
Before the mirror read `.await`, explicitly drop the `response` (and ideally do it before dropping the guard so permits don’t get released while a socket is still open):

```rust
if response.status() == StatusCode::NOT_MODIFIED {
   drop(response);
   drop(client);
   // ... then await load_meta_async
}
```

(Any equivalent restructuring that ensures `response` is dropped before the awaited disk read is fine.)

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: dac8ccfa-8b46-4e7a-bee7-87776418a0dc

📥 Commits

Reviewing files that changed from the base of the PR and between 52be454 and 5f544e0.

📒 Files selected for processing (3)
  • pacquet/crates/resolving-npm-resolver/src/errors.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Lint and Test (macos-latest)
  • GitHub Check: Code Coverage
  • GitHub Check: Lint and Test (ubuntu-latest)
  • GitHub Check: Lint and Test (windows-latest)
  • GitHub Check: Analyze (javascript)
  • GitHub Check: Dylint
  • GitHub Check: Doc
  • GitHub Check: ubuntu-latest / Node.js 24 / Test
🧰 Additional context used
📓 Path-based instructions (1)
pacquet/**/*.rs

📄 CodeRabbit inference engine (pacquet/AGENTS.md)

pacquet/**/*.rs: Log emissions are part of matching pnpm — when porting a function that fires pnpm:<channel> events through globalLogger, logger.debug(...), or streamParser.write(...), mirror the call site, payload, and ordering so @pnpm/cli.default-reporter parses pacquet's NDJSON the same way
Declare a newtype wrapper for branded string types instead of collapsing the brand into a plain String or &str in Rust
If upstream TypeScript always validates before construction of a branded string, validate in the Rust wrapper too via TryFrom<String> and/or FromStr and do not provide an infallible public constructor
If upstream TypeScript never validates a branded string, just brand for type-safety in Rust by exposing an infallible From<String> constructor
If upstream TypeScript occasionally constructs a branded string without validation, expose from_str_unchecked in Rust as an escape hatch alongside the validating constructor
Match upstream serde behavior for branded strings crossing JSON, YAML, or INI boundaries by using #[serde(try_from = "String")] for deserialization and #[serde(into = "String")] for serialization
Derive simple conversions for branded strings using #[derive(derive_more::From)] and #[derive(derive_more::Into)] instead of handwriting impl blocks; use manual impl only when conversion needs custom logic
Model TypeScript string literal unions (like 'auto' | 'always' | 'never') as Rust enums instead of newtype wrappers, since the set of valid values is closed
Treat TypeScript string template literal types (like `${string}@${string}`) the same as branded string types in Rust, using a newtype wrapper with validation
Follow the code style guide in CODE_STYLE_GUIDE.md — imports, modules, naming, ownership and borrowing, parameter type selection, trait bounds, pattern matching, pipe-trait, error handling, test layout, and cloning of Arc and Rc
Choose owned vs. borrowed parameters to minimize copies; widen to t...

Files:

  • pacquet/crates/resolving-npm-resolver/src/errors.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs
🧠 Learnings (4)
📚 Learning: 2026-05-20T19:40:55.051Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11774
File: pacquet/crates/resolving-deps-resolver/src/resolve_peers.rs:0-0
Timestamp: 2026-05-20T19:40:55.051Z
Learning: In the pacquet Rust code, ensure the semver implementation uses the `node-semver` crate (not `nodejs-semver`). `node-semver`’s public API does not include a `satisfies_with_prerelease`-style method; prerelease-tolerant matching should be implemented inline by first calling `Range::satisfies`, and when it rejects a prerelease version, retry matching against a stripped `MAJOR.MINOR.PATCH` base of the prerelease version.

Applied to files:

  • pacquet/crates/resolving-npm-resolver/src/errors.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs
📚 Learning: 2026-05-22T00:08:44.646Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11837
File: pacquet/crates/resolving-npm-resolver/src/pick_package.rs:33-51
Timestamp: 2026-05-22T00:08:44.646Z
Learning: In the pnpm/pnpm repo’s pacquet Rust crates, do not flag Unicode ellipsis characters (U+2026, `…`) in Rust doc comments (`///` / `/** */`) as a lint violation. The pacquet crate’s `dylint.toml` only enables `perfectionist::derive_ordering`, and the Dylint `unicode-ellipsis` rule is not enabled for this project—so `…` in doc comments is an intentional, repo-consistent style.

Applied to files:

  • pacquet/crates/resolving-npm-resolver/src/errors.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs
📚 Learning: 2026-05-20T23:07:58.444Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11784
File: pacquet/crates/resolving-deps-resolver/src/hoist_peers.rs:120-133
Timestamp: 2026-05-20T23:07:58.444Z
Learning: When reviewing code in this pacquet Rust port, follow the upstream pnpm compatibility rule: only match pnpm’s behavior exactly. Do not propose review changes that intentionally deviate from pnpm’s documented/observed behavior, even if pnpm appears buggy. If you identify a real bug in pnpm behavior, the review should prioritize fixing it upstream in pnpm first, and avoid implementing a pnpm-behavior workaround here unless the same fix has already landed upstream.

Applied to files:

  • pacquet/crates/resolving-npm-resolver/src/errors.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs
📚 Learning: 2026-06-06T18:58:37.156Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 12243
File: pacquet/crates/package-manager/src/install_package_by_snapshot.rs:319-322
Timestamp: 2026-06-06T18:58:37.156Z
Learning: When reviewing Rust code, do not assume `matches!(expr, Pattern(_))` will move out of `expr` if `Pattern(_)` contains no by-value bindings. `matches!` desugars to a `match` that auto-borrows the scrutinee for discrimination, so even if `expr` is a non-`Copy` value behind a shared reference (e.g., `&T`), the macro should not move-out of the borrowed data purely due to `matches!`. Treat `matches!(&expr, Pattern(_))` as a readability/clarity improvement, not a correctness requirement. Only flag potential move-out-of-borrow risks when the pattern includes by-value bindings that would require moving the matched value.

Applied to files:

  • pacquet/crates/resolving-npm-resolver/src/errors.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs
  • pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs
🔇 Additional comments (3)
pacquet/crates/resolving-npm-resolver/src/errors.rs (1)

56-64: LGTM!

pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs (1)

112-112: LGTM!

Also applies to: 142-154

pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs (1)

124-124: LGTM!

Also applies to: 143-146, 173-221


📝 Walkthrough

Walkthrough

The PR introduces a ParseTask error variant to represent JSON deserialization task failures and refactors two npm metadata fetchers to explicitly drop HTTP client handles before offloading JSON parsing to background blocking tasks, releasing network concurrency permits during CPU-bound work.

Changes

Resource lifecycle and spawn_blocking JSON parsing

Layer / File(s) Summary
ParseTask error variant
pacquet/crates/resolving-npm-resolver/src/errors.rs
FetchMetadataError enum gains ParseTask variant with url and tokio::task::JoinError fields, including Display and miette diagnostic metadata.
fetch_full_metadata refactoring
pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs
HTTP client handle is captured and explicitly dropped before JSON deserialization is moved from inline serde_json::from_str to a tokio::task::spawn_blocking task; decode errors map to FetchMetadataError::Decode and task/join failures map to FetchMetadataError::ParseTask.
fetch_full_metadata_cached refactoring
pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs
Client capture and early drop applied to both 304 Not Modified path (before mirror reads) and 2xx path (before spawn_blocking JSON deserialization + cache persistence); task/join failures mapped to FetchMetadataError::ParseTask via double-? error handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • pnpm/pnpm#12029: Both PRs modify npm metadata fetchers (fetch_full_metadata.rs and fetch_full_metadata_cached.rs); #12029 changed retry/client handling while this PR offloads JSON parsing to spawn_blocking and introduces ParseTask error mapping.

Poem

🐰 A client released so timely and bright,
JSON parsing off to the threads—out of sight,
Network permits dance free all throughout,
No more heavyweight work holding them out! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main performance optimization: parsing packuments off the reactor thread and releasing the network permit before CPU-bound work.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch clean-install-syscall-churn

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-free-for-open-source-projects

Copy link
Copy Markdown

PR Summary by Qodo

perf: parse packuments off the reactor and release network permit before CPU work
✨ Enhancement 🕐 20-40 Minutes

Grey Divider

Walkthroughs

Description
• Moves packument deserialization and mirror persistence into spawn_blocking in both
  fetch_full_metadata and fetch_full_metadata_cached, preventing multi-MB serde parses from
  pinning tokio reactor workers and stalling concurrent socket I/O.
• Drops the ThrottledClientGuard (network-concurrency semaphore permit) immediately after the
  response body is fully buffered — before any CPU work — so the semaphore bounds sockets rather than
  parse time, matching the existing tarball pipeline pattern.
• On 304 responses, releases the guard before the mirror disk read for the same reason.
• Adds a new FetchMetadataError::ParseTask variant to surface JoinError from the blocking task
  (panic or runtime shutdown).
Diagram
sequenceDiagram
    participant R as Tokio Reactor
    participant S as Semaphore
    participant F as Fetcher
    participant B as spawn_blocking
    participant D as Disk Mirror

    R->>S: acquire permit
    S-->>F: ThrottledClientGuard
    F->>R: GET /registry/pkg (stream body)
    R-->>F: raw_body buffered
    F->>S: drop(client) — release permit
    Note over S,R: Semaphore free; reactor pumps next sockets
    F->>B: spawn_blocking(serde_json::from_str)
    B->>D: save_meta (mirror write)
    B-->>F: Package
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. block_in_place instead of spawn_blocking
  • ➕ No thread spawn overhead
  • ➕ Simpler error handling (no JoinError)
  • ➖ Still blocks the current reactor worker thread
  • ➖ Does not free the worker to pump other sockets — defeats the purpose
  • ➖ tokio docs recommend spawn_blocking for CPU-bound work over block_in_place
2. Combine guard drop + parse into a single async step without spawn_blocking
  • ➕ Simpler code, no new error variant needed
  • ➖ Serde parsing is CPU-bound and can run for hundreds of ms on large packuments
  • ➖ Would still pin the reactor worker during the parse, causing the same socket-stall problem

Recommendation: The PR's approach — explicit drop(client) followed by spawn_blocking — is the correct pattern for this codebase. The main alternative worth noting is using tokio::task::block_in_place instead of spawn_blocking, which avoids spawning a new thread but still blocks the current worker, preventing it from stealing other tasks. However, spawn_blocking is strictly better here because it moves the work entirely off the reactor thread pool onto a dedicated blocking pool, allowing the reactor to continue pumping sockets. A second alternative would be using serde_json::from_reader with a Cursor in an async context, but that doesn't avoid the CPU-bound nature of parsing. The PR's approach is optimal.

Grey Divider

File Changes

Enhancement (1)
errors.rs Add ParseTask error variant for blocking task JoinError +10/-0

Add ParseTask error variant for blocking task JoinError

• Adds a new 'FetchMetadataError::ParseTask' variant that wraps a 'tokio::task::JoinError'. This surfaces panics or runtime-shutdown cancellations from the 'spawn_blocking' tasks introduced in the two fetcher modules.

pacquet/crates/resolving-npm-resolver/src/errors.rs


Other (2)
fetch_full_metadata.rs Release network permit before spawn_blocking parse in fetch_full_metadata +14/-3

Release network permit before spawn_blocking parse in fetch_full_metadata

• Captures the 'ThrottledClientGuard' returned by 'send_with_retry' (previously discarded with '_client') and explicitly drops it after the response body is fully buffered. Packument deserialization is moved into 'tokio::task::spawn_blocking' so the reactor thread is free to pump remaining socket I/O during the CPU-bound serde parse.

pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs


fetch_full_metadata_cached.rs Release network permit before spawn_blocking parse+persist in fetch_full_metadata_cached +45/-21

Release network permit before spawn_blocking parse+persist in fetch_full_metadata_cached

• Applies the same guard-drop-before-CPU-work pattern as 'fetch_full_metadata'. Additionally drops the guard before the mirror disk read on 304 responses. Deserialization and mirror persistence are combined into a single 'spawn_blocking' closure, keeping the reactor free during both the serde parse and the synchronous file write.

pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata_cached.rs


Grey Divider

Qodo Logo

@github-actions

Copy link
Copy Markdown
Contributor

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.00      7.3±0.09ms   590.7 KB/sec    1.02      7.5±0.69ms   576.7 KB/sec

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.71%. Comparing base (d976edf) to head (5f544e0).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...ing-npm-resolver/src/fetch_full_metadata_cached.rs 89.47% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12318      +/-   ##
==========================================
+ Coverage   87.65%   87.71%   +0.06%     
==========================================
  Files         288      289       +1     
  Lines       35475    35624     +149     
==========================================
+ Hits        31095    31247     +152     
+ Misses       4380     4377       -3     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

Copy link
Copy Markdown
Contributor

Integrated-Benchmark Report (Linux)

Each scenario reports direct installs and pnpr installs. Bencher consumes pacquet@HEAD and pnpr@HEAD.

Scenario: Isolated linker: fresh restore, cold cache + cold store

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 10.018 ± 0.161 9.858 10.388 1.86 ± 0.06
pacquet@main 10.057 ± 0.130 9.978 10.417 1.87 ± 0.05
pnpr@HEAD 5.384 ± 0.138 5.277 5.756 1.00
pnpr@main 5.404 ± 0.133 5.287 5.631 1.00 ± 0.04
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 10.017704292480001,
      "stddev": 0.16067292411308276,
      "median": 9.95407801108,
      "user": 3.3073586999999995,
      "system": 3.29330364,
      "min": 9.85795805158,
      "max": 10.38778808558,
      "times": [
        9.92033034858,
        10.38778808558,
        9.93648043258,
        9.91833999058,
        10.058451249580001,
        9.93477742558,
        9.992880553580001,
        10.19836119758,
        9.97167558958,
        9.85795805158
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 10.05747461448,
      "stddev": 0.1299342290316299,
      "median": 10.02964970108,
      "user": 3.1967086,
      "system": 3.2011163399999996,
      "min": 9.97844568758,
      "max": 10.41709596858,
      "times": [
        10.06891978858,
        10.00074651658,
        10.04224697958,
        10.02596453858,
        9.97886235658,
        10.04058469658,
        10.03333486358,
        9.97844568758,
        10.41709596858,
        9.98854474858
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 5.3842999462800005,
      "stddev": 0.13755358585405802,
      "median": 5.35208286508,
      "user": 2.5425676,
      "system": 2.8733183399999995,
      "min": 5.27676659358,
      "max": 5.75598956258,
      "times": [
        5.40803522758,
        5.31567597058,
        5.36640672358,
        5.3951475035800005,
        5.2836461765800005,
        5.27676659358,
        5.36679216558,
        5.75598956258,
        5.33678053258,
        5.33775900658
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 5.40449842818,
      "stddev": 0.133264656044673,
      "median": 5.32742165358,
      "user": 2.5429517,
      "system": 2.8487267399999996,
      "min": 5.2869473755800005,
      "max": 5.63098520158,
      "times": [
        5.63098520158,
        5.2869473755800005,
        5.32624802258,
        5.48295811758,
        5.28955286058,
        5.31052059858,
        5.57216380758,
        5.52359831258,
        5.29341470058,
        5.32859528458
      ]
    }
  ]
}

Scenario: Isolated linker: fresh restore, hot cache + hot store

Command Mean [ms] Min [ms] Max [ms] Relative
pacquet@HEAD 648.4 ± 12.9 630.5 672.5 1.00
pacquet@main 665.2 ± 35.7 637.3 738.4 1.03 ± 0.06
pnpr@HEAD 779.1 ± 81.5 735.8 1009.5 1.20 ± 0.13
pnpr@main 776.4 ± 49.3 743.6 905.5 1.20 ± 0.08
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 0.64843110006,
      "stddev": 0.012855532797284895,
      "median": 0.6489226985600001,
      "user": 0.36625749999999996,
      "system": 1.3115138799999997,
      "min": 0.6305207220600001,
      "max": 0.6724569330600001,
      "times": [
        0.6724569330600001,
        0.6305207220600001,
        0.64758253706,
        0.65026286006,
        0.6583546570600001,
        0.6534967460600001,
        0.6372756930600001,
        0.6356680450600001,
        0.65862360206,
        0.6400692050600001
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 0.6651677703600001,
      "stddev": 0.03565056959686553,
      "median": 0.64578285706,
      "user": 0.36864449999999993,
      "system": 1.31026848,
      "min": 0.6373482340600001,
      "max": 0.7383924080600001,
      "times": [
        0.6427776390600001,
        0.69820455406,
        0.7383924080600001,
        0.6651904750600001,
        0.6380965580600001,
        0.64878807506,
        0.7030181590600001,
        0.6388038340600001,
        0.6373482340600001,
        0.6410577670600001
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 0.77909978656,
      "stddev": 0.08147469586256419,
      "median": 0.75620586106,
      "user": 0.3888621,
      "system": 1.32867268,
      "min": 0.7358225020600001,
      "max": 1.00953619306,
      "times": [
        0.7537868200600001,
        0.7560261970600001,
        0.75638552506,
        0.75874824606,
        0.7358225020600001,
        0.7416027810600001,
        1.00953619306,
        0.7679393290600001,
        0.75935763006,
        0.7517926420600001
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 0.7764442101600001,
      "stddev": 0.04927642223606241,
      "median": 0.7544804190600001,
      "user": 0.39519039999999994,
      "system": 1.31221738,
      "min": 0.7435991270600001,
      "max": 0.90547633806,
      "times": [
        0.75748533206,
        0.7736609430600001,
        0.80675769606,
        0.7508382770600001,
        0.7469096460600001,
        0.75026090406,
        0.90547633806,
        0.7779783320600001,
        0.7435991270600001,
        0.75147550606
      ]
    }
  ]
}

Scenario: Isolated linker: fresh install, cold cache + cold store

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 9.170 ± 0.023 9.131 9.207 1.80 ± 0.04
pacquet@main 9.278 ± 0.046 9.214 9.363 1.82 ± 0.04
pnpr@HEAD 5.105 ± 0.047 5.042 5.198 1.00 ± 0.02
pnpr@main 5.105 ± 0.110 5.021 5.399 1.00
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 9.170304871119999,
      "stddev": 0.023374401523629327,
      "median": 9.16537366612,
      "user": 3.70488122,
      "system": 3.28502824,
      "min": 9.131187692620001,
      "max": 9.20745777562,
      "times": [
        9.20745777562,
        9.16038210462,
        9.17636382262,
        9.16391298862,
        9.18810480762,
        9.131187692620001,
        9.20049435162,
        9.148288282620001,
        9.16002254162,
        9.16683434362
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 9.27756225862,
      "stddev": 0.04630413011140634,
      "median": 9.27123243262,
      "user": 3.6605168199999993,
      "system": 3.2652195399999995,
      "min": 9.214396754620001,
      "max": 9.36318135562,
      "times": [
        9.232666011620001,
        9.280934290620001,
        9.273439917620001,
        9.26902494762,
        9.32562192662,
        9.214396754620001,
        9.36318135562,
        9.31989133862,
        9.24391286162,
        9.25255318162
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 5.10497121992,
      "stddev": 0.04733092717171625,
      "median": 5.10418520762,
      "user": 2.3939623199999995,
      "system": 2.7781312400000004,
      "min": 5.04190246662,
      "max": 5.19754261562,
      "times": [
        5.09856333462,
        5.080519929619999,
        5.123783502619999,
        5.1098070806199996,
        5.04190246662,
        5.19754261562,
        5.08463188762,
        5.042733141619999,
        5.11939551262,
        5.150832727619999
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 5.104766965719998,
      "stddev": 0.10986918684231943,
      "median": 5.08045095162,
      "user": 2.3825613199999998,
      "system": 2.76368494,
      "min": 5.02116636362,
      "max": 5.3985165216199995,
      "times": [
        5.10698995962,
        5.142216501619999,
        5.073444246619999,
        5.048519102619999,
        5.03662251062,
        5.0353945506199995,
        5.08745765662,
        5.3985165216199995,
        5.09734224362,
        5.02116636362
      ]
    }
  ]
}

Scenario: Isolated linker: fresh install, hot cache + hot store

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 1.412 ± 0.014 1.395 1.433 2.11 ± 0.08
pacquet@main 1.406 ± 0.011 1.394 1.424 2.10 ± 0.08
pnpr@HEAD 0.671 ± 0.024 0.650 0.732 1.00
pnpr@main 0.672 ± 0.039 0.643 0.776 1.00 ± 0.07
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 1.4121855616599999,
      "stddev": 0.013933077579351548,
      "median": 1.40885295176,
      "user": 1.52979132,
      "system": 1.78811608,
      "min": 1.39488547326,
      "max": 1.43274780226,
      "times": [
        1.4010979292599999,
        1.39838453826,
        1.4085775462599999,
        1.40027107926,
        1.39488547326,
        1.42721844426,
        1.40912835726,
        1.43274780226,
        1.4262752862599999,
        1.42326916026
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 1.40617268466,
      "stddev": 0.011358453118770977,
      "median": 1.40469467176,
      "user": 1.5400773200000002,
      "system": 1.78112528,
      "min": 1.39380359126,
      "max": 1.42392898326,
      "times": [
        1.39528005226,
        1.42392898326,
        1.39471976726,
        1.41460778026,
        1.42343435026,
        1.40352645626,
        1.3982765772599999,
        1.4058628872599999,
        1.39380359126,
        1.40828640126
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 0.67072438896,
      "stddev": 0.023739783761279734,
      "median": 0.66514325176,
      "user": 0.33862531999999995,
      "system": 1.2619566799999997,
      "min": 0.6503194572600001,
      "max": 0.7323890592600001,
      "times": [
        0.7323890592600001,
        0.6738196632600001,
        0.6791604132600001,
        0.66364444826,
        0.6736317202600001,
        0.65553247926,
        0.6666420552600001,
        0.6518109712600001,
        0.6503194572600001,
        0.6602936222600001
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 0.6716872321599999,
      "stddev": 0.038514660819486735,
      "median": 0.66022310926,
      "user": 0.33849142,
      "system": 1.2521583799999998,
      "min": 0.6429031582600001,
      "max": 0.77645472226,
      "times": [
        0.68714502426,
        0.6618315452600001,
        0.6541664432600001,
        0.6610052832600001,
        0.65944093526,
        0.6429031582600001,
        0.77645472226,
        0.65379351726,
        0.6649766492600001,
        0.6551550432600001
      ]
    }
  ]
}

Scenario: Isolated linker: fresh install, cold cache + hot store

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 4.896 ± 0.031 4.844 4.937 7.12 ± 0.28
pacquet@main 5.204 ± 0.031 5.160 5.271 7.56 ± 0.30
pnpr@HEAD 0.689 ± 0.018 0.669 0.729 1.00 ± 0.05
pnpr@main 0.688 ± 0.027 0.657 0.757 1.00
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 4.896212041420001,
      "stddev": 0.03101217792385769,
      "median": 4.90448953812,
      "user": 1.8125289199999997,
      "system": 1.9896081200000002,
      "min": 4.844166609619999,
      "max": 4.93746342362,
      "times": [
        4.91859012562,
        4.844166609619999,
        4.93746342362,
        4.908537358619999,
        4.90344073062,
        4.89055172662,
        4.851929290619999,
        4.905538345619999,
        4.92755537562,
        4.87434742762
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 5.20378988752,
      "stddev": 0.030883993797042063,
      "median": 5.19615913362,
      "user": 1.7897567200000002,
      "system": 2.1917720199999997,
      "min": 5.15981348362,
      "max": 5.2710631356199995,
      "times": [
        5.22936070862,
        5.18823785862,
        5.204080408619999,
        5.183247402619999,
        5.2710631356199995,
        5.21587934062,
        5.15981348362,
        5.18707937762,
        5.18641696062,
        5.21272019862
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 0.68897308072,
      "stddev": 0.018057391692706468,
      "median": 0.68267650912,
      "user": 0.34679301999999995,
      "system": 1.2992290199999998,
      "min": 0.6689919186200001,
      "max": 0.7291547906200001,
      "times": [
        0.7291547906200001,
        0.69656302962,
        0.6949073896200001,
        0.6808321726200001,
        0.6779835136200001,
        0.70469350362,
        0.6831554656200001,
        0.67125147062,
        0.68219755262,
        0.6689919186200001
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 0.68795327802,
      "stddev": 0.027047176533396746,
      "median": 0.6865555696200001,
      "user": 0.33963652,
      "system": 1.27114192,
      "min": 0.65693897162,
      "max": 0.7568784596200001,
      "times": [
        0.7568784596200001,
        0.6860325056200001,
        0.65693897162,
        0.67555786362,
        0.68821351762,
        0.67767559162,
        0.6870786336200001,
        0.6937233986200001,
        0.69292370962,
        0.66451012862
      ]
    }
  ]
}

@github-actions

Copy link
Copy Markdown
Contributor

🐰 Bencher Report

Branchpr/12318
Testbedpacquet
Click to view all benchmark results
BenchmarkLatencyBenchmark Result
milliseconds (ms)
(Result Δ%)
Upper Boundary
milliseconds (ms)
(Limit %)
isolated-linker.fresh-install.cold-cache.cold-store📈 view plot
🚷 view threshold
9,170.30 ms
(+3.52%)Baseline: 8,858.08 ms
10,629.70 ms
(86.27%)
isolated-linker.fresh-install.cold-cache.hot-store📈 view plot
🚷 view threshold
4,896.21 ms
(-2.48%)Baseline: 5,020.81 ms
6,024.97 ms
(81.27%)
isolated-linker.fresh-install.hot-cache.hot-store📈 view plot
🚷 view threshold
1,412.19 ms
(-0.82%)Baseline: 1,423.87 ms
1,708.64 ms
(82.65%)
isolated-linker.fresh-restore.cold-cache.cold-store📈 view plot
🚷 view threshold
10,017.70 ms
(-0.42%)Baseline: 10,059.54 ms
12,071.45 ms
(82.99%)
isolated-linker.fresh-restore.hot-cache.hot-store📈 view plot
🚷 view threshold
648.43 ms
(-0.06%)Baseline: 648.85 ms
778.62 ms
(83.28%)
🐰 View full continuous benchmarking report in Bencher

@github-actions

Copy link
Copy Markdown
Contributor

🐰 Bencher Report

Branchpr/12318
Testbedpnpr

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds flag.

Click to view all benchmark results
BenchmarkLatencymilliseconds (ms)
isolated-linker.fresh-install.cold-cache.cold-store📈 view plot
⚠️ NO THRESHOLD
5,104.97 ms
isolated-linker.fresh-install.cold-cache.hot-store📈 view plot
⚠️ NO THRESHOLD
688.97 ms
isolated-linker.fresh-install.hot-cache.hot-store📈 view plot
⚠️ NO THRESHOLD
670.72 ms
isolated-linker.fresh-restore.cold-cache.cold-store📈 view plot
⚠️ NO THRESHOLD
5,384.30 ms
isolated-linker.fresh-restore.hot-cache.hot-store📈 view plot
⚠️ NO THRESHOLD
779.10 ms
🐰 View full continuous benchmarking report in Bencher

@zkochan zkochan merged commit 9cd8070 into main Jun 10, 2026
28 checks passed
@zkochan zkochan deleted the clean-install-syscall-churn branch June 10, 2026 22:33
zkochan added a commit that referenced this pull request Jun 11, 2026
…ndexed metadata mirror (#12322)

## Why

Profiling a warm babylon resolve (metadata mirrors hot, ~520 MB across 1,476 packuments) showed the dominant cost was not I/O but **hydration**: every version of every packument was deserialized into typed manifests — maps, strings, and `serde_json::Value` trees built, hashed, and dropped — even though a pick consults only the version *strings* plus the handful of manifests it actually considers. typescript ships 3,800 versions; a pick needs one. The `#[serde(flatten)]` catch-all on `PackageVersion` compounded it by routing the whole struct through serde's buffering deserializer. The same hydration cost was paid on cold resolves inside the `spawn_blocking` parses from #12318.

## What

Three changes, applied in profile-driven order:

**1. Lazy packument hydration** (`df6f70eb57`). `Package::versions` becomes a `PackageVersions` map whose entries hold the raw JSON fragment serde captured (`Arc<RawValue>`, shared not copied) and hydrate into `Arc<PackageVersion>` on first access, cached per slot. Key scans never hydrate; `pinned_version` hydrates only the winner; the publish-date filter moves slots without touching manifests; undecodable fragments degrade to "version absent" (matching the JS implementation's tolerance); serialization re-emits raw fragments verbatim. Picked manifests travel as `Arc<PackageVersion>`. Enables serde_json's `raw_value` feature (already a workspace dep).

**2. Sharded in-memory packument cache** (`4c28c6679e`). Every resolve edge consults the shared meta cache, and its single `Mutex<HashMap>` was the top mutex-wait site in the profile; it is now the same `DashMap` shape the crate's other shared maps use. Honest note: wall time was unchanged by this alone — the post-hydration profile shows the warm resolve is critical-path-bound, not lock-bound — but the contention disappears and the cache no longer serializes workers under load.

**3. Indexed on-disk metadata mirror** (`126a416ae8`, maintainer-approved cache-format break). The two-line NDJSON mirror (headers + verbatim body) becomes:

```
pacquet-meta-v1 <headers_len> <index_len>\n
<headers JSON>     etag, modified
<index JSON>       name, dist-tags, time, homepage, versions: [version, offset, len]
<fragments>        concatenated raw per-version JSON
```

The loader reads the file once and hands `PackageVersions` byte spans into that buffer — no structural re-scan, hydration parses a slice in place. The writer streams the registry's own bytes (fragments borrow from the lazily-parsed response body), so the **cold-install cost stays one temp-file + rename per package**. Old-format files read as cache misses and are rewritten on the next 200. A span-per-fragment `pread` variant was tried first and measured *worse* (sys 0.4 s → 4.5 s; the pick paths probe many candidate fragments per package), hence the single-buffer design.

## Measurements (warm babylon `--lockfile-only` resolve, 10-core M-series)

| build | wall | notes |
|---|---|---|
| before this PR | ~9.1 s | malloc/free + `deserialize_any` dominate the profile |
| + lazy hydration | ~7.7–8 s | parse microbench 3–3.8× (`typescript` 36.1→9.5 ms) |
| + indexed mirror | **~5.5–6.7 s** | sys 0.4 s; no whole-body scan |

Cold resolves keep the same hydration savings inside the parse tasks; cold write volume and syscall pattern are unchanged.

## Evaluated and deliberately not included

Consolidating the install-phase thread pools (tokio + global rayon + the dedicated CAS-write pool + capped blocking threads show ~100k involuntary context switches on cold installs vs ~750 for the pnpm CLI). After the resolution fixes, repeated container A/Bs show no measurable wall-clock cost from the churn — it hides entirely behind network time — and history warns that speculative concurrency reshuffles here regress badly (see the #11903 prefetch revert). Deferred until a benchmark shows it on the critical path.

## Tests

New `package_versions` unit tests (hydrate-on-demand + Arc-identity caching, undecodable-fragment-as-absent, verbatim raw round-trip incl. unknown keys, eager construction, hydration-free filtering) and rewritten `mirror` tests (headers/index/fragment round-trip, span hydration, truncation → cache miss, old-format → cache miss, atomic overwrite). Full suites green: `pacquet-registry` (23), `pacquet-resolving-npm-resolver` (235), `pacquet-package-manager` + `pacquet-cli` (768); workspace clippy `-D warnings` (pedantic set), dylint, fmt.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants