
perf(tarball): parallelize per-file CAS writes within a tarball#12247

Merged
zkochan merged 1 commit into main from perf-parallel-tarball-extraction on Jun 6, 2026

@zkochan zkochan commented Jun 6, 2026

Member

Problem

extract_tarball_entries walked the tar in a single serial loop, hashing and writing each file into the content-addressed store one at a time, on the one spawn_blocking thread an extraction runs on. A package with many files (e.g. core-js, which unpacks to thousands) pinned a single core for its whole extraction while the rest of the machine sat idle — most visibly at the makespan tail, when one big package is the last extraction still running and every other core is free. This is why pnpm (which parallelizes extraction across workers) finishes big tarballs faster.

Fix

Split extraction into two phases:

  1. Serial pass — walk the seekable tar stream to validate + clean each regular-file path and capture a borrow of its payload (cheap: header parsing only, no hashing/IO).
  2. Parallel pass — hash and write each file into the CAS across the rayon pool.

StoreDir::write_cas_file is content-addressed and already documented as safe to call concurrently (its shard-creation cache is race-tolerant), so the output — CAS files, the {path → cafs path} map, and the PackageFilesIndex row — is byte-identical. Result order is preserved, so last-entry-wins for duplicate paths is unchanged. Small tarballs (< 32 files) stay serial to skip rayon's per-job dispatch cost.
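
The two-phase shape described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PR's code: `PendingFile`, `write_cas_entry`, and `process_pending` are hypothetical stand-ins whose bodies are invented here, the hash is a placeholder FNV-1a rather than the store's real digest, nothing is written to disk, and `std::thread::scope` over contiguous chunks is swapped in for the rayon pool so the sketch compiles without dependencies.

```rust
use std::thread;

/// Hypothetical stand-in for a staged entry: a cleaned path plus a borrow of
/// the file's bytes inside the in-memory tar buffer.
struct PendingFile<'a> {
    path: String,
    payload: &'a [u8],
}

/// Assumed cutoff, mirroring the "< 32 files stay serial" rule.
const PARALLEL_EXTRACT_THRESHOLD: usize = 32;

/// Placeholder for "hash and write into the CAS": returns the path and a fake
/// content address (FNV-1a, not the store's real hash) without any disk IO.
fn write_cas_entry(file: &PendingFile<'_>) -> (String, String) {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in file.payload {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    (file.path.clone(), format!("{hash:016x}"))
}

/// Phase 2: map every staged file to its result, in input order.
fn process_pending(pending: &[PendingFile<'_>]) -> Vec<(String, String)> {
    if pending.len() < PARALLEL_EXTRACT_THRESHOLD {
        // Small tarballs: skip the dispatch overhead entirely.
        return pending.iter().map(write_cas_entry).collect();
    }
    let workers = thread::available_parallelism().map_or(1, |n| n.get());
    let chunk_size = pending.len().div_ceil(workers);
    thread::scope(|scope| {
        // Spawn one worker per contiguous chunk, then join in spawn order so
        // the concatenated results keep the original entry order.
        let handles: Vec<_> = pending
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(write_cas_entry).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

Joining the workers in spawn order is what keeps the output order equal to the input order in this sketch; that ordering is the property the last-entry-wins rule for duplicate paths relies on.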

Measured

Fresh install of a ~1300-package fixture (10-core machine):

| | extraction tail | all extractions done |
| --- | --- | --- |
| before | core-js@3 @ ~10.7s | ~10.7s |
| after | core-js@3 parallelized away | ~5.5s |

The extraction tail roughly halved. Note: total install time on this fixture is dominated by the downstream hardlink/import phase, so this speeds up the extraction phase specifically rather than the whole install — but it's a clear win on machines/fixtures where extraction is a larger share (more cores, more big packages).

Tests

All 54 pacquet-tarball tests pass (extraction output is byte-identical); clippy clean.


Written by an agent (Claude Code, claude-opus-4-8).

Summary by CodeRabbit

  • Refactor
    • Optimized tarball extraction process to handle large package files more efficiently through improved batch processing.
    • Enhanced package.json manifest parsing and normalization during extraction for better reliability.

`extract_tarball_entries` walked the tar in a single serial loop, hashing
and writing each file into the CAS one at a time on the one `spawn_blocking`
thread the extraction runs on. A package with many files (e.g. `core-js`,
which unpacks to thousands) therefore pinned a single core for the whole
extraction while the rest of the machine sat idle — most visibly at the
makespan tail, when one big package is the last extraction still running and
every other core is free.

Split extraction into two phases: a serial pass that walks the seekable tar
stream to validate + clean each regular-file path and capture a borrow of its
payload, then a parallel pass that hashes and writes each file into the CAS
across the rayon pool. `StoreDir::write_cas_file` is content-addressed and
already documented as safe to call concurrently (its shard-creation cache is
race-tolerant), so the output — the CAS files, the `{path → cafs path}` map,
and the `PackageFilesIndex` row — is byte-identical; result order is
preserved so the last-entry-wins behavior for duplicate paths is unchanged.
Small tarballs (under 32 files) stay on the serial path to avoid rayon's
per-job dispatch cost when there's nothing to gain.

On a fresh install of a ~1300-package fixture this cut the extraction tail
roughly in half: the largest package (`core-js@3`) finished extracting at
~10.7s before and ~5.5s after, and all extractions completed by ~5.5s instead
of ~10.7s. (Total install time on that fixture is dominated by the downstream
hardlink/import phase, so this speeds up extraction specifically rather than
the whole install.)

---
Written by an agent (Claude Code, claude-opus-4-8).
@coderabbitai

coderabbitai Bot commented Jun 6, 2026


📝 Walkthrough

The refactoring defers CAS writes and per-entry indexing until after tar walking completes. Phase 1 stages entries and captures the bundled manifest; phase 2 batch-processes staged files using Rayon parallelism when beneficial; phase 3 reconstructs the output maps, preserving last-wins semantics for duplicates.

Changes

Tarball extraction staging and batch processing

| Layer / File(s) | Summary |
| --- | --- |
| Staging structures and write helper<br>pacquet/crates/tarball/src/lib.rs | New PendingFile struct holds validated path, payload slice, and metadata; write_cas_entry helper writes one payload to CAS and produces CafsFileInfo with checked_at timestamp. |
| Phase 1: Tar walking with staging and manifest capture<br>pacquet/crates/tarball/src/lib.rs | Setup introduces pending staging buffer and separate manifest variable; in-loop CAS write removed; package.json parsed as JSON, normalized via normalize_bundled_manifest, and regular files staged as PendingFile entries with payload slices. |
| Phase 2 & 3: Batch extraction and index assembly<br>pacquet/crates/tarball/src/lib.rs | Phase 2 collects write_cas_entry results using Rayon par_iter when pending.len() ≥ 32 else serial iter; phase 3 reconstructs cas_paths and PackageFilesIndex.files from results with last-wins semantics and duplicate warnings, then constructs final PackageFilesIndex with captured manifest. |
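
The phase 3 assembly summarized above comes down to folding an ordered result list into a map. A rough sketch, assuming results arrive as plain `(path, cafs_path)` string pairs; `assemble_index` is a hypothetical name and the real code builds `cas_paths` and `PackageFilesIndex.files` with richer types:

```rust
use std::collections::HashMap;

/// Phase 3 sketch: fold ordered `(path, cafs_path)` results into a map.
/// Returns the map plus every path that overwrote an earlier entry, which is
/// where a duplicate warning could be emitted.
fn assemble_index(results: Vec<(String, String)>) -> (HashMap<String, String>, Vec<String>) {
    let mut cas_paths = HashMap::with_capacity(results.len());
    let mut duplicates = Vec::new();
    for (path, cafs_path) in results {
        // `insert` returns the previous value, so `Some` means this path was
        // already seen and the later entry has just replaced it (last wins).
        if cas_paths.insert(path.clone(), cafs_path).is_some() {
            duplicates.push(path);
        }
    }
    (cas_paths, duplicates)
}
```

Because the fold is serial and runs over results in tar order, last-wins holds regardless of how phase 2 scheduled the writes, provided phase 2 returns results in the order the entries were staged.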

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • pnpm/pnpm#12131: Prior refactor of extract_tarball_entries that modified tar entry payload handling and CAS write behavior; this PR builds on the same extraction-path refactoring.

Poem

🐰 Three phases dance with tarball grace,
Stage the files, then write with pace,
Rayon speeds the work along,
Bundle manifests both right and strong,
Pack it tight, the CAS is blessed!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'perf(tarball): parallelize per-file CAS writes within a tarball' directly and specifically describes the main change: parallelizing CAS writes for tarball extraction. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions

github-actions Bot commented Jun 6, 2026

Contributor

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.01      7.8±0.27ms   555.4 KB/sec    1.00      7.7±0.32ms   559.9 KB/sec

@zkochan zkochan marked this pull request as ready for review June 6, 2026 17:45

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
pacquet/crates/tarball/src/lib.rs (1)

673-680: ⚡ Quick win

Add a regression test that forces the parallel branch with duplicate paths.

This refactor’s last-wins guarantee now depends on the batched write path preserving pending order through the Rayon collection. A fixture with at least 32 entries, one of which repeats an earlier filename, would lock that contract down.
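
One possible shape for such a test, sketched with hypothetical helpers: `duplicate_fixture` and `assert_last_wins` are invented for illustration, and the `extract` parameter stands in for whatever call drives the crate's real extraction path over a tar built from these entries. The threshold constant is assumed to equal the PR's cutoff of 32.

```rust
use std::collections::HashMap;

/// Assumed to match the PR's parallel cutoff of 32 entries.
const PARALLEL_EXTRACT_THRESHOLD: usize = 32;

/// Builds `(path, content)` entries: enough unique files to reach the
/// threshold, bracketed by two entries that share one path.
fn duplicate_fixture() -> Vec<(String, Vec<u8>)> {
    let mut entries = vec![("package/dup.js".to_string(), b"old".to_vec())];
    for i in 0..PARALLEL_EXTRACT_THRESHOLD {
        entries.push((format!("package/file-{i}.js"), format!("// {i}").into_bytes()));
    }
    entries.push(("package/dup.js".to_string(), b"new".to_vec()));
    entries
}

/// Runs any extraction function (hypothetical signature) over the fixture and
/// checks that the later duplicate is the one that survives.
fn assert_last_wins<F>(extract: F)
where
    F: Fn(&[(String, Vec<u8>)]) -> HashMap<String, Vec<u8>>,
{
    let entries = duplicate_fixture();
    assert!(entries.len() >= PARALLEL_EXTRACT_THRESHOLD);
    let files = extract(&entries);
    assert_eq!(files.len(), PARALLEL_EXTRACT_THRESHOLD + 1);
    assert_eq!(files["package/dup.js"], b"new".to_vec());
}
```

Placing the two duplicates at opposite ends of the list means they would likely land in different work units under a parallel scheduler, which is the case most likely to expose an ordering bug.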

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pacquet/crates/tarball/src/lib.rs` around lines 673 - 680, Add a regression
test that exercises the parallel branch (use at least PARALLEL_EXTRACT_THRESHOLD
entries, i.e. 32) and includes duplicate filenames in the pending list to assert
the "last-wins" outcome; construct a pending Vec with ordered entries where the
later duplicate should overwrite the earlier one, call the code path that
invokes write_cas_entry (so the conditional using PARALLEL_EXTRACT_THRESHOLD
triggers the par_iter branch), then assert the resulting written
collection/store reflects the last entry for the duplicate path (compare file
content or CafsFileInfo) to lock down the ordering contract.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: fe7fd72f-ad4d-4793-ade1-20b06bc6ed81

📥 Commits

Reviewing files that changed from the base of the PR and between c199198 and 15f2686.

📒 Files selected for processing (1)
  • pacquet/crates/tarball/src/lib.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run benchmark on ubuntu-latest
🧰 Additional context used
📓 Path-based instructions (1)
pacquet/**/*.rs

📄 CodeRabbit inference engine (pacquet/AGENTS.md)

pacquet/**/*.rs: Log emissions are part of matching pnpm — when porting a function that fires pnpm:<channel> events through globalLogger, logger.debug(...), or streamParser.write(...), mirror the call site, payload, and ordering so @pnpm/cli.default-reporter parses pacquet's NDJSON the same way
Declare a newtype wrapper for branded string types instead of collapsing the brand into a plain String or &str in Rust
If upstream TypeScript always validates before construction of a branded string, validate in the Rust wrapper too via TryFrom<String> and/or FromStr and do not provide an infallible public constructor
If upstream TypeScript never validates a branded string, just brand for type-safety in Rust by exposing an infallible From<String> constructor
If upstream TypeScript occasionally constructs a branded string without validation, expose from_str_unchecked in Rust as an escape hatch alongside the validating constructor
Match upstream serde behavior for branded strings crossing JSON, YAML, or INI boundaries by using #[serde(try_from = "String")] for deserialization and #[serde(into = "String")] for serialization
Derive simple conversions for branded strings using #[derive(derive_more::From)] and #[derive(derive_more::Into)] instead of handwriting impl blocks; use manual impl only when conversion needs custom logic
Model TypeScript string literal unions (like 'auto' | 'always' | 'never') as Rust enums instead of newtype wrappers, since the set of valid values is closed
Treat TypeScript string template literal types (like `${string}@${string}`) the same as branded string types in Rust, using a newtype wrapper with validation
Follow the code style guide in CODE_STYLE_GUIDE.md — imports, modules, naming, ownership and borrowing, parameter type selection, trait bounds, pattern matching, pipe-trait, error handling, test layout, and cloning of Arc and Rc
Choose owned vs. borrowed parameters to minimize copies; widen to t...

Files:

  • pacquet/crates/tarball/src/lib.rs
🧠 Learnings (10)
📓 Common learnings
Learnt from: zkochan
Repo: pnpm/pnpm PR: 12181
File: worker/src/start.ts:504-520
Timestamp: 2026-06-04T06:04:05.107Z
Learning: In pnpm/pnpm's pnpr install accelerator, the `/v1/install` response has a two-level framing structure:
1. **Outer layer** (full HTTP body): `[u32 outer header length][outer header JSON][files payload]` — `fetchFromPnpmRegistry` (pnpr/client/src/fetchFromPnpmRegistry.ts) strips the outer layer with `body.subarray(4 + headerLength)` and passes the remaining bytes to `writeCafsFiles`.
2. **Inner layer** (files payload): the files payload itself starts with its own `[u32 inner json length][inner header JSON]` prefix (built by the server's `build_files_payload` / `empty_files_payload_prefix`), followed by `[64-byte digest][u32 size][1-byte exec][content]` frames and a 64-zero-byte end marker.

`writeCafsFiles` in `worker/src/start.ts` is correct to read `jsonLen = payload.readUInt32BE(0)` and start frames at `offset = 4 + jsonLen` — this skips the inner header. The same two-level structure is mirrored in the Rust reference client (`parse_inline_response` + `write_files_payload`). Do not fla...
Learnt from: CR
Repo: pnpm/pnpm PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-25T12:36:42.202Z
Learning: User-visible changes (CLI flags, defaults, environment variables, lockfile/manifest/state-file formats, error codes/messages, log emissions, store layout, hook semantics) in pnpm must be mirrored to pacquet in the same PR
📚 Learning: 2026-05-29T18:03:24.797Z
Learnt from: CR
Repo: pnpm/pnpm PR: 0
File: pnpr/AGENTS.md:0-0
Timestamp: 2026-05-29T18:03:24.797Z
Learning: Prefer existing pacquet-* crates over writing new code; check pacquet-tarball, pacquet-crypto-hash, pacquet-crypto-shasums-file, pacquet-package-manifest, pacquet-network, pacquet-registry, pacquet-fs, and pacquet-diagnostics before implementing non-trivial functionality

Applied to files:

  • pacquet/crates/tarball/src/lib.rs
📚 Learning: 2026-06-04T20:24:32.096Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 12198
File: pnpr/crates/pnpr/src/storage.rs:469-477
Timestamp: 2026-06-04T20:24:32.096Z
Learning: In `pnpr/crates/pnpr/src/storage.rs` (pnpm/pnpm repo, Rust), `Store::list_package_names` intentionally uses `fs::try_exists(...).await.unwrap_or(false)` and `if let Ok(mut inner) = fs::read_dir(...)` — NOT `?`-propagation — for per-entry checks. This is deliberate best-effort / verdaccio-style search behavior: (1) `try_exists(stray_file/package.json)` returns `ENOTDIR` (not `NotFound`) for a stray non-package file in the store root, so `?` would fail the entire search; (2) the `@`-scope `read_dir` would fail on a non-directory `@`-named entry; (3) switching to `DirEntry::file_type()` would stop following symlinked package dirs. Failures that DO propagate are preserved: opening the store root itself, and `next_entry()` during the walk. Do not suggest blanket `?`-propagation for these per-entry checks.

Applied to files:

  • pacquet/crates/tarball/src/lib.rs
📚 Learning: 2026-06-04T06:04:05.107Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 12181
File: worker/src/start.ts:504-520
Timestamp: 2026-06-04T06:04:05.107Z
Learning: In pnpm/pnpm's pnpr install accelerator, the `/v1/install` response has a two-level framing structure:
1. **Outer layer** (full HTTP body): `[u32 outer header length][outer header JSON][files payload]` — `fetchFromPnpmRegistry` (pnpr/client/src/fetchFromPnpmRegistry.ts) strips the outer layer with `body.subarray(4 + headerLength)` and passes the remaining bytes to `writeCafsFiles`.
2. **Inner layer** (files payload): the files payload itself starts with its own `[u32 inner json length][inner header JSON]` prefix (built by the server's `build_files_payload` / `empty_files_payload_prefix`), followed by `[64-byte digest][u32 size][1-byte exec][content]` frames and a 64-zero-byte end marker.

`writeCafsFiles` in `worker/src/start.ts` is correct to read `jsonLen = payload.readUInt32BE(0)` and start frames at `offset = 4 + jsonLen` — this skips the inner header. The same two-level structure is mirrored in the Rust reference client (`parse_inline_response` + `write_files_payload`). Do not fla...

Applied to files:

  • pacquet/crates/tarball/src/lib.rs
📚 Learning: 2026-05-25T14:58:11.105Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11931
File: pacquet/crates/resolving-npm-resolver/src/create_npm_resolution_verifier.rs:560-589
Timestamp: 2026-05-25T14:58:11.105Z
Learning: In `pacquet/crates/resolving-npm-resolver/src/create_npm_resolution_verifier.rs`, all per-`(registry, name[, version])` caches in `NpmResolutionVerifier` (`published_at`, `full_meta`, `full_meta_for_trust`, `abbreviated_meta`, `local_meta`) intentionally use the same pattern: lock → miss-check → release lock → await fetch/load → re-acquire lock → insert. This uniform pattern is deliberate; do not flag individual caches for using it. The known follow-up improvement (replacing the pattern with `tokio::sync::OnceCell` per key inside a `Mutex<HashMap<…>>`) is tracked as a future structural change to cover all five caches simultaneously.

Applied to files:

  • pacquet/crates/tarball/src/lib.rs
📚 Learning: 2026-05-23T09:14:43.635Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11867
File: pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs:726-730
Timestamp: 2026-05-23T09:14:43.635Z
Learning: In `pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs`, the fresh-lockfile path intentionally does not invoke `BuildModules` and discards `side_effects_maps_by_snapshot` from `CreateVirtualStoreOutput`. This is pre-existing, documented behavior (mirroring upstream `link.ts:167-170`): `importing_done` fires once extraction and symlink linking are complete, and the fresh-lockfile path does not run lifecycle scripts. The frozen-lockfile path wires `BuildModules` end-to-end as normal. Do not flag this omission as a bug; wiring lifecycle scripts into the fresh-lockfile path is tracked as future work separate from perf refactors.

Applied to files:

  • pacquet/crates/tarball/src/lib.rs
📚 Learning: 2026-05-20T21:18:55.266Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11778
File: pacquet/crates/resolving-local-resolver/src/parse_bare_specifier.rs:253-278
Timestamp: 2026-05-20T21:18:55.266Z
Learning: In `pacquet/crates/resolving-local-resolver/src/parse_bare_specifier.rs`, the `resolve_path` function intentionally short-circuits absolute specifiers verbatim (returns them unchanged without normalizing `..` components), mirroring the upstream TypeScript `resolvePath` in `resolving/local-resolver/src/parseBareSpecifier.ts` at ef87f3ccff. The OS resolves `..` at `fs.read` time. Do not suggest normalizing the absolute branch — it would invent behavior pnpm doesn't have, violating the pacquet AGENTS.md cardinal rule of fidelity to upstream.

Applied to files:

  • pacquet/crates/tarball/src/lib.rs
📚 Learning: 2026-05-20T19:40:55.051Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11774
File: pacquet/crates/resolving-deps-resolver/src/resolve_peers.rs:0-0
Timestamp: 2026-05-20T19:40:55.051Z
Learning: In the pacquet Rust code, ensure the semver implementation uses the `node-semver` crate (not `nodejs-semver`). `node-semver`’s public API does not include a `satisfies_with_prerelease`-style method; prerelease-tolerant matching should be implemented inline by first calling `Range::satisfies`, and when it rejects a prerelease version, retry matching against a stripped `MAJOR.MINOR.PATCH` base of the prerelease version.

Applied to files:

  • pacquet/crates/tarball/src/lib.rs
📚 Learning: 2026-05-22T00:08:44.646Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11837
File: pacquet/crates/resolving-npm-resolver/src/pick_package.rs:33-51
Timestamp: 2026-05-22T00:08:44.646Z
Learning: In the pnpm/pnpm repo’s pacquet Rust crates, do not flag Unicode ellipsis characters (U+2026, `…`) in Rust doc comments (`///` / `/** */`) as a lint violation. The pacquet crate’s `dylint.toml` only enables `perfectionist::derive_ordering`, and the Dylint `unicode-ellipsis` rule is not enabled for this project—so `…` in doc comments is an intentional, repo-consistent style.

Applied to files:

  • pacquet/crates/tarball/src/lib.rs
📚 Learning: 2026-05-20T23:07:58.444Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11784
File: pacquet/crates/resolving-deps-resolver/src/hoist_peers.rs:120-133
Timestamp: 2026-05-20T23:07:58.444Z
Learning: When reviewing code in this pacquet Rust port, follow the upstream pnpm compatibility rule: only match pnpm’s behavior exactly. Do not propose review changes that intentionally deviate from pnpm’s documented/observed behavior, even if pnpm appears buggy. If you identify a real bug in pnpm behavior, the review should prioritize fixing it upstream in pnpm first, and avoid implementing a pnpm-behavior workaround here unless the same fix has already landed upstream.

Applied to files:

  • pacquet/crates/tarball/src/lib.rs
🔇 Additional comments (1)
pacquet/crates/tarball/src/lib.rs (1)

438-474: LGTM!

Also applies to: 524-662, 682-703

@github-actions

github-actions Bot commented Jun 6, 2026

Contributor

Integrated-Benchmark Report (Linux)

Each scenario has pacquet rows (direct install) and pnpr rows (the same client through the pnpr install accelerator), so pnpr@HEAD vs pacquet@HEAD is the pnpr-vs-direct ratio. Cold-store scenarios wipe the client store between runs (warm server); hot-store scenarios keep it warm. The pacquet@HEAD rows feed the pacquet Bencher testbed; the pnpr@HEAD rows feed the pnpr testbed.

Scenario: Isolated linker: fresh restore, cold cache + cold store

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| pacquet@HEAD | 10.153 ± 0.102 | 10.032 | 10.331 | 2.01 ± 0.03 |
| pacquet@main | 10.022 ± 0.124 | 9.906 | 10.223 | 1.99 ± 0.03 |
| pnpr@HEAD | 5.163 ± 0.084 | 5.099 | 5.348 | 1.02 ± 0.02 |
| pnpr@main | 5.049 ± 0.053 | 5.001 | 5.172 | 1.00 |
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 10.15305668296,
      "stddev": 0.10182340400711702,
      "median": 10.11216729186,
      "user": 3.6059113799999993,
      "system": 4.395553719999999,
      "min": 10.03211567836,
      "max": 10.33094130036,
      "times": [
        10.19475652136,
        10.07486729836,
        10.32629012236,
        10.33094130036,
        10.14884015836,
        10.11019093836,
        10.09677898936,
        10.10164217736,
        10.11414364536,
        10.03211567836
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 10.02193943276,
      "stddev": 0.12395331725879401,
      "median": 9.978084820860001,
      "user": 3.36657918,
      "system": 4.267394819999999,
      "min": 9.90642244536,
      "max": 10.22277175536,
      "times": [
        9.92146659636,
        10.22277175536,
        9.95855224736,
        9.99836254936,
        9.90642244536,
        9.93528379936,
        9.92017655236,
        10.14848433336,
        9.99761739436,
        10.21025665436
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 5.1626692061599995,
      "stddev": 0.08371392044020967,
      "median": 5.12696918386,
      "user": 2.73484238,
      "system": 4.042505220000001,
      "min": 5.09856722536,
      "max": 5.3479248433599995,
      "times": [
        5.13400173036,
        5.28622033836,
        5.15013158036,
        5.3479248433599995,
        5.11883498236,
        5.09856722536,
        5.11455629636,
        5.1259512663599995,
        5.12798710136,
        5.12251669736
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 5.04852455836,
      "stddev": 0.05253407501085945,
      "median": 5.03217273486,
      "user": 2.4778984799999995,
      "system": 3.8919188199999994,
      "min": 5.00134041536,
      "max": 5.17182077936,
      "times": [
        5.03149747536,
        5.00792122836,
        5.00615765436,
        5.01710413236,
        5.03284799436,
        5.05052028636,
        5.08849977636,
        5.17182077936,
        5.07753584136,
        5.00134041536
      ]
    }
  ]
}
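
The Relative column in the table for this scenario can be reproduced from the `mean` and `stddev` fields in the JSON. The sketch below assumes standard first-order error propagation for a ratio of two independent measurements; the report does not state how the benchmarking tool computes the column, and `relative` is a hypothetical helper name.

```rust
/// Ratio of a mean to a reference mean, with first-order error propagation
/// that treats the two measurements as uncorrelated.
fn relative(mean: f64, stddev: f64, ref_mean: f64, ref_stddev: f64) -> (f64, f64) {
    let ratio = mean / ref_mean;
    let rel_err = ((stddev / mean).powi(2) + (ref_stddev / ref_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}
```

With pacquet@HEAD (mean 10.15305668296, stddev 0.10182340400711702) measured against pnpr@main (mean 5.04852455836, stddev 0.05253407501085945), this gives roughly 2.011 ± 0.029, which rounds to the 2.01 ± 0.03 shown for that row.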

Scenario: Isolated linker: fresh restore, hot cache + hot store

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- |
| pacquet@HEAD | 672.7 ± 16.0 | 649.3 | 692.7 | 1.00 |
| pacquet@main | 679.0 ± 29.6 | 649.7 | 737.5 | 1.01 ± 0.05 |
| pnpr@HEAD | 803.5 ± 76.6 | 745.8 | 999.6 | 1.19 ± 0.12 |
| pnpr@main | 786.5 ± 70.3 | 742.2 | 920.1 | 1.17 ± 0.11 |
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 0.6727161420200002,
      "stddev": 0.01600701583256559,
      "median": 0.6713466896200001,
      "user": 0.38210624000000004,
      "system": 1.31715764,
      "min": 0.6493276551200001,
      "max": 0.6927292181200001,
      "times": [
        0.6896765321200001,
        0.6927292181200001,
        0.6781179801200001,
        0.68320303412,
        0.66361954712,
        0.68976473512,
        0.6645753991200001,
        0.6493276551200001,
        0.6636945151200001,
        0.6524528041200001
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 0.6790251945200001,
      "stddev": 0.029567044382201157,
      "median": 0.66924675062,
      "user": 0.37753994,
      "system": 1.31800454,
      "min": 0.6497485161200001,
      "max": 0.7374711111200001,
      "times": [
        0.7374711111200001,
        0.6497485161200001,
        0.7266614651200001,
        0.6787752181200001,
        0.6704182911200001,
        0.6541672781200001,
        0.6630080061200001,
        0.67893630512,
        0.66807521012,
        0.66299054412
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 0.8034518379200002,
      "stddev": 0.07656192707562987,
      "median": 0.7692472856200001,
      "user": 0.39216274,
      "system": 1.3236017399999997,
      "min": 0.7457819901200001,
      "max": 0.9996492901200001,
      "times": [
        0.8541020881200001,
        0.7617983531200001,
        0.7659824901200001,
        0.9996492901200001,
        0.8224780311200001,
        0.76437665212,
        0.7725120811200001,
        0.7457819901200001,
        0.7918260581200001,
        0.7560113451200001
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 0.78653043812,
      "stddev": 0.07034894497147766,
      "median": 0.7551890911200001,
      "user": 0.37835064000000007,
      "system": 1.32344304,
      "min": 0.74216280312,
      "max": 0.9201077241200001,
      "times": [
        0.9201077241200001,
        0.7512044571200001,
        0.7676084561200001,
        0.7460703381200001,
        0.9184218371200001,
        0.7605075181200001,
        0.7499792891200001,
        0.74216280312,
        0.75917372512,
        0.75006823312
      ]
    }
  ]
}

Scenario: Isolated linker: fresh install, cold cache + cold store

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| pacquet@HEAD | 5.340 ± 0.034 | 5.293 | 5.396 | 2.66 ± 0.08 |
| pacquet@main | 5.314 ± 0.040 | 5.262 | 5.400 | 2.65 ± 0.08 |
| pnpr@HEAD | 2.018 ± 0.027 | 1.974 | 2.058 | 1.01 ± 0.03 |
| pnpr@main | 2.006 ± 0.056 | 1.957 | 2.122 | 1.00 |
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 5.3403243291,
      "stddev": 0.03418814181014108,
      "median": 5.3357790312999995,
      "user": 3.7151861399999992,
      "system": 3.3741702600000005,
      "min": 5.2926780143,
      "max": 5.3963232093,
      "times": [
        5.3497050703,
        5.3054369653,
        5.3781563753,
        5.2926780143,
        5.3727343723,
        5.3301329333,
        5.3401711823,
        5.3065182883,
        5.3313868803,
        5.3963232093
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 5.314160601999999,
      "stddev": 0.040144265262532695,
      "median": 5.308580321300001,
      "user": 3.6935934399999995,
      "system": 3.29832976,
      "min": 5.2619899803,
      "max": 5.3998589333,
      "times": [
        5.2619899803,
        5.3118421733000005,
        5.3290790373,
        5.3324252763,
        5.3998589333,
        5.3053184693,
        5.3453233363,
        5.2746111173,
        5.2992647023,
        5.2818929943
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 2.0183562845000003,
      "stddev": 0.02740664470225432,
      "median": 2.0191614538,
      "user": 2.53259494,
      "system": 3.3599185599999997,
      "min": 1.9735928763000001,
      "max": 2.0579608283,
      "times": [
        2.0335341213,
        1.9735928763000001,
        2.0065524863,
        2.0450049773,
        2.0115940643,
        2.0010406473,
        2.0425031853,
        2.0579608283,
        1.9850508153000002,
        2.0267288433
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 2.0055636841,
      "stddev": 0.0559111107042792,
      "median": 1.9881875273,
      "user": 2.44694124,
      "system": 3.19423916,
      "min": 1.9567277953000002,
      "max": 2.1215397503,
      "times": [
        1.9567277953000002,
        2.1215397503,
        1.9672740953,
        2.0278262913000002,
        1.9598114903000001,
        1.9961370703,
        1.9802379843000002,
        1.9642380293000001,
        2.0815767653,
        2.0002675693
      ]
    }
  ]
}

Scenario: Isolated linker: fresh install, hot cache + hot store

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| --- | --- | --- | --- | --- |
| pacquet@HEAD | 1.404 ± 0.023 | 1.352 | 1.432 | 2.11 ± 0.12 |
| pacquet@main | 1.397 ± 0.049 | 1.346 | 1.521 | 2.10 ± 0.14 |
| pnpr@HEAD | 0.664 ± 0.037 | 0.636 | 0.766 | 1.00 |
| pnpr@main | 0.667 ± 0.058 | 0.633 | 0.828 | 1.00 ± 0.10 |
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 1.40421160456,
      "stddev": 0.023036377926355215,
      "median": 1.40735798036,
      "user": 1.5628231599999998,
      "system": 1.7630129999999997,
      "min": 1.35198998636,
      "max": 1.43166520336,
      "times": [
        1.43166520336,
        1.41897878736,
        1.4043142013599998,
        1.40172526436,
        1.40300474336,
        1.41040175936,
        1.4106411723599999,
        1.42693961736,
        1.38245531036,
        1.35198998636
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 1.3972593978599999,
      "stddev": 0.04890659554488921,
      "median": 1.39307946186,
      "user": 1.5455735599999998,
      "system": 1.7752575,
      "min": 1.34595938736,
      "max": 1.52126646336,
      "times": [
        1.35852936036,
        1.36045741736,
        1.52126646336,
        1.34595938736,
        1.37987386836,
        1.40846910136,
        1.40604421236,
        1.40583524436,
        1.38811584836,
        1.39804307536
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 0.6641498885600001,
      "stddev": 0.037137718026627624,
      "median": 0.6513343608600001,
      "user": 0.32664326,
      "system": 1.2511532,
      "min": 0.6360337823600001,
      "max": 0.7661683243600002,
      "times": [
        0.67004396736,
        0.6665771433600001,
        0.6502201043600001,
        0.6360337823600001,
        0.7661683243600002,
        0.6481242333600001,
        0.6510423363600001,
        0.6463756023600001,
        0.6516263853600001,
        0.6552870063600001
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 0.6670260052600001,
      "stddev": 0.05815157499711235,
      "median": 0.64638563386,
      "user": 0.32426736,
      "system": 1.2486983999999999,
      "min": 0.6329316713600001,
      "max": 0.8283806463600001,
      "times": [
        0.6482891353600001,
        0.6329316713600001,
        0.6424540753600001,
        0.6396802453600001,
        0.6444821323600001,
        0.65022772636,
        0.8283806463600001,
        0.67750801936,
        0.6643083833600001,
        0.6419980173600001
      ]
    }
  ]
}

Scenario: Isolated linker: fresh install, cold cache + hot store

Resolution-only: cold packument cache (full re-resolve over the registry link) with a hot store (no tarball download), so this isolates pnpr offloading the client resolution to its warm server.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pacquet@HEAD` | 4.940 ± 0.021 | 4.915 | 4.994 | 7.48 ± 0.18 |
| `pacquet@main` | 4.983 ± 0.040 | 4.934 | 5.045 | 7.54 ± 0.19 |
| `pnpr@HEAD` | 0.660 ± 0.016 | 0.638 | 0.696 | 1.00 |
| `pnpr@main` | 0.667 ± 0.041 | 0.631 | 0.761 | 1.01 ± 0.07 |

BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 4.9404086301,
      "stddev": 0.021164435910854266,
      "median": 4.9369614887,
      "user": 1.6711996,
      "system": 1.8634854399999998,
      "min": 4.9146140527,
      "max": 4.9943750867,
      "times": [
        4.9407150937,
        4.9358562297,
        4.9220174177,
        4.9380667477,
        4.9462401447,
        4.9417158127,
        4.9351079657,
        4.9943750867,
        4.9353777497,
        4.9146140527
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 4.982855574099999,
      "stddev": 0.0396661248571493,
      "median": 4.9693784802,
      "user": 1.7421063,
      "system": 1.89191494,
      "min": 4.9342591927,
      "max": 5.0451760377,
      "times": [
        4.9512037437,
        4.9478029107,
        4.9820144177,
        5.0041191487,
        4.9567425427,
        5.0314896177,
        4.9342591927,
        4.9552071177,
        5.0205410117,
        5.0451760377
      ]
    },
    {
      "command": "pnpr@HEAD",
      "mean": 0.6604425195,
      "stddev": 0.01576705755338339,
      "median": 0.6605498462000001,
      "user": 0.3320259,
      "system": 1.2606485399999998,
      "min": 0.6378699937,
      "max": 0.6960283617,
      "times": [
        0.6378699937,
        0.6463218087,
        0.6593177907000001,
        0.6711741007,
        0.6639390847000001,
        0.6617819017000001,
        0.6515148397,
        0.6536415857,
        0.6960283617,
        0.6628357277
      ]
    },
    {
      "command": "pnpr@main",
      "mean": 0.6665280901999999,
      "stddev": 0.04129406951800342,
      "median": 0.6549629047000001,
      "user": 0.3220881,
      "system": 1.2585762399999998,
      "min": 0.6312259277000001,
      "max": 0.7609289407000001,
      "times": [
        0.6398787587,
        0.6536712627,
        0.6383717657,
        0.6381312247,
        0.6312259277000001,
        0.7609289407000001,
        0.7174773717,
        0.6664194087,
        0.6629216947000001,
        0.6562545467
      ]
    }
  ]
}
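
The Relative column in the table above can be reproduced from the `mean` and `stddev` fields of the JSON report. A minimal sketch, assuming the column is each mean divided by the fastest mean, with the uncertainty obtained by first-order error propagation for a quotient (this matches the reported rows, but the formula is inferred here, and the function name `relative` is hypothetical):

```rust
/// Ratio of a command's mean to the reference (fastest) mean, with the
/// uncertainty propagated from both standard deviations.
/// Assumption: first-order propagation for a quotient of independent values.
fn relative(mean: f64, stddev: f64, ref_mean: f64, ref_stddev: f64) -> (f64, f64) {
    let ratio = mean / ref_mean;
    let rel_err = ((stddev / mean).powi(2) + (ref_stddev / ref_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}

fn main() {
    // pacquet@HEAD against pnpr@HEAD, values copied from the JSON report above.
    let (ratio, err) = relative(
        4.9404086301,
        0.021164435910854266,
        0.6604425195,
        0.01576705755338339,
    );
    println!("{ratio:.2} ± {err:.2}"); // prints "7.48 ± 0.18"
}
```

Because the reference is the fastest command, the large relative error of the ratio comes almost entirely from the pnpr run, whose standard deviation is a bigger share of its mean.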

@github-actions

github-actions Bot commented Jun 6, 2026

Contributor

🐰 Bencher Report

Branch: pr/12247
Testbed: pacquet

🚨 2 Alerts

| Benchmark | Measure (units) | Benchmark Result (Result Δ%) | Upper Boundary (Limit %) |
|:---|:---|:---|:---|
| isolated-linker.fresh-install.cold-cache.cold-store | Latency, seconds (s) | 🚨 5.34 s (+132.25%), Baseline: 2.30 s | 2.76 s (193.54%) |
| isolated-linker.fresh-restore.cold-cache.cold-store | Latency, seconds (s) | 🚨 10.15 s (+126.60%), Baseline: 4.48 s | 5.38 s (188.83%) |

All benchmark results (latency in milliseconds):

| Benchmark | Benchmark Result (Result Δ%) | Upper Boundary (Limit %) |
|:---|:---|:---|
| isolated-linker.fresh-install.cold-cache.cold-store | 🚨 5,340.32 ms (+132.25%), Baseline: 2,299.43 ms | 2,759.32 ms (193.54%) |
| isolated-linker.fresh-install.cold-cache.hot-store | 4,940.41 ms | |
| isolated-linker.fresh-install.hot-cache.hot-store | 1,404.21 ms (+5.63%), Baseline: 1,329.33 ms | 1,595.19 ms (88.03%) |
| isolated-linker.fresh-restore.cold-cache.cold-store | 🚨 10,153.06 ms (+126.60%), Baseline: 4,480.61 ms | 5,376.73 ms (188.83%) |
| isolated-linker.fresh-restore.hot-cache.hot-store | 672.72 ms (-1.08%), Baseline: 680.04 ms | 816.05 ms (82.44%) |

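
The percentages in the Bencher report above can be checked against the absolute values. A minimal sketch, assuming Result Δ% is the change relative to the baseline and Limit % is the result expressed as a share of the upper boundary (both readings are inferred from the reported numbers, and the function names are hypothetical):

```rust
/// Change of `result` relative to `baseline`, in percent (assumed meaning of "Result Δ%").
fn delta_percent(result: f64, baseline: f64) -> f64 {
    (result - baseline) / baseline * 100.0
}

/// `result` as a percentage of the upper boundary (assumed meaning of "Limit %").
/// A value above 100 means the boundary was exceeded.
fn limit_percent(result: f64, upper_boundary: f64) -> f64 {
    result / upper_boundary * 100.0
}

fn main() {
    // fresh-install.cold-cache.cold-store, values in milliseconds from the report.
    let (result, baseline, upper) = (5340.32, 2299.43, 2759.32);
    println!(
        "{:+.2}% {:.2}%",
        delta_percent(result, baseline),
        limit_percent(result, upper)
    ); // prints "+132.25% 193.54%"
}
```

Under this reading, the two alerted benchmarks are the only rows whose Limit % is above 100.
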
🐰 View full continuous benchmarking report in Bencher

@zkochan zkochan merged commit bea64b2 into main Jun 6, 2026
27 of 28 checks passed
@zkochan zkochan deleted the perf-parallel-tarball-extraction branch June 6, 2026 17:55
zkochan added a commit that referenced this pull request Jun 6, 2026
)

The per-file CAS-write parallelism added in #12247 ran on rayon's global
pool. But the install pipeline overlaps tarball extraction with linking each
resolved package into `node_modules`, and the linker drives its per-package
work through `rayon::join` / `par_iter` on that same global pool. When a
batch of downloads finished at once (hundreds of tarballs entering extraction
together), the extraction work queued ahead of the linker's jobs and stalled
linking for seconds.

Aligning the download/extract trace with the `imported` progress events on a
~1300-package fresh install showed the linker dropping to zero completions
for ~1s right as an extraction surge landed, then grinding the rest out
afterward — extraction had gotten faster, but it stuttered the concurrent
linker, so the net win on the pipeline was lost.

Route the parallel CAS writes through a dedicated rayon pool (sized to the
core count; the work is CPU-bound SHA-512 + CAFS write) so an extraction
burst can't monopolize the global pool the linker uses. The two phases now
run concurrently without one starving the other: on the same fixture the
linker no longer stalls (continuous completions through the extraction
window) and the big-package extraction tail stays parallelized. Falls back
to the global pool if the dedicated pool can't be built.
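
The isolation idea in this commit message can be sketched with the standard library alone. This is not the rayon-based code from the commit: it is a std-only stand-in in which a small fixed-size worker pool is reserved for extraction, so that a burst of file writes queues on its own threads instead of on a pool shared with other work. `ExtractPool`, `map_ordered`, `write_files`, and the FNV-1a `write_cas_file` stub are hypothetical names for illustration, and the fallback here is a serial loop rather than rayon's global pool.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Fixed-size worker pool used only for extraction work (hypothetical type).
struct ExtractPool {
    jobs: Sender<Job>,
}

impl ExtractPool {
    /// Spawns the workers; returns an error if the OS refuses a thread.
    fn new(threads: usize) -> std::io::Result<Self> {
        let (jobs, inbox) = channel::<Job>();
        let inbox = Arc::new(Mutex::new(inbox));
        for i in 0..threads.max(1) {
            let inbox = Arc::clone(&inbox);
            thread::Builder::new()
                .name(format!("cas-write-{i}"))
                .spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the job itself runs without holding the lock.
                    let next = inbox.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        Err(_) => break, // pool handle dropped, no more work
                    }
                })?;
        }
        Ok(Self { jobs })
    }

    /// Runs `f` over `items` on the pool and returns results in input order.
    fn map_ordered<T, R>(&self, items: Vec<T>, f: fn(T) -> R) -> Vec<R>
    where
        T: Send + 'static,
        R: Send + 'static,
    {
        let (done, results) = channel();
        let count = items.len();
        for (index, item) in items.into_iter().enumerate() {
            let done = done.clone();
            let job: Job = Box::new(move || {
                let _ = done.send((index, f(item)));
            });
            self.jobs.send(job).expect("workers outlive the pool handle");
        }
        drop(done); // the results channel closes once every job has reported
        let mut slots: Vec<Option<R>> = (0..count).map(|_| None).collect();
        for (index, value) in results {
            slots[index] = Some(value);
        }
        slots
            .into_iter()
            .map(|slot| slot.expect("every job reports a result"))
            .collect()
    }
}

/// Stand-in for hashing and writing one file into the store. A cheap FNV-1a
/// hash is used here; the commit describes SHA-512 plus a CAFS write.
fn write_cas_file(payload: Vec<u8>) -> u64 {
    payload
        .iter()
        .fold(0xcbf29ce484222325_u64, |h, b| (h ^ u64::from(*b)).wrapping_mul(0x100000001b3))
}

/// Uses the dedicated pool when it exists, otherwise falls back to a serial loop.
fn write_files(pool: Option<&ExtractPool>, files: Vec<Vec<u8>>) -> Vec<u64> {
    match pool {
        Some(pool) => pool.map_ordered(files, write_cas_file),
        None => files.into_iter().map(write_cas_file).collect(),
    }
}

fn main() {
    // Sized to the core count, as the commit message describes.
    let threads = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let pool = ExtractPool::new(threads).ok();
    let files: Vec<Vec<u8>> = (0..100u8).map(|i| vec![i; 1024]).collect();
    let parallel = write_files(pool.as_ref(), files.clone());
    let serial = write_files(None, files);
    assert_eq!(parallel, serial); // same results, same order
    println!("wrote {} files", parallel.len());
}
```

Results are slotted back by index, so the output order matches the input order regardless of which worker finishes first. That ordering is what keeps last-entry-wins behavior for duplicate paths intact.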

---
Written by an agent (Claude Code, claude-opus-4-8).