This repository was archived by the owner on May 14, 2026. It is now read-only.

perf: port pnpm v11 HTTP/fetch topology and worker-pool sizing #280

Description

@zkochan

Follow-up to investigations/pacquet-macos-perf.md after #276, #277, and #279 landed. Fresh comparison against pnpm/pnpm v11's deps-restorer + network/fetch layer surfaced several concrete deltas where pnpm's current code is doing something deliberate that pacquet isn't. Collected here so we can pick them off as individual PRs.

Pacquet is now ~2× faster than pnpm on the Linux cold-install FrozenLockfile benchmark (2.9 s vs 6.4 s), but the macOS investigation doc originally flagged pacquet as ~2× slower than pnpm on macOS. Most items below should help narrow the macOS gap specifically: each is either a network/fetch choice pnpm backs with its own benchmarks or changelog, or a CPU/IO tuning where pacquet's defaults diverge from upstream.

Findings

1. HTTP/2 is silently enabled in pacquet; pnpm deliberately leaves it off

Upstream: network/fetch/src/dispatcher.ts:17-22 (pnpm v11):

Note: we intentionally do NOT enable HTTP/2 (allowH2) or HTTP/1.1 pipelining here. With HTTP/2, undici multiplexes many streams over 1-2 TCP connections sharing a single congestion window. In benchmarks this was slower than opening ~50 independent HTTP/1.1 connections that each get their own congestion window and can saturate bandwidth in parallel.

Pacquet: crates/network/src/lib.rs:50-56 builds a default reqwest::Client which negotiates HTTP/2 via ALPN whenever the registry advertises it (registry.npmjs.org does). We're getting the exact topology pnpm measured as slower.

Fix: call .http1_only() on Client::builder().

2. Concurrent connection cap is too low vs pnpm

Upstream: network/fetch/src/dispatcher.ts:12, 23-24:

const DEFAULT_MAX_SOCKETS = 50
setGlobalDispatcher(new Agent({ connections: DEFAULT_MAX_SOCKETS, ... }))

Pacquet: crates/network/src/lib.rs:66-68:

const MIN_PERMITS: usize = 16;
let semaphore = num_cpus::get().max(MIN_PERMITS).pipe(Semaphore::new);

Any machine with 16 or fewer logical CPUs gets the 16-permit floor, so on a 4-core GHA runner and a 10-core M3 alike pacquet has roughly 1/3 of pnpm's concurrent-fetch budget (16 vs 50). Cold installs are network-bound, so under-subscription directly stretches wall time.

Fix: raise the floor to match pnpm's DEFAULT_MAX_SOCKETS = 50. If we want to stay gentle on very small machines, keep a small num_cpus-based ceiling that only applies there; the common case should sit at 50.
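A minimal sketch of the current and proposed permit counts, in plain std Rust so it runs anywhere. current_permits mirrors the formula quoted above; proposed_permits and the SMALL_MACHINE_PERMITS_PER_CPU multiplier are hypothetical names showing one way to raise the floor to 50 while capping only very small machines. std::thread::available_parallelism stands in for num_cpus::get().

```rust
use std::thread::available_parallelism;

/// Floor used by pacquet today (crates/network/src/lib.rs).
const MIN_PERMITS: usize = 16;
/// pnpm v11's DEFAULT_MAX_SOCKETS (network/fetch/src/dispatcher.ts).
const DEFAULT_MAX_SOCKETS: usize = 50;
/// Hypothetical multiplier: only limits machines with fewer than 4 logical CPUs.
const SMALL_MACHINE_PERMITS_PER_CPU: usize = 16;

/// Today's formula: one permit per logical CPU, never fewer than 16.
fn current_permits(cpus: usize) -> usize {
    cpus.max(MIN_PERMITS)
}

/// Hypothetical replacement: floor raised to pnpm's 50, with a per-CPU
/// ceiling that only bites on very small machines.
fn proposed_permits(cpus: usize) -> usize {
    let raised_floor = cpus.max(DEFAULT_MAX_SOCKETS);
    let small_machine_ceiling = cpus
        .saturating_mul(SMALL_MACHINE_PERMITS_PER_CPU)
        .max(MIN_PERMITS);
    raised_floor.min(small_machine_ceiling)
}

fn main() {
    let here = available_parallelism().map(|n| n.get()).unwrap_or(1);
    for cpus in [1, 2, 4, 10, here] {
        println!(
            "{cpus:>3} CPUs: current {:>2} permits, proposed {:>2} permits",
            current_permits(cpus),
            proposed_permits(cpus)
        );
    }
}
```

With this shape a 4-core runner or a 10-core laptop moves from 16 to 50 permits, while machines with more than 50 logical CPUs keep one permit per CPU as they do today.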

3. Tarball buffer is grown via doubling instead of pre-allocated from Content-Length

Upstream: fetching/tarball-fetcher/src/remoteTarballFetcher.ts:148-164:

if (size !== null) {
  // Known size: pre-allocate and copy directly (avoids intermediate array + second copy pass)
  data = Buffer.from(new SharedArrayBuffer(size))
  for await (const chunk of res.body!) {
    data.set(chunk, downloaded)
    downloaded += chunk.byteLength
  }
}

Pnpm v11 CHANGELOG note: "Tarball downloads with known size now pre-allocate memory to avoid double-copy overhead."

Pacquet: crates/tarball/src/lib.rs:509 does response_head.bytes().await. reqwest/hyper internally grows a BytesMut by doubling when Content-Length isn't used to pre-size — multiple reallocs + copies per tarball × 1352 tarballs.

Fix: switch to bytes_stream(), check Content-Length on the response head, pre-allocate a BytesMut::with_capacity(len) when known, and copy chunks in sequentially. Also catches size-mismatch errors (pnpm's BadTarballError) that pacquet currently doesn't catch. Note: #278 tried async-compression streaming here and reverted because the forced flate2/miniz_oxide backend was slower than zune-inflate. This change is orthogonal — the decompressor still runs synchronously inside spawn_blocking on the buffered bytes.
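A std-only sketch of the pre-allocate-and-verify loop. It uses Vec<u8> in place of bytes::BytesMut and a plain iterator of chunks in place of reqwest's bytes_stream(), so collect_body and DownloadError::SizeMismatch (a stand-in for pnpm's BadTarballError) are hypothetical names, not pacquet or reqwest API. In the real fetcher the length would come from the response head, e.g. Response::content_length().

```rust
use std::fmt;

/// Hypothetical analogue of pnpm's BadTarballError.
#[derive(Debug, PartialEq)]
enum DownloadError {
    SizeMismatch { expected: usize, received: usize },
}

impl fmt::Display for DownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DownloadError::SizeMismatch { expected, received } => write!(
                f,
                "tarball size mismatch: Content-Length {expected}, received {received} bytes"
            ),
        }
    }
}

impl std::error::Error for DownloadError {}

/// Collects a body into one buffer. When `content_length` is known the buffer
/// is allocated once up front and every chunk is copied into it exactly once.
fn collect_body<I, C>(content_length: Option<u64>, chunks: I) -> Result<Vec<u8>, DownloadError>
where
    I: IntoIterator<Item = C>,
    C: AsRef<[u8]>,
{
    // A length that does not fit in usize is treated as unknown.
    let expected = content_length.and_then(|n| usize::try_from(n).ok());
    // Unknown length falls back to ordinary Vec growth.
    let mut buf = Vec::with_capacity(expected.unwrap_or(0));
    for chunk in chunks {
        let chunk = chunk.as_ref();
        if let Some(expected) = expected {
            // Fail fast on an over-long body instead of growing past the pre-allocation.
            let received = buf.len() + chunk.len();
            if received > expected {
                return Err(DownloadError::SizeMismatch { expected, received });
            }
        }
        buf.extend_from_slice(chunk);
    }
    // A short body (e.g. a truncated connection) is also a mismatch.
    if let Some(expected) = expected {
        if buf.len() != expected {
            return Err(DownloadError::SizeMismatch { expected, received: buf.len() });
        }
    }
    Ok(buf)
}

fn main() {
    let chunks = vec![b"hello ".to_vec(), b"world".to_vec()];
    match collect_body(Some(11), chunks) {
        Ok(body) => println!("{} bytes, capacity {}", body.len(), body.capacity()),
        Err(e) => eprintln!("{e}"),
    }
}
```

Only the collection step changes here; the finished buffer would still go to the synchronous decompressor unchanged.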

4. Post-download concurrency cap is too high for Apple Silicon

Upstream: worker/src/index.ts:71:

return Math.max(1, availableParallelism() - 1)

Pacquet: crates/tarball/src/lib.rs:38:

SEM.get_or_init(|| Semaphore::new(num_cpus::get().saturating_mul(2).max(4)))

On a 10-core M3 that's 20 concurrent post-download bodies; pnpm runs 9. Each body is CPU-bound (SHA-512 over the compressed tarball, gzip inflate, per-file SHA-512) with interleaved FS writes. Over-subscribing costs more on macOS than on Linux: context switches are slower, and with mixed P and E cores some tasks land on efficiency cores and stretch the tail.

The current value was chosen to keep a 2-CPU GHA runner from wedging mid-decompress (#269). num_cpus::get().saturating_sub(1).max(2) matches pnpm from 3 CPUs up, never drops below 2 permits, and still clears that 2-CPU floor.

Fix: change the formula, measure on Apple Silicon before/after, confirm the 2-CPU floor still holds.
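For a side-by-side view across core counts, a std-only sketch of the three formulas quoted above. The function names are hypothetical; num_cpus::get() and availableParallelism() are both modeled as a plain logical-CPU count.

```rust
/// pacquet today (crates/tarball/src/lib.rs): 2 permits per CPU, at least 4.
fn current_post_download_permits(cpus: usize) -> usize {
    cpus.saturating_mul(2).max(4)
}

/// pnpm v11 worker pool (worker/src/index.ts): Math.max(1, availableParallelism() - 1).
fn pnpm_worker_count(cpus: usize) -> usize {
    cpus.saturating_sub(1).max(1)
}

/// Proposed: pnpm's "leave one core free" rule, but never below 2 permits
/// so small runners keep the floor discussed for #269.
fn proposed_post_download_permits(cpus: usize) -> usize {
    cpus.saturating_sub(1).max(2)
}

fn main() {
    for cpus in [1, 2, 4, 10] {
        println!(
            "{cpus:>2} CPUs: current {:>2}, pnpm {:>2}, proposed {:>2}",
            current_post_download_permits(cpus),
            pnpm_worker_count(cpus),
            proposed_post_download_permits(cpus),
        );
    }
}
```

From 3 CPUs up the proposed formula equals pnpm's; it only differs on 1- and 2-CPU machines, where it keeps 2 permits instead of 1.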

5. (Lower-confidence) Software SHA-512 on Apple Silicon

Pnpm's SHA-512 goes through Node's crypto.hash → OpenSSL → ARMv8 FEAT_SHA512 hardware instructions.

Pacquet uses sha2 = "0.10.9" with no features. The asm feature pulls in sha2-asm, which historically targeted x86/x86_64 only; aarch64 SHA-512 hardware support in sha2 0.10 is inconsistent.

Fix: either (a) enable the asm feature and verify it activates hardware SHA-512 on aarch64, or (b) swap the per-file / per-tarball hashing to the ring crate, which is BoringSSL-derived and definitively exposes ARMv8 FEAT_SHA512. Blocked on a macOS profile run confirming SHA-512 is actually in the hot path; if it isn't, skip.
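Before either option, a quick std-only check of whether the CPU reports the SHA-512 instructions at runtime may be useful. This is a sketch resting on one assumption worth verifying: Rust's std feature detection reports FEAT_SHA512 under the aarch64 "sha3" feature name. It only shows the hardware is there; whether sha2 (with asm) or ring actually takes that path still needs the profile.

```rust
/// Assumption: on aarch64, std reports FEAT_SHA512 (together with FEAT_SHA3)
/// under the "sha3" feature name.
#[cfg(target_arch = "aarch64")]
fn cpu_has_sha512_instructions() -> bool {
    std::arch::is_aarch64_feature_detected!("sha3")
}

/// Other architectures: the ARMv8 SHA-512 instructions do not apply.
#[cfg(not(target_arch = "aarch64"))]
fn cpu_has_sha512_instructions() -> bool {
    false
}

fn main() {
    println!(
        "arch={} SHA-512 instructions available: {}",
        std::env::consts::ARCH,
        cpu_has_sha512_instructions()
    );
}
```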

Sequencing

Items 1–3 are small, directly translate pnpm's own benchmark-driven decisions, and compose cleanly as a single PR. Item 4 is a one-line change but should be measured before/after on Apple Silicon — the current value was chosen for a Linux CI failure, not a perf decision. Item 5 should wait for a macOS profile to confirm SHA-512 is worth the crate-swap work.

Metadata


Labels: performance (tasks that improve the overall performance of the project)
