This repository was archived by the owner on May 14, 2026. It is now read-only.
Summary
On a 1352-snapshot frozen-lockfile install, pacquet is ~12× slower on macOS than on Linux CI, even though the same machine runs pnpm ~3× faster than a GH runner does. The gap isn't CPU work; it's a storm of per-tarball StoreIndex::open calls on the write side of the index, paired with a tokio blocking pool left at its default cap of 512 threads.
Same root shape as #260 (which was the read side, fixed for reads by #261). The write path kept the old pattern.
Data
Integrated benchmark, frozen-lockfile, 1352-snapshot fixture (the one introduced by #262), same harness as CI:
| | CI (Ubuntu) | Local (M1 Pro, APFS) |
| --- | --- | --- |
| pacquet@HEAD wall | 1.095 ± 0.009 s | 13.235 ± 0.979 s |
| pacquet@HEAD user | 0.40 s | 0.62 s |
| pacquet@HEAD sys | 2.59 s | 20.87 s |
| pnpm wall | 2.550 ± 0.026 s | 0.930 ± 0.048 s |
| pnpm sys | 2.25 s | 0.33 s |
User time is similar; sys time is ~8× higher on macOS. pnpm doing the same install on the same disk has 0.33 s sys, so the filesystem is fine — this is pacquet-specific.
Forcing each package-import-method doesn't move the number, ruling out the CAS→node_modules linker as the culprit:
| method | wall | user (s) | sys (s) |
| --- | --- | --- | --- |
| auto | 12.40 s | 0.61 | 18.05 |
| hardlink | 13.03 s | 0.63 | 18.28 |
| copy | 14.70 s | 0.65 | 19.69 |
| clone | 12.27 s | 0.61 | 20.42 |
ps -M shows pacquet holds 534 threads from t+0.8 s through the whole install, most parked in kevent. That's tokio's blocking pool at its default cap of 512 + workers.
A 5 s sample confirms every tokio-rt-worker stack bottoms out in kevent — threads are spawned, do some work, and park.
Root cause
crates/tarball/src/lib.rs:427 spawns a blocking task per tarball to record the per-package row in the store index.
For 1352 tarballs that's 1352 × StoreIndex::open (crates/store-dir/src/store_index.rs:100-130), where each call does:
1. std::fs::create_dir_all(store_dir)
2. Connection::open("…/index.db") (sqlite open: stat, open, fstat, pread, fcntl, mmap, plus WAL/SHM sidecar handling for the first writer)
3. execute_batch of 7 PRAGMAs (busy_timeout=5000, journal_mode=WAL, synchronous=NORMAL, mmap_size=…, cache_size=-32000, temp_store=MEMORY, wal_autocheckpoint=10000) + CREATE TABLE IF NOT EXISTS
Then store_index.set inserts one row. The actual inserts serialize on SQLite's single-writer lock, with each blocked writer waiting under busy_timeout (the callsite comment acknowledges this), so the per-open setup cost is paid concurrently but the inserts run mostly one at a time. The callsite also explicitly notes "One StoreIndex per spawned task keeps the code lock-free"; that's the pattern this issue is asking to replace.
Why Linux CI doesn't see it
- On ext4, open/fstat/fcntl cost a fraction of what they cost on APFS per syscall.
- Linux pthreads + epoll handle 500 threads cheaply; XNU mach ports + kqueue charge more.
- SQLite open on ext4 ≈ 0.5 ms; on this APFS ≈ 5–15 ms. 1352 × (5–15 ms) ≈ 7–20 s, a range that brackets the observed deltas (≈ 12 s wall, ≈ 18 s sys).
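The back-of-envelope estimate above can be reproduced with a few lines of Rust. This is only a sketch of the arithmetic: the function names are made up for illustration, and the per-open costs are the rough figures quoted in this issue, not fresh measurements.

```rust
/// Aggregate time spent in index opens, in seconds.
/// `opens` is the number of StoreIndex::open calls and `per_open_ms`
/// is the assumed cost of a single open in milliseconds.
fn total_open_cost_secs(opens: u32, per_open_ms: f64) -> f64 {
    f64::from(opens) * per_open_ms / 1000.0
}

/// Lower and upper bound of the aggregate open cost for a cost range.
fn open_cost_range_secs(opens: u32, low_ms: f64, high_ms: f64) -> (f64, f64) {
    (
        total_open_cost_secs(opens, low_ms),
        total_open_cost_secs(opens, high_ms),
    )
}
```

Calling open_cost_range_secs(1352, 5.0, 15.0) gives the APFS range, and total_open_cost_secs(1352, 0.5) gives the ext4 figure (well under a second). Because the opens run concurrently, this is aggregate cost across threads, so it is better compared against sys time than against wall time.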
Why #261 didn't help this path
#261 added StoreIndex::shared_readonly_in for the cache-lookup pass (1352 reads → 1 open). The write path still opens a fresh writable connection per tarball.
Proposed fixes (ordered by expected impact)
1. Share a single writable StoreIndex across the install. Mirror the pattern from #261 (perf(store-dir): share one read-only StoreIndex across cache lookups) for writes: open once, wrap in Arc<Mutex<StoreIndex>> (or run a single writer task fed by an mpsc::channel), and thread it through download_and_extract_tarball. Collapses 1352 opens to 1. Biggest single win, smallest diff.
2. Batch inserts in a transaction. Each standalone insert is its own implicit transaction and pays its own commit. Wrapping the per-package inserts in a single BEGIN IMMEDIATE; … COMMIT; (or small batches) replaces 1352 independent commits with one per batch. With journal_mode=WAL and synchronous=NORMAL, SQLite normally syncs the WAL at checkpoints rather than on every commit, so the saving is probably per-commit locking and WAL-append overhead more than fsyncs; either way it is another hidden APFS amplifier.
3. Cap tokio's blocking pool. Set tokio::runtime::Builder::max_blocking_threads(N) with N ≈ 2–4 × CPU count. 534 threads is pathological on macOS regardless of workload; this alone won't close the gap, but it's cheap insurance.
(1) alone should get local macOS down to a comparable factor over CI (≈ 2–3×, matching the pnpm ratio). (2) and (3) are additive polish.
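To make the shape of fixes (1) and (2) concrete, here is a minimal std-only sketch of the single-writer variant: one thread owns the index, workers send rows over an mpsc channel, and the writer commits in batches. FakeIndex, spawn_index_writer and simulate_install are hypothetical names invented for this illustration; FakeIndex only counts opens and commits and stands in for the real StoreIndex, whose API is not reproduced here. A real implementation would presumably use a tokio task or spawn_blocking rather than raw threads.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for StoreIndex: it only counts opens and commits.
#[derive(Debug, Default)]
struct FakeIndex {
    opens: usize,
    commits: usize,
    rows: Vec<(String, String)>,
}

impl FakeIndex {
    /// Stands in for the expensive open (create_dir_all, sqlite open, PRAGMAs).
    fn open() -> Self {
        FakeIndex { opens: 1, ..Default::default() }
    }

    /// Stands in for one `BEGIN IMMEDIATE; <inserts>; COMMIT;` over a batch.
    /// Drains `batch`, leaving it empty for reuse.
    fn write_batch(&mut self, batch: &mut Vec<(String, String)>) {
        if batch.is_empty() {
            return;
        }
        self.rows.append(batch);
        self.commits += 1;
    }
}

/// Spawns the single writer thread. The index is opened exactly once, inside
/// the thread, and handed back through the join handle when all senders are gone.
fn spawn_index_writer(
    batch_size: usize,
) -> (mpsc::Sender<(String, String)>, thread::JoinHandle<FakeIndex>) {
    let (tx, rx) = mpsc::channel::<(String, String)>();
    let handle = thread::spawn(move || {
        let mut index = FakeIndex::open();
        let mut batch = Vec::with_capacity(batch_size);
        // The loop ends once every Sender clone has been dropped.
        for row in rx {
            batch.push(row);
            if batch.len() >= batch_size {
                index.write_batch(&mut batch);
            }
        }
        index.write_batch(&mut batch); // flush the partial tail batch
        index
    });
    (tx, handle)
}

/// Simulates `workers` extraction threads, each recording `rows_per_worker` rows.
fn simulate_install(workers: usize, rows_per_worker: usize, batch_size: usize) -> FakeIndex {
    let (tx, writer) = spawn_index_writer(batch_size);
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..rows_per_worker {
                    let key = format!("pkg-{}-{}", w, i);
                    tx.send((key, String::from("files"))).expect("writer is alive");
                }
            })
        })
        .collect();
    drop(tx); // drop the original sender so the writer can finish
    for h in handles {
        h.join().expect("worker panicked");
    }
    writer.join().expect("writer panicked")
}
```

With simulate_install(4, 25, 32), the 100 rows cost one open and four commits (three full batches plus the tail), instead of 100 opens and 100 commits. The channel variant keeps the call sites lock-free, which is the property the current per-task design was after; the Arc<Mutex<StoreIndex>> variant is a smaller diff but puts a lock on the hot path.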
Repro
# bench env (uses the 1352-snapshot fixture from #262)
just integrated-benchmark --show-output --scenario=frozen-lockfile \
--verdaccio --with-pnpm HEAD main
Bench harness note: verify::ensure_git_repo (tasks/integrated-benchmark/src/verify.rs:17) asserts .git is a directory, which fails when running against a git worktree — pass -R <non-worktree-clone> or fix the harness to accept .git-file worktrees.
Fixture note: pnpm-workspace.yaml's allowBuilds list silences core-js/es5-ext but not fsevents, so pnpm exits 1 on macOS with ERR_PNPM_IGNORED_BUILDS: fsevents@1.2.13. Worth adding fsevents to that list so local runs don't wedge the bench.