Summary
In the cold-cache, warm-store case (rm -rf node_modules && pacquet install --frozen-lockfile), pacquet spends noticeably more wall time than pnpm. On a 1352-package lockfile, pacquet takes 17.4s vs 11.6s for pnpm (release build, zero network fetches on both sides).
Two per-snapshot costs in the cache-hit path explain most of the gap:
1. Fresh SQLite connection per snapshot (1352 connections)
load_cached_cas_paths at crates/tarball/src/lib.rs:111-158 calls StoreIndex::open_readonly_in(store_dir) unconditionally for every snapshot:
```rust
async fn load_cached_cas_paths(
    store_dir: &'static StoreDir,
    cache_key: String,
) -> Option<HashMap<String, PathBuf>> {
    tokio::task::spawn_blocking(move || -> Option<HashMap<String, PathBuf>> {
        let index = StoreIndex::open_readonly_in(store_dir).ok()?; // ← per snapshot
        let entry = index.get(&cache_key).ok()??;
        …
    })
    …
}
```
StoreIndex::open_readonly (crates/store-dir/src/store_index.rs:139-146) runs Connection::open_with_flags plus conn.busy_timeout(5s) on every call. With 1352 snapshots that is 1352 SQLite opens, with no sharing and no pool. The work runs under spawn_blocking, and the calls appear to serialize on tokio's blocking pool: observed CPU utilization stays stuck at ~135% with plenty of cores idle, versus ~509% on the warm rerun, which short-circuits before reaching this code.
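Fix: open one read-only StoreIndex (or a small pool) and reuse it across all snapshots. SQLite lets many read-only connections read concurrently, but a single rusqlite Connection is Send and not Sync, so sharing one handle across spawn_blocking tasks means putting it behind a Mutex; a small pool of read-only connections avoids funnelling every lookup through one lock. The handle can be owned by Npmrc-like static state (store_dir is already &'static StoreDir) that outlives every snapshot.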
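A minimal, std-only sketch of the "small pool" option, assuming nothing about pacquet's internals. HandlePool and with_handle are hypothetical names, not pacquet APIs; in pacquet, T would presumably be the read-only StoreIndex and the open closure would wrap StoreIndex::open_readonly_in(store_dir), with error handling this sketch leaves out. The point it shows: open runs exactly `size` times up front, and every later lookup borrows an existing handle.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Fixed-size pool of handles opened once up front. Each handle sits behind
/// its own Mutex, so a handle type that is Send but not Sync (such as a
/// SQLite connection) can be shared across blocking worker threads.
/// Hypothetical type for illustration only.
pub struct HandlePool<T> {
    handles: Vec<Mutex<T>>,
    next: AtomicUsize,
}

impl<T> HandlePool<T> {
    /// Calls `open` exactly `size` times; no further opens happen afterwards.
    pub fn new(size: usize, mut open: impl FnMut() -> T) -> Self {
        assert!(size > 0, "pool needs at least one handle");
        let handles = (0..size).map(|_| Mutex::new(open())).collect();
        HandlePool { handles, next: AtomicUsize::new(0) }
    }

    /// Runs `f` against one pooled handle, picked round-robin. Callers that
    /// land on the same slot wait on its mutex; the other slots stay usable.
    pub fn with_handle<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        let slot = self.next.fetch_add(1, Ordering::Relaxed) % self.handles.len();
        let mut guard = self.handles[slot].lock().expect("pool mutex poisoned");
        f(&mut *guard)
    }
}
```

Used from 8 threads doing 169 lookups each (1352 in total), a 4-slot pool performs 4 opens instead of 1352. Round-robin is the simplest slot choice; because a caller can block on a busy slot while another is free, a real version might try_lock across slots first, or instead keep one connection per blocking thread in a thread_local.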
2. Per-file symlink_metadata on every CAS blob
Once the row is decoded, the same function walks every file in the entry and stats the CAS blob:
```rust
// crates/tarball/src/lib.rs:136-150
if !path.symlink_metadata().is_ok_and(|m| {
    let file_type = m.file_type();
    !file_type.is_symlink() && file_type.is_file()
}) {
    tracing::debug!(…, "CAFS path missing or not a regular file; index entry is stale, re-fetching");
    return None;
}
```
For 1352 packages at ~50 files each, that is ~68k stat syscalls (1352 × 50 ≈ 67,600) on every install, just to pre-check existence that the hardlink pass would discover as ENOENT anyway.
What pnpm v11 does instead
Relevant code in ~/src/pnpm/pnpm/v11:
Fast path — verifyStoreIntegrity=false (not the default, but illustrative). buildFileMapsFromIndex in store/cafs/src/checkPkgFilesIntegrity.ts:81-119 just reads the index and builds the filename → CAFS path map with zero stat calls.
Default path — verifyStoreIntegrity=true. checkFile in checkPkgFilesIntegrity.ts:39-65:
```ts
function checkFile (filename, checkedAt) {
  try {
    const { mtimeMs, size } = fs.statSync(filename)
    return { isModified: (mtimeMs - (checkedAt ?? 0)) > 100, size }
  } catch (err) {
    if (err.code === 'ENOENT') return null
    throw err
  }
}
```
It stats once, and only re-runs the integrity hash if mtimeMs - checkedAt > 100ms:
> If a file was not edited, we are skipping integrity check. We assume that nobody will manually remove a file in the store and create a new one.
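For comparison, the same single-stat check might look like this in Rust (std only). check_file and FileCheck are hypothetical names, and taking checked_at as f64 milliseconds since the Unix epoch is an assumption; the real type of pacquet's CafsFileInfo::checked_at may differ. As in pnpm's version, fs::metadata follows symlinks like fs.statSync, a missing file yields None, and any other I/O error propagates.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Outcome of stat-ing one CAS blob, mirroring the object pnpm's checkFile returns.
#[derive(Debug, PartialEq)]
pub struct FileCheck {
    /// True when the blob's mtime is more than 100 ms later than checked_at,
    /// meaning its integrity hash should be recomputed.
    pub is_modified: bool,
    /// File size in bytes.
    pub size: u64,
}

/// Performs exactly one stat. Returns Ok(None) if the blob is missing.
/// ASSUMPTION: checked_at_ms is epoch milliseconds stored as f64.
pub fn check_file(path: &Path, checked_at_ms: Option<f64>) -> io::Result<Option<FileCheck>> {
    let meta = match fs::metadata(path) {
        Ok(meta) => meta,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(err) => return Err(err),
    };
    // Convert mtime to epoch milliseconds; treat a pre-epoch mtime as 0.
    let mtime_ms = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs_f64() * 1000.0)
        .unwrap_or(0.0);
    Ok(Some(FileCheck {
        // Same rule as pnpm: (mtimeMs - (checkedAt ?? 0)) > 100
        is_modified: mtime_ms - checked_at_ms.unwrap_or(0.0) > 100.0,
        size: meta.len(),
    }))
}
```

A file that was never checked (checked_at_ms = None) always counts as modified, while one checked after its last write skips the rehash; only modified files pay for hashing.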
Lazy ENOENT at hardlink time. fs/indexed-pkg-importer/src/index.ts linkOrCopy falls back on link failure rather than pre-checking every source:
```ts
function linkOrCopy (existingPath, newPath) {
  try { fs.linkSync(existingPath, newPath) }
  catch (err) { … resilientCopyFileSync(existingPath, newPath) }
}
```
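A sketch of the same lazy approach in Rust, std only; link_or_copy and is_stale_entry are hypothetical names. It falls back to copying on any link error, which may be broader than pnpm's condition (elided above). No source is stat-ed on the happy path; only after a failure does is_stale_entry check whether the CAS blob itself is gone (stale index entry, so re-fetch) rather than, say, the destination directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Try a hardlink first and fall back to a copy. There is no pre-check of
/// `src`: if the CAS blob is missing, the copy fails with NotFound and that
/// error is returned to the caller.
pub fn link_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::hard_link(src, dst) {
        Ok(()) => Ok(()),
        // e.g. cross-device link or links unsupported: copy instead.
        Err(_) => fs::copy(src, dst).map(|_| ()),
    }
}

/// Called only on the error path: true when the failure was caused by the
/// source blob being absent, i.e. the index entry is stale and the package
/// should be re-fetched instead of failing the install.
pub fn is_stale_entry(err: &io::Error, src: &Path) -> bool {
    err.kind() == io::ErrorKind::NotFound
        && matches!(fs::symlink_metadata(src), Err(e) if e.kind() == io::ErrorKind::NotFound)
}
```

This moves the stat from ~68k unconditional calls to at most one per failed import.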
Relevance to pacquet
pacquet already persists the checkedAt field in index.db rows (crates/store-dir/src/msgpackr_records.rs has CafsFileInfo::checked_at, with comments explicitly referencing pnpm's mtimeMs - (checkedAt ?? 0) semantics) — the data is there, the fast-path logic isn't.
Proposed fix
- Share a single read-only StoreIndex across the cache-lookup pass (biggest win, no behavioural change).
- Replace the per-file symlink_metadata pre-check with pnpm's model: either skip validation entirely and handle ENOENT lazily when hardlinking, or gate it behind a verify-store-integrity npmrc flag using the checked_at mtime comparison already wired into the row schema.
Measured on a 1352-package lockfile (release build, macOS APFS, warm CAS store) — see PR #258 conversation for the numbers.