This repository was archived by the owner on May 14, 2026. It is now read-only.

Cache lookup opens a fresh SQLite connection per snapshot; per-file stat storm on every install #260

@zkochan


Summary

In the cold-cache, warm-store case (rm -rf node_modules && pacquet install --frozen-lockfile), pacquet takes about 50% more wall time than pnpm. On a 1352-package lockfile: pacquet 17.4s vs pnpm 11.6s (release build, zero network fetches on both sides).

Two per-snapshot costs in the cache-hit path explain most of the gap:

1. Fresh SQLite connection per snapshot (1352 connections)

load_cached_cas_paths at crates/tarball/src/lib.rs:111-158 calls StoreIndex::open_readonly_in(store_dir) unconditionally for every snapshot:

async fn load_cached_cas_paths(
    store_dir: &'static StoreDir,
    cache_key: String,
) -> Option<HashMap<String, PathBuf>> {
    tokio::task::spawn_blocking(move || -> Option<HashMap<String, PathBuf>> {
        let index = StoreIndex::open_readonly_in(store_dir).ok()?;   // ← per snapshot
        let entry = index.get(&cache_key).ok()??;
        // …
    })
    // …
}

StoreIndex::open_readonly (crates/store-dir/src/store_index.rs:139-146) does Connection::open_with_flags + conn.busy_timeout(5s) on each call. With 1352 snapshots that's 1352 SQLite opens — no sharing, no pool, and because the work runs under spawn_blocking, the calls serialize on tokio's blocking pool (observed CPU utilization stuck at ~135% despite plenty of cores idle, vs ~509% on the warm rerun that short-circuits before this code).

Fix: open one read-only StoreIndex (or a small pool) and reuse it across all snapshots. SQLite read-only connections support concurrent reads, and the index is owned by Npmrc-like static state that already outlives every snapshot.
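A minimal sketch of the "open once, reuse everywhere" shape, using only the standard library. `FakeIndex`, `OPEN_COUNT`, `SHARED_INDEX` and `lookup` are hypothetical stand-ins and not pacquet's actual types. The `Mutex` reflects an assumption that the real connection handle cannot be shared across threads unguarded; a small pool would avoid serializing readers on that lock.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

/// Counts how many times the stand-in index has been opened.
static OPEN_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `StoreIndex`; the real type wraps a SQLite connection.
struct FakeIndex {
    rows: HashMap<String, Vec<String>>,
}

impl FakeIndex {
    /// Simulates the expensive open; each call bumps `OPEN_COUNT`.
    fn open_readonly() -> Self {
        OPEN_COUNT.fetch_add(1, Ordering::SeqCst);
        let mut rows = HashMap::new();
        rows.insert("pkg-a@1.0.0".to_string(), vec!["index.js".to_string()]);
        FakeIndex { rows }
    }

    /// Returns the file list stored under `key`, if any.
    fn get(&self, key: &str) -> Option<Vec<String>> {
        self.rows.get(key).cloned()
    }
}

/// Process-wide handle: opened lazily on first use, reused by every later lookup.
static SHARED_INDEX: OnceLock<Mutex<FakeIndex>> = OnceLock::new();

/// Per-snapshot lookup that never reopens the index.
fn lookup(key: &str) -> Option<Vec<String>> {
    let index = SHARED_INDEX.get_or_init(|| Mutex::new(FakeIndex::open_readonly()));
    let guard = index.lock().ok()?;
    guard.get(key)
}
```

Calling `lookup` once per snapshot leaves `OPEN_COUNT` at 1 no matter how many snapshots are processed, which is the property the fix is after.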

2. Per-file symlink_metadata on every CAS blob

Once the row is decoded, the same function walks every file in the entry and stats the CAS blob:

// crates/tarball/src/lib.rs:136-150
if !path.symlink_metadata().is_ok_and(|m| {
    let file_type = m.file_type();
    !file_type.is_symlink() && file_type.is_file()
}) {
    tracing::debug!(/* … */ "CAFS path missing or not a regular file; index entry is stale, re-fetching");
    return None;
}

For 1352 packages at ~50 files each that's roughly 68k stat syscalls (1352 × 50 = 67,600) on every install, just to pre-check existence before the hardlink pass would discover ENOENT anyway.

What pnpm v11 does instead

Two relevant paths in ~/src/pnpm/pnpm/v11:

Fast path — verifyStoreIntegrity=false (not the default, but illustrative). buildFileMapsFromIndex in store/cafs/src/checkPkgFilesIntegrity.ts:81-119 just reads the index and builds the filename → CAFS path map with zero stat calls.

Default path — verifyStoreIntegrity=true. checkFile in checkPkgFilesIntegrity.ts:39-65:

function checkFile (filename, checkedAt) {
  try {
    const { mtimeMs, size } = fs.statSync(filename)
    return { isModified: (mtimeMs - (checkedAt ?? 0)) > 100, size }
  } catch (err) {
    if (err.code === 'ENOENT') return null
    throw err
  }
}

It stats once, and only re-runs the integrity hash if mtimeMs - checkedAt > 100ms:

If a file was not edited, we are skipping integrity check. We assume that nobody will manually remove a file in the store and create a new one.

Lazy ENOENT at hardlink time. fs/indexed-pkg-importer/src/index.ts linkOrCopy falls back on link failure rather than pre-checking every source:

function linkOrCopy (existingPath, newPath) {
  try { fs.linkSync(existingPath, newPath) }
  catch (err) {  resilientCopyFileSync(existingPath, newPath) }
}
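The same lazy approach could look roughly like this in Rust. This is a sketch, not pacquet's importer: `Imported` and `link_or_copy` are hypothetical names, and the handling of an already-existing destination is an assumption of the sketch. The source is only stat'ed after a link has already failed with `NotFound`, so the happy path costs one syscall per file.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of importing one file from the store (hypothetical type).
#[derive(Debug, PartialEq)]
enum Imported {
    Linked,
    Copied,
    /// Destination already exists; left untouched.
    AlreadyPresent,
    /// The CAS blob is gone, so the index entry is stale.
    SourceMissing,
}

/// Tries a hard link first and inspects the source only when the link fails.
fn link_or_copy(src: &Path, dst: &Path) -> io::Result<Imported> {
    match fs::hard_link(src, dst) {
        Ok(()) => Ok(Imported::Linked),
        // Assumption of this sketch: an existing destination is left alone,
        // because copying over a hard link to the same inode could truncate the blob.
        Err(err) if err.kind() == io::ErrorKind::AlreadyExists => Ok(Imported::AlreadyPresent),
        // NotFound can also mean the destination's parent is missing,
        // so confirm the source is really gone before reporting a stale entry.
        Err(err) if err.kind() == io::ErrorKind::NotFound && !src.exists() => {
            Ok(Imported::SourceMissing)
        }
        // Any other link failure: fall back to a plain copy.
        Err(_) => {
            fs::copy(src, dst)?;
            Ok(Imported::Copied)
        }
    }
}
```

A caller would treat `SourceMissing` the way the current pre-check treats a failed stat: drop the cached entry and re-fetch the package.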

Relevance to pacquet

pacquet already persists the checkedAt field in index.db rows (crates/store-dir/src/msgpackr_records.rs has CafsFileInfo::checked_at, with comments explicitly referencing pnpm's mtimeMs - (checkedAt ?? 0) semantics) — the data is there, the fast-path logic isn't.
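For illustration, the missing fast-path check could be sketched as below, mirroring the `checkFile` excerpt above. `FileCheck`, `is_modified`, `check_file` and `MTIME_SLACK_MS` are hypothetical names, and the sketch assumes `checked_at` is available as milliseconds since the Unix epoch; how pacquet actually decodes `CafsFileInfo::checked_at` is not shown here.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Slack in milliseconds, mirroring the `> 100` in pnpm's `checkFile`.
const MTIME_SLACK_MS: i128 = 100;

/// Result of the cheap, stat-only check (hypothetical type).
#[derive(Debug, PartialEq)]
enum FileCheck {
    Missing,
    Unmodified,
    Modified,
}

/// Pure comparison: `mtimeMs - (checkedAt ?? 0) > 100`.
fn is_modified(mtime_ms: i128, checked_at_ms: Option<i128>) -> bool {
    mtime_ms - checked_at_ms.unwrap_or(0) > MTIME_SLACK_MS
}

/// Stats the file once; only a `Modified` result would need a full integrity hash.
fn check_file(path: &Path, checked_at_ms: Option<i128>) -> io::Result<FileCheck> {
    let meta = match fs::metadata(path) {
        Ok(meta) => meta,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(FileCheck::Missing),
        Err(err) => return Err(err),
    };
    // Timestamps before the epoch are treated as 0 in this sketch.
    let mtime_ms = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as i128)
        .unwrap_or(0);
    if is_modified(mtime_ms, checked_at_ms) {
        Ok(FileCheck::Modified)
    } else {
        Ok(FileCheck::Unmodified)
    }
}
```

This keeps one stat per file but skips rehashing for anything untouched since `checked_at`; it does not remove the per-file syscall, so it only helps when integrity verification is wanted at all.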

Proposed fix

  1. Share a single read-only StoreIndex across the cache-lookup pass (biggest win, no behavioural change).
  2. Replace the per-file symlink_metadata pre-check with pnpm's model: either skip validation entirely and handle ENOENT lazily when hardlinking, or gate it behind a verify-store-integrity npmrc flag using the checked_at mtime comparison already wired into the row schema.

Measured on a 1352-package lockfile (release build, macOS APFS, warm CAS store) — see PR #258 conversation for the numbers.
