Context
PR #292 introduced an install-wide prefetch_cas_paths that walks every
(integrity, pkg_id) the lockfile mentions in one spawn_blocking task,
returning a cache_key → cas_paths map the per-snapshot futures hit
synchronously. On warm-cache installs that's a ~25% win.
On the integrated benchmark's cold-cache Linux scenario (Frozen
Lockfile against an empty store), pacquet@HEAD shows a ~5% regression
vs main:
| revision | mean wall |
|---|---|
| pacquet@main | 2.810 ± 0.126 s |
| pacquet@HEAD (#292) | 2.950 ± 0.128 s |
| pnpm | 5.957 s |
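As a sanity check on the table, the difference between the two pacquet means is 140 ms, about 5% of main's 2.810 s. That falls inside the ~100-150 ms prefetch cost described under Cause. The arithmetic below uses only the numbers from the table:

```rust
/// Absolute (seconds) and relative (percent) regression between two mean wall times.
fn regression(baseline_s: f64, candidate_s: f64) -> (f64, f64) {
    let delta_s = candidate_s - baseline_s;
    (delta_s, delta_s / baseline_s * 100.0)
}

fn main() {
    // pacquet@main vs pacquet@HEAD (#292), cold-cache Frozen Lockfile scenario.
    let (delta_s, pct) = regression(2.810, 2.950);
    println!("delta = {:.0} ms, regression = {:.1}%", delta_s * 1000.0, pct);
}
```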
Cause
When every key misses, the prefetch still costs ~100-150 ms:
- ~1352 individual SQLite `SELECT data FROM package_index WHERE key = ?` round-trips (each ~40 µs even for a miss) → ~50 ms.
- Build / sort / dedup of the cache_keys Vec.
- Re-iterating snapshots in the warm/cold partition (which finds zero warm entries on a fully-empty store).
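The per-key SQLite figure equals the miss count multiplied by the per-query latency. The back-of-the-envelope sketch below uses the ~1352 keys and ~40 µs figures from the list above. It also counts how many statements the chunked `IN (...)` query under Proposed fix would need, assuming that fix's 999-parameter chunk size. Both helper names are hypothetical.

```rust
/// Rough cost of issuing one SELECT per key, in milliseconds.
fn per_key_cost_ms(keys: u64, per_query_us: f64) -> f64 {
    keys as f64 * per_query_us / 1000.0
}

/// Number of `WHERE key IN (...)` statements needed when binding at most
/// `chunk` parameters per statement (ceiling division).
fn batched_statements(keys: u64, chunk: u64) -> u64 {
    assert!(chunk > 0, "chunk size must be non-zero");
    (keys + chunk - 1) / chunk
}

fn main() {
    let keys = 1352;
    println!("per-key:  ~{:.0} ms of SQLite round-trips", per_key_cost_ms(keys, 40.0));
    println!("batched:  {} statement(s)", batched_statements(keys, 999));
}
```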
The empty-store case is the worst case for prefetch. For partially empty stores (the common real-world cold install: a lockfile mostly covered by previous installs, plus a handful of new packages), the prefetch still serves the hits through the rayon warm-batch path, so it remains a net win.
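To make the empty-store case concrete: once the prefetch map exists, each snapshot resolves with a synchronous map lookup, and the warm/cold split reduces to one pass over the snapshots. The sketch below uses only std. `Snapshot`, its `cache_key` field, and the `Vec<String>` CAS-path value are hypothetical stand-ins for pacquet's real types. With an empty map, every snapshot lands on the cold side, so the pass is pure overhead.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a lockfile snapshot entry.
#[derive(Debug)]
struct Snapshot {
    cache_key: String,
}

/// Split snapshots into those the prefetch map already covers (warm, handed
/// to the batch path) and those that still need fetching (cold).
fn partition_warm_cold<'a>(
    snapshots: &'a [Snapshot],
    prefetched: &HashMap<String, Vec<String>>,
) -> (Vec<&'a Snapshot>, Vec<&'a Snapshot>) {
    snapshots
        .iter()
        .partition(|s| prefetched.contains_key(&s.cache_key))
}

fn main() {
    let snapshots = vec![
        Snapshot { cache_key: "a".into() },
        Snapshot { cache_key: "b".into() },
    ];
    let empty_store = HashMap::new();
    let (warm, cold) = partition_warm_cold(&snapshots, &empty_store);
    println!("empty store: {} warm, {} cold", warm.len(), cold.len());
}
```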
Proposed fix
Replace the per-key SELECT loop in prefetch_cas_paths with one batched `SELECT key, data FROM package_index WHERE key IN (?, ?, …)`. SQLite still seeks the index once per key, but everything happens inside a single statement: one prepare and one round trip across the mutex instead of N. With rusqlite's bundled SQLite the bound-parameter cap (SQLITE_MAX_VARIABLE_NUMBER) is at least 32766, the upstream default since 3.32.0 and well above any realistic lockfile size. We'd still chunk at 999 as a guard for system or custom SQLite builds that keep the older default of 999.
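The chunking and placeholder generation can live in a pure helper. That makes the single-chunk and multi-chunk cases easy to unit-test without a database. Below is a minimal sketch. `in_clause_queries` is a hypothetical name. The 999 cap mirrors SQLite's pre-3.32.0 default for SQLITE_MAX_VARIABLE_NUMBER. Each returned slice would then be bound to its SQL string, for example via rusqlite's `params_from_iter`.

```rust
/// Upper bound on bound parameters per statement (SQLite's pre-3.32.0 default).
const CHUNK: usize = 999;

/// Build one `SELECT ... WHERE key IN (?, ?, ...)` per chunk of keys, pairing
/// each SQL string with the keys that should be bound to it.
fn in_clause_queries(keys: &[String], chunk: usize) -> Vec<(String, &[String])> {
    assert!(chunk > 0, "chunk size must be non-zero");
    keys.chunks(chunk)
        .map(|part| {
            // One `?` per key in this chunk, comma-separated.
            let placeholders = vec!["?"; part.len()].join(",");
            let sql =
                format!("SELECT key, data FROM package_index WHERE key IN ({placeholders})");
            (sql, part)
        })
        .collect()
}

fn main() {
    let keys: Vec<String> = (0..1352).map(|i| format!("key-{i}")).collect();
    for (sql, bound) in in_clause_queries(&keys, CHUNK) {
        println!("{} params, {} chars of SQL", bound.len(), sql.len());
    }
}
```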
Sketch:
```rust
// crates/store-dir/src/store_index.rs
use std::collections::HashMap;

impl StoreIndex {
    /// Look up many keys with one `WHERE key IN (...)` query per chunk.
    /// Keys absent from the table are simply missing from the returned map.
    /// Assumes `StoreIndexError: From<rusqlite::Error>` so `?` works below.
    pub fn get_many(
        &self,
        keys: &[String],
    ) -> Result<HashMap<String, PackageFilesIndex>, StoreIndexError> {
        let mut out = HashMap::with_capacity(keys.len());
        if keys.is_empty() {
            return Ok(out);
        }
        // Stay under the pre-3.32.0 SQLite default of 999 bound parameters.
        const CHUNK: usize = 999;
        for chunk in keys.chunks(CHUNK) {
            let placeholders = vec!["?"; chunk.len()].join(",");
            let sql =
                format!("SELECT key, data FROM package_index WHERE key IN ({placeholders})");
            let mut stmt = self.conn.prepare(&sql)?;
            let params = rusqlite::params_from_iter(chunk.iter().map(String::as_str));
            let rows = stmt.query_map(params, |row| {
                Ok((row.get::<_, String>(0)?, row.get::<_, Vec<u8>>(1)?))
            })?;
            for row in rows {
                let (key, bytes) = row?;
                // Skip undecodable rows (matches the per-call `get` flow,
                // where `load_cached_cas_paths` already does `.ok()?`).
                if let Ok(entry) = decode_index_value(&bytes) {
                    out.insert(key, entry);
                }
            }
        }
        Ok(out)
    }
}
```
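Then prefetch_cas_paths becomes:

```rust
// On a query error, fall back to an empty map: every key is treated as a miss.
let entries = guard.get_many(&cache_keys).unwrap_or_default();
// Release the store-index lock before the integrity checks.
drop(guard);
// integrity-check loop unchanged
```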
Expected effect: cold-cache regression drops from ~150 ms to ~5 ms;
warm/partial cases get a small bonus (one query vs N).
Risks
- New SQLite code path. Want focused unit tests covering: empty input,
single-chunk, multi-chunk, undecodable row skip, all-miss, all-hit,
mixed.
- Behaviour difference vs. the per-call get: a single decode failure in the batch is logged and skipped instead of bubbling up. This functionally matches the existing `.ok()?` semantics in load_cached_cas_paths, but is worth calling out in the PR.
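Most of the test matrix above can be exercised against a std-only model of get_many's control flow: chunk the keys, run one lookup per chunk, and drop rows that fail to decode. In the sketch below, `get_many_model` is a hypothetical name. The `fetch` closure stands in for the prepared `IN (...)` statement, and `decode` stands in for decode_index_value. Neither closure is part of pacquet's API. The model returns the number of chunk queries as well, so multi-chunk tests can assert on it.

```rust
use std::collections::HashMap;

/// Model of `get_many`: one `fetch` call per chunk, undecodable rows skipped.
/// Returns the decoded hits and the number of chunk queries issued.
fn get_many_model<T, E>(
    keys: &[String],
    chunk: usize,
    mut fetch: impl FnMut(&[String]) -> Vec<(String, Vec<u8>)>,
    decode: impl Fn(&[u8]) -> Result<T, E>,
) -> (HashMap<String, T>, usize) {
    let mut out = HashMap::with_capacity(keys.len());
    let mut calls = 0;
    if keys.is_empty() {
        return (out, calls);
    }
    for part in keys.chunks(chunk) {
        calls += 1;
        for (key, bytes) in fetch(part) {
            // A decode failure drops only this row, not the whole batch.
            if let Ok(entry) = decode(&bytes) {
                out.insert(key, entry);
            }
        }
    }
    (out, calls)
}

fn main() {
    let store: HashMap<String, Vec<u8>> = [("a", b"A".to_vec()), ("bad", vec![0xff])]
        .into_iter()
        .map(|(k, v)| (k.to_string(), v))
        .collect();
    let keys: Vec<String> = ["a", "bad", "missing"].iter().map(|s| s.to_string()).collect();
    let (hits, calls) = get_many_model(
        &keys,
        999,
        |part: &[String]| -> Vec<(String, Vec<u8>)> {
            part.iter()
                .filter_map(|k| store.get(k).map(|v| (k.clone(), v.clone())))
                .collect()
        },
        |b: &[u8]| std::str::from_utf8(b).map(str::to_owned),
    );
    println!("{calls} query, {} decoded hit(s)", hits.len());
}
```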
Splitting this out keeps #292 focused on the structural perf change
(prefetch + warm-batch on rayon) and lets the SQL refactor land with
its own test coverage.
Acceptance criteria
- Cold-cache benchmark wall time back within the noise of main (was ~2.81 s; target ≤ ~2.85 s).
- `just ready` runs clean.