perf(store-dir): batch the prefetch SELECT to remove cold-cache regression from #292

## Context

PR #292 introduced an install-wide `prefetch_cas_paths` that walks every
`(integrity, pkg_id)` the lockfile mentions in one `spawn_blocking` task,
returning a `cache_key → cas_paths` map the per-snapshot futures hit
synchronously. On warm-cache installs that's a ~25% win.

On the integrated benchmark's **cold-cache** Linux scenario (Frozen
Lockfile against an empty store), pacquet@HEAD shows a ~5% regression
vs main:

| revision | mean wall |
|---|---|
| pacquet@main | 2.810 ± 0.126 s |
| pacquet@HEAD (#292) | 2.950 ± 0.128 s |
| pnpm | 5.957 s |

## Cause

When every key misses, the prefetch still costs ~100-150 ms:
1. ~1352 individual SQLite `SELECT data FROM package_index WHERE key = ?`
   round-trips (each ~40 µs even for a miss) → ~50 ms.
2. Build / sort / dedup of the `cache_keys` Vec.
3. Re-iterating snapshots in the warm/cold partition (which finds zero
   warm entries on a fully-empty store).
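
As a back-of-envelope check on item 1, the per-key cost can be multiplied out. The function name below is hypothetical, and the inputs are the approximate figures quoted above rather than new measurements.

```rust
/// Estimated total time, in microseconds, spent on per-key SELECT round-trips.
/// Both inputs are the approximate figures quoted in this issue.
fn per_key_select_cost_us(keys: u64, per_query_us: u64) -> u64 {
    keys * per_query_us
}
```

`per_key_select_cost_us(1352, 40)` gives 54,080 µs, which lines up with the ~50 ms estimate. The remainder of the ~100-150 ms would then come from items 2 and 3.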

The empty-store case is the worst case for prefetch. For partially empty
stores (the common real-world cold install: lockfile mostly covered by
previous installs, a handful of new packages) the prefetch still covers
the hits via the rayon warm-batch path, so it remains a net win.

## Proposed fix

Replace the per-key SELECT loop in `prefetch_cas_paths` with one batched
`SELECT key, data FROM package_index WHERE key IN (?, ?, …)`. SQLite
walks the index B-tree once for the whole set; one round trip across the
mutex instead of N. With `rusqlite`'s bundled SQLite the variable cap is
32766 (well above any realistic lockfile size), but we'd chunk at 999
as a guard for hand-rolled custom builds with the older cap.

Sketch:

```rust
// crates/store-dir/src/store_index.rs
use std::collections::HashMap;

impl StoreIndex {
    pub fn get_many(
        &self,
        keys: &[String],
    ) -> Result<HashMap<String, PackageFilesIndex>, StoreIndexError> {
        let mut out = HashMap::with_capacity(keys.len());
        if keys.is_empty() { return Ok(out); }
        const CHUNK: usize = 999;
        for chunk in keys.chunks(CHUNK) {
            let placeholders =
                std::iter::repeat("?").take(chunk.len()).collect::<Vec<_>>().join(",");
            let sql =
                format!("SELECT key, data FROM package_index WHERE key IN ({placeholders})");
            let mut stmt = self.conn.prepare(&sql)?;
            let params = rusqlite::params_from_iter(chunk.iter().map(String::as_str));
            let rows = stmt.query_map(params, |row| {
                Ok((row.get::<_, String>(0)?, row.get::<_, Vec<u8>>(1)?))
            })?;
            for row in rows {
                let (key, bytes) = row?;
                if let Ok(entry) = decode_index_value(&bytes) {
                    out.insert(key, entry);
                }
                // Skip undecodable rows (matches the per-call `get` flow,
                // where `load_cached_cas_paths` already does `.ok()?`).
            }
        }
        Ok(out)
    }
}
```

Then `prefetch_cas_paths` becomes:

```rust
let entries = guard.get_many(&cache_keys).unwrap_or_default();
drop(guard);
// integrity-check loop unchanged
```

Expected effect: cold-cache regression drops from ~150 ms to ~5 ms;
warm/partial cases get a small bonus (one query vs N).
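
To make the chunking in the sketch concrete, the statement text and chunk sizes can be computed without touching SQLite. This is a minimal illustration, assuming the same `CHUNK` of 999; the helper names (`placeholders`, `batched_select_sql`, `chunk_sizes`) are hypothetical and not part of the existing crate.

```rust
/// Chunk size taken from the issue text: the older SQLite variable cap.
const CHUNK: usize = 999;

/// Builds "?,?,?" with `n` placeholders for an `IN (...)` clause.
fn placeholders(n: usize) -> String {
    vec!["?"; n].join(",")
}

/// Returns the SQL text for one chunk of `n` keys.
fn batched_select_sql(n: usize) -> String {
    format!(
        "SELECT key, data FROM package_index WHERE key IN ({})",
        placeholders(n)
    )
}

/// Sizes of the chunks that `keys.chunks(CHUNK)` yields for `total` keys.
fn chunk_sizes(total: usize) -> Vec<usize> {
    (0..total)
        .step_by(CHUNK)
        .map(|start| CHUNK.min(total - start))
        .collect()
}
```

For the ~1352 keys of the benchmark lockfile, `chunk_sizes(1352)` yields `[999, 353]`, so the batch path would issue two statements where the per-key loop issues 1352.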

## Risks

- New SQLite code path. Want focused unit tests covering: empty input,
  single-chunk, multi-chunk, undecodable row skip, all-miss, all-hit,
  mixed.
- Behaviour difference vs the per-call `get`: a single decode failure
  in the batch is skipped instead of bubbling up (the sketch above skips
  silently; add a log line if the skip should be visible). This
  functionally matches the existing `load_cached_cas_paths` `.ok()?`
  semantics, but is worth calling out in the PR.
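
The skip-on-decode-failure semantics that the tests should pin down can be modelled in memory. This is a sketch only: a `HashMap` stands in for the `package_index` table, `decode` is a hypothetical stand-in for `decode_index_value` that treats the bytes as UTF-8, and `get_many_model` is not the proposed `StoreIndex::get_many` itself.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `decode_index_value`: treats bytes as UTF-8.
fn decode(bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
    std::str::from_utf8(bytes).map(str::to_owned)
}

/// In-memory model of the batch lookup: missing keys are absent from the
/// result, and undecodable rows are skipped rather than returned as errors.
fn get_many_model(
    table: &HashMap<String, Vec<u8>>,
    keys: &[String],
) -> HashMap<String, String> {
    let mut out = HashMap::with_capacity(keys.len());
    for key in keys {
        if let Some(bytes) = table.get(key) {
            if let Ok(entry) = decode(bytes) {
                out.insert(key.clone(), entry);
            }
        }
    }
    out
}
```

A mixed input (one hit, one undecodable row, one miss) should return only the hit, and an empty key slice should return an empty map. The real unit tests would assert the same shape against an actual SQLite-backed `StoreIndex`.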

Splitting this out keeps #292 focused on the structural perf change
(prefetch + warm-batch on rayon) and lets the SQL refactor land with
its own test coverage.

## Acceptance criteria

- Cold-cache integrated-benchmark Linux on alot7-equivalent within
  noise of `main` (was ~2.81 s; target ≤ ~2.85 s).
- Warm-cache benchmark unchanged from #292's level.
- `just ready` clean.
