Tarball-extract path opens a fresh SQLite connection per snapshot; 500+ blocking threads thrash on macOS #263

## Summary

On a 1352-snapshot frozen-lockfile install, pacquet is **~12× slower on macOS than on Linux CI** even though the same machine runs pnpm ~3× faster than a GH runner does. The gap isn't CPU work; it's a storm of per-tarball `StoreIndex::open` calls on the **write** side of the index, paired with an unbounded tokio blocking pool.

Same root shape as #260 (which was the *read* side, fixed for reads by #261). The write path kept the old pattern.

## Data

Integrated benchmark, frozen-lockfile, 1352-snapshot fixture (the one introduced by #262), same harness as CI:

| | CI (Ubuntu) | Local (M1 Pro, APFS) |
|---|---:|---:|
| `pacquet@HEAD` wall | 1.095 ± 0.009 s | **13.235 ± 0.979 s** |
| `pacquet@HEAD` user | 0.40 s | 0.62 s |
| `pacquet@HEAD` sys  | 2.59 s | **20.87 s** |
| `pnpm` wall | 2.550 ± 0.026 s | 0.930 ± 0.048 s |
| `pnpm` sys | 2.25 s | 0.33 s |

User time is similar; **sys time is ~8× higher on macOS**. pnpm doing the same install on the same disk has 0.33 s sys, so the filesystem is fine — this is pacquet-specific.

Forcing each `package-import-method` doesn't move the number, ruling out the CAS→`node_modules` linker as the culprit:

| method | wall (s) | user (s) | sys (s) |
|---|---:|---:|---:|
| auto | 12.40 | 0.61 | 18.05 |
| hardlink | 13.03 | 0.63 | 18.28 |
| copy | 14.70 | 0.65 | 19.69 |
| clone | 12.27 | 0.61 | 20.42 |

`ps -M` shows pacquet holds **534 threads** from t+0.8 s through the whole install, most parked in `kevent`. That's tokio's blocking pool at its default cap of 512 + workers.

A 5 s `sample` confirms every tokio-rt-worker stack bottoms out in `kevent` — threads are spawned, do some work, and park.

## Root cause

`crates/tarball/src/lib.rs:427` spawns a blocking task *per tarball* to record the per-package row in the store index:

```rust
tokio::task::spawn_blocking(move || -> Result<(), StoreIndexError> {
    let store_index = StoreIndex::open(&v11_dir)?;
    store_index.set(&index_key, &pkg_files_idx)?;
    ...
})
```

For 1352 tarballs that's 1352 × `StoreIndex::open` (`crates/store-dir/src/store_index.rs:100-130`), where each call does:

1. `std::fs::create_dir_all(store_dir)`
2. `Connection::open("…/index.db")` (sqlite open: stat, open, fstat, pread, fcntl, mmap, plus WAL/SHM sidecar handling for the first writer)
3. `execute_batch` of 7 PRAGMAs (`busy_timeout=5000`, `journal_mode=WAL`, `synchronous=NORMAL`, `mmap_size=…`, `cache_size=-32000`, `temp_store=MEMORY`, `wal_autocheckpoint=10000`) + `CREATE TABLE IF NOT EXISTS`

Then `store_index.set` inserts one row. The actual inserts serialize on SQLite's `busy_timeout` (the callsite comment acknowledges this), so the per-open setup cost is paid concurrently but the inserts run mostly one at a time. The callsite also explicitly notes *"One StoreIndex per spawned task keeps the code lock-free"* — that's the pattern this issue is asking to replace.

### Why Linux CI doesn't see it

- On ext4, `open`/`fstat`/`fcntl` each cost a fraction of what the same syscall costs on APFS.
- Linux pthreads + epoll handle 500 threads cheaply; XNU mach ports + kqueue charge more.
- SQLite open on ext4 ≈ 0.5 ms; on this APFS volume ≈ 5–15 ms. 1352 × (5–15 ms) ≈ 7–20 s, a range that brackets both the observed wall delta (≈ 12 s) and the observed sys delta (≈ 18 s).
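
The estimate in the last bullet is plain multiplication, sketched below so the bounds are easy to re-derive. The per-open costs are the rough figures quoted above, not new measurements, and the function names are made up for illustration. The model adds the opens up as if they never overlapped, so it is closer to a bound on summed sys time than a wall-clock prediction.

```rust
/// Estimated total time spent opening the index, in seconds, assuming one
/// open per tarball and no overlap between opens (a deliberate simplification).
fn total_open_seconds(tarballs: u32, per_open_ms: f64) -> f64 {
    f64::from(tarballs) * per_open_ms / 1000.0
}

/// (low, high) estimate in seconds for a per-open cost range given in milliseconds.
fn open_cost_range(tarballs: u32, low_ms: f64, high_ms: f64) -> (f64, f64) {
    (
        total_open_seconds(tarballs, low_ms),
        total_open_seconds(tarballs, high_ms),
    )
}
```

With the fixture's 1352 tarballs, `open_cost_range(1352, 5.0, 15.0)` gives roughly (6.76, 20.28) seconds for APFS, while the ext4 figure of 0.5 ms per open gives about 0.68 s in total.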

### Why #261 didn't help this path

#261 added `StoreIndex::shared_readonly_in` for the *cache-lookup* pass (1352 reads → 1 open). The write path still opens a fresh writable connection per tarball.

## Proposed fixes (ordered by expected impact)

1. **Share a single writable `StoreIndex` across the install.** Mirror #261's pattern for writes: open once, wrap in `Arc<Mutex<StoreIndex>>` (or run a single writer task fed by an `mpsc::channel`), and thread it through `download_and_extract_tarball`. Collapses 1352 opens to 1. Biggest single win, smallest diff.
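2. **Batch inserts in a transaction.** Each standalone insert is its own commit: it takes the write lock, appends a commit frame to the WAL, and updates the wal-index. With `journal_mode=WAL` and `synchronous=NORMAL` (as set in `StoreIndex::open`), SQLite normally syncs at checkpoints rather than on every commit, so the saving here is per-commit overhead rather than 1352 fsyncs. Wrapping the per-package inserts in a single `BEGIN IMMEDIATE; … COMMIT;` (or small batches) collapses 1352 independent commits into a handful, which removes another likely APFS amplifier.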
3. **Cap tokio's blocking pool.** `tokio::runtime::Builder::max_blocking_threads(N)` where N ≈ 2–4 × CPU count. 534 threads is pathological on macOS regardless of workload; this alone won't close the gap but it's cheap insurance.

(1) alone should bring local macOS to roughly the same standing against CI that pnpm already has (local ≈ 2–3× faster than the runner). (2) and (3) are additive polish.
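
A minimal sketch of fixes (1) and (2) together, assuming the single-writer variant: one thread owns the index, extract tasks send rows over a channel, and the writer commits whatever is queued as one batch. This is not pacquet's code. `FakeIndex`, `spawn_index_writer`, and `simulate_install` are hypothetical names, a `HashMap` stands in for the SQLite table, and `std::thread` plus `std::sync::mpsc` stand in for tokio tasks and a tokio channel so the example needs no external crates.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// One index row: (key, serialized file index). Hypothetical shape.
type Row = (String, Vec<u8>);

/// Hypothetical stand-in for the real `StoreIndex`.
struct FakeIndex {
    rows: HashMap<String, Vec<u8>>,
    commits: usize,
}

impl FakeIndex {
    /// Stand-in for `StoreIndex::open`; counts how often it is called.
    fn open(opens: &AtomicUsize) -> Self {
        opens.fetch_add(1, Ordering::SeqCst);
        FakeIndex { rows: HashMap::new(), commits: 0 }
    }

    /// Stand-in for one `BEGIN IMMEDIATE; ... COMMIT;` around a batch of inserts.
    fn set_batch(&mut self, batch: Vec<Row>) {
        self.rows.extend(batch);
        self.commits += 1;
    }
}

/// Spawns the single writer. Returns the sender that extract tasks clone and
/// a handle yielding (rows written, commits) once every sender is dropped.
fn spawn_index_writer(
    opens: Arc<AtomicUsize>,
    max_batch: usize,
) -> (Sender<Row>, JoinHandle<(usize, usize)>) {
    let (tx, rx) = mpsc::channel::<Row>();
    let handle = thread::spawn(move || {
        let mut index = FakeIndex::open(&opens); // opened once per install
        // Block for the first row, then drain whatever else is already queued.
        while let Ok(first) = rx.recv() {
            let mut batch = vec![first];
            while batch.len() < max_batch {
                match rx.try_recv() {
                    Ok(row) => batch.push(row),
                    Err(_) => break, // queue empty or all senders gone
                }
            }
            index.set_batch(batch);
        }
        (index.rows.len(), index.commits)
    });
    (tx, handle)
}

/// Simulates `tarballs` extract tasks, each sending one row to the writer.
/// Returns (opens, rows written, commits).
fn simulate_install(tarballs: usize, max_batch: usize) -> (usize, usize, usize) {
    let opens = Arc::new(AtomicUsize::new(0));
    let (tx, writer) = spawn_index_writer(Arc::clone(&opens), max_batch);
    let producers: Vec<_> = (0..tarballs)
        .map(|i| {
            let tx = tx.clone();
            thread::spawn(move || tx.send((format!("pkg-{}", i), vec![0u8; 4])).unwrap())
        })
        .collect();
    drop(tx); // the writer loop ends when the last clone is dropped
    for p in producers {
        p.join().unwrap();
    }
    let (rows, commits) = writer.join().unwrap();
    (opens.load(Ordering::SeqCst), rows, commits)
}
```

Calling `simulate_install(64, 16)` reports exactly one open and 64 rows; the commit count depends on scheduling but cannot fall below 4 or exceed 64. The `Arc<Mutex<StoreIndex>>` variant mentioned in (1) would also reduce opens to one, but it leaves batching to each caller, whereas a dedicated writer gets it almost for free.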

## Repro

```bash
# bench env (uses the 1352-snapshot fixture from #262)
just integrated-benchmark --show-output --scenario=frozen-lockfile \
    --verdaccio --with-pnpm HEAD main
```

Bench harness note: `verify::ensure_git_repo` (`tasks/integrated-benchmark/src/verify.rs:17`) asserts `.git` is a directory, which fails when running against a git worktree — pass `-R <non-worktree-clone>` or fix the harness to accept `.git`-file worktrees.
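
For the harness fix, a possible shape of the relaxed check is sketched below. In a linked worktree, `.git` is a small text file whose content starts with `gitdir:` and points at the real git directory. `has_git_entry` is a hypothetical helper, not the existing `ensure_git_repo` signature, and it does not validate that the `gitdir:` target exists.

```rust
use std::fs;
use std::path::Path;

/// Returns true if `repo_root/.git` is either a directory (regular clone)
/// or a text file starting with `gitdir:` (linked worktree or submodule).
fn has_git_entry(repo_root: &Path) -> bool {
    let dot_git = repo_root.join(".git");
    if dot_git.is_dir() {
        return true;
    }
    // Missing or unreadable `.git` (or non-UTF-8 content) counts as "not a repo".
    match fs::read_to_string(&dot_git) {
        Ok(content) => content.trim_start().starts_with("gitdir:"),
        Err(_) => false,
    }
}
```

A stricter version could resolve the `gitdir:` path and check that it exists, at the cost of handling relative paths.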

Fixture note: `pnpm-workspace.yaml`'s `allowBuilds` list silences `core-js`/`es5-ext` but not `fsevents`, so pnpm exits 1 on macOS with `ERR_PNPM_IGNORED_BUILDS: fsevents@1.2.13`. Worth adding so local runs don't wedge the bench.
