refactor(store-dir): batch store-index writes via a single writer task by zkochan · Pull Request #265 · pnpm/pacquet

zkochan · 2026-04-24T10:51:45Z

Port pnpm v11's queueWrites / flush / setRawMany pattern from store/index/src/index.ts. The write path used to do StoreIndex::open + 7-PRAGMA + solo-INSERT per extracted tarball, each one under its own tokio::task::spawn_blocking — so a 1352-snapshot install opened SQLite 1352 times and ballooned tokio's blocking pool on every install. This PR collapses that to one open + one writer thread + N batched transactions.

Design

StoreIndex::set_many — wraps a batch in BEGIN IMMEDIATE; INSERT OR REPLACE xN; COMMIT;. One WAL fsync amortizes across the batch, with a rollback on per-batch error so a partial apply never leaves the index in a half-written state. Encoding happens on the caller's thread before the transaction begins, so the writer only pays SQLite work.
StoreIndexWriter — Arc-cloneable handle around a tokio unbounded channel. Producers call queue(key, value) — no SQLite, no spawn_blocking, no per-row lock. A single blocking task drains the channel, collects each non-blocking burst into a batch (cap 256), and flushes via set_many.
Orchestration — CreateVirtualStore and InstallWithoutLockfile spawn the writer at the top of the run, thread Arc<StoreIndexWriter> through InstallPackageBySnapshot / InstallPackageFromRegistry / DownloadTarballToStore, drop the orchestration's clone after try_join_all, then await the writer task so the final batch flushes before the install returns.
Tarball path — DownloadTarballToStore::run_without_mem_cache drops its per-tarball spawn_blocking for the index write entirely.

Batch errors, JoinErrors, and writer-open failures are all downgraded to warn! and "skip the row", matching the read side's graceful-degradation stance from #261.

Honest framing — opened as `refactor`, not `perf`

Context: this came out of investigating #263. On macOS I have not been able to demonstrate a wall-time win from this change — fix vs main cold-install runs are in the same 12–24 s variance band with identical sys time. The measurable bottleneck in pacquet's cold-install sys time sits elsewhere, most likely in tarball extraction / write_cas_file and sha512-per-file, see the comment thread on #263 for the investigation notes and sample profiles.

The change still stands as structural cleanup:

Matches pnpm v11's model one-for-one, so pacquet and pnpm use the same write strategy against a shared index.db.
Collapses the per-tarball StoreIndex::open (+ 7 PRAGMAs) churn.
Removes the per-tarball spawn_blocking for index writes from the hot path.
Compounds with whatever fix the real bottleneck eventually gets.

If reviewers would rather hold this until the actual hot path is fixed so the two can land together (the batching framework makes things like #260's per-file stat-storm fix cleaner to wire in), happy to park the branch.

Depends on #264 for anyone wanting to bench this locally on macOS — without that PR the bench silently measures the wrong path.

Test plan

cargo check --workspace --all-targets.
cargo test -p pacquet-tarball -p pacquet-store-dir -p pacquet-package-manager green (against the prepared pacquet-registry-mock instance).
Manual cold install against a real verdaccio exercises both branches (writer Some and writer-task JoinError downgrade) without panicking.
CI Integrated-Benchmark — expected to be a wash on Linux (the open cost is small on ext4 and the 1352 threads were cheap there), primary goal is correctness, not wall-time.

Port pnpm v11's `queueWrites` / `flush` / `setRawMany` pattern from `store/index/src/index.ts` so the per-tarball store-index INSERT stops opening a fresh SQLite connection and pulling its own blocking-pool thread. * Add `StoreIndex::set_many`, which wraps a batch in `BEGIN IMMEDIATE; INSERT OR REPLACE x N; COMMIT;`. One WAL fsync amortizes across the batch, with a rollback on per-batch error so a partial apply never leaves the index in a half-written state. Encoding happens on the caller's thread before the transaction begins, so the writer only pays SQLite work. * Introduce `StoreIndexWriter`, an `Arc`-cloneable handle around a tokio unbounded channel. Producers call `queue(key, value)` — no SQLite, no `spawn_blocking`, no per-row lock. A single blocking task drains the channel, collects each non-blocking burst into a batch (cap 256), and flushes via `set_many`. * `CreateVirtualStore` and `InstallWithoutLockfile` spawn the writer at the top of the run, thread the `Arc<StoreIndexWriter>` through `InstallPackageBySnapshot` / `InstallPackageFromRegistry` / `DownloadTarballToStore`, drop the orchestration's clone after `try_join_all`, then `await` the writer task so the final batch flushes before the install returns. * `DownloadTarballToStore::run_without_mem_cache` drops its per-tarball `spawn_blocking` for the index write entirely. Batch errors, `JoinError`s, and writer-open failures are all downgraded to `warn!` and "skip the row", matching the read side's graceful-degradation stance from #261. Tests pass against the existing `pacquet-registry-mock` harness. Context: opened while investigating #263. On macOS I haven't been able to demonstrate a wall-time win from this change yet — the measurable bottleneck in pacquet's cold-install sys time sits elsewhere (most likely tarball extraction / `write_cas_file` and sha512 per file), see the comment thread on #263 for the investigation notes and sample profile. The change still stands as structural cleanup and the pnpm-model alignment, and it will compound with whatever fix the real bottleneck gets.

codecov · 2026-04-24T10:55:58Z

Codecov Report

❌ Patch coverage is 84.69388% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.50%. Comparing base (edd07c3) to head (1bc861c).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/store-dir/src/store_index.rs	84.90%	8 Missing ⚠️
crates/package-manager/src/create_virtual_store.rs	62.50%	3 Missing ⚠️
...package-manager/src/install_package_by_snapshot.rs	0.00%	2 Missing ⚠️
...es/package-manager/src/install_without_lockfile.rs	90.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #265      +/-   ##
==========================================
- Coverage   89.75%   89.50%   -0.25%     
==========================================
  Files          58       58              
  Lines        4527     4613      +86     
==========================================
+ Hits         4063     4129      +66     
- Misses        464      484      +20

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions · 2026-04-24T10:57:12Z

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.23     19.6±0.80ms   221.4 KB/sec    1.00     16.0±0.33ms   271.4 KB/sec

github-actions · 2026-04-24T11:01:21Z

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command	Mean [s]	Min [s]	Max [s]	Relative
`pacquet@HEAD`	1.124 ± 0.010	1.108	1.139	1.00 ± 0.02
`pacquet@main`	1.120 ± 0.015	1.106	1.161	1.00
`pnpm`	2.565 ± 0.051	2.502	2.660	2.29 ± 0.06

BENCHMARK_REPORT.json

{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 1.1243672597800003,
      "stddev": 0.010178035218097145,
      "median": 1.12393394188,
      "user": 0.41155511999999994,
      "system": 2.67024266,
      "min": 1.1078520893800001,
      "max": 1.1393580863800001,
      "times": [
        1.1393580863800001,
        1.11056244938,
        1.12388425938,
        1.13867615238,
        1.12387326738,
        1.12125082138,
        1.1244615233800002,
        1.1078520893800001,
        1.1297703243800001,
        1.12398362438
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 1.12036072658,
      "stddev": 0.015341339975821107,
      "median": 1.11680940788,
      "user": 0.41388971999999996,
      "system": 2.6484198599999997,
      "min": 1.1057865553800001,
      "max": 1.16124444138,
      "times": [
        1.12438208138,
        1.11957954738,
        1.11254007138,
        1.1175660543800001,
        1.11045507338,
        1.1212212133800001,
        1.16124444138,
        1.1057865553800001,
        1.1147794663800001,
        1.11605276138
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    },
    {
      "command": "pnpm",
      "mean": 2.56456967258,
      "stddev": 0.05098121894869429,
      "median": 2.5550952903799997,
      "user": 3.1824605199999993,
      "system": 2.2935985600000004,
      "min": 2.5016076903799997,
      "max": 2.6600094723799996,
      "times": [
        2.60640866838,
        2.5505876553799998,
        2.55056304338,
        2.6600094723799996,
        2.5261525993799996,
        2.55960292538,
        2.5029310243799996,
        2.5016076903799997,
        2.5698935223799997,
        2.61794012438
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

Copilot

Pull request overview

Refactors pacquet’s store-index write path to avoid per-tarball SQLite opens and per-tarball spawn_blocking, by introducing a single writer task that batches index rows into transactional writes.

Changes:

Add StoreIndexWriter (single writer task + batching) and StoreIndex::set_many (transactional batch insert/replace).
Thread an optional store_index_writer handle through tarball download/extract and package install flows, and await the writer task at the end of installs.
Update benchmarks/tests callsites and add required dependencies (tokio, tracing) to pacquet-store-dir.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tasks/micro-benchmark/src/main.rs	Passes new `store_index_writer: None` field through benchmark setup.
crates/tarball/src/lib.rs	Queues index rows to the shared writer instead of per-tarball SQLite writes; wires new field through tests.
crates/store-dir/src/store_index.rs	Introduces `StoreIndexWriter` and `StoreIndex::set_many` for batched transactional writes.
crates/store-dir/Cargo.toml	Adds `tokio` (sync) and `tracing` dependencies to support the writer task + logging.
crates/package-manager/src/install_without_lockfile.rs	Spawns writer once per run, threads handle through recursion, awaits writer task at end.
crates/package-manager/src/install_package_from_registry.rs	Propagates writer handle into tarball download.
crates/package-manager/src/install_package_by_snapshot.rs	Propagates writer handle into tarball download for lockfile snapshot installs.
crates/package-manager/src/create_virtual_store.rs	Spawns writer once per run, threads handle to snapshot installs, awaits writer task at end.
Cargo.lock	Locks new transitive deps from adding `tokio` + `tracing` to `pacquet-store-dir`.

Comments suppressed due to low confidence (1)

crates/tarball/src/lib.rs:454

tokio::task::JoinHandle::await can return a JoinError (panic or cancellation), but this path uses expect("no join error"), which will panic the whole install. Since TarballError already has a TaskJoin(JoinError) variant, consider mapping the join error into that variant (or degrading to a cache miss) instead of panicking.

        })
        .await
        .expect("no join error")
        .map_err(|error| match error {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-24T11:09:26Z

+                if let Err(error) = index.set_many(batch.drain(..)) {
+                    // Drop the batch and keep going. One failed flush
+                    // (e.g. a disk-full hiccup) shouldn't silently drop
+                    // the rest of the install's entries; the next install
+                    // will cache-miss those rows and re-populate them,
+                    // matching the "best-effort index" stance the read
+                    // path already takes.
+                    tracing::warn!(
+                        target: "pacquet::store_index",
+                        ?error,
+                        "batched store-index write failed; dropping this batch and continuing",
+                    );
+                    batch.clear();
+                }


On set_many failure, the writer drops the entire batch (up to MAX_BATCH_SIZE entries). The PR description mentions "skip the row", but this behavior can lose many otherwise-valid rows due to a single transient SQLite error. Consider retrying with smaller batches / per-row fallback (or binary-splitting the batch to isolate a bad row) so failures are localized while still keeping batching for the common case.

Took a look and leaving this as-is for now. The "drop the whole batch" behaviour is the same graceful-degradation stance the read path already takes — a failed flush means those rows won't be in index.db, so the next install sees them as cache-miss, re-downloads, and re-writes on a fresh batch. User-visible consequence: "the store gets re-hydrated on the next install", which is also what happens after any interrupted install today.

Binary-splitting or per-row fallback would isolate a genuinely bad row at the cost of O(log N) extra transactions on every failure, plus book-keeping about which rows were already applied. Happy to add that if/when we see real errors in the wild that aren't "disk full" or "DB corrupted" (both of which won't be helped by smaller batches). For now the simpler fallback seems like the right tradeoff.

Copilot · 2026-04-24T11:09:27Z

+        let v11_dir = store_dir.v11();
+        let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<(String, PackageFilesIndex)>();
+        let handle = tokio::task::spawn_blocking(move || -> Result<(), StoreIndexError> {


StoreIndexWriter uses an unbounded mpsc channel, so if producers enqueue faster than SQLite can commit (e.g. slow/durable disk, contention), the queue can grow without bound and cause high memory usage. Consider using a bounded channel (with backpressure or try_send + drop policy) or adding an explicit queue-size cap/metrics to prevent unbounded buffering.

Valid concern at scale — keeping it unbounded for now and noting the tradeoff in the design block. Rough numbers for today's workload: PackageFilesIndex is a small struct (a HashMap of CafsFileInfo per file, most entries a few hundred bytes), so worst case per entry is maybe 10 KB and realistic is closer to 1 KB. A 1352-snapshot install with zero drain capacity would sit at ~5 MB peak. The writer drains in bursts of 256 and SQLite transactions commit fast enough in practice that the buffer stays in the low hundreds even on slow disks.

If we ever see memory pressure in a large-monorepo install we can switch to mpsc::channel::<_>(N) and add try_send handling at the producer. That keeps the fast-path identical but gains backpressure on the bad-path.

`collect::<Result<_, _>>()?` turned a single `encode_package_files_index` failure into "drop the whole batch" — up to 256 otherwise-valid rows lost because of one malformed `PackageFilesIndex`. That's stricter than the "best-effort index" contract the writer task and the read-side already follow. Encode in a loop, log the failing key at `warn!`, and keep the rest of the batch alive for commit. SQLite errors inside the transaction still roll back the whole batch (that's how atomicity is supposed to work); only the per-row encoding step is skipped. Addresses Copilot review on #265.

…irst failure; fix private-doc-link Three follow-up nits from the Copilot review on #265: * The comment in `tarball/src/lib.rs` described #263 as "fixed" — it isn't, this PR is one step toward fixing it. Reworded to "the per-write blocking thread churn described in #263" so the citation survives the landing of both. * `StoreIndexWriter::queue` logged a `warn!` on every `send` failure. When the writer task exits early (open failure, panic) every subsequent `queue` call fails — on a 1352-snapshot install that's a thousand identical warnings drowning everything else out. Add a `warn_on_send_failure: AtomicBool` one-shot guard: first failure logs (with "further failures silenced" hint), subsequent failures go silent. The observable signal the user actually cares about — "this install's rows aren't in the index, the next install will re-download" — surfaces on the next run regardless. * `StoreIndexWriter`'s doc block linked to the private constant `MAX_BATCH_SIZE`. `RUSTDOCFLAGS=-D warnings` in CI rejects the private-intra-doc-link lint, so replace the link with a literal "256 entries" plus a `see MAX_BATCH_SIZE` pointer in prose. Addresses Copilot review on #265.

# Conflicts: # crates/tarball/src/lib.rs

zkochan mentioned this pull request Apr 24, 2026

Tarball-extract path opens a fresh SQLite connection per snapshot; 500+ blocking threads thrash on macOS #263

Open

zkochan requested a review from Copilot April 24, 2026 11:04

Copilot started reviewing on behalf of zkochan April 24, 2026 11:05 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

zkochan added 2 commits April 24, 2026 13:32

zkochan merged commit 5d06cd5 into main Apr 24, 2026
12 of 14 checks passed

zkochan deleted the refactor/batched-store-index-writer branch April 24, 2026 13:06

zkochan added a commit that referenced this pull request Apr 24, 2026

merge main after #265/#268/#264 landed

87c5fe1

# Conflicts: # crates/tarball/src/lib.rs

zkochan mentioned this pull request Apr 24, 2026

Per-file link phase is the real warm-install bottleneck on macOS #273

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(store-dir): batch store-index writes via a single writer task#265

refactor(store-dir): batch store-index writes via a single writer task#265
zkochan merged 3 commits into
mainfrom
refactor/batched-store-index-writer

zkochan commented Apr 24, 2026

Uh oh!

codecov Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

zkochan Apr 24, 2026

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

zkochan Apr 24, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zkochan commented Apr 24, 2026

Design

Honest framing — opened as refactor, not perf

Test plan

Uh oh!

codecov Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Micro-Benchmark Results

Linux

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

zkochan Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

zkochan Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Honest framing — opened as `refactor`, not `perf`

codecov Bot commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited

Loading