fix(store): write index.db rows as msgpackr records for pnpm interop by zkochan · Pull Request #252 · pnpm/pacquet

zkochan · 2026-04-23T22:28:01Z

Summary

Follow-up to #251. Pacquet's StoreIndex::set was writing index.db rows via rmp_serde::to_vec_named — plain MessagePack maps. pnpm reads the same file with Packr({useRecords: true, moreTypes: true}).unpack(…), and in that mode every msgpack map at any nesting level decodes as a JS Map. Records are the escape hatch that tells msgpackr "this one's a plain object".

So pacquet-written rows came back as a top-level Map, pkgIndex.files (a property access on that Map) was undefined, and pnpm's for (const [f, fstat] of pkgIndex.files) threw files is not iterable. That surfaced in CI as ERR_PNPM_READ_FROM_STORE the first time a benchmark run picked up a shared pnpm-v11 cache with pacquet-written rows in it — every downstream job's pnpm install crashed trying to decode them.

#251 taught pacquet how to read pnpm's records; b0bed43 fixed the checkedAt encoding (float64 vs uint64-as-BigInt) and called the pnpm-reads-pacquet direction done — but the container shape was still wrong, and nothing in the test suite actually put a pacquet-written row back through msgpackr, so it slipped past.

Fix

New encode_package_files_index in msgpackr_records.rs that emits msgpackr-records bytes. Behaviour matches what msgpackr 1.11.8 itself emits for the same logical input (verified against the version pinned in pnpm v11's catalog:):

Slot allocation is shape-aware. Slot 0x40 is reserved for the top-level PackageFilesIndex. Inner slots in 0x41..=0x7f are allocated lazily, one per distinct record shape, in first-seen order. A CafsFileInfo carrying checkedAt and one without it land in separate slots; same-shape instances downstream collapse to a single bare-slot byte, which is the record-compression win records exist for.
Optional fields are omitted from the schema when None, not encoded as nil. A SideEffectsDiff { added: Some, deleted: None } decodes on the pnpm side to { added: Map }, not { added: Map, deleted: null } — the same JS shape msgpackr itself would produce for the same Rust input, so downstream checks like diff.added != null / diff.deleted != null behave identically regardless of which tool wrote the row.
CafsFileInfo.checkedAt is written as float 64 when Some (uint 64 would decode to a JS BigInt and crash pnpm's mtimeMs - (checkedAt ?? 0)), omitted from the schema when None.
manifest: Some(_) returns EncodeError::ManifestNotSupported. Pacquet doesn't populate it today; encoding serde_json::Value as records is follow-up work.

The encoder also promotes integer values in 0x40..=0x7f to uint 8 instead of positive fixints. Inside a records stream those bytes are slot-reference markers — a bare 0x7b would be parsed as a ref to slot 0x7b rather than the integer 123. msgpackr does the same promotion for the same reason.

State is a tight [Option<u8>; 2] for CafsFileInfo slots and [Option<u8>; 4] for SideEffectsDiff slots, indexed by a shape bitmask — 2 and 4 are the respective maxima, so lookup is a cache-local array index with no HashMap allocation on the hot per-file path.

StoreIndex::set now calls this encoder. StoreIndex::get goes through the existing transcoder which already handled records, so the read path needed no behaviour change — just a doc update to enumerate the three row flavours it now tolerates (pnpm-written records, current pacquet-written records, legacy pacquet plain-msgpack from rows that predate this PR).

Why not a bespoke format?

Considered writing pacquet's rows in a different shape (positional tuple, or keep plain msgpack and switch pnpm's side). Ruled out: the whole point of #244 and #251 is a shared index.db the two tools read each other's entries from. Any shape other than "what pnpm's msgpackr emits" breaks that promise.

Performance

Faster and smaller than rmp_serde::to_vec_named. Measured on a release build, 10k iterations per row-size, with a realistic CafsFileInfo shape (128-hex digest, millisecond timestamp):

Serialize time per row

files	`rmp_serde` (before)	records encoder (after)	speedup
1	304 ns	164 ns	1.9×
10	1.23 µs	838 ns	1.5×
50	3.71 µs	2.25 µs	1.6×
200	11.9 µs	8.11 µs	1.5×
1000	55.9 µs	27.7 µs	2.0×

Row size on disk

files	`rmp_serde`	records encoder	delta
1	214 B	220 B	+6 B
10	1960 B	1723 B	−12%
50	9722 B	8405 B	−14%
200	38822 B	33455 B	−14%
1000	194022 B	167055 B	−14%

The 6-byte regression at n=1 is the outer record header (d4 72 40) plus a slightly longer field-name array; from n≥4 records come out smaller because each subsequent CafsFileInfo instance collapses to a single slot byte instead of re-emitting "digest", "mode", "size", "checkedAt" (~20 bytes of field names). Why it's faster: no serde runtime indirection (no trait dispatch, no generic-over-Serializer code paths), single pre-sized Vec, direct byte writes, zero-alloc slot lookup via the fixed-size arrays.

StoreIndex::get is unchanged, so read cost is identical to before. Tarball writes call encode once per package, so this shows up at ~10–50 µs/package — drowned out by the SQLite INSERT (~100–500 µs) either way, but the payload-size shrink compounds across thousands of packages in the shared v11 store.

Test plan

cargo test -p pacquet-store-dir --lib — 37 passing, 12 new in msgpackr_records::tests:
- encode_emits_record_header_for_top_level_struct — outer 0xd4 0x72 0x40 record-def header is present.
- encode_roundtrips_single_file / encode_roundtrips_many_files_sharing_one_slot — single-file round-trip; N uniform-shape CafsFileInfos produce exactly 2 record defs (outer + one inner shape), the rest are bare slot refs.
- encode_handles_fixint_in_slot_range_safely — size: 0x7b round-trips without triggering UnknownSlot.
- encode_omits_checked_at_when_none — None checkedAt isn't written as nil; the string "checkedAt" doesn't appear anywhere in the output.
- encode_allocates_separate_slots_for_distinct_cafs_shapes — mixed CafsFileInfo shapes (some with checkedAt, some without) land in separate slots; same shape reuses.
- encode_requires_build_when_set / encode_omits_requires_build_when_none — outer-record optional-field handling.
- encode_side_effects_roundtrip — SideEffectsDiff with both fields populated.
- encode_side_effects_with_only_added_omits_deleted_field — exact shape the reviewer flagged: deleted: None must not appear as a field name in the output.
- encode_allocates_separate_slots_for_distinct_side_effects_shapes — two SideEffectsDiffs with distinct shapes land in separate slots.
- encode_rejects_manifest_some — error path for the currently-unimplemented branch.
cargo test -p pacquet-cli --test pnpm_compatibility — new pnpm_reads_pacquet_written_rows passes. pacquet installs into a store, node_modules is wiped, then pnpm install --ignore-scripts reinstalls against the same store. If the record encoding is wrong this reproduces the benchmark's ERR_PNPM_READ_FROM_STORE. Complements the existing same_file_structure / same_index_file_contents which clear the store between the pacquet and pnpm halves and therefore don't exercise this path.
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --all -- --check
cargo doc -p pacquet-store-dir --no-deps — no warnings.

Pacquet's `StoreIndex::set` used `rmp_serde::to_vec_named`, which emits plain MessagePack maps. pnpm reads `index.db` with `Packr({useRecords: true, moreTypes: true}).unpack(…)`, and in that mode every msgpack map at any nesting level decodes as a JS `Map` — records are the escape hatch that tells msgpackr "this one's a plain object". So pacquet-written rows came back as a top-level `Map`, `pkgIndex.files` (a property access) was `undefined`, and pnpm's `for (const [f, fstat] of pkgIndex.files)` threw `files is not iterable`. That surfaced in CI as `ERR_PNPM_READ_FROM_STORE` the first time a benchmark run hit a shared pnpm-v11 cache that had pacquet-written rows mixed in — every downstream job's `pnpm install` crashed trying to decode them. #251 added records *decoding* but the symmetric write path was wrong: `b0bed43` fixed the `checkedAt` encoding (float64, not uint64-as-BigInt) and called the interop done, but never actually round-tripped a pacquet-written row through msgpackr. ## Fix New `encode_package_files_index` in `msgpackr_records.rs` that emits msgpackr-records bytes directly: - `PackageFilesIndex` → record at slot 0x40. `algo` and `files` always; `requiresBuild` / `sideEffects` only when `Some`, matching msgpackr's field-omit-when-absent convention for plain JS objects. - `CafsFileInfo` → record at slot 0x41. Fixed `[digest, mode, size, checkedAt]` schema so every file in a row shares one slot (msgpackr would split shapes across slots; logical decoding is identical either way and one slot keeps the encoder simple). `checked_at` is `float 64` when `Some`, `nil` when `None` — either maps to pnpm's `?? 0` coalesce. - `SideEffectsDiff` → record at slot 0x42. Fixed `[added, deleted]`. - `manifest: Some(_)` returns `EncodeError::ManifestNotSupported` (pacquet doesn't populate it today; encoding `serde_json::Value` records is follow-up work). The encoder also promotes integer values in `0x40..=0x7f` to `uint 8` instead of positive fixints. Inside a records stream those bytes are slot-reference markers, so a bare `0x7b` would be parsed as a ref to slot 0x7b rather than the integer 123. msgpackr does the same promotion for the same reason. `StoreIndex::set` now calls this encoder; `StoreIndex::get` goes through the existing transcoder which already handled records, so the read path needed no change beyond updating the doc comment that still described the old plain-msgpack write path. ## Tests - Nine new encoder tests in `msgpackr_records::tests`, all using the encode → transcode → `rmp_serde::from_slice` round-trip. Covers record header emission, slot reuse across many `CafsFileInfo` instances, fixint-in-slot-range promotion, nil-for-missing checkedAt, optional field inclusion/omission, side effects, and the `manifest: Some` error path. - New `pnpm_reads_pacquet_written_rows` integration test (in `crates/cli/tests/pnpm_compatibility.rs`) — this is the end-to-end catch: pacquet installs into a store, node_modules is wiped, pnpm reinstalls against the same store. If the record encoding is wrong this reproduces the benchmark's `ERR_PNPM_READ_FROM_STORE`. The existing `same_file_structure` / `same_index_file_contents` tests didn't exercise this path because they always cleared the store between the pacquet and pnpm halves.

codecov · 2026-04-23T22:31:37Z

Codecov Report

❌ Patch coverage is 92.75362% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.26%. Comparing base (0ea1252) to head (8619d64).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/store-dir/src/msgpackr_records.rs	92.71%	30 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #252      +/-   ##
==========================================
+ Coverage   87.62%   88.26%   +0.63%     
==========================================
  Files          57       57              
  Lines        3621     4031     +410     
==========================================
+ Hits         3173     3558     +385     
- Misses        448      473      +25

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

This PR fixes pnpm v11 interoperability by changing how pacquet serializes index.db rows: it now writes msgpackr records (matching pnpm’s Packr({ useRecords: true, moreTypes: true })) so pnpm can read pacquet-written rows without decoding them as JS Map and crashing.

Changes:

Add a dedicated msgpackr-records encoder for PackageFilesIndex (and nested types) with slot reuse and fixint-slot-range safety.
Switch StoreIndex::set to store rows using the new records encoder and update associated docs/error types.
Add a CLI regression test that ensures pnpm install can read pacquet-written index.db rows from a shared store.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
crates/store-dir/src/store_index.rs	Routes writes through the new msgpackr-records encoder; updates docs and error typing for encoding.
crates/store-dir/src/msgpackr_records.rs	Implements `encode_package_files_index` emitting msgpackr-records bytes with slot reuse and safe integer encoding.
crates/cli/tests/pnpm_compatibility.rs	Adds an end-to-end regression test where pnpm must successfully read pacquet-written store rows.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-04-23T22:33:16Z

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.00     18.1±0.26ms   239.4 KB/sec    1.00     18.1±0.56ms   239.1 KB/sec

msgpackr_records.rs now carries both directions; update the module-level doc so `cargo doc` / IDE summaries aren't introducing it as "Decoder for…" any more. Also spells out both interop failure modes (pnpm → pacquet and pacquet → pnpm) so a future reader understands why the write side couldn't just stay on `rmp_serde`.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-04-23T22:42:27Z

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command	Mean [s]	Min [s]	Max [s]	Relative
`pacquet@HEAD`	3.378 ± 0.038	3.302	3.423	1.00
`pacquet@main`	3.387 ± 0.099	3.297	3.647	1.00 ± 0.03

BENCHMARK_REPORT.json

{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 3.3783574037,
      "stddev": 0.03816034891649137,
      "median": 3.3881653729,
      "user": 0.15992965999999997,
      "system": 0.20683638,
      "min": 3.3022906794,
      "max": 3.4234185154,
      "times": [
        3.3542556734,
        3.4234185154,
        3.4104485134,
        3.3022906794,
        3.3929876124000002,
        3.4175035734,
        3.3381832024,
        3.3914535354,
        3.3681555214,
        3.3848772104
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 3.3871463393,
      "stddev": 0.09919247718343524,
      "median": 3.3707923999,
      "user": 0.15706785999999998,
      "system": 0.20469468000000002,
      "min": 3.2966378924,
      "max": 3.6465850054,
      "times": [
        3.3260606404,
        3.3983405754000002,
        3.6465850054,
        3.3213373174,
        3.4286715284000002,
        3.3378377714,
        3.3744078624,
        3.3683759164000002,
        3.3732088834,
        3.2966378924
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

Both doc blocks still read as if pacquet had only just started writing msgpackr-records and still mostly writes plain msgpack — the opposite of the post-PR reality. Rephrase both so: - `StoreIndex::get` opens with "rows come in three flavours" and lists pnpm-written, current-pacquet-written, and legacy-pacquet plain-msgpack, in that order. - The `pnpm_reads_pacquet_written_rows` regression-test header leads with the post-PR state ("pacquet now writes msgpackr-records") and explains the old encoding as the regression to guard against, rather than describing it as the current behaviour.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Previous encoder hardcoded one slot per record type with fixed 4-field / 2-field schemas, using `nil` for absent optional fields (`checkedAt`, `added`, `deleted`). Produced bytes that *decoded* fine but had a subtly different JS shape than msgpackr's own output — `{added: Map, deleted: null}` instead of `{added: Map}`, for instance. pnpm's own readers are nil-tolerant (`?? 0` / truthy checks), so the actual on-disk effect today is zero, but third-party tools that inspect the index might not be, and we shouldn't write bytes that differ from what msgpackr would emit for the same input. New encoder follows msgpackr's actual strategy (verified against 1.11.8 in the v11 worktree): each distinct **shape** of an inner record type gets its own slot. A `CafsFileInfo { checked_at: Some }` gets slot 0x41, `CafsFileInfo { checked_at: None }` gets 0x42, `SideEffectsDiff { added: Some, deleted: None }` would then get 0x43, etc. Same-shape instances downstream collapse to a single bare-slot byte (still the compression win records exist for), while mixed-shape streams get as many slots as they need without any one slot being redefined mid-stream. State: struct EncodeState { cafs_slots: [Option<u8>; 2], // indexed by cafs_shape side_effects_slots: [Option<u8>; 4], // indexed by side_effects_shape next_slot: u8, } Fixed-size arrays instead of `HashMap` because each record type has at most 4 possible shapes — a 2-slot `Option<u8>` array is cache-local, zero-alloc, and measurably faster than `HashMap::get` on the hot per-file path. The 0x40..=0x7f slot range is 64 entries so we're nowhere near running out. `allocate_slot` panics if we did, which would mean >63 distinct record shapes in one stream; not possible for current pacquet code. Perf + size unchanged for the common uniform-shape case (every `CafsFileInfo` has `checkedAt`) — same one def, same N-1 references. Mixed shapes pay the def cost once per new shape and reuse thereafter, exactly as msgpackr does. Re-ran the bench: ~2× faster than `rmp_serde::to_vec_named` and 14% smaller output at 1000 files, same numbers as the previous iteration within measurement noise. Tests: - Renamed `encode_handles_missing_checked_at_as_nil` → `encode_omits_checked_at_when_none` and updated the assertion: no `checkedAt` field name should appear in the output at all when every instance has `None`. - `encode_allocates_separate_slots_for_distinct_cafs_shapes`: mix with/without-`checkedAt` files and assert exactly three record defs (outer + the two inner shapes). - `encode_side_effects_with_only_added_omits_deleted_field`: the specific shape Copilot flagged — `SideEffectsDiff { added: Some, deleted: None }` must not emit a `deleted` field name. - `encode_allocates_separate_slots_for_distinct_side_effects_shapes`: two `SideEffectsDiff`s with distinct shapes land in separate slots. `write_nil` is now unused (no more nil-for-absent fields) and removed.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Doc block on `encode_package_files_index` still described the old fixed-schema implementation: single slot per Rust type, `[digest, mode, size, checkedAt]` with `nil`, `[added, deleted]` with `nil`. With shape-aware slot allocation in 9296125, none of that is true anymore. Rewrite the **Slot allocation** and **Optional-field handling** sections so they match the code: `0x40` is fixed for `PackageFilesIndex`, `0x41..=0x7f` are allocated lazily one-per-shape in first-seen order, optional fields are omitted from the schema rather than padded with `nil`. Spell out the pnpm-side consequence — a `SideEffectsDiff { added: Some, deleted: None }` now decodes to `{ added: Map }` not `{ added: Map, deleted: null }`. Also fix the stale `EncodeState::slot_for` link on `PKG_FILES_INDEX_SLOT` (the helper is `allocate_slot`) and drop the intra-doc links to the private `cafs_shape` / `side_effects_shape` fns that were triggering rustdoc private-item warnings.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Two stale-docs follow-ups: - The encoder rustdoc said pacquet's output is "byte-identical" to msgpackr's. Not true — `HashMap` iteration order is nondeterministic, so `files` / `sideEffects` / `added` entries may be serialised in a different order from msgpackr's JS `Map` iteration. Soften to "wire-compatible" (same record schemas, same slot numbers, same value encodings; pnpm decodes the same JS shape) and explicitly call out that exact bytes can still differ. - The `same_index_file_contents` test header above `pnpm_compatibility.rs` still described pacquet as writing plain msgpack via `rmp_serde::to_vec_named`. Rewrite to "both tools now write msgpackr records; this test compares post-decode structs, not byte-identical rows".

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Top-of-file Strategy section said the write side "allocates a slot per Rust struct type". Not true anymore — a single type can consume multiple slots when optional-field presence varies, because each distinct record shape gets its own slot. Reword so the module intro doesn't imply a 1:1 type↔slot mapping.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

`EncodeState::allocate_slot` used `assert!` to guard against running out of the 0x41..=0x7f inner-slot range. Unreachable today (pacquet has at most 6 possible record shapes across `CafsFileInfo` and `SideEffectsDiff`; 63 slots available), but `assert!` panics in release, which is the wrong failure mode for a corrupted-or-unusual payload — a typed error gets propagated cleanly up through `StoreIndex::set` to the caller. - New `EncodeError::OutOfRecordSlots { max }` variant. - `allocate_slot` now returns `Result<u8, EncodeError>`. - Propagated `?` through `encode_cafs_file_info` and `encode_side_effects_diff` (both were previously `-> ()`). - New `allocate_slot_returns_error_past_0x7f` test exhausts the inner-slot range directly and asserts the 64th allocation yields the typed error rather than panicking.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

`encode_pkg_files_index_value` was writing the top-level record schema as `[algo, files, requiresBuild?, sideEffects?]`, with optional fields tacked onto the end. The existing `decodes_requires_build_true` fixture in this same module was captured from msgpackr 1.11.8 and shows its schema order is `[algo, requiresBuild, files]` — i.e. msgpackr picks up pnpm's TypeScript declaration order (`manifest?`, `requiresBuild?`, `algo`, `files`, `sideEffects?`) as the JS object's insertion order. Reorder emission to match: - schema: `[requiresBuild?, algo, files, sideEffects?]` (still skipping `manifest` until it's wired through). - values: same order. Pnpm decodes either order because records carry field names with the schema, so this was never a correctness bug — but wire-level diffing against a pnpm-written reference row is a routine debugging exercise, and field-order drift makes that harder than it should be. New `encode_outer_field_order_matches_msgpackr` test decodes the outer schema's fixstr field-name array out of the encoder's bytes and asserts `["requiresBuild", "algo", "files"]`, so any future schema drift trips it.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Two real catches from the reviewer, plus one preemptive fix: 1. Outer field order was wrong again. My previous "fix" put `requiresBuild` *before* `algo`, but the `decodes_requires_build_true` fixture right here in the module shows msgpackr actually emits `[algo, requiresBuild, files]` — `requiresBuild` goes **between** `algo` and `files`, not at the start. (The `decodes_side_effects` fixture corroborates the corresponding `sideEffects` position after `files`.) Fix the schema + value order to `[algo, requiresBuild?, files, sideEffects?]` and update `encode_outer_field_order_matches_msgpackr` to match. 2. `write_array_header`, `write_map_header`, `write_str` all cast a `usize` length to `u32` unconditionally when emitting the 32- bit variant. On 64-bit hosts a `usize > u32::MAX` would silently truncate to a corrupt length prefix. Swap to `u32::try_from(n).expect(...)` so overflow panics with a clear message instead. Not reachable for any realistic pacquet payload (no tarball ships with 4B+ files), but the silent corruption mode is worse than a loud crash if it's ever hit.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Mirror the tracing change from the index-open call sites: when the blocking cache-lookup task returns `Err(JoinError)`, emit a `tracing::warn!` with the error and the `cache_key` before degrading to `None`. Panic / cancellation stays diagnosable; the caller still falls through to a fresh download. Also drop the stale "msgpackr-records decoding is a follow-up" wording from the cache-lookup rationale — `StoreIndex::get` has decoded pnpm-written msgpackr-records since 0ea1252 (#251), and `transcode_to_plain_msgpack` has handled it on the read side since 90be4f6 (#252). Simplify to "an undecodable entry … falls through to the download path" so the comment reflects today's decoder.

…261) * perf(store-dir): share one read-only StoreIndex across cache lookups `DownloadTarballToStore::load_cached_cas_paths` previously opened a fresh SQLite `Connection::open_with_flags` + `PRAGMA busy_timeout` on every snapshot, so a 1352-package frozen-lockfile install paid 1352 connection-opens just to read cache hints — all serialized on tokio's blocking pool because `spawn_blocking` can't parallelize opens against one file. Open a `StoreIndex` once per install, wrap it in `Arc<Mutex<…>>` (rusqlite `Connection` is `Send` but not `Sync`), and thread the handle through `CreateVirtualStore`, `InstallWithoutLockfile`, and `DownloadTarballToStore`. Each lookup now takes the mutex just for the sub-millisecond `SELECT`, releases it before the per-file validation pass, and frees the Arc when the install finishes. Measured on the #260 repro (1352 packages, release build, macOS APFS, warm CAS store): `rm -rf node_modules && pacquet install --frozen- lockfile` drops from ~17.4 s to ~13.9 s, with CPU utilization rising from ~135 % to ~160 % as the serialized-open bottleneck clears. * address review feedback * `create_virtual_store.rs`: bind `store_index.as_ref()` to a local and use `async move` so the closure's copy of the `Option<&…>` is moved into each future, rather than borrowing from the closure's per-iteration scope. * `load_cached_cas_paths`: treat a poisoned index mutex as a cache miss rather than propagating the panic. The `SELECT` is stateless, so the prior panic can't have left the index in an inconsistent shape, and cache lookups are a best-effort hint — falling through to a fresh download is strictly safer than crashing every subsequent snapshot. * park StoreIndex open on the blocking pool Copilot flagged that `StoreIndex::shared_readonly_in` does synchronous SQLite work (`Connection::open_with_flags` + `PRAGMA busy_timeout`) and calling it directly in an `async fn` stalls the tokio reactor thread for the duration of the open. It's once per install, not a hot spot, but matches the pattern the store-index writer already uses and costs us nothing to keep consistent. Wrap both call sites (frozen-lockfile and no-lockfile install paths) in `tokio::task::spawn_blocking` and await the result before kicking off concurrent package work. Also moves `tokio` from `dev-dependencies` into `dependencies` in package-manager's `Cargo.toml` since it's now a runtime dep for the library code, not just tests. * degrade JoinError on StoreIndex open to a cache miss `spawn_blocking(...).await` returns `Result<_, JoinError>` which fires on blocking-task panic or on cancellation during runtime shutdown. Panicking on that turns a transient runtime issue into a hard crash for an install that could otherwise succeed — cache lookups just miss. Drop the `.expect(...)` and degrade to `None` via `.ok().flatten()`. `shared_readonly_in` already yields `None` for a first-time install against an empty store, and every downstream caller already handles that shape, so this is the same failure mode the code already copes with — just reached from a different source. * use async move in install_dependencies_from_registry map closure Makes the recursive no-lockfile closure's captures explicit: bind `&node_modules_path` to a local `node_modules_path_ref` (Copy) and switch to `async move` so every captured variable is moved (copied, for all the `&T` and `Option<&T>` captures) into the future instead of borrowing the closure's per-iteration stack frame. Matches the pattern already used in `create_virtual_store.rs` and in the outer no-lockfile closure — keeps all three parallel-map sites consistent. * log JoinError before degrading to cache miss on StoreIndex open The silent `.ok().flatten()` hid two failure modes behind a cache miss: a panic inside the blocking task, and task cancellation during runtime shutdown. Neither is a crash-worthy error, but a blocking-task panic during install really should be visible somewhere. Promote the two call sites to a `match` and emit a `tracing::warn!` on the `Err(JoinError)` branch before falling back to `None`. Uses the `pacquet::install` target in common with the rest of this crate's tracing. Degradation path is unchanged. * chore: cargo fmt * log JoinError in tarball cache lookup, refresh stale comment Mirror the tracing change from the index-open call sites: when the blocking cache-lookup task returns `Err(JoinError)`, emit a `tracing::warn!` with the error and the `cache_key` before degrading to `None`. Panic / cancellation stays diagnosable; the caller still falls through to a fresh download. Also drop the stale "msgpackr-records decoding is a follow-up" wording from the cache-lookup rationale — `StoreIndex::get` has decoded pnpm-written msgpackr-records since 0ea1252 (#251), and `transcode_to_plain_msgpack` has handled it on the read side since 90be4f6 (#252). Simplify to "an undecodable entry … falls through to the download path" so the comment reflects today's decoder.

zkochan requested a review from Copilot April 23, 2026 22:28

Copilot started reviewing on behalf of zkochan April 23, 2026 22:28 View session