feat(store): decode pnpm-written msgpackr-records rows in index.db by zkochan · Pull Request #251 · pnpm/pacquet

zkochan · 2026-04-23T19:48:33Z

Summary

Closes the last open item on #244: pacquet can now read index.db rows that pnpm wrote. Before this PR, StoreIndex::get would fail to decode anything pnpm produced (msgpackr records), silently treat it as a cache miss, and trigger a full re-download — so the "shared v11 store" was only shared one direction.

New crates/store-dir/src/msgpackr_records.rs — a focused transcoder that walks a msgpackr-records byte stream and emits equivalent plain MessagePack by expanding each record instance into a string-keyed map. Keeping this as a transcoder (not a fresh Deserializer) lets us reuse the existing Deserialize derive on PackageFilesIndex.
StoreIndex::get sniffs the leading d4 72 bytes and routes msgpackr-records rows through the transcoder before handing the result to rmp_serde. pacquet-written rows still take the zero-copy fast path.
The transcoder narrows integer-valued float 64 / float 32 payloads to uint 64. msgpackr emits JS Number as a double whenever the value exceeds int32 range, so millisecond timestamps like checkedAt = 1_700_000_000_000 arrive as floats even though they're semantically ints, and rmp_serde rejects floats for our size: u64 / checked_at: Option<u128> fields. Non-integer floats (π, etc.) pass through unchanged.
Drops #[ignore] on pnpm_compatibility::same_index_file_contents. End-to-end: pnpm install writes rows, pacquet reads them back, and the snapshot of decoded contents now matches what pacquet itself would have written.

Why a transcoder, not a bespoke format

I considered having pacquet use its own index format in a separate db. Tradeoffs ruled against it: the shared-store promise of #244 requires the two tools to agree on at least the format, and bypassing records would regress on encode size (records buys ~30-50% on tarballs with many files) without moving any other dial.

Wire format recap (from the msgpackr README + real bytes)

Record def: d4 72 <slot> (fixext1, ext type 0x72 'r', 1-byte payload = slot id in 0x40..=0x7f) + msgpack array of N field names + N raw msgpack values (the first instance, inlined).
Record ref (subsequent instances): single byte in 0x40..=0x7f + N raw msgpack values.
Everything else is vanilla MessagePack. Despite moreTypes: true, pnpm's payloads encode JS Map as standard fixmap / map16 / map32 — no ext wrapper.

Test plan

cargo test -p pacquet-store-dir --lib — 22 passing, 10 new in msgpackr_records::tests using fixture bytes captured from msgpackr 1.11.8 in the pnpm v11 worktree:
- decodes_one_file_fixture_from_msgpackr / decodes_two_file_fixture_with_record_reuse — round-trip + slot reuse
- decodes_requires_build_true / decodes_file_without_checked_at / decodes_side_effects — nested records and optional fields
- passes_through_plain_msgpack_unchanged — guards against the transcoder mangling pacquet-written rows
- non_integer_floats_pass_through / integer_valued_float32_is_narrowed_to_uint64 — float-narrowing behaviour
- rejects_reference_to_unknown_slot / rejects_truncated_buffer — error paths
- store_index::tests::get_decodes_msgpackr_records_rows — end-to-end through StoreIndex::get via a direct SQL insert
cargo test -p pacquet-cli --test pnpm_compatibility same_index_file_contents — previously #[ignore]d, now passes. The test has pnpm install a package from the registry-mock, then has pacquet decode pnpm's index.db rows and assert the decoded shape matches what pacquet itself writes.
cargo test -p pacquet-tarball --lib — 7 passing, confirms the read-cache path in run_without_mem_cache (which calls index.get) automatically benefits with no further wiring.
cargo build --workspace --all-targets
cargo clippy --workspace --all-targets -- -D warnings
cargo fmt --all -- --check

Closes #244.

Finishes the last open item on #244. pnpm packs every PackageFilesIndex with `new Packr({ useRecords: true, moreTypes: true })`. Records use a msgpackr-private extension (ext type 0x72, "r") that rewrites the string keys of repeated same-shape structs into 1-byte slot references, so the payload `rmp_serde` saw in a pnpm-written index.db row was effectively garbage — the cache-lookup in `StoreIndex::get` would error, pacquet would fall through to a fresh download, and the whole "shared v11 store" promise evaporated for rows pnpm wrote first. - New `crates/store-dir/src/msgpackr_records.rs`: a focused transcoder that walks a msgpackr-records byte stream and emits equivalent plain MessagePack (expanding each record instance into a string-keyed map). Keeping this as a transcoder — rather than a fresh Deserializer — lets us reuse the existing `Deserialize` derive for `PackageFilesIndex`. 270 lines of decoder + 10 unit tests using real fixture bytes captured from msgpackr 1.11.8 (via `store/index/node_modules/msgpackr` in the pnpm v11 worktree). - `StoreIndex::get` now sniffs the leading `d4 72` bytes and routes msgpackr-records rows through the transcoder before handing the result to `rmp_serde`. Pacquet-written rows (plain msgpack) still take the zero-copy fast path — no transcoding, no extra allocation. - The transcoder also narrows integer-valued `float 64`/`float 32` values to `uint 64`. Msgpackr emits JS Number as float whenever the value exceeds int32 range, so millisecond timestamps like `checkedAt = 1_700_000_000_000` arrive as a double even though they're semantically ints — and `rmp_serde` rejects floats for our integer-typed `size: u64` / `checked_at: Option<u128>` fields. Non-integer floats (π and friends) pass through unchanged. - Drops `#[ignore]` on `crates/cli/tests/pnpm_compatibility::same_index_file_contents`. End-to-end: `pnpm install` writes rows, pacquet reads them back, and the snapshot of decoded contents now matches what pacquet itself would have written. Refs #244.

codecov · 2026-04-23T19:53:24Z

Codecov Report

❌ Patch coverage is 83.18386% with 75 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.62%. Comparing base (15741af) to head (77867a3).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/store-dir/src/msgpackr_records.rs	81.72%	74 Missing ⚠️
crates/store-dir/src/store_index.rs	97.43%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #251      +/-   ##
==========================================
- Coverage   88.26%   87.62%   -0.64%     
==========================================
  Files          56       57       +1     
  Lines        3179     3621     +442     
==========================================
+ Hits         2806     3173     +367     
- Misses        373      448      +75

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

Adds compatibility for pnpm v11’s index.db rows by decoding msgpackr “records” payloads so pacquet can correctly read pnpm-written package index entries from the shared store.

Changes:

Introduces a msgpackr-records → plain MessagePack transcoder (msgpackr_records.rs) and related tests.
Updates StoreIndex::get to detect pnpm-written rows (marker d4 72) and transcode before rmp_serde deserialization.
Re-enables the pnpm compatibility test for same_index_file_contents now that pnpm-written rows decode successfully.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
crates/store-dir/src/store_index.rs	Sniffs msgpackr-records marker and routes pnpm rows through a transcode+decode path; adds end-to-end decoding test.
crates/store-dir/src/msgpackr_records.rs	New transcoder that expands msgpackr record instances into string-keyed maps and narrows integer-valued floats; includes unit tests.
crates/store-dir/src/lib.rs	Exposes the new transcoder module via re-export.
crates/cli/tests/pnpm_compatibility.rs	Drops `#[ignore]` and updates commentary since pnpm index rows are now readable.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-04-23T19:53:37Z

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.00     19.7±0.88ms   219.8 KB/sec    1.00     19.8±0.95ms   219.1 KB/sec

Symmetric follow-up to the previous commit: with just msgpackr-records decoding in place, pacquet can read what pnpm wrote, but pnpm could not reliably read what *pacquet* wrote. Verified empirically by feeding pacquet's output to `new Packr({useRecords: true, moreTypes: true})` from the pnpm v11 workspace's msgpackr 1.11.8. Two issues surfaced, both in how pacquet encoded `CafsFileInfo`: 1. `checked_at: Option<u128>` — `rmp_serde` has no native MessagePack integer encoding for `u128` and fell back to writing the value as a `bin 16` (16 raw big-endian bytes). msgpackr decoded the field as a `Uint8Array`, not a timestamp, so pnpm's integrity check would have silently misbehaved. Narrow the type to `u64` — millisecond timestamps fit in `u64` until year ~584M, and `u64` maps cleanly to MessagePack's integer encodings. 2. Even after the u128 → u64 fix, writing as `uint 64` (`cf`) made msgpackr return the value as a JS `BigInt`. pnpm's CAFS integrity check does `mtimeMs - (checkedAt ?? 0)`; mixing `Number` and `BigInt` throws `TypeError: Cannot mix BigInt and other types` at runtime. Pnpm itself sidesteps this because JS Number is a double — msgpackr packs large integer-valued Numbers as `float 64` (`cb`). Match that: add a `serialize_with` that emits `checked_at` as `float 64`. Verified: msgpackr now returns the value as `typeof === 'number'`, bit-identical byte encoding to what pnpm itself produces. Supporting read-side changes: - Add `deserialize_with` on `checked_at` that accepts either integer or integer-valued float. Required because plain msgpack reads bypass the records transcoder (the transcoder would reinterpret legitimate positive fixints in 0x40..=0x7f as record slot references, which is correct under records mode but wrong for pacquet-written rows). - Restore the sniff on `decode_index_value` — only invoke the transcoder when the bytes begin with the msgpackr records marker `d4 72`. An earlier version of this commit tried "always transcode" for simplicity; it broke every pacquet-written row whose `size` or `mode` happened to land in the 0x40..=0x7f byte range. Test updates: - `round_trips_plain_msgpack_through_transcoder` replaces `passes_through_plain_msgpack_unchanged`: the byte-for-byte guarantee no longer holds (the transcoder now narrows integer-valued floats), but the decoded-struct round-trip still does. - Add `plain_msgpack_without_floats_passes_through_unchanged` to pin down the narrower pass-through guarantee for non-float inputs. No changes to `tarball/src/lib.rs` beyond the `as_millis() as u64` cast the u128 → u64 narrowing now requires.

Copilot review round one on #251. - Track whether the transcoder has seen a record definition yet, and only treat `0x40..=0x7f` as slot references once records mode is on. Until then those bytes are positive fixints (64..=127), the same as vanilla MessagePack. Before this change the transcoder couldn't be run on plain-msgpack input without corrupting it — pacquet rows with `size: 123` or similar would be mis-read as a reference to an undefined slot 0x7b. The old code sidestepped the problem with a sniff in `decode_index_value` (transcode only when bytes start with `d4 72`); records-mode tracking makes the transcoder safe for both encodings, so `decode_index_value` drops the sniff and always transcodes (which is necessary anyway for the float-to-uint narrowing on pacquet's own `checkedAt` field). - Replace the `v <= u64::MAX as f64` narrowing bound with a strict `v < 2f64.powi(64)`. `u64::MAX as f64` rounds *up* to 2^64 because 2^64 - 1 isn't exactly representable in `f64`, so the old bound accepted a literal 2^64 and silently saturated it to `u64::MAX` on cast. - Add `plain_positive_fixint_in_slot_range_passes_through` and `float64_equal_to_2_pow_64_passes_through` as regression guards for both fixes. - Drop the now-redundant custom `deserialize_with` on `CafsFileInfo::checked_at` — since all reads transcode first and the transcoder narrows integer-valued floats back to `uint 64`, the value always arrives as an integer for `rmp_serde` to deserialize. Refs #244.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…cro-tweaks Copilot review round two on #251. - Split `DecodeError::BadRecordDef` into three precise variants — `ExpectedArrayHeader`, `ExpectedStringHeader`, and `InvalidFieldNameUtf8`. The old shared variant carried a meaningless `count: 0 | 1` indicator that made messages like "Expected record-definition header (0 field-name strings in a msgpack array)" actively confusing. - Add `DecodeError::SlotOutOfRange` and reject any record definition whose slot byte isn't in `0x40..=0x7f`. msgpackr never emits one outside that range; accepting them before meant the slot was registered but unreachable, so the first reference-attempt elsewhere in the stream would have surfaced as an unrelated `UnknownSlot` error. - Drop three needless `to_vec()` allocations on the array16, array32, map16, and map32 header-copy paths — use `w.extend_from_slice(r.read_bytes(n)?)` directly. - Switch the `as_millis() as u64` cast in tarball extraction to `u64::try_from(...).ok()` so a hypothetical u128-overflowing timestamp drops to `None` instead of silently wrapping. - Fix a stale doc comment that still referred to an `Option<u128>` deserializer (now `Option<u64>`). - Rewrite the `StoreIndex::get` doc so it matches the current always-transcode implementation — no more "sniff the leading two bytes" language now that records-mode tracking makes unconditional transcoding safe. Refs #244.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Respond to two Copilot review notes: - Slot-reference decoding used to `.clone()` the full Vec<String> of field names per record instance. A 200-file row allocated 200 Vec<String>s plus one String per field per clone. Store the slot schema under `Rc<[String]>` so the reference path bumps a refcount instead. Single-thread transcoder, so `Rc` over `Arc`. - `StoreIndex::get` had a doc comment claiming a "sniff the leading two bytes" fast path that didn't exist — the implementation always transcodes. Explain why (pacquet writes `checkedAt` as float64 on purpose for msgpackr/pnpm interop, so every real row needs the transcoder to narrow it back), so the always-transcode behaviour reads as deliberate, not a missing optimisation.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-04-23T21:57:23Z

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command	Mean [s]	Min [s]	Max [s]	Relative
`pacquet@HEAD`	3.352 ± 0.036	3.295	3.400	1.00
`pacquet@main`	3.362 ± 0.027	3.331	3.408	1.00 ± 0.01

BENCHMARK_REPORT.json

{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 3.35226689518,
      "stddev": 0.03633311574723908,
      "median": 3.3458843170800003,
      "user": 0.15669046,
      "system": 0.20156964,
      "min": 3.2949777200800003,
      "max": 3.40018571508,
      "times": [
        3.39682071708,
        3.3266309130800003,
        3.32452622908,
        3.39818581408,
        3.33135331708,
        3.40018571508,
        3.35821989208,
        3.33520803508,
        3.2949777200800003,
        3.3565605990800003
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 3.36213749388,
      "stddev": 0.02738371711332662,
      "median": 3.36748964658,
      "user": 0.16164916,
      "system": 0.19736344,
      "min": 3.33132226808,
      "max": 3.4075607870800004,
      "times": [
        3.3671976630800002,
        3.3331965880800003,
        3.33762074808,
        3.3704619850800004,
        3.33132226808,
        3.3748614100800003,
        3.36778163008,
        3.4075607870800004,
        3.3341940930800003,
        3.39717776608
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

Two Copilot nits: - Diagnostic codes said `pacquet_store_dir::msgpackr::…` even though the module is `msgpackr_records`. Match the `store_index::…` shape used elsewhere in the crate so codes map 1:1 to module paths. - `UnexpectedEof` said "msgpackr buffer", but the transcoder also runs on plain MessagePack (see the `get_decodes_…` path through `StoreIndex::get`). Reword to "MessagePack buffer" so the error reads correctly in either case.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…252) ## Summary Follow-up to #251. Pacquet's `StoreIndex::set` was writing `index.db` rows via `rmp_serde::to_vec_named` — plain MessagePack maps. pnpm reads the same file with `Packr({useRecords: true, moreTypes: true}).unpack(…)`, and in that mode **every msgpack map at any nesting level decodes as a JS `Map`**. Records are the escape hatch that tells msgpackr "this one's a plain object". So pacquet-written rows came back as a top-level `Map`, `pkgIndex.files` (a property access on that `Map`) was `undefined`, and pnpm's `for (const [f, fstat] of pkgIndex.files)` threw `files is not iterable`. That surfaced in CI as `ERR_PNPM_READ_FROM_STORE` the first time a benchmark run picked up a shared pnpm-v11 cache with pacquet-written rows in it — every downstream job's `pnpm install` crashed trying to decode them. #251 taught pacquet how to **read** pnpm's records; b0bed43 fixed the `checkedAt` encoding (float64 vs uint64-as-BigInt) and called the pnpm-reads-pacquet direction done — but the container shape was still wrong, and nothing in the test suite actually put a pacquet-written row back through msgpackr, so it slipped past. ## Fix New `encode_package_files_index` in `msgpackr_records.rs` that emits msgpackr-records bytes. Behaviour matches what msgpackr 1.11.8 itself emits for the same logical input (verified against the version pinned in pnpm v11's `catalog:`): - **Slot allocation is shape-aware.** Slot `0x40` is reserved for the top-level `PackageFilesIndex`. Inner slots in `0x41..=0x7f` are allocated lazily, one per distinct record shape, in first-seen order. A `CafsFileInfo` carrying `checkedAt` and one without it land in separate slots; same-shape instances downstream collapse to a single bare-slot byte, which is the record-compression win records exist for. - **Optional fields are omitted from the schema when `None`**, not encoded as `nil`. A `SideEffectsDiff { added: Some, deleted: None }` decodes on the pnpm side to `{ added: Map }`, not `{ added: Map, deleted: null }` — the same JS shape msgpackr itself would produce for the same Rust input, so downstream checks like `diff.added != null` / `diff.deleted != null` behave identically regardless of which tool wrote the row. - **`CafsFileInfo.checkedAt`** is written as `float 64` when `Some` (uint 64 would decode to a JS `BigInt` and crash pnpm's `mtimeMs - (checkedAt ?? 0)`), omitted from the schema when `None`. - **`manifest: Some(_)` returns `EncodeError::ManifestNotSupported`.** Pacquet doesn't populate it today; encoding `serde_json::Value` as records is follow-up work. The encoder also **promotes integer values in `0x40..=0x7f` to `uint 8`** instead of positive fixints. Inside a records stream those bytes are slot-reference markers — a bare `0x7b` would be parsed as a ref to slot `0x7b` rather than the integer `123`. msgpackr does the same promotion for the same reason. State is a tight `[Option<u8>; 2]` for `CafsFileInfo` slots and `[Option<u8>; 4]` for `SideEffectsDiff` slots, indexed by a shape bitmask — 2 and 4 are the respective maxima, so lookup is a cache-local array index with no `HashMap` allocation on the hot per-file path. `StoreIndex::set` now calls this encoder. `StoreIndex::get` goes through the existing transcoder which already handled records, so the read path needed no behaviour change — just a doc update to enumerate the three row flavours it now tolerates (pnpm-written records, current pacquet-written records, legacy pacquet plain-msgpack from rows that predate this PR). ## Why not a bespoke format? Considered writing pacquet's rows in a different shape (positional tuple, or keep plain msgpack and switch pnpm's side). Ruled out: the whole point of #244 and #251 is a shared `index.db` the two tools read each other's entries from. Any shape other than "what pnpm's msgpackr emits" breaks that promise. ## Performance Faster **and** smaller than `rmp_serde::to_vec_named`. Measured on a release build, 10k iterations per row-size, with a realistic `CafsFileInfo` shape (128-hex digest, millisecond timestamp): **Serialize time per row** | files | `rmp_serde` (before) | records encoder (after) | speedup | |---|---:|---:|---:| | 1 | 304 ns | 164 ns | **1.9×** | | 10 | 1.23 µs | 838 ns | **1.5×** | | 50 | 3.71 µs | 2.25 µs | **1.6×** | | 200 | 11.9 µs | 8.11 µs | **1.5×** | | 1000 | 55.9 µs | 27.7 µs | **2.0×** | **Row size on disk** | files | `rmp_serde` | records encoder | delta | |---|---:|---:|---:| | 1 | 214 B | 220 B | +6 B | | 10 | 1960 B | 1723 B | **−12%** | | 50 | 9722 B | 8405 B | **−14%** | | 200 | 38822 B | 33455 B | **−14%** | | 1000 | 194022 B | 167055 B | **−14%** | The 6-byte regression at `n=1` is the outer record header (`d4 72 40`) plus a slightly longer field-name array; from `n≥4` records come out smaller because each subsequent `CafsFileInfo` instance collapses to a single slot byte instead of re-emitting `"digest"`, `"mode"`, `"size"`, `"checkedAt"` (~20 bytes of field names). Why it's faster: no serde runtime indirection (no trait dispatch, no generic-over-Serializer code paths), single pre-sized `Vec`, direct byte writes, zero-alloc slot lookup via the fixed-size arrays. `StoreIndex::get` is unchanged, so read cost is identical to before. Tarball writes call `encode` once per package, so this shows up at ~10–50 µs/package — drowned out by the SQLite `INSERT` (~100–500 µs) either way, but the payload-size shrink compounds across thousands of packages in the shared v11 store. ## Test plan - [x] `cargo test -p pacquet-store-dir --lib` — **37 passing, 12 new** in `msgpackr_records::tests`: - `encode_emits_record_header_for_top_level_struct` — outer `0xd4 0x72 0x40` record-def header is present. - `encode_roundtrips_single_file` / `encode_roundtrips_many_files_sharing_one_slot` — single-file round-trip; N uniform-shape `CafsFileInfo`s produce exactly 2 record defs (outer + one inner shape), the rest are bare slot refs. - `encode_handles_fixint_in_slot_range_safely` — `size: 0x7b` round-trips without triggering `UnknownSlot`. - `encode_omits_checked_at_when_none` — `None` `checkedAt` isn't written as `nil`; the string `"checkedAt"` doesn't appear anywhere in the output. - `encode_allocates_separate_slots_for_distinct_cafs_shapes` — mixed `CafsFileInfo` shapes (some with `checkedAt`, some without) land in separate slots; same shape reuses. - `encode_requires_build_when_set` / `encode_omits_requires_build_when_none` — outer-record optional-field handling. - `encode_side_effects_roundtrip` — `SideEffectsDiff` with both fields populated. - `encode_side_effects_with_only_added_omits_deleted_field` — exact shape the reviewer flagged: `deleted: None` must not appear as a field name in the output. - `encode_allocates_separate_slots_for_distinct_side_effects_shapes` — two `SideEffectsDiff`s with distinct shapes land in separate slots. - `encode_rejects_manifest_some` — error path for the currently-unimplemented branch. - [x] `cargo test -p pacquet-cli --test pnpm_compatibility` — new `pnpm_reads_pacquet_written_rows` passes. pacquet installs into a store, `node_modules` is wiped, then `pnpm install --ignore-scripts` reinstalls against the same store. If the record encoding is wrong this reproduces the benchmark's `ERR_PNPM_READ_FROM_STORE`. Complements the existing `same_file_structure` / `same_index_file_contents` which clear the store between the pacquet and pnpm halves and therefore don't exercise this path. - [x] `cargo clippy --workspace --all-targets -- -D warnings` - [x] `cargo fmt --all -- --check` - [x] `cargo doc -p pacquet-store-dir --no-deps` — no warnings.

Mirror the tracing change from the index-open call sites: when the blocking cache-lookup task returns `Err(JoinError)`, emit a `tracing::warn!` with the error and the `cache_key` before degrading to `None`. Panic / cancellation stays diagnosable; the caller still falls through to a fresh download. Also drop the stale "msgpackr-records decoding is a follow-up" wording from the cache-lookup rationale — `StoreIndex::get` has decoded pnpm-written msgpackr-records since 0ea1252 (#251), and `transcode_to_plain_msgpack` has handled it on the read side since 90be4f6 (#252). Simplify to "an undecodable entry … falls through to the download path" so the comment reflects today's decoder.

…261) * perf(store-dir): share one read-only StoreIndex across cache lookups `DownloadTarballToStore::load_cached_cas_paths` previously opened a fresh SQLite `Connection::open_with_flags` + `PRAGMA busy_timeout` on every snapshot, so a 1352-package frozen-lockfile install paid 1352 connection-opens just to read cache hints — all serialized on tokio's blocking pool because `spawn_blocking` can't parallelize opens against one file. Open a `StoreIndex` once per install, wrap it in `Arc<Mutex<…>>` (rusqlite `Connection` is `Send` but not `Sync`), and thread the handle through `CreateVirtualStore`, `InstallWithoutLockfile`, and `DownloadTarballToStore`. Each lookup now takes the mutex just for the sub-millisecond `SELECT`, releases it before the per-file validation pass, and frees the Arc when the install finishes. Measured on the #260 repro (1352 packages, release build, macOS APFS, warm CAS store): `rm -rf node_modules && pacquet install --frozen- lockfile` drops from ~17.4 s to ~13.9 s, with CPU utilization rising from ~135 % to ~160 % as the serialized-open bottleneck clears. * address review feedback * `create_virtual_store.rs`: bind `store_index.as_ref()` to a local and use `async move` so the closure's copy of the `Option<&…>` is moved into each future, rather than borrowing from the closure's per-iteration scope. * `load_cached_cas_paths`: treat a poisoned index mutex as a cache miss rather than propagating the panic. The `SELECT` is stateless, so the prior panic can't have left the index in an inconsistent shape, and cache lookups are a best-effort hint — falling through to a fresh download is strictly safer than crashing every subsequent snapshot. * park StoreIndex open on the blocking pool Copilot flagged that `StoreIndex::shared_readonly_in` does synchronous SQLite work (`Connection::open_with_flags` + `PRAGMA busy_timeout`) and calling it directly in an `async fn` stalls the tokio reactor thread for the duration of the open. It's once per install, not a hot spot, but matches the pattern the store-index writer already uses and costs us nothing to keep consistent. Wrap both call sites (frozen-lockfile and no-lockfile install paths) in `tokio::task::spawn_blocking` and await the result before kicking off concurrent package work. Also moves `tokio` from `dev-dependencies` into `dependencies` in package-manager's `Cargo.toml` since it's now a runtime dep for the library code, not just tests. * degrade JoinError on StoreIndex open to a cache miss `spawn_blocking(...).await` returns `Result<_, JoinError>` which fires on blocking-task panic or on cancellation during runtime shutdown. Panicking on that turns a transient runtime issue into a hard crash for an install that could otherwise succeed — cache lookups just miss. Drop the `.expect(...)` and degrade to `None` via `.ok().flatten()`. `shared_readonly_in` already yields `None` for a first-time install against an empty store, and every downstream caller already handles that shape, so this is the same failure mode the code already copes with — just reached from a different source. * use async move in install_dependencies_from_registry map closure Makes the recursive no-lockfile closure's captures explicit: bind `&node_modules_path` to a local `node_modules_path_ref` (Copy) and switch to `async move` so every captured variable is moved (copied, for all the `&T` and `Option<&T>` captures) into the future instead of borrowing the closure's per-iteration stack frame. Matches the pattern already used in `create_virtual_store.rs` and in the outer no-lockfile closure — keeps all three parallel-map sites consistent. * log JoinError before degrading to cache miss on StoreIndex open The silent `.ok().flatten()` hid two failure modes behind a cache miss: a panic inside the blocking task, and task cancellation during runtime shutdown. Neither is a crash-worthy error, but a blocking-task panic during install really should be visible somewhere. Promote the two call sites to a `match` and emit a `tracing::warn!` on the `Err(JoinError)` branch before falling back to `None`. Uses the `pacquet::install` target in common with the rest of this crate's tracing. Degradation path is unchanged. * chore: cargo fmt * log JoinError in tarball cache lookup, refresh stale comment Mirror the tracing change from the index-open call sites: when the blocking cache-lookup task returns `Err(JoinError)`, emit a `tracing::warn!` with the error and the `cache_key` before degrading to `None`. Panic / cancellation stays diagnosable; the caller still falls through to a fresh download. Also drop the stale "msgpackr-records decoding is a follow-up" wording from the cache-lookup rationale — `StoreIndex::get` has decoded pnpm-written msgpackr-records since 0ea1252 (#251), and `transcode_to_plain_msgpack` has handled it on the read side since 90be4f6 (#252). Simplify to "an undecodable entry … falls through to the download path" so the comment reflects today's decoder.

zkochan marked this pull request as ready for review April 23, 2026 19:49

zkochan requested a review from Copilot April 23, 2026 19:49

Copilot started reviewing on behalf of zkochan April 23, 2026 19:49 View session