feat(store): decode pnpm-written msgpackr-records rows in index.db#251
Conversation
Finishes the last open item on #244. pnpm packs every PackageFilesIndex with `new Packr({ useRecords: true, moreTypes: true })`. Records use a msgpackr-private extension (ext type 0x72, "r") that rewrites the string keys of repeated same-shape structs into 1-byte slot references, so the payload `rmp_serde` saw in a pnpm-written index.db row was effectively garbage — the cache-lookup in `StoreIndex::get` would error, pacquet would fall through to a fresh download, and the whole "shared v11 store" promise evaporated for rows pnpm wrote first. - New `crates/store-dir/src/msgpackr_records.rs`: a focused transcoder that walks a msgpackr-records byte stream and emits equivalent plain MessagePack (expanding each record instance into a string-keyed map). Keeping this as a transcoder — rather than a fresh Deserializer — lets us reuse the existing `Deserialize` derive for `PackageFilesIndex`. 270 lines of decoder + 10 unit tests using real fixture bytes captured from msgpackr 1.11.8 (via `store/index/node_modules/msgpackr` in the pnpm v11 worktree). - `StoreIndex::get` now sniffs the leading `d4 72` bytes and routes msgpackr-records rows through the transcoder before handing the result to `rmp_serde`. Pacquet-written rows (plain msgpack) still take the zero-copy fast path — no transcoding, no extra allocation. - The transcoder also narrows integer-valued `float 64`/`float 32` values to `uint 64`. Msgpackr emits JS Number as float whenever the value exceeds int32 range, so millisecond timestamps like `checkedAt = 1_700_000_000_000` arrive as a double even though they're semantically ints — and `rmp_serde` rejects floats for our integer-typed `size: u64` / `checked_at: Option<u128>` fields. Non-integer floats (π and friends) pass through unchanged. - Drops `#[ignore]` on `crates/cli/tests/pnpm_compatibility::same_index_file_contents`. End-to-end: `pnpm install` writes rows, pacquet reads them back, and the snapshot of decoded contents now matches what pacquet itself would have written. Refs #244.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #251 +/- ##
==========================================
- Coverage 88.26% 87.62% -0.64%
==========================================
Files 56 57 +1
Lines 3179 3621 +442
==========================================
+ Hits 2806 3173 +367
- Misses 373 448 +75 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds compatibility for pnpm v11’s index.db rows by decoding msgpackr “records” payloads so pacquet can correctly read pnpm-written package index entries from the shared store.
Changes:
- Introduces a msgpackr-records → plain MessagePack transcoder (
msgpackr_records.rs) and related tests. - Updates
StoreIndex::getto detect pnpm-written rows (markerd4 72) and transcode beforermp_serdedeserialization. - Re-enables the pnpm compatibility test for
same_index_file_contentsnow that pnpm-written rows decode successfully.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/store-dir/src/store_index.rs | Sniffs msgpackr-records marker and routes pnpm rows through a transcode+decode path; adds end-to-end decoding test. |
| crates/store-dir/src/msgpackr_records.rs | New transcoder that expands msgpackr record instances into string-keyed maps and narrows integer-valued floats; includes unit tests. |
| crates/store-dir/src/lib.rs | Exposes the new transcoder module via re-export. |
| crates/cli/tests/pnpm_compatibility.rs | Drops #[ignore] and updates commentary since pnpm index rows are now readable. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Micro-Benchmark ResultsLinux |
Symmetric follow-up to the previous commit: with just msgpackr-records
decoding in place, pacquet can read what pnpm wrote, but pnpm could
not reliably read what *pacquet* wrote. Verified empirically by
feeding pacquet's output to `new Packr({useRecords: true, moreTypes:
true})` from the pnpm v11 workspace's msgpackr 1.11.8.
Two issues surfaced, both in how pacquet encoded `CafsFileInfo`:
1. `checked_at: Option<u128>` — `rmp_serde` has no native MessagePack
integer encoding for `u128` and fell back to writing the value as a
`bin 16` (16 raw big-endian bytes). msgpackr decoded the field as
a `Uint8Array`, not a timestamp, so pnpm's integrity check would
have silently misbehaved. Narrow the type to `u64` — millisecond
timestamps fit in `u64` until year ~584M, and `u64` maps cleanly
to MessagePack's integer encodings.
2. Even after the u128 → u64 fix, writing as `uint 64` (`cf`) made
msgpackr return the value as a JS `BigInt`. pnpm's CAFS integrity
check does `mtimeMs - (checkedAt ?? 0)`; mixing `Number` and
`BigInt` throws `TypeError: Cannot mix BigInt and other types` at
runtime. Pnpm itself sidesteps this because JS Number is a double
— msgpackr packs large integer-valued Numbers as `float 64` (`cb`).
Match that: add a `serialize_with` that emits `checked_at` as
`float 64`. Verified: msgpackr now returns the value as
`typeof === 'number'`, bit-identical byte encoding to what pnpm
itself produces.
Supporting read-side changes:
- Add `deserialize_with` on `checked_at` that accepts either integer
or integer-valued float. Required because plain msgpack reads bypass
the records transcoder (the transcoder would reinterpret legitimate
positive fixints in 0x40..=0x7f as record slot references, which is
correct under records mode but wrong for pacquet-written rows).
- Restore the sniff on `decode_index_value` — only invoke the
transcoder when the bytes begin with the msgpackr records marker
`d4 72`. An earlier version of this commit tried "always transcode"
for simplicity; it broke every pacquet-written row whose `size` or
`mode` happened to land in the 0x40..=0x7f byte range.
Test updates:
- `round_trips_plain_msgpack_through_transcoder` replaces
`passes_through_plain_msgpack_unchanged`: the byte-for-byte
guarantee no longer holds (the transcoder now narrows integer-valued
floats), but the decoded-struct round-trip still does.
- Add `plain_msgpack_without_floats_passes_through_unchanged` to pin
down the narrower pass-through guarantee for non-float inputs.
No changes to `tarball/src/lib.rs` beyond the `as_millis() as u64`
cast the u128 → u64 narrowing now requires.
Copilot review round one on #251. - Track whether the transcoder has seen a record definition yet, and only treat `0x40..=0x7f` as slot references once records mode is on. Until then those bytes are positive fixints (64..=127), the same as vanilla MessagePack. Before this change the transcoder couldn't be run on plain-msgpack input without corrupting it — pacquet rows with `size: 123` or similar would be mis-read as a reference to an undefined slot 0x7b. The old code sidestepped the problem with a sniff in `decode_index_value` (transcode only when bytes start with `d4 72`); records-mode tracking makes the transcoder safe for both encodings, so `decode_index_value` drops the sniff and always transcodes (which is necessary anyway for the float-to-uint narrowing on pacquet's own `checkedAt` field). - Replace the `v <= u64::MAX as f64` narrowing bound with a strict `v < 2f64.powi(64)`. `u64::MAX as f64` rounds *up* to 2^64 because 2^64 - 1 isn't exactly representable in `f64`, so the old bound accepted a literal 2^64 and silently saturated it to `u64::MAX` on cast. - Add `plain_positive_fixint_in_slot_range_passes_through` and `float64_equal_to_2_pow_64_passes_through` as regression guards for both fixes. - Drop the now-redundant custom `deserialize_with` on `CafsFileInfo::checked_at` — since all reads transcode first and the transcoder narrows integer-valued floats back to `uint 64`, the value always arrives as an integer for `rmp_serde` to deserialize. Refs #244.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…cro-tweaks Copilot review round two on #251. - Split `DecodeError::BadRecordDef` into three precise variants — `ExpectedArrayHeader`, `ExpectedStringHeader`, and `InvalidFieldNameUtf8`. The old shared variant carried a meaningless `count: 0 | 1` indicator that made messages like "Expected record-definition header (0 field-name strings in a msgpack array)" actively confusing. - Add `DecodeError::SlotOutOfRange` and reject any record definition whose slot byte isn't in `0x40..=0x7f`. msgpackr never emits one outside that range; accepting them before meant the slot was registered but unreachable, so the first reference-attempt elsewhere in the stream would have surfaced as an unrelated `UnknownSlot` error. - Drop three needless `to_vec()` allocations on the array16, array32, map16, and map32 header-copy paths — use `w.extend_from_slice(r.read_bytes(n)?)` directly. - Switch the `as_millis() as u64` cast in tarball extraction to `u64::try_from(...).ok()` so a hypothetical u128-overflowing timestamp drops to `None` instead of silently wrapping. - Fix a stale doc comment that still referred to an `Option<u128>` deserializer (now `Option<u64>`). - Rewrite the `StoreIndex::get` doc so it matches the current always-transcode implementation — no more "sniff the leading two bytes" language now that records-mode tracking makes unconditional transcoding safe. Refs #244.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Respond to two Copilot review notes: - Slot-reference decoding used to `.clone()` the full Vec<String> of field names per record instance. A 200-file row allocated 200 Vec<String>s plus one String per field per clone. Store the slot schema under `Rc<[String]>` so the reference path bumps a refcount instead. Single-thread transcoder, so `Rc` over `Arc`. - `StoreIndex::get` had a doc comment claiming a "sniff the leading two bytes" fast path that didn't exist — the implementation always transcodes. Explain why (pacquet writes `checkedAt` as float64 on purpose for msgpackr/pnpm interop, so every real row needs the transcoder to narrow it back), so the always-transcode behaviour reads as deliberate, not a missing optimisation.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Integrated-Benchmark Report (Linux)Scenario: Frozen Lockfile
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 3.35226689518,
"stddev": 0.03633311574723908,
"median": 3.3458843170800003,
"user": 0.15669046,
"system": 0.20156964,
"min": 3.2949777200800003,
"max": 3.40018571508,
"times": [
3.39682071708,
3.3266309130800003,
3.32452622908,
3.39818581408,
3.33135331708,
3.40018571508,
3.35821989208,
3.33520803508,
3.2949777200800003,
3.3565605990800003
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
},
{
"command": "pacquet@main",
"mean": 3.36213749388,
"stddev": 0.02738371711332662,
"median": 3.36748964658,
"user": 0.16164916,
"system": 0.19736344,
"min": 3.33132226808,
"max": 3.4075607870800004,
"times": [
3.3671976630800002,
3.3331965880800003,
3.33762074808,
3.3704619850800004,
3.33132226808,
3.3748614100800003,
3.36778163008,
3.4075607870800004,
3.3341940930800003,
3.39717776608
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
}
]
} |
Two Copilot nits: - Diagnostic codes said `pacquet_store_dir::msgpackr::…` even though the module is `msgpackr_records`. Match the `store_index::…` shape used elsewhere in the crate so codes map 1:1 to module paths. - `UnexpectedEof` said "msgpackr buffer", but the transcoder also runs on plain MessagePack (see the `get_decodes_…` path through `StoreIndex::get`). Reword to "MessagePack buffer" so the error reads correctly in either case.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…252) ## Summary Follow-up to #251. Pacquet's `StoreIndex::set` was writing `index.db` rows via `rmp_serde::to_vec_named` — plain MessagePack maps. pnpm reads the same file with `Packr({useRecords: true, moreTypes: true}).unpack(…)`, and in that mode **every msgpack map at any nesting level decodes as a JS `Map`**. Records are the escape hatch that tells msgpackr "this one's a plain object". So pacquet-written rows came back as a top-level `Map`, `pkgIndex.files` (a property access on that `Map`) was `undefined`, and pnpm's `for (const [f, fstat] of pkgIndex.files)` threw `files is not iterable`. That surfaced in CI as `ERR_PNPM_READ_FROM_STORE` the first time a benchmark run picked up a shared pnpm-v11 cache with pacquet-written rows in it — every downstream job's `pnpm install` crashed trying to decode them. #251 taught pacquet how to **read** pnpm's records; b0bed43 fixed the `checkedAt` encoding (float64 vs uint64-as-BigInt) and called the pnpm-reads-pacquet direction done — but the container shape was still wrong, and nothing in the test suite actually put a pacquet-written row back through msgpackr, so it slipped past. ## Fix New `encode_package_files_index` in `msgpackr_records.rs` that emits msgpackr-records bytes. Behaviour matches what msgpackr 1.11.8 itself emits for the same logical input (verified against the version pinned in pnpm v11's `catalog:`): - **Slot allocation is shape-aware.** Slot `0x40` is reserved for the top-level `PackageFilesIndex`. Inner slots in `0x41..=0x7f` are allocated lazily, one per distinct record shape, in first-seen order. A `CafsFileInfo` carrying `checkedAt` and one without it land in separate slots; same-shape instances downstream collapse to a single bare-slot byte, which is the record-compression win records exist for. - **Optional fields are omitted from the schema when `None`**, not encoded as `nil`. A `SideEffectsDiff { added: Some, deleted: None }` decodes on the pnpm side to `{ added: Map }`, not `{ added: Map, deleted: null }` — the same JS shape msgpackr itself would produce for the same Rust input, so downstream checks like `diff.added != null` / `diff.deleted != null` behave identically regardless of which tool wrote the row. - **`CafsFileInfo.checkedAt`** is written as `float 64` when `Some` (uint 64 would decode to a JS `BigInt` and crash pnpm's `mtimeMs - (checkedAt ?? 0)`), omitted from the schema when `None`. - **`manifest: Some(_)` returns `EncodeError::ManifestNotSupported`.** Pacquet doesn't populate it today; encoding `serde_json::Value` as records is follow-up work. The encoder also **promotes integer values in `0x40..=0x7f` to `uint 8`** instead of positive fixints. Inside a records stream those bytes are slot-reference markers — a bare `0x7b` would be parsed as a ref to slot `0x7b` rather than the integer `123`. msgpackr does the same promotion for the same reason. State is a tight `[Option<u8>; 2]` for `CafsFileInfo` slots and `[Option<u8>; 4]` for `SideEffectsDiff` slots, indexed by a shape bitmask — 2 and 4 are the respective maxima, so lookup is a cache-local array index with no `HashMap` allocation on the hot per-file path. `StoreIndex::set` now calls this encoder. `StoreIndex::get` goes through the existing transcoder which already handled records, so the read path needed no behaviour change — just a doc update to enumerate the three row flavours it now tolerates (pnpm-written records, current pacquet-written records, legacy pacquet plain-msgpack from rows that predate this PR). ## Why not a bespoke format? Considered writing pacquet's rows in a different shape (positional tuple, or keep plain msgpack and switch pnpm's side). Ruled out: the whole point of #244 and #251 is a shared `index.db` the two tools read each other's entries from. Any shape other than "what pnpm's msgpackr emits" breaks that promise. ## Performance Faster **and** smaller than `rmp_serde::to_vec_named`. Measured on a release build, 10k iterations per row-size, with a realistic `CafsFileInfo` shape (128-hex digest, millisecond timestamp): **Serialize time per row** | files | `rmp_serde` (before) | records encoder (after) | speedup | |---|---:|---:|---:| | 1 | 304 ns | 164 ns | **1.9×** | | 10 | 1.23 µs | 838 ns | **1.5×** | | 50 | 3.71 µs | 2.25 µs | **1.6×** | | 200 | 11.9 µs | 8.11 µs | **1.5×** | | 1000 | 55.9 µs | 27.7 µs | **2.0×** | **Row size on disk** | files | `rmp_serde` | records encoder | delta | |---|---:|---:|---:| | 1 | 214 B | 220 B | +6 B | | 10 | 1960 B | 1723 B | **−12%** | | 50 | 9722 B | 8405 B | **−14%** | | 200 | 38822 B | 33455 B | **−14%** | | 1000 | 194022 B | 167055 B | **−14%** | The 6-byte regression at `n=1` is the outer record header (`d4 72 40`) plus a slightly longer field-name array; from `n≥4` records come out smaller because each subsequent `CafsFileInfo` instance collapses to a single slot byte instead of re-emitting `"digest"`, `"mode"`, `"size"`, `"checkedAt"` (~20 bytes of field names). Why it's faster: no serde runtime indirection (no trait dispatch, no generic-over-Serializer code paths), single pre-sized `Vec`, direct byte writes, zero-alloc slot lookup via the fixed-size arrays. `StoreIndex::get` is unchanged, so read cost is identical to before. Tarball writes call `encode` once per package, so this shows up at ~10–50 µs/package — drowned out by the SQLite `INSERT` (~100–500 µs) either way, but the payload-size shrink compounds across thousands of packages in the shared v11 store. ## Test plan - [x] `cargo test -p pacquet-store-dir --lib` — **37 passing, 12 new** in `msgpackr_records::tests`: - `encode_emits_record_header_for_top_level_struct` — outer `0xd4 0x72 0x40` record-def header is present. - `encode_roundtrips_single_file` / `encode_roundtrips_many_files_sharing_one_slot` — single-file round-trip; N uniform-shape `CafsFileInfo`s produce exactly 2 record defs (outer + one inner shape), the rest are bare slot refs. - `encode_handles_fixint_in_slot_range_safely` — `size: 0x7b` round-trips without triggering `UnknownSlot`. - `encode_omits_checked_at_when_none` — `None` `checkedAt` isn't written as `nil`; the string `"checkedAt"` doesn't appear anywhere in the output. - `encode_allocates_separate_slots_for_distinct_cafs_shapes` — mixed `CafsFileInfo` shapes (some with `checkedAt`, some without) land in separate slots; same shape reuses. - `encode_requires_build_when_set` / `encode_omits_requires_build_when_none` — outer-record optional-field handling. - `encode_side_effects_roundtrip` — `SideEffectsDiff` with both fields populated. - `encode_side_effects_with_only_added_omits_deleted_field` — exact shape the reviewer flagged: `deleted: None` must not appear as a field name in the output. - `encode_allocates_separate_slots_for_distinct_side_effects_shapes` — two `SideEffectsDiff`s with distinct shapes land in separate slots. - `encode_rejects_manifest_some` — error path for the currently-unimplemented branch. - [x] `cargo test -p pacquet-cli --test pnpm_compatibility` — new `pnpm_reads_pacquet_written_rows` passes. pacquet installs into a store, `node_modules` is wiped, then `pnpm install --ignore-scripts` reinstalls against the same store. If the record encoding is wrong this reproduces the benchmark's `ERR_PNPM_READ_FROM_STORE`. Complements the existing `same_file_structure` / `same_index_file_contents` which clear the store between the pacquet and pnpm halves and therefore don't exercise this path. - [x] `cargo clippy --workspace --all-targets -- -D warnings` - [x] `cargo fmt --all -- --check` - [x] `cargo doc -p pacquet-store-dir --no-deps` — no warnings.
Mirror the tracing change from the index-open call sites: when the blocking cache-lookup task returns `Err(JoinError)`, emit a `tracing::warn!` with the error and the `cache_key` before degrading to `None`. Panic / cancellation stays diagnosable; the caller still falls through to a fresh download. Also drop the stale "msgpackr-records decoding is a follow-up" wording from the cache-lookup rationale — `StoreIndex::get` has decoded pnpm-written msgpackr-records since 0ea1252 (#251), and `transcode_to_plain_msgpack` has handled it on the read side since 90be4f6 (#252). Simplify to "an undecodable entry … falls through to the download path" so the comment reflects today's decoder.
…261) * perf(store-dir): share one read-only StoreIndex across cache lookups `DownloadTarballToStore::load_cached_cas_paths` previously opened a fresh SQLite `Connection::open_with_flags` + `PRAGMA busy_timeout` on every snapshot, so a 1352-package frozen-lockfile install paid 1352 connection-opens just to read cache hints — all serialized on tokio's blocking pool because `spawn_blocking` can't parallelize opens against one file. Open a `StoreIndex` once per install, wrap it in `Arc<Mutex<…>>` (rusqlite `Connection` is `Send` but not `Sync`), and thread the handle through `CreateVirtualStore`, `InstallWithoutLockfile`, and `DownloadTarballToStore`. Each lookup now takes the mutex just for the sub-millisecond `SELECT`, releases it before the per-file validation pass, and frees the Arc when the install finishes. Measured on the #260 repro (1352 packages, release build, macOS APFS, warm CAS store): `rm -rf node_modules && pacquet install --frozen- lockfile` drops from ~17.4 s to ~13.9 s, with CPU utilization rising from ~135 % to ~160 % as the serialized-open bottleneck clears. * address review feedback * `create_virtual_store.rs`: bind `store_index.as_ref()` to a local and use `async move` so the closure's copy of the `Option<&…>` is moved into each future, rather than borrowing from the closure's per-iteration scope. * `load_cached_cas_paths`: treat a poisoned index mutex as a cache miss rather than propagating the panic. The `SELECT` is stateless, so the prior panic can't have left the index in an inconsistent shape, and cache lookups are a best-effort hint — falling through to a fresh download is strictly safer than crashing every subsequent snapshot. * park StoreIndex open on the blocking pool Copilot flagged that `StoreIndex::shared_readonly_in` does synchronous SQLite work (`Connection::open_with_flags` + `PRAGMA busy_timeout`) and calling it directly in an `async fn` stalls the tokio reactor thread for the duration of the open. It's once per install, not a hot spot, but matches the pattern the store-index writer already uses and costs us nothing to keep consistent. Wrap both call sites (frozen-lockfile and no-lockfile install paths) in `tokio::task::spawn_blocking` and await the result before kicking off concurrent package work. Also moves `tokio` from `dev-dependencies` into `dependencies` in package-manager's `Cargo.toml` since it's now a runtime dep for the library code, not just tests. * degrade JoinError on StoreIndex open to a cache miss `spawn_blocking(...).await` returns `Result<_, JoinError>` which fires on blocking-task panic or on cancellation during runtime shutdown. Panicking on that turns a transient runtime issue into a hard crash for an install that could otherwise succeed — cache lookups just miss. Drop the `.expect(...)` and degrade to `None` via `.ok().flatten()`. `shared_readonly_in` already yields `None` for a first-time install against an empty store, and every downstream caller already handles that shape, so this is the same failure mode the code already copes with — just reached from a different source. * use async move in install_dependencies_from_registry map closure Makes the recursive no-lockfile closure's captures explicit: bind `&node_modules_path` to a local `node_modules_path_ref` (Copy) and switch to `async move` so every captured variable is moved (copied, for all the `&T` and `Option<&T>` captures) into the future instead of borrowing the closure's per-iteration stack frame. Matches the pattern already used in `create_virtual_store.rs` and in the outer no-lockfile closure — keeps all three parallel-map sites consistent. * log JoinError before degrading to cache miss on StoreIndex open The silent `.ok().flatten()` hid two failure modes behind a cache miss: a panic inside the blocking task, and task cancellation during runtime shutdown. Neither is a crash-worthy error, but a blocking-task panic during install really should be visible somewhere. Promote the two call sites to a `match` and emit a `tracing::warn!` on the `Err(JoinError)` branch before falling back to `None`. Uses the `pacquet::install` target in common with the rest of this crate's tracing. Degradation path is unchanged. * chore: cargo fmt * log JoinError in tarball cache lookup, refresh stale comment Mirror the tracing change from the index-open call sites: when the blocking cache-lookup task returns `Err(JoinError)`, emit a `tracing::warn!` with the error and the `cache_key` before degrading to `None`. Panic / cancellation stays diagnosable; the caller still falls through to a fresh download. Also drop the stale "msgpackr-records decoding is a follow-up" wording from the cache-lookup rationale — `StoreIndex::get` has decoded pnpm-written msgpackr-records since 0ea1252 (#251), and `transcode_to_plain_msgpack` has handled it on the read side since 90be4f6 (#252). Simplify to "an undecodable entry … falls through to the download path" so the comment reflects today's decoder.
Summary
Closes the last open item on #244: pacquet can now read
index.dbrows that pnpm wrote. Before this PR,StoreIndex::getwould fail to decode anything pnpm produced (msgpackr records), silently treat it as a cache miss, and trigger a full re-download — so the "shared v11 store" was only shared one direction.crates/store-dir/src/msgpackr_records.rs— a focused transcoder that walks a msgpackr-records byte stream and emits equivalent plain MessagePack by expanding each record instance into a string-keyed map. Keeping this as a transcoder (not a freshDeserializer) lets us reuse the existingDeserializederive onPackageFilesIndex.StoreIndex::getsniffs the leadingd4 72bytes and routes msgpackr-records rows through the transcoder before handing the result tormp_serde. pacquet-written rows still take the zero-copy fast path.float 64/float 32payloads touint 64. msgpackr emits JS Number as a double whenever the value exceeds int32 range, so millisecond timestamps likecheckedAt = 1_700_000_000_000arrive as floats even though they're semantically ints, andrmp_serderejects floats for oursize: u64/checked_at: Option<u128>fields. Non-integer floats (π, etc.) pass through unchanged.#[ignore]onpnpm_compatibility::same_index_file_contents. End-to-end:pnpm installwrites rows, pacquet reads them back, and the snapshot of decoded contents now matches what pacquet itself would have written.Why a transcoder, not a bespoke format
I considered having pacquet use its own index format in a separate db. Tradeoffs ruled against it: the shared-store promise of #244 requires the two tools to agree on at least the format, and bypassing records would regress on encode size (records buys ~30-50% on tarballs with many files) without moving any other dial.
Wire format recap (from the msgpackr README + real bytes)
d4 72 <slot>(fixext1, ext type 0x72 'r', 1-byte payload = slot id in0x40..=0x7f) + msgpack array of N field names + N raw msgpack values (the first instance, inlined).0x40..=0x7f+ N raw msgpack values.moreTypes: true, pnpm's payloads encode JSMapas standardfixmap/map16/map32— no ext wrapper.Test plan
cargo test -p pacquet-store-dir --lib— 22 passing, 10 new inmsgpackr_records::testsusing fixture bytes captured from msgpackr 1.11.8 in the pnpm v11 worktree:decodes_one_file_fixture_from_msgpackr/decodes_two_file_fixture_with_record_reuse— round-trip + slot reusedecodes_requires_build_true/decodes_file_without_checked_at/decodes_side_effects— nested records and optional fieldspasses_through_plain_msgpack_unchanged— guards against the transcoder mangling pacquet-written rowsnon_integer_floats_pass_through/integer_valued_float32_is_narrowed_to_uint64— float-narrowing behaviourrejects_reference_to_unknown_slot/rejects_truncated_buffer— error pathsstore_index::tests::get_decodes_msgpackr_records_rows— end-to-end throughStoreIndex::getvia a direct SQL insertcargo test -p pacquet-cli --test pnpm_compatibility same_index_file_contents— previously#[ignore]d, now passes. The test has pnpm install a package from the registry-mock, then has pacquet decode pnpm'sindex.dbrows and assert the decoded shape matches what pacquet itself writes.cargo test -p pacquet-tarball --lib— 7 passing, confirms the read-cache path inrun_without_mem_cache(which callsindex.get) automatically benefits with no further wiring.cargo build --workspace --all-targetscargo clippy --workspace --all-targets -- -D warningscargo fmt --all -- --checkCloses #244.