fix(store): write index.db rows as msgpackr records for pnpm interop#252
Conversation
Pacquet's `StoreIndex::set` used `rmp_serde::to_vec_named`, which emits
plain MessagePack maps. pnpm reads `index.db` with
`Packr({useRecords: true, moreTypes: true}).unpack(…)`, and in that
mode every msgpack map at any nesting level decodes as a JS `Map` —
records are the escape hatch that tells msgpackr "this one's a plain
object". So pacquet-written rows came back as a top-level `Map`,
`pkgIndex.files` (a property access) was `undefined`, and pnpm's
`for (const [f, fstat] of pkgIndex.files)` threw
`files is not iterable`.
That surfaced in CI as `ERR_PNPM_READ_FROM_STORE` the first time a
benchmark run hit a shared pnpm-v11 cache that had pacquet-written
rows mixed in — every downstream job's `pnpm install` crashed trying
to decode them. #251 added records *decoding* but the symmetric
write path was wrong: `b0bed43` fixed the `checkedAt` encoding
(float64, not uint64-as-BigInt) and called the interop done, but
never actually round-tripped a pacquet-written row through msgpackr.
## Fix
New `encode_package_files_index` in `msgpackr_records.rs` that emits
msgpackr-records bytes directly:
- `PackageFilesIndex` → record at slot 0x40. `algo` and `files`
always; `requiresBuild` / `sideEffects` only when `Some`, matching
msgpackr's field-omit-when-absent convention for plain JS objects.
- `CafsFileInfo` → record at slot 0x41. Fixed `[digest, mode, size,
checkedAt]` schema so every file in a row shares one slot (msgpackr
would split shapes across slots; logical decoding is identical
either way and one slot keeps the encoder simple). `checked_at` is
`float 64` when `Some`, `nil` when `None` — either maps to pnpm's
`?? 0` coalesce.
- `SideEffectsDiff` → record at slot 0x42. Fixed `[added, deleted]`.
- `manifest: Some(_)` returns `EncodeError::ManifestNotSupported`
(pacquet doesn't populate it today; encoding `serde_json::Value`
records is follow-up work).
The encoder also promotes integer values in `0x40..=0x7f` to `uint 8`
instead of positive fixints. Inside a records stream those bytes are
slot-reference markers, so a bare `0x7b` would be parsed as a ref to
slot 0x7b rather than the integer 123. msgpackr does the same
promotion for the same reason.
`StoreIndex::set` now calls this encoder; `StoreIndex::get` goes
through the existing transcoder which already handled records, so
the read path needed no change beyond updating the doc comment that
still described the old plain-msgpack write path.
## Tests
- Nine new encoder tests in `msgpackr_records::tests`, all using the
encode → transcode → `rmp_serde::from_slice` round-trip. Covers
record header emission, slot reuse across many `CafsFileInfo`
instances, fixint-in-slot-range promotion, nil-for-missing
checkedAt, optional field inclusion/omission, side effects, and
the `manifest: Some` error path.
- New `pnpm_reads_pacquet_written_rows` integration test (in
`crates/cli/tests/pnpm_compatibility.rs`) — this is the end-to-end
catch: pacquet installs into a store, node_modules is wiped, pnpm
reinstalls against the same store. If the record encoding is wrong
this reproduces the benchmark's `ERR_PNPM_READ_FROM_STORE`. The
existing `same_file_structure` / `same_index_file_contents` tests
didn't exercise this path because they always cleared the store
between the pacquet and pnpm halves.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #252 +/- ##
==========================================
+ Coverage 87.62% 88.26% +0.63%
==========================================
Files 57 57
Lines 3621 4031 +410
==========================================
+ Hits 3173 3558 +385
- Misses 448 473 +25 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR fixes pnpm v11 interoperability by changing how pacquet serializes index.db rows: it now writes msgpackr records (matching pnpm’s Packr({ useRecords: true, moreTypes: true })) so pnpm can read pacquet-written rows without decoding them as JS Map and crashing.
Changes:
- Add a dedicated msgpackr-records encoder for
PackageFilesIndex(and nested types) with slot reuse and fixint-slot-range safety. - Switch
StoreIndex::setto store rows using the new records encoder and update associated docs/error types. - Add a CLI regression test that ensures
pnpm installcan read pacquet-writtenindex.dbrows from a shared store.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/store-dir/src/store_index.rs | Routes writes through the new msgpackr-records encoder; updates docs and error typing for encoding. |
| crates/store-dir/src/msgpackr_records.rs | Implements encode_package_files_index emitting msgpackr-records bytes with slot reuse and safe integer encoding. |
| crates/cli/tests/pnpm_compatibility.rs | Adds an end-to-end regression test where pnpm must successfully read pacquet-written store rows. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Micro-Benchmark ResultsLinux |
msgpackr_records.rs now carries both directions; update the module-level doc so `cargo doc` / IDE summaries aren't introducing it as "Decoder for…" any more. Also spells out both interop failure modes (pnpm → pacquet and pacquet → pnpm) so a future reader understands why the write side couldn't just stay on `rmp_serde`.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Integrated-Benchmark Report (Linux)Scenario: Frozen Lockfile
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 3.3783574037,
"stddev": 0.03816034891649137,
"median": 3.3881653729,
"user": 0.15992965999999997,
"system": 0.20683638,
"min": 3.3022906794,
"max": 3.4234185154,
"times": [
3.3542556734,
3.4234185154,
3.4104485134,
3.3022906794,
3.3929876124000002,
3.4175035734,
3.3381832024,
3.3914535354,
3.3681555214,
3.3848772104
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
},
{
"command": "pacquet@main",
"mean": 3.3871463393,
"stddev": 0.09919247718343524,
"median": 3.3707923999,
"user": 0.15706785999999998,
"system": 0.20469468000000002,
"min": 3.2966378924,
"max": 3.6465850054,
"times": [
3.3260606404,
3.3983405754000002,
3.6465850054,
3.3213373174,
3.4286715284000002,
3.3378377714,
3.3744078624,
3.3683759164000002,
3.3732088834,
3.2966378924
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
}
]
} |
Both doc blocks still read as if pacquet had only just started
writing msgpackr-records and still mostly writes plain msgpack — the
opposite of the post-PR reality. Rephrase both so:
- `StoreIndex::get` opens with "rows come in three flavours" and
lists pnpm-written, current-pacquet-written, and legacy-pacquet
plain-msgpack, in that order.
- The `pnpm_reads_pacquet_written_rows` regression-test header leads
with the post-PR state ("pacquet now writes msgpackr-records") and
explains the old encoding as the regression to guard against,
rather than describing it as the current behaviour.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Previous encoder hardcoded one slot per record type with fixed 4-field
/ 2-field schemas, using `nil` for absent optional fields
(`checkedAt`, `added`, `deleted`). Produced bytes that *decoded* fine
but had a subtly different JS shape than msgpackr's own output —
`{added: Map, deleted: null}` instead of `{added: Map}`, for
instance. pnpm's own readers are nil-tolerant (`?? 0` / truthy
checks), so the actual on-disk effect today is zero, but third-party
tools that inspect the index might not be, and we shouldn't write
bytes that differ from what msgpackr would emit for the same input.
New encoder follows msgpackr's actual strategy (verified against
1.11.8 in the v11 worktree): each distinct **shape** of an inner
record type gets its own slot. A `CafsFileInfo { checked_at: Some }`
gets slot 0x41, `CafsFileInfo { checked_at: None }` gets 0x42,
`SideEffectsDiff { added: Some, deleted: None }` would then get 0x43,
etc. Same-shape instances downstream collapse to a single bare-slot
byte (still the compression win records exist for), while mixed-shape
streams get as many slots as they need without any one slot being
redefined mid-stream.
State:
struct EncodeState {
cafs_slots: [Option<u8>; 2], // indexed by cafs_shape
side_effects_slots: [Option<u8>; 4], // indexed by side_effects_shape
next_slot: u8,
}
Fixed-size arrays instead of `HashMap` because each record type has at
most 4 possible shapes — a 2-slot `Option<u8>` array is cache-local,
zero-alloc, and measurably faster than `HashMap::get` on the hot
per-file path. The 0x40..=0x7f slot range is 64 entries so we're
nowhere near running out. `allocate_slot` panics if we did, which
would mean >63 distinct record shapes in one stream; not possible for
current pacquet code.
Perf + size unchanged for the common uniform-shape case (every
`CafsFileInfo` has `checkedAt`) — same one def, same N-1 references.
Mixed shapes pay the def cost once per new shape and reuse thereafter,
exactly as msgpackr does. Re-ran the bench: ~2× faster than
`rmp_serde::to_vec_named` and 14% smaller output at 1000 files, same
numbers as the previous iteration within measurement noise.
Tests:
- Renamed `encode_handles_missing_checked_at_as_nil` →
`encode_omits_checked_at_when_none` and updated the assertion: no
`checkedAt` field name should appear in the output at all when
every instance has `None`.
- `encode_allocates_separate_slots_for_distinct_cafs_shapes`: mix
with/without-`checkedAt` files and assert exactly three record
defs (outer + the two inner shapes).
- `encode_side_effects_with_only_added_omits_deleted_field`: the
specific shape Copilot flagged — `SideEffectsDiff { added: Some,
deleted: None }` must not emit a `deleted` field name.
- `encode_allocates_separate_slots_for_distinct_side_effects_shapes`:
two `SideEffectsDiff`s with distinct shapes land in separate slots.
`write_nil` is now unused (no more nil-for-absent fields) and removed.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Doc block on `encode_package_files_index` still described the old fixed-schema implementation: single slot per Rust type, `[digest, mode, size, checkedAt]` with `nil`, `[added, deleted]` with `nil`. With shape-aware slot allocation in 9296125, none of that is true anymore. Rewrite the **Slot allocation** and **Optional-field handling** sections so they match the code: `0x40` is fixed for `PackageFilesIndex`, `0x41..=0x7f` are allocated lazily one-per-shape in first-seen order, optional fields are omitted from the schema rather than padded with `nil`. Spell out the pnpm-side consequence — a `SideEffectsDiff { added: Some, deleted: None }` now decodes to `{ added: Map }` not `{ added: Map, deleted: null }`. Also fix the stale `EncodeState::slot_for` link on `PKG_FILES_INDEX_SLOT` (the helper is `allocate_slot`) and drop the intra-doc links to the private `cafs_shape` / `side_effects_shape` fns that were triggering rustdoc private-item warnings.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Two stale-docs follow-ups: - The encoder rustdoc said pacquet's output is "byte-identical" to msgpackr's. Not true — `HashMap` iteration order is nondeterministic, so `files` / `sideEffects` / `added` entries may be serialised in a different order from msgpackr's JS `Map` iteration. Soften to "wire-compatible" (same record schemas, same slot numbers, same value encodings; pnpm decodes the same JS shape) and explicitly call out that exact bytes can still differ. - The `same_index_file_contents` test header above `pnpm_compatibility.rs` still described pacquet as writing plain msgpack via `rmp_serde::to_vec_named`. Rewrite to "both tools now write msgpackr records; this test compares post-decode structs, not byte-identical rows".
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Top-of-file Strategy section said the write side "allocates a slot per Rust struct type". Not true anymore — a single type can consume multiple slots when optional-field presence varies, because each distinct record shape gets its own slot. Reword so the module intro doesn't imply a 1:1 type↔slot mapping.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
`EncodeState::allocate_slot` used `assert!` to guard against running
out of the 0x41..=0x7f inner-slot range. Unreachable today (pacquet
has at most 6 possible record shapes across `CafsFileInfo` and
`SideEffectsDiff`; 63 slots available), but `assert!` panics in
release, which is the wrong failure mode for a corrupted-or-unusual
payload — a typed error gets propagated cleanly up through
`StoreIndex::set` to the caller.
- New `EncodeError::OutOfRecordSlots { max }` variant.
- `allocate_slot` now returns `Result<u8, EncodeError>`.
- Propagated `?` through `encode_cafs_file_info` and
`encode_side_effects_diff` (both were previously `-> ()`).
- New `allocate_slot_returns_error_past_0x7f` test exhausts the
inner-slot range directly and asserts the 64th allocation yields
the typed error rather than panicking.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
`encode_pkg_files_index_value` was writing the top-level record schema as `[algo, files, requiresBuild?, sideEffects?]`, with optional fields tacked onto the end. The existing `decodes_requires_build_true` fixture in this same module was captured from msgpackr 1.11.8 and shows its schema order is `[algo, requiresBuild, files]` — i.e. msgpackr picks up pnpm's TypeScript declaration order (`manifest?`, `requiresBuild?`, `algo`, `files`, `sideEffects?`) as the JS object's insertion order. Reorder emission to match: - schema: `[requiresBuild?, algo, files, sideEffects?]` (still skipping `manifest` until it's wired through). - values: same order. Pnpm decodes either order because records carry field names with the schema, so this was never a correctness bug — but wire-level diffing against a pnpm-written reference row is a routine debugging exercise, and field-order drift makes that harder than it should be. New `encode_outer_field_order_matches_msgpackr` test decodes the outer schema's fixstr field-name array out of the encoder's bytes and asserts `["requiresBuild", "algo", "files"]`, so any future schema drift trips it.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Two real catches from the reviewer, plus one preemptive fix: 1. Outer field order was wrong again. My previous "fix" put `requiresBuild` *before* `algo`, but the `decodes_requires_build_true` fixture right here in the module shows msgpackr actually emits `[algo, requiresBuild, files]` — `requiresBuild` goes **between** `algo` and `files`, not at the start. (The `decodes_side_effects` fixture corroborates the corresponding `sideEffects` position after `files`.) Fix the schema + value order to `[algo, requiresBuild?, files, sideEffects?]` and update `encode_outer_field_order_matches_msgpackr` to match. 2. `write_array_header`, `write_map_header`, `write_str` all cast a `usize` length to `u32` unconditionally when emitting the 32- bit variant. On 64-bit hosts a `usize > u32::MAX` would silently truncate to a corrupt length prefix. Swap to `u32::try_from(n).expect(...)` so overflow panics with a clear message instead. Not reachable for any realistic pacquet payload (no tarball ships with 4B+ files), but the silent corruption mode is worse than a loud crash if it's ever hit.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Mirror the tracing change from the index-open call sites: when the blocking cache-lookup task returns `Err(JoinError)`, emit a `tracing::warn!` with the error and the `cache_key` before degrading to `None`. Panic / cancellation stays diagnosable; the caller still falls through to a fresh download. Also drop the stale "msgpackr-records decoding is a follow-up" wording from the cache-lookup rationale — `StoreIndex::get` has decoded pnpm-written msgpackr-records since 0ea1252 (#251), and `transcode_to_plain_msgpack` has handled it on the read side since 90be4f6 (#252). Simplify to "an undecodable entry … falls through to the download path" so the comment reflects today's decoder.
…261) * perf(store-dir): share one read-only StoreIndex across cache lookups `DownloadTarballToStore::load_cached_cas_paths` previously opened a fresh SQLite `Connection::open_with_flags` + `PRAGMA busy_timeout` on every snapshot, so a 1352-package frozen-lockfile install paid 1352 connection-opens just to read cache hints — all serialized on tokio's blocking pool because `spawn_blocking` can't parallelize opens against one file. Open a `StoreIndex` once per install, wrap it in `Arc<Mutex<…>>` (rusqlite `Connection` is `Send` but not `Sync`), and thread the handle through `CreateVirtualStore`, `InstallWithoutLockfile`, and `DownloadTarballToStore`. Each lookup now takes the mutex just for the sub-millisecond `SELECT`, releases it before the per-file validation pass, and frees the Arc when the install finishes. Measured on the #260 repro (1352 packages, release build, macOS APFS, warm CAS store): `rm -rf node_modules && pacquet install --frozen- lockfile` drops from ~17.4 s to ~13.9 s, with CPU utilization rising from ~135 % to ~160 % as the serialized-open bottleneck clears. * address review feedback * `create_virtual_store.rs`: bind `store_index.as_ref()` to a local and use `async move` so the closure's copy of the `Option<&…>` is moved into each future, rather than borrowing from the closure's per-iteration scope. * `load_cached_cas_paths`: treat a poisoned index mutex as a cache miss rather than propagating the panic. The `SELECT` is stateless, so the prior panic can't have left the index in an inconsistent shape, and cache lookups are a best-effort hint — falling through to a fresh download is strictly safer than crashing every subsequent snapshot. * park StoreIndex open on the blocking pool Copilot flagged that `StoreIndex::shared_readonly_in` does synchronous SQLite work (`Connection::open_with_flags` + `PRAGMA busy_timeout`) and calling it directly in an `async fn` stalls the tokio reactor thread for the duration of the open. It's once per install, not a hot spot, but matches the pattern the store-index writer already uses and costs us nothing to keep consistent. Wrap both call sites (frozen-lockfile and no-lockfile install paths) in `tokio::task::spawn_blocking` and await the result before kicking off concurrent package work. Also moves `tokio` from `dev-dependencies` into `dependencies` in package-manager's `Cargo.toml` since it's now a runtime dep for the library code, not just tests. * degrade JoinError on StoreIndex open to a cache miss `spawn_blocking(...).await` returns `Result<_, JoinError>` which fires on blocking-task panic or on cancellation during runtime shutdown. Panicking on that turns a transient runtime issue into a hard crash for an install that could otherwise succeed — cache lookups just miss. Drop the `.expect(...)` and degrade to `None` via `.ok().flatten()`. `shared_readonly_in` already yields `None` for a first-time install against an empty store, and every downstream caller already handles that shape, so this is the same failure mode the code already copes with — just reached from a different source. * use async move in install_dependencies_from_registry map closure Makes the recursive no-lockfile closure's captures explicit: bind `&node_modules_path` to a local `node_modules_path_ref` (Copy) and switch to `async move` so every captured variable is moved (copied, for all the `&T` and `Option<&T>` captures) into the future instead of borrowing the closure's per-iteration stack frame. Matches the pattern already used in `create_virtual_store.rs` and in the outer no-lockfile closure — keeps all three parallel-map sites consistent. * log JoinError before degrading to cache miss on StoreIndex open The silent `.ok().flatten()` hid two failure modes behind a cache miss: a panic inside the blocking task, and task cancellation during runtime shutdown. Neither is a crash-worthy error, but a blocking-task panic during install really should be visible somewhere. Promote the two call sites to a `match` and emit a `tracing::warn!` on the `Err(JoinError)` branch before falling back to `None`. Uses the `pacquet::install` target in common with the rest of this crate's tracing. Degradation path is unchanged. * chore: cargo fmt * log JoinError in tarball cache lookup, refresh stale comment Mirror the tracing change from the index-open call sites: when the blocking cache-lookup task returns `Err(JoinError)`, emit a `tracing::warn!` with the error and the `cache_key` before degrading to `None`. Panic / cancellation stays diagnosable; the caller still falls through to a fresh download. Also drop the stale "msgpackr-records decoding is a follow-up" wording from the cache-lookup rationale — `StoreIndex::get` has decoded pnpm-written msgpackr-records since 0ea1252 (#251), and `transcode_to_plain_msgpack` has handled it on the read side since 90be4f6 (#252). Simplify to "an undecodable entry … falls through to the download path" so the comment reflects today's decoder.
Summary
Follow-up to #251. Pacquet's
StoreIndex::setwas writingindex.dbrows viarmp_serde::to_vec_named— plain MessagePack maps. pnpm reads the same file withPackr({useRecords: true, moreTypes: true}).unpack(…), and in that mode every msgpack map at any nesting level decodes as a JSMap. Records are the escape hatch that tells msgpackr "this one's a plain object".So pacquet-written rows came back as a top-level
Map,pkgIndex.files(a property access on thatMap) wasundefined, and pnpm'sfor (const [f, fstat] of pkgIndex.files)threwfiles is not iterable. That surfaced in CI asERR_PNPM_READ_FROM_STOREthe first time a benchmark run picked up a shared pnpm-v11 cache with pacquet-written rows in it — every downstream job'spnpm installcrashed trying to decode them.#251 taught pacquet how to read pnpm's records; b0bed43 fixed the
checkedAtencoding (float64 vs uint64-as-BigInt) and called the pnpm-reads-pacquet direction done — but the container shape was still wrong, and nothing in the test suite actually put a pacquet-written row back through msgpackr, so it slipped past.Fix
New
encode_package_files_indexinmsgpackr_records.rsthat emits msgpackr-records bytes. Behaviour matches what msgpackr 1.11.8 itself emits for the same logical input (verified against the version pinned in pnpm v11'scatalog:):0x40is reserved for the top-levelPackageFilesIndex. Inner slots in0x41..=0x7fare allocated lazily, one per distinct record shape, in first-seen order. ACafsFileInfocarryingcheckedAtand one without it land in separate slots; same-shape instances downstream collapse to a single bare-slot byte, which is the record-compression win records exist for.None, not encoded asnil. ASideEffectsDiff { added: Some, deleted: None }decodes on the pnpm side to{ added: Map }, not{ added: Map, deleted: null }— the same JS shape msgpackr itself would produce for the same Rust input, so downstream checks likediff.added != null/diff.deleted != nullbehave identically regardless of which tool wrote the row.CafsFileInfo.checkedAtis written asfloat 64whenSome(uint 64 would decode to a JSBigIntand crash pnpm'smtimeMs - (checkedAt ?? 0)), omitted from the schema whenNone.manifest: Some(_)returnsEncodeError::ManifestNotSupported. Pacquet doesn't populate it today; encodingserde_json::Valueas records is follow-up work.The encoder also promotes integer values in
0x40..=0x7ftouint 8instead of positive fixints. Inside a records stream those bytes are slot-reference markers — a bare0x7bwould be parsed as a ref to slot0x7brather than the integer123. msgpackr does the same promotion for the same reason.State is a tight
[Option<u8>; 2]forCafsFileInfoslots and[Option<u8>; 4]forSideEffectsDiffslots, indexed by a shape bitmask — 2 and 4 are the respective maxima, so lookup is a cache-local array index with noHashMapallocation on the hot per-file path.StoreIndex::setnow calls this encoder.StoreIndex::getgoes through the existing transcoder which already handled records, so the read path needed no behaviour change — just a doc update to enumerate the three row flavours it now tolerates (pnpm-written records, current pacquet-written records, legacy pacquet plain-msgpack from rows that predate this PR).Why not a bespoke format?
Considered writing pacquet's rows in a different shape (positional tuple, or keep plain msgpack and switch pnpm's side). Ruled out: the whole point of #244 and #251 is a shared
index.dbthe two tools read each other's entries from. Any shape other than "what pnpm's msgpackr emits" breaks that promise.Performance
Faster and smaller than
rmp_serde::to_vec_named. Measured on a release build, 10k iterations per row-size, with a realisticCafsFileInfoshape (128-hex digest, millisecond timestamp):Serialize time per row
rmp_serde(before)Row size on disk
rmp_serdeThe 6-byte regression at
n=1is the outer record header (d4 72 40) plus a slightly longer field-name array; fromn≥4records come out smaller because each subsequentCafsFileInfoinstance collapses to a single slot byte instead of re-emitting"digest","mode","size","checkedAt"(~20 bytes of field names). Why it's faster: no serde runtime indirection (no trait dispatch, no generic-over-Serializer code paths), single pre-sizedVec, direct byte writes, zero-alloc slot lookup via the fixed-size arrays.StoreIndex::getis unchanged, so read cost is identical to before. Tarball writes callencodeonce per package, so this shows up at ~10–50 µs/package — drowned out by the SQLiteINSERT(~100–500 µs) either way, but the payload-size shrink compounds across thousands of packages in the shared v11 store.Test plan
cargo test -p pacquet-store-dir --lib— 37 passing, 12 new inmsgpackr_records::tests:encode_emits_record_header_for_top_level_struct— outer0xd4 0x72 0x40record-def header is present.encode_roundtrips_single_file/encode_roundtrips_many_files_sharing_one_slot— single-file round-trip; N uniform-shapeCafsFileInfos produce exactly 2 record defs (outer + one inner shape), the rest are bare slot refs.encode_handles_fixint_in_slot_range_safely—size: 0x7bround-trips without triggeringUnknownSlot.encode_omits_checked_at_when_none—NonecheckedAtisn't written asnil; the string"checkedAt"doesn't appear anywhere in the output.encode_allocates_separate_slots_for_distinct_cafs_shapes— mixedCafsFileInfoshapes (some withcheckedAt, some without) land in separate slots; same shape reuses.encode_requires_build_when_set/encode_omits_requires_build_when_none— outer-record optional-field handling.encode_side_effects_roundtrip—SideEffectsDiffwith both fields populated.encode_side_effects_with_only_added_omits_deleted_field— exact shape the reviewer flagged:deleted: Nonemust not appear as a field name in the output.encode_allocates_separate_slots_for_distinct_side_effects_shapes— twoSideEffectsDiffs with distinct shapes land in separate slots.encode_rejects_manifest_some— error path for the currently-unimplemented branch.cargo test -p pacquet-cli --test pnpm_compatibility— newpnpm_reads_pacquet_written_rowspasses. pacquet installs into a store,node_modulesis wiped, thenpnpm install --ignore-scriptsreinstalls against the same store. If the record encoding is wrong this reproduces the benchmark'sERR_PNPM_READ_FROM_STORE. Complements the existingsame_file_structure/same_index_file_contentswhich clear the store between the pacquet and pnpm halves and therefore don't exercise this path.cargo clippy --workspace --all-targets -- -D warningscargo fmt --all -- --checkcargo doc -p pacquet-store-dir --no-deps— no warnings.