Skip to content
This repository was archived by the owner on May 14, 2026. It is now read-only.

fix(store): write index.db rows as msgpackr records for pnpm interop#252

Merged
zkochan merged 10 commits into
mainfrom
msgpackr-records-encoder
Apr 23, 2026
Merged

fix(store): write index.db rows as msgpackr records for pnpm interop#252
zkochan merged 10 commits into
mainfrom
msgpackr-records-encoder

Conversation

@zkochan

@zkochan zkochan commented Apr 23, 2026

Copy link
Copy Markdown
Member

Summary

Follow-up to #251. Pacquet's StoreIndex::set was writing index.db rows via rmp_serde::to_vec_named — plain MessagePack maps. pnpm reads the same file with Packr({useRecords: true, moreTypes: true}).unpack(…), and in that mode every msgpack map at any nesting level decodes as a JS Map. Records are the escape hatch that tells msgpackr "this one's a plain object".

So pacquet-written rows came back as a top-level Map, pkgIndex.files (a property access on that Map) was undefined, and pnpm's for (const [f, fstat] of pkgIndex.files) threw files is not iterable. That surfaced in CI as ERR_PNPM_READ_FROM_STORE the first time a benchmark run picked up a shared pnpm-v11 cache with pacquet-written rows in it — every downstream job's pnpm install crashed trying to decode them.

#251 taught pacquet how to read pnpm's records; b0bed43 fixed the checkedAt encoding (float64 vs uint64-as-BigInt) and called the pnpm-reads-pacquet direction done — but the container shape was still wrong, and nothing in the test suite actually put a pacquet-written row back through msgpackr, so it slipped past.

Fix

New encode_package_files_index in msgpackr_records.rs that emits msgpackr-records bytes. Behaviour matches what msgpackr 1.11.8 itself emits for the same logical input (verified against the version pinned in pnpm v11's catalog:):

  • Slot allocation is shape-aware. Slot 0x40 is reserved for the top-level PackageFilesIndex. Inner slots in 0x41..=0x7f are allocated lazily, one per distinct record shape, in first-seen order. A CafsFileInfo carrying checkedAt and one without it land in separate slots; same-shape instances downstream collapse to a single bare-slot byte, which is the record-compression win records exist for.
  • Optional fields are omitted from the schema when None, not encoded as nil. A SideEffectsDiff { added: Some, deleted: None } decodes on the pnpm side to { added: Map }, not { added: Map, deleted: null } — the same JS shape msgpackr itself would produce for the same Rust input, so downstream checks like diff.added != null / diff.deleted != null behave identically regardless of which tool wrote the row.
  • CafsFileInfo.checkedAt is written as float 64 when Some (uint 64 would decode to a JS BigInt and crash pnpm's mtimeMs - (checkedAt ?? 0)), omitted from the schema when None.
  • manifest: Some(_) returns EncodeError::ManifestNotSupported. Pacquet doesn't populate it today; encoding serde_json::Value as records is follow-up work.

The encoder also promotes integer values in 0x40..=0x7f to uint 8 instead of positive fixints. Inside a records stream those bytes are slot-reference markers — a bare 0x7b would be parsed as a ref to slot 0x7b rather than the integer 123. msgpackr does the same promotion for the same reason.

State is a tight [Option<u8>; 2] for CafsFileInfo slots and [Option<u8>; 4] for SideEffectsDiff slots, indexed by a shape bitmask — 2 and 4 are the respective maxima, so lookup is a cache-local array index with no HashMap allocation on the hot per-file path.

StoreIndex::set now calls this encoder. StoreIndex::get goes through the existing transcoder which already handled records, so the read path needed no behaviour change — just a doc update to enumerate the three row flavours it now tolerates (pnpm-written records, current pacquet-written records, legacy pacquet plain-msgpack from rows that predate this PR).

Why not a bespoke format?

Considered writing pacquet's rows in a different shape (positional tuple, or keep plain msgpack and switch pnpm's side). Ruled out: the whole point of #244 and #251 is a shared index.db the two tools read each other's entries from. Any shape other than "what pnpm's msgpackr emits" breaks that promise.

Performance

Faster and smaller than rmp_serde::to_vec_named. Measured on a release build, 10k iterations per row-size, with a realistic CafsFileInfo shape (128-hex digest, millisecond timestamp):

Serialize time per row

files rmp_serde (before) records encoder (after) speedup
1 304 ns 164 ns 1.9×
10 1.23 µs 838 ns 1.5×
50 3.71 µs 2.25 µs 1.6×
200 11.9 µs 8.11 µs 1.5×
1000 55.9 µs 27.7 µs 2.0×

Row size on disk

files rmp_serde records encoder delta
1 214 B 220 B +6 B
10 1960 B 1723 B −12%
50 9722 B 8405 B −14%
200 38822 B 33455 B −14%
1000 194022 B 167055 B −14%

The 6-byte regression at n=1 is the outer record header (d4 72 40) plus a slightly longer field-name array; from n≥4 records come out smaller because each subsequent CafsFileInfo instance collapses to a single slot byte instead of re-emitting "digest", "mode", "size", "checkedAt" (~20 bytes of field names). Why it's faster: no serde runtime indirection (no trait dispatch, no generic-over-Serializer code paths), single pre-sized Vec, direct byte writes, zero-alloc slot lookup via the fixed-size arrays.

StoreIndex::get is unchanged, so read cost is identical to before. Tarball writes call encode once per package, so this shows up at ~10–50 µs/package — drowned out by the SQLite INSERT (~100–500 µs) either way, but the payload-size shrink compounds across thousands of packages in the shared v11 store.

Test plan

  • cargo test -p pacquet-store-dir --lib37 passing, 12 new in msgpackr_records::tests:
    • encode_emits_record_header_for_top_level_struct — outer 0xd4 0x72 0x40 record-def header is present.
    • encode_roundtrips_single_file / encode_roundtrips_many_files_sharing_one_slot — single-file round-trip; N uniform-shape CafsFileInfos produce exactly 2 record defs (outer + one inner shape), the rest are bare slot refs.
    • encode_handles_fixint_in_slot_range_safelysize: 0x7b round-trips without triggering UnknownSlot.
    • encode_omits_checked_at_when_noneNone checkedAt isn't written as nil; the string "checkedAt" doesn't appear anywhere in the output.
    • encode_allocates_separate_slots_for_distinct_cafs_shapes — mixed CafsFileInfo shapes (some with checkedAt, some without) land in separate slots; same shape reuses.
    • encode_requires_build_when_set / encode_omits_requires_build_when_none — outer-record optional-field handling.
    • encode_side_effects_roundtripSideEffectsDiff with both fields populated.
    • encode_side_effects_with_only_added_omits_deleted_field — exact shape the reviewer flagged: deleted: None must not appear as a field name in the output.
    • encode_allocates_separate_slots_for_distinct_side_effects_shapes — two SideEffectsDiffs with distinct shapes land in separate slots.
    • encode_rejects_manifest_some — error path for the currently-unimplemented branch.
  • cargo test -p pacquet-cli --test pnpm_compatibility — new pnpm_reads_pacquet_written_rows passes. pacquet installs into a store, node_modules is wiped, then pnpm install --ignore-scripts reinstalls against the same store. If the record encoding is wrong this reproduces the benchmark's ERR_PNPM_READ_FROM_STORE. Complements the existing same_file_structure / same_index_file_contents which clear the store between the pacquet and pnpm halves and therefore don't exercise this path.
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo fmt --all -- --check
  • cargo doc -p pacquet-store-dir --no-deps — no warnings.

Pacquet's `StoreIndex::set` used `rmp_serde::to_vec_named`, which emits
plain MessagePack maps. pnpm reads `index.db` with
`Packr({useRecords: true, moreTypes: true}).unpack(…)`, and in that
mode every msgpack map at any nesting level decodes as a JS `Map` —
records are the escape hatch that tells msgpackr "this one's a plain
object". So pacquet-written rows came back as a top-level `Map`,
`pkgIndex.files` (a property access) was `undefined`, and pnpm's
`for (const [f, fstat] of pkgIndex.files)` threw
`files is not iterable`.

That surfaced in CI as `ERR_PNPM_READ_FROM_STORE` the first time a
benchmark run hit a shared pnpm-v11 cache that had pacquet-written
rows mixed in — every downstream job's `pnpm install` crashed trying
to decode them. #251 added records *decoding* but the symmetric
write path was wrong: `b0bed43` fixed the `checkedAt` encoding
(float64, not uint64-as-BigInt) and called the interop done, but
never actually round-tripped a pacquet-written row through msgpackr.

## Fix

New `encode_package_files_index` in `msgpackr_records.rs` that emits
msgpackr-records bytes directly:

- `PackageFilesIndex` → record at slot 0x40. `algo` and `files`
  always; `requiresBuild` / `sideEffects` only when `Some`, matching
  msgpackr's field-omit-when-absent convention for plain JS objects.
- `CafsFileInfo` → record at slot 0x41. Fixed `[digest, mode, size,
  checkedAt]` schema so every file in a row shares one slot (msgpackr
  would split shapes across slots; logical decoding is identical
  either way and one slot keeps the encoder simple). `checked_at` is
  `float 64` when `Some`, `nil` when `None` — either maps to pnpm's
  `?? 0` coalesce.
- `SideEffectsDiff` → record at slot 0x42. Fixed `[added, deleted]`.
- `manifest: Some(_)` returns `EncodeError::ManifestNotSupported`
  (pacquet doesn't populate it today; encoding `serde_json::Value`
  records is follow-up work).

The encoder also promotes integer values in `0x40..=0x7f` to `uint 8`
instead of positive fixints. Inside a records stream those bytes are
slot-reference markers, so a bare `0x7b` would be parsed as a ref to
slot 0x7b rather than the integer 123. msgpackr does the same
promotion for the same reason.

`StoreIndex::set` now calls this encoder; `StoreIndex::get` goes
through the existing transcoder which already handled records, so
the read path needed no change beyond updating the doc comment that
still described the old plain-msgpack write path.

## Tests

- Nine new encoder tests in `msgpackr_records::tests`, all using the
  encode → transcode → `rmp_serde::from_slice` round-trip. Covers
  record header emission, slot reuse across many `CafsFileInfo`
  instances, fixint-in-slot-range promotion, nil-for-missing
  checkedAt, optional field inclusion/omission, side effects, and
  the `manifest: Some` error path.
- New `pnpm_reads_pacquet_written_rows` integration test (in
  `crates/cli/tests/pnpm_compatibility.rs`) — this is the end-to-end
  catch: pacquet installs into a store, node_modules is wiped, pnpm
  reinstalls against the same store. If the record encoding is wrong
  this reproduces the benchmark's `ERR_PNPM_READ_FROM_STORE`. The
  existing `same_file_structure` / `same_index_file_contents` tests
  didn't exercise this path because they always cleared the store
  between the pacquet and pnpm halves.
@codecov

codecov Bot commented Apr 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.75362% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.26%. Comparing base (0ea1252) to head (8619d64).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
crates/store-dir/src/msgpackr_records.rs 92.71% 30 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #252      +/-   ##
==========================================
+ Coverage   87.62%   88.26%   +0.63%     
==========================================
  Files          57       57              
  Lines        3621     4031     +410     
==========================================
+ Hits         3173     3558     +385     
- Misses        448      473      +25     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes pnpm v11 interoperability by changing how pacquet serializes index.db rows: it now writes msgpackr records (matching pnpm’s Packr({ useRecords: true, moreTypes: true })) so pnpm can read pacquet-written rows without decoding them as JS Map and crashing.

Changes:

  • Add a dedicated msgpackr-records encoder for PackageFilesIndex (and nested types) with slot reuse and fixint-slot-range safety.
  • Switch StoreIndex::set to store rows using the new records encoder and update associated docs/error types.
  • Add a CLI regression test that ensures pnpm install can read pacquet-written index.db rows from a shared store.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
crates/store-dir/src/store_index.rs Routes writes through the new msgpackr-records encoder; updates docs and error typing for encoding.
crates/store-dir/src/msgpackr_records.rs Implements encode_package_files_index emitting msgpackr-records bytes with slot reuse and safe integer encoding.
crates/cli/tests/pnpm_compatibility.rs Adds an end-to-end regression test where pnpm must successfully read pacquet-written store rows.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/store-dir/src/store_index.rs
Comment thread crates/store-dir/src/msgpackr_records.rs
@github-actions

github-actions Bot commented Apr 23, 2026

Copy link
Copy Markdown

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.00     18.1±0.26ms   239.4 KB/sec    1.00     18.1±0.56ms   239.1 KB/sec

msgpackr_records.rs now carries both directions; update the
module-level doc so `cargo doc` / IDE summaries aren't introducing it
as "Decoder for…" any more. Also spells out both interop failure
modes (pnpm → pacquet and pacquet → pnpm) so a future reader
understands why the write side couldn't just stay on `rmp_serde`.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/store-dir/src/store_index.rs Outdated
Comment thread crates/cli/tests/pnpm_compatibility.rs Outdated
@github-actions

github-actions Bot commented Apr 23, 2026

Copy link
Copy Markdown

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 3.378 ± 0.038 3.302 3.423 1.00
pacquet@main 3.387 ± 0.099 3.297 3.647 1.00 ± 0.03
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 3.3783574037,
      "stddev": 0.03816034891649137,
      "median": 3.3881653729,
      "user": 0.15992965999999997,
      "system": 0.20683638,
      "min": 3.3022906794,
      "max": 3.4234185154,
      "times": [
        3.3542556734,
        3.4234185154,
        3.4104485134,
        3.3022906794,
        3.3929876124000002,
        3.4175035734,
        3.3381832024,
        3.3914535354,
        3.3681555214,
        3.3848772104
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 3.3871463393,
      "stddev": 0.09919247718343524,
      "median": 3.3707923999,
      "user": 0.15706785999999998,
      "system": 0.20469468000000002,
      "min": 3.2966378924,
      "max": 3.6465850054,
      "times": [
        3.3260606404,
        3.3983405754000002,
        3.6465850054,
        3.3213373174,
        3.4286715284000002,
        3.3378377714,
        3.3744078624,
        3.3683759164000002,
        3.3732088834,
        3.2966378924
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

Both doc blocks still read as if pacquet had only just started
writing msgpackr-records and still mostly writes plain msgpack — the
opposite of the post-PR reality. Rephrase both so:

- `StoreIndex::get` opens with "rows come in three flavours" and
  lists pnpm-written, current-pacquet-written, and legacy-pacquet
  plain-msgpack, in that order.
- The `pnpm_reads_pacquet_written_rows` regression-test header leads
  with the post-PR state ("pacquet now writes msgpackr-records") and
  explains the old encoding as the regression to guard against,
  rather than describing it as the current behaviour.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/store-dir/src/msgpackr_records.rs Outdated
Previous encoder hardcoded one slot per record type with fixed 4-field
/ 2-field schemas, using `nil` for absent optional fields
(`checkedAt`, `added`, `deleted`). Produced bytes that *decoded* fine
but had a subtly different JS shape than msgpackr's own output —
`{added: Map, deleted: null}` instead of `{added: Map}`, for
instance. pnpm's own readers are nil-tolerant (`?? 0` / truthy
checks), so the actual on-disk effect today is zero, but third-party
tools that inspect the index might not be, and we shouldn't write
bytes that differ from what msgpackr would emit for the same input.

New encoder follows msgpackr's actual strategy (verified against
1.11.8 in the v11 worktree): each distinct **shape** of an inner
record type gets its own slot. A `CafsFileInfo { checked_at: Some }`
gets slot 0x41, `CafsFileInfo { checked_at: None }` gets 0x42,
`SideEffectsDiff { added: Some, deleted: None }` would then get 0x43,
etc. Same-shape instances downstream collapse to a single bare-slot
byte (still the compression win records exist for), while mixed-shape
streams get as many slots as they need without any one slot being
redefined mid-stream.

State:

    struct EncodeState {
        cafs_slots: [Option<u8>; 2],          // indexed by cafs_shape
        side_effects_slots: [Option<u8>; 4],  // indexed by side_effects_shape
        next_slot: u8,
    }

Fixed-size arrays instead of `HashMap` because each record type has at
most 4 possible shapes — a 2-slot `Option<u8>` array is cache-local,
zero-alloc, and measurably faster than `HashMap::get` on the hot
per-file path. The 0x40..=0x7f slot range is 64 entries so we're
nowhere near running out. `allocate_slot` panics if we did, which
would mean >63 distinct record shapes in one stream; not possible for
current pacquet code.

Perf + size unchanged for the common uniform-shape case (every
`CafsFileInfo` has `checkedAt`) — same one def, same N-1 references.
Mixed shapes pay the def cost once per new shape and reuse thereafter,
exactly as msgpackr does. Re-ran the bench: ~2× faster than
`rmp_serde::to_vec_named` and 14% smaller output at 1000 files, same
numbers as the previous iteration within measurement noise.

Tests:

- Renamed `encode_handles_missing_checked_at_as_nil` →
  `encode_omits_checked_at_when_none` and updated the assertion: no
  `checkedAt` field name should appear in the output at all when
  every instance has `None`.
- `encode_allocates_separate_slots_for_distinct_cafs_shapes`: mix
  with/without-`checkedAt` files and assert exactly three record
  defs (outer + the two inner shapes).
- `encode_side_effects_with_only_added_omits_deleted_field`: the
  specific shape Copilot flagged — `SideEffectsDiff { added: Some,
  deleted: None }` must not emit a `deleted` field name.
- `encode_allocates_separate_slots_for_distinct_side_effects_shapes`:
  two `SideEffectsDiff`s with distinct shapes land in separate slots.

`write_nil` is now unused (no more nil-for-absent fields) and removed.
@zkochan zkochan requested a review from Copilot April 23, 2026 23:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/store-dir/src/msgpackr_records.rs Outdated
Comment thread crates/store-dir/src/msgpackr_records.rs Outdated
Comment thread crates/store-dir/src/msgpackr_records.rs Outdated
Doc block on `encode_package_files_index` still described the old
fixed-schema implementation: single slot per Rust type, `[digest,
mode, size, checkedAt]` with `nil`, `[added, deleted]` with `nil`.
With shape-aware slot allocation in 9296125, none of that is true
anymore.

Rewrite the **Slot allocation** and **Optional-field handling**
sections so they match the code: `0x40` is fixed for
`PackageFilesIndex`, `0x41..=0x7f` are allocated lazily one-per-shape
in first-seen order, optional fields are omitted from the schema
rather than padded with `nil`. Spell out the pnpm-side consequence —
a `SideEffectsDiff { added: Some, deleted: None }` now decodes to
`{ added: Map }` not `{ added: Map, deleted: null }`.

Also fix the stale `EncodeState::slot_for` link on
`PKG_FILES_INDEX_SLOT` (the helper is `allocate_slot`) and drop the
intra-doc links to the private `cafs_shape` / `side_effects_shape`
fns that were triggering rustdoc private-item warnings.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/store-dir/src/msgpackr_records.rs Outdated
Comment thread crates/cli/tests/pnpm_compatibility.rs
Two stale-docs follow-ups:

- The encoder rustdoc said pacquet's output is "byte-identical" to
  msgpackr's. Not true — `HashMap` iteration order is
  nondeterministic, so `files` / `sideEffects` / `added` entries
  may be serialised in a different order from msgpackr's JS `Map`
  iteration. Soften to "wire-compatible" (same record schemas, same
  slot numbers, same value encodings; pnpm decodes the same JS
  shape) and explicitly call out that exact bytes can still differ.
- The `same_index_file_contents` test header above
  `pnpm_compatibility.rs` still described pacquet as writing plain
  msgpack via `rmp_serde::to_vec_named`. Rewrite to "both tools
  now write msgpackr records; this test compares post-decode
  structs, not byte-identical rows".

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/store-dir/src/msgpackr_records.rs Outdated
Top-of-file Strategy section said the write side "allocates a slot
per Rust struct type". Not true anymore — a single type can consume
multiple slots when optional-field presence varies, because each
distinct record shape gets its own slot. Reword so the module intro
doesn't imply a 1:1 type↔slot mapping.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/store-dir/src/msgpackr_records.rs Outdated
`EncodeState::allocate_slot` used `assert!` to guard against running
out of the 0x41..=0x7f inner-slot range. Unreachable today (pacquet
has at most 6 possible record shapes across `CafsFileInfo` and
`SideEffectsDiff`; 63 slots available), but `assert!` panics in
release, which is the wrong failure mode for a corrupted-or-unusual
payload — a typed error gets propagated cleanly up through
`StoreIndex::set` to the caller.

- New `EncodeError::OutOfRecordSlots { max }` variant.
- `allocate_slot` now returns `Result<u8, EncodeError>`.
- Propagated `?` through `encode_cafs_file_info` and
  `encode_side_effects_diff` (both were previously `-> ()`).
- New `allocate_slot_returns_error_past_0x7f` test exhausts the
  inner-slot range directly and asserts the 64th allocation yields
  the typed error rather than panicking.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/store-dir/src/msgpackr_records.rs
`encode_pkg_files_index_value` was writing the top-level record
schema as `[algo, files, requiresBuild?, sideEffects?]`, with
optional fields tacked onto the end. The existing
`decodes_requires_build_true` fixture in this same module was
captured from msgpackr 1.11.8 and shows its schema order is
`[algo, requiresBuild, files]` — i.e. msgpackr picks up pnpm's
TypeScript declaration order (`manifest?`, `requiresBuild?`, `algo`,
`files`, `sideEffects?`) as the JS object's insertion order.

Reorder emission to match:

- schema: `[requiresBuild?, algo, files, sideEffects?]` (still
  skipping `manifest` until it's wired through).
- values: same order.

Pnpm decodes either order because records carry field names with
the schema, so this was never a correctness bug — but wire-level
diffing against a pnpm-written reference row is a routine debugging
exercise, and field-order drift makes that harder than it should be.

New `encode_outer_field_order_matches_msgpackr` test decodes the
outer schema's fixstr field-name array out of the encoder's bytes
and asserts `["requiresBuild", "algo", "files"]`, so any future
schema drift trips it.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/store-dir/src/msgpackr_records.rs
Comment thread crates/store-dir/src/msgpackr_records.rs Outdated
Comment thread crates/store-dir/src/msgpackr_records.rs Outdated
Two real catches from the reviewer, plus one preemptive fix:

1. Outer field order was wrong again. My previous "fix" put
   `requiresBuild` *before* `algo`, but the
   `decodes_requires_build_true` fixture right here in the module
   shows msgpackr actually emits `[algo, requiresBuild, files]` —
   `requiresBuild` goes **between** `algo` and `files`, not at the
   start. (The `decodes_side_effects` fixture corroborates the
   corresponding `sideEffects` position after `files`.) Fix the
   schema + value order to `[algo, requiresBuild?, files,
   sideEffects?]` and update
   `encode_outer_field_order_matches_msgpackr` to match.
2. `write_array_header`, `write_map_header`, `write_str` all cast
   a `usize` length to `u32` unconditionally when emitting the 32-
   bit variant. On 64-bit hosts a `usize > u32::MAX` would silently
   truncate to a corrupt length prefix. Swap to
   `u32::try_from(n).expect(...)` so overflow panics with a clear
   message instead. Not reachable for any realistic pacquet
   payload (no tarball ships with 4B+ files), but the silent
   corruption mode is worse than a loud crash if it's ever hit.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@zkochan zkochan merged commit 90be4f6 into main Apr 23, 2026
17 checks passed
@zkochan zkochan deleted the msgpackr-records-encoder branch April 23, 2026 23:58
zkochan added a commit that referenced this pull request Apr 24, 2026
Mirror the tracing change from the index-open call sites: when the
blocking cache-lookup task returns `Err(JoinError)`, emit a
`tracing::warn!` with the error and the `cache_key` before degrading
to `None`. Panic / cancellation stays diagnosable; the caller still
falls through to a fresh download.

Also drop the stale "msgpackr-records decoding is a follow-up"
wording from the cache-lookup rationale — `StoreIndex::get` has
decoded pnpm-written msgpackr-records since 0ea1252 (#251), and
`transcode_to_plain_msgpack` has handled it on the read side since
90be4f6 (#252). Simplify to "an undecodable entry … falls through
to the download path" so the comment reflects today's decoder.
zkochan added a commit that referenced this pull request Apr 24, 2026
…261)

* perf(store-dir): share one read-only StoreIndex across cache lookups

`DownloadTarballToStore::load_cached_cas_paths` previously opened a
fresh SQLite `Connection::open_with_flags` + `PRAGMA busy_timeout` on
every snapshot, so a 1352-package frozen-lockfile install paid 1352
connection-opens just to read cache hints — all serialized on tokio's
blocking pool because `spawn_blocking` can't parallelize opens against
one file.

Open a `StoreIndex` once per install, wrap it in `Arc<Mutex<…>>`
(rusqlite `Connection` is `Send` but not `Sync`), and thread the handle
through `CreateVirtualStore`, `InstallWithoutLockfile`, and
`DownloadTarballToStore`. Each lookup now takes the mutex just for the
sub-millisecond `SELECT`, releases it before the per-file validation
pass, and frees the Arc when the install finishes.

Measured on the #260 repro (1352 packages, release build, macOS APFS,
warm CAS store): `rm -rf node_modules && pacquet install --frozen-
lockfile` drops from ~17.4 s to ~13.9 s, with CPU utilization rising
from ~135 % to ~160 % as the serialized-open bottleneck clears.

* address review feedback

* `create_virtual_store.rs`: bind `store_index.as_ref()` to a local and
  use `async move` so the closure's copy of the `Option<&…>` is moved
  into each future, rather than borrowing from the closure's
  per-iteration scope.
* `load_cached_cas_paths`: treat a poisoned index mutex as a cache miss
  rather than propagating the panic. The `SELECT` is stateless, so the
  prior panic can't have left the index in an inconsistent shape, and
  cache lookups are a best-effort hint — falling through to a fresh
  download is strictly safer than crashing every subsequent snapshot.

* park StoreIndex open on the blocking pool

Copilot flagged that `StoreIndex::shared_readonly_in` does synchronous
SQLite work (`Connection::open_with_flags` + `PRAGMA busy_timeout`) and
calling it directly in an `async fn` stalls the tokio reactor thread
for the duration of the open. It's once per install, not a hot spot,
but matches the pattern the store-index writer already uses and costs
us nothing to keep consistent.

Wrap both call sites (frozen-lockfile and no-lockfile install paths)
in `tokio::task::spawn_blocking` and await the result before kicking
off concurrent package work. Also moves `tokio` from `dev-dependencies`
into `dependencies` in package-manager's `Cargo.toml` since it's now a
runtime dep for the library code, not just tests.

* degrade JoinError on StoreIndex open to a cache miss

`spawn_blocking(...).await` returns `Result<_, JoinError>` which fires
on blocking-task panic or on cancellation during runtime shutdown.
Panicking on that turns a transient runtime issue into a hard crash
for an install that could otherwise succeed — cache lookups just miss.

Drop the `.expect(...)` and degrade to `None` via `.ok().flatten()`.
`shared_readonly_in` already yields `None` for a first-time install
against an empty store, and every downstream caller already handles
that shape, so this is the same failure mode the code already copes
with — just reached from a different source.

* use async move in install_dependencies_from_registry map closure

Makes the recursive no-lockfile closure's captures explicit: bind
`&node_modules_path` to a local `node_modules_path_ref` (Copy) and
switch to `async move` so every captured variable is moved (copied,
for all the `&T` and `Option<&T>` captures) into the future instead
of borrowing the closure's per-iteration stack frame.

Matches the pattern already used in `create_virtual_store.rs` and in
the outer no-lockfile closure — keeps all three parallel-map sites
consistent.

* log JoinError before degrading to cache miss on StoreIndex open

The silent `.ok().flatten()` hid two failure modes behind a cache miss:
a panic inside the blocking task, and task cancellation during runtime
shutdown. Neither is a crash-worthy error, but a blocking-task panic
during install really should be visible somewhere.

Promote the two call sites to a `match` and emit a `tracing::warn!` on
the `Err(JoinError)` branch before falling back to `None`. Uses the
`pacquet::install` target in common with the rest of this crate's
tracing. Degradation path is unchanged.

* chore: cargo fmt

* log JoinError in tarball cache lookup, refresh stale comment

Mirror the tracing change from the index-open call sites: when the
blocking cache-lookup task returns `Err(JoinError)`, emit a
`tracing::warn!` with the error and the `cache_key` before degrading
to `None`. Panic / cancellation stays diagnosable; the caller still
falls through to a fresh download.

Also drop the stale "msgpackr-records decoding is a follow-up"
wording from the cache-lookup rationale — `StoreIndex::get` has
decoded pnpm-written msgpackr-records since 0ea1252 (#251), and
`transcode_to_plain_msgpack` has handled it on the read side since
90be4f6 (#252). Simplify to "an undecodable entry … falls through
to the download path" so the comment reflects today's decoder.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants