Skip to content
This repository was archived by the owner on May 14, 2026. It is now read-only.

feat: side-effects cache WRITE path (#421 slice B)#424

Merged
zkochan merged 17 commits into
mainfrom
feat/side-effects-cache-write-path-421
May 12, 2026
Merged

feat: side-effects cache WRITE path (#421 slice B)#424
zkochan merged 17 commits into
mainfrom
feat/side-effects-cache-write-path-421

Conversation

@zkochan

@zkochan zkochan commented May 12, 2026

Copy link
Copy Markdown
Member

Summary

Implements slice (B) of #421 — the WRITE path for pnpm's side-effects cache. With this PR, a successful postinstall now re-CAFSes the built package directory, diffs against the pristine PackageFilesIndex.files row, and mutates the row's side_effects map so a future install (pacquet or pnpm) can skip the rebuild via the READ path landed in #423.

User-visible behavior: lifecycle scripts that produce build artifacts (node-gyp outputs, generated source files, etc.) get cached the first time the install runs; subsequent installs against the same lockfile + Node major skip the script and reuse the cached overlay.

What lands

  • pacquet_store_dir::add_files_from_dir — directory walker that re-CAFSes a post-build package directory. Mirrors pnpm/pnpm@7e3145f9fc:store/cafs/src/addFilesFromDir.ts:16-194 + worker/src/start.ts:312-383. Containment: dunce::canonicalize each symlink target, drop if out of root; read/recurse via the resolved canonical path so retargeting a symlink between containment check and read can't smuggle out-of-root data into CAFS. Skips top-level node_modules (matches upstream includeNodeModules: false). Cycle-safe via a recursion-stack visited: HashSet<PathBuf> (entries inserted on descend, removed on return).
  • pacquet_store_dir::upload + calculate_diffupload(store_dir, built_pkg_location, files_index_file, cache_key, writer) walks the package via add_files_from_dir and queues a WriteMsg::SideEffectsUpload { key, cache_key, current_files } to the writer task. The actual R/M/W of the existing PackageFilesIndex row happens inside the writer task (see below), so concurrent uploads to the same row stay commutative.
  • StoreIndexWriter now drives a WriteMsg enum (Replace vs SideEffectsUpload). The batch loop coalesces per-key updates into one PackageFilesIndex value before flushing — multiple side-effects uploads for the same row apply in arrival order against a single in-memory state instead of racing against a stale read on a separate readonly handle. The R/M/W (load existing row → check algo == HASH_ALGORITHM = "sha512"calculate_diff → insert into side_effects map) happens inside the writer's transaction. Loading short-circuits with a debug! / warn! when the base row is missing or unreadable, matching upstream's if (!existingFilesIndex) return / ALGO_MISMATCH bail-outs at pnpm/pnpm@7e3145f9fc:worker/src/start.ts:342-371.
  • Config.side_effects_cache_readonly: bool (default false) — mirrors pnpm's sideEffectsCacheReadonly. Two derived helpers consolidate the gate semantics so precedence stays single-sourced:
    • Config::side_effects_cache_read() = cache || readonly
    • Config::side_effects_cache_write() = cache && !readonly
  • BuildModules WRITE-path call site — after run_postinstall_hooks returns Ok, if side_effects_cache_write is on and StoreIndexWriter + StoreDir are both threaded in, calls pacquet_store_dir::upload(...) with the snapshot's store_index_key(integrity, pkg_id) and the calc_dep_state cache key. All upload errors are swallowed with tracing::warn!, mirroring upstream's try { upload } catch { logger.warn } at pnpm/pnpm@7e3145f9fc:building/during-install/src/index.ts:208-215.
  • build_deps_subgraph — the cache-hashing dep graph is now bounded to the forward closure of requires_build snapshots. Pure-JS installs feed in an empty root iterator and the function returns immediately; installs with buildable deps walk only what calc_dep_state will actually traverse. Upstream's lockfileToDepGraph builds the full graph because its consumers extend beyond cache hashing; pacquet's graph is consumed only by calc_dep_state today, so the closure-bounded walk produces byte-identical cache keys with strictly less work. Restored the cold-install benchmark to parity with main.
  • StoreIndexWriter hoisted up to InstallFrozenLockfile::run — previously spawned + drained inside CreateVirtualStore::run; now it spans both the prefetch/download phase and the build phase so the WRITE-path upload can queue through it. The final batch flushes after BuildModules::run returns.
  • Byte-stable msgpack encodingencode_package_files_index now iterates files, side_effects, and SideEffectsDiff.added in sorted-key order so two pacquet writes of the same logical row produce identical bytes. Was prompted by a review on PR feat: side-effects cache WRITE path (#421 slice B) #424 pointing out that HashMap iteration randomization would otherwise make every row look "changed" on byte-diff even when content was identical.
  • cache_key refactor in BuildModulescalc_dep_state now fires once per snapshot (before the gate check) so both the READ gate and the new WRITE site read the same value.

Tests

  • add_files_from_dir::tests (9 cases): top-level files, nested forward-slash paths, exec-bit propagation, top-level-vs-nested node_modules, symlink-out-of-root drop, symlink-in-root follow, directory-cycle termination, missing-root error.
  • upload::tests (6 cases): identical / added-only / deleted-only / digest change / mode change / mixed — covers every branch of upstream's calculateDiff.
  • deps_graph::tests (new for slice B): empty-roots → empty subgraph; forward-closure inclusion + exclusion; cycle termination.
  • workspace_yaml::tests: sideEffectsCacheReadonly parses + applies; the 4-state truth table for side_effects_cache_read() / side_effects_cache_write().
  • build_modules::tests (new):
    • write_path_populates_side_effects_row — full WRITE-path: pre-seed a base row, run a postinstall that creates generated.txt, drain the writer task, assert side_effects[cache_key].added contains generated.txt and NOT pristine index.js. Mirrors the spirit of upstream's 'a postinstall script does not modify the original sources added to the store' at pnpm/pnpm@7e3145f9fc:sideEffects.ts:189-223.
    • write_path_disabled_skips_upload — same scenario with side_effects_cache_write: false → row's side_effects stays None. Counterpart of upstream's gate on sideEffectsCacheWrite.
    • upload_error_does_not_interrupt_install — postinstall produces a 0-permission file in the package dir; the walker fails on fs::read with EACCES, surfacing as UploadError::AddFilesFromDir(_) which BuildModules swallows with tracing::warn!. Asserts (a) the postinstall-created generated.txt is on disk and (b) the pre-seeded base row's side_effects stays None. Mirrors 'uploading errors do not interrupt installation' at pnpm/pnpm@7e3145f9fc:sideEffects.ts:166-187.

Out of scope

  • Patch-file hash in cache key — depends on patchedDependencies (Lifecycle script subsystem: gaps vs upstream pnpm #397 item 9). Today's WRITE side stamps patch_file_hash: None; same as the existing READ side.
  • requires_build recompute in the R/M/W path. Upstream recomputes from (manifest, filesMap) when the existing row's field is None; pacquet leaves the field as-is (cold-batch downloads already populate it with a real value).
  • GVS / hoisted-node-linker variants of the upstream tests.

Test plan

  • just ready (clippy clean, all tests pass)
  • taplo format --check (clean — just ready doesn't run this, CI does)
  • CI green on Linux / macOS / Windows

Written by an agent (Claude Code, claude-opus-4-7).

zkochan added 7 commits May 12, 2026 16:03
Compute `calc_dep_state` once per snapshot and store the resulting
`Option<String>` before the `is_built` gate check. Pure refactor —
the gate-side comparison still uses the same value the previous
inline call produced. Routes the value to one binding so a future
WRITE-path upload site can reuse the exact key the gate consumed,
provably identical because both reads come from the same `let`.

`deps_state_cache` makes a hypothetical second call free, but
funneling through one site keeps the cache-key composition
single-sourced.
Adds `Config.side_effects_cache_readonly: bool` (default `false`)
and the corresponding `pnpm-workspace.yaml`'s
`sideEffectsCacheReadonly` field via `WorkspaceSettings`. Mirrors
upstream's pnpm/pnpm@7e3145f9fc:config/reader/src/Config.ts:124.

Two helper methods on `Config` consolidate the gate semantics so
the precedence stays single-sourced:

- `side_effects_cache_read()`  = `cache || readonly`
- `side_effects_cache_write()` = `cache && !readonly`

Upstream's derivation uses JS `??` (nullish coalesce on tri-state
optional values) at config/reader/src/index.ts:614-615; pacquet
collapses that to two booleans since `Config` uses eager defaults
rather than `Option<bool>`. The user-intuitive semantics — setting
`sideEffectsCacheReadonly: true` actually blocks writes — match
the flag name, with a small drift from upstream when *both* flags
are explicitly set (documented inline on `side_effects_cache_write`).
Adds `pacquet_store_dir::add_files_from_dir(store_dir, pkg_root)` —
walks a (post-build) package directory, writes every regular file
into CAFS via `StoreDir::write_cas_file`, and returns an `AddedFiles
{ files }` map keyed by forward-slash relative path. Ports
pnpm/pnpm@7e3145f9fc:store/cafs/src/addFilesFromDir.ts:16-194 +
worker/src/start.ts:312-383.

Mirrors upstream's containment + cycle defenses:
- Symlinks: `dunce::canonicalize` the target, drop if it resolves
  outside `pkg_root` (`isSubdir(rootDir, realPath)`).
- Top-level `node_modules` skipped (`includeNodeModules` defaults
  to `false` upstream); nested ones walk normally.
- `visited: HashSet<PathBuf>` carries canonical directory paths
  so a `sub/up -> ..` loop terminates.

Pulls `walkdir` and `dunce` into `pacquet-store-dir/Cargo.toml`
from the workspace defaults.

Unit tests cover: top-level files, nested forward-slash paths,
exec-bit propagation, top-level-vs-nested node_modules,
out-of-root-symlink drop, in-root-symlink follow, directory
cycle termination, and missing-root error surfacing.

Wire-up to `BuildModules` lands in subsequent commits along with
the `calculate_diff` helper and the upload wrapper.
Adds `pacquet_store_dir::upload` and `pacquet_store_dir::calculate_diff`.

`upload(store_dir, built_pkg_location, files_index_file,
side_effects_cache_key, writer)` ports
pnpm/pnpm@7e3145f9fc:store/controller/src/storeController/index.ts:90-99
and the worker-side body at
pnpm/pnpm@7e3145f9fc:worker/src/start.ts:342-371. Re-hashes the
post-build package dir via `add_files_from_dir`, opens a fresh
read-only `StoreIndex`, fetches the existing row by
`files_index_file`, bails `Ok(())` if missing (matches upstream's
`if (!existingFilesIndex) return`), validates the row's `algo`
matches `HASH_ALGORITHM = "sha512"` (matches upstream's
`ALGO_MISMATCH` guard), inserts the diff into the row's
`side_effects` map, and re-queues through `StoreIndexWriter::queue`.

`calculate_diff` ports the set-difference helper at
pnpm/pnpm@7e3145f9fc:worker/src/start.ts:411-434. Returns
`SideEffectsDiff { added, deleted }` with `Option<...>` fields so
empty sides elide from msgpack the same way they do upstream.

Unit tests cover identical/added-only/deleted-only/digest-change/
mode-change/mixed cases — the upstream test vectors for the helper.
Adds the WRITE-path call site in `BuildModules`. After a successful
`run_postinstall_hooks`, if `side_effects_cache_write` is on and a
`StoreIndexWriter` + `StoreDir` are threaded in, `BuildModules` calls
`pacquet_store_dir::upload(...)` with the snapshot's
`store_index_key(integrity, pkg_id)` and the cache key computed by
`calc_dep_state`. Upload errors are swallowed with `tracing::warn!`,
mirroring upstream's `try { upload } catch { logger.warn }` at
pnpm/pnpm@7e3145f9fc:building/during-install/src/index.ts:208-215.

`cache_gate_active` now also opens up for the WRITE side so the dep
graph gets built even when no READ-side overlay is present, ensuring
`cache_key` is non-None when WRITE is enabled.

New `BuildModules` fields are all optional / bool-with-default-false
so existing callers (and the 6 existing test sites) need only the
three new field initializers; behavior is unchanged when WRITE is
off.

Tests:
- `write_path_populates_side_effects_row`: pre-seeds a base
  `PackageFilesIndex` row, runs a postinstall that creates
  `generated.txt`, drains the writer task, re-reads the row, and
  asserts `side_effects[cache_key].added` contains `generated.txt`
  and NOT pristine `index.js`. Mirrors the spirit of upstream's
  `'a postinstall script does not modify the original sources
  added to the store'` test at sideEffects.ts:189-223.
- `write_path_disabled_skips_upload`: same scenario with
  `side_effects_cache_write: false` — row's `side_effects` stays
  `None`. Counterpart of upstream's gate on `sideEffectsCacheWrite`.
The batched store-index writer was spawned + drained inside
`CreateVirtualStore::run`. The side-effects-cache WRITE path in
`BuildModules` now also needs to queue rows through that same writer,
which means it has to outlive `CreateVirtualStore::run`.

This commit hoists the `spawn` and the `drop + await` calls up to
`InstallFrozenLockfile::run`, threads the `&Arc<StoreIndexWriter>`
into both `CreateVirtualStore` (existing consumer) and `BuildModules`
(new consumer), and keeps the existing best-effort error semantics
(`warn!` + continue).

Behaviour is unchanged for the prefetch / cold-batch path. The
final batch now flushes after `BuildModules::run` returns, so any
WRITE-path mutations queued during the build phase land before
the install closes out.
Adds `upload_error_does_not_interrupt_install`, mirroring upstream's
`'uploading errors do not interrupt installation'` at
pnpm/pnpm@7e3145f9fc:installing/deps-installer/test/install/sideEffects.ts:166-187.

Upstream stubs `opts.storeController.upload` to throw; pacquet
reaches the same property by handing `BuildModules` a `store_dir`
with no `index.db`. The WRITE path's `StoreIndex::open_readonly_in`
returns an error, `upload` surfaces it as `UploadError::OpenIndex`,
`BuildModules` swallows it with a `tracing::warn!` and proceeds —
matching upstream's `try { upload } catch { logger.warn }` shape.

The postinstall-generated artifact on disk is the proof that the
build ran end-to-end past the swallowed upload error.

The upstream `'a postinstall script does not modify the original
sources added to the store'` test is already covered by
`write_path_populates_side_effects_row` (this PR's earlier commit),
which asserts the post-build diff carries the new file and the
pristine file's digest is unchanged — same property as upstream's
"original sources unchanged" assertion against `getFilePathByModeInCafs`.
Copilot AI review requested due to automatic review settings May 12, 2026 14:24
@coderabbitai

coderabbitai Bot commented May 12, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR implements the complete side-effects cache write path: Config introduces readonly and read/write gate helpers; new store-dir modules provide file indexing and diff computation; StoreIndexWriter gains side-effects upload coalescing; the writer is threaded through CreateVirtualStore/BuildModules/Install orchestration; tests cover gates, indexing, diffs, and upload resilience.

Changes

Side-effects cache write path

Layer / File(s) Summary
Config side-effects cache gates
crates/config/src/lib.rs, crates/config/src/workspace_yaml.rs, crates/config/src/workspace_yaml/tests.rs, crates/package-manager/src/install_package_from_registry/tests.rs
Config gains side_effects_cache_readonly and helper methods side_effects_cache_read()/side_effects_cache_write() derived from both flags. WorkspaceSettings gains side_effects_cache_readonly and applies it to Config. Tests validate YAML decoding and the read/write truth table; a registry-install test helper was updated.
Store-dir side-effects upload infrastructure
crates/store-dir/Cargo.toml, crates/store-dir/src/add_files_from_dir.rs, crates/store-dir/src/add_files_from_dir/tests.rs, crates/store-dir/src/upload.rs, crates/store-dir/src/upload/tests.rs, crates/store-dir/src/lib.rs
New add_files_from_dir traverses package directories, writes files to CAFS, preserves exec bits, safely follows in-root symlinks, and returns an AddedFiles map. upload re-hashes built packages, computes deterministic SideEffectsDiff (sha512), and enqueues updates via StoreIndexWriter::queue_side_effects_upload. Both modules include tests and are re-exported; dunce was added as a dependency.
StoreIndexWriter: typed queue and side-effects upload
crates/store-dir/src/store_index.rs
Writer channel uses a WriteMsg enum and coalesces per-key mutations in a pending map. SideEffectsUpload messages load base rows (pending or SQLite), compute diffs against current files, insert side_effects, and flush coalesced rows with index.set_many. Adds queue_side_effects_upload and stable serialization for side-effects maps.
CreateVirtualStore wiring
crates/package-manager/src/create_virtual_store.rs
CreateVirtualStore now accepts a caller-owned store_index_writer reference and no longer spawns/owns the writer task; the writer is passed into warm/cold install flows.
Install orchestration: writer lifecycle
crates/package-manager/src/install_frozen_lockfile.rs
InstallFrozenLockfile::run spawns the StoreIndexWriter at start, threads the writer into CreateVirtualStore and BuildModules, and awaits the writer task at the end, logging (but not failing on) join errors.
BuildModules read/write gating and postinstall upload
crates/package-manager/Cargo.toml, crates/package-manager/src/build_modules.rs, crates/package-manager/src/build_modules/tests.rs
BuildModules gains side_effects_cache_write, store_dir, and store_index_writer. Read/write gates cause dependency-graph/state-cache construction when either gate is enabled; per-snapshot cache keys are computed once and used for hit-checking. On cache hit the build is skipped; after successful postinstall the write path re-CAFSes the package, computes diffs, and enqueues a side-effects upload; upload/open-index failures are logged and ignored. Tests updated and new write-path tests validate behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

  • pnpm/pacquet#420: Prior refactor of Config/workspace wiring that this change builds upon.
  • pnpm/pacquet#423: Related package-manager install/build flow changes overlapping BuildModules/CreateVirtualStore wiring.
  • pnpm/pacquet#422: Earlier side-effects cache feature changes to config/workspace that this PR extends.

Suggested reviewers

  • anthonyshew

Poem

🐰 I tunneled through configs and store—

Read and write gates, and hashes galore.
A postinstall nibble, a diff to queue,
The writer hums, the index grew.
Hop, upload, and bless this build anew.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: implementing the WRITE path for the side-effects cache feature, which is the primary focus of this PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering all major changes, test coverage, and out-of-scope items.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/side-effects-cache-write-path-421

Comment @coderabbitai help to get the list of available commands and usage tips.

`crate::upload` and `add_files_from_dir` each name both a module
and a function inside it; rustdoc's intra-doc-link resolver flags
the bare ``[`name`]`` form as ambiguous under `-D warnings`. Adding
the `()` suffix routes each link to the function explicitly,
matching the same fix from PR #423.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/package-manager/src/install_frozen_lockfile.rs (1)

158-171: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass the read gate here, not the raw cache flag.

BuildModules now gets a separate write gate, but the read-side field is still wired to config.side_effects_cache. That breaks the new readonly mode: side_effects_cache=false + side_effects_cache_readonly=true should still consume cached side effects, but this path will force a rebuild.

Suggested fix
         let ignored_builds = BuildModules {
             virtual_store_dir: &config.virtual_store_dir,
             modules_dir: &config.modules_dir,
             lockfile_dir: manifest_dir,
             snapshots,
             packages,
             importers,
             allow_build_policy: &allow_build_policy,
             side_effects_maps_by_snapshot: Some(&side_effects_maps_by_snapshot),
             engine_name: engine_name.as_deref(),
-            side_effects_cache: config.side_effects_cache,
+            side_effects_cache: config.side_effects_cache_read(),
             side_effects_cache_write: config.side_effects_cache_write(),
             store_dir: Some(&config.store_dir),
             store_index_writer: Some(&store_index_writer),
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/package-manager/src/install_frozen_lockfile.rs` around lines 158 -
171, The BuildModules initializer is still using config.side_effects_cache for
the read gate, breaking readonly mode; change the read-side argument to the
dedicated read gate (e.g. pass config.side_effects_cache_readonly) or, to
preserve current semantics, pass the OR of both flags (config.side_effects_cache
|| config.side_effects_cache_readonly) into the BuildModules read field (refer
to the BuildModules struct construction and the side_effects_cache_read /
side_effects_cache fields) while keeping side_effects_cache_write as-is.
🧹 Nitpick comments (2)
crates/config/src/workspace_yaml/tests.rs (1)

169-189: ⚡ Quick win

Prefer assert_eq! for the truth-table cases.

These bare assert! calls won't show which (read, write) pair failed. Folding each case into one assert_eq! on the expected tuple keeps the coverage and makes failures self-describing.

♻️ Suggested rewrite
 fn side_effects_cache_gates_truth_table() {
     let mut config = Config::new();
-    assert!(config.side_effects_cache_read());
-    assert!(config.side_effects_cache_write());
+    assert_eq!(
+        (config.side_effects_cache_read(), config.side_effects_cache_write()),
+        (true, true),
+    );

     config.side_effects_cache = false;
     config.side_effects_cache_readonly = false;
-    assert!(!config.side_effects_cache_read());
-    assert!(!config.side_effects_cache_write());
+    assert_eq!(
+        (config.side_effects_cache_read(), config.side_effects_cache_write()),
+        (false, false),
+    );

     config.side_effects_cache = true;
     config.side_effects_cache_readonly = true;
-    assert!(config.side_effects_cache_read());
-    assert!(!config.side_effects_cache_write());
+    assert_eq!(
+        (config.side_effects_cache_read(), config.side_effects_cache_write()),
+        (true, false),
+    );

     config.side_effects_cache = false;
     config.side_effects_cache_readonly = true;
-    assert!(config.side_effects_cache_read());
-    assert!(!config.side_effects_cache_write());
+    assert_eq!(
+        (config.side_effects_cache_read(), config.side_effects_cache_write()),
+        (true, false),
+    );
 }

Based on learnings: In Rust test code, follow the repo’s CODE_STYLE_GUIDE test-logging rule: add logging (e.g., eprintln!/eprintln!(...)) so that useful diagnostic values are printed when a test fails, unless the assertion is assert_eq!.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/config/src/workspace_yaml/tests.rs` around lines 169 - 189, The test
side_effects_cache_gates_truth_table currently uses separate assert! calls for
read/write pairs which don't show which pair failed; replace each pair of
assert!(config.side_effects_cache_read()) /
assert!(config.side_effects_cache_write()) (and their negated forms) with a
single assert_eq! comparing (config.side_effects_cache_read(),
config.side_effects_cache_write()) to the expected (bool,bool) tuple for that
case. Locate the assertions in the side_effects_cache_gates_truth_table test
(created via Config::new(), toggling config.side_effects_cache and
config.side_effects_cache_readonly) and update them to use assert_eq! so
failures are self-describing; no extra eprintln! logging is needed once
assert_eq! is used.
crates/store-dir/src/upload/tests.rs (1)

41-48: ⚡ Quick win

Make the added_only assertion self-describing.

assert!(added.contains_key("new")) gives no visibility into the computed diff on failure. Either log added first or assert on the expected key set with assert_eq!.

Based on learnings: In Rust test code, follow the repo’s CODE_STYLE_GUIDE test-logging rule: add logging (e.g., eprintln!/eprintln!(...)) so that useful diagnostic values are printed when a test fails, unless the assertion is assert_eq!.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/store-dir/src/upload/tests.rs` around lines 41 - 48, The test
added_only should be made self-describing by either printing the computed
`added` before asserting or by using an equality assertion on the expected key
set; update the test function `added_only` (which calls `calculate_diff` and
checks `diff.added`) to either emit `eprintln!("{:?}", added)` right before the
assertion or replace `assert!(added.contains_key("new"))` with an `assert_eq!`
that compares the `added` keys to the expected set (e.g., collect `added.keys()`
into a Vec/HashSet and assert it equals the single-element collection containing
"new") so failures show the actual diff.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/store-dir/src/add_files_from_dir/tests.rs`:
- Around line 169-172: The test uses a hardcoded Unix absolute path; make it
platform-neutral by creating a fresh temporary directory (e.g., via
tempfile::tempdir()) and building a non-existent path under that tempdir, then
call add_files_from_dir(&store_dir, &nonexistent_path) and assert it returns
Err(AddFilesFromDirError::CanonicalizeRoot { .. }); update the
missing_root_errors test to use make_store(), tempfile::tempdir(),
tempdir.path().join("nonexistent") (or similar) as the root path so the test is
deterministic on Windows and Unix.

In `@crates/store-dir/src/upload.rs`:
- Around line 81-104: The code performs a read-modify-write using the on-disk
index only and then calls writer.queue(files_index_file, existing), which causes
races where concurrent uploads overwrite earlier queued side_effects; fix by
merging any pending queued row for the same files_index_file from the writer
before inserting the new side_effects entry so queued mutations are accumulated.
Concretely, inside upload() after loading `existing` (the result of index.get)
and before modifying `existing.side_effects`, attempt to retrieve the
writer-owned pending entry for `files_index_file` (or otherwise ask the
StoreIndexWriter for a pending/queued row), deserialize/merge its
`.side_effects` into `existing.side_effects`, then compute `diff` and insert
into the merged map, and finally call
`writer.queue(files_index_file.to_string(), existing)`; ensure you reference
`StoreIndex::open_readonly_in`, `index.get`, `existing`,
`side_effects_cache_key`, `calculate_diff`, and `writer.queue` when locating the
code to change.

---

Outside diff comments:
In `@crates/package-manager/src/install_frozen_lockfile.rs`:
- Around line 158-171: The BuildModules initializer is still using
config.side_effects_cache for the read gate, breaking readonly mode; change the
read-side argument to the dedicated read gate (e.g. pass
config.side_effects_cache_readonly) or, to preserve current semantics, pass the
OR of both flags (config.side_effects_cache ||
config.side_effects_cache_readonly) into the BuildModules read field (refer to
the BuildModules struct construction and the side_effects_cache_read /
side_effects_cache fields) while keeping side_effects_cache_write as-is.

---

Nitpick comments:
In `@crates/config/src/workspace_yaml/tests.rs`:
- Around line 169-189: The test side_effects_cache_gates_truth_table currently
uses separate assert! calls for read/write pairs which don't show which pair
failed; replace each pair of assert!(config.side_effects_cache_read()) /
assert!(config.side_effects_cache_write()) (and their negated forms) with a
single assert_eq! comparing (config.side_effects_cache_read(),
config.side_effects_cache_write()) to the expected (bool,bool) tuple for that
case. Locate the assertions in the side_effects_cache_gates_truth_table test
(created via Config::new(), toggling config.side_effects_cache and
config.side_effects_cache_readonly) and update them to use assert_eq! so
failures are self-describing; no extra eprintln! logging is needed once
assert_eq! is used.

In `@crates/store-dir/src/upload/tests.rs`:
- Around line 41-48: The test added_only should be made self-describing by
either printing the computed `added` before asserting or by using an equality
assertion on the expected key set; update the test function `added_only` (which
calls `calculate_diff` and checks `diff.added`) to either emit
`eprintln!("{:?}", added)` right before the assertion or replace
`assert!(added.contains_key("new"))` with an `assert_eq!` that compares the
`added` keys to the expected set (e.g., collect `added.keys()` into a
Vec/HashSet and assert it equals the single-element collection containing "new")
so failures show the actual diff.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e9b17bb0-c3af-4629-a74d-ad7912473796

📥 Commits

Reviewing files that changed from the base of the PR and between 97d7439 and 2baa2c7.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • crates/config/src/lib.rs
  • crates/config/src/workspace_yaml.rs
  • crates/config/src/workspace_yaml/tests.rs
  • crates/package-manager/Cargo.toml
  • crates/package-manager/src/build_modules.rs
  • crates/package-manager/src/build_modules/tests.rs
  • crates/package-manager/src/create_virtual_store.rs
  • crates/package-manager/src/install_frozen_lockfile.rs
  • crates/package-manager/src/install_package_from_registry/tests.rs
  • crates/store-dir/Cargo.toml
  • crates/store-dir/src/add_files_from_dir.rs
  • crates/store-dir/src/add_files_from_dir/tests.rs
  • crates/store-dir/src/lib.rs
  • crates/store-dir/src/upload.rs
  • crates/store-dir/src/upload/tests.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Agent
  • GitHub Check: Lint and Test (windows-latest)
  • GitHub Check: Lint and Test (macos-latest)
  • GitHub Check: Cargo Deny
  • GitHub Check: Dylint
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Code Coverage
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Log emissions are part of 'match pnpm'—when porting a function that fires pnpm:<channel> events through globalLogger, logger.debug(...), or streamParser.write(...), mirror the call site, payload, and ordering so @pnpm/cli.default-reporter parses pacquet's NDJSON the same way
Declare a newtype wrapper for branded string types instead of collapsing into plain String or &str to preserve the distinction between validating and non-validating brands
If upstream always validates before construction of a branded string type, implement validation via TryFrom<String> and/or FromStr without providing an infallible public constructor
If upstream never validates a branded string type, expose an infallible From<String> constructor to provide type-safety without runtime validation
When upstream occasionally constructs a branded string type without validation via bare as assertion, expose a from_str_unchecked constructor so callers can opt into the same unchecked path explicitly
Match upstream serde behavior for branded types—use #[serde(try_from = "String")] for deserialization to go through the validator and #[serde(into = "String")] for serialization
Use #[derive(derive_more::From)] and #[derive(derive_more::Into)] for mechanical conversion impls on branded types rather than handwriting impl blocks
Model string-literal unions (like 'auto' | 'always' | 'never') as Rust enums instead of newtype wrappers
Template literal types (like ${string}@${string}) should be treated as branded string types with the same validation discipline applied
Follow the code style guide in CODE_STYLE_GUIDE.md covering code-level conventions not enforced by tooling: imports, modules, naming, ownership and borrowing, parameter type selection, trait bounds, pattern matching, pipe-trait, error handling, test layout, and cloning of Arc and Rc
Warnings are errors in cargo clippy --locked -- --deny warnings, and should not be silenced with `#[allow(....

Files:

  • crates/config/src/workspace_yaml.rs
  • crates/package-manager/src/install_frozen_lockfile.rs
  • crates/package-manager/src/install_package_from_registry/tests.rs
  • crates/config/src/lib.rs
  • crates/package-manager/src/build_modules.rs
  • crates/config/src/workspace_yaml/tests.rs
  • crates/store-dir/src/add_files_from_dir/tests.rs
  • crates/package-manager/src/create_virtual_store.rs
  • crates/store-dir/src/add_files_from_dir.rs
  • crates/store-dir/src/upload.rs
  • crates/store-dir/src/upload/tests.rs
  • crates/store-dir/src/lib.rs
  • crates/package-manager/src/build_modules/tests.rs
🧠 Learnings (2)
📚 Learning: 2026-05-07T23:19:08.272Z
Learnt from: KSXGitHub
Repo: pnpm/pacquet PR: 401
File: tasks/integrated-benchmark/src/work_env.rs:343-344
Timestamp: 2026-05-07T23:19:08.272Z
Learning: When reviewing Rust code in pnpm/pacquet for deprecated API usage, do not automatically treat `serde_saphyr::to_string` as deprecated. In `serde-saphyr` v0.0.25, `serde_saphyr::to_string` has no `#[deprecated]` attribute (the `#[deprecated]` later in `serde-saphyr-0.0.25/src/lib.rs` applies to a different function). Only flag `serde_saphyr::to_string` as deprecated if the resolved dependency version’s source shows `#[deprecated]` on that specific function.

Applied to files:

  • crates/config/src/workspace_yaml.rs
  • crates/package-manager/src/install_frozen_lockfile.rs
  • crates/package-manager/src/install_package_from_registry/tests.rs
  • crates/config/src/lib.rs
  • crates/package-manager/src/build_modules.rs
  • crates/config/src/workspace_yaml/tests.rs
  • crates/store-dir/src/add_files_from_dir/tests.rs
  • crates/package-manager/src/create_virtual_store.rs
  • crates/store-dir/src/add_files_from_dir.rs
  • crates/store-dir/src/upload.rs
  • crates/store-dir/src/upload/tests.rs
  • crates/store-dir/src/lib.rs
  • crates/package-manager/src/build_modules/tests.rs
📚 Learning: 2026-05-01T10:01:33.766Z
Learnt from: zkochan
Repo: pnpm/pacquet PR: 349
File: crates/reporter/src/tests.rs:121-121
Timestamp: 2026-05-01T10:01:33.766Z
Learning: In Rust test code, follow the repo’s CODE_STYLE_GUIDE test-logging rule: add logging (e.g., `eprintln!`/`eprintln!(...)`) so that useful diagnostic values are printed when a test fails, unless the assertion is `assert_eq!` (where the differing values are already included). Concretely, if you use assertions like `assert!`, `assert_ne!`, etc., ensure the test logs the relevant actual/expected values (or context) before/around the assertion so failures can be diagnosed without rerunning.

Applied to files:

  • crates/package-manager/src/install_package_from_registry/tests.rs
  • crates/config/src/workspace_yaml/tests.rs
  • crates/store-dir/src/add_files_from_dir/tests.rs
  • crates/store-dir/src/upload/tests.rs
  • crates/package-manager/src/build_modules/tests.rs

Comment thread crates/store-dir/src/add_files_from_dir.rs Outdated
Comment thread crates/store-dir/src/add_files_from_dir/tests.rs
Comment thread crates/store-dir/src/upload.rs Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the side-effects cache WRITE path so pacquet can persist postinstall-produced artifacts into the store-index, enabling subsequent installs (pacquet or pnpm) to skip rebuilds via the already-landed READ gate.

Changes:

  • Added add_files_from_dir + upload in pacquet-store-dir to re-CAFS a built package dir, compute a side-effects diff, and queue a read-modify-write update to PackageFilesIndex.side_effects.
  • Introduced Config.side_effects_cache_readonly plus side_effects_cache_read() / side_effects_cache_write() helpers to mirror pnpm’s read/write gating semantics.
  • Hoisted StoreIndexWriter to span download + build phases, and wired a best-effort upload call after successful postinstall in BuildModules, with extensive new test coverage.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
crates/store-dir/src/add_files_from_dir.rs Directory walker that re-CAFSes an on-disk package directory with symlink containment + cycle handling.
crates/store-dir/src/add_files_from_dir/tests.rs Unit tests for directory walking, node_modules skipping, symlink containment, and cycle termination.
crates/store-dir/src/upload.rs Worker-side orchestrator to diff vs pristine index row and queue side-effects cache updates; includes calculate_diff.
crates/store-dir/src/upload/tests.rs Unit tests covering calculateDiff-style cases (identical/add/delete/mode/digest/mixed).
crates/store-dir/src/lib.rs Exposes new add_files_from_dir and upload APIs from the crate.
crates/store-dir/Cargo.toml Adds dependencies needed for the new walker/upload code.
crates/config/src/lib.rs Adds side_effects_cache_readonly and helper methods for READ/WRITE gate semantics.
crates/config/src/workspace_yaml.rs Parses and applies sideEffectsCacheReadonly from pnpm-workspace.yaml.
crates/config/src/workspace_yaml/tests.rs Tests YAML parsing and the read/write truth table for cache gates.
crates/package-manager/src/install_frozen_lockfile.rs Owns StoreIndexWriter for the full install and threads it into download + build phases; awaits final flush at end.
crates/package-manager/src/create_virtual_store.rs Accepts caller-owned StoreIndexWriter instead of spawning/awaiting internally.
crates/package-manager/src/build_modules.rs Adds WRITE-path upload call site after successful postinstall; refactors cache key computation to be shared.
crates/package-manager/src/build_modules/tests.rs Adds end-to-end-ish tests for write path population, write-disabled behavior, and swallowed upload errors.
crates/package-manager/src/install_package_from_registry/tests.rs Updates test config construction for the new readonly flag.
crates/package-manager/Cargo.toml Adds sha2 as a dev-dependency for new test helpers.
Cargo.lock Lockfile updates for newly referenced crates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/package-manager/src/install_frozen_lockfile.rs Outdated
Comment thread crates/store-dir/src/upload.rs
Comment thread crates/store-dir/Cargo.toml Outdated
@codecov

codecov Bot commented May 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.05263% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.33%. Comparing base (97d7439) to head (632df98).

Files with missing lines Patch % Lines
crates/store-dir/src/store_index.rs 84.90% 8 Missing ⚠️
...tes/package-manager/src/install_frozen_lockfile.rs 81.81% 2 Missing ⚠️
crates/store-dir/src/add_files_from_dir.rs 97.93% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #424      +/-   ##
==========================================
+ Coverage   85.82%   86.33%   +0.50%     
==========================================
  Files          83       85       +2     
  Lines        5854     6123     +269     
==========================================
+ Hits         5024     5286     +262     
- Misses        830      837       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.01     16.1±0.56ms   269.7 KB/sec    1.00     15.9±0.07ms   273.5 KB/sec

- Move side-effects R/M/W into the writer task (coderabbit).
  `StoreIndexWriter` now drives a `WriteMsg` enum with `Replace`
  and `SideEffectsUpload` variants. The batch loop coalesces
  per-key updates into one `PackageFilesIndex` value before
  flushing, so two uploads against the same row in the same batch
  no longer race the stale read on a separate readonly handle.
  `upload()` shrinks to walk-then-queue.
- Close TOCTOU on symlinks (coderabbit): the walker now reads
  files and recurses into directories via the resolved canonical
  path instead of the original entry, so retargeting a symlink
  between containment check and read can't smuggle out-of-root
  data into CAFS.
- Make `missing_root_errors` platform-neutral (coderabbit): use a
  tempdir subpath instead of `/this/does/not/exist/anywhere` so
  Windows CI doesn't see a spuriously-present path.
- Pass `config.side_effects_cache_read()` to
  `BuildModules.side_effects_cache` (copilot): the field is the
  READ gate, so the precedence helper should drive it.
- Make `calculate_diff` iteration deterministic (copilot): use
  `BTreeSet` over the union of base/current keys so the resulting
  `deleted: Vec<String>` is lexically sorted and the msgpack
  payload is byte-stable for the same input.
- Drop the unused `walkdir` workspace dep from `pacquet-store-dir`
  (copilot): the walker uses `std::fs::read_dir`.
Copilot AI review requested due to automatic review settings May 12, 2026 14:38
`[crate::upload]` was the last remaining ambiguous intra-doc link
between the `upload` module and the function. Adding `()` routes
it to the function so rustdoc's `-D warnings` build stops failing.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 2 comments.

Comment thread crates/store-dir/src/upload.rs Outdated
Comment thread crates/package-manager/src/build_modules/tests.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/store-dir/src/upload.rs (1)

59-65: ⚡ Quick win

Accept &StoreIndexWriter here instead of &Arc<_>.

upload() only needs shared access to the writer, so &Arc<StoreIndexWriter> unnecessarily couples the API to one ownership strategy. Taking &StoreIndexWriter keeps the signature broader and existing Arc call sites still work via deref coercion.

Suggested change
 pub fn upload(
     store_dir: &StoreDir,
     built_pkg_location: &Path,
     files_index_file: &str,
     side_effects_cache_key: &str,
-    writer: &Arc<StoreIndexWriter>,
+    writer: &StoreIndexWriter,
 ) -> Result<(), UploadError> {

As per coding guidelines, "Choose owned vs. borrowed parameters to minimize copies; widen to the most encompassing type (&Path over &PathBuf, &str over &String) when it doesn't force extra copies".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/store-dir/src/upload.rs` around lines 59 - 65, Change the upload
function signature to take a borrowed StoreIndexWriter (pub fn upload(...,
writer: &StoreIndexWriter, ...) -> Result<..., UploadError>) instead of
&Arc<StoreIndexWriter>; update any internal uses to work with &StoreIndexWriter
(no cloning or Arc-specific methods), and update all call sites that currently
pass an Arc<StoreIndexWriter> to pass a reference to the inner value (e.g.,
writer.as_ref() or &*writer) so existing Arc ownership remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/store-dir/src/store_index.rs`:
- Around line 197-200: The side_effects HashMap and the SideEffectsDiff::added
map must be serialized deterministically; add a custom serde serializer (e.g.
serialize_map_sorted or serialize_hashmap_as_btreemap) that converts HashMap
entries into a BTreeMap (sorting keys) before calling the serializer, then
annotate the problematic fields with #[serde(serialize_with =
"serialize_map_sorted")] (apply this to the row.side_effects field in the struct
that contains it and to the SideEffectsDiff::added field definition). Implement
the serializer to accept the usual (S: Serializer) signature and forward
serialization of the BTreeMap so msgpack output is byte-stable and matches the
project's existing pattern.

---

Nitpick comments:
In `@crates/store-dir/src/upload.rs`:
- Around line 59-65: Change the upload function signature to take a borrowed
StoreIndexWriter (pub fn upload(..., writer: &StoreIndexWriter, ...) ->
Result<..., UploadError>) instead of &Arc<StoreIndexWriter>; update any internal
uses to work with &StoreIndexWriter (no cloning or Arc-specific methods), and
update all call sites that currently pass an Arc<StoreIndexWriter> to pass a
reference to the inner value (e.g., writer.as_ref() or &*writer) so existing Arc
ownership remains unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 650b01c5-11e1-4eb2-9a65-ee562c0cb845

📥 Commits

Reviewing files that changed from the base of the PR and between 560200b and 9546af9.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • crates/package-manager/src/install_frozen_lockfile.rs
  • crates/store-dir/Cargo.toml
  • crates/store-dir/src/add_files_from_dir.rs
  • crates/store-dir/src/add_files_from_dir/tests.rs
  • crates/store-dir/src/store_index.rs
  • crates/store-dir/src/upload.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/store-dir/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/package-manager/src/install_frozen_lockfile.rs
  • crates/store-dir/src/add_files_from_dir/tests.rs
  • crates/store-dir/src/add_files_from_dir.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Code Coverage
  • GitHub Check: Lint and Test (macos-latest)
  • GitHub Check: Lint and Test (windows-latest)
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Run benchmark on ubuntu-latest
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Log emissions are part of 'match pnpm'—when porting a function that fires pnpm:<channel> events through globalLogger, logger.debug(...), or streamParser.write(...), mirror the call site, payload, and ordering so @pnpm/cli.default-reporter parses pacquet's NDJSON the same way
Declare a newtype wrapper for branded string types instead of collapsing into plain String or &str to preserve the distinction between validating and non-validating brands
If upstream always validates before construction of a branded string type, implement validation via TryFrom<String> and/or FromStr without providing an infallible public constructor
If upstream never validates a branded string type, expose an infallible From<String> constructor to provide type-safety without runtime validation
When upstream occasionally constructs a branded string type without validation via bare as assertion, expose a from_str_unchecked constructor so callers can opt into the same unchecked path explicitly
Match upstream serde behavior for branded types—use #[serde(try_from = "String")] for deserialization to go through the validator and #[serde(into = "String")] for serialization
Use #[derive(derive_more::From)] and #[derive(derive_more::Into)] for mechanical conversion impls on branded types rather than handwriting impl blocks
Model string-literal unions (like 'auto' | 'always' | 'never') as Rust enums instead of newtype wrappers
Template literal types (like ${string}@${string}) should be treated as branded string types with the same validation discipline applied
Follow the code style guide in CODE_STYLE_GUIDE.md covering code-level conventions not enforced by tooling: imports, modules, naming, ownership and borrowing, parameter type selection, trait bounds, pattern matching, pipe-trait, error handling, test layout, and cloning of Arc and Rc
Warnings are errors in cargo clippy --locked -- --deny warnings, and should not be silenced with `#[allow(....

Files:

  • crates/store-dir/src/upload.rs
  • crates/store-dir/src/store_index.rs
🧠 Learnings (1)
📚 Learning: 2026-05-07T23:19:08.272Z
Learnt from: KSXGitHub
Repo: pnpm/pacquet PR: 401
File: tasks/integrated-benchmark/src/work_env.rs:343-344
Timestamp: 2026-05-07T23:19:08.272Z
Learning: When reviewing Rust code in pnpm/pacquet for deprecated API usage, do not automatically treat `serde_saphyr::to_string` as deprecated. In `serde-saphyr` v0.0.25, `serde_saphyr::to_string` has no `#[deprecated]` attribute (the `#[deprecated]` later in `serde-saphyr-0.0.25/src/lib.rs` applies to a different function). Only flag `serde_saphyr::to_string` as deprecated if the resolved dependency version’s source shows `#[deprecated]` on that specific function.

Applied to files:

  • crates/store-dir/src/upload.rs
  • crates/store-dir/src/store_index.rs

Comment thread crates/store-dir/src/store_index.rs Outdated
@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 2.469 ± 0.048 2.419 2.542 1.00 ± 0.04
pacquet@main 2.468 ± 0.072 2.367 2.559 1.00
pnpm 5.929 ± 0.083 5.789 6.032 2.40 ± 0.08
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 2.4689039816399996,
      "stddev": 0.04780507005721563,
      "median": 2.4491665432399996,
      "user": 2.58218524,
      "system": 3.4716213000000002,
      "min": 2.41898022624,
      "max": 2.5420708292399996,
      "times": [
        2.44893726224,
        2.5420708292399996,
        2.51206773224,
        2.41898022624,
        2.42238800624,
        2.5323051952399998,
        2.43452208124,
        2.50072212524,
        2.4276505342399997,
        2.44939582424
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 2.4678160689399995,
      "stddev": 0.07200481956971332,
      "median": 2.46928902374,
      "user": 2.55450604,
      "system": 3.3838941,
      "min": 2.36701871324,
      "max": 2.55915867524,
      "times": [
        2.5508388302399996,
        2.55915867524,
        2.42198255324,
        2.43932594624,
        2.36701871324,
        2.4221017992399996,
        2.3722784982399996,
        2.4992521012399997,
        2.52728649124,
        2.5189170812399997
      ]
    },
    {
      "command": "pnpm",
      "mean": 5.92899292664,
      "stddev": 0.08322227215606012,
      "median": 5.93109513524,
      "user": 8.73746164,
      "system": 4.343755300000001,
      "min": 5.789286720240001,
      "max": 6.03226841624,
      "times": [
        5.90861164824,
        5.789286720240001,
        5.8075996432400006,
        5.95357862224,
        5.89604724324,
        6.03226841624,
        5.99601960424,
        6.01162277224,
        5.9881097722400005,
        5.90678482424
      ]
    }
  ]
}

Scenario: Frozen Lockfile (Hot Cache)

Command Mean [ms] Min [ms] Max [ms] Relative
pacquet@HEAD 695.7 ± 64.2 641.7 865.1 1.00
pacquet@main 702.4 ± 80.8 655.3 920.6 1.01 ± 0.15
pnpm 2526.2 ± 119.9 2414.6 2785.0 3.63 ± 0.38
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 0.6956557923,
      "stddev": 0.06422375898028904,
      "median": 0.6753533603999999,
      "user": 0.35833357999999993,
      "system": 1.49355988,
      "min": 0.6416866079,
      "max": 0.8651463669,
      "times": [
        0.8651463669,
        0.7211930129,
        0.6732501849,
        0.6987764299,
        0.6989813569000001,
        0.6696046889,
        0.6416866079,
        0.6513977209,
        0.6774565359,
        0.6590650179
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 0.7024205165,
      "stddev": 0.08080488487627817,
      "median": 0.6735188819,
      "user": 0.34971257999999994,
      "system": 1.51433668,
      "min": 0.6552940409,
      "max": 0.9205762319,
      "times": [
        0.7351236929,
        0.6734529659,
        0.6816737759,
        0.7083415179,
        0.6585410649000001,
        0.6617263689,
        0.6558907079,
        0.9205762319,
        0.6735847979,
        0.6552940409
      ]
    },
    {
      "command": "pnpm",
      "mean": 2.5261615645,
      "stddev": 0.11989112782652504,
      "median": 2.4853160134000003,
      "user": 2.96723048,
      "system": 2.2039737799999997,
      "min": 2.4146114469000004,
      "max": 2.7850201139000004,
      "times": [
        2.6163077129000003,
        2.4177157519000003,
        2.4146114469000004,
        2.7850201139000004,
        2.4360639569000004,
        2.4656249519,
        2.5050070749000004,
        2.4591027329000004,
        2.6413636409000003,
        2.5207982619000004
      ]
    }
  ]
}

- Add a custom sorted-map serializer for `side_effects` and
  `SideEffectsDiff.added` (coderabbit, copilot): converts each
  `HashMap` to a `BTreeMap` (sorted by key) before emitting, so
  the msgpack payload is byte-stable for the same logical
  overlay. Without this the maps go on the wire in
  `HashMap`-randomized order even though `calculate_diff` builds
  them deterministically.
- Honest comment on `calculate_diff` (copilot): drops the
  overclaimed "byte-stable msgpack" promise; the in-loop
  `BTreeSet` only orders the `deleted` vector, and byte-stability
  of `added` is now provided by the new serializer.
- Rewrite `upload_error_does_not_interrupt_install` to actually
  fail the upload (copilot): the previous test relied on
  `StoreIndex::open_readonly_in` returning an error when there's
  no `index.db`, but `StoreIndexWriter::spawn` auto-creates the
  file. The new fixture has the postinstall produce a
  0-permission file in the package directory, which makes
  `add_files_from_dir` fail with `EACCES` on `fs::read` and
  forces a real `UploadError::AddFilesFromDir`. The test asserts
  the build kept going despite the upload error.
Copilot AI review requested due to automatic review settings May 12, 2026 15:26
`serialize_sorted_map_opt` is a private helper, so intra-doc links
to it fail under `-D warnings` with `this item is private`. Drop
the link brackets and keep the name as bare code style so the doc
still surfaces the connection without the rustdoc error.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/store-dir/src/upload.rs (1)

120-127: 💤 Low value

Consider deriving Clone for CafsFileInfo.

The manual clone_info helper could be eliminated by adding Clone to CafsFileInfo's derives, making the code more idiomatic.

-#[derive(Debug, PartialEq, Serialize, Deserialize)]
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
 pub struct CafsFileInfo {

Then in calculate_diff:

-            added.insert(file.to_string(), clone_info(now));
+            added.insert(file.to_string(), now.clone());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/store-dir/src/upload.rs` around lines 120 - 127, The manual clone_info
function is redundant—derive Clone on the CafsFileInfo struct and remove
clone_info; then update callers (e.g., in calculate_diff) to call .clone() on
CafsFileInfo instances instead of using clone_info. Ensure the struct definition
for CafsFileInfo includes Clone (and keep other derives intact), delete the
clone_info function, and replace its usages with direct .clone() calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/store-dir/src/upload.rs`:
- Around line 120-127: The manual clone_info function is redundant—derive Clone
on the CafsFileInfo struct and remove clone_info; then update callers (e.g., in
calculate_diff) to call .clone() on CafsFileInfo instances instead of using
clone_info. Ensure the struct definition for CafsFileInfo includes Clone (and
keep other derives intact), delete the clone_info function, and replace its
usages with direct .clone() calls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 99edeed0-f05e-42c6-abf6-0370ed09f294

📥 Commits

Reviewing files that changed from the base of the PR and between 9546af9 and 82b72c6.

📒 Files selected for processing (3)
  • crates/package-manager/src/build_modules/tests.rs
  • crates/store-dir/src/store_index.rs
  • crates/store-dir/src/upload.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/package-manager/src/build_modules/tests.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Agent
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Lint and Test (ubuntu-latest)
  • GitHub Check: Lint and Test (macos-latest)
  • GitHub Check: Lint and Test (windows-latest)
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Code Coverage
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Log emissions are part of 'match pnpm'—when porting a function that fires pnpm:<channel> events through globalLogger, logger.debug(...), or streamParser.write(...), mirror the call site, payload, and ordering so @pnpm/cli.default-reporter parses pacquet's NDJSON the same way
Declare a newtype wrapper for branded string types instead of collapsing into plain String or &str to preserve the distinction between validating and non-validating brands
If upstream always validates before construction of a branded string type, implement validation via TryFrom<String> and/or FromStr without providing an infallible public constructor
If upstream never validates a branded string type, expose an infallible From<String> constructor to provide type-safety without runtime validation
When upstream occasionally constructs a branded string type without validation via bare as assertion, expose a from_str_unchecked constructor so callers can opt into the same unchecked path explicitly
Match upstream serde behavior for branded types—use #[serde(try_from = "String")] for deserialization to go through the validator and #[serde(into = "String")] for serialization
Use #[derive(derive_more::From)] and #[derive(derive_more::Into)] for mechanical conversion impls on branded types rather than handwriting impl blocks
Model string-literal unions (like 'auto' | 'always' | 'never') as Rust enums instead of newtype wrappers
Template literal types (like ${string}@${string}) should be treated as branded string types with the same validation discipline applied
Follow the code style guide in CODE_STYLE_GUIDE.md covering code-level conventions not enforced by tooling: imports, modules, naming, ownership and borrowing, parameter type selection, trait bounds, pattern matching, pipe-trait, error handling, test layout, and cloning of Arc and Rc
Warnings are errors in cargo clippy --locked -- --deny warnings, and should not be silenced with `#[allow(....

Files:

  • crates/store-dir/src/store_index.rs
  • crates/store-dir/src/upload.rs
🧠 Learnings (1)
📚 Learning: 2026-05-07T23:19:08.272Z
Learnt from: KSXGitHub
Repo: pnpm/pacquet PR: 401
File: tasks/integrated-benchmark/src/work_env.rs:343-344
Timestamp: 2026-05-07T23:19:08.272Z
Learning: When reviewing Rust code in pnpm/pacquet for deprecated API usage, do not automatically treat `serde_saphyr::to_string` as deprecated. In `serde-saphyr` v0.0.25, `serde_saphyr::to_string` has no `#[deprecated]` attribute (the `#[deprecated]` later in `serde-saphyr-0.0.25/src/lib.rs` applies to a different function). Only flag `serde_saphyr::to_string` as deprecated if the resolved dependency version’s source shows `#[deprecated]` on that specific function.

Applied to files:

  • crates/store-dir/src/store_index.rs
  • crates/store-dir/src/upload.rs
🔇 Additional comments (8)
crates/store-dir/src/store_index.rs (4)

769-790: LGTM! Byte-stable serialization.

The serialize_sorted_map_opt helper correctly addresses the deterministic serialization requirement by converting HashMap to BTreeMap before emission. This ensures msgpack payloads are byte-identical for the same logical overlay across runs.


62-91: LGTM! Clean enum design for message-passing.

The WriteMsg enum cleanly separates the two write patterns, and the doc comments thoroughly explain why routing R/M/W through the writer task ensures commutativity. This correctly addresses the stale-read race condition from the prior review.


146-204: LGTM! Correct coalescing implementation.

The batch-coalescing logic correctly accumulates multiple mutations for the same key:

  • Loading from pending first ensures sequential SideEffectsUpload messages build on each other
  • The graceful fallback to SQLite handles the first sighting of a key in a batch
  • Algo mismatch and missing-row cases are logged and skipped appropriately

234-269: LGTM! Clean API design.

Good refactoring with the send_msg helper to avoid duplicating the channel-send + warning-guard logic. The public API clearly distinguishes full replacement (queue) from R/M/W (queue_side_effects_upload).

crates/store-dir/src/upload.rs (4)

1-21: LGTM! Well-documented module.

Clear module documentation with upstream pnpm references, and focused imports without star-imports.


23-36: LGTM! Clean error type and algorithm constant.

The UploadError enum follows the crate's error patterns with derive_more and miette, and HASH_ALGORITHM provides a single source of truth for the algo check in both upload and writer task.


59-74: LGTM! Clean orchestrator.

The function correctly delegates the heavy lifting: add_files_from_dir for rehashing and queue_side_effects_upload for the R/M/W (which happens inside the writer task for commutativity). This addresses the stale-read race from the prior review.


89-118: LGTM! Correct diff algorithm.

The set-difference logic correctly identifies added, modified, and deleted files. Using BTreeSet for deterministic iteration ensures deleted is sorted, while added's stability is handled by the serialize_sorted_map_opt serializer.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

crates/package-manager/src/install_frozen_lockfile.rs:102

  • StoreIndexWriter::spawn is awaited only on the happy path. Any ? return before the final drop(store_index_writer)/writer_task.await (e.g. CreateVirtualStore/BuildModules errors) will drop the JoinHandle and may skip flushing queued rows (and may leave a blocking task running until runtime shutdown). Consider using a guard/RAII pattern so the sender is dropped and the writer task is awaited on all exit paths.
        // Spawn the batched store-index writer here so it lives
        // across both the prefetch/download phase (consumers in
        // `CreateVirtualStore`) and the build phase (the new
        // side-effects-cache WRITE-path upload site in
        // `BuildModules`). We drop the orchestrator's clone and
        // await the join handle at the end of `run`, so the final
        // batch flushes once every queued row from both phases has
        // been processed. A writer open / task failure is degraded
        // to a `warn!` and the install still succeeds — pacquet's
        // existing best-effort stance on cache writes.
        let (store_index_writer, writer_task) = StoreIndexWriter::spawn(&config.store_dir);

        let CreateVirtualStoreOutput { package_manifests, side_effects_maps_by_snapshot } =
            CreateVirtualStore {
                http_client,
                config,
                packages,
                snapshots,
                logged_methods,
                requester,
                store_index_writer: &store_index_writer,
            }
            .run::<R>()
            .await
            .map_err(InstallFrozenLockfileError::CreateVirtualStore)?;

Comment thread crates/store-dir/src/store_index.rs Outdated
Comment thread crates/store-dir/src/store_index.rs Outdated
Comment thread crates/store-dir/src/store_index.rs
Comment thread crates/package-manager/src/build_modules/tests.rs
zkochan added 2 commits May 12, 2026 17:39
- Preserve same-batch Replace on algo-mismatch (copilot).
  `SideEffectsUpload`'s previous shape removed the row from
  `pending` before the algo check, then `continue`'d without
  reinserting — dropping any earlier-in-batch `Replace` for the
  same key. Switched to a `HashMap::entry()` flow so the row stays
  in `pending` across every short-circuit; the side-effects
  mutation is what gets suppressed, not the row write.
- Strengthen the upload-error test (copilot). Now seeds a base
  `PackageFilesIndex` row before the install runs and asserts
  the row's `side_effects` stays `None` after the swallowed
  upload error, matching upstream's `filesIndex2.sideEffects
  toBeFalsy()` at sideEffects.ts:186.
Cold-install benchmark on PR #424 showed a ~5% regression vs main
(2.446s vs 2.333s on Linux). Tracked it to `write_gate_active`
firing on every install — default config has
`side_effects_cache_write = true`, `engine_name = Some` when node
is on PATH, and `store_index_writer = Some` always; so the dep
graph was being built O(|snapshots|) even for pure-JS installs
with zero buildable packages.

The WRITE-path upload site only ever runs for snapshots whose
`requires_build` is true (it's the same loop body that gates on
that bit). So if no snapshot requires a build, the dep graph is
dead work — the upload-site closure never fires.

Gate `write_gate_active` on `requires_build_map.values().any(|&v| v)`
to restore the pre-PR behavior on cold installs without
postinstalls.

Behaviour preserved when at least one snapshot requires a build:
the dep graph still gets built and the WRITE-path runs as before.
The existing `write_path_*` tests use the
`failing-postinstall` / `postinstall-modifies-source` fixtures
which produce `requires_build = true`, so they continue to
exercise the active-gate path.
Copilot AI review requested due to automatic review settings May 12, 2026 15:54

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 2 comments.

Comment thread crates/package-manager/src/build_modules.rs
Comment thread crates/store-dir/src/add_files_from_dir.rs Outdated
When a lockfile has only a handful of `requires_build` snapshots,
`build_deps_graph` was still walking every entry in `snapshots:` —
needless work for the cache-key compute, which only ever recurses
into the children of `requires_build` roots via `calc_dep_state`.

Add a sibling `build_deps_subgraph` that takes an iterator of root
keys and BFS-walks the forward closure (dependencies + optional
dependencies). `BuildModules` now feeds in the `requires_build`
keys; the resulting graph contains exactly the nodes
`calc_dep_state` will visit and no more.

For a pure-JS install (zero `requires_build` keys), the iterator
is empty and the function returns an empty graph in O(0). For an
install with a single leaf-dep postinstall, the walk visits just
that snapshot. For a postinstall on a top-level dep, the walk
visits the full subgraph below it — still bounded to what
`calc_dep_state` would have traversed anyway.

Upstream's `lockfileToDepGraph`
(pnpm/pnpm@7e3145f9fc:deps/graph-builder/src/lockfileToDepGraph.ts:123-170)
builds the full graph because its consumers extend well beyond
cache hashing (direct-deps map, hierarchy + hoisting, bin linking).
In pacquet today the only consumer is `calc_dep_state`, so the
trimmed walk produces identical cache keys with much less work.

Tests cover empty-roots → empty graph, forward-closure inclusion +
exclusion, and cycle termination.
Copilot AI review requested due to automatic review settings May 12, 2026 16:05
`build_deps_subgraph` already returns an empty graph in O(0) when
the root iterator is empty (which is the case for pure-JS installs
where no snapshot is `requires_build`). The extra
`any_requires_build` check on `cache_gate_active` and
`write_gate_active` was belt-and-suspenders that the subgraph
builder makes unnecessary — same behavior on every input, less
code.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 1 comment.

Comment thread crates/store-dir/src/store_index.rs Outdated
- Sort maps in the msgpackr-records encoder for real byte stability
  (coderabbit). The previous "fix" used a serde `serialize_with`
  helper, but `StoreIndex::set_many` writes via the bespoke
  `encode_package_files_index` which iterates `HashMap`s directly —
  the serde attribute was dead code. Now `files`, `side_effects`,
  and `SideEffectsDiff.added` all iterate via a `sorted_by_key`
  helper that emits in lexicographic key order, so row payloads
  are byte-stable across pacquet writes of the same logical state.
  Drops the now-unused `serialize_sorted_map_opt` helper and the
  misleading doc-comments.
- Add `store_dir.is_some()` to `write_gate_active` (copilot). The
  upload site is later gated on both `store_index_writer` AND
  `store_dir`; this matches the gate to the use sites.
- Clarify `WalkCtx.visited` semantics (copilot). It's a
  recursion-stack set (entries inserted on descend, removed on
  return), not a global visited set. Updated the doc-comment to
  spell this out and reference upstream's same shape.

Also extract `apply_write_msg` / `load_pending_row` out of
`StoreIndexWriter::spawn` to flatten 7-8 levels of nesting inside
the spawn-blocking closure — same control flow, easier to read.
@zkochan zkochan merged commit 55f6f76 into main May 12, 2026
16 checks passed
@zkochan zkochan deleted the feat/side-effects-cache-write-path-421 branch May 12, 2026 16:32
KSXGitHub pushed a commit that referenced this pull request May 13, 2026
Pull in 28 commits from upstream main, including the
`pacquet-npmrc` → `pacquet-config` rename (#420) plus features:
- supportedArchitectures + --cpu/--os/--libc (#456)
- frozen-lockfile (#442, #443, #447, #450)
- git-fetcher (#436 / #446, #451, #454)
- side-effects cache (#421 / #422, #423, #424)
- real-hoist + global-virtual-store (#432 / #438 / #444, #449, #452)
- patchedDependencies + allow-builds (#425, #427, #428)
- engine/platform installability (#434 / #439)

Conflicts resolved:
- `crates/npmrc/` files migrated under the renamed
  `crates/config/` directory; `Npmrc` → `Config` everywhere
  except `NpmrcAuth` (which keeps the `.npmrc`-domain name).
- `Config::current` reads the env-var DI generic `Api: EnvVar`
  for ${VAR}-substitution in `.npmrc`. Production turbofish in
  `cli_args.rs` is `Config::current::<RealApi, _, _, _, _>(...)`.
- Two-phase `NpmrcAuth::apply_*` retained so default-registry
  creds key at the yaml-resolved registry URL.
- New `Config::auth_headers` field plumbed through
  `install_package_by_snapshot`'s `DownloadTarballToStore`.
- Tests under `crates/config/src/workspace_yaml/tests.rs`
  pick up the new ParseYaml unit test added on this branch.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants