perf(fs): port pnpm v11 writeBufferToCafs into ensure_file by zkochan · Pull Request #279 · pnpm/pacquet

zkochan · 2026-04-24T21:15:14Z

Summary

Ports pnpm v11's writeBufferToCafs shape into pacquet's CAFS writer, replacing the old if exists → skip, else create-and-overwrite pattern with the full exclusive-create + integrity-verify + atomic-rewrite sequence upstream uses.

References:

store/cafs/src/writeBufferToCafs.ts
store/cafs/src/writeFile.ts
store/cafs/src/checkPkgFilesIntegrity.ts (verifyFileIntegrity)

New sequence in `ensure_file`

Try O_CREAT | O_EXCL open (OpenOptions::create_new(true)). Success → we own the file, write content directly. Matches pnpm's writeFileExclusive.
ErrorKind::AlreadyExists → re-read the file and byte-compare with content. CAS paths are hash-derived, so matching bytes imply matching digest; this is the pacquet equivalent of pnpm's verifyFileIntegrity, cheaper because we hold the expected bytes already.
Match → Ok(()). The file is a live CAS entry; leaving it alone is correct.
Mismatch (torn blob from a prior crashed install) → write_atomic: open a unique temp path next to the target, write content, fs::rename over. Rename is atomic on Unix (rename(2)) and replaces-in-place on Windows, so observers never see a half-written blob. Matches pnpm's writeFileAtomic + renameOverwriteSync.
Any other open error propagates as CreateFile.

Temp path scheme mirrors pnpm's pathTemp: {basename with -suffix stripped}{pid}{counter}. Counter is a process-local monotonic AtomicU64 (replaces pnpm's Node.js workerThreads.threadId). -exec → x suffix transform preserved via strip_dash_suffix, so temp files don't look like executable CAS entries.

Deliberate divergences from pnpm v11

All documented inline on the ensure_file doc comment:

No upfront stat. create_new/AlreadyExists gives the exists-signal for free. Saves one syscall per CAS file on cold installs.
Byte-compare instead of crypto.hash. We hold the buffer, so equality check gives the same correctness guarantee without a second full-buffer walk.
No locker: Map<string, number> process-local cache. Pacquet's hot path calls ensure_file at most once per CAS file per install (the StoreIndex decides whether we even get here), so the locker would be empty work. Can revisit if profiling shows repeated AlreadyExists hits on a single path.

Why it's safer

Old shape used OpenOptions::create(true) which silently overwrites a pre-existing file. Combined with the if exists → skip short-circuit, a torn blob from a crashed prior install would persist across new installs and get served downstream as-if correct. The new shape self-heals: a torn blob at a CAS path is detected via byte-compare and atomically replaced with the known-good contents we were about to write.

Why it's faster

On cold installs where every CAS file is new, the .exists() stat always returned false — pure waste. Removed. Same syscall shape as before on the new-file path (single open with O_CREAT|O_EXCL), one syscall less overall.

On warm hits during tarball extraction (rare — StoreIndex usually short-circuits before we get here) the shape is create_new → read → byte-compare. Same cost as pnpm's stat → verifyFileIntegrity, and since we byte-compare instead of re-hashing we save a full-buffer walk.

Test plan

7 targeted tests cover new-file write, matching-content short-circuit, mismatched-content atomic rewrite, missing-parent error, Unix mode, -exec → x temp-name transform, and plain-basename temp-name passthrough.
Existing shard_cache_populates_on_first_write_and_skips_mkdir_thereafter in store-dir still passes — covers the AlreadyExists branch through the CAS writer.
just ready — 177+ tests + clippy + fmt green.
cargo nextest run -p pacquet-cli --test install --run-ignored=ignored-only frozen_lockfile_should_be_able_to_handle_big_lockfile — end-to-end against the 1352-snapshot fixture in ~50 s.

Ports pnpm v11's `writeFileExclusive` shape (`store/cafs/src/writeFile.ts:21-28` and the `EEXIST` branch in `writeBufferToCafs.ts:57-68`) into pacquet's CAFS writer. The old shape paid a `file_path.exists()` stat on every call before the open; on a cold install where essentially every CAS file is new the stat always returned `false`, so it was pure waste. Replace it with `OpenOptions::create_new(true)` (`O_CREAT|O_EXCL` on Unix, equivalent disposition on Windows) and map `ErrorKind::AlreadyExists` to `Ok(())`. The warm-cache case (file already at its hash-derived path) and the concurrent-writer race (another install process on the same store got to this entry first) are both correct no-ops because the path itself is the integrity assertion — contents at `files/XX/…` are by definition the sha512 digest recorded in the filename, so a present file has the bytes we were about to write. pnpm still stats first because it runs `verifyFileIntegrity` on the existing file to detect torn mid-write blobs from a crashed prior install; pacquet doesn't re-verify individual CAS files on write (the tarball-level ssri check has already passed for the batch these bytes came from), so the stat has no job to do and can go. Torn-blob recovery remains an open concern tracked in `investigations/pacquet-macos-perf.md` §5 — the fix there is temp+rename, orthogonal to this syscall shape. Also flips `ensure_file` from the pre-PR `OpenOptions::create(true)` (which silently overwrites) to `create_new(true)` (refuses to overwrite, surfaces as AlreadyExists → Ok). A concurrent writer now can't silently clobber our partially-written bytes, which is a strictly better safety property even though the observable behavior doesn't change on success paths. Adds four targeted tests pinning: - new-file write puts bytes on disk (with mode on Unix); - pre-existing file is preserved byte-for-byte (not overwritten); - missing parent dir surfaces as CreateFile/NotFound so callers can't accidentally bypass the `ensure_parent_dir` contract.

codecov · 2026-04-24T21:19:12Z

Codecov Report

❌ Patch coverage is 81.92771% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.30%. Comparing base (af4b10d) to head (58215cd).

Files with missing lines	Patch %	Lines
crates/fs/src/ensure_file.rs	81.92%	45 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #279      +/-   ##
==========================================
- Coverage   89.67%   89.30%   -0.37%     
==========================================
  Files          61       61              
  Lines        5074     5313     +239     
==========================================
+ Hits         4550     4745     +195     
- Misses        524      568      +44

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions · 2026-04-24T21:20:27Z

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.00     15.7±0.09ms   276.2 KB/sec    1.00     15.8±0.13ms   275.2 KB/sec

Copilot

Pull request overview

This PR optimizes the CAFS writer’s hot path by removing the upfront exists() stat in ensure_file and switching to an atomic exclusive create (O_CREAT|O_EXCL via OpenOptions::create_new), treating AlreadyExists as a successful no-op.

Changes:

Replace file_path.exists() + create(true) with create_new(true) and map AlreadyExists → Ok(()) in ensure_file.
Add targeted unit tests for new-file writes, existing-target preservation, missing-parent errors, and Unix mode handling.
Update CAFS writer test comments and add tempfile as a dev-dependency for the fs crate.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

File	Description
`crates/store-dir/src/cas_file.rs`	Updates test documentation to reflect the new `AlreadyExists` no-op behavior.
`crates/fs/src/ensure_file.rs`	Implements `create_new`-based exclusive creation, adds extensive rationale docs, and introduces new unit tests.
`crates/fs/Cargo.toml`	Adds `tempfile` under `dev-dependencies` for the new tests.
`Cargo.lock`	Locks the new `tempfile` dev-dependency.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

First pass of this PR skipped integrity verification on the `AlreadyExists` branch on the theory that the hash-derived CAS path itself is enough of an assertion. That misses torn-blob recovery — a prior install that crashed mid-write leaves a file at the right path with wrong bytes, and a new install would happily skip over it, silently serving corrupt bytes downstream. Port pnpm v11's `writeBufferToCafs` full shape instead: 1. `O_CREAT|O_EXCL` open (unchanged from first pass) — success means we wrote the bytes ourselves. 2. `AlreadyExists` → re-read the file and byte-compare with `content`. CAS paths are hash-derived, so matching bytes imply matching digest; this is pnpm's `verifyFileIntegrity(fileDest, integrity)` check, cheaper because we already hold the expected bytes and can skip the re-hash. 3. Match → `Ok(())`. The file is a live CAS entry. 4. Mismatch (or `NotFound` on re-read from a concurrent cleaner) → `write_atomic`: open a unique temp path next to the target, write `content`, `fs::rename` over the target. Rename is atomic on Unix (`rename(2)`) and replaces-in-place on Windows, so observers never see a half-written blob. Matches pnpm's `writeFileAtomic` + `renameOverwriteSync`. Temp path scheme mirrors pnpm's `pathTemp`: `{basename with -suffix stripped}{pid}{counter}`, where counter is a process-local monotonic `AtomicU64` instead of pnpm's Node.js `workerThreads.threadId`. Strips `-exec` → appends `x` the same way pnpm's `removeSuffix` does, so temp files don't look like executable CAS entries to an observer scanning the shard. Deliberate divergences from pnpm v11 documented inline: - No upfront stat. `create_new`/`AlreadyExists` gives us the exists-signal for free; stat would be redundant. - Byte-compare instead of `crypto.hash`. We hold the buffer, so equality gives the same correctness guarantee without a second full-buffer walk. - No `locker: Map<string, number>` process-local cache. Pacquet's hot path calls `ensure_file` at most once per CAS file per install (the `StoreIndex` decides whether we even get here), so the locker would be mostly empty work. Adds three tests covering the new paths — matching-content short-circuit, mismatched-content atomic rewrite, temp-name scheme preserves pnpm's `-exec` → `x` suffix transformation.

Port the `EPERM|EACCES|EBUSY` arm of pnpm's `rename-overwrite` library (zkochan/packages/rename-overwrite/index.js) into the torn-blob-recovery `rename` call inside `ensure_file`. That arm is the one retry family that actually hits pacquet in practice: Windows Defender and other AV / file-indexer tools briefly hold the destination open when we try to rename our temp file over it, which fails with `ERROR_ACCESS_DENIED` or `ERROR_SHARING_VIOLATION`. Both surface through Rust's `io::ErrorKind` as `PermissionDenied` or `ResourceBusy`, and both clear on their own within tens to hundreds of milliseconds. Loop shape matches pnpm exactly: 60-second total budget, start at zero sleep, grow the per-attempt sleep by 10 ms each iteration, cap at 100 ms. Any non-transient error kind propagates immediately. Other retry arms in `rename-overwrite` (`ENOTEMPTY`/`EEXIST`/ `ENOTDIR` swap-rename, `ENOENT` mkdir-and-recurse, `EXDEV` copy- and-delete) are deliberately **not** ported — see the doc comment on `rename_with_retry` for the case-by-case reasoning. Short version: for this specific call site (temp and target in the same CAS shard dir, both files, no live readers), those arms have no work to do. Adds two tests pinning the behavior: - `transient_rename_error_classifier` rejects any `ErrorKind` other than `PermissionDenied` / `ResourceBusy`. A regression that classified e.g. `NotFound` as transient would pathologically spin for 60 seconds on a legitimately missing source. - `rename_with_retry_succeeds_fast_when_no_error` pins that the happy path returns immediately without sleeping — the first attempt's `Ok(())` short-circuits, which matters since the retry loop starts with a 10 ms step.

Three review fixes for `ensure_file`: 1. **Symlink scrub on AlreadyExists.** `open(O_CREAT|O_EXCL)` returns EEXIST on Unix even when the dirent is a symlink, so a tampered or backup-restored store could route a symlinked dirent into this function. The old `AlreadyExists` branch called `fs::read` (which follows symlinks) and byte-compared — a symlink pointing at a file with matching bytes would return Ok and leak into the CAS, where downstream `fs::hard_link` (which does *not* follow links on Linux) would hardlink the symlink itself rather than the target. Add a `fs::symlink_metadata` guard: anything that isn't a regular file (symlink, dir, fifo, socket, device) goes straight to `write_atomic`, whose `rename` replaces it with a real file. Two new tests pin the live-symlink and dangling-symlink paths. 2. **Stale doc comment on `writes_a_new_file`.** Said "with the requested mode" but the call passed `None`. Mode coverage lives in `unix_mode_is_applied_on_new_files`, so just drop the stray mode claim. 3. **Umask-flaky mode assertion.** `OpenOptionsExt::mode` runs the requested bits through the process umask; a restrictive shell (e.g. `umask 0o077`) would strip the group/other bits of `0o755` and fail `assert_eq!(mode, 0o755)`. Mask to `0o700` before comparing so we pin only the owner bits, which every sensible umask preserves. The observable property we care about (the owner-exec bit surviving for executable CAS blobs) is still covered.

Copilot

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-04-24T21:53:01Z

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command	Mean [s]	Min [s]	Max [s]	Relative
`pacquet@HEAD`	3.021 ± 0.099	2.884	3.261	1.02 ± 0.04
`pacquet@main`	2.953 ± 0.073	2.831	3.099	1.00
`pnpm`	6.626 ± 0.049	6.545	6.670	2.24 ± 0.06

BENCHMARK_REPORT.json

{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 3.0212122107800004,
      "stddev": 0.09856034174777834,
      "median": 3.0235144727800005,
      "user": 2.47492144,
      "system": 3.4169718600000003,
      "min": 2.8835517217800004,
      "max": 3.26078146778,
      "times": [
        3.26078146778,
        3.0344525207800004,
        2.93472474178,
        3.04610774078,
        2.9733334417800004,
        3.0208532327800004,
        2.8835517217800004,
        3.0308735227800003,
        3.00126800478,
        3.02617571278
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 2.95334017168,
      "stddev": 0.07293697934140608,
      "median": 2.95019952178,
      "user": 2.4556847399999997,
      "system": 3.46581236,
      "min": 2.83073083878,
      "max": 3.09896800378,
      "times": [
        2.93955154478,
        3.01907811878,
        2.95913638878,
        2.98272415378,
        2.8814228777800004,
        2.92139074678,
        2.83073083878,
        2.94732690378,
        3.09896800378,
        2.95307213978
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    },
    {
      "command": "pnpm",
      "mean": 6.625729822379999,
      "stddev": 0.048884856673326205,
      "median": 6.64549974428,
      "user": 9.21092914,
      "system": 4.54136356,
      "min": 6.544548767779999,
      "max": 6.67014727678,
      "times": [
        6.66428713778,
        6.64522058478,
        6.5822429927799995,
        6.64577890378,
        6.66873724078,
        6.6638877687799996,
        6.544548767779999,
        6.55189055378,
        6.67014727678,
        6.6205569967799995
      ],
      "exit_codes": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ]
    }
  ]
}

Five review fixes for `ensure_file` hardening: 1. **Streaming byte-compare.** Old `AlreadyExists` branch called `fs::read` and did a `Vec<u8>` equality check, doubling peak memory for every CAS entry it verified. For a 30 MB tarball entry that was 30 MB extra heap per concurrent worker. Replaced with `file_equals_bytes` — open, read 8 KB stack-buffer chunks, compare each against the corresponding slice of `content`, bail on first mismatch. Fixed-size stack buffer regardless of file size. Added a length short-circuit (`meta.len() != content.len()`) before we open the file at all. 2. **`O_EXCL` on the atomic-write temp path.** Old `write_atomic` opened with `create+truncate`, which follows symlinks and truncates whatever's there. An attacker (or a stale temp from a crashed prior install) pre-seeding the predicted temp path could cause us to truncate a file outside the store shard. Switch to `create_new(true)`; on `AlreadyExists` advance the pid + counter suffix and retry, up to 16 fresh names. Collisions are already vanishingly rare given the pid + per-process `AtomicU64` scheme, but the retry closes the adversarial / shared-store loophole without panicking. 3. **Windows-only rename-retry classifier.** Old `is_transient_rename_error` treated `PermissionDenied` and `ResourceBusy` as transient on every platform. On Unix, `rename` returning those codes is essentially always a permanent permission or mount-point issue, so the 60 s retry loop just stretches out the failure. Gate the classifier on `cfg!(windows)`; non-Windows returns `false` and propagates immediately. Doc comment and the classifier test updated to reflect the new contract. 4. **Split open / write error classification in `write_atomic`.** Previous shape wrapped both in a single `WriteFile` variant, making "couldn't create the temp file" and "couldn't write to it" indistinguishable. Separate the two call sites: open errors map to `CreateFile`, write errors stay as `WriteFile`. 5. **Drop flaky 10 ms timing assertion in `rename_with_retry_succeeds_when_no_error`.** On loaded CI / slow filesystems a successful first-attempt rename can exceed 10 ms of scheduling jitter even without the retry path firing. Correctness assertions only — the classifier test already pins the semantic property we cared about. Also file handle is now explicitly `drop`ped before `rename` to keep the sharing-violation window as small as possible on Windows. Two new tests cover the streaming compare: - `file_equals_bytes_classifies_match_mismatch_and_length_mismatch` pins the exact-match / content-diff / length-diff branches. - `file_equals_bytes_handles_multi_chunk_files` uses a 20 KB payload to exercise the inner `read_exact` loop across multiple chunks and plants a last-chunk mismatch to catch "first chunk matched, so I returned true" regressions.

Copilot

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The `shard_cache_populates_on_first_write_and_skips_mkdir_thereafter` test comments still described the pre-review version of `ensure_file`, which swallowed `AlreadyExists` as `Ok(())` without verifying the existing bytes. Since b027412 (symlink-safety guard) and 1835847 (streaming byte-compare), the `AlreadyExists` branch routes through `verify_or_rewrite` and returns `Ok(())` only after byte-matching the existing file against the buffer; a mismatch takes the `write_atomic` torn-blob-recovery path instead. Update the doc block and the inline comment in the test to describe this actual flow, and point to the cross-crate test that pins the mismatch branch so a reader can follow the whole contract.

zkochan requested a review from Copilot April 24, 2026 21:16

Copilot started reviewing on behalf of zkochan April 24, 2026 21:16 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

Comment thread crates/fs/src/ensure_file.rs

Comment thread crates/fs/src/ensure_file.rs Outdated

Comment thread crates/fs/src/ensure_file.rs Outdated

zkochan changed the title ~~perf(fs): drop exists() stat in ensure_file; use O_CREAT|O_EXCL~~ perf(fs): port pnpm v11 writeBufferToCafs into ensure_file Apr 24, 2026

zkochan added 2 commits April 24, 2026 23:35

zkochan requested a review from Copilot April 24, 2026 21:42

Copilot started reviewing on behalf of zkochan April 24, 2026 21:43 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

Comment thread crates/fs/src/ensure_file.rs Outdated

Comment thread crates/fs/src/ensure_file.rs Outdated

Comment thread crates/fs/src/ensure_file.rs Outdated

Comment thread crates/fs/src/ensure_file.rs Outdated

Comment thread crates/fs/src/ensure_file.rs Outdated

zkochan requested a review from Copilot April 24, 2026 22:09

Copilot started reviewing on behalf of zkochan April 24, 2026 22:10 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

Comment thread crates/store-dir/src/cas_file.rs Outdated

zkochan merged commit 71bf423 into main Apr 24, 2026
11 of 13 checks passed

zkochan deleted the defer-exists-stat branch April 24, 2026 22:22

zkochan mentioned this pull request Apr 24, 2026

perf: port pnpm v11 HTTP/fetch topology and worker-pool sizing #280

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(fs): port pnpm v11 writeBufferToCafs into ensure_file#279

perf(fs): port pnpm v11 writeBufferToCafs into ensure_file#279
zkochan merged 6 commits into
mainfrom
defer-exists-stat

zkochan commented Apr 24, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

zkochan commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New sequence in ensure_file

Deliberate divergences from pnpm v11

Why it's safer

Why it's faster

Test plan

Uh oh!

codecov Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Micro-Benchmark Results

Linux

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zkochan commented Apr 24, 2026 •

edited

Loading

New sequence in `ensure_file`

codecov Bot commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited

Loading