perf(store-dir): pre-create CAFS shards at install bootstrap#276
Conversation
…otstrap Port pnpm's eager shard pre-creation plus its lazy per-directory cache into pacquet's CAFS write path so per-file `write_cas_file` calls don't pay `create_dir_all` (and its backing `stat`) on every entry of every tarball. - `StoreDir::init` eagerly mkdirs `v11/files/` and all 256 shards (`00`..`ff`) on a fresh store, gated by an `exists()` check on `files/` so a warm store pays only one stat. Matches the pattern pnpm introduced upstream in pnpm/pnpm@298e5dc (PR #8700). - `DashSet<u8>` on `StoreDir` caches which shards this process has already ensured. `StoreDir::init` seeds it with all 256 entries; `write_cas_file` consults it on every write and only falls back to `ensure_parent_dir` on a miss. Belt-and-suspenders matches pnpm's `store/cafs/src/writeFile.ts` `dirs` Set. - `ensure_file` is split into `ensure_parent_dir` (create dir tree, idempotent) plus `ensure_file` (open + write; parent must already exist). Only `write_cas_file` calls the former, and only when the shard isn't cached yet. - Wired into `install_without_lockfile::run` and `create_virtual_store::run` via `spawn_blocking`. Init failure degrades to a `warn!`; the lazy per-shard mkdir fallback inside `write_cas_file` still handles missed shards. No behavior change: CAFS layout, error semantics, and file-write ordering are unchanged. On a cold install of ~10k files this avoids ~9700 redundant `stat` syscalls. Tests cover fresh-store shard creation, warm-store cache seeding (sentinel survives init), and cas_file cache population.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #276 +/- ##
==========================================
- Coverage 89.88% 89.83% -0.06%
==========================================
Files 60 61 +1
Lines 4963 5065 +102
==========================================
+ Hits 4461 4550 +89
- Misses 502 515 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Micro-Benchmark ResultsLinux |
There was a problem hiding this comment.
Pull request overview
Bootstraps pnpm-compatible CAFS shard directories up front and adds a per-process shard cache to remove repeated create_dir_all/stat syscalls from the CAS write hot path during installs.
Changes:
- Add
StoreDir::init()to eagerly createv11/files/and (on cold stores) all 25600..ffshard directories, and seed a runtime shard cache. - Update
StoreDir::write_cas_fileto consult the shard cache and only mkdir parent dirs on a cache miss. - Split
pacquet_fs::ensure_fileintoensure_parent_dir+ensure_fileso callers can avoid per-file parent-dir creation overhead.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/store-dir/src/store_dir.rs | Adds shard cache state and StoreDir::init() plus tests around shard bootstrap/cache seeding. |
| crates/store-dir/src/cas_file.rs | Uses shard cache + ensure_parent_dir before writing CAS files; adds a shard-cache behavior test. |
| crates/store-dir/Cargo.toml | Adds dashmap dependency for the shard cache. |
| crates/package-manager/src/install_without_lockfile.rs | Calls StoreDir::init() on the blocking pool at install start (with warn-on-failure intent). |
| crates/package-manager/src/create_virtual_store.rs | Same StoreDir::init() bootstrap for the virtual-store install path. |
| crates/fs/src/ensure_file.rs | Introduces ensure_parent_dir and updates ensure_file to assume parent exists. |
| Cargo.lock | Locks dashmap as a dependency of pacquet-store-dir. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…init io errors Copilot flagged two shapes on #276: - `StoreDir::init` used to seed the cache with all 256 shards even on the warm-store branch where it skipped the mkdirs. If a shard dir was missing (e.g. store created by an older pacquet that only materialized shards lazily), `write_cas_file` would trust the cache, skip `ensure_parent_dir`, and then `ensure_file` would blow up at `open` with `NotFound`. Warm-store `init` is now a near-noop that returns after the `files/` existence check without seeding — the lazy fallback inside `write_cas_file` populates the cache per shard on first write, matching pnpm's `writeFile.ts` `dirs` Set shape. Updated the `init_warm_store_*` test to pin the new invariant (cache stays empty on warm store). - Both bootstrap sites were only warning on `JoinError` from `spawn_blocking(move || store_dir.init()).await`; an `io::Error` returned by `init` itself arrived as `Ok(Err(_))` and was silently dropped. Switched both to a full `match` that warns on both error layers, so permission-denied / disk-full failures are still diagnosable. No behavior change on the happy path; fresh-store init still creates all 256 shards and seeds the cache. Tests: 191/191 pass, 0 warnings.
There was a problem hiding this comment.
Pull request overview
This PR ports pnpm’s CAFS shard-directory bootstrap into pacquet to remove per-file directory creation from the CAS write hot path by pre-creating shard directories and caching shard “ensured” state per process.
Changes:
- Add
StoreDir::init()to pre-createv11/files/and all 256 shard directories (00..ff) on fresh stores and seed an in-memory shard cache. - Update
write_cas_fileto consult the shard cache and only call directory creation on a shard miss (using newly splitensure_parent_dir). - Wire
StoreDir::init()into install bootstrapping viaspawn_blocking, degrading failures to warnings.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/store-dir/src/store_dir.rs | Adds shard cache to StoreDir and introduces StoreDir::init() plus tests for cold/warm store behavior. |
| crates/store-dir/src/cas_file.rs | Uses shard cache + ensure_parent_dir to avoid repeated create_dir_all in write_cas_file; adds a test. |
| crates/store-dir/Cargo.toml | Adds dashmap dependency to support the shard cache. |
| crates/package-manager/src/install_without_lockfile.rs | Calls StoreDir::init() during install startup on the blocking pool with warning-only failure handling. |
| crates/package-manager/src/create_virtual_store.rs | Same StoreDir::init() bootstrap wiring for lockfile-based installs. |
| crates/fs/src/ensure_file.rs | Splits directory creation into ensure_parent_dir; updates ensure_file contract to require parent directory exists. |
| Cargo.lock | Records dashmap as a dependency of pacquet-store-dir. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Two more shapes from the Copilot review on #276: - `StoreDir::init` gated on `files.exists()`, which returns `true` for any path type. If `v11/files` was a regular file or broken symlink (store corruption, half-written bootstrap), init would silently noop and every later `write_cas_file` would fail with a cryptic per-file `open` error. Gate on `is_dir()` instead — the happy path is unchanged, and the corrupt case falls through to `create_dir_all` which surfaces a clear "not a directory" error from the kernel that the caller already degrades to `warn!`. Added `init_rejects_non_directory_files_path`. - `shard_cache_populates_on_first_write_and_skips_mkdir_thereafter` had a doc comment claiming the test removes the parent directory between writes — it never does. Rewrote the comment to describe what the test actually pins (lazy cache population, warm-cache `exists()` noop, follow-up write in the same or a new shard) and moved the "recovering from out-of-band rmdir is out of scope" caveat up front so it doesn't contradict the preceding paragraph. No behavior change on the happy path. Tests: 192/192 pass, 0 warnings.
There was a problem hiding this comment.
Pull request overview
This PR ports pnpm’s CAFS shard-directory bootstrap into pacquet to reduce per-file directory creation overhead when writing to the store CAS (v11/files/XX/...), by eagerly creating all 256 shard directories at install start and adding a per-process shard “ensured” cache for the lazy fallback path.
Changes:
- Add
StoreDir::init()to pre-createv11/files/and all00..ffshard directories on fresh stores and seed a per-process shard cache. - Split
pacquet_fs::ensure_fileso parent directory creation can be done separately (ensure_parent_dir) and skipped on hot paths via the shard cache. - Call
StoreDir::init()during install bootstraps (install_without_lockfileandcreate_virtual_store) onspawn_blocking, degrading failures to warnings.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/store-dir/src/store_dir.rs | Adds shard cache to StoreDir, manual Eq/PartialEq, and StoreDir::init() plus tests. |
| crates/store-dir/src/cas_file.rs | Uses shard cache + ensure_parent_dir to avoid per-write create_dir_all; adds a cache behavior test. |
| crates/store-dir/Cargo.toml | Adds dashmap dependency for DashSet. |
| crates/package-manager/src/install_without_lockfile.rs | Runs StoreDir::init() at install startup (blocking pool) with warning-on-failure behavior. |
| crates/package-manager/src/create_virtual_store.rs | Runs StoreDir::init() at virtual-store creation startup (blocking pool) with warning-on-failure behavior. |
| crates/fs/src/ensure_file.rs | Splits parent dir creation into ensure_parent_dir and updates ensure_file semantics/docs. |
| Cargo.lock | Records the new dashmap dependency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Two more from Copilot on #276: - The per-shard `AlreadyExists` branch in `StoreDir::init` used to mark the shard as ensured unconditionally. A regular file or non-dir symlink squatting on `v11/files/XX` (parallel process corruption, tampered store) would be accepted silently, and `write_cas_file` would later trust the cache, skip `ensure_parent_dir`, and fail at `open` with a cryptic error. After catching `AlreadyExists`, verify `shard_dir.is_dir()` — if not, return `io::Error { kind: AlreadyExists, ... }` naming the offending path. Mirrors the top-level `is_dir()` gate on `files/` and keeps failure diagnosable at install bootstrap. - The `init` rustdoc still said "gated by an `exists()` check"; previous round switched to `is_dir()` but the doc wasn't updated. Fixed, with a note explaining why `is_dir()` is tighter (non-dir corruption at `files/` no longer silently noops). Also updated the "benign `AlreadyExists`" paragraph to mention the new non-dir rejection. No happy-path change. Tests: 192/192 pass, 0 warnings. The per- shard guard is defense-in-depth for a narrow race (regular file planted at `files/XX` between `create_dir_all(files)` and the loop's `create_dir(shard)`) that's hard to reproduce reliably in a unit test; the top-level `init_rejects_non_directory_files_path` test exercises the same "non-dir at a shard-path position" failure mode via `files/` itself.
There was a problem hiding this comment.
Pull request overview
This PR ports pnpm’s CAFS shard-directory bootstrap into pacquet to avoid per-file create_dir_all/stat overhead on the CAS write hot path by pre-creating v11/files/00..ff and caching ensured shards per process.
Changes:
- Add
StoreDir::init()to eagerly createv11/files/plus all 256 shard directories and seed a per-process shard cache. - Update
StoreDir::write_cas_file()to consult the shard cache and only mkdir on a cache miss. - Split
pacquet_fs::ensure_fileintoensure_parent_dir+ensure_file, and wireStoreDir::init()into install bootstrap (spawn_blocking) with warn-on-failure degradation.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| crates/store-dir/src/store_dir.rs | Adds shard cache to StoreDir, implements init() to pre-create CAFS shards, and adds unit tests for init behavior. |
| crates/store-dir/src/cas_file.rs | Uses shard cache + ensure_parent_dir to avoid per-write directory creation syscalls; adds a cache behavior test. |
| crates/store-dir/Cargo.toml | Adds dashmap dependency for DashSet shard cache. |
| crates/package-manager/src/install_without_lockfile.rs | Calls StoreDir::init() at install bootstrap on the blocking pool with warn-on-failure fallback. |
| crates/package-manager/src/create_virtual_store.rs | Calls StoreDir::init() at install bootstrap on the blocking pool with warn-on-failure fallback. |
| crates/fs/src/ensure_file.rs | Splits directory creation into ensure_parent_dir; updates ensure_file to require an existing parent directory. |
| Cargo.lock | Records the new dashmap dependency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…to-dir Copilot flagged that `StoreDir::init`'s doc said "regular file or symlink squatting" while the implementation uses `Path::is_dir`, which follows symlinks and accepts a symlink pointing at a real directory. Tighten the wording on both the rustdoc and the inline comment to name exactly what's rejected — regular file, non-dir symlink, broken symlink — and explain why following symlinks is desired (ops folks occasionally spread a store across disks via symlinked shards). No behavior change.
CI's Doc step failed on the previous commit with: error: unresolved link to `Path::is_dir` error: could not document `pacquet-store-dir` `std::path::Path` isn't in scope at the rustdoc site, so [`Path::is_dir`] couldn't resolve. Use the same reference-style pattern already in use elsewhere in the file for `AlreadyExists` / `std::io::ErrorKind::AlreadyExists`. Verified locally with `RUSTDOCFLAGS='-D warnings' cargo doc --no-deps --workspace`.
There was a problem hiding this comment.
Pull request overview
This PR improves install performance by moving CAFS shard-directory creation off the CAS write hot path and into an install bootstrap step, while keeping a lazy fallback for robustness.
Changes:
- Add
StoreDir::init()to pre-createv11/files/00..ffon fresh stores and seed a per-process shard cache. - Update
StoreDir::write_cas_file()to consult the shard cache and only callcreate_dir_all(viaensure_parent_dir) once per shard per process. - Split
pacquet_fs::ensure_filedirectory creation into a newensure_parent_dirhelper, and add tests covering init + cache behavior.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/store-dir/src/store_dir.rs | Adds shard cache to StoreDir, implements init() pre-creation of 256 shards, and adds unit tests. |
| crates/store-dir/src/cas_file.rs | Uses shard cache + ensure_parent_dir before ensure_file to avoid per-file create_dir_all. |
| crates/store-dir/Cargo.toml | Adds dashmap dependency for DashSet. |
| crates/package-manager/src/install_without_lockfile.rs | Calls StoreDir::init() on a blocking task at install bootstrap with warn-on-failure degradation. |
| crates/package-manager/src/create_virtual_store.rs | Same bootstrap call to StoreDir::init() for lockfile installs. |
| crates/fs/src/ensure_file.rs | Introduces ensure_parent_dir and makes ensure_file require pre-existing parent dir. |
| Cargo.lock | Locks the new dashmap dependency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…helper Two Copilot comments on #276 pointed out the shard-init bootstrap block was duplicated verbatim in `install_without_lockfile.rs` and `create_virtual_store.rs`, and that an earlier round had them drift (different warning wording), which would make grepping the install log frustrating. Move the `spawn_blocking` + nested-Result degradation into a new `crates/package-manager/src/store_init.rs` module exposing `init_store_dir_best_effort(&'static StoreDir) -> impl Future`. Both call sites now delegate to it, so log text and error policy stay in sync. Also drops a `dbg!(err)` from `init_rejects_non_directory_files_path` that would have printed noisy output on every test run (third Copilot comment). `expect_err` already asserts the failure; the error-kind check isn't portable enough to pin, and the helper already surfaces it via `warn!` at runtime.
There was a problem hiding this comment.
Pull request overview
This PR improves install-time performance by eliminating per-file create_dir_all calls on the CAFS write hot path, via an eager shard-directory bootstrap plus a per-process shard cache fallback.
Changes:
- Add
StoreDir::init()to pre-createv11/files/00..ffon fresh stores and seed a per-process shard cache. - Update
StoreDir::write_cas_file()to consult the shard cache and only call directory creation on cache misses. - Split
pacquet_fs::ensure_fileintoensure_parent_dir(mkdirs) +ensure_file(open/write only), and wire best-effort init into install entrypoints.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| crates/store-dir/src/store_dir.rs | Adds shard cache to StoreDir and implements StoreDir::init(); adds unit tests for fresh/warm/corrupt store cases. |
| crates/store-dir/src/cas_file.rs | Uses shard cache + ensure_parent_dir to avoid per-file directory creation; adds a shard-cache behavior test. |
| crates/store-dir/Cargo.toml | Adds dashmap dependency for the shard cache. |
| crates/package-manager/src/store_init.rs | Introduces a shared best-effort init helper that runs StoreDir::init() on the blocking pool and degrades failures to warn!. |
| crates/package-manager/src/lib.rs | Wires in the new store_init module. |
| crates/package-manager/src/install_without_lockfile.rs | Calls best-effort store init early in the install flow. |
| crates/package-manager/src/create_virtual_store.rs | Calls best-effort store init early in the install flow. |
| crates/fs/src/ensure_file.rs | Splits directory creation into ensure_parent_dir and updates ensure_file contract to require an existing parent dir. |
| Cargo.lock | Records the added dashmap dependency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Integrated-Benchmark Report (Linux)Scenario: Frozen Lockfile
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 2.9905540765399996,
"stddev": 0.08371367313954152,
"median": 2.97465014124,
"user": 2.4864520399999996,
"system": 3.4473722,
"min": 2.84730168874,
"max": 3.13433637474,
"times": [
3.02781327274,
2.96811811474,
3.13433637474,
3.04121835174,
2.90331397474,
3.07993730974,
2.84730168874,
2.9600099247400005,
2.98118216774,
2.9623095857400004
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
},
{
"command": "pacquet@main",
"mean": 3.0109359629400005,
"stddev": 0.045417133981574126,
"median": 3.0089151552400004,
"user": 2.47616994,
"system": 3.650548299999999,
"min": 2.9500667267400003,
"max": 3.09439446374,
"times": [
3.0253761957400003,
3.03594126074,
2.9577191277400003,
3.09439446374,
3.04032377474,
2.99245411474,
3.0482582057400003,
2.98120087374,
2.9500667267400003,
2.9836248857400003
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
},
{
"command": "pnpm",
"mean": 6.575236008439999,
"stddev": 0.0706514618282887,
"median": 6.54818689124,
"user": 9.08521114,
"system": 4.4358336,
"min": 6.47638524174,
"max": 6.672331675740001,
"times": [
6.63825211674,
6.54322241574,
6.55123173774,
6.507102547740001,
6.672331675740001,
6.47638524174,
6.51835651874,
6.65855978674,
6.64177599874,
6.54514204474
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
}
]
} |
Summary
Ports pnpm's CAFS shard-directory bootstrap into pacquet to eliminate per-file
create_dir_all(and its backingstat) on the hot path.StoreDir::initmkdirsv11/files/and all 256 shard directories (00..ff) at install start, gated by anexists()check onfiles/so a warm store pays only one stat. Matches pnpm/pnpm@298e5dc (#8700)initStoreinworker/src/start.ts.DashSet<u8>onStoreDirrecords which shards this process has ensured;write_cas_fileconsults it and only falls through toensure_parent_diron a miss. Matches pnpm's belt-and-suspendersdirsSet instore/cafs/src/writeFile.ts.ensure_file—ensure_parent_dircreates the dir tree,ensure_fileopens + writes; the CAFS caller uses only the latter once the shard is cached.install_without_lockfile::runandcreate_virtual_store::runonspawn_blocking; init failure degrades towarn!, lazy fallback still handles it.Also includes one incidental fix, in its own commit: the
default_store_dirtests inpacquet-npmrcfailed when the contributor's shell hadPNPM_HOMEpre-set, becausePNPM_HOMEshadows theXDG_DATA_HOMEbranch. Snapshot+restore both vars around the bodies.No behavior change: CAFS layout, error semantics, and file-write ordering are unchanged. Reduces ~9700 redundant
statsyscalls on a cold install of ~10k files (one per CAS write after the first in each shard).Test plan
just readygreen (180 tests, 0 warnings, fmt clean)init_creates_all_256_shards_and_populates_cache,init_warm_store_seeds_cache_without_recreating_shards,shard_cache_populates_on_first_write_and_skips_mkdir_thereafterpackages_under_orgs_should_work(live npmjs download) still extracts@fastify/error@3.3.0correctlyjust integrated-benchmarkvs pnpm on macOS to quantify the cold-install improvement