fix(pacquet/fs): serialize concurrent CAS writes to the same path#11758
Conversation
Two install snapshots whose tarballs ship identical file content
(e.g. a shared LICENSE across sibling packages like
`@pnpm.e2e/hello-world-js-bin` and `@pnpm.e2e/hello-world-js-bin-parent`)
hash to the same CAS path and call `ensure_file` concurrently. Without
serialization the second writer's `O_CREAT|O_EXCL` hits `AlreadyExists`
while the first writer is mid-`write_all`, falls into
`verify_or_rewrite`'s `meta.len() != content.len()` arm, and runs
`write_atomic` — which renames a temp file over the live source.
On Linux/ext4 the partial-size observation is fast enough that this
window opens often, surfacing as flaky CI failures of the form:
failed to import "<store>/v11/files/65/<hash>" to ".../LICENSE":
No such file or directory (os error 2)
emitted by `link_file` whose `reflink`/`fs::hard_link` raced the
rename. macOS/APFS and Windows tests pass because their `stat` cadence
and CI runner parallelism don't reliably open the window.
Port pnpm v11's `locker: Map<string, number>` semantics — slightly
stronger in pacquet, as a per-path `Mutex` rather than a dedup cache —
so two writers of the same CAS path serialize through it. The second
caller acquires the lock after the first writer's `write_all` has
finished, then takes the byte-match fast path inside `verify_or_rewrite`
and never has to rewrite. The previous docstring at the bottom of the
"Differences from pnpm" list explicitly acknowledged the locker
omission and predicted it would matter; this change closes that gap.
Add a regression test (`concurrent_writers_of_same_path_do_not_swap_the_inode`)
that fires 32 threads at one path with identical content and asserts
the inode never swaps — the observable signal that no writer ever took
the `write_atomic` rename path.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
🧰 Additional context used📓 Path-based instructions (1)pacquet/**/*.rs📄 CodeRabbit inference engine (pacquet/AGENTS.md)
Files:
🔇 Additional comments (1)
📝 WalkthroughWalkthroughThis PR adds a DashMap-backed, process-local per-path mutex to serialize concurrent ensure_file writers, integrates mutex acquisition into ensure_file, and adds a Unix-only concurrency test verifying inode and content invariants. ChangesPer-path concurrency serialization
Sequence Diagram(s)sequenceDiagram
participant Writer as Writer Thread
participant LockMap as cas_write_lock
participant Mutex as per-path Mutex
participant FS as Filesystem
Writer->>LockMap: ensure_file(path) -> get_or_insert Arc<Mutex>
LockMap->>Mutex: return Arc<Mutex> for path
Writer->>Mutex: acquire lock (guard)
Writer->>FS: open / verify / atomic-rewrite
FS-->>Writer: write result
Writer->>Mutex: release lock
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #11758 +/- ##
==========================================
- Coverage 90.12% 89.64% -0.49%
==========================================
Files 146 151 +5
Lines 16876 17578 +702
==========================================
+ Hits 15209 15757 +548
- Misses 1667 1821 +154 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
`ensure_file` is public; references to private `verify_or_rewrite`, `write_atomic`, and `cas_write_lock` only resolve under `--document-private-items`, which trips `rustdoc::private-intra-doc-links` under `-D warnings`. Drop the link form and keep the names as plain backticked identifiers — the docstring still reads correctly and `cargo doc` no longer fails.
Address review feedback on #11758 — single-letter constants are opaque; `WRITER_COUNT` reads as the loop count of concurrent writers the test fires at one CAS path.
Micro-Benchmark ResultsLinux |
KSXGitHub
left a comment
There was a problem hiding this comment.
Make the documentation more concise and focused.
Integrated-Benchmark Report (Linux)Scenario: Frozen Lockfile
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 2.3847090452,
"stddev": 0.10875690455721122,
"median": 2.358349492,
"user": 2.80250764,
"system": 3.6088811799999996,
"min": 2.312693414,
"max": 2.687785175,
"times": [
2.312693414,
2.374138229,
2.326695113,
2.355275498,
2.687785175,
2.368928559,
2.359946387,
2.378330254,
2.326545226,
2.356752597
]
},
{
"command": "pacquet@main",
"mean": 2.3184605131000002,
"stddev": 0.05100345041453707,
"median": 2.3370192475000002,
"user": 2.67949914,
"system": 3.5894638799999994,
"min": 2.236840649,
"max": 2.3811184770000002,
"times": [
2.3405539490000002,
2.255330449,
2.291562521,
2.3811184770000002,
2.333484546,
2.3498498260000003,
2.362569836,
2.269402859,
2.236840649,
2.363892019
]
},
{
"command": "pnpm",
"mean": 4.6067262121,
"stddev": 0.047260176659664886,
"median": 4.61934752,
"user": 7.818185440000001,
"system": 4.04494098,
"min": 4.51730027,
"max": 4.658805881,
"times": [
4.654355049,
4.608028232,
4.5711421020000005,
4.658805881,
4.628032748,
4.612539746,
4.644116067,
4.546786732,
4.626155294,
4.51730027
]
}
]
}Scenario: Frozen Lockfile (Hot Cache)
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 0.71534736988,
"stddev": 0.09168674480439003,
"median": 0.6855012609800001,
"user": 0.38330262,
"system": 1.5805075800000001,
"min": 0.6714006919800001,
"max": 0.97423969998,
"times": [
0.97423969998,
0.6944527369800001,
0.6832946829800001,
0.6810138529800001,
0.6794770239800001,
0.6901694069800001,
0.6714006919800001,
0.67832783598,
0.7133899279800001,
0.6877078389800001
]
},
{
"command": "pacquet@main",
"mean": 0.71391133818,
"stddev": 0.05316824439222206,
"median": 0.6955709834800001,
"user": 0.37812222,
"system": 1.5865728799999999,
"min": 0.67479088598,
"max": 0.84527939898,
"times": [
0.70326112598,
0.69740106698,
0.67479088598,
0.6937408999800001,
0.84527939898,
0.6884604029800001,
0.7006652129800001,
0.68018079998,
0.7696282989800001,
0.68570528898
]
},
{
"command": "pnpm",
"mean": 2.52884831268,
"stddev": 0.07903863866586336,
"median": 2.52155315198,
"user": 3.00704162,
"system": 2.22444798,
"min": 2.39504738898,
"max": 2.6440563799800003,
"times": [
2.57754821998,
2.41684780498,
2.6208668789800003,
2.56222989398,
2.52346125598,
2.51964504798,
2.51747491198,
2.39504738898,
2.51130534398,
2.6440563799800003
]
}
]
} |
Address review feedback on #11758: drop body-narrating prose, keep only the contract and the non-obvious why. The test docstring no longer references "pre-fix code" so it reads independent of this PR's history.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pacquet/crates/fs/src/ensure_file/tests.rs`:
- Around line 339-343: The inode assertion only compares two metadata reads
taken after the operation and can miss an inode swap that happened mid-run;
modify the test in ensure_file/tests.rs to read and store the original inode
(e.g., call fs::metadata(&*path).ino() into a variable like original_ino) before
performing the join/operation and then assert that final_meta.ino() equals
original_ino after the operation, and keep the existing content and length
assertions (use the existing identifiers path, final_meta, and content to locate
and update the test).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: bd53df16-4545-4d4a-93e6-51047fab273f
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
pacquet/crates/fs/Cargo.tomlpacquet/crates/fs/src/ensure_file.rspacquet/crates/fs/src/ensure_file/tests.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Agent
- GitHub Check: Run benchmark on ubuntu-latest
- GitHub Check: Lint and Test (windows-latest)
- GitHub Check: Run benchmark on ubuntu-latest
🧰 Additional context used
📓 Path-based instructions (1)
pacquet/**/*.rs
📄 CodeRabbit inference engine (pacquet/AGENTS.md)
pacquet/**/*.rs: When porting a function that firespnpm:<channel>events throughglobalLogger,logger.debug(), orstreamParser.write(), mirror the call site, payload, and ordering so the reporter parses pacquet's NDJSON the same way it parses pnpm's.
Declare a newtype wrapper for branded string types. Do not collapse the brand into a plainStringor&str.
If upstream always validates before construction, validate in pacquet's wrapper too. The wrapper must construct only viaTryFrom<String>and/orFromStr. Do not provide an infallible public constructor.
If upstream never validates, just brand for type-safety. Expose an infallibleFrom<String>(andFrom<&str>when convenient).
If upstream occasionally constructs without validation, exposefrom_str_uncheckedas an escape hatch alongside the validating constructor.
Match upstream serde behavior for branded types that cross JSON, YAML, or INI boundaries. Use#[serde(try_from = "String")]for deserialization and#[serde(into = "String")]for serialization.
Use#[derive(derive_more::From)]and#[derive(derive_more::Into)]for mechanical conversion impls. Fall back to manualimplonly when conversion needs custom logic.
String-literal unions should becomeenums, not newtype wrappers. Model closed sets of valid string values as enums.
Template literal types should be treated as branded strings with validation discipline from rules 2-5.
Choose owned vs. borrowed parameters to minimize copies. Widen to the most encompassing type (&Pathover&PathBuf,&strover&String) when it doesn't force extra copies.
PreferArc::clone(&x)/Rc::clone(&x)overx.clone()for reference-counted types, so the cost is visible at the call site.
Follow Rust API Guidelines for naming conventions.
Do not use star imports inside module bodies. Writeuse super::{Foo, bar}instead ofuse super::*;. Two forms stay allowed: external-crate preludes likeuse rayon::prelude::*;and root-of-module re-...
Files:
pacquet/crates/fs/src/ensure_file/tests.rspacquet/crates/fs/src/ensure_file.rs
🔇 Additional comments (2)
pacquet/crates/fs/Cargo.toml (1)
14-14: LGTM!pacquet/crates/fs/src/ensure_file.rs (1)
1-1: LGTM!Also applies to: 8-11, 155-161, 171-174, 201-214
There was a problem hiding this comment.
Pull request overview
This PR addresses a CI-only race in pacquet’s CAS writer where concurrent ensure_file calls targeting the same CAS path can trigger an unexpected atomic rewrite and break concurrent linking.
Changes:
- Serialize concurrent
ensure_filecalls per target path using a process-local lock map. - Add a regression test intended to cover concurrent writers to the same path.
- Add
dashmapas a dependency for the lock map implementation.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
pacquet/crates/fs/src/ensure_file.rs |
Adds a process-local per-path mutex (via DashMap) and acquires it in ensure_file to serialize same-path writes. |
pacquet/crates/fs/src/ensure_file/tests.rs |
Adds a new concurrent-writer regression test (currently has correctness/compilation issues noted in comments). |
pacquet/crates/fs/Cargo.toml |
Adds dashmap dependency for the new lock map. |
Cargo.lock |
Updates lockfile to include dashmap for pacquet-fs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| }) | ||
| }) | ||
| .collect(); | ||
|
|
||
| for handle in handles { | ||
| handle.join().expect("writer thread should not panic"); | ||
| } | ||
|
|
||
| let final_meta = fs::metadata(&*path).unwrap(); | ||
| assert_eq!(fs::read(&*path).unwrap(), *content); | ||
| assert_eq!(final_meta.len(), content.len() as u64); | ||
| assert_eq!(final_meta.ino(), fs::metadata(&*path).unwrap().ino()); |
There was a problem hiding this comment.
Two parts:
*contentmove: incorrect —assert_eq!(a, b)expands tomatch (&a, &b) { ... }so*contentis a place expression that's immediately re-borrowed via&. No move happens. The test compiles and the suite has been passing locally and in CI. Other pre-existing tests in this file use the same*arc_of_vecpattern.- Mid-run inode swap: agreed and applied in 5c571a7 — added per-thread
observed_inodescapture plus a pre-create that gives the assertion a reference point (original_ino) taken before the contended run.
Written by an agent (Claude Code, claude-opus-4-7).
Generated by Claude Code
| /// Returning `Arc<Mutex<()>>` (rather than a `Ref` into the map) so | ||
| /// the caller `.lock()`s after the shard guard from `entry()` is | ||
| /// released — otherwise a recursive lookup on the same shard could | ||
| /// deadlock. The map is never pruned: entries are one per unique | ||
| /// CAS path the install wrote, bounded by the working set. |
There was a problem hiding this comment.
Deliberate. Pacquet is a CLI that terminates after one install — the process lifetime equals one install's working set, so the map can't grow beyond that. The entries are small (a PathBuf + a 24-byte Arc<Mutex<()>> ≈ ~150 bytes each); even ~10k unique CAS files is well under 2 MiB, dwarfed by the rest of the install's working set.
A Weak-based cleanup pass would have to gate on "no outstanding lock holders", which means atomically checking refcount under the DashMap shard guard while holding the lock to insert — a real complexity cliff for no win on the only caller (ensure_file from a one-shot CLI). If pacquet ever grows a long-lived server mode, revisit then; until then the simpler form matches the lifecycle.
Written by an agent (Claude Code, claude-opus-4-7).
Generated by Claude Code
… test Address CodeRabbit's and Copilot's review feedback on #11758. The previous assert compared two metadata reads taken at the same moment after join — tautological. Pre-creating the file gives an `original_ino` reference taken before the contended run, and each writer also captures the path's inode immediately after its own `ensure_file` returns so a mid-run rename swap from another writer is visible to at least one of those observations.
Address `perfectionist::macro_trailing_comma` warning surfaced by the Dylint CI on #11758 — multi-line macro invocations must end with a trailing comma. (My earlier local `dylint --all` run missed this because the perfectionist library hadn't fully recompiled against the new test code.)
| // Pre-create so every contender below routes through | ||
| // `verify_or_rewrite` — the path that can swap the inode via | ||
| // `write_atomic`'s rename on a size mismatch. `original_ino` is | ||
| // the reference point both the post-run and per-writer inode | ||
| // observations are compared against. | ||
| ensure_file(&path, &content, None).expect("pre-create"); | ||
| let original_ino = fs::metadata(&*path).unwrap().ino(); | ||
|
|
There was a problem hiding this comment.
Fair point — applied in bc0d409: dropped the pre-create. The shape is now what you described (fresh dirent, races on the initial create/write).
A caveat I should be upfront about: any inode observation taken after ensure_file returns has already missed the rename window. In a single-rename race the late-comers all observe whatever the post-rename inode is and the test slips past. The per-thread observation does catch a multi-rename race (where renames continue happening while earlier observers have already captured), but a strict regression test would need an in-flight observer thread that polls during write_all, which I didn't add — the docstring now flags this as a smoke test rather than a tight regression.
Happy to add the polling-observer variant if you'd like a stronger test. Re: the unbounded-LOCKS comment, replied separately on that thread — deliberate given pacquet's one-shot CLI lifecycle.
Written by an agent (Claude Code, claude-opus-4-7).
Generated by Claude Code
Address Copilot review feedback on #11758. The pre-create made every contender take the byte-match fast path against a complete file, so the lock made no observable difference and the test couldn't even weakly distinguish lock from no-lock. Removing it leaves the fresh-dirent shape (`O_CREAT|O_EXCL` race + verify_or_rewrite for the rest), and the per-thread inode observations can now diverge under a multi-rename race without the lock. Honest about the limitation in the docstring: any observation taken after `ensure_file` returns has already missed the rename window, so a single-rename race converges on one inode and slips past. The test catches the multi-rename case and validates the "no deadlock, all writers see correct content" baseline.
Summary
Fixes a flaky CI failure on Ubuntu only (
Pacquet CI / Lint and Test (ubuntu-latest)) wherepacquet install --frozen-lockfileaborts with:Root cause
Two install snapshots whose tarballs ship identical file content (e.g. the shared
LICENSEbetween@pnpm.e2e/hello-world-js-binand@pnpm.e2e/hello-world-js-bin-parent) hash to the same CAS path and callensure_fileconcurrently. The race:O_CREAT|O_EXCLsucceeds, startswrite_all.O_CREAT|O_EXCL→AlreadyExists→verify_or_rewriteseesmeta.len() < content.len()(A is still writing) →write_atomicrenames a temp file over the live source.link_file'sreflink/fs::hard_linkagainst the source observes a transientNotFoundwindow during the rename.Why Ubuntu only: ext4/tmpfs report the partial size during a concurrent write fast enough to open the window; macOS/APFS and Windows CI runners' stat cadence and lower parallelism don't reliably open it.
The existing docstring in
ensure_file.rsexplicitly acknowledged that pacquet had omitted pnpm'slockermap under the (broken) assumption that "the hot path callsensure_fileat most once per CAS file per install." Sibling packages with shared files violate that.Fix
Port pnpm v11's
lockerto a per-pathMutex(slightly stronger than pnpm's dedup-only cache). Concurrent writers of the same CAS path now serialize, so the second caller'sverify_or_rewritealways takes the byte-match fast path and never rewrites.Test plan
concurrent_writers_of_same_path_do_not_swap_the_inodefires 32 threads at one path with identical content and asserts the inode never swaps — the observable signal that no writer took thewrite_atomicrename path.cargo test -p pacquet-fs— 25 passed.cargo clippy --workspace --all-targets -- -D warningsclean.cargo check --workspace --all-targetsclean.https://claude.ai/code/session_01UizmMquu22ZMzB7wrxswr4
Summary by CodeRabbit
Improvements
Tests