perf(git-fetcher): skip CAS re-import when packlist matches input (#436 follow-up)#479
Conversation
… follow-up) Port upstream's [`gitHostedTarballFetcher.ts:88-100`](https://github.com/pnpm/pnpm/blob/94240bc046/fetching/tarball-fetcher/src/gitHostedTarballFetcher.ts#L88-L100) fast path: when `resolution.path` is `None` and `packlist.len() == cas_paths.len()`, the materialized tree is byte-identical to the input CAS files and re-importing every entry through hash-dedup is wasted work. Two branches mirror upstream: - `!should_be_built`: queue a `PackageFilesIndex` row at the final key with `requires_build: Some(false)`, then hand the input `cas_paths` straight to the dispatcher. Upstream copies the `\traw` row to the prepared key; pacquet's tarball download doesn't write a raw row at this key today, so the row is synthesized from `cas_paths` instead — the CAS files themselves are already in place. - `should_be_built && ignore_scripts`: return input as-is *without* queueing a row, so subsequent installs re-check the build gate. The synthesized row's per-file digest comes from the CAS path itself (pnpm v11's `files/XX/<rest>[-exec]` layout makes this exact); size comes from `fs::metadata`, mode from the `-exec` suffix. No per-file `fs::read + sha512` — the whole point of the optimization.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
📝 WalkthroughWalkthroughAdds CAS path parsing helpers ( ChangesCAS Fast-Path Tarball Fetcher
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/git-fetcher/src/cas_io.rs`:
- Around line 241-252: cas_path_digest currently accepts any hex shard plus hex
stem; tighten it to match v11 CAS digests exactly by requiring the stem (after
stripping "-exec") to be exactly 64 hex chars. In function cas_path_digest,
after computing `stem` and `shard`, replace the loose checks with: ensure
`shard.len() == 2` and every byte is an ASCII hex digit, and ensure
`!stem.is_empty()` is replaced by `stem.len() == 64` and every byte of `stem` is
an ASCII hex digit; return None otherwise so only full v11 CAS shapes are
accepted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: a02d1403-ff96-4816-9385-e46105972a75
📒 Files selected for processing (3)
crates/git-fetcher/src/cas_io.rscrates/git-fetcher/src/tarball_fetcher.rscrates/git-fetcher/src/tarball_fetcher/tests.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Agent
- GitHub Check: Lint and Test (ubuntu-latest)
- GitHub Check: Run benchmark on ubuntu-latest
- GitHub Check: Lint and Test (macos-latest)
- GitHub Check: Lint and Test (windows-latest)
- GitHub Check: Dylint
- GitHub Check: Run benchmark on ubuntu-latest
- GitHub Check: Code Coverage
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Preserve existing method chains andpipe-traitchains; do not break them into intermediateletbindings unless there is a concrete justification such as a compilation failure, borrow checker rejection, meaningful performance improvement, or other technical necessity. Refactoring for style alone is not sufficient justification.
Choose owned vs. borrowed parameters to minimize copies; prefer borrowed types (&Pathover&PathBuf,&strover&String) when it does not force extra copies.
PreferArc::clone(&x)andRc::clone(&x)overx.clone()for reference-counted types to make the cost visible at the call site.
Do not use star imports inside module bodies. Writeuse super::{Foo, bar}instead ofuse super::*;for any glob whose target is a module you control. External-crate preludes (e.g.,use rayon::prelude::*;) and root-of-module re-exports (e.g.,pub use submodule::*;inlib.rs) are exceptions.
Follow Rust API Guidelines for naming, as documented in https://rust-lang.github.io/api-guidelines/naming.html.
Declare a newtype wrapper for any branded string type being ported from TypeScript pnpm. Do not collapse the brand into a plainStringor&str; give the type its own struct so misuse is a type error.
When porting branded string types where upstream TypeScript always validates before construction, validate in the Rust port too. Construct the wrapper only viaTryFrom<String>and/orFromStr; do not provide an infallible public constructor that takes an arbitrary string.
For branded string types where upstream TypeScript never validates (used purely for type-safety to prevent confusion between string slots), expose an infallibleFrom<String>andFrom<&str>constructor in the Rust wrapper.
When upstream TypeScript occasionally constructs a branded type without validation (via bareasassertion), add afrom_str_unchecked(or similarly named) constructor on the Rust side. Keep the validating constructor as well; `from_str_u...
Files:
crates/git-fetcher/src/tarball_fetcher.rscrates/git-fetcher/src/tarball_fetcher/tests.rscrates/git-fetcher/src/cas_io.rs
🧠 Learnings (2)
📚 Learning: 2026-05-07T23:19:08.272Z
Learnt from: KSXGitHub
Repo: pnpm/pacquet PR: 401
File: tasks/integrated-benchmark/src/work_env.rs:343-344
Timestamp: 2026-05-07T23:19:08.272Z
Learning: When reviewing Rust code in pnpm/pacquet for deprecated API usage, do not automatically treat `serde_saphyr::to_string` as deprecated. In `serde-saphyr` v0.0.25, `serde_saphyr::to_string` has no `#[deprecated]` attribute (the `#[deprecated]` later in `serde-saphyr-0.0.25/src/lib.rs` applies to a different function). Only flag `serde_saphyr::to_string` as deprecated if the resolved dependency version’s source shows `#[deprecated]` on that specific function.
Applied to files:
crates/git-fetcher/src/tarball_fetcher.rscrates/git-fetcher/src/tarball_fetcher/tests.rscrates/git-fetcher/src/cas_io.rs
📚 Learning: 2026-05-01T10:01:33.766Z
Learnt from: zkochan
Repo: pnpm/pacquet PR: 349
File: crates/reporter/src/tests.rs:121-121
Timestamp: 2026-05-01T10:01:33.766Z
Learning: In Rust test code, follow the repo’s CODE_STYLE_GUIDE test-logging rule: add logging (e.g., `eprintln!`/`eprintln!(...)`) so that useful diagnostic values are printed when a test fails, unless the assertion is `assert_eq!` (where the differing values are already included). Concretely, if you use assertions like `assert!`, `assert_ne!`, etc., ensure the test logs the relevant actual/expected values (or context) before/around the assertion so failures can be diagnosed without rerunning.
Applied to files:
crates/git-fetcher/src/tarball_fetcher/tests.rs
🔇 Additional comments (3)
crates/git-fetcher/src/cas_io.rs (1)
321-405: LGTM!crates/git-fetcher/src/tarball_fetcher.rs (1)
16-25: LGTM!Also applies to: 32-32, 155-205
crates/git-fetcher/src/tarball_fetcher/tests.rs (1)
353-604: LGTM!
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #479 +/- ##
==========================================
+ Coverage 90.09% 90.43% +0.33%
==========================================
Files 119 121 +2
Lines 11098 11475 +377
==========================================
+ Hits 9999 10377 +378
+ Misses 1099 1098 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Micro-Benchmark ResultsLinux |
There was a problem hiding this comment.
Pull request overview
This PR ports pnpm’s post-packlist fast path for git-hosted tarballs so that when the prepared tree is byte-identical to the already-imported CAS content, pacquet can skip the expensive import_into_cas re-hash step and return the original cas_paths map directly.
Changes:
- Add a fast path in
GitHostedTarballFetcherto skip re-import whenpath.is_none()andpacklist.len() == cas_paths.len(), with behavior matching upstream for both “no build needed” and “build needed but scripts ignored” cases. - Introduce
cas_io::synthesize_files_indexto reconstructPackageFilesIndex.filesentries from v11 CAS paths (digest + exec-bit + size) without reading file contents, plus unit tests. - Add integration tests covering the new fast-path behaviors and expected store-index side effects.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| crates/git-fetcher/src/tarball_fetcher.rs | Adds the fast-path branch and store-index row synthesis to avoid unnecessary CAS re-import/hashing. |
| crates/git-fetcher/src/cas_io.rs | Adds digest reconstruction + PackageFilesIndex.files synthesis helper and unit tests. |
| crates/git-fetcher/src/tarball_fetcher/tests.rs | Adds new integration tests for fast-path correctness and store-index behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Integrated-Benchmark Report (Linux)Scenario: Frozen Lockfile
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 2.801676832860001,
"stddev": 0.11288774132381,
"median": 2.80118242206,
"user": 2.62067732,
"system": 3.8663815199999996,
"min": 2.64586275706,
"max": 3.02194963106,
"times": [
3.02194963106,
2.7977955930600005,
2.89702353506,
2.8678588040600004,
2.80456925106,
2.83521547106,
2.75787432606,
2.6603891790600005,
2.72822978106,
2.64586275706
]
},
{
"command": "pacquet@main",
"mean": 2.77168761006,
"stddev": 0.09596449115288286,
"median": 2.7386843580600004,
"user": 2.6873426199999995,
"system": 3.8586093200000002,
"min": 2.6771420220600004,
"max": 2.93457592906,
"times": [
2.9231252960600003,
2.7303845910600004,
2.6771420220600004,
2.7469841250600004,
2.93457592906,
2.6843939160600003,
2.6809640260600003,
2.73013415706,
2.7698056560600004,
2.83936638206
]
},
{
"command": "pnpm",
"mean": 6.653522134560001,
"stddev": 0.13140305520762072,
"median": 6.645163571059999,
"user": 9.77192312,
"system": 4.72913062,
"min": 6.44292599806,
"max": 6.88526302106,
"times": [
6.57266734306,
6.68072945706,
6.74342709606,
6.56427594706,
6.77459984306,
6.88526302106,
6.54029943806,
6.72143551706,
6.44292599806,
6.60959768506
]
}
]
}Scenario: Frozen Lockfile (Hot Cache)
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 0.7669954418400002,
"stddev": 0.07282519528299715,
"median": 0.7435631543400001,
"user": 0.39787974,
"system": 1.6389264200000002,
"min": 0.7202289538400001,
"max": 0.9691887898400001,
"times": [
0.9691887898400001,
0.74472600184,
0.7719322528400001,
0.74653322584,
0.77007145584,
0.74240030684,
0.7381573128400001,
0.7306787238400001,
0.7202289538400001,
0.7360373948400001
]
},
{
"command": "pacquet@main",
"mean": 0.8212273055400001,
"stddev": 0.05141538131493712,
"median": 0.80666417184,
"user": 0.40029144,
"system": 1.6531599200000002,
"min": 0.74697524384,
"max": 0.9242620768400001,
"times": [
0.8547394668400001,
0.74697524384,
0.7980141568400001,
0.8570356738400001,
0.8108672148400001,
0.7970373888400001,
0.8519181538400001,
0.9242620768400001,
0.8024611288400001,
0.7689625508400001
]
},
{
"command": "pnpm",
"mean": 2.7610234722399998,
"stddev": 0.0926633739251967,
"median": 2.75379020784,
"user": 3.3494344399999996,
"system": 2.31800092,
"min": 2.64300308684,
"max": 2.8880805918399997,
"times": [
2.64300308684,
2.77795081284,
2.8880805918399997,
2.88511639684,
2.72962960284,
2.6889447618399998,
2.69223128784,
2.84186052384,
2.6531149668399996,
2.81030269084
]
}
]
} |
…ub-path slow-path CodeRabbit and Copilot review on #479: - `cas_path_digest` now requires the stem to be exactly 126 hex chars (the remainder of a sha512 hex digest after the 2-char shard). Previously accepted any non-empty hex stem, so paths like `.../files/ab/cd` would synthesize a 4-char "digest" that could poison `index.db`. Fail closed on anything that isn't a real v11 CAS path. - Replace the bogus "empty stem" malformed-path case (which never triggered — `Path::file_name()` of `/tmp/ab/` is `Some("ab")`) with cases that actually exercise the new length check: too short, too long, and right-length-with-a-non-hex-byte. - Rebuild `sub_path_never_takes_fast_path` so `packlist.len() == cas_paths.len()` (single sub-package tarball). Only the `path.is_none()` guard now keeps the fast path out — if a future refactor drops it, the test catches the misshaped `packages/sub/...` keys leaking into the dispatcher's output.
Summary
Ports the post-packlist fast-path branch from upstream's
gitHostedTarballFetcher.ts:88-100so a successful prepare on a git-hosted tarball doesn't re-hash every file back into the CAS when it doesn't need to.When
resolution.pathisNoneandpacklist.len() == cas_paths.len():!should_be_built→ queue aPackageFilesIndexrow at the final key (synthesized fromcas_paths), hand the input map straight back. Upstream copies the\trawrow to the prepared key; pacquet's tarball download doesn't write a raw row at this key today, so we synthesize from the CAS layout instead — the actual CAS blobs are already in place from the tarball download.should_be_built && ignore_scripts→ return input as-is without queueing a final-key row, so subsequent--ignore-scriptsinstalls re-check the build gate (matches upstream'sstoreIndex.delete(rawFilesIndexFile); return { filesMap, ignoredBuild: true }branch).import_into_cas), unchanged.The synthesized row's per-file digest is reconstructed from the v11 CAS layout (
files/XX/<rest>[-exec]→ shard byte + file stem);sizecomes fromfs::metadata;modefrom the-execsuffix via the sameis_executablerule the read side uses (cas_file_path_by_mode). Skips the full file-read + SHA-512 cost per entry.A
cas_io::synthesize_files_indexhelper does the heavy lifting and gets its own unit tests for digest round-trip and exec-bit recovery.Test plan
cargo nextest run -p pacquet-git-fetcher— 72 tests pass (+8 new: 4 fast-path scenarios intarball_fetcher/tests.rs, 4 helper tests incas_io/tests.rs)just ready— full workspace, 1026 tests passjust dylint— perfectionist lints cleantaplo format --check— cleanNew fast-path tests
fast_path_returns_input_cas_paths_when_no_build_needed— output is the input map verbatim when!should_be_built && path.is_none() && packlist matches.fast_path_queues_synthesized_index_row— synthesized row queued at final key; an executable entry's(digest, mode)round-trips throughcas_file_path_by_modeto the input path.fast_path_ignore_scripts_returns_input_without_queueing_row—should_be_built && ignore_scriptsbranch returns input but does not queue a row.sub_path_never_takes_fast_path— sub-path always takes the slow path even on degenerate single-package monorepos.New helper tests (
cas_io)cas_path_digest_round_trips_through_write_cas_file— anchored against the canonical write side.cas_path_digest_rejects_malformed_paths— wrong shape / non-hex shard / empty stem →None.synthesize_files_index_recovers_digest_size_and_exec_bit—0o755vs0o644synthesis matches the write side's round-trip rule.synthesize_files_index_errors_on_malformed_cas_path→Io(InvalidData).Written by an agent (Claude Code, claude-opus-4-7).
Summary by CodeRabbit