feat(git-fetcher): warm prefetch for git-hosted snapshots (#436 follow-up)#454
Conversation
…w-up)
Wires git-hosted installs into pacquet's existing `index.db` warm-cache
path so the second install of an unchanged lockfile reuses the previous
install's clone+checkout+prepare+packlist work instead of redoing it
from scratch.
Three pieces:
- **Both fetchers now write `PackageFilesIndex` rows.** `cas_io::
import_into_cas` returns the full
`HashMap<String, CafsFileInfo>` payload alongside the existing
`cas_paths` map; the fetchers wrap that into a `PackageFilesIndex`
and queue it on the shared `StoreIndexWriter` at the cache key the
dispatcher passed in. `algo: "sha512"`, `manifest: None`,
`requires_build` reflects `should_be_built`. `add_files_from_dir`'s
`file_mode_from` is duplicated locally so the cache row encodes
the same POSIX-mode shape pnpm's tarball entries use.
- **`create_virtual_store::snapshot_cache_key` routes git arms
through the warm batch.** For `Tarball { gitHosted: true }` and
`Git` resolutions it now returns
`Some(gitHostedStoreIndexKey(pkg_id, built=true))` (matches
upstream's `pickStoreIndexKey` shape) instead of the previous
`None` cold-batch shortcut. `built = true` matches the dispatcher's
`ignore_scripts: false` default; both sites will flip together
when an `ignore-scripts` config knob lands.
- **Dispatcher threads writer + key into both fetchers.**
`install_package_by_snapshot.rs` builds the same
`gitHostedStoreIndexKey(package_id, built=true)` the warm path
reads and hands it to `GitFetcher.files_index_file` /
`GitHostedTarballFetcher.files_index_file`, plus the shared
`store_index_writer` clone.
New test: `tarball_fetcher::tests::writes_index_row_when_writer_provided`
spawns a real `StoreIndexWriter`, runs the fetcher with it, awaits
the writer's join handle, and re-opens the store to confirm the row
landed at the expected `pkg_id\tbuilt` key with the right `algo`,
`requires_build`, and a non-empty `files` map.
The warm prefetch's existing file-verification (`check_pkg_files_integrity`)
runs unchanged — if any CAS file referenced by the row has been
pruned, the snapshot falls through to cold and the fetcher re-runs.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
🧰 Additional context used📓 Path-based instructions (1)**/*.rs📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧠 Learnings (2)📚 Learning: 2026-05-01T10:01:33.766ZApplied to files:
📚 Learning: 2026-05-07T23:19:08.272ZApplied to files:
🔇 Additional comments (1)
📝 WalkthroughWalkthroughThis PR extends CAS import to produce file metadata ( ChangesGit-hosted warm prefetch
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #454 +/- ##
==========================================
- Coverage 87.20% 87.06% -0.14%
==========================================
Files 105 113 +8
Lines 8493 9240 +747
==========================================
+ Hits 7406 8045 +639
- Misses 1087 1195 +108 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/git-fetcher/src/tarball_fetcher/tests.rs`:
- Around line 325-326: The two assertions using
assert!(row.files.contains_key("package.json")) and
assert!(row.files.contains_key("index.js")) lack failure context; update these
to include diagnostic messages that print which keys are present when the
assertion fails (e.g., include row.files.keys() or row.files itself in the
message) so failures show the actual contents of row.files for easier CI triage;
locate the checks in the test in tarball_fetcher/tests.rs and replace the bare
assert! calls with assert! that include a formatted message referencing
row.files or row.files.keys().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 8d921631-a5c9-48dd-80b6-6dcec788ff23
📒 Files selected for processing (8)
crates/git-fetcher/src/cas_io.rscrates/git-fetcher/src/fetcher.rscrates/git-fetcher/src/fetcher/tests.rscrates/git-fetcher/src/tarball_fetcher.rscrates/git-fetcher/src/tarball_fetcher/tests.rscrates/package-manager/src/create_virtual_store.rscrates/package-manager/src/install_package_by_snapshot.rscrates/package-manager/src/installability/tests.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run benchmark on ubuntu-latest
- GitHub Check: Run benchmark on ubuntu-latest
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Preserve existing method chains andpipe-traitchains; do not break them into intermediateletbindings unless there is a concrete justification such as a compilation failure, borrow checker rejection, meaningful performance improvement, or other technical necessity. Refactoring for style alone is not sufficient justification.
Choose owned vs. borrowed parameters to minimize copies; prefer borrowed types (&Pathover&PathBuf,&strover&String) when it does not force extra copies.
PreferArc::clone(&x)andRc::clone(&x)overx.clone()for reference-counted types to make the cost visible at the call site.
Do not use star imports inside module bodies. Writeuse super::{Foo, bar}instead ofuse super::*;for any glob whose target is a module you control. External-crate preludes (e.g.,use rayon::prelude::*;) and root-of-module re-exports (e.g.,pub use submodule::*;inlib.rs) are exceptions.
Follow Rust API Guidelines for naming, as documented in https://rust-lang.github.io/api-guidelines/naming.html.
Declare a newtype wrapper for any branded string type being ported from TypeScript pnpm. Do not collapse the brand into a plainStringor&str; give the type its own struct so misuse is a type error.
When porting branded string types where upstream TypeScript always validates before construction, validate in the Rust port too. Construct the wrapper only viaTryFrom<String>and/orFromStr; do not provide an infallible public constructor that takes an arbitrary string.
For branded string types where upstream TypeScript never validates (used purely for type-safety to prevent confusion between string slots), expose an infallibleFrom<String>andFrom<&str>constructor in the Rust wrapper.
When upstream TypeScript occasionally constructs a branded type without validation (via bareasassertion), add afrom_str_unchecked(or similarly named) constructor on the Rust side. Keep the validating constructor as well; `from_str_u...
Files:
crates/git-fetcher/src/fetcher/tests.rscrates/package-manager/src/installability/tests.rscrates/package-manager/src/install_package_by_snapshot.rscrates/package-manager/src/create_virtual_store.rscrates/git-fetcher/src/fetcher.rscrates/git-fetcher/src/tarball_fetcher.rscrates/git-fetcher/src/cas_io.rscrates/git-fetcher/src/tarball_fetcher/tests.rs
🧠 Learnings (2)
📚 Learning: 2026-05-01T10:01:33.766Z
Learnt from: zkochan
Repo: pnpm/pacquet PR: 349
File: crates/reporter/src/tests.rs:121-121
Timestamp: 2026-05-01T10:01:33.766Z
Learning: In Rust test code, follow the repo’s CODE_STYLE_GUIDE test-logging rule: add logging (e.g., `eprintln!`/`eprintln!(...)`) so that useful diagnostic values are printed when a test fails, unless the assertion is `assert_eq!` (where the differing values are already included). Concretely, if you use assertions like `assert!`, `assert_ne!`, etc., ensure the test logs the relevant actual/expected values (or context) before/around the assertion so failures can be diagnosed without rerunning.
Applied to files:
crates/git-fetcher/src/fetcher/tests.rscrates/package-manager/src/installability/tests.rscrates/git-fetcher/src/tarball_fetcher/tests.rs
📚 Learning: 2026-05-07T23:19:08.272Z
Learnt from: KSXGitHub
Repo: pnpm/pacquet PR: 401
File: tasks/integrated-benchmark/src/work_env.rs:343-344
Timestamp: 2026-05-07T23:19:08.272Z
Learning: When reviewing Rust code in pnpm/pacquet for deprecated API usage, do not automatically treat `serde_saphyr::to_string` as deprecated. In `serde-saphyr` v0.0.25, `serde_saphyr::to_string` has no `#[deprecated]` attribute (the `#[deprecated]` later in `serde-saphyr-0.0.25/src/lib.rs` applies to a different function). Only flag `serde_saphyr::to_string` as deprecated if the resolved dependency version’s source shows `#[deprecated]` on that specific function.
Applied to files:
crates/git-fetcher/src/fetcher/tests.rscrates/package-manager/src/installability/tests.rscrates/package-manager/src/install_package_by_snapshot.rscrates/package-manager/src/create_virtual_store.rscrates/git-fetcher/src/fetcher.rscrates/git-fetcher/src/tarball_fetcher.rscrates/git-fetcher/src/cas_io.rscrates/git-fetcher/src/tarball_fetcher/tests.rs
🔇 Additional comments (8)
crates/package-manager/src/installability/tests.rs (1)
56-61: LGTM!crates/package-manager/src/create_virtual_store.rs (1)
17-20: LGTM!Also applies to: 667-715
crates/git-fetcher/src/cas_io.rs (1)
25-32: LGTM!Also applies to: 121-167, 169-181
crates/git-fetcher/src/fetcher.rs (1)
60-74: LGTM!Also applies to: 172-191
crates/git-fetcher/src/fetcher/tests.rs (1)
97-98: LGTM!Also applies to: 144-145, 205-206
crates/package-manager/src/install_package_by_snapshot.rs (1)
13-16: LGTM!Also applies to: 169-176, 191-193, 209-212, 229-230
crates/git-fetcher/src/tarball_fetcher/tests.rs (1)
5-7: LGTM!Also applies to: 56-57, 105-107, 150-152, 208-210, 261-263, 275-324, 327-327
crates/git-fetcher/src/tarball_fetcher.rs (1)
77-83: LGTM!Also applies to: 156-177
Micro-Benchmark ResultsLinux |
…#454 review) CodeRabbit pointed out that the two `assert!(row.files.contains_key(...))` calls in `writes_index_row_when_writer_provided` fail without showing what keys were actually present, making CI triage slower. Collect the keys once and pass them through to the assertion message so a failure points at the discrepancy directly. Pure test-readability change; no production code touched.
Integrated-Benchmark Report (Linux)Scenario: Frozen Lockfile
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 2.66226748482,
"stddev": 0.14044020100462554,
"median": 2.6316269642199996,
"user": 2.6175296799999996,
"system": 3.6408581999999994,
"min": 2.54357902272,
"max": 2.9203348177199997,
"times": [
2.89861999572,
2.54357902272,
2.55694870472,
2.65694569472,
2.9203348177199997,
2.55030160272,
2.6885711947199997,
2.54411988672,
2.6513461557199998,
2.61190777272
]
},
{
"command": "pacquet@main",
"mean": 2.57459651332,
"stddev": 0.06838677331987975,
"median": 2.5671595637199998,
"user": 2.56309838,
"system": 3.625521,
"min": 2.47114686972,
"max": 2.69468806672,
"times": [
2.54209228972,
2.5243061937199998,
2.66715994572,
2.56768449572,
2.52357325772,
2.56865710172,
2.47114686972,
2.69468806672,
2.56663463172,
2.6200222807199998
]
},
{
"command": "pnpm",
"mean": 6.382924709019999,
"stddev": 0.12566620585257687,
"median": 6.38275426372,
"user": 9.546950980000002,
"system": 4.6046641,
"min": 6.19614376872,
"max": 6.56793153872,
"times": [
6.33575752472,
6.23269475972,
6.409010654719999,
6.39950356272,
6.47107280972,
6.56793153872,
6.19614376872,
6.36600496472,
6.29220023972,
6.55892726672
]
}
]
}Scenario: Frozen Lockfile (Hot Cache)
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 0.7092117118400001,
"stddev": 0.030683466981246226,
"median": 0.7004896066400002,
"user": 0.39636678000000003,
"system": 1.49606138,
"min": 0.6851064926400001,
"max": 0.7927312456400001,
"times": [
0.7927312456400001,
0.6998627356400001,
0.71281212264,
0.7059392606400001,
0.6851064926400001,
0.6980966016400001,
0.6889658946400001,
0.7011164776400001,
0.7123118326400001,
0.69517445464
]
},
{
"command": "pacquet@main",
"mean": 0.7699402479399999,
"stddev": 0.04107972452039447,
"median": 0.7553922926400001,
"user": 0.38176457999999996,
"system": 1.5161572799999998,
"min": 0.7246431016400001,
"max": 0.8603070556400001,
"times": [
0.8603070556400001,
0.7660648446400001,
0.7584395796400001,
0.7373242036400001,
0.7523450056400001,
0.7909169746400001,
0.7466808286400001,
0.7481119386400001,
0.7246431016400001,
0.8145689466400001
]
},
{
"command": "pnpm",
"mean": 2.75247403574,
"stddev": 0.06431999187280807,
"median": 2.7381536961400004,
"user": 3.3506833800000004,
"system": 2.30865878,
"min": 2.67596523064,
"max": 2.88511119564,
"times": [
2.76444152364,
2.88511119564,
2.70995197564,
2.8371002616400003,
2.74011589464,
2.67596523064,
2.69557179764,
2.73619149764,
2.7212418026400003,
2.75904917764
]
}
]
} |
…for warm git prefetch (#454 coverage) Closes the two coverage gaps identified during PR review: - **`snapshot_cache_key` unit tests for git arms.** Two new tests in `create_virtual_store::tests` build a `PackageMetadata` with `LockfileResolution::Git` and `LockfileResolution::Tarball { gitHosted: true }` respectively and assert that `snapshot_cache_key` returns `Some("<pkg_id>\tbuilt")` — the same `gitHostedStoreIndexKey(pkg_id, built=true)` shape both fetchers write at install time. Drift between the read and write sides would silently degrade every git-hosted re-install to the cold path; pinning the shape on both sides keeps them in lock-step. - **Richer `CafsFileInfo` round-trip assertions.** The existing `writes_index_row_when_writer_provided` round-trip checked only `algo`, `requires_build`, and key presence. Now it also asserts the per-file entry's `digest` is non-empty + ASCII-hex, `mode` rounds to `0o644` on a non-executable regular file, `size` matches the source bytes, and `checked_at` is `None` (matches `add_files_from_dir`'s convention — the first integrity-check pass stamps it). Drift in any of these would surface as a warm- prefetch cache miss instead of a clean fail, so explicit assertions catch the regression at unit-test time. No production code changes; pure coverage. Existing tests still green at 767 (765 + the 2 new `snapshot_cache_key` tests).
Pull in 28 commits from upstream main, including the `pacquet-npmrc` → `pacquet-config` rename (#420) plus features: - supportedArchitectures + --cpu/--os/--libc (#456) - frozen-lockfile (#442, #443, #447, #450) - git-fetcher (#436 / #446, #451, #454) - side-effects cache (#421 / #422, #423, #424) - real-hoist + global-virtual-store (#432 / #438 / #444, #449, #452) - patchedDependencies + allow-builds (#425, #427, #428) - engine/platform installability (#434 / #439) Conflicts resolved: - `crates/npmrc/` files migrated under the renamed `crates/config/` directory; `Npmrc` → `Config` everywhere except `NpmrcAuth` (which keeps the `.npmrc`-domain name). - `Config::current` reads the env-var DI generic `Api: EnvVar` for ${VAR}-substitution in `.npmrc`. Production turbofish in `cli_args.rs` is `Config::current::<RealApi, _, _, _, _>(...)`. - Two-phase `NpmrcAuth::apply_*` retained so default-registry creds key at the yaml-resolved registry URL. - New `Config::auth_headers` field plumbed through `install_package_by_snapshot`'s `DownloadTarballToStore`. - Tests under `crates/config/src/workspace_yaml/tests.rs` pick up the new ParseYaml unit test added on this branch.
Summary
Wires git-hosted installs into pacquet's existing
index.dbwarm-cache path. Before this PR every git install cold-pathed, even when the snapshot was already in the store from a previous install. Now the second install of an unchanged lockfile reuses the previous install's clone + checkout + prepare + packlist work and skips the fetcher entirely — same as registry-tarball installs do today.Three pieces
PackageFilesIndexrows.cas_io::import_into_casreturns the fullHashMap<String, CafsFileInfo>payload alongside the existingcas_pathsmap; the fetchers wrap that into aPackageFilesIndexand queue it on the sharedStoreIndexWriterat the cache key the dispatcher passed in.algo: "sha512",manifest: None,requires_buildreflectsshould_be_built.add_files_from_dir'sfile_mode_fromis duplicated locally so the cache row encodes the same POSIX-mode shape pnpm's tarball entries use.create_virtual_store::snapshot_cache_keyroutes git arms through the warm batch. ForTarball { gitHosted: true }andGitresolutions it now returnsSome(gitHostedStoreIndexKey(pkg_id, built=true))(matches upstream'spickStoreIndexKeyshape) instead of the previousNonecold-batch shortcut.built = truematches the dispatcher'signore_scripts: falsedefault; both sites will flip together when anignore-scriptsconfig knob lands.install_package_by_snapshot.rsbuilds the samegitHostedStoreIndexKey(package_id, built=true)the warm path reads and hands it toGitFetcher.files_index_file/GitHostedTarballFetcher.files_index_file, plus the sharedstore_index_writerclone.The warm prefetch's existing file-verification (
check_pkg_files_integrity) runs unchanged — if any CAS file referenced by the row has been pruned, the snapshot falls through to cold and the fetcher re-runs.Out of scope
${filesIndexFile}\traw+ final) for skipping the re-import when packlist == raw. Pure perf optimization on top of this PR. Tracked on Install git-hosted packages frompnpm-lock.yaml(frozen-lockfile) #436.npm-packlistsemantics.Test plan
cargo nextest run -p pacquet-git-fetcher— 42 tests, including a newwrites_index_row_when_writer_providedround-trip that spawns a realStoreIndexWriter, runs the tarball fetcher with it, awaits the writer's join handle, and re-opens the store to confirm the row landed atpkg_id\tbuiltwith the rightalgo,requires_build, and a non-emptyfilesmap.cargo nextest run -p pacquet-package-manager— 245 tests still green aftersnapshot_cache_keyis rewritten.just ready— 765 tests, 765 pass.RUSTDOCFLAGS=-D warnings cargo doc --no-deps --workspace --document-private-items— clean.taplo format --checkjust dylintWritten by an agent (Claude Code, claude-opus-4-7).
Summary by CodeRabbit