Skip to content
This repository was archived by the owner on May 14, 2026. It is now read-only.

feat(git-fetcher): install git-hosted tarballs via preparePackage + packlist (#436 §C)#451

Merged
zkochan merged 4 commits into
mainfrom
feat/436-git-hosted-tarball
May 13, 2026
Merged

feat(git-fetcher): install git-hosted tarballs via preparePackage + packlist (#436 §C)#451
zkochan merged 4 commits into
mainfrom
feat/436-git-hosted-tarball

Conversation

@zkochan

@zkochan zkochan commented May 13, 2026

Copy link
Copy Markdown
Member

Summary

Builds on §B+D (#446). Adds the second half of the git-hosted install path: TarballResolution { gitHosted: true } lockfile entries — the ones whose tarballs land at codeload.github.com / gitlab.com / bitbucket.org archive endpoints — now run through a new GitHostedTarballFetcher after the existing DownloadTarballToStore settles their bytes into CAS.

Mirrors pnpm's gitHostedTarballFetcher.ts:

  1. Materialize the CAS-resident files into a writable temp dir (fs::copy per file — hardlinking here would let prepare scripts mutate shared CAS entries).
  2. Run the existing prepare_package port that §B+D shipped — same <pm>-install + prepublish / prepack / publish dispatch, same GIT_DEP_PREPARE_NOT_ALLOWED gate.
  3. Run the existing minimal packlist over the prepared tree to drop source maps, test fixtures, etc. that ship in the raw archive but not in the published file set.
  4. Re-import the filtered file set back into CAS and hand the HashMap<String, PathBuf> to CreateVirtualDirBySnapshot.

Shared cas_io module

The CAS-import helpers (import_into_cas, is_file_executable, map_write_cas) plus a new materialize_into lift out of fetcher.rs into a shared cas_io module so both fetchers consume the same code path. No public-API change — helpers stay pub(crate).

cas_io also gains a join_checked(root, rel) helper that rejects absolute paths, .. components, root, and drive-prefix components before joining, so a malicious tarball entry can't escape target_dir / pkg_dir. Errors surface as GitFetcherError::Io(InvalidInput). Both materialize_into and import_into_cas route through it, taking rel directly (forward slashes are valid on every platform — Path::components() recognises both / and \ on Windows — so no per-file String allocation is needed).

Supporting changes

  • TarballResolution.path: Option<String> mirrors pnpm's TarballResolution.path so a lockfile pinning a git-hosted tarball from a monorepo (path: packages/sub) deserializes without tripping deny_unknown_fields. The fetcher threads this through to prepare_package's safe_join_path so prepare and packlist run inside the right sub-directory.
  • The install dispatcher hoists the allow_build closure and the ScriptsPrependNodePath conversion above the resolution match so both the Git arm and the new gitHosted: true post-pass borrow from the same named locals — no temp closures across .await.
  • InstallPackageBySnapshotError::GitFetch's doc now spells out that the variant covers failures from both fetchers (the git-CLI path and the git-hosted-tarball post-pass), since both share the same GitFetcherError taxonomy.

Out of scope (follow-ups for #436)

  • Warm prefetch for git-hosted slots — neither fetcher writes gitHostedStoreIndexKey rows to index.db yet, so a re-install re-runs clone + prepare + packlist. Correct, but slow on the second run.
  • Two-slot CAS layout (${filesIndexFile}\traw + final). Upstream optimizes "raw == prepared" by promoting the raw entry in place. Pacquet always re-imports today; hash-dedup means duplicate work but not duplicate storage.
  • §E — full integration-test matrix in plans/TEST_PORTING.md 563-572.
  • Real npm-packlist semantics (.npmignore, .gitignore layering, bundleDependencies walking).

Test plan

  • cargo nextest run -p pacquet-git-fetcher — 41 tests pass:
    - 31 from §B+D (preferred-pm sniffer, MVP packlist, preparePackage decision logic, git-CLI fetcher integration).
    - 5 new cas_io tests for join_checked (accepts normal segments, strips ., rejects absolute paths, rejects mid-path .., and a materialize_into regression that asserts nothing lands outside the target on a poisoned cas_paths entry).
    - 5 new GitHostedTarballFetcher tests (pass-through, files-field filter, GIT_DEP_PREPARE_NOT_ALLOWED, CAS-isolation regression, monorepo sub-path).
  • cargo nextest run -p pacquet-lockfileTarballResolution { path } deserialize and serialize round-trip.
  • cargo nextest run -p pacquet-package-manager — 148 tests still green after the dispatcher hoist.
  • just ready — 713 tests, 713 pass.
  • cargo doc --no-deps --workspace --document-private-items with RUSTDOCFLAGS=-D warnings — clean.
  • taplo format --check
  • just dylint

Written by an agent (Claude Code, claude-opus-4-7).

…acklist (#436 §C)

Builds on §B+D (#446). Adds the second half of the git-hosted install
path: `TarballResolution { gitHosted: true }` lockfile entries now run
through a new `GitHostedTarballFetcher` after the existing
`DownloadTarballToStore` settles their bytes into CAS. Mirrors pnpm's
[`gitHostedTarballFetcher.ts`](https://github.com/pnpm/pnpm/blob/94240bc046/fetching/tarball-fetcher/src/gitHostedTarballFetcher.ts):

  1. Materialize the CAS-resident files into a writable temp dir
     (`fs::copy` per file — using hardlinks here would let prepare
     scripts mutate shared CAS entries).
  2. Run the existing `prepare_package` port — same `<pm>-install` +
     `prepublish` / `prepack` / `publish` dispatch §B+D shipped,
     same `GIT_DEP_PREPARE_NOT_ALLOWED` gate.
  3. Run the existing minimal `packlist` over the prepared tree to
     drop source maps, test fixtures, etc. that ship in the raw
     archive but not in the published file set.
  4. Re-import the filtered file set back into CAS and hand the
     `HashMap<String, PathBuf>` to `CreateVirtualDirBySnapshot`.

The CAS-import helpers (`import_into_cas`, `is_file_executable`,
`map_write_cas`) and the new `materialize_into` lift out of
`fetcher.rs` into a shared `cas_io` module so both fetchers consume
the same code path. No public-API change there — the helpers stay
`pub(crate)`.

Supporting changes:

- `TarballResolution.path: Option<String>` mirrors pnpm's
  `TarballResolution.path` so a lockfile pinning a git-hosted tarball
  from a monorepo (`path: packages/sub`) deserializes without
  tripping `deny_unknown_fields`. The fetcher threads this through to
  `prepare_package`'s `safe_join_path` so the prepare and packlist
  run inside the right sub-directory.
- The install dispatcher hoists the `allow_build` closure and the
  `ScriptsPrependNodePath` conversion above the resolution match so
  both the `Git` arm and the new `gitHosted: true` post-pass borrow
  from the same locals — no temp closures across `.await`.

Out of scope (still open on #436):

- Warm prefetch for git-hosted slots — neither fetcher writes
  `gitHostedStoreIndexKey` rows to `index.db` yet, so a re-install
  re-runs the clone + prepare + packlist. Correct, but slow on the
  second run.
- Two-slot CAS layout (`${filesIndexFile}\traw` + final). Upstream
  optimizes "raw file set == prepared file set" by promoting the
  raw entry in place. Pacquet always re-imports today; hash-dedup
  means duplicate work but not duplicate storage.
- §E — full integration-test matrix in `plans/TEST_PORTING.md`.
  This PR ships four crate-level tests for `GitHostedTarballFetcher`
  (pass-through, files-field filter, `GIT_DEP_PREPARE_NOT_ALLOWED`,
  CAS-isolation regression).
Copilot AI review requested due to automatic review settings May 13, 2026 12:35
@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR adds GitHostedTarballFetcher to support post-download prepare and packlist filtering for git-hosted tarballs. It extracts shared CAS file materialization and import operations into a reusable cas_io module, refactors GitFetcher to use them, extends TarballResolution with an optional path field, and integrates the new fetcher into the package installation flow.

Changes

Shared CAS I/O and Git-Hosted Tarball Fetcher

Layer / File(s) Summary
Shared CAS I/O helpers module
crates/git-fetcher/src/cas_io.rs
New materialize_into copies CAS files to a writable directory with executable-bit re-application; import_into_cas reads prepared files, detects executability from metadata, writes into CAS, and returns a path map; is_file_executable and map_write_cas provide POSIX-aware execution detection and error wrapping.
Refactor GitFetcher to use shared helpers
crates/git-fetcher/src/fetcher.rs
GitFetcher now imports import_into_cas from crate::cas_io instead of maintaining a local implementation; prior local helpers are removed.
Extend TarballResolution with path field
crates/lockfile/src/resolution.rs, crates/lockfile/src/resolution/tests.rs
Add optional path: Option<String> field to TarballResolution (serialized as path, omitted when None) to represent sub-directories inside tarballs; update all existing tarball tests to include path: None, and add a new test validating deserialization of path: Some(...) when present.
GitHostedTarballFetcher implementation
crates/git-fetcher/src/tarball_fetcher.rs
New public struct GitHostedTarballFetcher<'a> with run() async method: materializes CAS files into a temp dir, calls prepare_package with build/script controls, reads/derives package.json, applies packlist filtering, re-imports the filtered tree into CAS, and returns GitFetchOutput with updated cas_paths and built flag.
GitHostedTarballFetcher test suite
crates/git-fetcher/src/tarball_fetcher/tests.rs
Five tokio async tests: (1) verifies no build without prepare script and preserves always-included files; (2) asserts files field filters materialized CAS entries; (3) rejects builds when disallowed with correct error; (4) monorepo path handling restricts packing to subdirectory; (5) regression test ensuring prepare/materialization does not mutate source CAS bytes.
Update crate exports and docs
crates/git-fetcher/src/lib.rs
Crate docs now describe both git snapshot and tarball fetchers; add internal cas_io and tarball_fetcher module declarations; re-export GitHostedTarballFetcher alongside existing public API.
Integrate fetcher into install flow
crates/package-manager/src/install_package_by_snapshot.rs
Compute allow_build_closure and scripts_prepend_node_path once up-front for reuse across both git and tarball paths; after initial tarball download, conditionally run GitHostedTarballFetcher when git_hosted == Some(true) to replace CAS paths; errors route via InstallPackageBySnapshotError::GitFetch.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • pnpm/pacquet#446: Directly related: this PR extracts the CAS import/materialization helpers into cas_io module (replacing local implementations) and reuses them for the new GitHostedTarballFetcher.
  • pnpm/pacquet#372: Overlaps in the install/tarball flow changes; both touch InstallPackageBySnapshot wiring and requester/progress threading.
  • pnpm/pacquet#440: Related to git-hosted tarball store/index semantics and backfill of git_hosted handling.

Suggested reviewers

  • anthonyshew

Poem

🐰 A tarball hops in from a branch up high,
We materialize, prepare, and watch bytes fly—
Shared CAS paths hum a tidy, clever tune,
Packlists snap in sync beneath the moon.
nibbles a carrot, hops away with a grin

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: implementing git-hosted tarball installation via preparePackage and packlist, which is the primary feature added across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description comprehensively covers the change scope, implementation approach, shared module design, supporting changes, and test results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/436-git-hosted-tarball

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/git-fetcher/src/tarball_fetcher/tests.rs (1)

61-64: ⚡ Quick win

Add one diagnostic log before non-assert_eq! membership assertions.

These assert! checks would be easier to debug with the actual key set printed once before asserting.

Proposed tweak
     let keys: Vec<&str> = received.cas_paths.keys().map(String::as_str).collect();
+    eprintln!("PACKLIST OUTPUT KEYS: {keys:?}");
     assert!(keys.contains(&"dist/index.js"));
     assert!(keys.contains(&"package.json"), "package.json always included");
     assert!(!keys.contains(&"src/index.ts"), "src excluded by files field");
     assert!(!keys.contains(&"test/foo.test.js"), "test excluded by files field");

Based on learnings: “In Rust test code… if you use assertions like assert!, assert_ne!, etc., ensure the test logs the relevant actual/expected values (or context) before/around the assertion.”

Also applies to: 109-113

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/git-fetcher/src/tarball_fetcher/tests.rs` around lines 61 - 64, Add a
diagnostic log that prints the actual key set before each non-assert_eq!
membership assertion in the test so failures show context; for example, before
the assertions that check received.cas_paths.contains_key("package.json"),
contains_key("index.js"), contains_key("README.md") (and the similar group
around lines 109-113), emit a single debug line like eprintln!("cas_keys =
{:?}", received.cas_paths.keys().collect::<Vec<_>>()) or use
dbg!(&received.cas_paths) so the test output shows the actual keys prior to the
assert! calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/git-fetcher/src/cas_io.rs`:
- Around line 42-43: The code currently joins untrusted CAS-relative paths
(variable rel) into target_dir (creating target via target_dir.join(...)) which
allows path traversal (.., absolute paths, or prefixes) to escape the intended
tree; fix by validating/sanitizing rel before joining: parse
Path::new(rel).components() and reject or error if any component is ParentDir,
RootDir, or Prefix, and ensure every component is a normal segment, then
construct the path by iteratively pushing those Normal components onto
target_dir (or use a safe join loop) before using target.parent(); apply the
same validation to the other occurrence around lines 77-78 so both joins only
use vetted path segments.

---

Nitpick comments:
In `@crates/git-fetcher/src/tarball_fetcher/tests.rs`:
- Around line 61-64: Add a diagnostic log that prints the actual key set before
each non-assert_eq! membership assertion in the test so failures show context;
for example, before the assertions that check
received.cas_paths.contains_key("package.json"), contains_key("index.js"),
contains_key("README.md") (and the similar group around lines 109-113), emit a
single debug line like eprintln!("cas_keys = {:?}",
received.cas_paths.keys().collect::<Vec<_>>()) or use dbg!(&received.cas_paths)
so the test output shows the actual keys prior to the assert! calls.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9698cdd4-e788-4f15-a960-c37269eacf12

📥 Commits

Reviewing files that changed from the base of the PR and between 43e5db9 and 33f04ed.

📒 Files selected for processing (8)
  • crates/git-fetcher/src/cas_io.rs
  • crates/git-fetcher/src/fetcher.rs
  • crates/git-fetcher/src/lib.rs
  • crates/git-fetcher/src/tarball_fetcher.rs
  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/lockfile/src/resolution.rs
  • crates/lockfile/src/resolution/tests.rs
  • crates/package-manager/src/install_package_by_snapshot.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Agent
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Code Coverage
  • GitHub Check: Lint and Test (macos-latest)
  • GitHub Check: Lint and Test (windows-latest)
  • GitHub Check: Lint and Test (ubuntu-latest)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Preserve existing method chains and pipe-trait chains; do not break them into intermediate let bindings unless there is a concrete justification such as a compilation failure, borrow checker rejection, meaningful performance improvement, or other technical necessity. Refactoring for style alone is not sufficient justification.
Choose owned vs. borrowed parameters to minimize copies; prefer borrowed types (&Path over &PathBuf, &str over &String) when it does not force extra copies.
Prefer Arc::clone(&x) and Rc::clone(&x) over x.clone() for reference-counted types to make the cost visible at the call site.
Do not use star imports inside module bodies. Write use super::{Foo, bar} instead of use super::*; for any glob whose target is a module you control. External-crate preludes (e.g., use rayon::prelude::*;) and root-of-module re-exports (e.g., pub use submodule::*; in lib.rs) are exceptions.
Follow Rust API Guidelines for naming, as documented in https://rust-lang.github.io/api-guidelines/naming.html.
Declare a newtype wrapper for any branded string type being ported from TypeScript pnpm. Do not collapse the brand into a plain String or &str; give the type its own struct so misuse is a type error.
When porting branded string types where upstream TypeScript always validates before construction, validate in the Rust port too. Construct the wrapper only via TryFrom<String> and/or FromStr; do not provide an infallible public constructor that takes an arbitrary string.
For branded string types where upstream TypeScript never validates (used purely for type-safety to prevent confusion between string slots), expose an infallible From<String> and From<&str> constructor in the Rust wrapper.
When upstream TypeScript occasionally constructs a branded type without validation (via bare as assertion), add a from_str_unchecked (or similarly named) constructor on the Rust side. Keep the validating constructor as well; `from_str_u...

Files:

  • crates/lockfile/src/resolution.rs
  • crates/git-fetcher/src/lib.rs
  • crates/git-fetcher/src/tarball_fetcher.rs
  • crates/git-fetcher/src/fetcher.rs
  • crates/package-manager/src/install_package_by_snapshot.rs
  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/lockfile/src/resolution/tests.rs
  • crates/git-fetcher/src/cas_io.rs
🧠 Learnings (2)
📚 Learning: 2026-05-07T23:19:08.272Z
Learnt from: KSXGitHub
Repo: pnpm/pacquet PR: 401
File: tasks/integrated-benchmark/src/work_env.rs:343-344
Timestamp: 2026-05-07T23:19:08.272Z
Learning: When reviewing Rust code in pnpm/pacquet for deprecated API usage, do not automatically treat `serde_saphyr::to_string` as deprecated. In `serde-saphyr` v0.0.25, `serde_saphyr::to_string` has no `#[deprecated]` attribute (the `#[deprecated]` later in `serde-saphyr-0.0.25/src/lib.rs` applies to a different function). Only flag `serde_saphyr::to_string` as deprecated if the resolved dependency version’s source shows `#[deprecated]` on that specific function.

Applied to files:

  • crates/lockfile/src/resolution.rs
  • crates/git-fetcher/src/lib.rs
  • crates/git-fetcher/src/tarball_fetcher.rs
  • crates/git-fetcher/src/fetcher.rs
  • crates/package-manager/src/install_package_by_snapshot.rs
  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/lockfile/src/resolution/tests.rs
  • crates/git-fetcher/src/cas_io.rs
📚 Learning: 2026-05-01T10:01:33.766Z
Learnt from: zkochan
Repo: pnpm/pacquet PR: 349
File: crates/reporter/src/tests.rs:121-121
Timestamp: 2026-05-01T10:01:33.766Z
Learning: In Rust test code, follow the repo’s CODE_STYLE_GUIDE test-logging rule: add logging (e.g., `eprintln!`/`eprintln!(...)`) so that useful diagnostic values are printed when a test fails, unless the assertion is `assert_eq!` (where the differing values are already included). Concretely, if you use assertions like `assert!`, `assert_ne!`, etc., ensure the test logs the relevant actual/expected values (or context) before/around the assertion so failures can be diagnosed without rerunning.

Applied to files:

  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/lockfile/src/resolution/tests.rs
🔇 Additional comments (6)
crates/lockfile/src/resolution.rs (1)

26-32: LGTM!

crates/lockfile/src/resolution/tests.rs (1)

25-25: LGTM!

Also applies to: 40-40, 58-58, 78-78, 91-91, 104-104, 117-117, 132-132, 144-144, 159-159, 171-189, 197-197

crates/git-fetcher/src/fetcher.rs (1)

16-16: LGTM!

Also applies to: 272-274

crates/git-fetcher/src/lib.rs (1)

1-2: LGTM!

Also applies to: 4-18, 20-20, 26-26, 33-33

crates/package-manager/src/install_package_by_snapshot.rs (1)

9-9: LGTM!

Also applies to: 107-124, 130-181

crates/git-fetcher/src/tarball_fetcher.rs (1)

94-94: LGTM!

Also applies to: 115-118, 142-143

Comment thread crates/git-fetcher/src/cas_io.rs Outdated
@codecov

codecov Bot commented May 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 77.43902% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.11%. Comparing base (1fa70a4) to head (c511d69).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...package-manager/src/install_package_by_snapshot.rs 16.66% 25 Missing ⚠️
crates/git-fetcher/src/cas_io.rs 89.21% 11 Missing ⚠️
crates/git-fetcher/src/tarball_fetcher.rs 96.87% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #451      +/-   ##
==========================================
- Coverage   87.45%   87.11%   -0.34%     
==========================================
  Files         100      103       +3     
  Lines        7865     8206     +341     
==========================================
+ Hits         6878     7149     +271     
- Misses        987     1057      +70     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.00     16.1±0.23ms   269.3 KB/sec    1.00     16.1±0.29ms   269.8 KB/sec

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends pacquet’s frozen-lockfile install pipeline to handle git-hosted tarball lockfile entries (TarballResolution { gitHosted: true }) by running a prepare+packlist pass after the tarball bytes are downloaded into the CAS, mirroring pnpm’s gitHostedTarballFetcher behavior. It also updates lockfile parsing to accept monorepo subdirectory packing via a new TarballResolution.path field.

Changes:

  • Route gitHosted: true tarball resolutions through a new GitHostedTarballFetcher that materializes CAS files to a temp dir, runs prepare_package, applies packlist filtering, and re-imports into CAS.
  • Add TarballResolution.path: Option<String> to lockfile types + tests, mirroring pnpm’s TarballResolution.path.
  • Factor shared CAS I/O helpers into git-fetcher::cas_io for reuse across git and git-hosted-tarball fetchers.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
crates/package-manager/src/install_package_by_snapshot.rs Hoists shared locals and adds the git-hosted tarball post-pass via GitHostedTarballFetcher.
crates/lockfile/src/resolution.rs Adds TarballResolution.path with serde support/documentation.
crates/lockfile/src/resolution/tests.rs Updates expectations for new path field and adds a deserialize test for tarball path.
crates/git-fetcher/src/lib.rs Exposes GitHostedTarballFetcher and documents the two git-hosted fetch paths.
crates/git-fetcher/src/fetcher.rs Moves CAS import helpers into cas_io for reuse.
crates/git-fetcher/src/cas_io.rs New shared CAS materialization/import helpers for git-hosted workflows.
crates/git-fetcher/src/tarball_fetcher.rs Implements the GitHostedTarballFetcher prepare+packlist+reimport pipeline.
crates/git-fetcher/src/tarball_fetcher/tests.rs Adds crate-level tests covering pass-through, files filtering, not-allowed build, and CAS isolation regression.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/package-manager/src/install_package_by_snapshot.rs
Comment thread crates/git-fetcher/src/tarball_fetcher.rs Outdated
Comment thread crates/git-fetcher/src/tarball_fetcher.rs Outdated
Comment thread crates/lockfile/src/resolution/tests.rs
…C review)

CodeRabbit flagged that `materialize_into` and `import_into_cas` both
joined a caller-supplied `rel` onto a trusted root without checking
the components. A malicious tarball entry like `../etc/passwd` would
have escaped the working tree on extraction. Pacquet's tarball
extractor is supposed to sanitise on the way in, but defense-in-depth
at this layer means a bug there can't turn into a write outside the
target directory.

Add a private `join_checked(root, rel)` helper in `cas_io` that:

- Rejects absolute paths (`/etc/passwd`, `C:\foo`).
- Rejects `..`, root, and drive-prefix components.
- Strips `.` components silently.
- Pushes only `Component::Normal` segments onto `root`.

Both `materialize_into` and `import_into_cas` route their joins
through it. Errors surface as `GitFetcherError::Io` with
`io::ErrorKind::InvalidInput` so the miette source chain remains
unchanged. A new test covers the rejection paths (absolute, parent-
dir at start, parent-dir mid-path) plus a regression test for
`materialize_into` that asserts no file lands outside the target on
a poisoned `cas_paths` entry.
@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 2.609 ± 0.175 2.434 3.029 1.04 ± 0.07
pacquet@main 2.517 ± 0.066 2.411 2.612 1.00
pnpm 6.127 ± 0.131 5.966 6.354 2.43 ± 0.08
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 2.6085972815200007,
      "stddev": 0.17489809876437543,
      "median": 2.56699586052,
      "user": 2.6067669599999994,
      "system": 3.5257157999999995,
      "min": 2.43359172902,
      "max": 3.02857954602,
      "times": [
        2.56486350302,
        2.48442612402,
        2.7188438940200004,
        2.67923168502,
        2.4942660560200003,
        2.6394791410200003,
        3.02857954602,
        2.43359172902,
        2.4735629190200004,
        2.56912821802
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 2.51702915162,
      "stddev": 0.06594608148206567,
      "median": 2.5122714220200004,
      "user": 2.5612052600000004,
      "system": 3.5102591000000003,
      "min": 2.41106919702,
      "max": 2.6115159030200004,
      "times": [
        2.43830381602,
        2.56671974502,
        2.48094128002,
        2.41106919702,
        2.59744798502,
        2.4952035280200002,
        2.52933931602,
        2.6115159030200004,
        2.4890188010200003,
        2.5507319450200003
      ]
    },
    {
      "command": "pnpm",
      "mean": 6.127188401119999,
      "stddev": 0.13051544563858503,
      "median": 6.117481376019999,
      "user": 9.10327846,
      "system": 4.368406499999999,
      "min": 5.96607421602,
      "max": 6.35371301902,
      "times": [
        6.195679225019999,
        6.020747613019999,
        6.24331147002,
        6.155468078019999,
        6.079494674019999,
        6.02849367202,
        5.98523688502,
        5.96607421602,
        6.24366515902,
        6.35371301902
      ]
    }
  ]
}

Scenario: Frozen Lockfile (Hot Cache)

Command Mean [ms] Min [ms] Max [ms] Relative
pacquet@HEAD 696.4 ± 41.5 667.0 812.0 1.00
pacquet@main 761.9 ± 69.7 693.8 903.2 1.09 ± 0.12
pnpm 2651.0 ± 137.6 2502.1 2946.9 3.81 ± 0.30
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 0.6963826587,
      "stddev": 0.04146780632427631,
      "median": 0.6852439346000001,
      "user": 0.37347067999999994,
      "system": 1.4609696799999998,
      "min": 0.6669509231,
      "max": 0.8119884221,
      "times": [
        0.8119884221,
        0.6837126691000001,
        0.6867752001,
        0.6669509231,
        0.6809052741,
        0.6942995791000001,
        0.6909613651000001,
        0.6912112621000001,
        0.6836897741,
        0.6733321181
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 0.7619125279000001,
      "stddev": 0.0696532220781103,
      "median": 0.7300466361,
      "user": 0.36175198000000003,
      "system": 1.48992328,
      "min": 0.6938067381,
      "max": 0.9031553001,
      "times": [
        0.8231760611000001,
        0.7374727681000001,
        0.8436988861,
        0.7288577901000001,
        0.7303197931000001,
        0.7002142861,
        0.9031553001,
        0.7297734791,
        0.6938067381,
        0.7286501771
      ]
    },
    {
      "command": "pnpm",
      "mean": 2.6510351446,
      "stddev": 0.1376221718298184,
      "median": 2.6043153886,
      "user": 3.2584477799999996,
      "system": 2.14143338,
      "min": 2.5020697780999996,
      "max": 2.9468712891,
      "times": [
        2.7888165100999998,
        2.5634968461,
        2.9468712891,
        2.5020697780999996,
        2.6110981441,
        2.5185219531,
        2.5975326330999997,
        2.7040738160999997,
        2.5719305681,
        2.7059399081
      ]
    }
  ]
}

… path/round-trip tests (#436 §C review round 2)

Addresses the second batch of CodeRabbit comments on #451:

- `install_package_by_snapshot.rs`: expand the doc comment on
  `InstallPackageBySnapshotError::GitFetch` to cover both fetchers
  (the git-CLI path for `type: git` and the git-hosted-tarball
  post-pass for `gitHosted: true`). The variant already wrapped
  `GitFetcherError`, which spans both — the doc was just out of date.
- `tarball_fetcher.rs`: drop the `wrap_prepare_error` helper. Its
  docstring claimed it stamped a "Failed to prepare git-hosted
  package" prefix, but the body was a one-line `Prepare(err)`
  passthrough. Inline the `map_err` at the call site and explain the
  upstream-vs-pacquet difference (we lean on the miette source chain
  + dispatcher log line for the prefix rather than mutating
  `err.message`) in a comment next to the call.
- `tarball_fetcher/tests.rs`: add `path_field_packs_only_subdirectory`
  pinning that a `path: Some("packages/sub")` resolution returns
  files relative to the sub-dir, never including sibling-package
  files or carrying the monorepo-root prefix.
- `lockfile/resolution/tests.rs`: add `serialize_tarball_resolution_with_path`
  so the new `TarballResolution.path` field has a serialization
  round-trip alongside its deserialization assertion. Mirrors the
  pattern `GitResolution.path` already follows.
Copilot AI review requested due to automatic review settings May 13, 2026 12:55

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
crates/git-fetcher/src/tarball_fetcher/tests.rs (2)

108-112: ⚡ Quick win

Add diagnostic logging before assertions.

The assertions check key presence/absence without logging the actual keys vector beforehand. If these assertions fail, the error won't show what keys were actually in received.cas_paths.

Based on learnings, test code should include logging for non-assert_eq! assertions so diagnostic values are printed when a test fails.

📋 Proposed fix to add diagnostic logging
     let keys: Vec<&str> = received.cas_paths.keys().map(String::as_str).collect();
+    eprintln!("keys = {keys:?}");
     assert!(keys.contains(&"dist/index.js"));
     assert!(keys.contains(&"package.json"), "package.json always included");
     assert!(!keys.contains(&"src/index.ts"), "src excluded by files field");
     assert!(!keys.contains(&"test/foo.test.js"), "test excluded by files field");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/git-fetcher/src/tarball_fetcher/tests.rs` around lines 108 - 112, Add
diagnostic logging of the actual keys before the assertions so failing tests
show the contents of received.cas_paths; specifically, compute the keys Vec as
currently done (let keys: Vec<&str> =
received.cas_paths.keys().map(String::as_str).collect();) and then log or print
the keys (e.g., using dbg!, println!, or eprintln!) right before the assert!
lines that reference keys (the assertions checking "dist/index.js",
"package.json", "src/index.ts", "test/foo.test.js") so test failures reveal the
actual keys present.

61-64: ⚡ Quick win

Add diagnostic logging before assertions.

The assertions use assert! without logging the actual values beforehand. If these assertions fail, the diagnostic output won't show what keys were actually present in received.cas_paths or the value of received.built.

Based on learnings, test code should include logging (e.g., eprintln!) for non-assert_eq! assertions so diagnostic values are printed when a test fails.

📋 Proposed fix to add diagnostic logging
+    eprintln!("received.built = {}", received.built);
+    eprintln!("received.cas_paths.keys() = {:?}", received.cas_paths.keys().collect::<Vec<_>>());
     assert!(!received.built, "no `prepare` script → not built");
     assert!(received.cas_paths.contains_key("package.json"));
     assert!(received.cas_paths.contains_key("index.js"));
     assert!(received.cas_paths.contains_key("README.md"));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/git-fetcher/src/tarball_fetcher/tests.rs` around lines 61 - 64, Add
diagnostic printing before the assertions in the test to aid failures: print the
value of received.built and the keys present in received.cas_paths (e.g., using
eprintln! with received.built and received.cas_paths.keys().collect::<Vec<_>>()
or similar) immediately before the assert! calls that check received.built and
the presence of "package.json", "index.js", and "README.md" so test failure
output shows the actual values; update the assertions in tests.rs around the
checks that reference received and received.cas_paths accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/git-fetcher/src/tarball_fetcher/tests.rs`:
- Around line 108-112: Add diagnostic logging of the actual keys before the
assertions so failing tests show the contents of received.cas_paths;
specifically, compute the keys Vec as currently done (let keys: Vec<&str> =
received.cas_paths.keys().map(String::as_str).collect();) and then log or print
the keys (e.g., using dbg!, println!, or eprintln!) right before the assert!
lines that reference keys (the assertions checking "dist/index.js",
"package.json", "src/index.ts", "test/foo.test.js") so test failures reveal the
actual keys present.
- Around line 61-64: Add diagnostic printing before the assertions in the test
to aid failures: print the value of received.built and the keys present in
received.cas_paths (e.g., using eprintln! with received.built and
received.cas_paths.keys().collect::<Vec<_>>() or similar) immediately before the
assert! calls that check received.built and the presence of "package.json",
"index.js", and "README.md" so test failure output shows the actual values;
update the assertions in tests.rs around the checks that reference received and
received.cas_paths accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 82d0d226-84ac-4ef2-b5af-e0371b4a9d03

📥 Commits

Reviewing files that changed from the base of the PR and between 33f04ed and ef34dc2.

📒 Files selected for processing (5)
  • crates/git-fetcher/src/cas_io.rs
  • crates/git-fetcher/src/tarball_fetcher.rs
  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/lockfile/src/resolution/tests.rs
  • crates/package-manager/src/install_package_by_snapshot.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/git-fetcher/src/cas_io.rs
  • crates/package-manager/src/install_package_by_snapshot.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Code Coverage
  • GitHub Check: Lint and Test (macos-latest)
  • GitHub Check: Lint and Test (windows-latest)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Preserve existing method chains and pipe-trait chains; do not break them into intermediate let bindings unless there is a concrete justification such as a compilation failure, borrow checker rejection, meaningful performance improvement, or other technical necessity. Refactoring for style alone is not sufficient justification.
Choose owned vs. borrowed parameters to minimize copies; prefer borrowed types (&Path over &PathBuf, &str over &String) when it does not force extra copies.
Prefer Arc::clone(&x) and Rc::clone(&x) over x.clone() for reference-counted types to make the cost visible at the call site.
Do not use star imports inside module bodies. Write use super::{Foo, bar} instead of use super::*; for any glob whose target is a module you control. External-crate preludes (e.g., use rayon::prelude::*;) and root-of-module re-exports (e.g., pub use submodule::*; in lib.rs) are exceptions.
Follow Rust API Guidelines for naming, as documented in https://rust-lang.github.io/api-guidelines/naming.html.
Declare a newtype wrapper for any branded string type being ported from TypeScript pnpm. Do not collapse the brand into a plain String or &str; give the type its own struct so misuse is a type error.
When porting branded string types where upstream TypeScript always validates before construction, validate in the Rust port too. Construct the wrapper only via TryFrom<String> and/or FromStr; do not provide an infallible public constructor that takes an arbitrary string.
For branded string types where upstream TypeScript never validates (used purely for type-safety to prevent confusion between string slots), expose an infallible From<String> and From<&str> constructor in the Rust wrapper.
When upstream TypeScript occasionally constructs a branded type without validation (via bare as assertion), add a from_str_unchecked (or similarly named) constructor on the Rust side. Keep the validating constructor as well; `from_str_u...

Files:

  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/git-fetcher/src/tarball_fetcher.rs
  • crates/lockfile/src/resolution/tests.rs
🧠 Learnings (2)
📚 Learning: 2026-05-01T10:01:33.766Z
Learnt from: zkochan
Repo: pnpm/pacquet PR: 349
File: crates/reporter/src/tests.rs:121-121
Timestamp: 2026-05-01T10:01:33.766Z
Learning: In Rust test code, follow the repo’s CODE_STYLE_GUIDE test-logging rule: add logging (e.g., `eprintln!`/`eprintln!(...)`) so that useful diagnostic values are printed when a test fails, unless the assertion is `assert_eq!` (where the differing values are already included). Concretely, if you use assertions like `assert!`, `assert_ne!`, etc., ensure the test logs the relevant actual/expected values (or context) before/around the assertion so failures can be diagnosed without rerunning.

Applied to files:

  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/lockfile/src/resolution/tests.rs
📚 Learning: 2026-05-07T23:19:08.272Z
Learnt from: KSXGitHub
Repo: pnpm/pacquet PR: 401
File: tasks/integrated-benchmark/src/work_env.rs:343-344
Timestamp: 2026-05-07T23:19:08.272Z
Learning: When reviewing Rust code in pnpm/pacquet for deprecated API usage, do not automatically treat `serde_saphyr::to_string` as deprecated. In `serde-saphyr` v0.0.25, `serde_saphyr::to_string` has no `#[deprecated]` attribute (the `#[deprecated]` later in `serde-saphyr-0.0.25/src/lib.rs` applies to a different function). Only flag `serde_saphyr::to_string` as deprecated if the resolved dependency version’s source shows `#[deprecated]` on that specific function.

Applied to files:

  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/git-fetcher/src/tarball_fetcher.rs
  • crates/lockfile/src/resolution/tests.rs
🔇 Additional comments (2)
crates/lockfile/src/resolution/tests.rs (1)

171-208: LGTM!

crates/git-fetcher/src/tarball_fetcher.rs (1)

1-155: LGTM!

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

crates/git-fetcher/src/cas_io.rs:119

  • Same as above: rel.replace('/', std::path::MAIN_SEPARATOR_STR) creates an allocation per file. Consider passing rel directly into join_checked to avoid extra work in the per-file CAS import loop.
    let mut out = HashMap::with_capacity(files.len());
    for rel in files {
        let source = join_checked(pkg_dir, &rel.replace('/', std::path::MAIN_SEPARATOR_STR))?;
        let bytes = fs::read(&source).map_err(GitFetcherError::Io)?;

Comment thread crates/git-fetcher/src/cas_io.rs Outdated
 §C review round 3)

`materialize_into` and `import_into_cas` both ran
`rel.replace('/', std::path::MAIN_SEPARATOR_STR)` before handing the
path to `join_checked`, allocating a fresh `String` per file. CodeRabbit
pointed out that `Path::components()` (which `join_checked` already
uses) recognises both `/` and `\` as separators on Windows, so the
replace is unnecessary on every platform.

For a 1352-package install that's 1352+ avoided allocations per fetcher
pass. Pass `rel` straight to `join_checked` instead, with a short
comment at each call site explaining why.
@zkochan zkochan merged commit 2fcea40 into main May 13, 2026
14 of 16 checks passed
@zkochan zkochan deleted the feat/436-git-hosted-tarball branch May 13, 2026 13:27
zkochan added a commit that referenced this pull request May 13, 2026
Origin/main picked up #451 (`feat(git-fetcher): install git-hosted
tarballs via preparePackage + packlist`) which adds a required `path:
Option<String>` field to `TarballResolution`. The
`synthetic_metadata` helper in `installability/tests.rs` builds the
struct literally and so needs the new field. Set it to `None` — the
installability check doesn't look at `path`, so an unset value
preserves the test's existing assertion subject (engines / cpu / os /
libc filtering).

---
Written by an agent (Claude Code, claude-opus-4-7).
zkochan added a commit that referenced this pull request May 13, 2026
Origin/main picked up #451 (`feat(git-fetcher): install git-hosted
tarballs via preparePackage + packlist`) which adds a required `path:
Option<String>` field to `TarballResolution`. The
`synthetic_metadata` helper in `installability/tests.rs` builds the
struct literally and so needs the new field. Set it to `None` — the
installability check doesn't look at `path`, so an unset value
preserves the test's existing assertion subject (engines / cpu / os /
libc filtering).

---
Written by an agent (Claude Code, claude-opus-4-7).
zkochan added a commit that referenced this pull request May 13, 2026
… fixture

`crates/package-manager/src/installability/tests.rs` initialises a
`TarballResolution` without the `path` field that #451 added (git-
hosted tarball sub-directory selection). The omission landed when
#439 (installability check, #434 slice 1) merged just before #451 —
neither PR touched the other's file, so the bug shipped on
`origin/main`.

CI was failing with:

```
error[E0063]: missing field `path` in initializer of `TarballResolution`
  --> crates/package-manager/src/installability/tests.rs:56:49
```

Fix is one line: `path: None`. The installability check ignores the
resolution shape entirely (see the comment a few lines above the
init site), so `None` is correct.
zkochan added a commit that referenced this pull request May 13, 2026
Origin/main grew a `path: Option<String>` field on
`TarballResolution` (PR #451, `feat(git-fetcher): install
git-hosted tarballs via preparePackage + packlist`). That PR
landed after #439 (`feat(package-is-installable): platform +
engine check`) and missed updating the `TarballResolution`
literal in `crates/package-manager/src/installability/tests.rs`,
so any PR opened against current main inherits a build break in
the `package-is-installable` test build (#445 hit the same
failure). My own `peer_dependency_in_lockfile_surfaces_unsupported`
test in `crates/real-hoist/src/tests.rs` had the same gap.

Add `path: None` to both literals so the test builds compile
against the post-#451 `TarballResolution` shape.

Includes the rebase of feat/438-slice-3b onto current origin/main.
zkochan added a commit that referenced this pull request May 13, 2026
… fixture (#455)

Landrace between #439 (installability tests fixture) and #451 (added
`path: Option<String>` to `TarballResolution`). Both merged green
against their own bases but main's tip won't compile:

    error[E0063]: missing field `path` in initializer of `TarballResolution`
      --> crates/package-manager/src/installability/tests.rs:56:49

Add the missing field with `path: None` so the struct literal matches
the current `TarballResolution` shape.
zkochan added a commit that referenced this pull request May 13, 2026
Origin/main grew a `path: Option<String>` field on
`TarballResolution` (PR #451, `feat(git-fetcher): install
git-hosted tarballs via preparePackage + packlist`). That PR
landed after #439 (`feat(package-is-installable): platform +
engine check`) and missed updating the `TarballResolution`
literal in `crates/package-manager/src/installability/tests.rs`,
so any PR opened against current main inherits a build break in
the `package-is-installable` test build (#445 hit the same
failure). My own `peer_dependency_in_lockfile_surfaces_unsupported`
test in `crates/real-hoist/src/tests.rs` had the same gap.

Add `path: None` to both literals so the test builds compile
against the post-#451 `TarballResolution` shape.

Includes the rebase of feat/438-slice-3b onto current origin/main.
KSXGitHub pushed a commit that referenced this pull request May 13, 2026
Pull in 28 commits from upstream main, including the
`pacquet-npmrc` → `pacquet-config` rename (#420) plus features:
- supportedArchitectures + --cpu/--os/--libc (#456)
- frozen-lockfile (#442, #443, #447, #450)
- git-fetcher (#436 / #446, #451, #454)
- side-effects cache (#421 / #422, #423, #424)
- real-hoist + global-virtual-store (#432 / #438 / #444, #449, #452)
- patchedDependencies + allow-builds (#425, #427, #428)
- engine/platform installability (#434 / #439)

Conflicts resolved:
- `crates/npmrc/` files migrated under the renamed
  `crates/config/` directory; `Npmrc` → `Config` everywhere
  except `NpmrcAuth` (which keeps the `.npmrc`-domain name).
- `Config::current` reads the env-var DI generic `Api: EnvVar`
  for ${VAR}-substitution in `.npmrc`. Production turbofish in
  `cli_args.rs` is `Config::current::<RealApi, _, _, _, _>(...)`.
- Two-phase `NpmrcAuth::apply_*` retained so default-registry
  creds key at the yaml-resolved registry URL.
- New `Config::auth_headers` field plumbed through
  `install_package_by_snapshot`'s `DownloadTarballToStore`.
- Tests under `crates/config/src/workspace_yaml/tests.rs`
  pick up the new ParseYaml unit test added on this branch.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants