Skip to content
This repository was archived by the owner on May 14, 2026. It is now read-only.

perf(git-fetcher): skip CAS re-import when packlist matches input (#436 follow-up)#479

Merged
zkochan merged 2 commits into
mainfrom
feat/436-two-slot-cas
May 13, 2026
Merged

perf(git-fetcher): skip CAS re-import when packlist matches input (#436 follow-up)#479
zkochan merged 2 commits into
mainfrom
feat/436-two-slot-cas

Conversation

@zkochan

@zkochan zkochan commented May 13, 2026

Copy link
Copy Markdown
Member

Summary

Ports the post-packlist fast-path branch from upstream's gitHostedTarballFetcher.ts:88-100 so a successful prepare on a git-hosted tarball doesn't re-hash every file back into the CAS when it doesn't need to.

When resolution.path is None and packlist.len() == cas_paths.len():

  • !should_be_built → queue a PackageFilesIndex row at the final key (synthesized from cas_paths), hand the input map straight back. Upstream copies the \traw row to the prepared key; pacquet's tarball download doesn't write a raw row at this key today, so we synthesize from the CAS layout instead — the actual CAS blobs are already in place from the tarball download.
  • should_be_built && ignore_scripts → return input as-is without queueing a final-key row, so subsequent --ignore-scripts installs re-check the build gate (matches upstream's storeIndex.delete(rawFilesIndexFile); return { filesMap, ignoredBuild: true } branch).
  • Otherwise → slow path (import_into_cas), unchanged.

The synthesized row's per-file digest is reconstructed from the v11 CAS layout (files/XX/<rest>[-exec] → shard byte + file stem); size comes from fs::metadata; mode from the -exec suffix via the same is_executable rule the read side uses (cas_file_path_by_mode). Skips the full file-read + SHA-512 cost per entry.

A cas_io::synthesize_files_index helper does the heavy lifting and gets its own unit tests for digest round-trip and exec-bit recovery.

Test plan

  • cargo nextest run -p pacquet-git-fetcher — 72 tests pass (+8 new: 4 fast-path scenarios in tarball_fetcher/tests.rs, 4 helper tests in cas_io/tests.rs)
  • just ready — full workspace, 1026 tests pass
  • just dylint — perfectionist lints clean
  • taplo format --check — clean

New fast-path tests

  • fast_path_returns_input_cas_paths_when_no_build_needed — output is the input map verbatim when !should_be_built && path.is_none() && packlist matches.
  • fast_path_queues_synthesized_index_row — synthesized row queued at final key; an executable entry's (digest, mode) round-trips through cas_file_path_by_mode to the input path.
  • fast_path_ignore_scripts_returns_input_without_queueing_rowshould_be_built && ignore_scripts branch returns input but does not queue a row.
  • sub_path_never_takes_fast_path — sub-path always takes the slow path even on degenerate single-package monorepos.

New helper tests (cas_io)

  • cas_path_digest_round_trips_through_write_cas_file — anchored against the canonical write side.
  • cas_path_digest_rejects_malformed_paths — wrong shape / non-hex shard / empty stem → None.
  • synthesize_files_index_recovers_digest_size_and_exec_bit0o755 vs 0o644 synthesis matches the write side's round-trip rule.
  • synthesize_files_index_errors_on_malformed_cas_pathIo(InvalidData).

Written by an agent (Claude Code, claude-opus-4-7).

Summary by CodeRabbit

  • New Features
    • Added a CAS fast-path that synthesizes file indexes from existing CAS entries to skip re-imports when rebuilds aren't needed, preserves executable flags and file sizes, and refines fast-path behavior for subpaths, build gating, and ignore-scripts.
  • Tests
    • Added tests covering fast-path behavior, path-shape validation, executable/digest round-tripping, and error cases.

Review Change Stack

… follow-up)

Port upstream's [`gitHostedTarballFetcher.ts:88-100`](https://github.com/pnpm/pnpm/blob/94240bc046/fetching/tarball-fetcher/src/gitHostedTarballFetcher.ts#L88-L100)
fast path: when `resolution.path` is `None` and `packlist.len() ==
cas_paths.len()`, the materialized tree is byte-identical to the
input CAS files and re-importing every entry through hash-dedup is
wasted work. Two branches mirror upstream:

- `!should_be_built`: queue a `PackageFilesIndex` row at the final
  key with `requires_build: Some(false)`, then hand the input
  `cas_paths` straight to the dispatcher. Upstream copies the
  `\traw` row to the prepared key; pacquet's tarball download
  doesn't write a raw row at this key today, so the row is
  synthesized from `cas_paths` instead — the CAS files themselves
  are already in place.
- `should_be_built && ignore_scripts`: return input as-is *without*
  queueing a row, so subsequent installs re-check the build gate.

The synthesized row's per-file digest comes from the CAS path
itself (pnpm v11's `files/XX/<rest>[-exec]` layout makes this
exact); size comes from `fs::metadata`, mode from the `-exec`
suffix. No per-file `fs::read + sha512` — the whole point of the
optimization.
Copilot AI review requested due to automatic review settings May 13, 2026 19:16
@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b82d40f1-bf75-498d-94c9-18875ba1cb06

📥 Commits

Reviewing files that changed from the base of the PR and between fceea6a and bf6b6d3.

📒 Files selected for processing (2)
  • crates/git-fetcher/src/cas_io.rs
  • crates/git-fetcher/src/tarball_fetcher/tests.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/git-fetcher/src/cas_io.rs
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Lint and Test (ubuntu-latest)
  • GitHub Check: Lint and Test (windows-latest)
  • GitHub Check: Lint and Test (macos-latest)
  • GitHub Check: Code Coverage
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Run benchmark on ubuntu-latest

📝 Walkthrough

Walkthrough

Adds CAS path parsing helpers (cas_path_digest, synthesize_files_index) to reconstruct PackageFilesIndex from existing cas_paths without re-reading bytes. Integrates the fast path into GitHostedTarballFetcher to skip CAS re-import when file counts match and path is None, with special handling for ignore_scripts. Includes comprehensive test coverage for fast-path control flow, index queueing, sub-path, and script-ignore edge cases.

Changes

CAS Fast-Path Tarball Fetcher

Layer / File(s) Summary
CAS path parsing infrastructure
crates/git-fetcher/src/cas_io.rs (lines 187–259, 273–281, 327–428)
Implements cas_path_digest to validate and extract hex digests from v11 CAS paths, unconditional cas_path_is_executable helper for -exec suffix detection, and synthesize_files_index to rebuild PackageFilesIndex::files from cas_paths by parsing digest, deriving mode from executability, and fetching size via fs::metadata. Includes unit tests for round-trip consistency, malformed-path rejection, and correct file info recovery.
Tarball fetcher fast-path integration
crates/git-fetcher/src/tarball_fetcher.rs (lines 16–32, 155–204)
Documents why Pacquet synthesizes prepared store-index rows from cas_paths on the fast path rather than creating upstream-style two-slot entries. Refines fetcher control flow to take fast path when file counts match and path is None, synthesizing PackageFilesIndex and skipping CAS re-import; when ignore_scripts is true, returns input cas_paths without queuing index row to preserve --ignore-scripts idempotency.
Fast-path behavior test coverage
crates/git-fetcher/src/tarball_fetcher/tests.rs (lines 353–618)
Adds four tests: verifies fast path returns input cas_paths unchanged when no build is needed; confirms synthesized index row is queued with correct file keys and executable mode/digest round-tripping; validates sub-path always triggers slow path with relative file keys; ensures ignore_scripts returns input without queuing final-key index row.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • pnpm/pacquet#454: Warm prefetch work that computes and propagates files_index and writes PackageFilesIndex rows to index.db, directly complementary to this PR's synthesize_files_index functionality for warm installs.

Poem

🐰 With CAS paths parsed and digests extracted clean,
Fast paths shimmer where reimports have been.
No bytes re-read, just metadata's light—
warm installs leap, executability bright! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main performance optimization: skipping CAS re-import when the packlist matches input, directly reflecting the core change across all modified files.
Description check ✅ Passed The description comprehensively covers the summary, rationale, upstream reference, and detailed test plan. All required sections are present and well-documented.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/436-two-slot-cas

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/git-fetcher/src/cas_io.rs`:
- Around line 241-252: cas_path_digest currently accepts any hex shard plus hex
stem; tighten it to match v11 CAS digests exactly by requiring the stem (after
stripping "-exec") to be exactly 64 hex chars. In function cas_path_digest,
after computing `stem` and `shard`, replace the loose checks with: ensure
`shard.len() == 2` and every byte is an ASCII hex digit, and ensure
`!stem.is_empty()` is replaced by `stem.len() == 64` and every byte of `stem` is
an ASCII hex digit; return None otherwise so only full v11 CAS shapes are
accepted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a02d1403-ff96-4816-9385-e46105972a75

📥 Commits

Reviewing files that changed from the base of the PR and between bd99486 and fceea6a.

📒 Files selected for processing (3)
  • crates/git-fetcher/src/cas_io.rs
  • crates/git-fetcher/src/tarball_fetcher.rs
  • crates/git-fetcher/src/tarball_fetcher/tests.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Agent
  • GitHub Check: Lint and Test (ubuntu-latest)
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Lint and Test (macos-latest)
  • GitHub Check: Lint and Test (windows-latest)
  • GitHub Check: Dylint
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Code Coverage
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Preserve existing method chains and pipe-trait chains; do not break them into intermediate let bindings unless there is a concrete justification such as a compilation failure, borrow checker rejection, meaningful performance improvement, or other technical necessity. Refactoring for style alone is not sufficient justification.
Choose owned vs. borrowed parameters to minimize copies; prefer borrowed types (&Path over &PathBuf, &str over &String) when it does not force extra copies.
Prefer Arc::clone(&x) and Rc::clone(&x) over x.clone() for reference-counted types to make the cost visible at the call site.
Do not use star imports inside module bodies. Write use super::{Foo, bar} instead of use super::*; for any glob whose target is a module you control. External-crate preludes (e.g., use rayon::prelude::*;) and root-of-module re-exports (e.g., pub use submodule::*; in lib.rs) are exceptions.
Follow Rust API Guidelines for naming, as documented in https://rust-lang.github.io/api-guidelines/naming.html.
Declare a newtype wrapper for any branded string type being ported from TypeScript pnpm. Do not collapse the brand into a plain String or &str; give the type its own struct so misuse is a type error.
When porting branded string types where upstream TypeScript always validates before construction, validate in the Rust port too. Construct the wrapper only via TryFrom<String> and/or FromStr; do not provide an infallible public constructor that takes an arbitrary string.
For branded string types where upstream TypeScript never validates (used purely for type-safety to prevent confusion between string slots), expose an infallible From<String> and From<&str> constructor in the Rust wrapper.
When upstream TypeScript occasionally constructs a branded type without validation (via bare as assertion), add a from_str_unchecked (or similarly named) constructor on the Rust side. Keep the validating constructor as well; `from_str_u...

Files:

  • crates/git-fetcher/src/tarball_fetcher.rs
  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/git-fetcher/src/cas_io.rs
🧠 Learnings (2)
📚 Learning: 2026-05-07T23:19:08.272Z
Learnt from: KSXGitHub
Repo: pnpm/pacquet PR: 401
File: tasks/integrated-benchmark/src/work_env.rs:343-344
Timestamp: 2026-05-07T23:19:08.272Z
Learning: When reviewing Rust code in pnpm/pacquet for deprecated API usage, do not automatically treat `serde_saphyr::to_string` as deprecated. In `serde-saphyr` v0.0.25, `serde_saphyr::to_string` has no `#[deprecated]` attribute (the `#[deprecated]` later in `serde-saphyr-0.0.25/src/lib.rs` applies to a different function). Only flag `serde_saphyr::to_string` as deprecated if the resolved dependency version’s source shows `#[deprecated]` on that specific function.

Applied to files:

  • crates/git-fetcher/src/tarball_fetcher.rs
  • crates/git-fetcher/src/tarball_fetcher/tests.rs
  • crates/git-fetcher/src/cas_io.rs
📚 Learning: 2026-05-01T10:01:33.766Z
Learnt from: zkochan
Repo: pnpm/pacquet PR: 349
File: crates/reporter/src/tests.rs:121-121
Timestamp: 2026-05-01T10:01:33.766Z
Learning: In Rust test code, follow the repo’s CODE_STYLE_GUIDE test-logging rule: add logging (e.g., `eprintln!`/`eprintln!(...)`) so that useful diagnostic values are printed when a test fails, unless the assertion is `assert_eq!` (where the differing values are already included). Concretely, if you use assertions like `assert!`, `assert_ne!`, etc., ensure the test logs the relevant actual/expected values (or context) before/around the assertion so failures can be diagnosed without rerunning.

Applied to files:

  • crates/git-fetcher/src/tarball_fetcher/tests.rs
🔇 Additional comments (3)
crates/git-fetcher/src/cas_io.rs (1)

321-405: LGTM!

crates/git-fetcher/src/tarball_fetcher.rs (1)

16-25: LGTM!

Also applies to: 32-32, 155-205

crates/git-fetcher/src/tarball_fetcher/tests.rs (1)

353-604: LGTM!

Comment thread crates/git-fetcher/src/cas_io.rs
@codecov

codecov Bot commented May 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 99.08257% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 90.43%. Comparing base (f9012b4) to head (bf6b6d3).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
crates/git-fetcher/src/cas_io.rs 98.87% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #479      +/-   ##
==========================================
+ Coverage   90.09%   90.43%   +0.33%     
==========================================
  Files         119      121       +2     
  Lines       11098    11475     +377     
==========================================
+ Hits         9999    10377     +378     
+ Misses       1099     1098       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.00     15.8±0.54ms   274.1 KB/sec    1.00     15.8±0.42ms   274.7 KB/sec

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ports pnpm’s post-packlist fast path for git-hosted tarballs so that when the prepared tree is byte-identical to the already-imported CAS content, pacquet can skip the expensive import_into_cas re-hash step and return the original cas_paths map directly.

Changes:

  • Add a fast path in GitHostedTarballFetcher to skip re-import when path.is_none() and packlist.len() == cas_paths.len(), with behavior matching upstream for both “no build needed” and “build needed but scripts ignored” cases.
  • Introduce cas_io::synthesize_files_index to reconstruct PackageFilesIndex.files entries from v11 CAS paths (digest + exec-bit + size) without reading file contents, plus unit tests.
  • Add integration tests covering the new fast-path behaviors and expected store-index side effects.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
crates/git-fetcher/src/tarball_fetcher.rs Adds the fast-path branch and store-index row synthesis to avoid unnecessary CAS re-import/hashing.
crates/git-fetcher/src/cas_io.rs Adds digest reconstruction + PackageFilesIndex.files synthesis helper and unit tests.
crates/git-fetcher/src/tarball_fetcher/tests.rs Adds new integration tests for fast-path correctness and store-index behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/git-fetcher/src/cas_io.rs Outdated
Comment thread crates/git-fetcher/src/cas_io.rs Outdated
Comment thread crates/git-fetcher/src/tarball_fetcher/tests.rs Outdated
@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 2.802 ± 0.113 2.646 3.022 1.01 ± 0.05
pacquet@main 2.772 ± 0.096 2.677 2.935 1.00
pnpm 6.654 ± 0.131 6.443 6.885 2.40 ± 0.10
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 2.801676832860001,
      "stddev": 0.11288774132381,
      "median": 2.80118242206,
      "user": 2.62067732,
      "system": 3.8663815199999996,
      "min": 2.64586275706,
      "max": 3.02194963106,
      "times": [
        3.02194963106,
        2.7977955930600005,
        2.89702353506,
        2.8678588040600004,
        2.80456925106,
        2.83521547106,
        2.75787432606,
        2.6603891790600005,
        2.72822978106,
        2.64586275706
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 2.77168761006,
      "stddev": 0.09596449115288286,
      "median": 2.7386843580600004,
      "user": 2.6873426199999995,
      "system": 3.8586093200000002,
      "min": 2.6771420220600004,
      "max": 2.93457592906,
      "times": [
        2.9231252960600003,
        2.7303845910600004,
        2.6771420220600004,
        2.7469841250600004,
        2.93457592906,
        2.6843939160600003,
        2.6809640260600003,
        2.73013415706,
        2.7698056560600004,
        2.83936638206
      ]
    },
    {
      "command": "pnpm",
      "mean": 6.653522134560001,
      "stddev": 0.13140305520762072,
      "median": 6.645163571059999,
      "user": 9.77192312,
      "system": 4.72913062,
      "min": 6.44292599806,
      "max": 6.88526302106,
      "times": [
        6.57266734306,
        6.68072945706,
        6.74342709606,
        6.56427594706,
        6.77459984306,
        6.88526302106,
        6.54029943806,
        6.72143551706,
        6.44292599806,
        6.60959768506
      ]
    }
  ]
}

Scenario: Frozen Lockfile (Hot Cache)

Command Mean [ms] Min [ms] Max [ms] Relative
pacquet@HEAD 767.0 ± 72.8 720.2 969.2 1.00
pacquet@main 821.2 ± 51.4 747.0 924.3 1.07 ± 0.12
pnpm 2761.0 ± 92.7 2643.0 2888.1 3.60 ± 0.36
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 0.7669954418400002,
      "stddev": 0.07282519528299715,
      "median": 0.7435631543400001,
      "user": 0.39787974,
      "system": 1.6389264200000002,
      "min": 0.7202289538400001,
      "max": 0.9691887898400001,
      "times": [
        0.9691887898400001,
        0.74472600184,
        0.7719322528400001,
        0.74653322584,
        0.77007145584,
        0.74240030684,
        0.7381573128400001,
        0.7306787238400001,
        0.7202289538400001,
        0.7360373948400001
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 0.8212273055400001,
      "stddev": 0.05141538131493712,
      "median": 0.80666417184,
      "user": 0.40029144,
      "system": 1.6531599200000002,
      "min": 0.74697524384,
      "max": 0.9242620768400001,
      "times": [
        0.8547394668400001,
        0.74697524384,
        0.7980141568400001,
        0.8570356738400001,
        0.8108672148400001,
        0.7970373888400001,
        0.8519181538400001,
        0.9242620768400001,
        0.8024611288400001,
        0.7689625508400001
      ]
    },
    {
      "command": "pnpm",
      "mean": 2.7610234722399998,
      "stddev": 0.0926633739251967,
      "median": 2.75379020784,
      "user": 3.3494344399999996,
      "system": 2.31800092,
      "min": 2.64300308684,
      "max": 2.8880805918399997,
      "times": [
        2.64300308684,
        2.77795081284,
        2.8880805918399997,
        2.88511639684,
        2.72962960284,
        2.6889447618399998,
        2.69223128784,
        2.84186052384,
        2.6531149668399996,
        2.81030269084
      ]
    }
  ]
}

…ub-path slow-path

CodeRabbit and Copilot review on #479:

- `cas_path_digest` now requires the stem to be exactly 126 hex
  chars (the remainder of a sha512 hex digest after the 2-char
  shard). Previously accepted any non-empty hex stem, so paths like
  `.../files/ab/cd` would synthesize a 4-char "digest" that could
  poison `index.db`. Fail closed on anything that isn't a real v11
  CAS path.
- Replace the bogus "empty stem" malformed-path case (which never
  triggered — `Path::file_name()` of `/tmp/ab/` is `Some("ab")`)
  with cases that actually exercise the new length check: too
  short, too long, and right-length-with-a-non-hex-byte.
- Rebuild `sub_path_never_takes_fast_path` so `packlist.len() ==
  cas_paths.len()` (single sub-package tarball). Only the
  `path.is_none()` guard now keeps the fast path out — if a future
  refactor drops it, the test catches the misshaped
  `packages/sub/...` keys leaking into the dispatcher's output.
@zkochan zkochan merged commit a635673 into main May 13, 2026
16 checks passed
@zkochan zkochan deleted the feat/436-two-slot-cas branch May 13, 2026 19:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants