Skip to content
This repository was archived by the owner on May 14, 2026. It is now read-only.

feat(tarball): binary fetcher for zip archives (#437 slice C)#472

Merged
zkochan merged 7 commits into
mainfrom
feat/437-slice-c
May 13, 2026
Merged

feat(tarball): binary fetcher for zip archives (#437 slice C)#472
zkochan merged 7 commits into
mainfrom
feat/437-slice-c

Conversation

@zkochan

@zkochan zkochan commented May 13, 2026

Copy link
Copy Markdown
Member

Summary

Slice C of #437 — the zip half of pnpm's binary fetcher. Slices A (lockfile types) and B (variant picker) have landed; this slice adds the per-archive download and CAS-extraction pipeline. The install-dispatch site (slice D) and the --no-runtime flag (slice E) will follow.

Mirrors upstream's downloadAndUnpackZip / extractZipToTarget:

  • extract_zip_entries walks every entry. Path validation runs before the is_dir() early-skip so directory entries like ../evil/ still surface PATH_TRAVERSAL. The canonical cas_paths / pkg_files_idx key is built from enclosed_name() rebuilt as Normal components joined with /. segments collapse, separators are cross-platform consistent, and the ignore filter sees the same string the map is keyed by.
  • DownloadZipArchiveToStore is the public sibling of DownloadTarballToStore — same store-index lookup, prefetch cache reuse, store-index writer queue. The retry policy, network-permit shape, and post-download semaphore gating exactly match the tarball path.
  • ignore_file_pattern is Option<Arc<IgnoreEntryFilter>> on both DownloadTarballToStore and DownloadZipArchiveToStore, so the Slice D dispatcher can construct a filter per fetch from runtime config without pinning to 'static. Cloned per retry attempt; the inner trait object is shared.
  • run_with_mem_cache doc-blocks the "stable filter per URL" invariant: the cache keys solely on package_url (matching pnpm's tarballCache), and callers keep the (URL, filter) relation functional. Upstream's archiveFilters is keyed by pkg.name, and tarball URLs encode (name, version, integrity), so this naturally holds.
  • New TarballError::ReadZipEntries(std::io::Error) for zip per-entry I/O failures (try_reserve / read_to_end); maps to ERR_PACQUET_ZIP in the retry classifier, distinct from the tar-specific ERR_PACQUET_TARBALL_TAR. Zip unix_mode() is masked to 0o777 before write so CafsFileInfo.mode stays permission-only (matches store_dir::add_files_from_dir::file_mode_from). TarballError::TarballTooLarge display generalized from "Tarball at …" to "Archive at …" so the zip pipeline's OOM path doesn't surface a misleading message.
  • Adds zip = "5", default-features = false, features = ["deflate"] to the workspace; reused in pacquet-tarball.

Test plan

7 new unit tests (6 zip-extractor + 1 tarball ignore-filter). Each rejection / normalization guard was verified by temporarily disabling it and re-running the test (per the practice in plans/TEST_PORTING.md):

  • extract_zip_strips_prefix_from_entry_paths — prefix strip
  • extract_zip_applies_ignore_filter_on_stripped_path — filter sees post-strip paths
  • extract_zip_rejects_parent_dir_componentPATH_TRAVERSAL on file entry with ..
  • extract_zip_rejects_directory_entry_with_parent_componentPATH_TRAVERSAL on directory entry with .. (validation must run before is_dir() skip)
  • extract_zip_normalizes_dot_segments_in_entry_paths. segments collapse in the canonical key
  • extract_zip_uses_entry_path_when_no_prefixarchive_prefix: None keeps entry names verbatim
  • extract_tarball_applies_ignore_filter_dropping_entries_from_both_maps — tarball filter drops entries from both cas_paths and pkg_files_idx.files
  • All 42 pacquet-tarball tests pass
  • just ready green
  • taplo format --check, just dylint, cargo doc -D warnings green

Written by an agent (Claude Code, claude-opus-4-7).

Copilot AI review requested due to automatic review settings May 13, 2026 17:22
@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds per-entry ignore filtering to archive extraction, extends TarballError for zip and path-traversal failures, implements zip fetch-and-extract pipeline with optional prefix-stripping and ignore-filtering, threads the new API through retry plumbing and callers, updates tests, and adds the zip dependency.

Changes

Archive Filtering and Zip Support

Layer / File(s) Summary
Dependency Setup
Cargo.toml, crates/tarball/Cargo.toml
Add zip crate to workspace and tarball crate dependencies with deflate feature.
Error Variants & Callback Type
crates/tarball/src/lib.rs
Introduce IgnoreEntryFilter and extend TarballError with PathTraversal, ReadZipArchive, and ReadZipEntries.
Tarball Entry Filtering & Plumbing
crates/tarball/src/lib.rs
extract_tarball_entries accepts ignore_file_pattern; add ignore_file_pattern to DownloadTarballToStore, thread through fetch_and_extract_*, retry plumbing, and update retry code mapping.
Zip Extraction Pipeline
crates/tarball/src/lib.rs
Add extract_zip_entries, fetch_and_extract_zip_* retry pipeline, and DownloadZipArchiveToStore with archive_prefix and ignore_file_pattern.
Caller Updates
crates/package-manager/src/install_package_by_snapshot.rs, crates/package-manager/src/install_package_from_registry.rs, tasks/micro-benchmark/src/main.rs
Pass explicit ignore_file_pattern: None when constructing DownloadTarballToStore.
Test Updates & Zip Tests
crates/tarball/src/tests.rs
Update imports and DownloadTarballToStore usages to include ignore_file_pattern: None; update tar extraction and retry call sites; add build_zip helper and zip extraction tests for prefix stripping, ignore-filtering, and path-traversal rejection.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • pnpm/pacquet#375: Related work on tarball -> RequestRetryError mapping and retry logging that ties into the new zip/path-traversal codes.

Suggested reviewers

  • anthonyshew
  • KSXGitHub

Poem

🐰 I nibble entries, one by one,
Strip the prefix, hide from sun,
Reject the .. that tries to creep,
Zip and tar I safely keep,
CAS indexed, snug in heap.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the primary change: implementing the zip archive binary fetcher functionality as slice C of issue #437.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering all major aspects of the change with clear sections including Summary, Linked issue, Test plan, and implementation details.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/437-slice-c

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the zip-archive half of pnpm’s “binary fetcher” in pacquet-tarball, adding an in-memory download + integrity check + per-entry CAS extraction pipeline, and threads an optional per-entry ignore filter through both tarball and zip extraction to support Node runtime “extras” stripping.

Changes:

  • Add zip extraction (extract_zip_entries) plus DownloadZipArchiveToStore with retry + progress/log emission parity to the tarball flow.
  • Thread ignore_file_pattern through DownloadTarballToStore / extract_tarball_entries to support per-archive entry exclusion.
  • Add zip as a workspace dependency and wire it into pacquet-tarball, updating call sites/tests/benchmarks for the new ignore_file_pattern field.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tasks/micro-benchmark/src/main.rs Updates benchmark harness to populate the new ignore_file_pattern field on tarball downloads.
crates/tarball/src/tests.rs Adjusts tarball tests for new extraction signature and adds zip-extractor unit tests.
crates/tarball/src/lib.rs Core implementation: zip fetch/extract pipeline, new error variants, and ignore-filter plumbing for tar + zip.
crates/tarball/Cargo.toml Adds zip dependency usage to pacquet-tarball.
crates/package-manager/src/install_package_from_registry.rs Updates tarball download call site to pass ignore_file_pattern: None.
crates/package-manager/src/install_package_by_snapshot.rs Updates tarball download call site to pass ignore_file_pattern: None.
Cargo.toml Adds zip = "5" to workspace dependencies (deflate-only, no default features).
Cargo.lock Locks new dependency graph introduced by zip.
Comments suppressed due to low confidence (2)

crates/tarball/src/tests.rs:2047

  • This dbg!(&cas_paths); debug print will spam test output. Please remove it to keep the test suite quiet.
    dbg!(&cas_paths);

crates/tarball/src/tests.rs:2100

  • This dbg!(&cas_paths); debug print will spam test output. Please remove it to keep the test suite quiet.
    dbg!(&cas_paths);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/tarball/src/lib.rs Outdated
Comment thread crates/tarball/src/lib.rs Outdated
Comment thread crates/tarball/src/lib.rs Outdated
Comment thread crates/tarball/src/tests.rs
@codecov

codecov Bot commented May 13, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 25.52553% with 248 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.55%. Comparing base (d8e7807) to head (667cf1d).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
crates/tarball/src/lib.rs 25.07% 248 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #472      +/-   ##
==========================================
- Coverage   89.61%   88.55%   -1.06%     
==========================================
  Files         119      121       +2     
  Lines       11098    12234    +1136     
==========================================
+ Hits         9945    10834     +889     
- Misses       1153     1400     +247     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown

Micro-Benchmark Results

Linux

group                          main                                   pr
-----                          ----                                   --
tarball/download_dependency    1.05     17.5±0.59ms   248.3 KB/sec    1.00     16.7±0.48ms   260.3 KB/sec

@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown

Integrated-Benchmark Report (Linux)

Scenario: Frozen Lockfile

Command Mean [s] Min [s] Max [s] Relative
pacquet@HEAD 2.719 ± 0.089 2.576 2.848 1.02 ± 0.04
pacquet@main 2.656 ± 0.055 2.563 2.755 1.00
pnpm 6.284 ± 0.054 6.218 6.357 2.37 ± 0.05
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 2.71904315346,
      "stddev": 0.08947966145406333,
      "median": 2.72895980536,
      "user": 2.56474022,
      "system": 3.7650946999999997,
      "min": 2.5764544333600004,
      "max": 2.84777590936,
      "times": [
        2.6282696863600004,
        2.73613882836,
        2.7286920303600004,
        2.70207382836,
        2.7650706103600005,
        2.72922758036,
        2.5764544333600004,
        2.84777590936,
        2.63048981336,
        2.8462388143600004
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 2.6555025690600003,
      "stddev": 0.055439845464651964,
      "median": 2.65779852136,
      "user": 2.6021929199999994,
      "system": 3.7749892000000003,
      "min": 2.56258926936,
      "max": 2.7545129143600002,
      "times": [
        2.62697032736,
        2.5863430053600003,
        2.56258926936,
        2.6659934193600003,
        2.6559302123600004,
        2.65365311436,
        2.7545129143600002,
        2.68005690436,
        2.65966683036,
        2.7093096933600003
      ]
    },
    {
      "command": "pnpm",
      "mean": 6.28395022156,
      "stddev": 0.053569903974164335,
      "median": 6.27955170736,
      "user": 9.34455502,
      "system": 4.5714847999999995,
      "min": 6.21835355036,
      "max": 6.35725753336,
      "times": [
        6.23967632336,
        6.29922734036,
        6.35725753336,
        6.21891111536,
        6.24473755936,
        6.21835355036,
        6.25987607436,
        6.32548814236,
        6.3427025313600005,
        6.33327204536
      ]
    }
  ]
}

Scenario: Frozen Lockfile (Hot Cache)

Command Mean [ms] Min [ms] Max [ms] Relative
pacquet@HEAD 757.6 ± 54.4 708.9 905.3 1.00
pacquet@main 764.2 ± 67.7 709.3 898.7 1.01 ± 0.12
pnpm 2688.8 ± 137.0 2548.1 3011.2 3.55 ± 0.31
BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 0.7575869656,
      "stddev": 0.054434556490960975,
      "median": 0.7466758209000001,
      "user": 0.39364111999999996,
      "system": 1.6048055599999995,
      "min": 0.7088926524000001,
      "max": 0.9052828304000001,
      "times": [
        0.9052828304000001,
        0.7544362784,
        0.7431580014000001,
        0.7490591994000001,
        0.7470446844,
        0.7671136764,
        0.7309489894000001,
        0.7088926524000001,
        0.7463069574000001,
        0.7236263864000001
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 0.7642349823000002,
      "stddev": 0.06772233840693095,
      "median": 0.7358048909000001,
      "user": 0.37653141999999995,
      "system": 1.62431146,
      "min": 0.7093041234,
      "max": 0.8987162884000001,
      "times": [
        0.7576539494000001,
        0.7093041234,
        0.8987162884000001,
        0.7266937424000001,
        0.7394712524000001,
        0.7157820094,
        0.8790623254000001,
        0.7321385294,
        0.7254050334000001,
        0.7581225694000001
      ]
    },
    {
      "command": "pnpm",
      "mean": 2.6887758119,
      "stddev": 0.1369849955210745,
      "median": 2.6629462619,
      "user": 3.2528337200000004,
      "system": 2.2757651599999997,
      "min": 2.5481384534,
      "max": 3.0112050944,
      "times": [
        2.6981962094,
        2.7531606224000003,
        3.0112050944,
        2.6276963144,
        2.5794095204,
        2.5481384534,
        2.5749437494,
        2.7215333184,
        2.6119726174,
        2.7615022194
      ]
    }
  ]
}

zkochan added a commit that referenced this pull request May 13, 2026
Address Copilot review on #472:

- `extract_zip_entries` masks `entry.unix_mode().unwrap_or(0o644) & 0o777`
  before writing `CafsFileInfo.mode`, matching the permission-only
  convention enforced by `store_dir::add_files_from_dir::file_mode_from`.
  Without the mask, the high `st_mode` bits (e.g. `0o100000` for a
  regular file) would land in the `index.db` row and diverge from
  what pnpm writes for tarball entries / on-disk imports.
- Adds `TarballError::ReadZipEntries(std::io::Error)` for zip
  per-entry I/O failures (try_reserve / read_to_end), distinct from
  `ReadTarballEntries`. The retry-classification path now emits
  `ERR_PACQUET_ZIP` rather than the misleading
  `ERR_PACQUET_TARBALL_TAR` for zip-side failures.
- Changes `ignore_file_pattern` from `Option<&'static IgnoreEntryFilter>`
  to `Option<Arc<IgnoreEntryFilter>>` on both `DownloadTarballToStore`
  and `DownloadZipArchiveToStore`. The Slice D dispatcher can now
  construct one filter per fetch from runtime config (the per-package
  `archiveFilters` map upstream uses) without leaking the closure or
  pinning it to `'static`. Cloning the Arc per retry attempt is cheap;
  the inner trait object is shared.
Adds the zip-archive side of pnpm's binary fetcher. Mirrors
[`downloadAndUnpackZip` and `extractZipToTarget`](https://github.com/pnpm/pnpm/blob/94240bc046/fetching/binary-fetcher/src/index.ts):
walks zip entries, rejects absolute / `..` paths via
`zip::ZipFile::enclosed_name`, strips the archive's top-level
basename (`BinaryResolution::prefix`) so the ignore filter and CAS
keys see the same relative paths upstream's regex does, then
writes each file entry into the CAS through `StoreDir::write_cas_file`.

* `extract_zip_entries` — parallel to `extract_tarball_entries`,
  with `archive_prefix` + `ignore_file_pattern` plumbing
* `fetch_and_extract_zip_once` / `fetch_and_extract_zip_with_retry`
  — same network permit shape, retry policy, and post-download
  semaphore gating as the tarball path
* `DownloadZipArchiveToStore` — sibling of `DownloadTarballToStore`
  with the same store-index lookup, prefetch cache reuse, and
  store-index writer queue
* `ignore_file_pattern` threaded through `DownloadTarballToStore`
  and the tarball extractor (the tarball half of upstream's
  `archiveFilters` map)
* Adds `zip = "5", default-features = false, features = ["deflate"]`
  to the workspace; reused in `pacquet-tarball`

Includes 4 unit tests for the zip extractor: prefix-stripping,
ignore-filter applied to the stripped path, `..` rejection, and
the no-prefix passthrough. Each rejection guard was verified by
temporarily disabling it and re-running the test.

The install-dispatch site (slice D) and the per-package
`ignoreFilePattern` value (the Node-runtime extras filter) will
land in the follow-up PR.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/tarball/src/lib.rs (1)

1758-1759: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Key the cache/index by extraction shape, not just (integrity, package_id).

ignore_file_pattern and archive_prefix now change the persisted PackageFilesIndex, but both the lookup and write paths still use store_index_key(&integrity, package_id) only. That means a filtered fetch can be reused for an unfiltered one (or vice versa), so callers can get the wrong file set out of the store without touching the network. The same collision also exists for the tarball mem-cache keyed only by package_url. Consider threading an explicit extraction-variant key/fingerprint from the caller into both read and write paths, or bypassing reuse when non-default extraction is requested.

Also applies to: 1837-1839, 2153-2155, 2195-2197

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/tarball/src/lib.rs` around lines 1758 - 1759, The cache/index keys
(created via store_index_key(&package_integrity.to_string(), package_id) in the
read/write paths) must include the extraction “shape” fingerprint (e.g.,
ignore_file_pattern and archive_prefix) so filtered vs unfiltered
PackageFilesIndex entries do not collide; update the functions that call
store_index_key (and the tarball mem-cache keyed by package_url) to accept and
incorporate an extraction_variant or fingerprint parameter provided by the
caller (or compute a stable fingerprint from ignore_file_pattern +
archive_prefix) and use that in both lookup and write paths, or alternatively
short-circuit reuse when the extraction is non-default; ensure the same symbol
(extraction_variant/fingerprint) is threaded into the code paths referenced
around store_index_key, PackageFilesIndex write/read, and the tarball mem-cache
to keep keys consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/tarball/src/lib.rs`:
- Around line 699-706: The loop currently checks entry.is_dir() and continues
before validating the entry path, allowing malicious dir entries like "../evil/"
to bypass traversal checks; update the loop so you call entry.enclosed_name()
(or otherwise validate the path and return TarballError::PathTraversal with the
package URL and offending path) before skipping directory entries, and treat any
invalid or non-enclosed names as an error rather than silently continuing; apply
the same change to the other occurrence noted around the 713-719 range so every
entry (including directories) is validated prior to being skipped or processed.

---

Outside diff comments:
In `@crates/tarball/src/lib.rs`:
- Around line 1758-1759: The cache/index keys (created via
store_index_key(&package_integrity.to_string(), package_id) in the read/write
paths) must include the extraction “shape” fingerprint (e.g.,
ignore_file_pattern and archive_prefix) so filtered vs unfiltered
PackageFilesIndex entries do not collide; update the functions that call
store_index_key (and the tarball mem-cache keyed by package_url) to accept and
incorporate an extraction_variant or fingerprint parameter provided by the
caller (or compute a stable fingerprint from ignore_file_pattern +
archive_prefix) and use that in both lookup and write paths, or alternatively
short-circuit reuse when the extraction is non-default; ensure the same symbol
(extraction_variant/fingerprint) is threaded into the code paths referenced
around store_index_key, PackageFilesIndex write/read, and the tarball mem-cache
to keep keys consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 76c27e6b-6cfe-4d99-9f9e-a56056fb22eb

📥 Commits

Reviewing files that changed from the base of the PR and between 176c174 and eb854a6.

📒 Files selected for processing (1)
  • crates/tarball/src/lib.rs
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Preserve existing method chains and pipe-trait chains; do not break them into intermediate let bindings unless there is a concrete justification such as a compilation failure, borrow checker rejection, meaningful performance improvement, or other technical necessity. Refactoring for style alone is not sufficient justification.
Choose owned vs. borrowed parameters to minimize copies; prefer borrowed types (&Path over &PathBuf, &str over &String) when it does not force extra copies.
Prefer Arc::clone(&x) and Rc::clone(&x) over x.clone() for reference-counted types to make the cost visible at the call site.
Do not use star imports inside module bodies. Write use super::{Foo, bar} instead of use super::*; for any glob whose target is a module you control. External-crate preludes (e.g., use rayon::prelude::*;) and root-of-module re-exports (e.g., pub use submodule::*; in lib.rs) are exceptions.
Follow Rust API Guidelines for naming, as documented in https://rust-lang.github.io/api-guidelines/naming.html.
Declare a newtype wrapper for any branded string type being ported from TypeScript pnpm. Do not collapse the brand into a plain String or &str; give the type its own struct so misuse is a type error.
When porting branded string types where upstream TypeScript always validates before construction, validate in the Rust port too. Construct the wrapper only via TryFrom<String> and/or FromStr; do not provide an infallible public constructor that takes an arbitrary string.
For branded string types where upstream TypeScript never validates (used purely for type-safety to prevent confusion between string slots), expose an infallible From<String> and From<&str> constructor in the Rust wrapper.
When upstream TypeScript occasionally constructs a branded type without validation (via bare as assertion), add a from_str_unchecked (or similarly named) constructor on the Rust side. Keep the validating constructor as well; `from_str_u...

Files:

  • crates/tarball/src/lib.rs
🧠 Learnings (1)
📚 Learning: 2026-05-07T23:19:08.272Z
Learnt from: KSXGitHub
Repo: pnpm/pacquet PR: 401
File: tasks/integrated-benchmark/src/work_env.rs:343-344
Timestamp: 2026-05-07T23:19:08.272Z
Learning: When reviewing Rust code in pnpm/pacquet for deprecated API usage, do not automatically treat `serde_saphyr::to_string` as deprecated. In `serde-saphyr` v0.0.25, `serde_saphyr::to_string` has no `#[deprecated]` attribute (the `#[deprecated]` later in `serde-saphyr-0.0.25/src/lib.rs` applies to a different function). Only flag `serde_saphyr::to_string` as deprecated if the resolved dependency version’s source shows `#[deprecated]` on that specific function.

Applied to files:

  • crates/tarball/src/lib.rs

Comment thread crates/tarball/src/lib.rs Outdated
Address Copilot review on #472:

- `extract_zip_entries` masks `entry.unix_mode().unwrap_or(0o644) & 0o777`
  before writing `CafsFileInfo.mode`, matching the permission-only
  convention enforced by `store_dir::add_files_from_dir::file_mode_from`.
  Without the mask, the high `st_mode` bits (e.g. `0o100000` for a
  regular file) would land in the `index.db` row and diverge from
  what pnpm writes for tarball entries / on-disk imports.
- Adds `TarballError::ReadZipEntries(std::io::Error)` for zip
  per-entry I/O failures (try_reserve / read_to_end), distinct from
  `ReadTarballEntries`. The retry-classification path now emits
  `ERR_PACQUET_ZIP` rather than the misleading
  `ERR_PACQUET_TARBALL_TAR` for zip-side failures.
- Changes `ignore_file_pattern` from `Option<&'static IgnoreEntryFilter>`
  to `Option<Arc<IgnoreEntryFilter>>` on both `DownloadTarballToStore`
  and `DownloadZipArchiveToStore`. The Slice D dispatcher can now
  construct one filter per fetch from runtime config (the per-package
  `archiveFilters` map upstream uses) without leaking the closure or
  pinning it to `'static`. Cloning the Arc per retry attempt is cheap;
  the inner trait object is shared.
Copilot AI review requested due to automatic review settings May 13, 2026 18:52
@zkochan zkochan force-pushed the feat/437-slice-c branch from eb854a6 to 089c26c Compare May 13, 2026 18:52
`extract_zip_entries` ran the `is_dir()` early-continue *before*
`enclosed_name()` validation, so an archive carrying a directory
entry like `../evil/` was silently skipped instead of surfacing
`TarballError::PathTraversal`. Pacquet wouldn't have written that
directory either way (only file entries take the CAS write path),
but the contract is "reject any unsafe entry" — for tooling that
inspects the error code.

Move the path validation above the `is_dir()` check. Adds a unit
test `extract_zip_rejects_directory_entry_with_parent_component`
that catches the regression (verified by temporarily reverting the
guard order).

Caught by CodeRabbit on #472.
`[crate::BinaryResolution::prefix]` resolved against `pacquet_tarball`
where `BinaryResolution` doesn't live — it's `pacquet_lockfile::BinaryResolution`.
Switched both references to text + fully-qualified path so rustdoc
under `-D warnings` (the CI Doc job) stops failing.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Comment thread crates/tarball/src/lib.rs
Comment thread crates/tarball/src/lib.rs
Comment thread crates/tarball/src/lib.rs
Comment thread crates/tarball/src/lib.rs Outdated
zkochan added 2 commits May 13, 2026 21:22
…nostic OOM error

Address Copilot review on #472:

- `extract_zip_entries` builds the canonical `cas_paths` /
  `pkg_files_idx` key from `enclosed_name()` rather than the raw
  `entry.name()`. `enclosed_name()` collapses `.` segments and is
  already path-traversal-checked, so the map key, the ignore-filter
  input, and the basename strip all see the same string.
  Cross-platform: rebuild from `Normal` components joined with `/`
  so a Windows host can't smuggle `\` separators into the key.
- `TarballError::TarballTooLarge` display text changed from
  "Tarball at {url} ..." to "Archive at {url} ..." so the zip
  pipeline (which reuses `allocate_tarball_buffer`) doesn't surface
  a misleading "Tarball ..." message on OOM / oversize. Diagnostic
  code stays `pacquet_tarball::tarball_too_large` for variant
  identity.
- `run_with_mem_cache` doc-blocks the "stable filter per URL"
  invariant: the mem cache keys on `package_url` alone (matching
  pnpm's `tarballCache`), so callers must keep the (URL, filter)
  relation functional. Tarball URLs encode (name, version,
  integrity) and `archiveFilters` is keyed by `pkg.name` upstream,
  so this naturally holds; the Slice D dispatcher follows the same
  table.
- New unit tests:
  - `extract_tarball_applies_ignore_filter_dropping_entries_from_both_maps`
    covers the tarball-side filter path (the existing zip tests
    only exercised it through `extract_zip_entries`).
  - `extract_zip_normalizes_dot_segments_in_entry_paths` pins the
    `.`-segment collapse. Verified by temporarily falling back to
    `raw_name`.
`extract_tarball_entries` built the cleaned entry path through
`PathBuf::push` + `to_string_lossy()`, which uses the platform's
native separator. On Windows that produces keys like `bin\tool`,
diverging from:

- pnpm's string-based path layer (always `/`), so a shared
  `index.db` would have non-byte-identical keys depending on which
  tool wrote the row.
- `pacquet_store_dir::add_files_from_dir` (already uses
  `format!("{relative_dir}/{name}")`), the existing tarball-fetcher
  tests that look up `cas_paths["package.json"]` etc., and the
  zip-side path normalization landed earlier in this slice.
- Any `ignore_file_pattern` regex / hand-coded matcher, all of
  which expect `/` separators.

Replaced the `PathBuf::push` accumulator with a `Vec<String>` of
per-component `to_string_lossy()` results joined by `/`. The
validation loop (every component must be `Component::Normal`) is
unchanged, and on Unix the cleaned string is the same as before.

The Windows test failure that surfaced this was
`extract_tarball_applies_ignore_filter_dropping_entries_from_both_maps`
(added this slice) — the new test was the first to exercise a
multi-component tarball path on Windows.

Caught by Windows CI on #472.
Copilot AI review requested due to automatic review settings May 13, 2026 19:56

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.

Comment thread crates/tarball/src/lib.rs
Comment thread crates/tarball/src/lib.rs Outdated
Address Copilot review on #472:

- `fetch_and_extract_zip_once` was hitting `client.get(package_url)`
  without resolving `AuthHeaders::for_url`, so a runtime archive
  hosted behind a token-protected proxy would 401 even when
  credentials were configured. Threaded `auth_headers:
  &AuthHeaders` through `DownloadZipArchiveToStore`,
  `fetch_and_extract_zip_with_retry`, and `fetch_and_extract_zip_once`,
  and attach the `authorization` header the same way the tarball
  path does.
- `TarballError::ReadZipEntries` only wrapped `std::io::Error`,
  so the rendered error said "Failed to read zip entry: <io_err>"
  with no URL or entry path. The accompanying doc claims pacquet
  surfaces the entry path that triggered the failure (matching
  `ReadZipArchive` and `PathTraversal`), so promote the variant
  to a struct shape carrying `url` + `entry_path` + `source` and
  attach the cleaned path at both call sites (the `try_reserve`
  for the entry's payload and the `read_to_end`).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants