pacquet: registry metadata fetch issues a single request — port pnpm's retry policy

## Summary

`fetch_full_metadata` in `pacquet-resolving-npm-resolver` issues exactly one `reqwest::RequestBuilder::send().await` and treats any error as fatal. pnpm's TypeScript implementation routes every registry fetch through `make-fetch-happen`, which retries transient network errors (`fetchRetries`, default `2`, exponential backoff). The gap is a known follow-up: pacquet's own [`Config::fetch_retries` doc comment at `8695496`](https://github.com/pnpm/pnpm/blob/8695496/pacquet/crates/config/src/lib.rs#L555-L570) says:

> Today this only gates the `pacquet-tarball` download path; `crates/registry`'s metadata fetches still issue a single request. Threading the same retry policy through the registry client is a follow-up.

This issue tracks that follow-up.

## Where the gap is

The metadata fetcher at [`pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs:73-78` (`8695496`)](https://github.com/pnpm/pnpm/blob/8695496/pacquet/crates/resolving-npm-resolver/src/fetch_full_metadata.rs#L73-L78):

```rust
let response = request
    .send()
    .await
    .map_err(|error| FetchMetadataError::Network { url: url.clone(), error })?
    .error_for_status()
    .map_err(|error| FetchMetadataError::Network { url: url.clone(), error })?;
```

No wrapper, no retry, no backoff. The two `install_package_*` call sites that download tarballs *do* honor retries — they pass [`retry_opts_from_config(config)`](https://github.com/pnpm/pnpm/blob/8695496/pacquet/crates/package-manager/src/retry_config.rs#L8-L15) into the tarball path — so the infrastructure exists; it just isn't threaded through to the metadata client.

For comparison, pnpm's [`fetchFromRegistry.ts` at `2a9bd897bf`](https://github.com/pnpm/pnpm/blob/2a9bd897bf/network/fetch/src/fetchFromRegistry.ts) wraps every registry call (metadata and tarball) in `make-fetch-happen` with the same `fetchRetries`/`fetchRetryFactor`/`fetchRetryMintimeout`/`fetchRetryMaxtimeout` policy.

## Symptom

The user-visible failure is a `Failed to fetch metadata from <url>: error sending request for url (<url>)` error that aborts the entire install. The wrapped reqwest error covers transient TCP-level failures the `make-fetch-happen` equivalent would silently retry — chiefly **stale keep-alive sockets**, which pacquet's network crate documents as a known mode at [`pacquet/crates/network/src/lib.rs:130-140` (`8695496`)](https://github.com/pnpm/pnpm/blob/8695496/pacquet/crates/network/src/lib.rs#L130-L140):

> `pool_idle_timeout(4s)` matches agentkeepalive's default `freeSocketTimeout`. Most CDN / load-balancer edges in front of `registry.npmjs.org` close idle sockets after 5–15s without sending FIN that hyper notices; a pool TTL above that **lets pacquet reuse a half-dead socket and surface the next request as a generic "error sending request for url"**.

The 4 s pool TTL narrows the race window but doesn't close it; on the tarball download path the `RetryOpts` wrapper absorbs the same class of error transparently. The metadata path doesn't, so a single stale-socket reuse fails the install.
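A minimal sketch of the kind of wrapper the metadata path lacks. It assumes a synchronous closure to stay self-contained; the real fetcher is async and would await a timer instead of calling a blocking sleep. `Attempt`, `with_retries`, and the injected `sleep` / `delay_for` parameters are hypothetical names for illustration, not pacquet's `RetryOpts` API.

```rust
use std::time::Duration;

/// Hypothetical classification of one fetch attempt (not pacquet's error type).
enum Attempt<T, E> {
    /// The request succeeded.
    Done(T),
    /// A failure worth retrying, e.g. a reused half-dead keep-alive socket.
    Transient(E),
    /// A failure that retrying cannot fix, e.g. a 404 for an unknown package.
    Fatal(E),
}

/// Calls `op` until it succeeds, fails fatally, or `retries` extra attempts
/// have been used. `op` receives the 0-based attempt number. `sleep` is
/// injected so the loop can be exercised without actually waiting.
fn with_retries<T, E>(
    retries: u32,
    mut op: impl FnMut(u32) -> Attempt<T, E>,
    mut sleep: impl FnMut(Duration),
    delay_for: impl Fn(u32) -> Duration,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Attempt::Done(value) => return Ok(value),
            Attempt::Fatal(error) => return Err(error),
            Attempt::Transient(error) => {
                // Out of retries: surface the last transient error.
                if attempt >= retries {
                    return Err(error);
                }
                sleep(delay_for(attempt));
                attempt += 1;
            }
        }
    }
}
```

With `retries = 2`, a single stale-socket failure on the first attempt costs one delay and one extra request instead of aborting the install.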

## Reproduction

```sh
# Build pacquet@main (8695496 or later)
cargo build --release --bin=pacquet

# Stand up @pnpm/registry-mock 6.0.0's verdaccio (or any verdaccio with
# `proxy: npmjs` against registry.npmjs.org)
node node_modules/.pnpm/node_modules/verdaccio/bin/verdaccio \
    --config registry-mock-config.yaml --listen 4873 &

# Set up the integrated-benchmark `alotta-files` fixture as a clean
# install (no lockfile)
cp pacquet/tasks/integrated-benchmark/src/fixtures/package.json .
cat > .npmrc <<EOF
registry=http://localhost:4873/
auto-install-peers=true
ignore-scripts=true
lockfile=false
EOF
cat > pnpm-workspace.yaml <<EOF
storeDir: ./store-dir
registry: http://localhost:4873/
autoInstallPeers: true
ignoreScripts: true
lockfile: false
EOF

# Run; flakes intermittently with `error sending request for url`
target/release/pacquet install
```

In a 16-run sample against a freshly started verdaccio, 1 cold run failed with the symptom above; the 15 subsequent warm-pool runs all succeeded with `rc=0`. The bug isn't deterministic (it's a TCP-pool race), but the per-request failure probability, compounded across the hundreds of metadata fetches that a clean install of `alotta-files` makes, is high enough to flake CI scenarios that go through this path.
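As a rough illustration of the compounding, assume (hypothetically) that each of the $n$ metadata fetches fails independently with probability $p$. Neither value was measured here, and a pool race is unlikely to be truly independent across requests, so this is only an order-of-magnitude model:

```math
P(\text{install fails}) = 1 - (1 - p)^{n}
```

With illustrative values $p = 0.001$ and $n = 300$, this gives roughly $0.26$: a failure rate that is negligible per request becomes a visible flake rate per install.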

## Scenarios this affects

Scenarios that don't use a lockfile (clean install, full resolution from `package.json`) issue hundreds of metadata fetches. Frozen-lockfile scenarios fetch zero metadata — they consume the lockfile's pinned snapshots directly — so they don't expose the gap. This is why the existing CI runs `frozen-lockfile` and `frozen-lockfile-hot-cache` cleanly, while attempts to add `clean-install` / `full-resolution` to the per-PR integrated-benchmark workflow flake on `Benchmark 1: pacquet@HEAD`'s first hyperfine command.

## Suggested shape of the fix

1. Reuse [`RetryOpts`](https://github.com/pnpm/pnpm/blob/8695496/pacquet/crates/tarball/src/lib.rs) (or factor it up) so the metadata client gets the same `fetch_retries` / `fetch_retry_factor` / `fetch_retry_mintimeout` / `fetch_retry_maxtimeout` policy. Defaults: `2` retries, factor `10`, mintimeout `10000` ms, maxtimeout `60000` ms, matching pnpm.
2. Scope retries to genuinely transient errors: a network-level `reqwest::Error` whose source is a `hyper`/IO error, plus HTTP 5xx / 408 / 429. Any other 4xx should still be fatal so that a misspelled package name fails fast.
3. Honor `Retry-After` on 429 / 503 if pnpm does (worth a quick check of `make-fetch-happen`'s policy).
4. Thread retries through both the full and the abbreviated metadata path (both currently bypass it).
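The policy in the list above could be sketched roughly as follows. `RetryPolicy`, `is_retryable_status`, and `retry_after` are hypothetical names, not pacquet's existing `RetryOpts`. The delay formula assumes the plain `mintimeout * factor^attempt` shape capped at `maxtimeout`, without jitter. The `Retry-After` helper only handles the delta-seconds form, since the header may also carry an HTTP date.

```rust
use std::time::Duration;

/// Hypothetical policy struct mirroring the four settings in item 1.
#[derive(Debug, Clone, Copy)]
struct RetryPolicy {
    retries: u32,
    factor: u32,
    min_timeout: Duration,
    max_timeout: Duration,
}

impl Default for RetryPolicy {
    /// The defaults quoted in item 1.
    fn default() -> Self {
        Self {
            retries: 2,
            factor: 10,
            min_timeout: Duration::from_millis(10_000),
            max_timeout: Duration::from_millis(60_000),
        }
    }
}

impl RetryPolicy {
    /// Delay before retry number `attempt` (0-based):
    /// `min_timeout * factor^attempt`, capped at `max_timeout`.
    fn delay(&self, attempt: u32) -> Duration {
        let multiplier = self.factor.saturating_pow(attempt);
        self.min_timeout.saturating_mul(multiplier).min(self.max_timeout)
    }

    /// The full list of delays, one per allowed retry.
    fn schedule(&self) -> Vec<Duration> {
        (0..self.retries).map(|attempt| self.delay(attempt)).collect()
    }
}

/// Item 2: 5xx, 408 and 429 are retryable; every other status is final.
fn is_retryable_status(status: u16) -> bool {
    matches!(status, 408 | 429 | 500..=599)
}

/// Item 3: parse a `Retry-After` value given as whole seconds.
/// Returns `None` for the HTTP-date form, which this sketch does not handle.
fn retry_after(header_value: &str) -> Option<Duration> {
    header_value.trim().parse::<u64>().ok().map(Duration::from_secs)
}
```

Under these assumptions the default schedule is a 10 s wait followed by a 60 s wait, because the second delay (100 s uncapped) hits the `maxtimeout` ceiling.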

## Out of scope

- The `pool_idle_timeout` value itself. Reducing it narrows the race but trades TCP-handshake overhead for robustness; the right fix is the retry layer that pnpm has. Tracking that here would conflate two changes.
- PR #11837's prefetch wiring. That PR resolves and downloads tarballs in parallel; it doesn't touch the metadata-fetch site. The two changes are orthogonal.

---
Written by an agent (Claude Code, claude-opus-4-7).
