This repository was archived by the owner on May 14, 2026. It is now read-only.
fix(network): set explicit timeouts on default reqwest client; enable CI tracing#267
Merged
Conversation
`ThrottledClient::new_from_cpu_count` handed back `Client::new()`, which has no connect / request / pool-idle timeouts at all. The existing docstring on the sibling `from_client` already called this out as a footgun — that note stays relevant for tests, but the production path can't afford "wait forever". On CI this is how `integrated-benchmark` hangs at "Benchmark 1: pacquet@HEAD" until the 10-min GHA step cap kicks in: the bench's single-process verdaccio occasionally stalls (GC pause, uplink hiccup, TCP loss without RST) and pacquet sits on the half-open socket indefinitely, emitting nothing even under `--show-output` (see #263 investigation notes). Set explicit defaults: * `connect_timeout(10s)` — generous for localhost, snaps on resource-exhausted runners. * `timeout(60s)` — per-request deadline. Typical npm tarball is under 5 MB, so 60 s on localhost is comfortable; a real hang turns into `TarballError::FetchTarball` and hyperfine reports the non-zero exit instead of sitting silent. * `pool_idle_timeout(30s)` — evict conns verdaccio closed but we didn't notice. Retries on transient failures are tracked separately in #259. Also enable pacquet tracing on the CI bench step via `TRACE=pacquet=info`, so the next time one of these stalls makes it through despite the new timeouts the step log shows which phase pacquet was in (download, extraction, import, index write) before it went silent. `--show-output` was already in place; without a tracing filter it only ever captured an empty stream. Refs #263.
`[`new_from_cpu_count`]` doesn't resolve from inside the `impl` block because rustdoc needs a path; `Self::new_from_cpu_count` does. The Doc CI job runs with `RUSTDOCFLAGS=-D warnings` so the broken link was a hard error. No behavioural change.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent silent CI hangs during the Integrated-Benchmark by adding explicit timeouts to pacquet’s default reqwest client and by enabling pacquet tracing in the benchmark workflow so future stalls produce actionable logs.
Changes:
- Build the default
ThrottledClientwith explicit connect/request/pool-idle timeouts instead ofClient::new(). - Update
from_clientdocs to reflect thatnew_from_cpu_countnow sets defaults andfrom_clientis the override hook (especially for tests). - Set
TRACE=pacquet=infofor the integrated benchmark step to emit tracing output during CI runs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/network/src/lib.rs | Switches default HTTP client construction to Client::builder() with explicit timeouts and updates related docs. |
| .github/workflows/integrated-benchmark.yml | Enables pacquet tracing for the benchmark step via TRACE env var to improve CI diagnosability. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #267 +/- ##
==========================================
- Coverage 89.75% 89.65% -0.10%
==========================================
Files 58 58
Lines 4527 4533 +6
==========================================
+ Hits 4063 4064 +1
- Misses 464 469 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
60 seconds is too aggressive for a default that governs real registry traffic, not just the bench's localhost verdaccio. Large npm tarballs can legitimately take more than a minute over a slow network, and pacquet has no retry-with-backoff yet (#259) — a 60-second timeout would turn slow-but-progressing downloads into outright install failures. 5 minutes keeps the \"catch truly stuck sockets\" property (still well inside the bench's step budget and the CLI's reasonable-user patience) while comfortably accommodating slow real-world connections. Also reword the doc to call out that the value is the default for registry traffic too, not just the bench, and that making it user-configurable is the natural next step once retries exist. Addresses Copilot review on #267.
Micro-Benchmark ResultsLinux |
Integrated-Benchmark Report (Linux)Scenario: Frozen Lockfile
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 1.2547423268000002,
"stddev": 0.029335149948103264,
"median": 1.2513892324000002,
"user": 0.41937515999999997,
"system": 2.4228385999999995,
"min": 1.2228729024,
"max": 1.3014688984000002,
"times": [
1.2644831614,
1.3014688984000002,
1.2270366804000001,
1.2910592794,
1.2239778654000002,
1.2335402064,
1.2786038864000002,
1.2228729024,
1.2382953034000002,
1.2660850844
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
},
{
"command": "pacquet@main",
"mean": 1.2659456904000002,
"stddev": 0.033120048720833045,
"median": 1.2573686989000001,
"user": 0.40887506,
"system": 2.4628332,
"min": 1.2149860354000002,
"max": 1.3175138554,
"times": [
1.2547471794,
1.3175138554,
1.2517915944,
1.2149860354000002,
1.2529949564,
1.2315173324000002,
1.2599902184000003,
1.3073690614000002,
1.2682369754,
1.3003096954000002
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
},
{
"command": "pnpm",
"mean": 2.6674933733,
"stddev": 0.03789681013172619,
"median": 2.6657566009,
"user": 3.3339536599999997,
"system": 2.2899702,
"min": 2.6180032334,
"max": 2.7461594114,
"times": [
2.6412264264,
2.6466241854,
2.6870982074,
2.6547795574,
2.6849891614,
2.6272391404,
2.7461594114,
2.6920807654,
2.6767336444,
2.6180032334
],
"exit_codes": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
]
}
]
} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this fixes
CI runs of
Integrated-Benchmarkfrequently hang at:With
--show-outputalready enabled, the absence of any pacquet stderr afterBenchmark 1:means the install process genuinely went silent — no progress, no error, no panic. That rules out "bench is slow" and points at something blocking inside pacquet with no deadline on it.Root cause
crates/network/src/lib.rs:31hands backClient::new(), which has no connect/request/pool-idle timeouts. The siblingfrom_clientdocstring even called this out as a footgun:Every
http_client.get(tarball_url).send()incrates/tarball/src/lib.rs:310is therefore a potential forever-hang. The bench's verdaccio is a single Node process handling 16 concurrent requests from pacquet; on a GHA runner it's easy for the event loop to stall (GC, uplink hiccup, TCP packet loss without RST) for longer than our patience. Pacquet then sits on the half-open socket until the GHA job timeout, which is exactly the symptom.Pacquet has no retry on transient network failures yet (tracked separately in #259), so even a transient stall becomes a permanent hang in the current code.
Changes
ThrottledClient::new_from_cpu_countnow builds itsClientwith explicit defaults:connect_timeout(10s)— generous for localhost, snaps on resource-exhausted runners.timeout(60s)— per-request deadline. Typical npm tarball < 5 MB, so 60s on localhost has huge margin; a real hang turns intoTarballError::FetchTarballand hyperfine reports the non-zero exit instead of silent-hang-until-timeout.pool_idle_timeout(30s)— evicts connections that verdaccio closed but we didn't notice.from_clientdocstring updated — it's still the escape hatch for tests that want different (usually shorter) timeouts, not "the only way to get any timeouts at all".TRACE=pacquet=info. pacquet's tracing subscriber is gated on theTRACEenv var; without it,--show-outputonly ever captured an empty stream. With it, the next stall will show which phase the stuck install was in ("Download completed", "New cache", "Reusing cached CAFS entry") before going silent.What this doesn't fix
This converts hangs into fast failures rather than preventing the stall itself. If verdaccio genuinely starts dropping requests, pacquet will now surface that as a
FetchTarballerror → hyperfine exits non-zero → the step fails visibly, but the underlying resource-exhaustion or verdaccio-instability problem still needs addressing. Options once we have the first post-fix log:ThrottledClientpermit count).Test plan
cargo check --workspace --all-targets.cargo test -p pacquet-network -p pacquet-tarball -p pacquet-package-manager— no test regressions (tests that usefast_fail_clientwith sub-second timeouts still work; they set their ownClient::builder).TarballError::FetchTarballplus a tracing trail showing the last phase before the stall. Either way, actionable rather than silent.