perf(pacquet): parse packuments off the reactor and release the network permit before CPU work#12318
Conversation
…lease the network permit first A cold resolve parsed every packument body inline on the tokio worker that buffered it — while still holding its network-concurrency permit — and then wrote the multi-megabyte metadata mirror to disk synchronously on the same worker. High-release-cadence packages (`@fluentui/*`, `@types/node`) ship packuments of several megabytes, so a burst of them pinned every reactor worker in serde for hundreds of milliseconds at a time, stalling the sockets those workers pump and parking concurrency permits on CPU work. On a cold babylon resolve (1,476 packuments, ~520 MB) the metadata phase ran at a third of pnpm's effective throughput and the tarball pipeline sat idle behind it (a 15-second dead zone in the download timeline while the `@fluentui` subtree parsed). Both metadata fetchers now drop the request guard as soon as the body is buffered — the semaphore goes back to bounding sockets, matching the buffer-then-release shape the tarball pipeline already uses — and deserialize + persist the mirror inside one `spawn_blocking`, keeping the reactor free to pump the remaining transfers. A 304 releases the guard before the mirror disk read for the same reason. Cold babylon full installs in a Linux container improved from a ~38 s median to ~32.7 s (pnpm: ~27 s), winning 5 of 6 interleaved A/B rounds; cold `--lockfile-only` resolves dropped from 44 s to 24 s in the back-to-back pair.
Code Review by Qodo
1. Socket permit undercount on 304
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
🧰 Additional context used📓 Path-based instructions (1)pacquet/**/*.rs📄 CodeRabbit inference engine (pacquet/AGENTS.md)
Files:
🧠 Learnings (4)📚 Learning: 2026-05-20T19:40:55.051ZApplied to files:
📚 Learning: 2026-05-22T00:08:44.646ZApplied to files:
📚 Learning: 2026-05-20T23:07:58.444ZApplied to files:
📚 Learning: 2026-06-06T18:58:37.156ZApplied to files:
🔇 Additional comments (3)
📝 WalkthroughWalkthroughThe PR introduces a ChangesResource lifecycle and spawn_blocking JSON parsing
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Summary by Qodoperf: parse packuments off the reactor and release network permit before CPU work WalkthroughsDescription• Moves packument deserialization and mirror persistence into spawn_blocking in both fetch_full_metadata and fetch_full_metadata_cached, preventing multi-MB serde parses from pinning tokio reactor workers and stalling concurrent socket I/O. • Drops the ThrottledClientGuard (network-concurrency semaphore permit) immediately after the response body is fully buffered — before any CPU work — so the semaphore bounds sockets rather than parse time, matching the existing tarball pipeline pattern. • On 304 responses, releases the guard before the mirror disk read for the same reason. • Adds a new FetchMetadataError::ParseTask variant to surface JoinError from the blocking task (panic or runtime shutdown). DiagramsequenceDiagram
participant R as Tokio Reactor
participant S as Semaphore
participant F as Fetcher
participant B as spawn_blocking
participant D as Disk Mirror
R->>S: acquire permit
S-->>F: ThrottledClientGuard
F->>R: GET /registry/pkg (stream body)
R-->>F: raw_body buffered
F->>S: drop(client) — release permit
Note over S,R: Semaphore free; reactor pumps next sockets
F->>B: spawn_blocking(serde_json::from_str)
B->>D: save_meta (mirror write)
B-->>F: Package
High-Level AssessmentThe following are alternative approaches to this PR: 1. block_in_place instead of spawn_blocking
2. Combine guard drop + parse into a single async step without spawn_blocking
Recommendation: The PR's approach — explicit File ChangesEnhancement (1)
Other (2)
|
Micro-Benchmark ResultsLinux |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #12318 +/- ##
==========================================
+ Coverage 87.65% 87.71% +0.06%
==========================================
Files 288 289 +1
Lines 35475 35624 +149
==========================================
+ Hits 31095 31247 +152
+ Misses 4380 4377 -3 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Integrated-Benchmark Report (Linux)Each scenario reports direct installs and pnpr installs. Bencher consumes pacquet@HEAD and pnpr@HEAD. Scenario: Isolated linker: fresh restore, cold cache + cold store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 10.017704292480001,
"stddev": 0.16067292411308276,
"median": 9.95407801108,
"user": 3.3073586999999995,
"system": 3.29330364,
"min": 9.85795805158,
"max": 10.38778808558,
"times": [
9.92033034858,
10.38778808558,
9.93648043258,
9.91833999058,
10.058451249580001,
9.93477742558,
9.992880553580001,
10.19836119758,
9.97167558958,
9.85795805158
]
},
{
"command": "pacquet@main",
"mean": 10.05747461448,
"stddev": 0.1299342290316299,
"median": 10.02964970108,
"user": 3.1967086,
"system": 3.2011163399999996,
"min": 9.97844568758,
"max": 10.41709596858,
"times": [
10.06891978858,
10.00074651658,
10.04224697958,
10.02596453858,
9.97886235658,
10.04058469658,
10.03333486358,
9.97844568758,
10.41709596858,
9.98854474858
]
},
{
"command": "pnpr@HEAD",
"mean": 5.3842999462800005,
"stddev": 0.13755358585405802,
"median": 5.35208286508,
"user": 2.5425676,
"system": 2.8733183399999995,
"min": 5.27676659358,
"max": 5.75598956258,
"times": [
5.40803522758,
5.31567597058,
5.36640672358,
5.3951475035800005,
5.2836461765800005,
5.27676659358,
5.36679216558,
5.75598956258,
5.33678053258,
5.33775900658
]
},
{
"command": "pnpr@main",
"mean": 5.40449842818,
"stddev": 0.133264656044673,
"median": 5.32742165358,
"user": 2.5429517,
"system": 2.8487267399999996,
"min": 5.2869473755800005,
"max": 5.63098520158,
"times": [
5.63098520158,
5.2869473755800005,
5.32624802258,
5.48295811758,
5.28955286058,
5.31052059858,
5.57216380758,
5.52359831258,
5.29341470058,
5.32859528458
]
}
]
}Scenario: Isolated linker: fresh restore, hot cache + hot store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 0.64843110006,
"stddev": 0.012855532797284895,
"median": 0.6489226985600001,
"user": 0.36625749999999996,
"system": 1.3115138799999997,
"min": 0.6305207220600001,
"max": 0.6724569330600001,
"times": [
0.6724569330600001,
0.6305207220600001,
0.64758253706,
0.65026286006,
0.6583546570600001,
0.6534967460600001,
0.6372756930600001,
0.6356680450600001,
0.65862360206,
0.6400692050600001
]
},
{
"command": "pacquet@main",
"mean": 0.6651677703600001,
"stddev": 0.03565056959686553,
"median": 0.64578285706,
"user": 0.36864449999999993,
"system": 1.31026848,
"min": 0.6373482340600001,
"max": 0.7383924080600001,
"times": [
0.6427776390600001,
0.69820455406,
0.7383924080600001,
0.6651904750600001,
0.6380965580600001,
0.64878807506,
0.7030181590600001,
0.6388038340600001,
0.6373482340600001,
0.6410577670600001
]
},
{
"command": "pnpr@HEAD",
"mean": 0.77909978656,
"stddev": 0.08147469586256419,
"median": 0.75620586106,
"user": 0.3888621,
"system": 1.32867268,
"min": 0.7358225020600001,
"max": 1.00953619306,
"times": [
0.7537868200600001,
0.7560261970600001,
0.75638552506,
0.75874824606,
0.7358225020600001,
0.7416027810600001,
1.00953619306,
0.7679393290600001,
0.75935763006,
0.7517926420600001
]
},
{
"command": "pnpr@main",
"mean": 0.7764442101600001,
"stddev": 0.04927642223606241,
"median": 0.7544804190600001,
"user": 0.39519039999999994,
"system": 1.31221738,
"min": 0.7435991270600001,
"max": 0.90547633806,
"times": [
0.75748533206,
0.7736609430600001,
0.80675769606,
0.7508382770600001,
0.7469096460600001,
0.75026090406,
0.90547633806,
0.7779783320600001,
0.7435991270600001,
0.75147550606
]
}
]
}Scenario: Isolated linker: fresh install, cold cache + cold store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 9.170304871119999,
"stddev": 0.023374401523629327,
"median": 9.16537366612,
"user": 3.70488122,
"system": 3.28502824,
"min": 9.131187692620001,
"max": 9.20745777562,
"times": [
9.20745777562,
9.16038210462,
9.17636382262,
9.16391298862,
9.18810480762,
9.131187692620001,
9.20049435162,
9.148288282620001,
9.16002254162,
9.16683434362
]
},
{
"command": "pacquet@main",
"mean": 9.27756225862,
"stddev": 0.04630413011140634,
"median": 9.27123243262,
"user": 3.6605168199999993,
"system": 3.2652195399999995,
"min": 9.214396754620001,
"max": 9.36318135562,
"times": [
9.232666011620001,
9.280934290620001,
9.273439917620001,
9.26902494762,
9.32562192662,
9.214396754620001,
9.36318135562,
9.31989133862,
9.24391286162,
9.25255318162
]
},
{
"command": "pnpr@HEAD",
"mean": 5.10497121992,
"stddev": 0.04733092717171625,
"median": 5.10418520762,
"user": 2.3939623199999995,
"system": 2.7781312400000004,
"min": 5.04190246662,
"max": 5.19754261562,
"times": [
5.09856333462,
5.080519929619999,
5.123783502619999,
5.1098070806199996,
5.04190246662,
5.19754261562,
5.08463188762,
5.042733141619999,
5.11939551262,
5.150832727619999
]
},
{
"command": "pnpr@main",
"mean": 5.104766965719998,
"stddev": 0.10986918684231943,
"median": 5.08045095162,
"user": 2.3825613199999998,
"system": 2.76368494,
"min": 5.02116636362,
"max": 5.3985165216199995,
"times": [
5.10698995962,
5.142216501619999,
5.073444246619999,
5.048519102619999,
5.03662251062,
5.0353945506199995,
5.08745765662,
5.3985165216199995,
5.09734224362,
5.02116636362
]
}
]
}Scenario: Isolated linker: fresh install, hot cache + hot store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 1.4121855616599999,
"stddev": 0.013933077579351548,
"median": 1.40885295176,
"user": 1.52979132,
"system": 1.78811608,
"min": 1.39488547326,
"max": 1.43274780226,
"times": [
1.4010979292599999,
1.39838453826,
1.4085775462599999,
1.40027107926,
1.39488547326,
1.42721844426,
1.40912835726,
1.43274780226,
1.4262752862599999,
1.42326916026
]
},
{
"command": "pacquet@main",
"mean": 1.40617268466,
"stddev": 0.011358453118770977,
"median": 1.40469467176,
"user": 1.5400773200000002,
"system": 1.78112528,
"min": 1.39380359126,
"max": 1.42392898326,
"times": [
1.39528005226,
1.42392898326,
1.39471976726,
1.41460778026,
1.42343435026,
1.40352645626,
1.3982765772599999,
1.4058628872599999,
1.39380359126,
1.40828640126
]
},
{
"command": "pnpr@HEAD",
"mean": 0.67072438896,
"stddev": 0.023739783761279734,
"median": 0.66514325176,
"user": 0.33862531999999995,
"system": 1.2619566799999997,
"min": 0.6503194572600001,
"max": 0.7323890592600001,
"times": [
0.7323890592600001,
0.6738196632600001,
0.6791604132600001,
0.66364444826,
0.6736317202600001,
0.65553247926,
0.6666420552600001,
0.6518109712600001,
0.6503194572600001,
0.6602936222600001
]
},
{
"command": "pnpr@main",
"mean": 0.6716872321599999,
"stddev": 0.038514660819486735,
"median": 0.66022310926,
"user": 0.33849142,
"system": 1.2521583799999998,
"min": 0.6429031582600001,
"max": 0.77645472226,
"times": [
0.68714502426,
0.6618315452600001,
0.6541664432600001,
0.6610052832600001,
0.65944093526,
0.6429031582600001,
0.77645472226,
0.65379351726,
0.6649766492600001,
0.6551550432600001
]
}
]
}Scenario: Isolated linker: fresh install, cold cache + hot store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 4.896212041420001,
"stddev": 0.03101217792385769,
"median": 4.90448953812,
"user": 1.8125289199999997,
"system": 1.9896081200000002,
"min": 4.844166609619999,
"max": 4.93746342362,
"times": [
4.91859012562,
4.844166609619999,
4.93746342362,
4.908537358619999,
4.90344073062,
4.89055172662,
4.851929290619999,
4.905538345619999,
4.92755537562,
4.87434742762
]
},
{
"command": "pacquet@main",
"mean": 5.20378988752,
"stddev": 0.030883993797042063,
"median": 5.19615913362,
"user": 1.7897567200000002,
"system": 2.1917720199999997,
"min": 5.15981348362,
"max": 5.2710631356199995,
"times": [
5.22936070862,
5.18823785862,
5.204080408619999,
5.183247402619999,
5.2710631356199995,
5.21587934062,
5.15981348362,
5.18707937762,
5.18641696062,
5.21272019862
]
},
{
"command": "pnpr@HEAD",
"mean": 0.68897308072,
"stddev": 0.018057391692706468,
"median": 0.68267650912,
"user": 0.34679301999999995,
"system": 1.2992290199999998,
"min": 0.6689919186200001,
"max": 0.7291547906200001,
"times": [
0.7291547906200001,
0.69656302962,
0.6949073896200001,
0.6808321726200001,
0.6779835136200001,
0.70469350362,
0.6831554656200001,
0.67125147062,
0.68219755262,
0.6689919186200001
]
},
{
"command": "pnpr@main",
"mean": 0.68795327802,
"stddev": 0.027047176533396746,
"median": 0.6865555696200001,
"user": 0.33963652,
"system": 1.27114192,
"min": 0.65693897162,
"max": 0.7568784596200001,
"times": [
0.7568784596200001,
0.6860325056200001,
0.65693897162,
0.67555786362,
0.68821351762,
0.67767559162,
0.6870786336200001,
0.6937233986200001,
0.69292370962,
0.66451012862
]
}
]
} |
|
| Branch | pr/12318 |
| Testbed | pacquet |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| isolated-linker.fresh-install.cold-cache.cold-store | 📈 view plot 🚷 view threshold | 9,170.30 ms(+3.52%)Baseline: 8,858.08 ms | 10,629.70 ms (86.27%) |
| isolated-linker.fresh-install.cold-cache.hot-store | 📈 view plot 🚷 view threshold | 4,896.21 ms(-2.48%)Baseline: 5,020.81 ms | 6,024.97 ms (81.27%) |
| isolated-linker.fresh-install.hot-cache.hot-store | 📈 view plot 🚷 view threshold | 1,412.19 ms(-0.82%)Baseline: 1,423.87 ms | 1,708.64 ms (82.65%) |
| isolated-linker.fresh-restore.cold-cache.cold-store | 📈 view plot 🚷 view threshold | 10,017.70 ms(-0.42%)Baseline: 10,059.54 ms | 12,071.45 ms (82.99%) |
| isolated-linker.fresh-restore.hot-cache.hot-store | 📈 view plot 🚷 view threshold | 648.43 ms(-0.06%)Baseline: 648.85 ms | 778.62 ms (83.28%) |
|
| Branch | pr/12318 |
| Testbed | pnpr |
⚠️ WARNING: No Threshold found!Without a Threshold, no Alerts will ever be generated.
Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the--ci-only-thresholdsflag.
Click to view all benchmark results
| Benchmark | Latency | milliseconds (ms) |
|---|---|---|
| isolated-linker.fresh-install.cold-cache.cold-store | 📈 view plot | 5,104.97 ms |
| isolated-linker.fresh-install.cold-cache.hot-store | 📈 view plot | 688.97 ms |
| isolated-linker.fresh-install.hot-cache.hot-store | 📈 view plot | 670.72 ms |
| isolated-linker.fresh-restore.cold-cache.cold-store | 📈 view plot | 5,384.30 ms |
| isolated-linker.fresh-restore.hot-cache.hot-store | 📈 view plot | 779.10 ms |
…ndexed metadata mirror (#12322) ## Why Profiling a warm babylon resolve (metadata mirrors hot, ~520 MB across 1,476 packuments) showed the dominant cost was not I/O but **hydration**: every version of every packument was deserialized into typed manifests — maps, strings, and `serde_json::Value` trees built, hashed, and dropped — even though a pick consults only the version *strings* plus the handful of manifests it actually considers. typescript ships 3,800 versions; a pick needs one. The `#[serde(flatten)]` catch-all on `PackageVersion` compounded it by routing the whole struct through serde's buffering deserializer. The same hydration cost was paid on cold resolves inside the `spawn_blocking` parses from #12318. ## What Three changes, applied in profile-driven order: **1. Lazy packument hydration** (`df6f70eb57`). `Package::versions` becomes a `PackageVersions` map whose entries hold the raw JSON fragment serde captured (`Arc<RawValue>`, shared not copied) and hydrate into `Arc<PackageVersion>` on first access, cached per slot. Key scans never hydrate; `pinned_version` hydrates only the winner; the publish-date filter moves slots without touching manifests; undecodable fragments degrade to "version absent" (matching the JS implementation's tolerance); serialization re-emits raw fragments verbatim. Picked manifests travel as `Arc<PackageVersion>`. Enables serde_json's `raw_value` feature (already a workspace dep). **2. Sharded in-memory packument cache** (`4c28c6679e`). Every resolve edge consults the shared meta cache, and its single `Mutex<HashMap>` was the top mutex-wait site in the profile; it is now the same `DashMap` shape the crate's other shared maps use. Honest note: wall time was unchanged by this alone — the post-hydration profile shows the warm resolve is critical-path-bound, not lock-bound — but the contention disappears and the cache no longer serializes workers under load. **3. Indexed on-disk metadata mirror** (`126a416ae8`, maintainer-approved cache-format break). The two-line NDJSON mirror (headers + verbatim body) becomes: ``` pacquet-meta-v1 <headers_len> <index_len>\n <headers JSON> etag, modified <index JSON> name, dist-tags, time, homepage, versions: [version, offset, len] <fragments> concatenated raw per-version JSON ``` The loader reads the file once and hands `PackageVersions` byte spans into that buffer — no structural re-scan, hydration parses a slice in place. The writer streams the registry's own bytes (fragments borrow from the lazily-parsed response body), so the **cold-install cost stays one temp-file + rename per package**. Old-format files read as cache misses and are rewritten on the next 200. A span-per-fragment `pread` variant was tried first and measured *worse* (sys 0.4 s → 4.5 s; the pick paths probe many candidate fragments per package), hence the single-buffer design. ## Measurements (warm babylon `--lockfile-only` resolve, 10-core M-series) | build | wall | notes | |---|---|---| | before this PR | ~9.1 s | malloc/free + `deserialize_any` dominate the profile | | + lazy hydration | ~7.7–8 s | parse microbench 3–3.8× (`typescript` 36.1→9.5 ms) | | + indexed mirror | **~5.5–6.7 s** | sys 0.4 s; no whole-body scan | Cold resolves keep the same hydration savings inside the parse tasks; cold write volume and syscall pattern are unchanged. ## Evaluated and deliberately not included Consolidating the install-phase thread pools (tokio + global rayon + the dedicated CAS-write pool + capped blocking threads show ~100k involuntary context switches on cold installs vs ~750 for the pnpm CLI). After the resolution fixes, repeated container A/Bs show no measurable wall-clock cost from the churn — it hides entirely behind network time — and history warns that speculative concurrency reshuffles here regress badly (see the #11903 prefetch revert). Deferred until a benchmark shows it on the critical path. ## Tests New `package_versions` unit tests (hydrate-on-demand + Arc-identity caching, undecodable-fragment-as-absent, verbatim raw round-trip incl. unknown keys, eager construction, hydration-free filtering) and rewritten `mirror` tests (headers/index/fragment round-trip, span hydration, truncation → cache miss, old-format → cache miss, atomic overwrite). Full suites green: `pacquet-registry` (23), `pacquet-resolving-npm-resolver` (235), `pacquet-package-manager` + `pacquet-cli` (768); workspace clippy `-D warnings` (pedantic set), dylint, fmt.
Why
Follow-up to #12315's investigation: the one variation where pacquet still trailed pnpm on benchmarks.vlt.sh was
cleanbabylon (34.3 s vs 30.1 s on the 2026-06-10 run).A Linux-side profile (strace + download/linkat timelines in a container) localized the gap to the resolution phase, not store-write/linking as initially suspected:
--lockfile-onlyresolves: pacquet 44 s vs pnpm 17.7 s, with pacquet at ~30% of one core — network-throughput-bound, not CPU (warm-metadata resolves: pacquet 9.8 s vs pnpm 18.7 s, i.e. pacquet's CPU side is already 2× faster).@fluentui/*subtree resolved.Root cause:
fetch_full_metadata_cachedparsed every packument body inline on the tokio worker that buffered it — while still holding its network-concurrency permit — and then wrote the multi-MB metadata mirror to disk synchronously on the same worker. High-release-cadence packages ship multi-megabyte packuments; a burst of them pinned every reactor worker in serde for hundreds of milliseconds at a time, stalling the sockets those workers pump and parking the concurrency semaphore's permits on CPU work. The tarball pipeline already avoids exactly this ("body fully buffered; release the network permit before the CPU-bound work" +spawn_blocking); the packument path never got the same treatment.What
In both metadata fetchers (
fetch_full_metadata_cached,fetch_full_metadata):spawn_blocking, keeping the reactor free to pump the remaining transfers;New
FetchMetadataError::ParseTaskvariant carries aJoinErrorfrom the blocking task.Validation
--lockfile-only44 s → 24 s in the back-to-back pair.ubuntu-24.04-armrunner,cleanvariation, pnpm and patched pacquet in the same hyperfine session (the runner's network was degraded during the run — both tools ~2× slower than the official numbers — which makes the relative result the meaningful one): babylon: pacquet 54.8 s vs pnpm 59.2 s — flipped from −14 % behind (official run) to ahead under identical conditions. astrocleanstayed within noise (pacquet already led it officially). All hyperfine runs exited 0.cargo nextest runforpacquet-resolving-npm-resolver(234),pacquet-package-manager+pacquet-cli(740) all pass; clippy-D warnings, dylint, fmt clean.Profile notes for future work (not addressed here)
The extract+link phase still shows heavy scheduler churn relative to pnpm (~100 k involuntary context switches vs ~750;
sched_yieldstorms from the rayon pools): tokio workers + the dedicated CAS-write pool + the global rayon pool + up to 2×cores blocking threads can leave ~5×cores runnable threads on the box. Wall-clock impact is currently hidden behind network time, but it's the next candidate if the gap reopens on faster runners.Written by an agent (Claude Code, claude-fable-5).
Summary by CodeRabbit
Bug Fixes
Performance