perf(pacquet): prefetch packuments during fresh-resolve tree walk#11903
perf(pacquet): prefetch packuments during fresh-resolve tree walk#11903zkochan wants to merge 2 commits into
Conversation
…ve tree walk Detach packument fetches from the cooperative `try_join_all` task by firing `tokio::spawn`ed `resolver.resolve()` calls just before each level recurses: - Top-level prefetch in `extend_tree` for the importer's direct deps (and each hoist iteration's freshly-picked peers). - Lookahead prefetch inside `resolve_node` as soon as the parent's manifest is known, so the grandchildren's packuments are already in-flight by the time the recursion enters them. The npm resolver's per-`(registry, name)` packument fetch semaphore coalesces the prefetch with the later real `resolve()` call, so the recursion hits the in-memory packument and per-version manifest caches instead of a fresh network round-trip plus a fresh JSON parse + serde re-roundtrip per call site. Errors from the prefetch are dropped — the real `resolve()` later in `resolve_node` surfaces the same error through the normal error path. To make the spawn possible the resolver chain is now threaded as `Arc<dyn Resolver>` (instead of `&Chain` generic) through `resolve_dependency_tree` / `resolve_importer` / `extend_tree`, and stored on `TreeCtx` along with an `Arc<ResolveOptions>` so the detached tasks just clone refcounts instead of cloning `PreferredVersions` and `PackageVersionPolicy` per spawn. Mirrors upstream pnpm's `resolveDependencies` BFS-batched fan-out at `installing/deps-resolver/src/resolveDependencies.ts#L690-L697` where every sibling's `pickPackage` runs concurrently via `Promise.all` before the deferred `postponedResolution` queue descends. Closes #11900. --- Written by an agent (Claude Code, claude-opus-4-7).
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
🧰 Additional context used📓 Path-based instructions (1)pacquet/**/*.rs📄 CodeRabbit inference engine (pacquet/AGENTS.md)
Files:
🧠 Learnings (3)📚 Learning: 2026-05-20T19:40:55.051ZApplied to files:
📚 Learning: 2026-05-22T00:08:44.646ZApplied to files:
📚 Learning: 2026-05-20T23:07:58.444ZApplied to files:
🔇 Additional comments (1)
📝 WalkthroughWalkthroughThis PR refactors resolver ownership across the resolver and importer paths to use Arc, stores resolver and ResolveOptions as Arc in TreeCtx, invokes prefetch_packuments to prefetch child packuments before recursion, and updates call sites and tests to the Arc-based API. ChangesResolver Arc ownership and prefetch integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Micro-Benchmark ResultsLinux |
`pacquet-resolving-deps-resolver` doesn't depend on `pacquet-resolving-npm-resolver`, so the intra-doc link to `shared_packument_fetch_locker` resolves to nothing and trips `-D rustdoc::broken_intra_doc_links`. Reference the helper by name in backticks instead.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #11903 +/- ##
==========================================
- Coverage 87.84% 87.84% -0.01%
==========================================
Files 206 206
Lines 24525 24533 +8
==========================================
+ Hits 21545 21550 +5
- Misses 2980 2983 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Integrated-Benchmark Report (Linux)Scenario: Isolated linker: fresh restore, cold cache + cold store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 2.49483338376,
"stddev": 0.11573594554268046,
"median": 2.50313421026,
"user": 2.73696174,
"system": 3.6343801,
"min": 2.3587478392600003,
"max": 2.7240554942600004,
"times": [
2.50511265126,
2.62325076026,
2.36775714426,
2.50115576926,
2.51918680826,
2.7240554942600004,
2.3587478392600003,
2.5142014932600003,
2.37409513426,
2.4607707432600003
]
},
{
"command": "pacquet@main",
"mean": 2.3993250014600003,
"stddev": 0.07287850767173795,
"median": 2.40912214526,
"user": 2.7042120400000003,
"system": 3.5976482999999995,
"min": 2.2981187482600003,
"max": 2.47631817626,
"times": [
2.47631817626,
2.47183195826,
2.4712638252600003,
2.2981187482600003,
2.39876058726,
2.46824691726,
2.35973895926,
2.32386414826,
2.3056229912600004,
2.41948370326
]
},
{
"command": "pnpm",
"mean": 4.88006519976,
"stddev": 0.029717276643322556,
"median": 4.890332494759999,
"user": 8.316276140000001,
"system": 4.2179574,
"min": 4.82868725526,
"max": 4.92377747426,
"times": [
4.92377747426,
4.90054831626,
4.843837069259999,
4.88392616026,
4.82868725526,
4.897169572259999,
4.85237266426,
4.896738829259999,
4.87624837026,
4.8973462862599995
]
}
]
}Scenario: Isolated linker: fresh restore, hot cache + hot store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 0.69311290418,
"stddev": 0.027608993851098988,
"median": 0.68686870038,
"user": 0.39370326,
"system": 1.4800386599999997,
"min": 0.67512800788,
"max": 0.76853018888,
"times": [
0.76853018888,
0.69159816888,
0.67871079188,
0.68698957888,
0.68706238288,
0.70070257388,
0.68674782188,
0.67784025288,
0.67512800788,
0.6778192738800001
]
},
{
"command": "pacquet@main",
"mean": 0.6753573055800001,
"stddev": 0.03750540057740294,
"median": 0.65961719438,
"user": 0.36563416000000004,
"system": 1.4640745599999998,
"min": 0.6387349598800001,
"max": 0.7390316408800001,
"times": [
0.7390316408800001,
0.6525406798800001,
0.65866022688,
0.7346974338800001,
0.66764365488,
0.65447563688,
0.6402649688800001,
0.6387349598800001,
0.70694969188,
0.6605741618800001
]
},
{
"command": "pnpm",
"mean": 2.56513433508,
"stddev": 0.06332603565450393,
"median": 2.53984702438,
"user": 3.2513525599999995,
"system": 2.2288590599999996,
"min": 2.50160932288,
"max": 2.67824857088,
"times": [
2.6440181058800003,
2.51442603088,
2.58983103688,
2.52581103388,
2.50176718788,
2.55388301488,
2.61616969988,
2.50160932288,
2.67824857088,
2.5255793468800003
]
}
]
}Scenario: Isolated linker: fresh install, cold cache + cold store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 6.46503683852,
"stddev": 0.1340345106490292,
"median": 6.421239805920001,
"user": 13.0965711,
"system": 3.7909842999999994,
"min": 6.31107717242,
"max": 6.70605679242,
"times": [
6.40243186142,
6.43278450742,
6.70605679242,
6.36635247242,
6.43868930142,
6.31107717242,
6.64426520642,
6.34852211042,
6.59049385642,
6.409695104420001
]
},
{
"command": "pacquet@main",
"mean": 4.84043628702,
"stddev": 0.1424980129389664,
"median": 4.76307670992,
"user": 6.550233399999999,
"system": 3.6315041,
"min": 4.68017984342,
"max": 5.098198286420001,
"times": [
4.74212069942,
4.68017984342,
5.060162822420001,
5.098198286420001,
4.75714755442,
4.89816202442,
4.756542764420001,
4.75232285742,
4.890520152420001,
4.7690058654200005
]
},
{
"command": "pnpm",
"mean": 6.46799614082,
"stddev": 0.049356668291705215,
"median": 6.48084137892,
"user": 10.808118200000001,
"system": 4.487744099999999,
"min": 6.37787152242,
"max": 6.532890749420001,
"times": [
6.461692244420001,
6.37787152242,
6.501571340420001,
6.49361436942,
6.456180570420001,
6.5038383494200005,
6.39061950442,
6.532890749420001,
6.47722014042,
6.48446261742
]
}
]
}Scenario: Isolated linker: fresh install, hot cache + hot store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 5.2659254644,
"stddev": 0.16019328125847948,
"median": 5.263438150600001,
"user": 10.83401066,
"system": 2.4446179199999998,
"min": 5.0446142066,
"max": 5.5761796186,
"times": [
5.0446142066,
5.454158014600001,
5.124632597600001,
5.306157965600001,
5.307611344600001,
5.202063342600001,
5.281986180600001,
5.2448901206,
5.1169612526,
5.5761796186
]
},
{
"command": "pacquet@main",
"mean": 3.7704669025,
"stddev": 0.07841988062261988,
"median": 3.7541864146,
"user": 4.15980176,
"system": 2.19442382,
"min": 3.6719549315999997,
"max": 3.9214461926,
"times": [
3.8286262236,
3.6719549315999997,
3.7280666256,
3.6766699606,
3.8418951466,
3.8030178806,
3.7480256206,
3.9214461926,
3.7246192346,
3.7603472086
]
},
{
"command": "pnpm",
"mean": 4.416248005600001,
"stddev": 0.08117547840113731,
"median": 4.419385741100001,
"user": 5.57652716,
"system": 2.60115772,
"min": 4.3007384466000005,
"max": 4.5471464696,
"times": [
4.4477229596,
4.5137429196,
4.391750404600001,
4.3548616746,
4.3007384466000005,
4.5471464696,
4.4470210776000005,
4.4672589436,
4.3291105956,
4.3631265646
]
}
]
} |
|
Closing this. The integrated benchmark proves the approach is wrong on the targeted path:
The hot/hot regression is the smoking gun: with everything an in-memory cache hit there is no I/O to parallelize, so the prefetch just doubles every Worth noting that the issue's premise (pacquet 4–12× slower than pnpm on fresh-resolve, sourced from Leaving the commits in for the record in case a future attempt wants to look at what doesn't work and why. The two changes were:
Both compile and pass tests, but the runtime cost-benefit is upside down. Closing without merge. Written by an agent (Claude Code, claude-opus-4-7). |
…ndexed metadata mirror (#12322) ## Why Profiling a warm babylon resolve (metadata mirrors hot, ~520 MB across 1,476 packuments) showed the dominant cost was not I/O but **hydration**: every version of every packument was deserialized into typed manifests — maps, strings, and `serde_json::Value` trees built, hashed, and dropped — even though a pick consults only the version *strings* plus the handful of manifests it actually considers. typescript ships 3,800 versions; a pick needs one. The `#[serde(flatten)]` catch-all on `PackageVersion` compounded it by routing the whole struct through serde's buffering deserializer. The same hydration cost was paid on cold resolves inside the `spawn_blocking` parses from #12318. ## What Three changes, applied in profile-driven order: **1. Lazy packument hydration** (`df6f70eb57`). `Package::versions` becomes a `PackageVersions` map whose entries hold the raw JSON fragment serde captured (`Arc<RawValue>`, shared not copied) and hydrate into `Arc<PackageVersion>` on first access, cached per slot. Key scans never hydrate; `pinned_version` hydrates only the winner; the publish-date filter moves slots without touching manifests; undecodable fragments degrade to "version absent" (matching the JS implementation's tolerance); serialization re-emits raw fragments verbatim. Picked manifests travel as `Arc<PackageVersion>`. Enables serde_json's `raw_value` feature (already a workspace dep). **2. Sharded in-memory packument cache** (`4c28c6679e`). Every resolve edge consults the shared meta cache, and its single `Mutex<HashMap>` was the top mutex-wait site in the profile; it is now the same `DashMap` shape the crate's other shared maps use. Honest note: wall time was unchanged by this alone — the post-hydration profile shows the warm resolve is critical-path-bound, not lock-bound — but the contention disappears and the cache no longer serializes workers under load. **3. Indexed on-disk metadata mirror** (`126a416ae8`, maintainer-approved cache-format break). The two-line NDJSON mirror (headers + verbatim body) becomes: ``` pacquet-meta-v1 <headers_len> <index_len>\n <headers JSON> etag, modified <index JSON> name, dist-tags, time, homepage, versions: [version, offset, len] <fragments> concatenated raw per-version JSON ``` The loader reads the file once and hands `PackageVersions` byte spans into that buffer — no structural re-scan, hydration parses a slice in place. The writer streams the registry's own bytes (fragments borrow from the lazily-parsed response body), so the **cold-install cost stays one temp-file + rename per package**. Old-format files read as cache misses and are rewritten on the next 200. A span-per-fragment `pread` variant was tried first and measured *worse* (sys 0.4 s → 4.5 s; the pick paths probe many candidate fragments per package), hence the single-buffer design. ## Measurements (warm babylon `--lockfile-only` resolve, 10-core M-series) | build | wall | notes | |---|---|---| | before this PR | ~9.1 s | malloc/free + `deserialize_any` dominate the profile | | + lazy hydration | ~7.7–8 s | parse microbench 3–3.8× (`typescript` 36.1→9.5 ms) | | + indexed mirror | **~5.5–6.7 s** | sys 0.4 s; no whole-body scan | Cold resolves keep the same hydration savings inside the parse tasks; cold write volume and syscall pattern are unchanged. ## Evaluated and deliberately not included Consolidating the install-phase thread pools (tokio + global rayon + the dedicated CAS-write pool + capped blocking threads show ~100k involuntary context switches on cold installs vs ~750 for the pnpm CLI). After the resolution fixes, repeated container A/Bs show no measurable wall-clock cost from the churn — it hides entirely behind network time — and history warns that speculative concurrency reshuffles here regress badly (see the #11903 prefetch revert). Deferred until a benchmark shows it on the critical path. ## Tests New `package_versions` unit tests (hydrate-on-demand + Arc-identity caching, undecodable-fragment-as-absent, verbatim raw round-trip incl. unknown keys, eager construction, hydration-free filtering) and rewritten `mirror` tests (headers/index/fragment round-trip, span hydration, truncation → cache miss, old-format → cache miss, atomic overwrite). Full suites green: `pacquet-registry` (23), `pacquet-resolving-npm-resolver` (235), `pacquet-package-manager` + `pacquet-cli` (768); workspace clippy `-D warnings` (pedantic set), dylint, fmt.
Summary
Closes #11900.
Detaches packument fetches from the deps-resolver's cooperative
try_join_alltask by firingtokio::spawnedresolver.resolve()calls just before each level of the tree walk descends. Two prefetch points, both modeled on pnpm'sresolveDependenciesBFS-batched fan-out:extend_tree. Direct deps from the importer (and each hoist iteration's freshly-picked peers) start fetching before the cooperativetry_join_allwalks them.resolve_node. As soon as a parent's manifest is known, packument fetches for every child name fire on background tasks so the recursiveresolve_nodecalls belowawaita cache hit instead of a blocking fetch.The npm resolver's per-
(registry, name)packument fetch semaphore (shared_packument_fetch_locker) coalesces the prefetch with the later realresolve()call, so the recursion hits the in-memory packument cache and the per-version manifest cache instead of paying for a fresh network round-trip plus a fresh JSON parse + serde re-roundtrip per call site.Detached spawns matter because the futures inside
try_join_allcooperate on the current task: a CPU-heavy step in one resolve (JSON parse, semver pick, manifest serde) blocks the others until it yields. The prefetch tasks run on the multi-thread executor, so I/O and any duplicate-work-after-cache-hit can land on whichever worker is free.Plumbing changes
To make
tokio::spawnpossible the resolver chain is now threaded asArc<dyn Resolver>(instead of&Chaingeneric) throughresolve_dependency_tree/resolve_importer/extend_tree, and stored onTreeCtxalongside anArc<ResolveOptions>so the detached tasks bump refcounts instead of cloningPreferredVersionsandPackageVersionPolicyper spawn.Errors from the prefetch are silently dropped — the real
resolve()later inresolve_nodesurfaces the same error through the normal error path. The result is otherwise identical to the existing tree walk: every assertion in the 50 deps-resolver tests + 299 package-manager tests + 96 cli tests still passes.Test plan
cargo nextest run -p pacquet-resolving-deps-resolver— 50/50 passcargo nextest run -p pacquet-package-manager— 299/299 pass (3 slow, no new slowness)cargo nextest run -p pacquet-cli— 96/96 pass (2 slow, expected)cargo clippy --locked --workspace --all-targets -- --deny warnings— cleancargo fmt --check— cleanclean astrobenchmark to confirm the issue's expected impact ratio. The change matches the upstream prefetch shape the issue calls for; a benchmark run on the same fixture is the next step.Written by an agent (Claude Code, claude-opus-4-7).
Summary by CodeRabbit