perf(tarball): run parallel CAS writes on a dedicated rayon pool #12248
The per-file CAS-write parallelism added in #12247 ran on rayon's global pool. But the install pipeline overlaps tarball extraction with linking each resolved package into `node_modules`, and the linker drives its per-package work through `rayon::join` / `par_iter` on that same global pool. When a batch of downloads finished at once (hundreds of tarballs entering extraction together), the extraction work queued ahead of the linker's jobs and stalled linking for seconds.

Aligning the download/extract trace with the `imported` progress events on a ~1300-package fresh install showed the linker dropping to zero completions for ~1s right as an extraction surge landed, then grinding the rest out afterward. Extraction had gotten faster, but it stuttered the concurrent linker, so the net win on the pipeline was lost.

Route the parallel CAS writes through a dedicated rayon pool (sized to the core count; the work is CPU-bound SHA-512 + CAFS write) so an extraction burst can't monopolize the global pool the linker uses. The two phases now run concurrently without one starving the other: on the same fixture the linker no longer stalls (continuous completions through the extraction window) and the big-package extraction tail stays parallelized. Falls back to the global pool if the dedicated pool can't be built.

Written by an agent (Claude Code, claude-opus-4-8).
Walkthrough
This PR adds a dedicated Rayon thread pool for CAS-write operations during tarball extraction. When the pending file count exceeds a threshold, writes execute on the dedicated pool; otherwise, small batches run serially. Pool creation failures gracefully fall back to existing behavior.
Changes
Dedicated CAS-write thread pool for tarball extraction
Review Summary by Qodo
Isolate tarball CAS writes to dedicated rayon pool
Description
• Isolate CAS-write parallelism to dedicated rayon pool
• Prevents extraction bursts from starving linker on global pool
• Improves concurrent pipeline performance during install
• Falls back gracefully if dedicated pool creation fails
Diagram
flowchart LR
A["Tarball Extraction"] -->|CAS writes| B["Dedicated Pool"]
B -->|CPU-bound work| C["SHA-512 + CAFS write"]
D["Package Linker"] -->|per-package work| E["Global Pool"]
C -.->|no contention| E
File Changes
1. pacquet/crates/tarball/src/lib.rs
Integrated-Benchmark Report (Linux)
Each scenario has pacquet rows (direct install) and pnpr rows (the same client through the pnpr install accelerator), so pnpr@HEAD vs pacquet@HEAD is the pnpr-vs-direct ratio. Cold-store scenarios wipe the client store between runs (warm server); hot-store scenarios keep it warm. The pacquet@HEAD rows feed the pacquet Bencher testbed; the pnpr@HEAD rows feed the pnpr testbed.

Scenario: Isolated linker: fresh restore, cold cache + cold store
BENCHMARK_REPORT.json
{
"results": [
{
"command": "pacquet@HEAD",
"mean": 10.010998551799998,
"stddev": 0.1481741587551117,
"median": 9.943700258499998,
"user": 3.6415678,
"system": 4.2730365799999985,
"min": 9.9156281455,
"max": 10.3964415155,
"times": [
10.1200656095,
9.941981845499999,
10.0172028455,
9.945418671499999,
9.940264384499999,
9.9334557685,
9.9351929365,
9.9643337955,
10.3964415155,
9.9156281455
]
},
{
"command": "pacquet@main",
"mean": 10.116023478499999,
"stddev": 0.16565828733731636,
"median": 10.0548988205,
"user": 3.6641863999999997,
"system": 4.31429418,
"min": 9.9790666715,
"max": 10.524537707499999,
"times": [
9.9790666715,
10.524537707499999,
10.035888182499999,
10.0326563465,
10.2610263905,
10.0018176475,
10.0739094585,
10.153635615499999,
10.0197867245,
10.077910040499999
]
},
{
"command": "pnpr@HEAD",
"mean": 5.079941864100001,
"stddev": 0.045950783414495176,
"median": 5.070211239500001,
"user": 2.8467627,
"system": 3.8858840800000003,
"min": 5.0307381035,
"max": 5.1800158675,
"times": [
5.0382895925,
5.0558838255000005,
5.1375636345,
5.1800158675,
5.072305210500001,
5.0681172685,
5.0307381035,
5.0824884815,
5.0529671855,
5.0810494715
]
},
{
"command": "pnpr@main",
"mean": 5.174951543400001,
"stddev": 0.08494575066894672,
"median": 5.146631060500001,
"user": 2.8768315,
"system": 3.93716988,
"min": 5.1066102685,
"max": 5.3719003455,
"times": [
5.1538591105,
5.1439517835,
5.2765226345,
5.184206731500001,
5.1345223155,
5.1066102685,
5.1181304115,
5.1493103375,
5.3719003455,
5.1105014955
]
}
]
}

Scenario: Isolated linker: fresh restore, hot cache + hot store
BENCHMARK_REPORT.json
{
"results": [
{
"command": "pacquet@HEAD",
"mean": 0.6751936207800001,
"stddev": 0.030065317416254456,
"median": 0.6674702190800001,
"user": 0.36508996,
"system": 1.32183664,
"min": 0.6577981955800001,
"max": 0.7591978075800001,
"times": [
0.7591978075800001,
0.67317357058,
0.6689526105800001,
0.6657430695800001,
0.67228710558,
0.6577981955800001,
0.6588049625800001,
0.6589178185800001,
0.6659878275800001,
0.6710732395800001
]
},
{
"command": "pacquet@main",
"mean": 0.69696154008,
"stddev": 0.015316989658648009,
"median": 0.6962639395800001,
"user": 0.37917646,
"system": 1.3360063400000002,
"min": 0.67128961758,
"max": 0.7243295005800001,
"times": [
0.7143976535800001,
0.6782702435800001,
0.69220863458,
0.7243295005800001,
0.70155570158,
0.6961345935800001,
0.69639328558,
0.67128961758,
0.69597967758,
0.69905649258
]
},
{
"command": "pnpr@HEAD",
"mean": 0.81981227978,
"stddev": 0.10392973796117223,
"median": 0.78248091408,
"user": 0.39651006000000005,
"system": 1.33988594,
"min": 0.76692927958,
"max": 1.10929334158,
"times": [
0.8400093655800001,
0.7835602325800001,
0.77727588158,
0.76692927958,
1.10929334158,
0.8075006425800001,
0.78202173358,
0.78294009458,
0.77539973758,
0.7731924885800001
]
},
{
"command": "pnpr@main",
"mean": 0.81790815378,
"stddev": 0.08559372757680923,
"median": 0.78442232858,
"user": 0.39047136,
"system": 1.3384131400000001,
"min": 0.76408355258,
"max": 1.0307366245799998,
"times": [
0.90537900658,
1.0307366245799998,
0.7856570755800001,
0.7840149715800001,
0.76408355258,
0.8122181475800001,
0.7848296855800001,
0.7784758795800001,
0.76868660058,
0.76499999358
]
}
]
}

Scenario: Isolated linker: fresh install, cold cache + cold store
BENCHMARK_REPORT.json
{
"results": [
{
"command": "pacquet@HEAD",
"mean": 5.37445151214,
"stddev": 0.049126776742596294,
"median": 5.39200366784,
"user": 3.8321015000000003,
"system": 3.28458396,
"min": 5.28219729634,
"max": 5.43675273634,
"times": [
5.28219729634,
5.41231844934,
5.40118171934,
5.43675273634,
5.3236248353399995,
5.32531298134,
5.36643891734,
5.41268085034,
5.39650274334,
5.38750459234
]
},
{
"command": "pacquet@main",
"mean": 5.4182296896399995,
"stddev": 0.05360788291160449,
"median": 5.41600819934,
"user": 3.8778683000000003,
"system": 3.34451916,
"min": 5.33400619134,
"max": 5.50172752834,
"times": [
5.33400619134,
5.50070352634,
5.4044781533399995,
5.38365102034,
5.37874210834,
5.43236534534,
5.50172752834,
5.4379621263399995,
5.42753824534,
5.38112265134
]
},
{
"command": "pnpr@HEAD",
"mean": 1.99874442574,
"stddev": 0.02968313808334842,
"median": 2.00023253984,
"user": 2.6341473,
"system": 3.1626518599999995,
"min": 1.96274379534,
"max": 2.0569664303399997,
"times": [
2.01862548234,
2.0101737473399997,
2.01670157134,
1.97441909234,
1.96485635534,
1.9794157733400002,
2.01325067734,
2.0569664303399997,
1.9902913323400002,
1.96274379534
]
},
{
"command": "pnpr@main",
"mean": 2.02294484104,
"stddev": 0.020230621638806925,
"median": 2.02198711334,
"user": 2.6870373,
"system": 3.2713739599999996,
"min": 1.98885453634,
"max": 2.0520475223399997,
"times": [
2.0252330433399997,
2.0196573563399998,
1.98885453634,
2.01234211834,
2.05191929234,
2.0520475223399997,
2.02431687034,
2.0136968503399997,
2.03813797234,
2.0032428483399998
]
}
]
}

Scenario: Isolated linker: fresh install, hot cache + hot store
BENCHMARK_REPORT.json
{
"results": [
{
"command": "pacquet@HEAD",
"mean": 1.4439011771,
"stddev": 0.019959774908014545,
"median": 1.4440132891,
"user": 1.57225428,
"system": 1.7813421400000002,
"min": 1.4149171051,
"max": 1.4884554731000001,
"times": [
1.4268267931,
1.4507420321000002,
1.4511244151,
1.4479200051,
1.4357730391,
1.4301187171,
1.4149171051,
1.4884554731000001,
1.4530276181000001,
1.4401065731
]
},
{
"command": "pacquet@main",
"mean": 1.4725261376000003,
"stddev": 0.021160226831649045,
"median": 1.4678062896,
"user": 1.5804691800000001,
"system": 1.8215798399999996,
"min": 1.4546816851,
"max": 1.5239479721,
"times": [
1.4871367361,
1.4552896921,
1.5239479721,
1.4553874691000002,
1.4818059191000001,
1.4695403191,
1.4616385591,
1.4546816851,
1.4697607641000001,
1.4660722601
]
},
{
"command": "pnpr@HEAD",
"mean": 0.673489305,
"stddev": 0.03178121765781224,
"median": 0.6625335376,
"user": 0.3232257799999999,
"system": 1.26366944,
"min": 0.6547989090999999,
"max": 0.7625212731,
"times": [
0.6693393151,
0.6581831681,
0.6547989090999999,
0.7625212731,
0.6630288351,
0.6617710411,
0.6668957761,
0.6746662661,
0.6616502261,
0.6620382401
]
},
{
"command": "pnpr@main",
"mean": 0.6563374208,
"stddev": 0.007448383359137579,
"median": 0.6571438231,
"user": 0.31936138000000003,
"system": 1.25111124,
"min": 0.6450081851,
"max": 0.6695899541,
"times": [
0.6450081851,
0.6484303901,
0.6576026771,
0.6485384031,
0.6625730081,
0.6695899541,
0.6603231141,
0.6547050231,
0.6566849691,
0.6599184841
]
}
]
}

Scenario: Isolated linker: fresh install, cold cache + hot store
Resolution-only: cold packument cache (full re-resolve over the registry link) with a hot store (no tarball download), so this isolates pnpr offloading the client resolution to its warm server.
BENCHMARK_REPORT.json
{
"results": [
{
"command": "pacquet@HEAD",
"mean": 5.05599541454,
"stddev": 0.04340728444597576,
"median": 5.04051559344,
"user": 1.7090591399999997,
"system": 1.95851306,
"min": 5.02475625144,
"max": 5.17455418644,
"times": [
5.062395529440001,
5.02475625144,
5.03651120244,
5.06039393044,
5.17455418644,
5.04451998444,
5.03254392844,
5.03646766444,
5.05198066344,
5.035830804440001
]
},
{
"command": "pacquet@main",
"mean": 5.01019596164,
"stddev": 0.02089344345133252,
"median": 5.01239145744,
"user": 1.6955970399999998,
"system": 1.9470569599999998,
"min": 4.97931634744,
"max": 5.04773231144,
"times": [
5.04773231144,
5.01281057944,
4.99166816444,
5.01655851044,
5.03446882244,
5.01880644344,
5.01197233544,
4.97931634744,
4.99278060444,
4.99584549744
]
},
{
"command": "pnpr@HEAD",
"mean": 0.6478643085400001,
"stddev": 0.007928669086402353,
"median": 0.64523806044,
"user": 0.32489914000000003,
"system": 1.2348576599999999,
"min": 0.64076980944,
"max": 0.66348540744,
"times": [
0.66348540744,
0.64550863844,
0.64591157444,
0.64496748244,
0.64344670244,
0.64433552544,
0.64245501844,
0.64076980944,
0.64613455244,
0.66162837444
]
},
{
"command": "pnpr@main",
"mean": 0.65244192644,
"stddev": 0.011565644880580555,
"median": 0.65045053444,
"user": 0.31418273999999996,
"system": 1.2571509599999997,
"min": 0.6351119494399999,
"max": 0.67584896844,
"times": [
0.6444408914399999,
0.64635296044,
0.6351119494399999,
0.64554171244,
0.65177428644,
0.65045496744,
0.67584896844,
0.66412249644,
0.65044610144,
0.66032493044
]
}
]
}
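As a reading aid for the report above, the following minimal sketch compares two of the reported means. The constants are copied from the "fresh restore, cold cache + cold store" scenario; the function names and the "within one standard deviation" check are assumptions made for illustration, not part of the benchmark tooling.

```rust
/// Means and standard deviation copied from the
/// "fresh restore, cold cache + cold store" scenario above.
const PACQUET_HEAD_MEAN: f64 = 10.010998551799998;
const PACQUET_MAIN_MEAN: f64 = 10.116023478499999;
const PACQUET_MAIN_STDDEV: f64 = 0.16565828733731636;

/// Relative change of `head` versus `base`; a negative value means
/// `head` finished faster than `base`.
fn relative_change(head: f64, base: f64) -> f64 {
    (head - base) / base
}

/// Crude noise check (an assumption of this sketch, not a statistical test):
/// true when the gap between the means is smaller than one standard
/// deviation of the baseline runs.
fn within_noise(head: f64, base: f64, base_stddev: f64) -> bool {
    (head - base).abs() < base_stddev
}
```

Calling `relative_change(PACQUET_HEAD_MEAN, PACQUET_MAIN_MEAN)` gives roughly -0.0104, about a 1% lower mean on HEAD, and `within_noise` returns `true` for this scenario, so a wall-clock difference of this size is hard to separate from run-to-run variation.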
Follow-up to #12247.
Problem
The per-file CAS-write parallelism added in #12247 ran on rayon's global pool. But the install pipeline overlaps tarball extraction with linking each resolved package into `node_modules`, and the linker drives its per-package work through `rayon::join` / `par_iter` on that same global pool. When a batch of downloads finishes at once (hundreds of tarballs entering extraction together), the extraction work queues ahead of the linker's jobs and stalls linking.

Aligning the download/extract trace with the `imported` progress events on a ~1300-package fresh install, the linker dropped to zero completions for ~1s right as an extraction surge landed, then ground out the rest afterward.

Extraction had gotten faster, but it stuttered the concurrent linker, so the pipeline win was lost.
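One way to spot a stall like the one described above is to bucket completion timestamps per second and look for empty buckets. The sketch below assumes the `imported` events have already been reduced to millisecond offsets from install start; the function names and that input shape are hypothetical, and this is not the tooling used for the measurement in this PR.

```rust
/// Count completion events per one-second bucket.
/// `event_ms` holds event timestamps in milliseconds since install start;
/// events at or beyond `total_secs` seconds are ignored.
fn completions_per_second(event_ms: &[u64], total_secs: usize) -> Vec<usize> {
    let mut buckets = vec![0usize; total_secs];
    for &t in event_ms {
        let sec = (t / 1000) as usize;
        if sec < total_secs {
            buckets[sec] += 1;
        }
    }
    buckets
}

/// Indices of the seconds in which nothing completed (candidate stalls).
fn stalled_seconds(buckets: &[usize]) -> Vec<usize> {
    buckets
        .iter()
        .enumerate()
        .filter_map(|(sec, &count)| (count == 0).then_some(sec))
        .collect()
}

/// Running total of completions at the end of each second.
fn cumulative(buckets: &[usize]) -> Vec<usize> {
    buckets
        .iter()
        .scan(0usize, |total, &count| {
            *total += count;
            Some(*total)
        })
        .collect()
}
```

Comparing the `cumulative` series of two runs at the same second gives the kind of "links completed by second N" comparison used to judge whether the linker kept making progress during an extraction burst.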
Fix
Route the parallel CAS writes through a dedicated rayon pool (sized to the core count; the work is CPU-bound SHA-512 + CAFS write) so an extraction burst can't monopolize the global pool the linker uses. The two phases now run concurrently without starving each other: on the same fixture the linker no longer stalls (continuous completions through the extraction window; cumulative links by second 6: 592 vs 537), the big-package extraction tail stays parallelized, and the total time nudged from 17s to 16s. Falls back to the global pool if the dedicated pool can't be built.
(Note: absolute time on macOS is dominated by an APFS metadata-op floor in the link phase that's identical with and without this change; the benefit of removing the stall shows most where linking parallelizes, i.e. Linux.)
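The shape of the fix can be illustrated with a dependency-free sketch. It deliberately swaps rayon for `std::thread::scope`: a batch gets its own worker threads, sized to the core count, instead of borrowing threads from a shared pool, and small batches stay serial. `PARALLEL_THRESHOLD`, `write_one`, `worker_count`, and `write_batch` are hypothetical names, the checksum is a stand-in for the SHA-512 + CAFS write, and the fallback to the global pool is not modeled here.

```rust
use std::thread;

/// Hypothetical cutoff: batches smaller than this are written serially.
const PARALLEL_THRESHOLD: usize = 4;

/// Stand-in for hashing one file and writing it to the store.
/// The real work is SHA-512 plus a CAFS write; this toy checksum only
/// gives each worker something CPU-bound to do.
fn write_one(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(u64::from(b)))
}

/// Number of dedicated workers: the core count, or 1 if it cannot be queried.
fn worker_count() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Process a batch on its own scoped threads so the work never occupies
/// a shared pool. Results come back in input order.
fn write_batch(files: &[Vec<u8>], workers: usize) -> Vec<u64> {
    // Small batches (or a single worker) are not worth the spawn cost.
    if files.len() < PARALLEL_THRESHOLD || workers <= 1 {
        return files.iter().map(|f| write_one(f)).collect();
    }
    // Ceiling division so every file lands in exactly one chunk.
    let chunk_len = (files.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn all workers first, then join, so the chunks run concurrently.
        let handles: Vec<_> = files
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|f| write_one(f)).collect::<Vec<u64>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

A caller would use something like `write_batch(&files, worker_count())`. With rayon, the equivalent idea is to build one long-lived pool with `rayon::ThreadPoolBuilder` and run the parallel iterator inside `pool.install(...)`, which avoids spawning threads per batch.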
Tests
All 54 `pacquet-tarball` tests pass; clippy clean. Output is unchanged; this only changes which thread pool the writes run on.

Written by an agent (Claude Code, claude-opus-4-8).