fix: reduce pack verification memory use on Node#2299
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughReplaces direct full-buffer SHA-1 hashing with a range-based, chunked hashing utility and wires a Node-specific implementation into the Node Rollup build via an alias; updates packfile integrity check to use the chunked hash over the payload range. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
rollup.config.js (1)
50-50: Consider applying the alias toecmaConfigfor Node ES module users.The alias is only applied to
nodeConfig(CJS). Users consuming the ES module build (index.js) on Node.js won't benefit from the chunked hashing—they'll get the generic implementation that doesn't chunk.If Node ES module usage is common, consider also adding
plugins: [alias(nodeAliases)]toecmaConfig, or documenting that the memory optimization only applies to the CJS entry point.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rollup.config.js` at line 50, The rollup config currently applies alias(nodeAliases) only to nodeConfig, so the ecmaConfig build doesn't get the alias and ES module consumers won't get the chunked hashing optimization; update ecmaConfig to include plugins: [alias(nodeAliases)] (same as nodeConfig) so aliasing is applied to the ES module entry, and verify the alias import behavior for ecmaConfig; alternatively, if you prefer not to change builds, add a note in the documentation clarifying that the memory/chunking optimization only applies to nodeConfig (CJS).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@rollup.config.js`:
- Line 50: The rollup config currently applies alias(nodeAliases) only to
nodeConfig, so the ecmaConfig build doesn't get the alias and ES module
consumers won't get the chunked hashing optimization; update ecmaConfig to
include plugins: [alias(nodeAliases)] (same as nodeConfig) so aliasing is
applied to the ES module entry, and verify the alias import behavior for
ecmaConfig; alternatively, if you prefer not to change builds, add a note in the
documentation clarifying that the memory/chunking optimization only applies to
nodeConfig (CJS).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ebf0f4e8-3a6e-4ad0-b4c4-509c957c6c25
📒 Files selected for processing (4)
rollup.config.jssrc/storage/readObjectPacked.jssrc/utils/shasumRange.jssrc/utils/shasumRange.node.js
|
Can you explain exactly what is causing the memory increase and how this solves the issue? |
|
I'm not sure what happened, but the tests failed. |
The pack verification code does this: const payload = pack.subarray(0, -20)
await shasum(payload)Which keeps the whole pack in memory and hashes it. On large packs that is super expensive . This PR changes the hashing for node builds where we can use node's Behavior stays the same but is much more performant. |
Tbh not sure either, 401 in remotes looks unrelated to my changes and maybe an issue with the creds? |
8e30814 to
e805ed9
Compare
|
🎉 This PR is included in version 1.37.5 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Fixes #2298 by using chunked hashing in node envs.
This would be one idea on how to fix the memory regression for our usage, another idea would be to add a flag in methods like
statusMatrixto just 'trust' the repo and then not go down the pack integrity validation code path.The regression comes from the verification path added in
readObjectPacked():That means we load the pack into memory and then hash the entire thing. On very large packs this is much more expensive than it was before 1.37.2, both in runtime and peak RSS.
This PR keeps the same verification behavior, but changes the hashing implementation used for the pack check in Node builds: Node builds use crypto.createHash('sha1') and the payload is fed incrementally in 8 MiB chunks.
What do you think @jcubic ?
Summary by CodeRabbit
Performance Improvements
New Features