perf: optimize how the integrities of files in the CAFS are stored #10504
Pull request overview
This PR updates pnpm’s CAFS package index format to avoid storing full SRI strings per file by storing the hash algorithm once per index and storing per-file hex digests instead of <algo>-<base64> integrities.
Changes:
- Replace per-file `integrity` strings with per-file `digest` (hex) and add an `algo` field to the package files index.
- Update CAFS path lookups, integrity verification, and tooling (`store status`, `find-hash`, prune) to work with hex digests.
- Update affected tests and dependent packages to the new index format.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| worker/src/start.ts | Writes new index format (algo + hex digests) from worker when adding tarballs/dirs. |
| store/plugin-commands-store/src/storeStatus/index.ts | Reconstructs SRI strings for dint checks using algo + stored hex digest. |
| store/plugin-commands-store-inspecting/src/findHash.ts | Updates hash search to compare against stored hex digests. |
| store/package-store/src/storeController/prune.ts | Tracks removed CAFS file hashes as hex digests (matching new stored format). |
| store/cafs/test/index.ts | Adjusts CAFS tests to use hex digests for file path resolution. |
| store/cafs/src/writeBufferToCafs.ts | Updates integrity verification to use hex digest + algorithm. |
| store/cafs/src/readManifestFromStore.ts | Reads manifests via CAFS file paths derived from stored hex digests. |
| store/cafs/src/index.ts | Updates CAFS public API type for getFilePathByModeInCafs (digest-based). |
| store/cafs/src/getFilePathInCafs.ts | Removes integrity parsing for file paths; uses hex directly. |
| store/cafs/src/checkPkgFilesIntegrity.ts | Verifies files via ssri.fromHex(digest, algo) and updates index schema. |
| store/cafs-types/src/index.ts | Renames PackageFileInfo.integrity -> digest and updates CAFS types. |
| reviewing/license-scanner/src/getPkgInfo.ts | Reads license/manifest files using digest-based CAFS paths. |
| pkg-manager/core/test/install/sideEffects.ts | Updates side-effects tests to compare digests and resolve CAFS paths via digest. |
| pkg-manager/core/test/install/patch.ts | Updates patch tests to compare digests instead of integrity strings. |
| modules-mounter/daemon/src/createFuseHandlers.ts | Reads file contents from CAFS via digest-based paths. |
| exec/plugin-commands-rebuild/test/index.ts | Updates rebuild tests’ side-effects entries to use digest. |
| .changeset/bright-digests-store.md | Declares major bumps and documents the new index format. |
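The table notes that `store status` reconstructs SRI strings from `algo` plus the stored hex digest, and that integrity checks go through `ssri.fromHex(digest, algo)`. As a rough illustration of what that reconstruction involves, here is a minimal sketch with no dependencies. The helper names (`hexToBytes`, `bytesToBase64`, `toSri`) are hypothetical and are not taken from the PR; the real code relies on `ssri` rather than a hand-rolled encoder.

```typescript
// Hypothetical helpers (not pnpm's code): rebuild "<algo>-<base64>" from
// the index-level algorithm and a per-file hex digest.
const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

function hexToBytes (hex: string): number[] {
  if (hex.length % 2 !== 0 || !/^[0-9a-f]*$/i.test(hex)) {
    throw new Error(`Invalid hex digest: ${hex}`)
  }
  const bytes: number[] = []
  for (let i = 0; i < hex.length; i += 2) {
    bytes.push(parseInt(hex.slice(i, i + 2), 16))
  }
  return bytes
}

function bytesToBase64 (bytes: number[]): string {
  let out = ''
  // Encode three bytes at a time into four base64 characters, padding with "=".
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i]
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0
    out += B64[b0 >> 2]
    out += B64[((b0 & 0x03) << 4) | (b1 >> 4)]
    out += i + 1 < bytes.length ? B64[((b1 & 0x0f) << 2) | (b2 >> 6)] : '='
    out += i + 2 < bytes.length ? B64[b2 & 0x3f] : '='
  }
  return out
}

function toSri (algo: string, hexDigest: string): string {
  return `${algo}-${bytesToBase64(hexToBytes(hexDigest))}`
}
```

Because the algorithm lives once at the top of the index, a caller would pass `index.algo` together with each file's `digest` only when an SRI string is actually needed.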
Comments suppressed due to low confidence (1)
worker/src/start.ts:274
- In the side-effects update path, `existingFilesIndex` is read from disk and then written back via `writeIndexFile()` without ensuring it has the new required `algo` field. If the existing index file predates this change (or is otherwise missing `algo`), the rewritten file will still be missing it and later readers (e.g. integrity checks) may crash. Consider setting `existingFilesIndex.algo ??= algo` (or defaulting to `'sha512'`) before writing.
const { algo, filesIntegrity, filesMap } = processFilesIndex(filesIndex)
let requiresBuild: boolean
if (sideEffectsCacheKey) {
let existingFilesIndex!: PackageFilesIndex
try {
Pull request overview
Copilot reviewed 20 out of 21 changed files in this pull request and generated 6 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)
worker/src/start.ts:277
- When sideEffectsCacheKey is set, existingFilesIndex may come from an older store format (per-file "integrity" and no top-level "algo"). This path reads and re-writes that index without migrating it, which can leave the index in a legacy/mixed shape and break later readers expecting digest/algo. Consider normalizing the loaded index before further use (populate algo and convert file entries to { digest }).
let existingFilesIndex!: PackageFilesIndex
try {
existingFilesIndex = readMsgpackFileSync<PackageFilesIndex>(filesIndexFile)
} catch {
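The normalization suggested above (populate `algo` and convert file entries to `{ digest }`) could look roughly like the sketch below. The type names mirror the ones in the PR, but their shapes here are simplified guesses, and `normalizeIndex` and `base64ToHex` are hypothetical helpers. It also assumes each legacy `integrity` value holds a single `<algo>-<base64>` hash.

```typescript
// Simplified, assumed shapes; the real types live in store/cafs-types.
interface FileInfo { digest: string, mode: number, size: number, checkedAt?: number }
interface LegacyFileInfo { integrity: string, mode: number, size: number, checkedAt?: number }
interface PackageFilesIndex { algo: string, files: Record<string, FileInfo> }
interface LoadedIndex { algo?: string, files: Record<string, FileInfo | LegacyFileInfo> }

const B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Decode standard base64 into a lowercase hex string.
function base64ToHex (b64: string): string {
  const clean = b64.replace(/=+$/, '')
  let bits = 0
  let bitCount = 0
  let hex = ''
  for (let i = 0; i < clean.length; i++) {
    const value = B64_ALPHABET.indexOf(clean[i])
    if (value < 0) throw new Error(`Invalid base64 character: ${clean[i]}`)
    bits = (bits << 6) | value
    bitCount += 6
    if (bitCount >= 8) {
      bitCount -= 8
      const byte = (bits >> bitCount) & 0xff
      hex += (byte < 16 ? '0' : '') + byte.toString(16)
      bits &= (1 << bitCount) - 1 // keep only the bits not yet emitted
    }
  }
  return hex
}

// Hypothetical migration step: returns an index in the new shape without
// mutating the loaded one.
function normalizeIndex (loaded: LoadedIndex, defaultAlgo = 'sha512'): PackageFilesIndex {
  let algo = loaded.algo
  const files: Record<string, FileInfo> = {}
  for (const name of Object.keys(loaded.files)) {
    const entry = loaded.files[name]
    if ('digest' in entry) {
      files[name] = entry // already in the new format
      continue
    }
    const dash = entry.integrity.indexOf('-')
    if (dash < 0) throw new Error(`Malformed integrity for ${name}`)
    const entryAlgo = entry.integrity.slice(0, dash)
    if (algo === undefined) algo = entryAlgo
    if (entryAlgo !== algo) throw new Error(`Mixed algorithms in index: ${algo} and ${entryAlgo}`)
    const { integrity, ...rest } = entry
    files[name] = { ...rest, digest: base64ToHex(integrity.slice(dash + 1)) }
  }
  return { algo: algo ?? defaultAlgo, files }
}
```

Running the loaded index through such a step before the side-effects update would mean the rewritten file is never left in a mixed shape. Whether to migrate in place or treat a legacy index as invalid and regenerate it is a design choice this sketch does not settle.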
Pull request overview
Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
    // Ensure side effects use the same algorithm as the original package
    if (existingFilesIndex.algo !== HASH_ALGORITHM) {
      throw new PnpmError(
        'ALGO_MISMATCH',
        `Algorithm mismatch: package index uses "${existingFilesIndex.algo}" but side effects were computed with "${HASH_ALGORITHM}"`
      )
    }
The side-effects algorithm check throws ALGO_MISMATCH even when existingFilesIndex.algo is missing/undefined (e.g. corrupted index file), producing a confusing message (uses "undefined"). Consider handling the missing-algo case separately (e.g. a dedicated error code like MISSING_ALGO or treating it as an invalid index file that should be regenerated).
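One way to separate the two cases is sketched below. `IndexError` is a local stand-in for pnpm's error class, and `assertSameAlgo` is a hypothetical helper; the `MISSING_ALGO` code is the name floated in the comment above, not an existing pnpm error code.

```typescript
// Local stand-in for an error type that carries a code (assumption, not pnpm's PnpmError).
class IndexError extends Error {
  code: string
  constructor (code: string, message: string) {
    super(message)
    this.code = code
  }
}

// Hypothetical check: report a missing algo separately from a real mismatch.
function assertSameAlgo (existingIndex: { algo?: string }, expectedAlgo: string): void {
  if (existingIndex.algo === undefined || existingIndex.algo === '') {
    throw new IndexError(
      'MISSING_ALGO',
      'Package index has no "algo" field; the index file may be corrupted or from an older store format'
    )
  }
  if (existingIndex.algo !== expectedAlgo) {
    throw new IndexError(
      'ALGO_MISMATCH',
      `Algorithm mismatch: package index uses "${existingIndex.algo}" but side effects were computed with "${expectedAlgo}"`
    )
  }
}
```

A caller that prefers regeneration over failure could catch the `MISSING_ALGO` case and rebuild the index instead of surfacing the error.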
Pull request overview
Copilot reviewed 31 out of 32 changed files in this pull request and generated 6 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
We currently store the base64 integrity checksum of every file of the package in the index file. We also store the algorithm in each entry, even though it is the same for every file.
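In this PR I make two optimizations:
1. The algorithm is stored once per index file (`algo`) instead of in every file entry.
2. Each file's checksum is stored as a hex `digest` instead of a base64 integrity string, so it maps directly to the file's location in the store.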
before:

    {
      "files": {
        "index.js": {
          "integrity": "sha512-V9oc/lDp3+F2qOJ+Xp2kZu2Tzl8asNXpQVy+QZV9Mw11+YGad0XBuqwEsPsLMZt40jOibpiTtR/rjNU4C3wEtw==",
          "checkedAt": 1769155915890,
          "mode": 420,
          "size": 3214
        }
      }
    }

after:

    {
      "algo": "sha512",
      "files": {
        "index.js": {
          "digest": "f310afae50bb5b74e5c17c5eb6fe426538b9deccd88664fbb66a5717fb6d36d86d4d1f530bb63b58914f9894e81da490e2e39bb99c8e01174e258358b9349b5c",
          // The file's location in the store is:
          // f3/10afae50bb5b74e5c17c5eb6fe426538b9deccd88664fbb66a5717fb6d36d86d4d1f530bb63b58914f9894e81da490e2e39bb99c8e01174e258358b9349b5c
          "checkedAt": 1769155915890,
          "mode": 420,
          "size": 3214
        }
      }
    }

I am not sure there is a difference in performance.
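The comment in the "after" example shows how a hex digest maps onto a store location: the first two characters become the directory and the rest becomes the file name. A minimal sketch of that mapping follows, assuming nothing beyond what the example shows. `cafsRelativePath` is a hypothetical name, and the real `getFilePathByModeInCafs` also takes the file mode into account, which is left out here.

```typescript
// Hypothetical helper: split a hex digest into "<first 2 chars>/<rest>",
// as in the comment of the "after" example. Mode handling is omitted.
function cafsRelativePath (hexDigest: string): string {
  if (hexDigest.length < 3 || !/^[0-9a-f]+$/i.test(hexDigest)) {
    throw new Error(`Invalid hex digest: ${hexDigest}`)
  }
  return `${hexDigest.slice(0, 2)}/${hexDigest.slice(2)}`
}
```

With the digest already stored as hex, a lookup needs only this string split; with the old format the integrity string had to be parsed and converted from base64 first.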