perf: optimize how the integrities of files in the CAFS are stored #10504

Merged
zkochan merged 15 commits into main from integrity-change on Jan 24, 2026

@zkochan zkochan commented Jan 23, 2026

We currently store the base64 integrity checksum of every file of the package in the index file. We also store the algo in each entry, even though the algo is the same for every file.

In this PR I do 2 optimizations:

  1. the algo is only stored once per index file
  2. the integrity checksum is stored in hex format, which makes the index file larger but removes the need to convert the hash in most cases. The file paths in the store are also in hex format, so base64 was only needed for verifying the content of a file. In most cases that verification is skipped, because the file attributes show that the file has not been modified since creation.

before:

{
  "files": {
    "index.js": {
      "integrity": "sha512-V9oc/lDp3+F2qOJ+Xp2kZu2Tzl8asNXpQVy+QZV9Mw11+YGad0XBuqwEsPsLMZt40jOibpiTtR/rjNU4C3wEtw==",
      "checkedAt": 1769155915890,
      "mode": 420,
      "size": 3214
    }
  }
}

after:

{
  "algo": "sha512",
  "files": {
    "index.js": {
      "digest": "f310afae50bb5b74e5c17c5eb6fe426538b9deccd88664fbb66a5717fb6d36d86d4d1f530bb63b58914f9894e81da490e2e39bb99c8e01174e258358b9349b5c",
      // The file's location in the store is:
      // f3/10afae50bb5b74e5c17c5eb6fe426538b9deccd88664fbb66a5717fb6d36d86d4d1f530bb63b58914f9894e81da490e2e39bb99c8e01174e258358b9349b5c
      "checkedAt": 1769155915890,
      "mode": 420,
      "size": 3214
    }
  }
}
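
The relationship between the two formats can be sketched as follows. This is a minimal illustration rather than the code from this PR: `base64ToHex`, `sriToDigest`, and `storePathFromDigest` are hypothetical helpers, and the path layout (first two hex characters as a directory) is taken only from the comment in the example above.

```typescript
// Hypothetical helpers for illustration; not pnpm's actual API.
const B64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Decode standard base64 and re-encode the bytes as lowercase hex.
function base64ToHex(b64: string): string {
  let bits = 0 // bits read so far that have not been emitted as a byte
  let bitCount = 0
  let hex = ''
  for (let i = 0; i < b64.length; i++) {
    const ch = b64.charAt(i)
    if (ch === '=') break // padding marks the end of the data
    const value = B64_ALPHABET.indexOf(ch)
    if (value === -1) throw new Error(`Invalid base64 character: ${ch}`)
    bits = (bits << 6) | value
    bitCount += 6
    if (bitCount >= 8) {
      bitCount -= 8
      const byte = (bits >> bitCount) & 0xff
      hex += (byte < 16 ? '0' : '') + byte.toString(16)
      bits &= (1 << bitCount) - 1 // keep only the leftover low bits
    }
  }
  return hex
}

// Split an old-style "<algo>-<base64>" integrity into algo + hex digest.
function sriToDigest(integrity: string): { algo: string; digest: string } {
  const sep = integrity.indexOf('-')
  if (sep === -1) throw new Error(`Not an SRI string: ${integrity}`)
  return {
    algo: integrity.slice(0, sep),
    digest: base64ToHex(integrity.slice(sep + 1)),
  }
}

// Assumed layout, based on the comment in the "after" example:
// the first two hex characters form the directory name.
function storePathFromDigest(digest: string): string {
  return `${digest.slice(0, 2)}/${digest.slice(2)}`
}
```

With the new format only `storePathFromDigest` would be needed on the common path, since the digest is already stored as hex.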

I am not sure there is a difference in performance.

Before:

Benchmark 1: node ~/src/pnpm/pnpm2/pnpm/dist/pnpm.mjs i --ignore-scripts --offline
  Time (mean ± σ):     72.017 s ±  1.790 s    [User: 8.415 s, System: 41.856 s]
  Range (min … max):   69.791 s … 74.961 s    10 runs

After:

Benchmark 1: node ~/src/pnpm/pnpm3/pnpm/dist/pnpm.mjs i --ignore-scripts --offline
  Time (mean ± σ):     69.996 s ±  2.690 s    [User: 8.266 s, System: 41.843 s]
  Range (min … max):   65.967 s … 74.928 s    10 runs
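
One rough way to read these two results is a Welch t statistic on the reported means. The sketch below assumes the σ values are sample standard deviations over the 10 runs and that the runs are independent; `welchT` and `BenchmarkSummary` are hypothetical names introduced here.

```typescript
// Summary of one benchmark result, as printed above.
interface BenchmarkSummary {
  mean: number // seconds
  stddev: number // seconds (assumed to be the sample standard deviation)
  runs: number
}

// Welch's t statistic for the difference between two means.
function welchT(a: BenchmarkSummary, b: BenchmarkSummary): number {
  const varOfMeanA = (a.stddev * a.stddev) / a.runs
  const varOfMeanB = (b.stddev * b.stddev) / b.runs
  return (a.mean - b.mean) / Math.sqrt(varOfMeanA + varOfMeanB)
}

const beforeRun: BenchmarkSummary = { mean: 72.017, stddev: 1.79, runs: 10 }
const afterRun: BenchmarkSummary = { mean: 69.996, stddev: 2.69, runs: 10 }

const tStat = welchT(beforeRun, afterRun)
console.log(tStat.toFixed(2))
```

Under these assumptions the statistic comes out near 1.98, just below the usual two-sided 5% threshold of about 2.12 for roughly 16 degrees of freedom, which is consistent with not being sure that there is a difference.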

@zkochan zkochan marked this pull request as ready for review January 23, 2026 19:42
Copilot AI review requested due to automatic review settings January 23, 2026 19:42

Copilot AI left a comment

Pull request overview

This PR updates pnpm’s CAFS package index format to avoid storing full SRI strings per file by storing the hash algorithm once per index and storing per-file hex digests instead of <algo>-<base64> integrities.

Changes:

  • Replace per-file integrity strings with per-file digest (hex) and add an algo field to the package files index.
  • Update CAFS path lookups, integrity verification, and tooling (store status, find-hash, prune) to work with hex digests.
  • Update affected tests and dependent packages to the new index format.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

File | Description
--- | ---
worker/src/start.ts | Writes new index format (algo + hex digests) from worker when adding tarballs/dirs.
store/plugin-commands-store/src/storeStatus/index.ts | Reconstructs SRI strings for dint checks using algo + stored hex digest.
store/plugin-commands-store-inspecting/src/findHash.ts | Updates hash search to compare against stored hex digests.
store/package-store/src/storeController/prune.ts | Tracks removed CAFS file hashes as hex digests (matching new stored format).
store/cafs/test/index.ts | Adjusts CAFS tests to use hex digests for file path resolution.
store/cafs/src/writeBufferToCafs.ts | Updates integrity verification to use hex digest + algorithm.
store/cafs/src/readManifestFromStore.ts | Reads manifests via CAFS file paths derived from stored hex digests.
store/cafs/src/index.ts | Updates CAFS public API type for getFilePathByModeInCafs (digest-based).
store/cafs/src/getFilePathInCafs.ts | Removes integrity parsing for file paths; uses hex directly.
store/cafs/src/checkPkgFilesIntegrity.ts | Verifies files via ssri.fromHex(digest, algo) and updates index schema.
store/cafs-types/src/index.ts | Renames PackageFileInfo.integrity -> digest and updates CAFS types.
reviewing/license-scanner/src/getPkgInfo.ts | Reads license/manifest files using digest-based CAFS paths.
pkg-manager/core/test/install/sideEffects.ts | Updates side-effects tests to compare digests and resolve CAFS paths via digest.
pkg-manager/core/test/install/patch.ts | Updates patch tests to compare digests instead of integrity strings.
modules-mounter/daemon/src/createFuseHandlers.ts | Reads file contents from CAFS via digest-based paths.
exec/plugin-commands-rebuild/test/index.ts | Updates rebuild tests’ side-effects entries to use digest.
.changeset/bright-digests-store.md | Declares major bumps and documents the new index format.
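
Several of the files above still need an SRI string rebuilt from `algo` plus the stored hex digest. The dependency-free sketch below shows that conversion under stated assumptions: `hexToBase64` and `digestToSri` are hypothetical names, whereas the summary describes the PR itself as using `ssri.fromHex(digest, algo)` for verification.

```typescript
// Hypothetical helpers for illustration; not pnpm's actual API.
const B64_CHARS =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Re-encode a hex digest as padded standard base64.
function hexToBase64(hex: string): string {
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error(`Invalid hex digest: ${hex}`)
  }
  const byteCount = hex.length / 2
  const byteAt = (i: number): number =>
    i < byteCount ? parseInt(hex.slice(i * 2, i * 2 + 2), 16) : 0
  let out = ''
  for (let i = 0; i < byteCount; i += 3) {
    // Pack up to three bytes into 24 bits, then emit four 6-bit symbols.
    const triple = (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2)
    out += B64_CHARS.charAt((triple >> 18) & 63)
    out += B64_CHARS.charAt((triple >> 12) & 63)
    out += i + 1 < byteCount ? B64_CHARS.charAt((triple >> 6) & 63) : '='
    out += i + 2 < byteCount ? B64_CHARS.charAt(triple & 63) : '='
  }
  return out
}

// Rebuild an "<algo>-<base64>" SRI string from the new index fields.
function digestToSri(algo: string, digest: string): string {
  return `${algo}-${hexToBase64(digest)}`
}
```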
Comments suppressed due to low confidence (1)

worker/src/start.ts:274

  • In the side-effects update path, existingFilesIndex is read from disk and then written back via writeIndexFile() without ensuring it has the new required algo field. If the existing index file predates this change (or is otherwise missing algo), the rewritten file will still be missing it and later readers (e.g. integrity checks) may crash. Consider setting existingFilesIndex.algo ??= algo (or defaulting to 'sha512') before writing.
  const { algo, filesIntegrity, filesMap } = processFilesIndex(filesIndex)
  let requiresBuild: boolean
  if (sideEffectsCacheKey) {
    let existingFilesIndex!: PackageFilesIndex
    try {

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 6 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (1)

worker/src/start.ts:277

  • When sideEffectsCacheKey is set, existingFilesIndex may come from an older store format (per-file "integrity" and no top-level "algo"). This path reads and re-writes that index without migrating it, which can leave the index in a legacy/mixed shape and break later readers expecting digest/algo. Consider normalizing the loaded index before further use (populate algo and convert file entries to { digest }).
    let existingFilesIndex!: PackageFilesIndex
    try {
      existingFilesIndex = readMsgpackFileSync<PackageFilesIndex>(filesIndexFile)
    } catch {
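
The normalization suggested in this comment could look roughly like the sketch below. It is an assumption-laden illustration, not what the PR does: `normalizeIndex` and the interfaces are hypothetical, the base64-to-hex converter is passed in by the caller, and the `'sha512'` default is only a guess used when nothing in the index indicates the algorithm.

```typescript
// Hypothetical shapes: a raw index may mix legacy and new-style entries.
interface RawFileEntry {
  integrity?: string // legacy "<algo>-<base64>" value
  digest?: string // new hex value
  checkedAt: number
  mode: number
  size: number
}
interface RawIndex {
  algo?: string
  files: Record<string, RawFileEntry>
}
interface NormalizedIndex {
  algo: string
  files: Record<string, { digest: string; checkedAt: number; mode: number; size: number }>
}

// Convert every entry to { digest } and make sure a top-level algo exists.
function normalizeIndex(
  raw: RawIndex,
  base64ToHex: (b64: string) => string, // supplied by the caller
  defaultAlgo: string = 'sha512' // assumed fallback
): NormalizedIndex {
  let algo = raw.algo
  const files: NormalizedIndex['files'] = {}
  for (const name of Object.keys(raw.files)) {
    const entry = raw.files[name]
    let digest = entry.digest
    if (digest === undefined) {
      if (entry.integrity === undefined) {
        throw new Error(`Entry "${name}" has neither digest nor integrity`)
      }
      const sep = entry.integrity.indexOf('-')
      const entryAlgo = entry.integrity.slice(0, sep)
      if (algo === undefined) algo = entryAlgo
      else if (algo !== entryAlgo) {
        throw new Error(`Entry "${name}" uses "${entryAlgo}", expected "${algo}"`)
      }
      digest = base64ToHex(entry.integrity.slice(sep + 1))
    }
    files[name] = { digest, checkedAt: entry.checkedAt, mode: entry.mode, size: entry.size }
  }
  return { algo: algo ?? defaultAlgo, files }
}
```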

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines +291 to +297
  // Ensure side effects use the same algorithm as the original package
  if (existingFilesIndex.algo !== HASH_ALGORITHM) {
    throw new PnpmError(
      'ALGO_MISMATCH',
      `Algorithm mismatch: package index uses "${existingFilesIndex.algo}" but side effects were computed with "${HASH_ALGORITHM}"`
    )
  }

Copilot AI Jan 24, 2026

The side-effects algorithm check throws ALGO_MISMATCH even when existingFilesIndex.algo is missing/undefined (e.g. corrupted index file), producing a confusing message (uses "undefined"). Consider handling the missing-algo case separately (e.g. a dedicated error code like MISSING_ALGO or treating it as an invalid index file that should be regenerated).
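
A minimal sketch of handling the two cases separately is shown below. It is self-contained for illustration only: `IndexError` stands in for `PnpmError`, `assertSameAlgo` is a hypothetical helper, the `MISSING_ALGO` code is the one proposed in this comment, and the `'sha512'` value of `HASH_ALGORITHM` is an assumption.

```typescript
// Stand-in for PnpmError so the sketch runs on its own (hypothetical).
class IndexError extends Error {
  readonly code: string
  constructor(code: string, message: string) {
    super(message)
    this.code = code
  }
}

// Assumed value; the real constant lives in the pnpm code base.
const HASH_ALGORITHM = 'sha512'

// Report a missing algo separately from a genuine mismatch.
function assertSameAlgo(
  indexAlgo: string | undefined,
  expected: string = HASH_ALGORITHM
): void {
  if (indexAlgo === undefined || indexAlgo === '') {
    throw new IndexError(
      'MISSING_ALGO',
      'Package index has no "algo" field; the index file may be corrupted or in an older format'
    )
  }
  if (indexAlgo !== expected) {
    throw new IndexError(
      'ALGO_MISMATCH',
      `Algorithm mismatch: package index uses "${indexAlgo}" but side effects were computed with "${expected}"`
    )
  }
}
```

A caller could catch `MISSING_ALGO` and treat the index as invalid and in need of regeneration, while still failing hard on a real mismatch.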

Copilot AI left a comment

Pull request overview

Copilot reviewed 31 out of 32 changed files in this pull request and generated 6 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

@zkochan zkochan merged commit e2e0a32 into main Jan 24, 2026
15 checks passed
@zkochan zkochan deleted the integrity-change branch January 24, 2026 20:41