fix(audit): prune path traversal#12087
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🧰 Additional context used📓 Path-based instructions (1)**/*.{ts,tsx}📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧠 Learnings (1)📚 Learning: 2026-05-14T09:04:00.133ZApplied to files:
🔇 Additional comments (1)
📝 WalkthroughWalkthroughThis PR precomputes reachable vulnerabilities per dependency path and adds saturation checks into audit path enumeration so DFS traversal prunes non-vulnerable subtrees and stops exploring once per-finding path caps are reached; tests validate pruning, saturation, and cyclic reachability. ChangesAudit Path Traversal Performance
Sequence Diagram(s)sequenceDiagram
participant walkForPaths
participant createReachableVulnerabilitiesGetter
participant visit
participant allReachableVulnerabilitiesSaturated
walkForPaths->>createReachableVulnerabilitiesGetter: construct reachable-vulnerabilities getter
walkForPaths->>visit: begin DFS from entry edges
visit->>createReachableVulnerabilitiesGetter: query reachable keys for depPath
visit->>allReachableVulnerabilitiesSaturated: check if reachable findings are saturated
alt empty or saturated
visit->>visit: return early (prune subtree)
else
visit->>visit: record path for vulnerable edge
visit->>allReachableVulnerabilitiesSaturated: re-check saturation after record
alt saturated
visit->>visit: return early (stop exploring children)
else
visit->>visit: continue to children
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install failed. For unrecoverable errors, disable the tool in CodeRabbit configuration. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Review Summary by QodoOptimize pnpm audit performance via lockfile subtree pruning
WalkthroughsDescription• Optimize audit performance by pruning non-vulnerable lockfile subtrees during path traversal • Stop descending into dependency trees once all reachable vulnerabilities reach path cap • Add helper functions to track reachable vulnerabilities and detect saturation • Include comprehensive test coverage for pruning and saturation scenarios Diagramflowchart LR
A["Audit Path Traversal"] --> B["Check Reachable Vulnerabilities"]
B --> C["Prune Non-Vulnerable Subtrees"]
C --> D["Check Saturation"]
D --> E["Skip Traversal if Saturated"]
E --> F["Improved Performance"]
File Changes1. deps/compliance/audit/src/lockfileToAuditIndex.ts
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deps/compliance/audit/src/lockfileToAuditIndex.ts`:
- Around line 246-271: The memoization currently caches an incomplete result
because get() returns new Set() for back-edges and only writes memo.set(...)
after recursion; change the algorithm to create and memoize an empty placeholder
set for edge.depPath immediately when starting get (e.g., const result = new
Set<string>(); memo.set(edge.depPath, result);), and on encountering a back-edge
(inTrail.has(edge.depPath)) return the already-memoized placeholder
(memo.get(edge.depPath)) instead of new Set(); then proceed to fill that
placeholder during recursion, remove from inTrail when done, and leave the
placeholder in memo for future calls; update the get function (and related
symbols nameVerFromPkgSnapshot, vulnerabilityKey, addAll) accordingly and add a
regression test that builds a cyclic lockfile graph to assert reachability is
preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 880ccf00-3ff0-48e4-91cf-1f48b412f4e9
📒 Files selected for processing (3)
.changeset/audit-path-traversal-performance.mddeps/compliance/audit/src/lockfileToAuditIndex.tsdeps/compliance/audit/test/index.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Agent
- GitHub Check: Compile & Lint
- GitHub Check: Analyze (javascript)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{ts,tsx}: Follow Standard Style with trailing commas, preferring functions over classes, and declaring functions after they are used (relying on hoisting)
Use a single options object instead of multiple parameters when a function needs more than two or three arguments
Follow Import Order: Standard libraries first, then external dependencies (alphabetically), then relative imports
Write self-documenting code where function names, parameters, and types explain what a function does without requiring prose comments
Do not write comments that restate what the code already says; refactor via renaming, splitting helpers, or restructuring instead
Do not repeat documentation at call sites that already exists in JSDoc on the callee; update JSDoc once for all call sites to benefit
Use JSDoc only for a function's contract (preconditions, postconditions, edge cases, why the function exists), not for re-narrating the body
Do not record past implementation shape, refactor history, or 'the previous code did X' framing in code; use git log and git blame instead
Write comments only when: the reason for code is non-obvious (hidden invariant, workaround for known bug, deliberate exception), or the right name doesn't fit (temporary technical constraint)
Files:
deps/compliance/audit/test/index.tsdeps/compliance/audit/src/lockfileToAuditIndex.ts
🧠 Learnings (2)
📚 Learning: 2026-05-26T21:01:06.666Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11966
File: .changeset/require-tarball-integrity.md:6-6
Timestamp: 2026-05-26T21:01:06.666Z
Learning: In pnpm lockfile-related release notes/docs (especially changeset markdown), preserve URL hostnames exactly as they appear in pnpm-lock.yaml tarball resolution entries—keep hosts like `codeload.github.com`, `bitbucket.org`, and `gitlab.com` in lowercase. Do not “correct” them to title-case/preserve brand capitalization (e.g., LanguageTool rules like `GITHUB` capitalization) because these are literal URL fragments, not platform brand names.
Applied to files:
.changeset/audit-path-traversal-performance.md
📚 Learning: 2026-05-14T09:04:00.133Z
Learnt from: zkochan
Repo: pnpm/pnpm PR: 11622
File: resolving/npm-resolver/test/publishedBy.test.ts:350-354
Timestamp: 2026-05-14T09:04:00.133Z
Learning: In the pnpm/pnpm repository, ESLint is the authoritative style linter. Do not raise review findings for missing trailing commas in multiline function calls (e.g., `fs.writeFileSync(...)`) when this repo’s ESLint configuration does not report them and lint passes. Prefer deferring to the ESLint results for this specific trailing-comma rule rather than enforcing it manually in code review.
Applied to files:
deps/compliance/audit/test/index.tsdeps/compliance/audit/src/lockfileToAuditIndex.ts
🔇 Additional comments (2)
.changeset/audit-path-traversal-performance.md (1)
1-6: LGTM!deps/compliance/audit/test/index.ts (1)
5-5: LGTM!Also applies to: 87-147
There was a problem hiding this comment.
Pull request overview
This PR optimizes pnpm audit’s client-side path enumeration by pruning lockfile subtrees that cannot lead to vulnerable packages and by stopping traversal once all reachable vulnerable findings have reached the per-finding path cap, addressing the severe CPU-bound slowdown reported in #12086.
Changes:
- Add a memoized “reachable vulnerabilities” precomputation to prune non-vulnerable subtrees during
buildAuditPathIndex()traversal. - Short-circuit traversal when all reachable vulnerable findings are saturated at
MAX_PATHS_PER_FINDING. - Add targeted tests covering subtree pruning and saturated-finding short-circuiting; add a changeset for patch releases.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| deps/compliance/audit/src/lockfileToAuditIndex.ts | Adds reachable-vulnerability pruning + saturation-based traversal short-circuit for audit path building. |
| deps/compliance/audit/test/index.ts | Adds tests to assert pruning behavior and saturated-finding traversal short-circuiting. |
| .changeset/audit-path-traversal-performance.md | Declares patch bumps for @pnpm/deps.compliance.audit and pnpm with a performance note. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…e reachability The reachable-vulnerabilities getter returned a non-memoized empty Set for back-edges, causing incomplete results for nodes in dependency cycles. Memoize the result Set immediately so the same mutable placeholder is returned for back-edges and filled as recursion unwinds.
204fbe4 to
733dae6
Compare
The placeholder-before-recursion approach only made the SCC entry node's reachable set correct; non-entry cycle members were memoized with an under-approximated set, dropping valid audit paths reached through them. Cache a node's reachable vulnerabilities only when no descendant back-edges to an ancestor; recompute cycle-touching nodes per query.
The reachable-vulnerability pruning added in 11.5.1 (pnpm#12087) only memoized acyclic subtrees, so any node whose subtree touched a dependency cycle, and all of its ancestors, was recomputed on every query. That made the audit path walk O(N^2) on lockfiles that contain cycles, which is the common case. Reachability is now computed once per node with Tarjan's strongly-connected-components algorithm. Every member of a cycle shares one reachable set, so cyclic graphs are handled in linear time. Closes pnpm#12212 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The reachable-vulnerability pruning added in 11.5.1 (pnpm#12087) only memoized acyclic subtrees, so any node whose subtree touched a dependency cycle, and all of its ancestors, was recomputed on every query. That made the audit path walk O(N^2) on lockfiles that contain cycles, which is the common case. Reachability is now computed once per node with Tarjan's strongly-connected-components algorithm. Every member of a cycle shares one reachable set, so cyclic graphs are handled in linear time. Closes pnpm#12212 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`pnpm audit` enumerates the install paths to every vulnerable package. The reachability-based pruning added in 11.5.1 (#12087) lets the walker skip subtrees that reach no unsaturated finding by precomputing, per node, the set of vulnerabilities reachable from it. That getter only memoised acyclic subtrees: a node whose subtree contained a cycle was `complete === false`, and so was every ancestor up to the importer roots. None of them were cached, so their reachable set was recomputed on every query. Real dependency graphs commonly contain cycles, and a single cycle high in the graph makes a large fraction of nodes non-memoisable, yielding an O(N^2) walk. This matched the report in #12212 exactly (CPU-bound, identical audit output across versions). Reachability is now computed with Tarjan's strongly-connected-components algorithm. Every node is scanned once; all members of an SCC reach the same set of vulnerabilities and share one set, finalised in reverse-topological order. Cyclic graphs are handled in O(N + E). The reachable set is used only to prune, so it must never under-approximate (that would hide a real finding). Tarjan yields the exact set for every node, so no finding can be dropped, and the path-recording logic is unchanged. The getter returns ReadonlySet<string> so the shared sets cannot be mutated by callers, and a missing memo entry (an impossible-by-construction state) throws rather than silently returning an empty set. A regression test asserts the read-count growth ratio between two cycle sizes (L=200 and L=400) is sub-quadratic: the fix scales ~2x (linear), the previous code ~4x (quadratic). Asserting the ratio cancels the per-node constant, so the test is not brittle to constant-factor changes. Closes #12212. --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Zoltan Kochan <z@kochan.io>
Summary
Benchmark
In our large monorepo (
/Users/aqeelat/fathom/frontend/fathom-frontend),pnpm auditwent from 4+ minutes to less than 2 seconds with this change:Validation
pnpm --filter @pnpm/deps.compliance.audit testgit diff --checkFixes #12086
Written by an agent (OpenCode, gpt-5.5).
Summary by CodeRabbit