Summary
Pacquet's resolve_node unconditionally allocates a fresh NodeId (u64 counter) for every occurrence of every package and inserts a per-occurrence DependenciesTreeNode. Upstream pnpm uses a branded NodeId = string | number and, for leaf packages (no children), reuses the package id as the NodeId so repeated leaves don't allocate independent occurrence nodes. See pnpm's resolveDependencies.ts:1580.
The cost is concrete: a leaf like is-array referenced by 100 parents currently produces 100 DependenciesTreeNode entries in pacquet vs 1 in pnpm. Every downstream tree-iteration consumer — peer resolution being the dominant one — sees the inflated count.
This is the second-biggest unimplemented resolver-side win after peekManifestFromStore.
Why this matters
resolve_importer is ~3.1 s of the ~5.03 s warm-cache wall on the alotta-files benchmark (after the wins on #11837). Leaf-occurrence inflation feeds directly into:
- Per-tree-node iteration in the peer-resolution pass (it walks every
DependenciesTreeNode and inspects its children and ancestors).
HashMap<NodeId, DependenciesTreeNode> allocations — every visit grows the map regardless of whether the leaf has any per-occurrence distinguishing data.
- Downstream peer-suffix construction, which pacquet keys by tree-node identity.
In monorepo-shaped graphs, the long tail of single-file leaves (is-array, inherits, safe-buffer, …) is the dominant repeat case. Each one referenced N times produces N redundant nodes.
What's available
DependenciesTreeNode has four fields:
resolved_package_id: String — same across all occurrences of the same package.
children: BTreeMap<String, NodeId> — empty for leaves by definition; same across all occurrences.
depth: i32 — can differ across occurrences.
installable: bool — today always true (the doc comment on the field says so).
Two of the four fields are constant per-package; one is constant today by virtue of pacquet's npm-shaped slice; only depth genuinely varies per occurrence.
What's NOT in scope (per-leaf data that does differ)
depth differs across occurrences. Pnpm gets away with collapsing leaves because depth on a leaf has no downstream consumers — no children means no walk, no recursion, no depth-driven ordering downstream of the leaf itself. The peer resolver's depth checks fire on intermediate nodes, not leaves.
- Each occurrence is recorded as a child of its parent via the parent's
children: BTreeMap<String, NodeId> entry — collapsing means many parents' children maps point at the same shared NodeId. That's the intended semantics, but it does require verifying pnpm's peer-resolution iteration uses the parent edges (not a flattened node list) so the visit count is driven by the unique node set.
Gating conditions
The leaf-collapse fast path is safe iff all of:
child_specs is empty after extract_children(&result) (genuine leaf — no recursion would happen anyway).
- The package has no peer dependencies that need per-occurrence resolution (pnpm reuses leaves only when there's nothing peer-dependent to track). Verify via the resolver's manifest:
peerDependencies absent or empty.
- No downstream consumer reads
DependenciesTreeNode.depth for leaf nodes in a way that requires the per-occurrence value. The current single consumer of depth is the peer-resolution pass; confirm it tolerates a fixed depth for shared leaves (or, equivalently, that pnpm passes a fixed depth here too).
When any gate misses, fall through to today's NodeId::next() allocation unchanged.
Implementation sketch
Two reasonable shapes:
Option A — make NodeId an enum
pub enum NodeId {
Counter(u64),
Leaf(Arc<str>), // packs the package id; cheap clone
}
Pros: matches pnpm's union directly; the discriminator is explicit; future workspace-link NodeId::Link(Arc<str>) slots in.
Cons: every HashMap<NodeId, _> and BTreeMap<_, NodeId> slot grows from 8 bytes to 24 (enum + Arc); minor copy cost per peer-resolution lookup.
Option B — keep NodeId(u64) and namespace
Hash the package id into the high half of the u64 and reserve the high bit (or top byte) to discriminate. The leaf NodeId table becomes a DashMap<String, NodeId> populated lazily.
Pros: 8-byte NodeId preserved; no widening of any downstream hash slot.
Cons: hash collisions are a real risk (FxHash + 63-bit space → birthday-style collision at ~2³² leaves; safe in practice, but worth a comment + a debug-mode collision check); discrimination logic lives in helpers, not the type.
Recommendation: Option A. The downstream maps are not deep enough for the 16-byte widening to matter, and the type explicitness will save reader bandwidth on every future port of pnpm's resolver internals (which use the union pervasively).
Wiring inside resolve_node
// After `extract_children(&result)` returns the child specs:
let node_id = if child_specs.is_empty() && !has_peer_dependencies(&result) {
// Leaf — reuse the package id as the NodeId. Multiple
// occurrences map to one tree node, matching pnpm's
// `nodeIdForLeafNode` shape.
NodeId::leaf(&id)
} else {
NodeId::next()
};
// Skip the `dependencies_tree.insert` when a leaf node is already
// present; the first writer's record wins.
let mut tree = lock_recoverable(&ctx.dependencies_tree);
tree.entry(node_id.clone()).or_insert_with(|| DependenciesTreeNode { ... });
The children: BTreeMap<String, NodeId> field on the parent already holds the per-edge alias → leaf NodeId mapping. Collapsing leaves means many parents' children maps point at the same NodeId; that's the whole point of the optimization.
Expected impact
Hard to predict precisely without measuring, but a back-of-envelope:
- ~1362 nodes in the
alotta-files benchmark today.
- Roughly half are leaves (most utility packages are).
- Median leaf has 3-5 distinct parent occurrences in a real lockfile.
So today's tree probably has ~700 redundant leaf nodes that would collapse into ~200 — a ~70% reduction in tree-node count for leaves, and a proportional drop in any per-tree-node iteration in the peer-resolution pass. Whether that translates to ms or hundreds of ms depends on how peer resolution iterates, which would need a profile run to characterize precisely.
References
Written by an agent (Claude Code, claude-opus-4-7).
Summary
Pacquet's
resolve_nodeunconditionally allocates a freshNodeId(u64counter) for every occurrence of every package and inserts a per-occurrenceDependenciesTreeNode. Upstream pnpm uses a brandedNodeId = string | numberand, for leaf packages (no children), reuses the package id as theNodeIdso repeated leaves don't allocate independent occurrence nodes. See pnpm'sresolveDependencies.ts:1580.The cost is concrete: a leaf like
is-arrayreferenced by 100 parents currently produces 100DependenciesTreeNodeentries in pacquet vs 1 in pnpm. Every downstream tree-iteration consumer — peer resolution being the dominant one — sees the inflated count.This is the second-biggest unimplemented resolver-side win after
peekManifestFromStore.Why this matters
resolve_importeris ~3.1 s of the ~5.03 s warm-cache wall on thealotta-filesbenchmark (after the wins on #11837). Leaf-occurrence inflation feeds directly into:DependenciesTreeNodeand inspects itschildrenand ancestors).HashMap<NodeId, DependenciesTreeNode>allocations — every visit grows the map regardless of whether the leaf has any per-occurrence distinguishing data.In monorepo-shaped graphs, the long tail of single-file leaves (
is-array,inherits,safe-buffer, …) is the dominant repeat case. Each one referenced N times produces N redundant nodes.What's available
DependenciesTreeNodehas four fields:resolved_package_id: String— same across all occurrences of the same package.children: BTreeMap<String, NodeId>— empty for leaves by definition; same across all occurrences.depth: i32— can differ across occurrences.installable: bool— today alwaystrue(the doc comment on the field says so).Two of the four fields are constant per-package; one is constant today by virtue of pacquet's npm-shaped slice; only
depthgenuinely varies per occurrence.What's NOT in scope (per-leaf data that does differ)
depthdiffers across occurrences. Pnpm gets away with collapsing leaves because depth on a leaf has no downstream consumers — no children means no walk, no recursion, no depth-driven ordering downstream of the leaf itself. The peer resolver's depth checks fire on intermediate nodes, not leaves.children: BTreeMap<String, NodeId>entry — collapsing means many parents'childrenmaps point at the same sharedNodeId. That's the intended semantics, but it does require verifying pnpm's peer-resolution iteration uses the parent edges (not a flattened node list) so the visit count is driven by the unique node set.Gating conditions
The leaf-collapse fast path is safe iff all of:
child_specsis empty afterextract_children(&result)(genuine leaf — no recursion would happen anyway).peerDependenciesabsent or empty.DependenciesTreeNode.depthfor leaf nodes in a way that requires the per-occurrence value. The current single consumer ofdepthis the peer-resolution pass; confirm it tolerates a fixed depth for shared leaves (or, equivalently, that pnpm passes a fixed depth here too).When any gate misses, fall through to today's
NodeId::next()allocation unchanged.Implementation sketch
Two reasonable shapes:
Option A — make
NodeIdan enumPros: matches pnpm's union directly; the discriminator is explicit; future workspace-link
NodeId::Link(Arc<str>)slots in.Cons: every
HashMap<NodeId, _>andBTreeMap<_, NodeId>slot grows from 8 bytes to 24 (enum + Arc); minor copy cost per peer-resolution lookup.Option B — keep
NodeId(u64)and namespaceHash the package id into the high half of the
u64and reserve the high bit (or top byte) to discriminate. The leaf NodeId table becomes aDashMap<String, NodeId>populated lazily.Pros: 8-byte
NodeIdpreserved; no widening of any downstream hash slot.Cons: hash collisions are a real risk (FxHash + 63-bit space → birthday-style collision at ~2³² leaves; safe in practice, but worth a comment + a debug-mode collision check); discrimination logic lives in helpers, not the type.
Recommendation: Option A. The downstream maps are not deep enough for the 16-byte widening to matter, and the type explicitness will save reader bandwidth on every future port of pnpm's resolver internals (which use the union pervasively).
Wiring inside
resolve_nodeThe
children: BTreeMap<String, NodeId>field on the parent already holds the per-edge alias → leaf NodeId mapping. Collapsing leaves means many parents'childrenmaps point at the sameNodeId; that's the whole point of the optimization.Expected impact
Hard to predict precisely without measuring, but a back-of-envelope:
alotta-filesbenchmark today.So today's tree probably has ~700 redundant leaf nodes that would collapse into ~200 — a ~70% reduction in tree-node count for leaves, and a proportional drop in any per-tree-node iteration in the peer-resolution pass. Whether that translates to ms or hundreds of ms depends on how peer resolution iterates, which would need a profile run to characterize precisely.
References
resolveDependencies.ts:1580.NodeIdshape:nextNodeId.ts.NodeId:node_id.rs:14.resolve_dependency_tree.rs:423.resolved_tree.rs:139-154.Written by an agent (Claude Code, claude-opus-4-7).