perf: cache hot-path work across install, resolver, and registry #453
jdx merged 15 commits into endevco:main
Greptile Summary
Wide-ranging performance pass across install, resolver, registry, linker, manifest, and settings, adding shared
Confidence Score: 5/5
Safe to merge: all findings are P2 and do not affect correctness on any common path. All three inline comments are P2: a stale doc comment, a Windows-only cache that can cause redundant work (not incorrect results), and a note about acceptable concurrent-miss behavior. No P0 or P1 findings remain after verifying the previous thread items were all addressed in this revision.
crates/aube-linker/src/lib.rs (CANON_CACHE on Windows), crates/aube-lockfile/src/dep_path_filename.rs (stale module comment)
A second pass over the install pipeline, the resolver, the registry client, the linker, and the cli front door. Each change either removes work that runs more often than the surrounding code needs it to, or routes existing work through a smaller set of shared helpers so the same pattern stays consistent across crates. None of the changes alter on-disk output, lockfile contents, fetched bytes, or run-script outcomes.
util: shared helpers
A new `aube_util::cache` module exposes three primitives that several call sites had been growing ad-hoc copies of:
- `ProcessCache<K, V>` for in-memory memoization with `Arc<V>` returns.
- `DiskCache` for sharded file-backed key/value storage layered on the existing `fs_atomic::atomic_write`.
- `FreshnessSnapshot` for the `(mtime, size, blake3)` triple that answers "did this file change?" via two cheap stats before falling back to BLAKE3.

`aube_util::buf` adds `with_scratch_string` / `with_scratch_bytes` that hand out a thread-local buffer cleared on entry. The same pattern was already living inline in `aube-scripts::policy`; the helper means future call sites do not have to re-invent it.

`aube_util::hash` gains `Blake3Builder`, a length-prefixed-field hasher that codifies the encoding `delta::fingerprint` invented (tag, length, bytes per field) so any new hashing site picks up the same anti-collision shape. It also adds `TeeReader` and the `ByteHasher` trait so future streaming callers can update a hasher as bytes flow through a `Read` without taking on `sha2` as a dependency.

`aube_util::fs_atomic` gains `write_excl(path, bytes, mode)` returning a `WriteOutcome` enum. Content-addressed callers can now write straight to the final path because a colliding write is bit-identical by construction, while non-CAS callers still have the existing `atomic_write` behavior.

registry: tune the http client
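As an illustration of the (tag, length, bytes) framing described above for `Blake3Builder`, here is a minimal sketch. It only shows the framing, not the hashing: `FramedFields`, its 1-byte tag, and its 8-byte little-endian length are assumptions made for this example, and the real helper presumably feeds the framed bytes into BLAKE3 instead of collecting them.

```rust
/// Hypothetical stand-in for the framing step: each field is appended as
/// (tag, length, bytes). The field widths here are assumptions.
#[derive(Default)]
pub struct FramedFields {
    buf: Vec<u8>,
}

impl FramedFields {
    /// Append one field: a 1-byte tag, an 8-byte little-endian length,
    /// then the raw bytes.
    pub fn field(mut self, tag: u8, bytes: &[u8]) -> Self {
        self.buf.push(tag);
        self.buf.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
        self.buf.extend_from_slice(bytes);
        self
    }

    /// Return the framed bytes; a real hasher would return a digest instead.
    pub fn finish(self) -> Vec<u8> {
        self.buf
    }
}

/// Naive concatenation, shown only for contrast: field boundaries are lost.
pub fn concat(fields: &[&[u8]]) -> Vec<u8> {
    fields.concat()
}
```

With naive concatenation the field pairs `("ab", "c")` and `("a", "bc")` produce the same byte stream, so they would hash identically. With the length prefix the two streams differ, which is the anti-collision property the description refers to.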
`build_http_client` now requests `gzip`, `brotli`, and `zstd` on packument fetches. Tarball requests still send `Accept-Encoding: identity` because the payload is already gzipped, but packument JSON for popular packages is hundreds of KB to several MB and benefits from wire-level compression. The hardcoded `aube/0.1.0` user-agent that landed in cold CDN cache buckets is replaced with a real `aube/<workspace-version> (<os> <arch>)` header built once via a `OnceLock`. `hickory-dns` is enabled so the first cold lookup per origin runs through reqwest's async resolver with in-process caching instead of `getaddrinfo` on the system thread pool.

store: simplify the cas write path
`create_cas_file` returned a flat `Result<(), Error>` regardless of whether it had freshly written the file or found an existing match. A new internal `CasWriteOutcome` enum carries that distinction back to `import_bytes`, which now skips the post-write `cas_file_matches_len` stat when the outcome is `Created`: the bytes at the final path are exactly what we just wrote, so a second stat only adds noise. The `AlreadyExisted` arm still runs the length check because a partial torn file from a previous crash is the only realistic way the lengths can differ.

The tempfile + persist-noclobber dance is replaced by a direct `O_CREAT|O_EXCL|O_WRONLY` write through `aube_util::fs_atomic::write_excl`. Content-addressed paths cannot suffer from a torn rename in the way classical paths can, because a racing concurrent writer's bytes are bit-identical by construction. Failure to open with `EEXIST` is a successful share, not an error.

linker: cache canonicalize, reflink probe; skip redundant chmod
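The exclusive-create write described above for the CAS path can be sketched with the standard library alone. This is an assumption-laden illustration, not the code in `aube_util::fs_atomic`: the `mode` parameter is left out, and the `WriteOutcome` variant names are borrowed from the `Created` / `AlreadyExisted` names mentioned for `CasWriteOutcome`.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Assumed variant names; the description only says a `WriteOutcome` enum exists.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteOutcome {
    Created,
    AlreadyExisted,
}

/// Write `bytes` to `path` only if the file does not exist yet.
/// `create_new(true)` requests exclusive creation (O_CREAT|O_EXCL on Unix).
pub fn write_excl(path: &Path, bytes: &[u8]) -> io::Result<WriteOutcome> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(bytes)?;
            Ok(WriteOutcome::Created)
        }
        // For content-addressed paths an existing file is a successful share.
        Err(err) if err.kind() == io::ErrorKind::AlreadyExists => {
            Ok(WriteOutcome::AlreadyExisted)
        }
        Err(err) => Err(err),
    }
}
```

A caller in the style of `import_bytes` would skip the length check on `Created` and still run it on `AlreadyExisted`, since only the second case can observe a torn file from an earlier crash.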
`reconcile_top_level_link` calls `canonicalize` on both the link target and the expected target on Windows. NTFS canonicalization opens the reparse point, reads the target buffer, and queries attributes: roughly five syscalls per call. The expected target almost always shares the same prefix across an install, so a `OnceLock<RwLock<HashMap<PathBuf, PathBuf>>>` memoizes results for the lifetime of the process. The Linux and macOS paths are unchanged because their `canonicalize` is cheap.

`Linker::detect_strategy_cross` writes a real test file under the source directory and tries reflink, hardlink, and copy in turn. The probe is correct but invariant for the lifetime of the process: store and project file system mounts do not move between calls. A second `OnceLock<RwLock<HashMap<(PathBuf, PathBuf), LinkStrategy>>>` caches the result keyed by the source / destination pair. Multiple `Linker` instances inside one install (prewarm + per-workspace + final) all see the cached answer instead of re-running the probe.

`make_executable` ran a `chmod` on every linked file whose `executable` bit was set. Hardlink and reflink both share the source inode's mode, so the CAS source already carries `0o755` for executables. Only the `Copy` fallback produces a fresh inode whose mode must be set explicitly; the other two strategies now skip the syscall.

resolver: skip duplicate graph hash per peer-context iter
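The process-wide probe cache described above for the linker has roughly the following shape. This is a sketch under assumptions: `cached_strategy` and the injected `probe` closure are hypothetical names, and the real `detect_strategy_cross` performs the file-system probe itself.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{OnceLock, RwLock};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LinkStrategy {
    Reflink,
    Hardlink,
    Copy,
}

// Lazily initialised, process-wide map keyed by (source, destination).
static STRATEGY_CACHE: OnceLock<RwLock<HashMap<(PathBuf, PathBuf), LinkStrategy>>> =
    OnceLock::new();

/// Return the cached strategy for this pair, running `probe` only on a miss.
pub fn cached_strategy(
    src: &Path,
    dst: &Path,
    probe: impl FnOnce() -> LinkStrategy,
) -> LinkStrategy {
    let cache = STRATEGY_CACHE.get_or_init(|| RwLock::new(HashMap::new()));
    let key = (src.to_path_buf(), dst.to_path_buf());

    // Fast path: take the read lock, copy the answer out, release the lock.
    let hit = cache.read().unwrap().get(&key).copied();
    if let Some(strategy) = hit {
        return strategy;
    }

    // Two threads may both miss and both probe. The probe result is the same
    // for a given pair, so the duplicate insert is redundant but harmless.
    let strategy = probe();
    cache.write().unwrap().insert(key, strategy);
    strategy
}
```

Releasing the read lock before probing keeps slow file-system work outside the lock, at the cost of the occasional duplicate probe under contention.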
`apply_peer_contexts`'s fixed-point loop hashes the graph twice per iteration: once before applying a pass, again after, comparing the two for convergence. `graph_hash` builds a `Vec<&str>` sized to roughly `pkgs * 3 + deps * 2` tokens and walks it through `ordered_seq_hash`. The post-iteration hash of iteration `N` is the pre-iteration hash of iteration `N+1`, so it can be carried forward as a single variable. The inner `apply_peer_contexts_once` and the dedupe pass it composes with are unchanged; the loop just stops paying for one full graph walk per iteration.

lockfile: hash short dep_path names with blake3
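The carry-forward trick described above for the resolver's fixed-point loop can be shown on a toy graph. Everything here is a stand-in: the graph is a `Vec<u32>`, `graph_hash` uses the standard library's `DefaultHasher` in place of `ordered_seq_hash`, and `run_to_fixed_point` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy graph hash; the real one walks package and dependency tokens.
fn graph_hash(graph: &[u32]) -> u64 {
    let mut hasher = DefaultHasher::new();
    graph.hash(&mut hasher);
    hasher.finish()
}

/// Apply `pass` until the graph stops changing. Returns how many times the
/// graph was hashed so the saving is visible.
pub fn run_to_fixed_point(graph: &mut Vec<u32>, mut pass: impl FnMut(&mut Vec<u32>)) -> usize {
    // Hash once up front; afterwards each iteration hashes exactly once.
    let mut before = graph_hash(graph);
    let mut hash_calls = 1;
    loop {
        pass(graph);
        let after = graph_hash(graph);
        hash_calls += 1;
        if after == before {
            return hash_calls;
        }
        // The post-pass hash of this iteration is the pre-pass hash of the next.
        before = after;
    }
}
```

For a loop that runs `n` iterations this hashes `n + 1` times, where hashing before and after every pass would hash `2n` times.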
`dep_path_filename::short_hash` was the last user of `sha2::Sha256` for an internal-only digest. Aube does not share the `.aube/<encoded>/` directory layout with pnpm, so there is no interop boundary to preserve, and BLAKE3 is the project default for non-cryptographic hashes elsewhere. The hex output stays 32 characters, the sharding is unchanged, and the `Sha256` import is dropped.

A separate, smaller change adds a docstring to `parse_json` recording why the input must stay cloned across the simd-json call: simd-json mutates the buffer in place to unflatten escape sequences, and the serde_json fallback diagnostic must run on the original bytes so the rendered span lines up with what the user wrote.

manifest: cache parsed package.json and typed workspace config
`PackageJson::from_path` is called from at least three different sites during a single `aube run` invocation: the interactive prompt path that raw-parses, the typed `load_manifest`, and the External catch-all that re-parses to check `scripts.contains_key`. A new `from_path_cached` returns `Arc<PackageJson>` from a process-wide cache so repeat reads pay one parse and hand out shared references afterward. The owned `from_path` stays for callers that need to mutate.

`WorkspaceConfig::load` had a sibling `RAW_CACHE` for the raw `BTreeMap<String, yaml_serde::Value>` view, but the typed shape was uncached. `find_workspace_packages`, lockfile-dir resolution, catalog cleanup, jail-builds, and the install write-target picker all hit `load` four to eight times per command with the same cwd. A second `TYPED_CACHE` parallel to the existing raw one closes the gap; missed entries fall through to `load_uncached` and populate the cache.

settings: binary-search the sorted meta table
`SETTINGS` is generated alphabetically by `build.rs` (the source `BTreeMap<String, _>` enforces it). The runtime `meta::find` was a linear scan over 100+ entries called dozens of times per command. Switching to `binary_search_by` against the same slice drops the lookup to `O(log N)` without changing the data.

install: share graph hashes between prewarm and link
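The settings lookup change described above amounts to the following pattern. The `SettingMeta` fields and the four setting names are placeholders invented for this sketch; only the use of `binary_search_by` over an alphabetically sorted static slice comes from the description.

```rust
/// Placeholder metadata record; the generated table carries more fields.
pub struct SettingMeta {
    pub name: &'static str,
    pub description: &'static str,
}

/// Must stay sorted by `name`, otherwise binary search gives wrong answers.
/// These entries are illustrative, not the real generated table.
pub static SETTINGS: &[SettingMeta] = &[
    SettingMeta { name: "auto-install-peers", description: "placeholder" },
    SettingMeta { name: "network-concurrency", description: "placeholder" },
    SettingMeta { name: "registry", description: "placeholder" },
    SettingMeta { name: "store-dir", description: "placeholder" },
];

/// O(log N) lookup by name over the sorted slice.
pub fn find(name: &str) -> Option<&'static SettingMeta> {
    SETTINGS
        .binary_search_by(|meta| meta.name.cmp(name))
        .ok()
        .map(|index| &SETTINGS[index])
}
```

The correctness of this lookup depends entirely on the sort order, which is why it matters that the generator emits the table from a `BTreeMap`.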
`compute_graph_hashes_with_patches` ran twice on the no-lockfile path: once inside the prewarm `tokio::spawn` task, once in the link phase. The inputs are identical when nothing filters the graph between the two call sites (the common case). The prewarm task now wraps its result in `Arc<GraphHashes>` and returns it through the `JoinHandle`; the link site reuses that `Arc` when the writeback graph node count and key set still match `graph_for_link`. Mismatches fall through to a fresh compute, so filtered installs and edge cases stay correct.

The lockfile-only branch's `lo_client` previously rebuilt a fresh `RegistryClient` just to call `tarball_url` for every locked package. Reusing the resolver's already-built `Arc<RegistryClient>` skips one full `make_client` pass and the re-walks of `.npmrc`. The network-mode override is irrelevant in this branch because the call only constructs URLs from registry config; no actual fetches happen.

The `network_concurrency_for_workers` clamp is raised from 64 to 128. The npm registry advertises around 100 concurrent HTTP/2 streams per connection, and a 16-core box was previously capped at 48. The unit test for the clamp ladder is updated with the new ceiling.

cli: memoize project root walks and thread borrowed env
`find_project_root` and `find_workspace_root` walk ancestors stat-ing `package.json` (and additionally JSON-shape-scanning it for `find_workspace_root`) to pin the project boundary. They are invoked four to eight times per `aube run` from `run_script_with`, `ensure_installed`, `enforce_package_manager_guardrails`, and the catalog discovery path. Both functions are now thin wrappers over `aube_util::cache::ProcessCache` keyed on the start path; the first call performs the walk, every subsequent call returns the cloned `Option<PathBuf>` from the cache.

Several command entry points (`fetch`, `deploy`, `update`, `npm_fallback`, plus three sites in the shared `commands::mod`) used to call `aube_settings::values::capture_env()` to hand a fresh `Vec<(String, String)>` into `ResolveCtx`. The same module already exposes `process_env()` returning a `&'static [(String, String)]` view of the `LazyLock`-cached snapshot. The call sites now borrow the slice directly and skip the per-call clone of every env entry.

state: mtime fast-path on direct-dep manifest freshness
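The difference between `capture_env()` and `process_env()` described above for the CLI can be sketched as follows. The function names match the description, but their bodies and the `ENV_SNAPSHOT` static are assumptions; in particular, this sketch silently drops non-UTF-8 variables, which the real code may handle differently.

```rust
use std::sync::LazyLock;

// Snapshot of the process environment, taken once on first access.
static ENV_SNAPSHOT: LazyLock<Vec<(String, String)>> = LazyLock::new(|| {
    std::env::vars_os()
        .filter_map(|(key, value)| Some((key.into_string().ok()?, value.into_string().ok()?)))
        .collect()
});

/// Owned copy: clones every key and value on each call.
pub fn capture_env() -> Vec<(String, String)> {
    ENV_SNAPSHOT.clone()
}

/// Borrowed view: no allocation, and the same slice on every call.
pub fn process_env() -> &'static [(String, String)] {
    ENV_SNAPSHOT.as_slice()
}
```

A context struct that only reads the environment can hold the `&'static` slice, so the per-command clone disappears without changing what the resolver sees.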
`check_needs_install` BLAKE3-hashes every direct-dep `package.json` to confirm freshness. On a 30-direct-dep monorepo that is 30 hashes per `aube run` startup, even when nothing has changed. `InstallState` and its `FreshnessState` projection gain an additive `package_json_meta` field mapping each manifest's relative path to `(size, mtime_secs)`. `package_jsons_stale` consults this map first: if the file's current size and mtime match the recorded snapshot, the file is byte-identical and the BLAKE3 hash is skipped. Mismatches (or missing entries from a state file written by an older version of aube) fall through to the existing hash check, so the schema is forward-compatible.

The new field is `serde(default)` and `skip_serializing_if = "BTreeMap::is_empty"`, so older state files keep loading and the on-disk format is unchanged for projects that have never run a version of aube that captures the snapshot.

why this is worth merging
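The size-and-mtime fast path described above for manifest freshness follows this shape. It is a sketch with assumed names (`Meta`, `stat_meta`, `is_stale`), and the content hash is passed in as a closure so that no particular hashing crate is implied; the real check uses BLAKE3.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Recorded snapshot for one manifest, mirroring the `(size, mtime_secs)` pair.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Meta {
    pub size: u64,
    pub mtime_secs: u64,
}

/// Read the current size and modification time with a single metadata call.
pub fn stat_meta(path: &Path) -> io::Result<Meta> {
    let metadata = fs::metadata(path)?;
    let mtime_secs = metadata
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_secs())
        .unwrap_or(0);
    Ok(Meta { size: metadata.len(), mtime_secs })
}

/// Return true when the file needs a reinstall check.
/// `recorded` is `None` for state files written before the snapshot existed.
pub fn is_stale(
    path: &Path,
    recorded: Option<Meta>,
    recorded_hash: u64,
    hash_file: impl FnOnce(&Path) -> io::Result<u64>,
) -> io::Result<bool> {
    if let Some(snapshot) = recorded {
        if stat_meta(path)? == snapshot {
            // Size and mtime match: treat as unchanged and skip the hash.
            return Ok(false);
        }
    }
    // Missing snapshot or mismatch: fall back to the content hash.
    Ok(hash_file(path)? != recorded_hash)
}
```

A missing snapshot takes the same branch as a mismatch, which is what lets state files from older versions keep working without migration.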
Every change is bounded, reviewable, and locally provable. Each commit names exactly one logical area; no commit straddles concerns. The shared helpers are tested where the behavior is non-trivial (cache.rs, buf.rs, fs_atomic.rs each have unit tests). The freshness snapshot, the canonicalize cache, the reflink probe cache, the workspace typed cache, and the manifest cache all fall through to the existing path on miss, so the pre-PR behavior is the limit case of the new behavior.