
perf: cache hot-path work across install, resolver, and registry #453

Merged
jdx merged 15 commits into endevco:main from imjustprism:perf
May 1, 2026
@imjustprism
Contributor

A second pass over the install pipeline, the resolver, the registry client, the linker, and the CLI front door. Each change either removes work that runs more often than the surrounding code needs, or routes existing work through a smaller set of shared helpers so the same pattern stays consistent across crates. None of the changes alter on-disk output, lockfile contents, fetched bytes, or run-script outcomes.

util: shared helpers

A new aube_util::cache module exposes three primitives that several call sites had been growing ad-hoc copies of:

  • ProcessCache<K, V> for in-memory memoization with Arc<V> returns.
  • DiskCache for sharded file-backed key/value storage layered on the existing fs_atomic::atomic_write.
  • FreshnessSnapshot for the (mtime, size, blake3) triple that answers "did this file change?" via two cheap stats before falling back to BLAKE3.
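As a rough illustration of the first primitive, here is a minimal sketch of an in-memory memoizer that hands out `Arc<V>`. The method name `get_or_insert_with` and the `Mutex<HashMap>` layout are assumptions made for this example, not the actual `aube_util::cache` API; only the `Arc<V>` returns and the first-write-wins `or_insert_with` behavior come from this PR.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `aube_util::cache::ProcessCache`; the real API may differ.
pub struct ProcessCache<K, V> {
    inner: Mutex<HashMap<K, Arc<V>>>,
}

impl<K: Eq + Hash + Clone, V> ProcessCache<K, V> {
    pub fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    /// Returns the cached value for `key`, computing it with `init` on a miss.
    /// `init` runs outside the lock, so two threads that miss at the same time
    /// may both compute; the first insert wins and both callers get that `Arc`.
    pub fn get_or_insert_with(&self, key: &K, init: impl FnOnce() -> V) -> Arc<V> {
        let cached = self.inner.lock().unwrap().get(key).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        let fresh = Arc::new(init());
        let mut map = self.inner.lock().unwrap();
        let slot: &Arc<V> = map.entry(key.clone()).or_insert_with(|| fresh);
        Arc::clone(slot)
    }
}
```

Running `init` outside the lock trades an occasional duplicate computation on a concurrent cold miss for never holding the mutex across slow work.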

aube_util::buf adds with_scratch_string / with_scratch_bytes that hand out a thread-local buffer cleared on entry. The same pattern was already living inline in aube-scripts::policy; the helper means future call sites do not have to re-invent it.
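A sketch of what such a helper could look like, assuming a closure-based signature (the real `with_scratch_string` signature is not shown in this PR description, so treat the shape below as illustrative):

```rust
use std::cell::RefCell;

thread_local! {
    // One reusable buffer per thread; its capacity survives between calls.
    static SCRATCH: RefCell<String> = RefCell::new(String::new());
}

/// Hypothetical sketch of `with_scratch_string`: lends the thread-local
/// buffer to `f`, cleared on entry. A nested call from inside `f` would
/// panic on the second `borrow_mut`, so this version is not re-entrant.
pub fn with_scratch_string<R>(f: impl FnOnce(&mut String) -> R) -> R {
    SCRATCH.with(|cell| {
        let mut buf = cell.borrow_mut();
        buf.clear(); // drop old contents, keep the allocation
        f(&mut buf)
    })
}
```

A caller would format into the buffer and return only the derived value, for example `with_scratch_string(|s| { s.push_str("hello"); s.len() })`.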

aube_util::hash gains Blake3Builder, a length-prefixed-field hasher that codifies the encoding delta::fingerprint invented (tag, length, bytes per field) so any new hashing site picks up the same anti-collision shape. It also adds TeeReader and the ByteHasher trait so future streaming callers can update a hasher as bytes flow through a Read without taking on sha2 as a dependency.
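To show why the (tag, length, bytes) shape matters, the sketch below builds the byte stream that would be fed to the hasher. `encode_fields` and `concat_fields` are hypothetical names, the 8-byte little-endian length is an assumption, and the hashing step itself is left out so the example needs no external crate.

```rust
/// Hypothetical encoder: emits tag, length, then bytes for each field.
/// The resulting stream is what a hasher such as BLAKE3 would consume.
pub fn encode_fields(fields: &[(u8, &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    for (tag, bytes) in fields {
        out.push(*tag);
        // Assumed width: 8-byte little-endian length prefix.
        out.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
        out.extend_from_slice(bytes);
    }
    out
}

/// Naive concatenation, shown only for contrast: field boundaries are lost.
pub fn concat_fields(fields: &[&[u8]]) -> Vec<u8> {
    fields.concat()
}
```

With plain concatenation, the field pairs ("ab", "c") and ("a", "bc") produce the same input and therefore the same digest; with the length prefix, the two encodings differ, which is the anti-collision property described above.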

aube_util::fs_atomic gains write_excl(path, bytes, mode) returning a WriteOutcome enum. Content-addressed callers can now write straight to the final path because a colliding write is bit-identical by construction, while non-CAS callers still have the existing atomic_write behavior.
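A minimal sketch of the exclusive-create idea using only the standard library. This version drops the `mode` parameter (setting it is platform-specific) and the variant names are assumed, so it is not the real `write_excl`:

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical outcome type; variant names are assumptions for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteOutcome {
    Created,
    AlreadyExisted,
}

/// Sketch of an exclusive write: `create_new(true)` maps to O_CREAT|O_EXCL
/// on Unix, so exactly one writer creates the file.
pub fn write_excl(path: &Path, bytes: &[u8]) -> io::Result<WriteOutcome> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            // If this write fails midway, a partial file stays at `path`.
            file.write_all(bytes)?;
            Ok(WriteOutcome::Created)
        }
        // Another writer got there first: for CAS paths this is a share.
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(WriteOutcome::AlreadyExisted),
        Err(e) => Err(e),
    }
}
```

The comment on `write_all` marks the trade-off: unlike a tempfile plus rename, a failed write can leave a partial file visible at the final path.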

registry: tune the http client

build_http_client now requests gzip, brotli, and zstd on packument fetches. Tarball requests still send Accept-Encoding: identity because the payload is already gzipped, but packument JSON for popular packages is hundreds of KB to several MB and benefits from wire-level compression. The hardcoded aube/0.1.0 user-agent that landed in cold CDN cache buckets is replaced with a real aube/<workspace-version> (<os> <arch>) header built once via a OnceLock. hickory-dns is enabled so the first cold lookup per origin runs through reqwest's async resolver with in-process caching instead of getaddrinfo on the system thread pool.

store: simplify the cas write path

create_cas_file returned a flat Result<(), Error> regardless of whether it had freshly written the file or found an existing match. A new internal CasWriteOutcome enum carries that distinction back to import_bytes, which now skips the post-write cas_file_matches_len stat when the outcome is Created: the bytes at the final path are exactly what we just wrote, so re-stating only adds noise. The AlreadyExisted arm still runs the length check because a partial torn file from a previous crash is the only realistic way the lengths can differ.

The tempfile + persist-noclobber dance is replaced by a direct O_CREAT|O_EXCL|O_WRONLY write through aube_util::fs_atomic::write_excl. Content-addressed paths cannot suffer from a torn rename in the way classical paths can, because a racing concurrent writer's bytes are bit-identical by construction. Failure to open with EEXIST is a successful share, not an error.

linker: cache canonicalize, reflink probe; skip redundant chmod

reconcile_top_level_link calls canonicalize on both the link target and the expected target on Windows. NTFS canonicalization opens the reparse point, reads the target buffer, and queries attributes - five-ish syscalls per call. The expected target almost always shares the same prefix across an install, so a OnceLock<RwLock<HashMap<PathBuf, PathBuf>>> memoizes results for the lifetime of the process. The Linux and macOS paths are unchanged because their canonicalize is cheap.

Linker::detect_strategy_cross writes a real test file under the source directory and tries reflink, hardlink, and copy in turn. The probe is correct but invariant for the lifetime of the process: store and project file system mounts do not move between calls. A second OnceLock<RwLock<HashMap<(PathBuf, PathBuf), LinkStrategy>>> caches the result keyed by the source / destination pair. Multiple Linker instances inside one install (prewarm + per-workspace + final) all see the cached answer instead of re-running the probe.
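The caching pattern described here can be sketched as follows. `cached_strategy` and the injected `probe` closure are hypothetical; the real code calls the filesystem probe directly, and the enum variants are assumed from the three strategies named above.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{OnceLock, RwLock};

/// Assumed variants, taken from the three strategies the probe tries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LinkStrategy {
    Reflink,
    Hardlink,
    Copy,
}

static STRATEGY_CACHE: OnceLock<RwLock<HashMap<(PathBuf, PathBuf), LinkStrategy>>> =
    OnceLock::new();

/// Hypothetical wrapper: runs `probe` at most once per (src, dst) pair in the
/// common case and serves later calls from the process-wide map.
pub fn cached_strategy(
    src: &Path,
    dst: &Path,
    probe: impl FnOnce() -> LinkStrategy,
) -> LinkStrategy {
    let cache = STRATEGY_CACHE.get_or_init(|| RwLock::new(HashMap::new()));
    let key = (src.to_path_buf(), dst.to_path_buf());
    // Read lock only for the lookup; it is released before probing.
    let hit = cache.read().unwrap().get(&key).copied();
    if let Some(strategy) = hit {
        return strategy;
    }
    let result = probe();
    let mut map = cache.write().unwrap();
    // First write wins if another thread probed the same pair concurrently.
    *map.entry(key).or_insert(result)
}
```

Because the map is a process-wide static, every `Linker` instance created during one install shares the same entries.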

make_executable ran a chmod on every linked file whose executable bit was set. Hardlink and reflink both share the source inode's mode, so the CAS source already carries 0o755 for executables. Only the Copy fallback produces a fresh inode whose mode must be set explicitly; the other two strategies now skip the syscall.

resolver: skip duplicate graph hash per peer-context iter

apply_peer_contexts's fixed-point loop hashes the graph twice per iteration: once before applying a pass, again after, comparing the two for convergence. graph_hash builds a Vec<&str> sized to roughly pkgs * 3 + deps * 2 tokens and walks it through ordered_seq_hash. The post-iteration hash of iteration N is the pre-iteration hash of iteration N+1, so it can be carried forward as a single variable. The inner apply_peer_contexts_once and the dedupe pass it composes with are unchanged; the loop just stops paying for one full graph walk per iteration.
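The loop shape can be sketched generically. `run_to_fixed_point`, its closure parameters, and the `max_iters` bound are illustrative assumptions; the actual loop works on the resolver graph and calls `graph_hash` and `apply_peer_contexts_once`.

```rust
/// Runs `step` until `hash` stops changing, hashing once per iteration
/// (plus once up front) instead of twice per iteration.
/// Returns the number of iterations performed.
pub fn run_to_fixed_point<G>(
    graph: &mut G,
    mut hash: impl FnMut(&G) -> u64,
    mut step: impl FnMut(&mut G),
    max_iters: usize,
) -> usize {
    let mut prev = hash(graph); // the only "before" hash ever computed
    for iter in 1..=max_iters {
        step(graph);
        let next = hash(graph);
        if next == prev {
            return iter; // converged: the pass changed nothing
        }
        // The "after" hash of this iteration is the "before" hash of the next.
        prev = next;
    }
    max_iters
}
```

For a run that converges after N iterations this performs N + 1 hashes rather than 2N.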

lockfile: hash short dep_path names with blake3

dep_path_filename::short_hash was the last user of sha2::Sha256 for an internal-only digest. Aube does not share the .aube/<encoded>/ directory layout with pnpm, so there is no interop boundary to preserve, and BLAKE3 is the project default for non-cryptographic hashes elsewhere. The hex output stays 32 characters, the sharding is unchanged, and the Sha256 import is dropped.

A separate, smaller change adds a docstring to parse_json recording why the input must stay cloned across the simd-json call: simd-json mutates the buffer in place to unescape string escape sequences, and the serde_json fallback diagnostic must run on the original bytes so the rendered span lines up with what the user wrote.

manifest: cache parsed package.json and typed workspace config

PackageJson::from_path is called from at least three different sites during a single aube run invocation: the interactive prompt path that raw-parses, the typed load_manifest, and the External catch-all that re-parses to check scripts.contains_key. A new from_path_cached returns Arc<PackageJson> from a process-wide cache so repeat reads pay one parse and hand out shared references afterward. The owned from_path stays for callers that need to mutate.

WorkspaceConfig::load had a sibling RAW_CACHE for the raw BTreeMap<String, yaml_serde::Value> view, but the typed shape was uncached. find_workspace_packages, lockfile-dir resolution, catalog cleanup, jail-builds, and the install write-target picker all hit load four to eight times per command with the same cwd. A second TYPED_CACHE parallel to the existing raw one closes the gap; missed entries fall through to load_uncached and populate the cache.

settings: binary-search the sorted meta table

SETTINGS is generated alphabetically by build.rs (the source BTreeMap<String, _> enforces it). The runtime meta::find was a linear scan over 100+ entries called dozens of times per command. Switching to binary_search_by against the same slice drops the lookup to O(log N) without changing the data.
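A small sketch of the lookup, with a made-up three-entry table standing in for the generated `SETTINGS` slice. The struct fields and setting names are illustrative only; the `debug_assert!` mirrors the sort-order check mentioned in the review summary.

```rust
/// Illustrative metadata record; the generated struct has its own fields.
pub struct SettingMeta {
    pub name: &'static str,
    pub description: &'static str,
}

/// Stand-in table with hypothetical names. It must stay sorted by `name`.
pub static SETTINGS: &[SettingMeta] = &[
    SettingMeta { name: "cache-dir", description: "where cached data lives" },
    SettingMeta { name: "network-concurrency", description: "parallel fetches" },
    SettingMeta { name: "registry", description: "default registry URL" },
];

pub fn find(name: &str) -> Option<&'static SettingMeta> {
    // Binary search is only valid on sorted input; check it in debug builds.
    debug_assert!(SETTINGS.windows(2).all(|w| w[0].name < w[1].name));
    SETTINGS
        .binary_search_by(|meta| meta.name.cmp(name))
        .ok()
        .map(|index| &SETTINGS[index])
}
```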

install: share graph hashes between prewarm and link

compute_graph_hashes_with_patches ran twice on the no-lockfile path: once inside the prewarm tokio::spawn task, once in the link phase. The inputs are identical when nothing filters the graph between the two call sites (the common case). The prewarm task now wraps its result in Arc<GraphHashes> and returns it through the JoinHandle; the link site reuses that Arc when the writeback graph node count and key set still match graph_for_link. Mismatches fall through to a fresh compute, so filtered installs and edge cases stay correct.
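The reuse check can be sketched like this. `GraphHashes` is modeled as a plain map and `hashes_for_link` is a hypothetical helper; the real types and the exact comparison against `graph_for_link` live in the install command.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::Arc;

/// Simplified stand-in: package key to hash.
pub type GraphHashes = BTreeMap<String, u64>;

/// Hypothetical helper: reuse the prewarmed hashes only when they cover
/// exactly the keys the link phase is about to use; otherwise recompute.
pub fn hashes_for_link(
    prewarmed: Option<Arc<GraphHashes>>,
    link_keys: &BTreeSet<String>,
    recompute: impl FnOnce() -> GraphHashes,
) -> Arc<GraphHashes> {
    if let Some(hashes) = prewarmed {
        let same_count = hashes.len() == link_keys.len();
        if same_count && hashes.keys().all(|k| link_keys.contains(k)) {
            return hashes; // same node count and key set: safe to share
        }
    }
    Arc::new(recompute()) // filtered graph or no prewarm result
}
```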

The lockfile-only branch's lo_client previously rebuilt a fresh RegistryClient just to call tarball_url for every locked package. Reusing the resolver's already-built Arc<RegistryClient> skips one full make_client pass and re-walks of .npmrc. The network-mode override is irrelevant in this branch because the call only constructs URLs from registry config; no actual fetches happen.

The network_concurrency_for_workers clamp is raised from 64 to 128. The npm registry advertises around 100 concurrent HTTP/2 streams per connection, and a 16-core box was previously capped at 48. The unit test for the clamp ladder is updated with the new ceiling.

cli: memoize project root walks and thread borrowed env

find_project_root and find_workspace_root walk ancestors stat-ing package.json (and additionally JSON-shape-scanning it for find_workspace_root) to pin the project boundary. They are invoked four to eight times per aube run from run_script_with, ensure_installed, enforce_package_manager_guardrails, and the catalog discovery path. Both functions are now thin wrappers over aube_util::cache::ProcessCache keyed on the start path; the first call performs the walk, every subsequent call returns the cloned Option<PathBuf> from the cache.

Several command entry points (fetch, deploy, update, npm_fallback, plus three sites in the shared commands::mod) used to call aube_settings::values::capture_env() to hand a fresh Vec<(String, String)> into ResolveCtx. The same module already exposes process_env() returning a &'static [(String, String)] view of the LazyLock-cached snapshot. The call sites now borrow the slice directly and skip the per-call clone of every env entry.
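The difference between the two calls can be illustrated with a small sketch. It uses `OnceLock` instead of the `LazyLock` mentioned above and silently skips non-UTF-8 variables, both of which are choices made for this example rather than a description of `aube_settings`.

```rust
use std::sync::OnceLock;

static ENV_SNAPSHOT: OnceLock<Vec<(String, String)>> = OnceLock::new();

/// Sketch of a borrowed view: the environment is captured once and every
/// caller gets the same `'static` slice, with no per-call allocation.
pub fn process_env() -> &'static [(String, String)] {
    ENV_SNAPSHOT.get_or_init(|| {
        std::env::vars_os()
            // Assumption for this sketch: drop entries that are not UTF-8.
            .filter_map(|(k, v)| Some((k.into_string().ok()?, v.into_string().ok()?)))
            .collect()
    })
}

/// The owned variant, shown for contrast: clones every entry on each call.
pub fn capture_env() -> Vec<(String, String)> {
    process_env().to_vec()
}
```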

state: mtime fast-path on direct-dep manifest freshness

check_needs_install BLAKE3-hashes every direct-dep package.json to confirm freshness. On a 30-direct-dep monorepo that is 30 hashes per aube run startup, even when nothing has changed.

InstallState and its FreshnessState projection gain an additive package_json_meta field mapping each manifest's relative path to (size, mtime_secs). package_jsons_stale consults this map first: if the file's current size and mtime match the recorded snapshot, the file is treated as unchanged and the BLAKE3 hash is skipped. Mismatches (or missing entries from a state file written by an older version of aube) fall through to the existing hash check, so state files written by older versions keep working.

The new field is serde(default) and skip_serializing_if = "BTreeMap::is_empty", so older state files keep loading and the on-disk format is unchanged for projects that have never run a version of aube that captures the snapshot.
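The fast path can be sketched with the standard library alone. `FileMeta`, `file_meta`, and `is_stale` below are illustrative names, the content hash is passed in as a closure so the example does not depend on the blake3 crate, and the seconds-plus-nanoseconds mtime follows the review summary rather than the `(size, mtime_secs)` pair described above.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Illustrative snapshot of the cheap-to-read file attributes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileMeta {
    pub size: u64,
    pub mtime_secs: u64,
    pub mtime_nanos: u32,
}

pub fn file_meta(path: &Path) -> io::Result<FileMeta> {
    let md = fs::metadata(path)?;
    let since_epoch = md.modified()?.duration_since(UNIX_EPOCH).unwrap_or_default();
    Ok(FileMeta {
        size: md.len(),
        mtime_secs: since_epoch.as_secs(),
        mtime_nanos: since_epoch.subsec_nanos(),
    })
}

/// Returns Ok(false) without hashing when the recorded snapshot still
/// matches; otherwise falls back to comparing content hashes.
pub fn is_stale(
    path: &Path,
    recorded: Option<FileMeta>,
    recorded_hash: u64,
    hash_file: impl FnOnce(&Path) -> io::Result<u64>,
) -> io::Result<bool> {
    if let Some(snapshot) = recorded {
        if file_meta(path)? == snapshot {
            return Ok(false); // one stat, no read, no hash
        }
    }
    // No snapshot (older state file) or a mismatch: pay for the hash.
    Ok(hash_file(path)? != recorded_hash)
}
```

A matching size and mtime is a heuristic rather than proof that the bytes are identical, which is why any mismatch or missing entry falls back to the hash.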

why this is worth merging

Every change is bounded, reviewable, and locally provable. Each commit names exactly one logical area; no commit straddles concerns. The shared helpers are tested where the behavior is non-trivial (cache.rs, buf.rs, fs_atomic.rs each have unit tests). The freshness snapshot, the canonicalize cache, the reflink probe cache, the workspace typed cache, and the manifest cache all fall through to the existing path on miss, so the pre-PR behavior is the limit case of the new behavior.

@imjustprism imjustprism marked this pull request as draft May 1, 2026 14:21
@greptile-apps

greptile-apps Bot commented May 1, 2026

Greptile Summary

Wide-ranging performance pass across install, resolver, registry, linker, manifest, and settings — adding shared ProcessCache/DiskCache/FreshnessSnapshot primitives in aube-util, then threading them through a dozen previously independent hot paths. All behavioral changes address issues raised in the previous review round (full-precision mtime in FileMeta, or_insert_with first-write-wins in ProcessCache, debug_assert for SETTINGS sort order, make_executable correctly reverted to run for all link strategies).

Confidence Score: 5/5

Safe to merge — all findings are P2 and do not affect correctness on any common path.

All three inline comments are P2: a stale doc comment, a Windows-only cache that can cause redundant work (not incorrect results), and a note about acceptable concurrent-miss behavior. No P0 or P1 findings remain after verifying the previous thread items were all addressed in this revision.

crates/aube-linker/src/lib.rs (CANON_CACHE on Windows), crates/aube-lockfile/src/dep_path_filename.rs (stale module comment)

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/aube-util/src/cache.rs | New module introducing ProcessCache, DiskCache, and FreshnessSnapshot; well-structured, uses first-write-wins or_insert_with correctly, has unit tests for core behavior |
| crates/aube-linker/src/lib.rs | detect_strategy_cross and reconcile_top_level_link gain process-lifetime caches; CANON_CACHE incorrectly memoizes the mutable link_path in addition to the stable expected_abs on Windows; make_executable correctly reverted to run for all link strategies |
| crates/aube-lockfile/src/dep_path_filename.rs | short_hash switched from SHA-256 to BLAKE3; module comment still claims pnpm byte-compatibility, which is no longer accurate for the hash suffix |
| crates/aube-store/src/lib.rs | CasWriteOutcome enum cleanly distinguishes Created vs AlreadyExisted, skipping redundant stat on fresh writes; reverted to tempfile+persist_noclobber to avoid partial-file exposure on write failure |
| crates/aube/src/state.rs | FileMeta captures full nanosecond mtime precision (secs + nanos), addressing the previous sub-second truncation concern; serde(default) ensures backward compat with older state files |
| crates/aube-manifest/src/lib.rs | from_path_cached adds process-wide Arc-returning cache with first-write-wins semantics; concurrent cold misses can trigger duplicate parses (bounded, acceptable) |
| crates/aube-registry/src/client.rs | Adds gzip/brotli/zstd decompression, dynamic User-Agent via OnceLock, and hickory-dns in-process caching; changes are client-wide with correct scope for packument vs tarball requests |
| crates/aube/src/commands/install/mod.rs | Prewarm graph hashes threaded through to link phase via Arc; lo_client reuses resolver's Arc; concurrency clamp raised from 64 to 128 with updated test |
| crates/aube/src/dirs.rs | find_project_root and find_workspace_root wrapped in positive-only ProcessCache; negative results correctly not cached to avoid shadowing newly-created project files |
| crates/aube-settings/src/meta.rs | Linear scan replaced with binary_search_by; debug_assert added to catch sort-order regressions in test builds |

Reviews (3): Last reviewed commit: "fixes"

Comment thread crates/aube/src/state.rs Outdated
Comment thread crates/aube-util/src/fs_atomic.rs
Comment thread crates/aube-util/src/cache.rs
Comment thread crates/aube-util/src/cache.rs
Comment thread crates/aube-settings/src/meta.rs Outdated
Comment thread crates/aube-linker/src/lib.rs Outdated
@imjustprism imjustprism marked this pull request as ready for review May 1, 2026 15:09
@jdx jdx merged commit c27ab7a into endevco:main May 1, 2026
16 checks passed