
perf: streaming sha512, parallel cas, tls prewarm, fetch reorder#469

Merged
jdx merged 4 commits into endevco:main from imjustprism:perf
May 2, 2026



@imjustprism imjustprism commented May 2, 2026

Second pass over install hot path. Bounded changes, no on-disk output drift, no lockfile-byte drift, no fetched-byte drift.

Added

Shared utils (aube-util)

  • cache::{ProcessCache, DiskCache, FreshnessSnapshot}: in-memory + disk + mtime/size/blake3 freshness primitives
  • buf::{with_scratch_string, with_scratch_bytes}: thread-local scratch buffers
  • hash::{Blake3Builder, TeeReader, ByteHasher, blake3_hash_file}: length-prefixed hasher, streaming tee, mmap-rayon BLAKE3 for files over 4 MiB
  • fs_atomic::write_excl: direct O_CREAT|O_EXCL write returning WriteOutcome
  • concurrency::parse_concurrency_env: clamped AUBE_CONCURRENCY=N override
  • snapshot::clone_tree: reflink-aware tree copy (CoW on APFS / btrfs / ReFS, fallback fs::copy)

Install pipeline

  • RegistryClient::prewarm_connection: fire-and-forget HEAD warms TLS + TCP + HTTP/2 behind manifest parse
  • RegistryClient::fetch_tarball_bytes_streaming_sha512: streams SHA-512 over wire chunks, skips post-buffer hash pass
  • aube_store::verify_precomputed_sha512: returns Result<bool, Error> (true = matched, false = non-SHA-512 fallback)
  • import_verified_tarball_streamed: consumes precomputed digest, falls through to buffered path on legacy SRI
  • is_likely_native_build: allowlist for critical-path fetch reorder

Changed

Install

  • streaming SHA-512 wired in BOTH lockfile + catch-up fetch paths
  • critical-path fetch sort: native-build packages float to front via stable sort_by_key
  • shared Arc<GraphHashes> between prewarm and link, no double-compute
  • lockfile-only branch reuses resolver's Arc<RegistryClient> instead of rebuilding
  • network concurrency default raised 64 → 128
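
A minimal sketch of the critical-path reorder described above, assuming the fetch queue is a list of package names. The allowlist contents and `reorder_for_critical_path` are hypothetical; only `is_likely_native_build`, `node-sass`, and the use of a stable `sort_by_key` come from this PR.

```rust
/// Hypothetical allowlist; the PR only confirms that `node-sass` is on it.
const NATIVE_BUILD_ALLOWLIST: &[&str] = &["node-sass"];

fn is_likely_native_build(name: &str) -> bool {
    NATIVE_BUILD_ALLOWLIST.contains(&name)
}

/// Float likely native-build packages to the front of the fetch queue.
/// `sort_by_key` is stable, so resolver-discovery order is preserved
/// inside each of the two groups.
fn reorder_for_critical_path(packages: &mut [String]) {
    // `false < true`, so negating the predicate sorts native builds first.
    packages.sort_by_key(|name| !is_likely_native_build(name));
}
```

Because the sort is stable, disabling the reorder (as AUBE_DISABLE_CRITICAL_PATH does) only changes which group goes first, never the relative order within a group.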

Store

  • two-phase import_tarball: serial tar walk stages entries, rayon batch writes when entries ≥ 256
  • O_CREAT|O_EXCL direct write replaces tempfile + persist-noclobber for content path
  • new internal CasWriteOutcome skips redundant len check on Created outcome
  • post-write cas_file_matches_len only on AlreadyExisted arm
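
The exclusive-create write and the outcome-dependent length check can be sketched with the standard library alone. This is an illustration under assumptions, not the store's code: `cas_write` is a hypothetical wrapper, and `create_new(true)` is the std equivalent of O_CREAT|O_EXCL.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
enum WriteOutcome {
    Created,
    AlreadyExisted,
}

/// Create-or-report: `create_new(true)` fails with `AlreadyExists`
/// instead of overwriting, which maps to O_CREAT|O_EXCL on Unix.
fn write_excl(path: &Path, bytes: &[u8]) -> io::Result<WriteOutcome> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(bytes)?;
            Ok(WriteOutcome::Created)
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(WriteOutcome::AlreadyExisted),
        Err(e) => Err(e),
    }
}

/// Hypothetical caller: returns whether the file on disk has the expected length.
fn cas_write(path: &Path, bytes: &[u8]) -> io::Result<bool> {
    match write_excl(path, bytes)? {
        // We just wrote every byte ourselves, so no length check is needed.
        WriteOutcome::Created => Ok(true),
        // Someone else wrote it first: only here is the length check useful.
        WriteOutcome::AlreadyExisted => Ok(std::fs::metadata(path)?.len() == bytes.len() as u64),
    }
}
```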

Registry

  • Accept-Encoding: gzip, br, zstd on packument fetches (tarballs stay identity)
  • real aube/<workspace-version> (<os> <arch>) UA replaces hardcoded aube/0.1.0
  • hickory-dns enabled for in-process async DNS cache
  • buffered read_body_capped + retry_bytes_body_read delegate through one streaming chunk loop

Linker

  • canonicalize result memoized via OnceLock<RwLock<HashMap>> on Windows
  • reflink-probe result cached process-wide keyed on (src, dst)
  • skip make_executable chmod on hardlink + reflink paths (CAS source already 0o755)
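
For readers unfamiliar with the OnceLock<RwLock<HashMap>> memoization pattern named above, a rough sketch follows. `CANON_CACHE` and `canonicalize_cached` are hypothetical names, and the real linker's key type and platform gating may differ.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::{OnceLock, RwLock};

/// Process-wide cache, initialized lazily on first use.
static CANON_CACHE: OnceLock<RwLock<HashMap<PathBuf, PathBuf>>> = OnceLock::new();

fn canonicalize_cached(path: &Path) -> io::Result<PathBuf> {
    let cache = CANON_CACHE.get_or_init(|| RwLock::new(HashMap::new()));

    // Fast path: shared read lock, no syscall.
    if let Some(hit) = cache.read().unwrap().get(path) {
        return Ok(hit.clone());
    }

    // Slow path: resolve once, then publish under the write lock.
    // Errors are not cached, so a later retry can still succeed.
    let resolved = std::fs::canonicalize(path)?;
    cache
        .write()
        .unwrap()
        .entry(path.to_path_buf())
        .or_insert_with(|| resolved.clone());
    Ok(resolved)
}
```

Two threads may both miss and both call `canonicalize`; that costs one redundant syscall but stays correct, since both compute the same value.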

Resolver

  • peer-context fixed-point loop carries forward graph hash, one walk per iter instead of two

Manifest

  • PackageJson::from_path_cached returns Arc<PackageJson> from process-wide cache
  • WorkspaceConfig::load typed cache parallel to existing raw cache

Settings

  • meta::find switches from linear scan to binary_search_by over the codegen-sorted table
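
The linear-scan to binary-search switch can be illustrated as below. The `SettingMeta` shape and the three entries are hypothetical; the only assumption carried over from the PR is that the table is emitted already sorted by name.

```rust
/// Hypothetical row shape for the generated settings table.
struct SettingMeta {
    name: &'static str,
    description: &'static str,
}

/// Must stay sorted by `name`; in the PR this ordering comes from codegen.
static SETTINGS: &[SettingMeta] = &[
    SettingMeta { name: "concurrency", description: "hypothetical: network permits" },
    SettingMeta { name: "registry", description: "hypothetical: registry URL" },
    SettingMeta { name: "store-dir", description: "hypothetical: store location" },
];

/// O(log n) lookup; returns `None` when the name is not in the table.
fn find(name: &str) -> Option<&'static SettingMeta> {
    SETTINGS
        .binary_search_by(|meta| meta.name.cmp(name))
        .ok()
        .map(|idx| &SETTINGS[idx])
}
```

A binary search silently returns wrong answers on an unsorted table, so a test asserting the sort order is a cheap guard.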

State

  • package_json_meta mtime + size fast-path skips BLAKE3 when both unchanged
  • additive serde field, forward-compatible with older state files
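
The mtime + size fast path can be sketched as follows. `MetaSnapshot` and its fields are hypothetical, and FNV-1a stands in for BLAKE3 so the example needs no external crate.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical snapshot shape; field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
struct MetaSnapshot {
    mtime: SystemTime,
    size: u64,
    content_hash: u64,
}

/// Stand-in for BLAKE3 (64-bit FNV-1a), used only to keep the sketch std-only.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hash = 0xcbf2_9ce4_8422_2325u64;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

fn take_snapshot(path: &Path) -> io::Result<MetaSnapshot> {
    let meta = fs::metadata(path)?;
    Ok(MetaSnapshot {
        mtime: meta.modified()?,
        size: meta.len(),
        content_hash: content_hash(&fs::read(path)?),
    })
}

fn is_unchanged(path: &Path, prev: &MetaSnapshot) -> io::Result<bool> {
    let meta = fs::metadata(path)?;
    // Fast path: one stat call, no read and no hash.
    if meta.modified()? == prev.mtime && meta.len() == prev.size {
        return Ok(true);
    }
    // Slow path: metadata moved, so fall back to comparing content hashes.
    Ok(content_hash(&fs::read(path)?) == prev.content_hash)
}
```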

CLI

  • find_project_root + find_workspace_root memoize via ProcessCache
  • 7 sites borrow process_env() slice instead of cloning a fresh Vec<(String, String)>

Lockfile

  • dep_path_filename::short_hash SHA-256 → BLAKE3 (internal-only, not interop)
  • parse_json clone-then-mutate contract documented

Killswitches

| env | falls back to |
| --- | --- |
| AUBE_DISABLE_STREAMING_SHA512 | buffered SHA-512 verify pass |
| AUBE_DISABLE_SPECULATIVE_TLS | no prewarm |
| AUBE_DISABLE_CRITICAL_PATH | resolver-discovery fetch order |
| AUBE_DISABLE_PARALLEL_IMPORT | serial tar entry writes |
| AUBE_DISABLE_MMAP_BLAKE3 | streaming BLAKE3 |
| AUBE_DISABLE_SNAPSHOTS | per-file fs::copy |
| AUBE_CONCURRENCY=N | clamped fixed permit count |
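
A hedged sketch of how an AUBE_CONCURRENCY override might be parsed. The floor (8) and default (128) appear elsewhere in this PR thread; the ceiling, the helper `parse_concurrency`, and the choice to reject out-of-range values (the behavior the review summary describes) are assumptions.

```rust
const DEFAULT_CONCURRENCY: u32 = 128;
const FLOOR: u32 = 8;
/// Hypothetical upper bound; the PR does not state one.
const CEILING: u32 = 1024;

/// Pure helper so the parsing rules can be tested without touching the environment.
fn parse_concurrency(raw: Option<&str>) -> Option<u32> {
    let n: u32 = raw?.trim().parse().ok()?;
    // Out-of-range values are rejected here. A clamping variant would
    // instead return `Some(n.clamp(FLOOR, CEILING))`.
    (FLOOR..=CEILING).contains(&n).then_some(n)
}

fn parse_concurrency_env() -> Option<u32> {
    parse_concurrency(std::env::var("AUBE_CONCURRENCY").ok().as_deref())
}

/// Hypothetical caller: fall back to the default when no valid override is set.
fn effective_concurrency(raw: Option<&str>) -> u32 {
    parse_concurrency(raw).unwrap_or(DEFAULT_CONCURRENCY)
}
```

With the rejecting variant, `AUBE_CONCURRENCY=4` yields the default of 128 and not 8, which is the usability concern the review raises.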

Round 2 (post-Greptile)

Fixed

  • Linker probe cache now uses entry().or_insert(strategy) (first-write-wins) instead of .insert() (last-write-wins). Two concurrent linker probes (prewarm + final) racing on shared test files could clobber a correct Reflink result with a Copy fallback for the rest of the process
  • DiskCache::write_bytes now calls create_dir_all(parent) before atomic_write. Under the old code, a cold first write would fail with NotFound
  • verify_precomputed_sha512 distinguishes "malformed base64" and "decoded N bytes, expected 64" from an integrity mismatch: three clear messages instead of one misleading bucket
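
The first-write-wins fix for the probe cache comes down to `entry().or_insert()` versus `insert()`. A minimal sketch, with `ProbeCache` and `record` as hypothetical names and a `Mutex` standing in for whatever lock the linker actually uses:

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkStrategy {
    Reflink,
    Copy,
}

#[derive(Default)]
struct ProbeCache {
    inner: Mutex<HashMap<(PathBuf, PathBuf), LinkStrategy>>,
}

impl ProbeCache {
    /// Records a probe result and returns the strategy that is now cached.
    /// `or_insert` keeps an existing entry, so a later (possibly degraded)
    /// probe cannot overwrite an earlier result for the same (src, dst).
    fn record(&self, src: PathBuf, dst: PathBuf, probed: LinkStrategy) -> LinkStrategy {
        let mut map = self.inner.lock().unwrap();
        *map.entry((src, dst)).or_insert(probed)
    }
}
```

With plain `insert`, the second caller's `Copy` would replace the first caller's `Reflink`; with `or_insert`, both callers end up observing `Reflink`.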

Simplified

  • Deleted dead AIMD scaffolding from aube_util::concurrency. ConcurrencyProbe + bump/back_off/AdjustReason had zero non-test callers. Replaced with parse_concurrency_env() -> Option<u32> (~120 LOC removed)
  • Error::IntegrityAlgoUnsupported variant deleted. verify_precomputed_sha512 returns Result<bool, Error> so callers detect non-SHA-512 fallback without matching on a sentinel error
  • import_verified_tarball_streamed accepts Option<&[u8; 64]>, both call sites collapse from 22-line match arms to single calls
  • is_likely_native_build allowlist trimmed: ws and sass are pure-JS, only node-sass is the actual native build

Tests

  • verify_precomputed_sha512 unit tests (happy / mismatch / non-SHA-512 fallback / malformed)
  • is_likely_native_build + sort stability tests
  • parse_concurrency_env env-bound tests

Round 3 (cold-path audit, 6-agent sweep)

Targeted at items the agents flagged as cold-relevant only. All preserve symlink topology, lockfile bytes, CAS bytes.

Registry — biggest single cold win

  • parse_full_response: bytes.to_vec() was cloning a 5-50 MB packument body on every fetch. Replaced with Bytes::try_into_mut zero-copy → BytesMut, dropped the dead serde_json::from_slice fallback (simd-json is a strict superset of RFC 8259 for valid JSON). Estimated ~1-3 s saved on a 200-packument cold install
  • same_host() URL re-parsing on every authed request: cached self.config.registry parse via OnceLock<reqwest::Url> on RegistryClient. Comparison shape (scheme + host + port) preserved byte-for-byte to keep the global-auth-token leak guard semantics identical

Linker

  • Pre-create aube_dir + scope dirs once before the GVS step1 par_iter (both branches), drop the per-package mkdirp(parent) that paid 1.4k stat syscalls on a typical install
  • materialize_into parents BTreeSet<PathBuf> → Vec + sort_unstable + dedup. On heavy packages (typescript ~5k entries, next ~3k) the BTreeSet's per-insert log-N PathBuf comparison on 50-byte paths was a real cost
  • claimed: HashSet<String> → FxHashSet<&str> in both hoist passes. Drops pkg.name.clone() × hundreds + SipHash overhead
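
The BTreeSet to Vec + sort_unstable + dedup change can be illustrated as below. `unique_parents` is a hypothetical helper that takes entry paths as strings; the real materialize_into signature is not shown in this PR.

```rust
use std::path::{Path, PathBuf};

/// Collect each entry's parent directory, then sort and dedup in one pass
/// at the end, avoiding a log-N path comparison per insert.
fn unique_parents(entries: &[&str]) -> Vec<PathBuf> {
    let mut parents: Vec<PathBuf> = entries
        .iter()
        .filter_map(|entry| Path::new(entry).parent())
        // Top-level files have an empty parent; nothing to create for them.
        .filter(|parent| !parent.as_os_str().is_empty())
        .map(Path::to_path_buf)
        .collect();
    // `dedup` only removes adjacent duplicates, so the sort must come first.
    parents.sort_unstable();
    parents.dedup();
    parents
}
```

The output matches what iterating a BTreeSet would produce (sorted, unique); only the cost profile differs.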

Store

  • normalize_tar_entry_path pre-sizes output via String::with_capacity(raw.as_os_str().len()). Normalized form never grows beyond input, so the preallocation eliminates reallocation of the String's backing buffer on every tar entry

Test plan

  • cargo fmt --check clean
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo test --workspace 1289+ assertions, 0 failures
  • hermetic bench (mise run bench:bump) shows no cold-install regression vs main
  • verdaccio access-log diff: same set of unique URLs requested
  • store byte-identity snapshot unchanged
  • lockfile byte-identity unchanged on fixtures/medium

Real-world cold install bench (Windows, real npm registry)

3 runs each project, fresh aube store + cache + node_modules between runs. Min wall (cleanest signal):

| project | upstream main 1.6.2 | this PR | × faster |
| --- | --- | --- | --- |
| svelte (56 pkg) | 1393 ms | 1386 ms | 1.01× |
| vue (117 pkg) | 1590 ms | 1360 ms | 1.17× |
| nextjs (336 pkg) | 14071 ms | 9160 ms | 1.54× |
| babylon (21 pkg) | ~6000 ms | 3186 ms | ~1.9× |

vs bun 1.3.13 cold (real npm registry, fresh cache):

| project | bun | this PR | × faster |
| --- | --- | --- | --- |
| svelte | 7598 ms | 1438 ms | 5.28× |
| vue | 13792 ms | 1501 ms | 9.19× |
| nextjs | 46971 ms | 8177 ms | 5.74× |
| babylon | 4217 ms | 3366 ms | 1.25× |


greptile-apps Bot commented May 2, 2026

Greptile Summary

Second performance pass over the install hot path: streaming SHA-512 during tarball download (skipping post-buffer hash), parallel CAS writes for large tarballs (≥256 entries), TLS/TCP prewarm before resolver starts, critical-path fetch reorder for native-build packages, and a collection of smaller wins (pre-dedup parent dirs, FxHashSet for claimed-name sets, cached URL parse for auth header). Both previous P1/P2 findings are addressed: the reflink-probe cache now uses entry().or_insert() (first-write-wins) and DiskCache::write_bytes creates the shard directory before writing.

Confidence Score: 5/5

Safe to merge; all previous P1 findings are resolved and remaining findings are P2 style/usability concerns.

No P0 or P1 issues remain. The three P2 comments are: a UX edge case in AUBE_CONCURRENCY below-floor clamping, duplicated retry loop code, and the removal of a serde_json parse fallback. None affect correctness on the standard install path.

crates/aube-registry/src/client.rs (retry loop duplication, serde_json fallback removal), crates/aube-util/src/concurrency.rs (out-of-range clamp behavior)

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/aube-linker/src/lib.rs | Fixes the previous P1 reflink-probe cache race (insert→entry/or_insert), pre-creates parent dirs serially before par_iter, and switches two HashSet to FxHashSet<&str>. Parent-dir pre-creation verified correct: both pre-pass and par_iter filter on same local_source condition using same aube_dir.join(...) path. |
| crates/aube-registry/src/client.rs | Adds streaming SHA-512 tarball fetch, TLS prewarm, cached same_host check, UA string, and Accept-Encoding for packuments. Previous SHA-512-on-every-response concern is resolved. Two P2s: retry loop duplicated between streaming and buffered variants; serde_json fallback removed from JSON parsing. |
| crates/aube-store/src/lib.rs | Adds two-phase parallel import (serial tar walk + rayon batch CAS writes above 256-entry threshold), verify_precomputed_sha512, and #[non_exhaustive] on Error enum. already_verified fallthrough logic is correct. Tests are thorough. |
| crates/aube-util/src/concurrency.rs | New module for AUBE_CONCURRENCY env override. Out-of-range values return None and fall back to default (128) instead of clamping to floor (8), which can worsen throttling for users trying to limit concurrency. |
| crates/aube-util/src/hash.rs | Adds blake3_hash_file with mmap-rayon path for large files and streaming fallback. Hasher reset on mmap error before streaming fallback is correct. AUBE_DISABLE_MMAP_BLAKE3 killswitch wired in. |
| crates/aube-util/src/snapshot.rs | New cross-platform reflink-aware tree copy. Windows symlink dir/file discrimination is correct and documented with limitations. Tests cover files, empty src, and dst-exists guard. |
| crates/aube/src/commands/install/lifecycle.rs | Adds import_verified_tarball_streamed which short-circuits the integrity re-hash when a precomputed SHA-512 digest matches. Fallthrough to buffered path for non-SHA-512 SRI is correct. |
| crates/aube/src/commands/install/mod.rs | Wires streaming SHA-512 into both lockfile and catch-up fetch paths, adds critical-path sort (native builds first), TLS prewarm, and AUBE_CONCURRENCY override. Killswitch hoisted out of per-tarball loop correctly. |
| crates/aube-util/src/cache.rs | Fixes previous P2: write_bytes now calls create_dir_all on shard parent before atomic_write. ProcessCache concurrency contract clarified in doc comment. |

Reviews (4): Last reviewed commit: "perf: drop packument vec clone, batch li..."

@imjustprism imjustprism marked this pull request as draft May 2, 2026 12:16
@imjustprism imjustprism marked this pull request as ready for review May 2, 2026 16:04
@jdx jdx merged commit 7a231d6 into endevco:main May 2, 2026
17 checks passed