perf: streaming sha512, parallel cas, tls prewarm, fetch reorder #469
jdx merged 4 commits into endevco:main
Greptile Summary

Second performance pass over the install hot path: streaming SHA-512 during tarball download (skipping post-buffer hash), parallel CAS writes for large tarballs (≥256 entries), TLS/TCP prewarm before resolver starts, critical-path fetch reorder for native-build packages, and a collection of smaller wins (pre-dedup parent dirs, FxHashSet for claimed-name sets, cached URL parse for auth header). Both previous P1/P2 findings are addressed: the reflink-probe cache now uses

Confidence Score: 5/5

Safe to merge; all previous P1 findings are resolved and remaining findings are P2 style/usability concerns. No P0 or P1 issues remain. The three P2 comments are: a UX edge case in `AUBE_CONCURRENCY` below-floor clamping, duplicated retry loop code, and the removal of a `serde_json` parse fallback. None affect correctness on the standard install path.

- `crates/aube-registry/src/client.rs` (retry loop duplication, serde_json fallback removal)
- `crates/aube-util/src/concurrency.rs` (out-of-range clamp behavior)

Important Files Changed
Reviews (4): Last reviewed commit: "perf: drop packument vec clone, batch li..."
Second pass over install hot path. Bounded changes, no on-disk output drift, no lockfile-byte drift, no fetched-byte drift.
Added
Shared utils (`aube-util`)
- `cache::{ProcessCache, DiskCache, FreshnessSnapshot}` for in-memory + disk + mtime/size/blake3 freshness primitives
- `buf::{with_scratch_string, with_scratch_bytes}` thread-local scratch buffers
- `hash::{Blake3Builder, TeeReader, ByteHasher, blake3_hash_file}` length-prefixed hasher, streaming tee, mmap-rayon BLAKE3 over 4 MiB
- `fs_atomic::write_excl` direct `O_CREAT|O_EXCL` write returning `WriteOutcome`
- `concurrency::parse_concurrency_env` clamped `AUBE_CONCURRENCY=N` override
- `snapshot::clone_tree` reflink-aware tree copy (CoW on APFS / btrfs / ReFS, fallback `fs::copy`)

Install pipeline
- `RegistryClient::prewarm_connection` fire-and-forget HEAD warms TLS + TCP + HTTP/2 behind manifest parse
- `RegistryClient::fetch_tarball_bytes_streaming_sha512` streams SHA-512 over wire chunks, skips post-buffer hash pass
- `aube_store::verify_precomputed_sha512` returns `Result<bool, Error>` (`true` = matched, `false` = non-SHA-512 fallback)
- `import_verified_tarball_streamed` consumes precomputed digest, falls through to buffered path on legacy SRI
- `is_likely_native_build` allowlist for critical-path fetch reorder

Changed
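The critical-path fetch reorder from the install pipeline list above can be pictured with a small sketch. This is not the PR's code: the allowlist contents (beyond `node-sass`, which the description names), the helper `reorder_for_critical_path`, and the queue type are assumptions made for illustration. It relies on `sort_by_key` being a stable sort, so packages keep their relative order within the native and non-native groups.

```rust
/// Hypothetical allowlist. Only `node-sass` is named in this PR; the second
/// entry is a placeholder for illustration.
const NATIVE_BUILD_ALLOWLIST: &[&str] = &["node-sass", "example-native-addon"];

/// Returns true when the package name is on the allowlist.
fn is_likely_native_build(name: &str) -> bool {
    NATIVE_BUILD_ALLOWLIST.contains(&name)
}

/// Moves likely native-build packages to the front of the fetch queue so
/// their (slow) build step can start earlier.
/// `sort_by_key` is stable, so order within each group is preserved.
fn reorder_for_critical_path(queue: &mut Vec<String>) {
    // `false` sorts before `true`, so negate to put native builds first.
    queue.sort_by_key(|name| !is_likely_native_build(name));
}
```

A stable sort on a boolean key keeps the change bounded: nothing is reordered except the promotion of allowlisted packages.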
Install
- `sort_by_key`
- `Arc<GraphHashes>` between prewarm and link, no double-compute
- `Arc<RegistryClient>` instead of rebuilding

Store
- `import_tarball`: serial tar walk stages entries, rayon batch writes when entries ≥ 256
- `O_CREAT|O_EXCL` direct write replaces tempfile + persist-noclobber for content path
- `CasWriteOutcome` skips redundant len check on `Created` outcome
- `cas_file_matches_len` only on `AlreadyExisted` arm

Registry
- `Accept-Encoding: gzip, br, zstd` on packument fetches (tarballs stay `identity`)
- `aube/<workspace-version> (<os> <arch>)` UA replaces hardcoded `aube/0.1.0`
- `hickory-dns` enabled for in-process async DNS cache
- `read_body_capped` + `retry_bytes_body_read` delegate through one streaming chunk loop

Linker
- `OnceLock<RwLock<HashMap>>` on Windows
- `(src, dst)`
- `make_executable` chmod on hardlink + reflink paths (CAS source already `0o755`)

Resolver
Manifest
- `PackageJson::from_path_cached` returns `Arc<PackageJson>` from process-wide cache
- `WorkspaceConfig::load` typed cache parallel to existing raw cache

Settings
- `meta::find` switches from linear scan to `binary_search_by` over the codegen-sorted table

State
- `package_json_meta` mtime + size fast-path skips BLAKE3 when both unchanged

CLI
- `find_project_root` + `find_workspace_root` memoize via `ProcessCache`
- `process_env()` slice instead of cloning a fresh `Vec<(String, String)>`

Lockfile
- `dep_path_filename::short_hash` SHA-256 → BLAKE3 (internal-only, not interop)
- `parse_json` clone-then-mutate contract documented

Killswitches
- `AUBE_DISABLE_STREAMING_SHA512`
- `AUBE_DISABLE_SPECULATIVE_TLS`
- `AUBE_DISABLE_CRITICAL_PATH`
- `AUBE_DISABLE_PARALLEL_IMPORT`
- `AUBE_DISABLE_MMAP_BLAKE3`
- `AUBE_DISABLE_SNAPSHOTS` `fs::copy`
- `AUBE_CONCURRENCY=N`

Round 2 (post-Greptile)
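For readers unfamiliar with the killswitch list above, here is a rough sketch of how a clamped `AUBE_CONCURRENCY=N` override and a boolean killswitch might be read. The clamp bounds, the pure helper `parse_concurrency`, the `killswitch_set` helper, and the "non-empty and not `0`" rule are all assumptions for illustration; the PR states only that `parse_concurrency_env() -> Option<u32>` exists and clamps.

```rust
use std::env;

/// Hypothetical clamp bounds; the real floor and ceiling are not stated in this PR.
const MIN_CONCURRENCY: u32 = 1;
const MAX_CONCURRENCY: u32 = 256;

/// Pure parsing step, split out so it can be tested without touching the
/// process environment. Unparsable input yields `None`; parsable input is
/// clamped into `[MIN_CONCURRENCY, MAX_CONCURRENCY]`.
fn parse_concurrency(raw: Option<&str>) -> Option<u32> {
    let n: u32 = raw?.trim().parse().ok()?;
    Some(n.clamp(MIN_CONCURRENCY, MAX_CONCURRENCY))
}

/// Reads `AUBE_CONCURRENCY`; `None` means "unset or unparsable, use the default".
fn parse_concurrency_env() -> Option<u32> {
    parse_concurrency(env::var("AUBE_CONCURRENCY").ok().as_deref())
}

/// Assumed killswitch semantics: any non-empty value other than "0"
/// disables the corresponding optimization.
fn killswitch_set(name: &str) -> bool {
    matches!(env::var(name), Ok(v) if !v.is_empty() && v != "0")
}
```

Under these assumptions a caller would guard a fast path with something like `if !killswitch_set("AUBE_DISABLE_STREAMING_SHA512") { ... }` and fall back to the buffered path otherwise. Silently clamping an out-of-range value, rather than rejecting it, is the behavior the review summary flags as a usability edge case.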
Fixed
- `entry().or_insert(strategy)` first-write-wins, was `.insert()` last-write-wins. Two concurrent linker probes (prewarm + final) racing on shared test files could clobber a correct Reflink result with a Copy fallback for the rest of the process
- `DiskCache::write_bytes` does `create_dir_all(parent)` before `atomic_write`. Cold first write would fail with `NotFound` under the old code
- `verify_precomputed_sha512` distinguishes `malformed base64` and `decoded N bytes, expected 64` from `integrity mismatch`. Three clear messages instead of one misleading bucket

Simplified
- `aube_util::concurrency`. `ConcurrencyProbe` + `bump`/`back_off`/`AdjustReason` had zero non-test callers. Replaced with `parse_concurrency_env() -> Option<u32>` (~120 LOC removed)
- `Error::IntegrityAlgoUnsupported` variant deleted. `verify_precomputed_sha512` returns `Result<bool, Error>` so callers detect non-SHA-512 fallback without matching on a sentinel error
- `import_verified_tarball_streamed` accepts `Option<&[u8; 64]>`, both call sites collapse from 22-line match arms to single calls
- `is_likely_native_build` allowlist trimmed: `ws` and `sass` are pure-JS, only `node-sass` is the actual native build

Tests
- `verify_precomputed_sha512` unit tests (happy / mismatch / non-SHA-512 fallback / malformed)
- `is_likely_native_build` + sort stability tests
- `parse_concurrency_env` env-bound tests

Round 3 (cold-path audit, 6-agent sweep)
Targeted at items the agents flagged as cold-relevant only. All preserve symlink topology, lockfile bytes, CAS bytes.
Registry — biggest single cold win
- `parse_full_response`: `bytes.to_vec()` was cloning a 5-50 MB packument body on every fetch. Replaced with `Bytes::try_into_mut` zero-copy → `BytesMut`, dropped the dead `serde_json::from_slice` fallback (simd-json is a strict superset of RFC 8259 for valid JSON). Estimated ~1-3 s saved on a 200-packument cold install
- `same_host()` URL re-parsing on every authed request: cached `self.config.registry` parse via `OnceLock<reqwest::Url>` on `RegistryClient`. Comparison shape (scheme + host + port) preserved byte-for-byte to keep the global-auth-token leak guard semantics identical

Linker
- `aube_dir` + scope dirs once before the GVS step1 par_iter (both branches), drop the per-package `mkdirp(parent)` that paid 1.4k stat syscalls on a typical install
- `materialize_into` parents `BTreeSet<PathBuf>` → `Vec` + `sort_unstable` + `dedup`. On heavy packages (typescript ~5k entries, next ~3k) the BTreeSet's per-insert log-N PathBuf comparison on 50-byte paths was a real cost
- `claimed: HashSet<String>` → `FxHashSet<&str>` in both hoist passes. Drops `pkg.name.clone()` × hundreds + SipHash overhead

Store
- `normalize_tar_entry_path` pre-sizes output via `String::with_capacity(raw.as_os_str().len())`. Normalized form never grows beyond input, so the prealloc eliminates the `String` Vec growth on every tar entry

Test plan
- `cargo fmt --check` clean
- `cargo clippy --workspace --all-targets -- -D warnings` clean
- `cargo test --workspace` 1289+ assertions, 0 failures
- (`mise run bench:bump`) shows no cold-install regression vs main
- `fixtures/medium`

Real-world cold install bench (Windows, real npm registry)
3 runs each project, fresh aube store + cache + node_modules between runs. Min wall (cleanest signal):
vs bun 1.3.13 cold (real npm registry, fresh cache):