perf: cold install pipeline overhaul #522
cold install wins on Windows bench (release, same-volume): aube 1.80x-8.75x faster than bun across svelte/vite/next/babylon. linux + macos paths gated cross-platform safe.

## perf
- materialize-stream into lockfile-fast-path (run_gvs_prewarm_materializer helper extracted, both branches share). hides 30-200ms of GVS reflinks behind the in-flight download tail
- streaming tarball pipeline (gated AUBE_TARBALL_STREAM=1), HTTP body chunks pipe through SHA-512 + gz + tar + CAS via tokio mpsc bridge. bounded by tarball_max_bytes cap + non-SHA-512 SRI fallback to buffered
- O_TMPFILE + linkat CAS publish on linux with EOPNOTSUPP fallback to tempfile path. posix_fallocate before write avoids ext4 fragmentation. posix_fadvise(DONTNEED) after publish frees page cache, drop sync_data to match the no-fsync CAS policy
- bounded mpsc(2048) at materialize handoff, replaces unbounded_channel
- pin tokio worker_threads (cpu.min(8)) and max_blocking_threads (64), AUBE_TOKIO_WORKERS / AUBE_TOKIO_BLOCKING env overrides
- foldhash-backed FxHashMap aliases for resolver and lockfile graph_hash, better avalanche than FxHash on dep_path keys with peer-context suffixes
- pre-size resolver caches (cache, resolved_versions, visited)
- thread-local node_semver::Version parse cache mirrors Range cache
- PARALLEL_IMPORT_THRESHOLD 256 to 16 (median npm tarball is 7 files)
- Windows FILE_ATTRIBUTE_NOT_CONTENT_INDEXED on store root, hints Defender + Search to skip
- pre-allocate to_fetch Vec capacity from check_results.len()
- hoist AUBE_TARBALL_STREAM env reads outside per-tarball loops
- single-version manifest endpoint API (fetch_single_version_metadata)

## fix
- propagate state::remove_state errors at --force and GVS-transition sites. silent swallow let permission-denied or Windows-locked sidecars survive, defeating freshness check
- reject tar entry declared 0 bytes with non-zero stream payload, synthetic-entry injection
- accept GNU LongName / LongLink metadata records, tar crate folds them into Entry::path() automatically
- apply_peer_contexts returns Result<LockfileGraph, Error>, divergence at MAX_ITERATIONS becomes fatal instead of warning + shipping broken graph. new Error::PeerContextDivergence variant

## refactor
- aube_util::collections::FxMap / FxSet (foldhash-backed aliases) replace inline rustc_hash defs in 3 places
- aube_util::fs::cross_volume + set_not_content_indexed move out of inline cfg-gated blocks
- aube_util::io::ChunkReader generic mpsc-to-Read bridge, replaces inline impl in install/lifecycle
- import_tarball generic over R: Read via import_tarball_reader, &[u8] derefs naturally so existing callers unchanged
- patches::load_patches_for_linker bundles the (Patches, hashes) shape build, replaces 3 inline copies
- lifecycle::resolve_prewarm_shared bundles node_version + build_policy prep, replaces 2 inline copies
- lifecycle::run_import_on_blocking centralizes spawn_blocking + clone + into_diagnostic dance, replaces 2 inline copies
- materialize_channel + spawn_gvs_prewarm + MATERIALIZE_CHANNEL_CAPACITY consolidate the channel construction + tokio::spawn boilerplate
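
For readers unfamiliar with the mpsc-to-Read bridge idea behind `aube_util::io::ChunkReader`, here is a minimal sketch of the general technique. It is not the code from this PR: it uses `std::sync::mpsc::sync_channel` as a stand-in for the tokio mpsc channel so it builds without dependencies, and `ChunkReaderSketch`, its fields, and the `collect_stream` helper are hypothetical names chosen for illustration.

```rust
use std::io::{self, Read};
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for a channel-to-`Read` bridge: byte chunks
/// arrive on a channel and are exposed as one blocking byte stream.
struct ChunkReaderSketch {
    rx: Receiver<Vec<u8>>,
    current: Vec<u8>, // chunk currently being drained
    pos: usize,       // read offset into `current`
}

impl ChunkReaderSketch {
    fn new(rx: Receiver<Vec<u8>>) -> Self {
        Self { rx, current: Vec::new(), pos: 0 }
    }
}

impl Read for ChunkReaderSketch {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        // Refill once the current chunk is drained. Empty chunks are
        // skipped so they are never mistaken for end of stream.
        while self.pos == self.current.len() {
            match self.rx.recv() {
                Ok(chunk) => {
                    self.current = chunk;
                    self.pos = 0;
                }
                // All senders dropped: treat as EOF.
                Err(_) => return Ok(0),
            }
        }
        let n = buf.len().min(self.current.len() - self.pos);
        buf[..n].copy_from_slice(&self.current[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Feeds `chunks` through a bounded channel from a producer thread and
/// reads them back through the bridge, as a consumer such as a
/// gzip/tar decoder would.
fn collect_stream(chunks: Vec<Vec<u8>>) -> io::Result<Vec<u8>> {
    // Small bound so the producer experiences backpressure.
    let (tx, rx) = sync_channel::<Vec<u8>>(2);
    let producer = thread::spawn(move || {
        for chunk in chunks {
            if tx.send(chunk).is_err() {
                break; // reader went away
            }
        }
    });
    let mut out = Vec::new();
    ChunkReaderSketch::new(rx).read_to_end(&mut out)?;
    producer.join().expect("producer thread panicked");
    Ok(out)
}
```

Because the bridge only implements `Read`, any importer that is generic over `R: Read` can consume either a buffered slice or the streamed chunks without changing its signature.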
holy guacamole is all I got to say
Greptile Summary
This PR overhauls the cold-install pipeline for significant speed wins on Windows (1.8x–8.75x faster than bun), with Linux and macOS improvements as well. The streaming tarball pipeline (
Confidence Score: 5/5
Safe to merge. The streaming path is opt-in and off by default, so the two minor edge cases found do not affect existing install behavior. All new fast paths are cfg-gated or env-gated and fall back gracefully. Both findings are limited to the opt-in streaming path and do not affect the default code path. The streaming tarball pipeline in lifecycle.rs and env-flag parsing in mod.rs are the areas to revisit before making AUBE_TARBALL_STREAM default-on.
Important Files Changed
Reviews (4): Last reviewed commit: "fixes"
Summary
cold install wins on Windows bench (release, same-volume): aube 1.80x-8.75x faster than bun across svelte/vite/next/babylon. linux + macos paths cfg-gated cross-platform safe.
bench:
`hyperfine --warmup 1 --runs 3 --prepare 'rm -rf node_modules aube-lock.yaml bun.lock'` over each project, target/release/aube vs bun 1.3.13 on Windows 11, NTFS, same-volume project + store.

Added
- `aube_util::collections::FxMap` / `FxSet` (foldhash-backed) shared aliases
- `aube_util::fs::cross_volume` (windows drive letter, unix dev id)
- `aube_util::fs::set_not_content_indexed` (windows defender / search hint)
- `aube_util::io::ChunkReader` (mpsc-to-Read bridge for streaming tarball pipeline)
- `RegistryClient::start_tarball_stream` + `tarball_max_bytes` accessor (streaming fetch entry point)
- `RegistryClient::fetch_single_version_metadata` API (lockfile drift refetch fast path)
- `Error::PeerContextDivergence(usize)` resolver variant
- `WARN_AUBE_PROGRESS_OVERFLOW` and `MATERIALIZE_CHANNEL_CAPACITY` exposed for downstream
- `AUBE_TARBALL_STREAM` env (opt-in streaming pipeline)
- `AUBE_DISABLE_TARBALL_STREAM` env (kill switch)
- `AUBE_DISABLE_O_TMPFILE` env (linux fast path kill switch)
- `AUBE_TOKIO_WORKERS` / `AUBE_TOKIO_BLOCKING` env (runtime tuning)

Changed
perf
- materialize-stream into lockfile-fast-path: `run_gvs_prewarm_materializer` helper extracted, both lockfile + no-lockfile branches share. hides 30-200ms of GVS reflinks behind the in-flight download tail. biggest single win on cold-CI shape.
- streaming tarball pipeline (gated `AUBE_TARBALL_STREAM=1`): HTTP body chunks pipe through SHA-512 + gz + tar + CAS via mpsc bridge. bounded by `tarball_max_bytes` cap. non-SHA-512 SRI falls back to buffered.
- O_TMPFILE + linkat CAS publish on linux with EOPNOTSUPP fallback to tempfile path. `posix_fallocate` before write avoids ext4 fragmentation. `posix_fadvise(DONTNEED)` after publish frees page cache. dropped `sync_data` to match the no-fsync CAS policy.
- bounded `mpsc(2048)` at materialize handoff (was unbounded)
- pin tokio worker_threads (`cpu.min(8)`) and max_blocking_threads (64)
- thread-local `node_semver::Version` parse cache mirrors range cache
- `PARALLEL_IMPORT_THRESHOLD` 256 to 16 (median npm tarball is 7 files)
- Windows `FILE_ATTRIBUTE_NOT_CONTENT_INDEXED` on store root
- pre-allocate `to_fetch` Vec capacity from `check_results.len()`

refactor
- `import_tarball` generic over `R: Read` via `import_tarball_reader`, &[u8] derefs so existing callers unchanged
- `patches::load_patches_for_linker` bundles `(Patches, hashes)` shape (replaces 3 inline copies)
- `lifecycle::resolve_prewarm_shared` bundles `node_version` + `build_policy` prep (replaces 2 inline copies)
- `lifecycle::run_import_on_blocking` centralizes `spawn_blocking` + clone + into_diagnostic (replaces 2 inline copies)
- `materialize_channel` + `spawn_gvs_prewarm` + `MATERIALIZE_CHANNEL_CAPACITY` consolidate channel + tokio::spawn boilerplate

Fixed
- propagate `state::remove_state` errors at `--force` and GVS-transition sites. silent swallow let permission-denied or Windows-locked sidecars survive, defeating the freshness check
- accept GNU LongName / LongLink metadata records (tar crate folds them into `Entry::path()` automatically)
- `apply_peer_contexts` divergence at `MAX_ITERATIONS` becomes fatal instead of warning + shipping broken graph. returns `Result<LockfileGraph, Error>` now

Cross-platform
every platform-specific code path is `cfg`-gated:
- `cross_volume`: drive letter on windows, `MetadataExt::dev` on unix (linux, macos)
- `set_not_content_indexed`: windows Defender / Search hint

linux gets the brand-new O_TMPFILE + linkat fast path that windows does not have. macos benefits from `clonefile()` already wired via `reflink_copy`. expect cold install ratios to be even larger on macos than on windows.

Test plan
- `cargo build` clean
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo test --lib` all 956+ unit tests pass
- … `AUBE_TARBALL_STREAM=1` (left-pad et al, fresh fetch)
- … (`mise run bench`) on linux
- … `AUBE_TARBALL_STREAM=1` default-on
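
Since the last item concerns flipping `AUBE_TARBALL_STREAM` to default-on while `AUBE_DISABLE_TARBALL_STREAM` stays available as a kill switch, the sketch below shows one way the two variables could compose. It is an illustration under stated assumptions, not the parsing in this PR: the rule that only the value `1` counts as set, the `default_on` parameter, and all function names are hypothetical.

```rust
/// Assumption for this sketch: a flag counts as set only when its value
/// is exactly "1" (surrounding whitespace ignored). The real parsing
/// rule may differ.
fn is_set(value: Option<&str>) -> bool {
    matches!(value.map(str::trim), Some("1"))
}

/// Pure decision function so the policy can be tested without touching
/// the process environment.
/// - `opt_in`: value of AUBE_TARBALL_STREAM, if present
/// - `kill_switch`: value of AUBE_DISABLE_TARBALL_STREAM, if present
/// - `default_on`: hypothetical build-time default for the streaming path
fn streaming_enabled(opt_in: Option<&str>, kill_switch: Option<&str>, default_on: bool) -> bool {
    // The kill switch always wins, so users keep an escape hatch even
    // after the default flips.
    if is_set(kill_switch) {
        return false;
    }
    default_on || is_set(opt_in)
}

/// Reads both variables once; callers would hoist this call outside any
/// per-tarball loop and reuse the result.
fn streaming_enabled_from_env(default_on: bool) -> bool {
    let opt_in = std::env::var("AUBE_TARBALL_STREAM").ok();
    let kill_switch = std::env::var("AUBE_DISABLE_TARBALL_STREAM").ok();
    streaming_enabled(opt_in.as_deref(), kill_switch.as_deref(), default_on)
}
```

Keeping the decision in a pure function makes it cheap to unit-test every combination of the two variables before the default changes.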