
perf: cold install pipeline overhaul #522

Merged
jdx merged 6 commits into endevco:main from imjustprism:perf-nextgen
May 5, 2026

@imjustprism
Contributor

Summary

cold install wins on Windows bench (release build, project and store on the same volume): aube is 1.80x-8.75x faster than bun across svelte/vite/next/babylon. linux + macos paths are cfg-gated, so the changes are cross-platform safe.

| project | aube | bun | aube faster |
| --- | --- | --- | --- |
| svelte (31 pkg) | 0.31s | 2.69s | 8.75x |
| vite (64 pkg) | 0.72s | 5.14s | 7.12x |
| next (27 pkg) | 1.64s | 2.94s | 1.80x |
| babylon (3 pkg) | 0.18s | 0.53s | 2.93x |

bench: hyperfine --warmup 1 --runs 3 --prepare 'rm -rf node_modules aube-lock.yaml bun.lock' over each project, target/release/aube vs bun 1.3.13 on Windows 11, NTFS, same-volume project + store.

Added

  • aube_util::collections::FxMap / FxSet (foldhash-backed) shared aliases
  • aube_util::fs::cross_volume (windows drive letter, unix dev id)
  • aube_util::fs::set_not_content_indexed (windows defender / search hint)
  • aube_util::io::ChunkReader (mpsc-to-Read bridge for streaming tarball pipeline)
  • RegistryClient::start_tarball_stream + tarball_max_bytes accessor (streaming fetch entry point)
  • RegistryClient::fetch_single_version_metadata API (lockfile drift refetch fast path)
  • Error::PeerContextDivergence(usize) resolver variant
  • WARN_AUBE_PROGRESS_OVERFLOW and MATERIALIZE_CHANNEL_CAPACITY exposed for downstream
  • AUBE_TARBALL_STREAM env (opt-in streaming pipeline)
  • AUBE_DISABLE_TARBALL_STREAM env (kill switch)
  • AUBE_DISABLE_O_TMPFILE env (linux fast path kill switch)
  • AUBE_TOKIO_WORKERS / AUBE_TOKIO_BLOCKING env (runtime tuning)
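
a minimal sketch of what an mpsc-to-Read bridge like ChunkReader could look like. this is not the code from the PR: it uses std::sync::mpsc instead of tokio's channel, and the struct layout, the `io::Result<Vec<u8>>` item type, and the "sender dropped means EOF" rule are all assumptions made for illustration.

```rust
use std::io::{self, Read};
use std::sync::mpsc::Receiver;

/// Hypothetical stand-in for `aube_util::io::ChunkReader`: adapts a channel
/// of byte chunks into a blocking `Read` for a gzip/tar consumer.
pub struct ChunkReader {
    rx: Receiver<io::Result<Vec<u8>>>,
    current: Vec<u8>,
    pos: usize,
}

impl ChunkReader {
    pub fn new(rx: Receiver<io::Result<Vec<u8>>>) -> Self {
        Self { rx, current: Vec::new(), pos: 0 }
    }
}

impl Read for ChunkReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        // Refill from the channel once the current chunk is exhausted.
        // Empty chunks are skipped so they are not mistaken for EOF.
        while self.pos == self.current.len() {
            match self.rx.recv() {
                Ok(Ok(chunk)) => {
                    self.current = chunk;
                    self.pos = 0;
                }
                // The producer forwarded a network error.
                Ok(Err(e)) => return Err(e),
                // Assumption: a dropped sender is treated as clean EOF.
                Err(_) => return Ok(0),
            }
        }
        let n = buf.len().min(self.current.len() - self.pos);
        buf[..n].copy_from_slice(&self.current[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```

with a bounded channel on the producer side, a slow consumer applies backpressure to the download instead of buffering the whole body. a real bridge would also need to tell a mid-stream drop apart from a clean end of body; the integrity check is what catches truncation in this simplified form.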

Changed

perf

  • materialize-stream into lockfile-fast-path: run_gvs_prewarm_materializer helper extracted, both lockfile + no-lockfile branches share. hides 30-200ms of GVS reflinks behind the in-flight download tail. biggest single win on cold-CI shape.
  • streaming tarball pipeline (gated AUBE_TARBALL_STREAM=1): HTTP body chunks pipe through SHA-512 + gz + tar + CAS via mpsc bridge. bounded by tarball_max_bytes cap. non-SHA-512 SRI falls back to buffered.
  • O_TMPFILE + linkat CAS publish on linux: with EOPNOTSUPP fallback to tempfile path. posix_fallocate before write avoids ext4 fragmentation. posix_fadvise(DONTNEED) after publish frees page cache. dropped sync_data to match the no-fsync CAS policy.
  • bounded mpsc(2048) at materialize handoff (was unbounded)
  • pin tokio worker_threads (cpu.min(8)) and max_blocking_threads (64)
  • foldhash for resolver + lockfile graph_hash hot maps (better avalanche on dep_path keys with peer-context suffixes)
  • pre-size resolver caches (cache, resolved_versions, visited)
  • thread-local node_semver::Version parse cache mirrors range cache
  • PARALLEL_IMPORT_THRESHOLD 256 to 16 (median npm tarball is 7 files)
  • Windows FILE_ATTRIBUTE_NOT_CONTENT_INDEXED on store root
  • pre-allocate to_fetch Vec capacity from check_results.len()
  • hoist env reads outside per-tarball loops
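
the worker pinning above can be sketched as a pure function. the function name, the handling of invalid or zero overrides, and the lower bound of 1 are assumptions; only the cpu.min(8) cap and the AUBE_TOKIO_WORKERS override come from this PR.

```rust
/// Hypothetical helper: choose the tokio worker count.
/// A valid positive override wins; otherwise cap the CPU count at 8.
pub fn worker_threads(env_override: Option<&str>, cpus: usize) -> usize {
    env_override
        // Unparseable values are ignored rather than treated as errors
        // (an assumption, not confirmed behavior).
        .and_then(|v| v.trim().parse::<usize>().ok())
        // Zero workers would be invalid, so fall through to the default.
        .filter(|&n| n > 0)
        // cpu.min(8), with a floor of 1 added for safety.
        .unwrap_or_else(|| cpus.clamp(1, 8))
}
```

the result would then be passed to tokio's `Builder::new_multi_thread().worker_threads(n)`, with `max_blocking_threads(64)` set alongside it.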

refactor

  • import_tarball generic over R: Read via import_tarball_reader, &[u8] derefs so existing callers unchanged
  • patches::load_patches_for_linker bundles (Patches, hashes) shape (replaces 3 inline copies)
  • lifecycle::resolve_prewarm_shared bundles node_version + build_policy prep (replaces 2 inline copies)
  • lifecycle::run_import_on_blocking centralizes spawn_blocking + clone + into_diagnostic (replaces 2 inline copies)
  • materialize_channel + spawn_gvs_prewarm + MATERIALIZE_CHANNEL_CAPACITY consolidate channel + tokio::spawn boilerplate
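
the import_tarball refactor relies on `&[u8]` implementing `Read`. a rough sketch of that shape, with byte counting standing in for the real gzip + tar + CAS work; both signatures and return types here are hypothetical.

```rust
use std::io::{self, Read};

/// Hypothetical reader-based entry point. Draining the reader and counting
/// bytes stands in for the real decompress + unpack + store steps.
pub fn import_tarball_reader<R: Read>(mut reader: R) -> io::Result<u64> {
    io::copy(&mut reader, &mut io::sink())
}

/// Slice-based call shape: `&[u8]` implements `Read`, so buffered callers
/// need no changes when the implementation becomes generic.
pub fn import_tarball(bytes: &[u8]) -> io::Result<u64> {
    import_tarball_reader(bytes)
}
```

the same generic function accepts a streaming source such as a channel-backed reader, which is what lets the buffered and streaming paths share one import routine.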

Fixed

  • propagate state::remove_state errors at --force and GVS-transition sites. silent swallow let permission-denied or Windows-locked sidecars survive, defeating the freshness check
  • reject tar entry declared 0 bytes with non-zero stream payload (synthetic-entry injection)
  • accept GNU LongName / LongLink metadata records (tar crate folds them into Entry::path() automatically)
  • apply_peer_contexts divergence at MAX_ITERATIONS becomes fatal instead of warning + shipping broken graph. returns Result<LockfileGraph, Error> now
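
the apply_peer_contexts change can be illustrated with a generic fixed-point loop that fails instead of returning a half-converged result. this is a sketch under assumptions: the `fixed_point` helper, the MAX_ITERATIONS value of 16, and the meaning of the usize payload (iterations attempted) are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    /// Assumed meaning: the number of iterations tried before giving up.
    PeerContextDivergence(usize),
}

/// Hypothetical cap; the real constant's value is not stated in this PR.
pub const MAX_ITERATIONS: usize = 16;

/// Applies `step` until the state stops changing. Hitting the cap is a
/// hard error, so a non-converged state is never returned to the caller.
pub fn fixed_point<T: PartialEq>(mut state: T, step: impl Fn(&T) -> T) -> Result<T, Error> {
    for _ in 0..MAX_ITERATIONS {
        let next = step(&state);
        if next == state {
            return Ok(state);
        }
        state = next;
    }
    Err(Error::PeerContextDivergence(MAX_ITERATIONS))
}
```

returning `Result` forces every caller to handle divergence explicitly, which is the behavioral difference from logging a warning and shipping the graph anyway.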

Cross-platform

every platform-specific code path is cfg-gated:

| fix | windows | linux | macos |
| --- | --- | --- | --- |
| cross_volume | drive letters | MetadataExt::dev | MetadataExt::dev |
| set_not_content_indexed | NTFS attr | no-op | no-op |
| O_TMPFILE+linkat CAS | n/a | active | n/a |
| streaming pipeline | active | active | active |
| foldhash | active | active | active |
| materialize-stream | active | active | active |
| posix_fallocate / fadvise | n/a | active | n/a |

linux gets the brand-new O_TMPFILE+linkat fast path that windows does not have. macos benefits from clonefile() already wired via reflink_copy. cold install ratios may be even larger on macos than on windows, but this is not yet measured (linux + macos benches are pending).
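
a sketch of how a cfg-gated cross_volume check could be laid out, following the table above (device ids on unix, drive letters on windows). the signature, the error handling, and the use of `std::path::absolute` on windows are assumptions, not the PR's code.

```rust
use std::io;
use std::path::Path;

/// Unix: paths are on different volumes when their device ids differ.
#[cfg(unix)]
pub fn cross_volume(a: &Path, b: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    Ok(std::fs::metadata(a)?.dev() != std::fs::metadata(b)?.dev())
}

/// Windows: compare the path prefix (drive letter or UNC share),
/// case-insensitively, after making both paths absolute.
#[cfg(windows)]
pub fn cross_volume(a: &Path, b: &Path) -> io::Result<bool> {
    use std::ffi::OsString;
    use std::path::Component;

    fn prefix(p: &Path) -> io::Result<Option<OsString>> {
        let abs = std::path::absolute(p)?;
        Ok(abs.components().find_map(|c| match c {
            Component::Prefix(pre) => Some(pre.as_os_str().to_ascii_uppercase()),
            _ => None,
        }))
    }

    Ok(prefix(a)? != prefix(b)?)
}
```

a caller can use the result to skip hardlink or reflink attempts that cannot succeed across volumes and go straight to the copy fallback.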

Test plan

  • cargo build clean
  • cargo clippy --all-targets -- -D warnings clean
  • cargo test --lib all 956+ unit tests pass
  • real-project smoke: svelte / vite / next.js / babylon installs clean (warm + cold)
  • real-project smoke with AUBE_TARBALL_STREAM=1 (left-pad et al, fresh fetch)
  • cross-volume cold install correctness (NTFS C: -> D:, copy fallback path)
  • same-volume cold bench vs bun (numbers above)
  • linux + macos benches (pending hardware access)
  • hermetic Verdaccio bench (mise run bench) on linux
  • bats coverage of mid-stream-drop / gzip-bomb-cap / SHA-mismatch rollback before flipping AUBE_TARBALL_STREAM=1 default-on

imjustprism and others added 3 commits May 5, 2026 22:03
cold install wins on Windows bench (release, same-volume): aube 1.80x-8.75x
faster than bun across svelte/vite/next/babylon. linux + macos paths
gated cross-platform safe.

## perf
- materialize-stream into lockfile-fast-path (run_gvs_prewarm_materializer
  helper extracted, both branches share). hides 30-200ms of GVS reflinks
  behind the in-flight download tail
- streaming tarball pipeline (gated AUBE_TARBALL_STREAM=1), HTTP body
  chunks pipe through SHA-512 + gz + tar + CAS via tokio mpsc bridge.
  bounded by tarball_max_bytes cap + non-SHA-512 SRI fallback to buffered
- O_TMPFILE + linkat CAS publish on linux with EOPNOTSUPP fallback to
  tempfile path. posix_fallocate before write avoids ext4 fragmentation.
  posix_fadvise(DONTNEED) after publish frees page cache, drop sync_data
  to match the no-fsync CAS policy
- bounded mpsc(2048) at materialize handoff, replaces unbounded_channel
- pin tokio worker_threads (cpu.min(8)) and max_blocking_threads (64),
  AUBE_TOKIO_WORKERS / AUBE_TOKIO_BLOCKING env overrides
- foldhash-backed FxHashMap aliases for resolver and lockfile graph_hash,
  better avalanche than FxHash on dep_path keys with peer-context suffixes
- pre-size resolver caches (cache, resolved_versions, visited)
- thread-local node_semver::Version parse cache mirrors Range cache
- PARALLEL_IMPORT_THRESHOLD 256 to 16 (median npm tarball is 7 files)
- Windows FILE_ATTRIBUTE_NOT_CONTENT_INDEXED on store root, hints
  Defender + Search to skip
- pre-allocate to_fetch Vec capacity from check_results.len()
- hoist AUBE_TARBALL_STREAM env reads outside per-tarball loops
- single-version manifest endpoint API (fetch_single_version_metadata)

## fix
- propagate state::remove_state errors at --force and GVS-transition
  sites. silent swallow let permission-denied or Windows-locked sidecars
  survive, defeating freshness check
- reject tar entry declared 0 bytes with non-zero stream payload,
  synthetic-entry injection
- accept GNU LongName / LongLink metadata records, tar crate folds them
  into Entry::path() automatically
- apply_peer_contexts returns Result<LockfileGraph, Error>, divergence
  at MAX_ITERATIONS becomes fatal instead of warning + shipping broken
  graph. new Error::PeerContextDivergence variant

## refactor
- aube_util::collections::FxMap / FxSet (foldhash-backed aliases) replace
  inline rustc_hash defs in 3 places
- aube_util::fs::cross_volume + set_not_content_indexed move out of
  inline cfg-gated blocks
- aube_util::io::ChunkReader generic mpsc-to-Read bridge, replaces
  inline impl in install/lifecycle
- import_tarball generic over R: Read via import_tarball_reader, &[u8]
  derefs naturally so existing callers unchanged
- patches::load_patches_for_linker bundles the (Patches, hashes) shape
  build, replaces 3 inline copies
- lifecycle::resolve_prewarm_shared bundles node_version + build_policy
  prep, replaces 2 inline copies
- lifecycle::run_import_on_blocking centralizes spawn_blocking + clone +
  into_diagnostic dance, replaces 2 inline copies
- materialize_channel + spawn_gvs_prewarm + MATERIALIZE_CHANNEL_CAPACITY
  consolidate the channel construction + tokio::spawn boilerplate
@jdx
Contributor

jdx commented May 5, 2026

holy guacamole is all I got to say

@greptile-apps

greptile-apps Bot commented May 5, 2026

Greptile Summary

This PR overhauls the cold-install pipeline for significant speed wins on Windows (1.8x–8.75x faster than bun), with Linux and macOS improvements as well. The streaming tarball pipeline (AUBE_TARBALL_STREAM=1), O_TMPFILE+linkat CAS fast path, foldhash migration, GVS-prewarm materializer, and bounded mpsc channel are the primary contributors.

  • Streaming pipeline: HTTP body chunks pipe through SHA-512 + gz + tar + CAS via ChunkReader mpsc bridge in spawn_blocking; gated behind AUBE_TARBALL_STREAM=1 with an AUBE_DISABLE_TARBALL_STREAM kill-switch pending full bats coverage.
  • Linux CAS fast path: O_TMPFILE+linkat via /proc/self/fd with EOPNOTSUPP/EPERM/EACCES graceful fallback and posix_fallocate pre-allocation for ext4 fragmentation avoidance.
  • Correctness fixes: apply_peer_contexts now returns Result and fatally rejects non-converging peer graphs; state::remove_state errors are propagated; synthetic tar entry injection is detected.

Confidence Score: 5/5

Safe to merge. The streaming path is opt-in and off by default, so the two minor edge cases found do not affect existing install behavior.

All new fast paths are cfg-gated or env-gated and fall back gracefully. Both findings are limited to the opt-in streaming path and do not affect the default code path.

The streaming tarball pipeline in lifecycle.rs and env-flag parsing in mod.rs are the areas to revisit before making AUBE_TARBALL_STREAM default-on.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/aube/src/commands/install/lifecycle.rs | Adds the streaming tarball fetch+import pipeline and GVS-prewarm helpers. SHA-512 correctly covers the full body. Network error during post-EOF drain discards a successfully completed import. |
| crates/aube/src/commands/install/mod.rs | Major install pipeline restructure. AUBE_TARBALL_STREAM presence check treats =0 as opt-in. |
| crates/aube-store/src/lib.rs | Adds generic import_tarball_reader, O_TMPFILE+linkat CAS path for Linux, posix_fallocate, and synthetic-entry injection check. CappedReader correctly retained. |
| crates/aube-resolver/src/peer_context.rs | apply_peer_contexts now returns Result and propagates PeerContextDivergence. Propagation in resolve.rs is correct. |
| crates/aube-util/src/io.rs | New ChunkReader mpsc-to-Read bridge. Uses blocking_recv correctly inside spawn_blocking. |
| crates/aube-util/src/fs.rs | cross_volume and set_not_content_indexed helpers. Correctly cfg-gated. |
| crates/aube-registry/src/client.rs | Adds fetch_single_version_metadata and start_tarball_stream with scheme validation and offline-mode check. |
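
The AUBE_TARBALL_STREAM finding above (a presence check treating =0 as opt-in) could be addressed with value-aware flag parsing. The following is a sketch only: the function names, the set of values counted as "off", and the rule that the kill switch wins over the opt-in are assumptions, not behavior confirmed by this PR.

```rust
/// Value-aware flag check: unset, empty, "0", "false", "off", and "no"
/// all mean disabled; any other value means enabled.
pub fn flag_enabled(value: Option<&str>) -> bool {
    match value.map(|v| v.trim().to_ascii_lowercase()) {
        None => false,
        Some(v) => !matches!(v.as_str(), "" | "0" | "false" | "off" | "no"),
    }
}

/// Hypothetical gate for the streaming pipeline: it runs only when the
/// opt-in is set and the kill switch is not.
pub fn streaming_enabled(opt_in: Option<&str>, kill_switch: Option<&str>) -> bool {
    flag_enabled(opt_in) && !flag_enabled(kill_switch)
}
```

A caller would feed it `std::env::var("AUBE_TARBALL_STREAM").ok().as_deref()` once, outside any per-tarball loop.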

Reviews (4): Last reviewed commit: "fixes"

Comment thread crates/aube/src/commands/install/mod.rs
@imjustprism imjustprism marked this pull request as draft May 5, 2026 20:17
@imjustprism imjustprism marked this pull request as ready for review May 5, 2026 20:42
Comment thread crates/aube-store/src/lib.rs
@jdx jdx merged commit 5fac325 into endevco:main May 5, 2026
16 checks passed
@greptile-apps greptile-apps Bot mentioned this pull request May 5, 2026

2 participants