Rework CI benchmark suite for per-stage granularity and I/O elimination #8642

@Boshen

Current State

What's measured:

  • bundle benchmark: end-to-end bundler.generate() (scan + link + generate combined)
  • scan benchmark: bundler.scan() (module loading + parsing + AST scanning)
  • Test cases: threejs, rome_ts, multi-duplicated-symbols, with sourcemap/minify variants
  • Micro-benchmarks: sourcemap joining, string concatenation

What's NOT measured independently:

  • Link stage (symbol binding, tree shaking, import/export resolution, cross-module optimization)
  • Generate stage (code splitting, cross-chunk linking, chunk rendering, minification)

Problems

1. I/O dominates measurements

Both the bundle and scan benchmarks read hundreds of files from disk. This introduces noise on real machines (disk cache variability) and makes instruction counts in CodSpeed unreliable (I/O syscalls). The Node.js CI benchmark has a 110% alert threshold, so regressions of up to 10% go undetected.

2. No per-stage granularity

The bundler pipeline has 3 distinct stages — scan, link, generate — but only end-to-end and scan-only are benchmarked. When someone optimizes tree-shaking or code generation, the improvement is diluted across the full pipeline and often invisible.
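To make the dilution concrete, here is a small arithmetic sketch. The stage share and slowdown below are made-up illustration values, not measurements, and the alert rule (fires when new/old exceeds the threshold) is an assumption about how the 110% threshold is applied.

```rust
/// End-to-end time ratio (new / old) when a single stage changes.
/// `stage_share`: fraction of total time spent in that stage before the change.
/// `stage_ratio`: new / old time for that stage alone (1.25 = 25% slower).
pub fn end_to_end_ratio(stage_share: f64, stage_ratio: f64) -> f64 {
    (1.0 - stage_share) + stage_share * stage_ratio
}

/// Assumed alert rule: fires only when the ratio exceeds the threshold
/// (1.10 for a 110% threshold).
pub fn triggers_alert(ratio: f64, threshold: f64) -> bool {
    ratio > threshold
}
```

With these hypothetical numbers, a stage that accounts for 20% of the run and gets 25% slower gives `end_to_end_ratio(0.2, 1.25)`, roughly 1.05, which stays under 1.10 and raises no alert. The same 25% change measured on the stage alone would be well over the threshold.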

3. Recent optimizations were undetectable

Examples from commit history:

  • HashMap → IndexVec/IndexBitSet for symbol tracking (link stage)
  • Flag-based convergence in include_statements (tree shaking, link stage)
  • String operation fast paths (generate stage)
  • IndexBitSet for skipped plugins checking (link stage)
  • Path allocation avoidance (scan stage, but diluted by I/O)

These are all sub-stage optimizations that get lost in end-to-end numbers.

4. Link and generate stages are pure computation but aren't benchmarked independently

The link stage takes NormalizedScanStageOutput and does symbol resolution, tree shaking, etc. entirely in memory (zero I/O). The generate stage is also mostly I/O-free. Yet neither is benchmarked independently, so we miss the chance for noise-free measurements.
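A rough sketch of what a stage-only benchmark could look like: build the stage input once, outside the timed region, then time only the pure computation. Everything here is a stand-in. `ScanOutput`, `link`, and `bench_stage` are hypothetical names, the toy `link` is just a reachability walk, and a real version would presumably go through the existing benchmark harness rather than `Instant`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the scan output handed to the link stage.
/// `modules[i]` lists the indices of the modules that module `i` imports.
#[derive(Clone)]
pub struct ScanOutput {
    pub modules: Vec<Vec<usize>>,
}

/// Toy "link" step: marks every module reachable from module 0.
/// Pure computation, no I/O.
pub fn link(input: &ScanOutput) -> Vec<bool> {
    let mut included = vec![false; input.modules.len()];
    let mut stack = vec![0usize];
    while let Some(i) = stack.pop() {
        if i >= included.len() || included[i] {
            continue;
        }
        included[i] = true;
        stack.extend(input.modules[i].iter().copied());
    }
    included
}

/// Runs `stage` `iters` times against a pre-built input and returns the
/// total elapsed time. Input construction is not part of the timed region.
pub fn bench_stage<I, O>(input: &I, iters: u32, stage: impl Fn(&I) -> O) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the work.
        black_box(stage(black_box(input)));
    }
    start.elapsed()
}
```

Usage would be something like `bench_stage(&scan_output, 100, link)`. If the real stage consumes its input by value, the input would need to be cloned or rebuilt per iteration outside the timed region instead.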

5. Bundler is hardcoded to OsFileSystem

ScanStage, SharedResolver, Bundle, and prepare_build_context() all use OsFileSystem directly. A MemoryFileSystem implementation already exists in crates/rolldown_fs/src/memory.rs (feature-gated behind memory), but it can't be used because the entry points aren't generic over the FileSystem trait.
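For illustration, the shape of the change might look like the sketch below. This is not the actual `FileSystem` trait or the existing `MemoryFileSystem`. The trait, `OsFs`, `MemoryFs`, and `load_modules` are all hypothetical, deliberately minimal stand-ins that only show an entry point taking the file system as a type parameter.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical minimal trait; the real trait has a different, larger surface.
pub trait FileSystem {
    fn read_to_string(&self, path: &Path) -> io::Result<String>;
}

/// Disk-backed implementation (what benchmarks use today, conceptually).
pub struct OsFs;

impl FileSystem for OsFs {
    fn read_to_string(&self, path: &Path) -> io::Result<String> {
        std::fs::read_to_string(path)
    }
}

/// In-memory implementation: no syscalls once the files are loaded.
#[derive(Default)]
pub struct MemoryFs {
    files: HashMap<PathBuf, String>,
}

impl MemoryFs {
    pub fn add_file(&mut self, path: impl Into<PathBuf>, content: impl Into<String>) {
        self.files.insert(path.into(), content.into());
    }
}

impl FileSystem for MemoryFs {
    fn read_to_string(&self, path: &Path) -> io::Result<String> {
        self.files.get(path).cloned().ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, path.display().to_string())
        })
    }
}

/// Toy entry point that is generic over the file system instead of
/// hardcoding one. Returns the total number of bytes loaded.
pub fn load_modules<F: FileSystem>(fs: &F, paths: &[&str]) -> io::Result<usize> {
    let mut total = 0;
    for p in paths {
        total += fs.read_to_string(Path::new(p))?.len();
    }
    Ok(total)
}
```

A benchmark could then populate a `MemoryFs` once during setup and pass it to the same entry point that production code calls with `OsFs`.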

6. No benchmark cases targeting specific optimization patterns

No heavy tree-shaking scenario (large library, small subset used), no heavy code-splitting scenario (many entry points, shared deps), no deep re-export chain scenario (barrel files).
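One option is to generate such cases synthetically so their size is tunable. A minimal sketch, assuming fixtures are plain (path, source) pairs that could be fed to an in-memory file system; the function names and file layout are made up for illustration:

```rust
/// Deep re-export chain: entry -> barrel_0 -> ... -> barrel_{depth-1} -> leaf.
/// Returns (path, source) pairs.
pub fn barrel_chain(depth: usize) -> Vec<(String, String)> {
    // Import specifier for link `i` in the chain; the last hop is the leaf.
    let name = |i: usize| {
        if i == depth {
            "./leaf.js".to_string()
        } else {
            format!("./barrel_{}.js", i)
        }
    };
    let mut files = Vec::with_capacity(depth + 2);
    files.push((
        "/entry.js".to_string(),
        format!("import {{ value }} from '{}';\nconsole.log(value);\n", name(0)),
    ));
    for i in 0..depth {
        files.push((
            format!("/barrel_{}.js", i),
            format!("export * from '{}';\n", name(i + 1)),
        ));
    }
    files.push(("/leaf.js".to_string(), "export const value = 42;\n".to_string()));
    files
}

/// Heavy tree-shaking case: a library with `exports` functions, of which the
/// entry imports only the first `used`. Returns [lib, entry].
pub fn tree_shaking_case(exports: usize, used: usize) -> Vec<(String, String)> {
    let used = used.min(exports);
    let lib: String = (0..exports)
        .map(|i| format!("export function f{}() {{ return {}; }}\n", i, i))
        .collect();
    let names: Vec<String> = (0..used).map(|i| format!("f{}", i)).collect();
    let list = names.join(", ");
    let entry = format!(
        "import {{ {} }} from './lib.js';\nconsole.log({});\n",
        list, list
    );
    vec![("/lib.js".to_string(), lib), ("/entry.js".to_string(), entry)]
}
```

A code-splitting case (many entries, shared deps) could follow the same pattern. Whether synthetic inputs are representative enough compared to real-world projects like threejs is an open question.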

Key Code References

  • crates/bench/benches/bundle.rs — current end-to-end bundle benchmark
  • crates/bench/benches/scan.rs — current scan-only benchmark
  • crates/rolldown/src/bundle/bundle.rs:235-247 - bundle_up() showing scan → link → generate pipeline
  • crates/rolldown/src/stages/link_stage/mod.rs:197 - LinkStage::link() (pure computation)
  • crates/rolldown/src/stages/generate_stage/mod.rs:82 - GenerateStage::generate()
  • crates/rolldown_fs/src/memory.rs — existing MemoryFileSystem implementation
  • crates/rolldown/src/utils/prepare_build_context.rs — where OsFileSystem is hardcoded
