Optimize apache-airflow (warm cache) #10344

@konstin

I've implemented a series of optimizations for the apache-airflow resolver case, focused on the resolver thread. Each step targets something that showed up as slow during profiling. The benchmarks are path dependent: the speedup of a commit B looks different depending on whether a commit A was applied before it, because you need to speed up the slowest item first to really see the effect of optimizing the next smaller one. I'm therefore sharing the benchmark numbers here and keeping the PRs focused on explaining the code changes.

Commits:

  1. Remove [u64; 4] from small version to move Arc to full version
  2. Speed up file pins
  3. Simplify requirements_for_extra (Note: Refactoring)
  4. Optimize requirements_for_extra
  5. Split determining and executing batch prefetch (Note: Refactoring)
  6. Split out BatchPrefetcherRunner (Note: Refactoring)
  7. Deactivate tracing for choose version
  8. Avoid overcounting versions in batch prefetcher
  9. Add Send bounds to cache deserialization (Note: Refactoring)

The commits are split into refactorings and speedups.

Benchmark 1: ./uv-main pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     444.1 ms ±  10.5 ms    [User: 626.4 ms, System: 190.7 ms]
  Range (min … max):   434.5 ms … 468.0 ms    10 runs
 
Benchmark 2: ./uv-1 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     396.1 ms ±   7.0 ms    [User: 554.1 ms, System: 171.3 ms]
  Range (min … max):   386.8 ms … 408.7 ms    10 runs
 
Benchmark 3: ./uv-2 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     344.2 ms ±   3.7 ms    [User: 481.9 ms, System: 167.7 ms]
  Range (min … max):   339.7 ms … 349.8 ms    10 runs
 
Benchmark 4: ./uv-3 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     344.4 ms ±   4.0 ms    [User: 488.3 ms, System: 154.4 ms]
  Range (min … max):   340.0 ms … 354.0 ms    10 runs
 
Benchmark 5: ./uv-4 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     325.2 ms ±   3.0 ms    [User: 467.4 ms, System: 159.7 ms]
  Range (min … max):   321.5 ms … 329.4 ms    10 runs
 
Benchmark 6: ./uv-5 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     329.4 ms ±   6.1 ms    [User: 474.2 ms, System: 157.8 ms]
  Range (min … max):   324.0 ms … 344.2 ms    10 runs
 
Benchmark 7: ./uv-6 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     328.5 ms ±   3.7 ms    [User: 466.4 ms, System: 162.5 ms]
  Range (min … max):   324.4 ms … 335.0 ms    10 runs
 
Benchmark 8: ./uv-7 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     321.1 ms ±   3.2 ms    [User: 453.4 ms, System: 163.0 ms]
  Range (min … max):   316.8 ms … 327.5 ms    10 runs
 
Benchmark 9: ./uv-8 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     213.2 ms ±   3.2 ms    [User: 300.7 ms, System: 115.9 ms]
  Range (min … max):   207.5 ms … 220.2 ms    14 runs
 
Benchmark 10: ./uv-9 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     210.9 ms ±   1.7 ms    [User: 295.8 ms, System: 118.4 ms]
  Range (min … max):   207.7 ms … 214.6 ms    14 runs
 
Summary
  ./uv-9 pip compile scripts/requirements/airflow.in ran
    1.01 ± 0.02 times faster than ./uv-8 pip compile scripts/requirements/airflow.in
    1.52 ± 0.02 times faster than ./uv-7 pip compile scripts/requirements/airflow.in
    1.54 ± 0.02 times faster than ./uv-4 pip compile scripts/requirements/airflow.in
    1.56 ± 0.02 times faster than ./uv-6 pip compile scripts/requirements/airflow.in
    1.56 ± 0.03 times faster than ./uv-5 pip compile scripts/requirements/airflow.in
    1.63 ± 0.02 times faster than ./uv-2 pip compile scripts/requirements/airflow.in
    1.63 ± 0.02 times faster than ./uv-3 pip compile scripts/requirements/airflow.in
    1.88 ± 0.04 times faster than ./uv-1 pip compile scripts/requirements/airflow.in
    2.11 ± 0.05 times faster than ./uv-main pip compile scripts/requirements/airflow.in
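
To make the path dependence concrete, the per-commit step can be read off the mean times above by dividing each mean by the next one. This is a minimal sketch, not part of the original measurements: the function names `step_speedup` and `print_step_speedups` are made up for illustration, and the numbers are only meaningful for this commit order, since each step is measured on top of all previous commits.

```shell
# Speedup of one build over the previous one: previous mean / current mean.
step_speedup() {
    awk -v prev="$1" -v cur="$2" 'BEGIN { printf "%.2f\n", prev / cur }'
}

# Print the incremental speedup of each commit, given the mean times in
# order (uv-main first, then uv-1, uv-2, ...).
print_step_speedups() {
    prev=$1
    shift
    n=1
    for cur in "$@"; do
        printf 'commit %d: %sx\n' "$n" "$(step_speedup "$prev" "$cur")"
        prev=$cur
        n=$((n + 1))
    done
}

# Mean wall-clock times in ms, copied from the hyperfine output above.
print_step_speedups 444.1 396.1 344.2 344.4 325.2 329.4 328.5 321.1 213.2 210.9
```

In this order, commit 8 accounts for the largest single step (321.1 ms to 213.2 ms, about 1.5x), while the commits marked as refactorings stay within a few milliseconds of their predecessor, which is close to the reported standard deviations.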

Methodology:

git switch main && cargo build --profile profiling && cp target/profiling/uv uv-main && git switch -
# [Rebase with stop at every commit]
cargo build --profile profiling && cp target/profiling/uv uv-1 && git rebase --continue
cargo build --profile profiling && cp target/profiling/uv uv-2 && git rebase --continue
cargo build --profile profiling && cp target/profiling/uv uv-3 && git rebase --continue
cargo build --profile profiling && cp target/profiling/uv uv-4 && git rebase --continue
cargo build --profile profiling && cp target/profiling/uv uv-5 && git rebase --continue
cargo build --profile profiling && cp target/profiling/uv uv-6 && git rebase --continue
cargo build --profile profiling && cp target/profiling/uv uv-7 && git rebase --continue
cargo build --profile profiling && cp target/profiling/uv uv-8 && git rebase --continue
cargo build --profile profiling && cp target/profiling/uv uv-9 && git rebase --continue

uv venv -p 3.12 && hyperfine --warmup 2 \
    "./uv-main pip compile scripts/requirements/airflow.in" \
    "./uv-1 pip compile scripts/requirements/airflow.in" \
    "./uv-2 pip compile scripts/requirements/airflow.in" \
    "./uv-3 pip compile scripts/requirements/airflow.in" \
    "./uv-4 pip compile scripts/requirements/airflow.in" \
    "./uv-5 pip compile scripts/requirements/airflow.in" \
    "./uv-6 pip compile scripts/requirements/airflow.in" \
    "./uv-7 pip compile scripts/requirements/airflow.in" \
    "./uv-8 pip compile scripts/requirements/airflow.in" \
    "./uv-9 pip compile scripts/requirements/airflow.in"

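The hyperfine invocation above lists one command per binary by hand. As a sketch of the same comparison for an arbitrary number of commits, the command list could be generated in a loop. The helpers `bench_commands` and `run_benchmarks` are hypothetical names introduced here; only `hyperfine --warmup` and the `uv-main` / `uv-N` binary naming come from the commands above.

```shell
# Print one benchmark command per binary: uv-main, then uv-1 .. uv-N.
# Arguments: number of per-commit binaries, path of the requirements input.
bench_commands() {
    count=$1
    input=$2
    printf './uv-main pip compile %s\n' "$input"
    i=1
    while [ "$i" -le "$count" ]; do
        printf './uv-%d pip compile %s\n' "$i" "$input"
        i=$((i + 1))
    done
}

# Run hyperfine over all generated commands (requires hyperfine on PATH
# and the uv-* binaries in the current directory).
run_benchmarks() {
    count=$1
    input=$2
    # Collect the generated lines as positional parameters so that each
    # command reaches hyperfine as a single argument.
    set --
    while IFS= read -r cmd; do
        set -- "$@" "$cmd"
    done <<EOF
$(bench_commands "$count" "$input")
EOF
    hyperfine --warmup 2 "$@"
}

# Usage (not run here):
#   run_benchmarks 9 scripts/requirements/airflow.in
```
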
Overall benchmarks:

# Scheduler: Performance
$ hyperfine --warmup 2 --runs 50 "./uv-main pip compile scripts/requirements/airflow.in" "./uv-8 pip compile scripts/requirements/airflow.in"
Benchmark 1: ./uv-main pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     439.7 ms ±   6.4 ms    [User: 616.1 ms, System: 200.6 ms]
  Range (min … max):   430.4 ms … 454.7 ms    50 runs
 
Benchmark 2: ./uv-8 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     214.1 ms ±   5.1 ms    [User: 297.9 ms, System: 123.9 ms]
  Range (min … max):   206.8 ms … 228.0 ms    50 runs
 
Summary
  ./uv-8 pip compile scripts/requirements/airflow.in ran
    2.05 ± 0.06 times faster than ./uv-main pip compile scripts/requirements/airflow.in

# Scheduler: Performance, restricted to CPUs 0-1 via taskset
$ taskset -c 0-1 hyperfine --warmup 2 "./uv-main pip compile scripts/requirements/airflow.in" "./uv-8 pip compile scripts/requirements/airflow.in"
Benchmark 1: ./uv-main pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     595.8 ms ±   4.5 ms    [User: 774.2 ms, System: 201.2 ms]
  Range (min … max):   590.8 ms … 603.8 ms    10 runs
 
Benchmark 2: ./uv-8 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     301.9 ms ±   2.5 ms    [User: 389.8 ms, System: 121.6 ms]
  Range (min … max):   299.4 ms … 306.9 ms    10 runs
 
Summary
  ./uv-8 pip compile scripts/requirements/airflow.in ran
    1.97 ± 0.02 times faster than ./uv-main pip compile scripts/requirements/airflow.in

# Scheduler: Powersave
$ hyperfine --warmup 1 "./uv-main pip compile scripts/requirements/airflow.in" "./uv-8 pip compile scripts/requirements/airflow.in"
Benchmark 1: ./uv-main pip compile scripts/requirements/airflow.in
  Time (mean ± σ):      1.222 s ±  0.222 s    [User: 1.718 s, System: 0.575 s]
  Range (min … max):    0.652 s …  1.424 s    10 runs
 
Benchmark 2: ./uv-8 pip compile scripts/requirements/airflow.in
  Time (mean ± σ):     729.3 ms ± 129.2 ms    [User: 1030.7 ms, System: 370.5 ms]
  Range (min … max):   549.4 ms … 961.6 ms    10 runs
 
Summary
  ./uv-8 pip compile scripts/requirements/airflow.in ran
    1.68 ± 0.42 times faster than ./uv-main pip compile scripts/requirements/airflow.in

# Scheduler: Default, with --refresh
$ hyperfine --warmup 1 "./uv-main pip compile scripts/requirements/airflow.in --refresh" "./uv-8 pip compile scripts/requirements/airflow.in --refresh"
Benchmark 1: ./uv-main pip compile scripts/requirements/airflow.in --refresh
  Time (mean ± σ):     819.6 ms ± 112.8 ms    [User: 737.7 ms, System: 387.5 ms]
  Range (min … max):   723.3 ms … 1116.1 ms    10 runs
 
Benchmark 2: ./uv-8 pip compile scripts/requirements/airflow.in --refresh
  Time (mean ± σ):     597.7 ms ± 106.0 ms    [User: 440.2 ms, System: 305.8 ms]
  Range (min … max):   494.6 ms … 835.9 ms    10 runs
 
Summary
  ./uv-8 pip compile scripts/requirements/airflow.in --refresh ran
    1.37 ± 0.31 times faster than ./uv-main pip compile scripts/requirements/airflow.in --refresh
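
The "± " part of each summary line shows how much the two standard deviations limit the comparison. A minimal sketch of the arithmetic, assuming standard first-order error propagation for a ratio of two independent measurements; that hyperfine computes its summary this way is an assumption, although the values it reports above are consistent with it. The function name `speedup_with_error` is made up for illustration.

```shell
# Relative speed of a slower run over a faster one, with the standard
# deviations propagated as independent relative errors.
# Arguments: slow_mean slow_sd fast_mean fast_sd (all in the same unit).
speedup_with_error() {
    awk -v a="$1" -v sa="$2" -v b="$3" -v sb="$4" 'BEGIN {
        ratio = a / b
        err = ratio * sqrt((sa / a) ^ 2 + (sb / b) ^ 2)
        printf "%.2f ± %.2f\n", ratio, err
    }'
}

# Warm cache, performance scheduler, 50 runs (values from the output above).
speedup_with_error 439.7 6.4 214.1 5.1
# Default scheduler with --refresh: the large standard deviations widen the interval.
speedup_with_error 819.6 112.8 597.7 106.0
```

With the means and standard deviations printed above, this gives 2.05 ± 0.06 for the warm-cache run and 1.37 ± 0.31 for the `--refresh` run, so the latter comparison is much less certain than the former.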

Profile of the resolver thread for the apache-airflow warm cache resolution (pip compile), before the changes:

cargo build --profile profiling
samply record --rate 20000 target/profiling/uv pip compile scripts/requirements/airflow.in > /dev/null

[Image: resolver thread profile before the changes]

Profile after:

[Image: resolver thread profile after the changes]

Labels: performance (potential performance improvement)