
PROJECT: Performance Benchmarks Suite for Numba, Dask, JAX (CPU/GPU) #264

@mmcky

Overview

Build a comprehensive benchmark suite for the computational libraries commonly used in QuantEcon projects, to help define guidelines for optimal workflow and program structure.

Motivation

Different computational backends (Numba, Dask, JAX CPU, JAX GPU) have different performance characteristics and overheads:

  • JAX GPU has kernel launch overheads, making it only worthwhile for problems above a certain size
  • Numba has JIT compilation overhead on first call
  • Dask has scheduling overhead that may not pay off for small datasets
  • JAX CPU vs GPU crossover points vary by operation type

Currently, guidance on when to use each backend is based on rules of thumb. A systematic benchmarking suite would provide data-driven recommendations.
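As a minimal illustration of why these overheads matter for measurement, the sketch below separates a function's first call (which would include one-time costs such as JIT compilation under Numba or tracing under JAX) from its warm steady-state time. It uses plain NumPy so it runs anywhere; the helper name is hypothetical, not part of any existing suite.

```python
import timeit

import numpy as np


def time_first_vs_warm(fn, *args, repeats=5):
    """Time the first call separately from subsequent warm calls.

    For JIT-compiled backends the first call includes compilation,
    so reporting only warm times (or only cold times) is misleading.
    """
    first = timeit.timeit(lambda: fn(*args), number=1)
    warm = min(timeit.timeit(lambda: fn(*args), number=1)
               for _ in range(repeats))
    return first, warm


# Toy example: NumPy has no JIT, so first and warm times are comparable;
# with a Numba- or JAX-compiled function the gap would be large.
x = np.random.default_rng(0).standard_normal(500)
first, warm = time_first_vs_warm(np.sort, x)
```

Taking the minimum over several warm repeats (rather than the mean) is a common choice for suppressing scheduler noise in micro-benchmarks.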

Proposed Features

1. Benchmark Categories

  • Array operations: matrix multiplication, element-wise ops, reductions
  • Numerical algorithms: linear solvers, optimization, root finding
  • Monte Carlo simulations: varying sample sizes
  • Dynamic programming: value function iteration, policy iteration
  • Common QuantEcon patterns: Markov chains, asset pricing, etc.
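One possible structure for organizing these categories is a simple registry that maps a category/name path to a parameterized benchmark callable, so the runner can enumerate and scale every benchmark uniformly. The registry, decorator, and naming scheme below are a sketch under assumed conventions, not an existing API.

```python
import numpy as np

# Hypothetical registry: each entry maps "category/name" to a callable
# taking a problem size n, so a runner can iterate over all benchmarks.
BENCHMARKS = {}


def register(name):
    """Decorator that records a benchmark under the given path."""
    def wrap(fn):
        BENCHMARKS[name] = fn
        return fn
    return wrap


@register("array/matmul")
def bench_matmul(n):
    # Dense matrix multiplication at size n x n.
    a = np.random.default_rng(0).standard_normal((n, n))
    return a @ a


@register("monte_carlo/pi")
def bench_mc_pi(n):
    # Estimate pi from n uniform draws in the unit square.
    rng = np.random.default_rng(0)
    pts = rng.random((n, 2))
    return 4 * np.mean((pts ** 2).sum(axis=1) < 1)
```

A runner could then loop over `BENCHMARKS.items()` and a grid of sizes without knowing anything about individual benchmarks.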

2. Problem Size Scaling

For each benchmark, test across problem sizes to identify:

  • Minimum problem size where GPU becomes beneficial
  • Crossover points between backends
  • Memory-bandwidth-bound vs compute-bound regimes
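Given per-backend timings over a size grid, the crossover point can be read off as the smallest size where one backend first beats the other. The helper and the synthetic timings below (a fixed launch overhead plus a faster per-element cost for the "GPU") are illustrative assumptions, not measured data.

```python
def crossover_size(sizes, cpu_times, gpu_times):
    """Return the smallest problem size at which the GPU timing beats
    the CPU timing, or None if it never does."""
    for n, c, g in zip(sizes, cpu_times, gpu_times):
        if g < c:
            return n
    return None


# Synthetic timings: the GPU pays a fixed launch overhead but has a
# much lower per-element cost, so it wins only at larger sizes.
sizes = [100, 500, 1000, 2000, 4000]
cpu = [n ** 2 * 1e-9 for n in sizes]           # hypothetical CPU scaling
gpu = [5e-4 + n ** 2 * 1e-10 for n in sizes]   # overhead + faster compute
```

With these toy numbers the crossover lands at n = 1000; real crossovers would come from the measured results.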

3. Automated Results Updates

  • CI workflow that runs benchmarks on schedule (weekly/monthly)
  • Triggered on new releases of key libraries (JAX, Numba, etc.)
  • Results published to a dashboard or static site
  • Historical tracking to show performance changes over versions

4. Guidelines Generation

  • Automatically generate recommendations based on benchmark results
  • "Use JAX GPU when matrix size > N" type guidance
  • Integration with lecture documentation
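Turning measured crossover points into the "Use JAX GPU when matrix size > N" style of guidance could be as simple as templating over the results. The function below is a sketch with placeholder names and thresholds, not generated from real benchmark output.

```python
def make_recommendation(op_name, crossover, unit="x"):
    """Render a human-readable rule from a measured crossover point.

    `crossover` is the smallest problem size at which the GPU wins;
    None means the CPU was faster at every tested size.
    """
    if crossover is None:
        return f"{op_name}: CPU is faster at all tested sizes."
    return (f"{op_name}: use JAX GPU when size > "
            f"{crossover}{unit}{crossover}")


msg = make_recommendation("matmul", 1000)
```

Emitting these strings from the same pipeline that stores the raw results keeps the lecture-facing guidance in sync with the data.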

Example Output

Matrix Multiplication Crossover Points (JAX 0.4.x, CUDA 12.x, A100)
-------------------------------------------------------------------
Size < 500x500:          CPU faster (GPU overhead dominates)
Size 500x500-2000x2000:  Similar performance
Size > 2000x2000:        GPU 10-100x faster

Recommendation: Use JAX GPU for matrices larger than 1000x1000

Technical Approach

  • Use pytest-benchmark or similar for consistent measurement
  • Run on standardized hardware (GitHub Actions runners, cloud GPU instances)
  • Store results in structured format (JSON/Parquet)
  • Generate reports and visualizations automatically
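For the structured-storage point, one lightweight option is a flat, schema-stable record per observation, appended as JSON lines and later loaded into a dataframe for reporting. The field names below are an assumed schema, offered only as a starting point.

```python
import datetime
import json
import platform


def record_result(name, size, backend, seconds):
    """One benchmark observation as a flat dict suitable for a
    JSON-lines log; hardware/version fields support historical tracking."""
    return {
        "benchmark": name,
        "size": size,
        "backend": backend,
        "seconds": seconds,
        "python": platform.python_version(),
        "timestamp": datetime.datetime.now(
            datetime.timezone.utc).isoformat(),
    }


row = record_result("matmul", 1024, "jax-cpu", 0.012)
line = json.dumps(row)  # append this line to results.jsonl
```

Keeping every run append-only (rather than overwriting a summary) is what makes the version-over-version tracking in section 3 possible.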

Related

Questions to Explore

  1. What hardware configurations should we benchmark on?
  2. Should this be a standalone repo or part of an existing project?
  3. How should we handle GPU availability in CI, given cost considerations?
  4. What's the update frequency that balances freshness vs compute cost?
