Submit Results

CSV format

Create a CSV file named <device-slug>.csv with these columns:

device_id,kernel_id,dtype,input_shape,batch_size,impl_lang,latency_us,driver_version,toolchain,git_sha,submitter
nvidia-h100-sxm,softmax,f32,"[64, 1024]",1,cuda,12.3,CUDA 12.4,nvcc 12.4,abc1234,your-name
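Note that input_shape contains a comma, so it has to be quoted, as in the example row. The following is a minimal sketch of producing such a row with Python's standard csv module, which adds the quotes automatically. The helper name format_rows and the choice of "\n" line endings are assumptions for illustration, not part of the repo's scripts.

```python
import csv
import io

# Column order copied from the header line shown above.
COLUMNS = [
    "device_id", "kernel_id", "dtype", "input_shape", "batch_size",
    "impl_lang", "latency_us", "driver_version", "toolchain",
    "git_sha", "submitter",
]


def format_rows(rows):
    """Serialize a list of dicts to CSV text with the submission header.

    Hypothetical helper: the csv module quotes any field that contains a
    comma (such as input_shape), so no manual quoting is needed.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Values taken from the example row in this section.
example = {
    "device_id": "nvidia-h100-sxm",
    "kernel_id": "softmax",
    "dtype": "f32",
    "input_shape": "[64, 1024]",
    "batch_size": 1,
    "impl_lang": "cuda",
    "latency_us": 12.3,
    "driver_version": "CUDA 12.4",
    "toolchain": "nvcc 12.4",
    "git_sha": "abc1234",
    "submitter": "your-name",
}

text = format_rows([example])
print(text)
```

Writing text to submissions/<device-slug>.csv is then a plain file write.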

Steps

  1. Fork the pu-rs.org repo
  2. Add your CSV to submissions/
  3. Open a pull request
  4. CI validates the format and runs sanity checks
  5. Maintainers review and merge
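Before opening the pull request, it can help to run a quick local check on the file. The sketch below is not the repository's CI. It is a hypothetical pre-check (check_submission is an invented name) that assumes only what this page states: the documented header, one value per column, and a numeric latency_us. The real CI may check more or apply different rules.

```python
import csv
import io
import math

# Header copied from the "CSV format" section of this page.
EXPECTED_HEADER = [
    "device_id", "kernel_id", "dtype", "input_shape", "batch_size",
    "impl_lang", "latency_us", "driver_version", "toolchain",
    "git_sha", "submitter",
]


def check_submission(text):
    """Return a list of problems found in CSV text; empty means none found.

    Hypothetical local pre-check, not the project's actual CI logic.
    """
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return ["header does not match the documented columns"]

    latency_idx = EXPECTED_HEADER.index("latency_us")
    for row in reader:
        if not row:
            continue  # skip blank lines
        where = f"line {reader.line_num}"
        if len(row) != len(EXPECTED_HEADER):
            # A common cause is an unquoted input_shape such as [64, 1024].
            problems.append(
                f"{where}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}"
            )
            continue
        try:
            latency = float(row[latency_idx])
        except ValueError:
            problems.append(f"{where}: latency_us is not a number")
            continue
        if not (math.isfinite(latency) and latency > 0):
            problems.append(f"{where}: latency_us must be a positive finite number")
    return problems


sample = (
    ",".join(EXPECTED_HEADER) + "\n"
    'nvidia-h100-sxm,softmax,f32,"[64, 1024]",1,cuda,12.3,'
    "CUDA 12.4,nvcc 12.4,abc1234,your-name\n"
)
print(check_submission(sample))
```

An empty list means the file passed these basic checks. It does not guarantee that the CI will accept it.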

Requirements

  • Run each (kernel, shape) pair at least 20 times
  • Report the median latency of those runs in latency_us (microseconds)
  • Include driver version and toolchain
  • Device must exist in db/seed_devices.sql (or add it in the same PR)
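The run-count and median requirements can be sketched as a small timing helper. This is an illustration, not what the scripts under scripts/ necessarily do. The function name median_latency_us, the warm-up count, and the use of a host-side clock are all assumptions. The callable fn stands in for one kernel invocation and is assumed to block until the kernel has finished. For asynchronous device launches you would need the backend's own synchronization or device timers.

```python
import statistics
import time

MIN_RUNS = 20  # minimum runs per (kernel, shape) pair, from the requirements


def median_latency_us(fn, runs=MIN_RUNS, warmup=3):
    """Time fn() `runs` times and return the median latency in microseconds.

    Hypothetical helper. `warmup` untimed calls run first; the warm-up
    count is an assumption, not a stated requirement.
    """
    if runs < MIN_RUNS:
        raise ValueError(f"need at least {MIN_RUNS} runs, got {runs}")

    for _ in range(warmup):
        fn()

    samples_us = []
    for _ in range(runs):
        start_ns = time.perf_counter_ns()
        fn()
        elapsed_ns = time.perf_counter_ns() - start_ns
        samples_us.append(elapsed_ns / 1_000)  # nanoseconds -> microseconds

    return statistics.median(samples_us)


# Stand-in workload; replace with a blocking call to the kernel under test.
latency = median_latency_us(lambda: sum(range(10_000)))
print(f"median latency: {latency:.1f} us")
```

The median is less sensitive than the mean to occasional slow runs caused by scheduling or clock changes, which is presumably why the requirements ask for it.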

Running the benchmark

All benchmark scripts live in this repo under scripts/.

# Metal (Apple Silicon)
# Requires: ascend_metal_kernels Python module
#   (build: cd ascend-rs/crates/ascend_metal_py && maturin develop --release)
ASCEND_METAL_KERNELS=1 python3 scripts/bench_metal.py --device apple-m2-max-38
ASCEND_METAL_KERNELS=1 python3 scripts/bench_metal.py --device apple-m4-max-40 -o submissions/m4-max.csv

# Ascend NPU (Huawei 910B/910C)
# Requires: CANN SDK + ascend-rs repo cloned locally
bash scripts/bench_ascend.sh --device huawei-910b
bash scripts/bench_ascend.sh --device huawei-910c --only softmax --ascend-rs ~/ascend-rs

Supported backends

Backend        Script                   Prerequisites
Apple Metal    scripts/bench_metal.py   ascend_metal_kernels Python module (build instructions)
Huawei Ascend  scripts/bench_ascend.sh  CANN SDK + ascend-rs repo
NVIDIA CUDA    scripts/bench_cuda.py    Planned
AMD ROCm       scripts/bench_rocm.py    Planned