Description
Add an opt-in deterministic execution mode for supported Warp atomic patterns so CUDA launches can produce bit-exact reproducible results across runs without requiring users to rewrite kernels.
The feature should cover common accumulation and allocator/counter patterns, including:
- floating-point accumulation atomics such as wp.atomic_add, wp.atomic_sub, wp.atomic_min, and wp.atomic_max
- in-place += / -= lowering where it maps to supported atomics
- counter / allocator patterns that consume the return value of an atomic increment
- global, module-level, and kernel-level configuration controls
Motivation
Warp users need a supported way to trade performance for reproducibility when kernels use atomics whose execution order can vary on CUDA. This is especially useful for simulation pipelines and downstream projects that need stable run-to-run results for debugging, testing, and reproducible experiments.
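To illustrate why atomic ordering affects results, here is a minimal pure-Python sketch (not Warp code). It sums the same four values in two different orders, standing in for two runs in which atomic adds land in a different sequence. The helper name accumulate_in_order and the sample values are illustrative assumptions only.

```python
def accumulate_in_order(values, order):
    """Sum values in the given order; a serial stand-in for a sequence of atomic adds."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Same contributions, two different arrival orders.
values = [1.0e16, 1.0, -1.0e16, 1.0]
run_a = accumulate_in_order(values, [0, 1, 2, 3])
run_b = accumulate_in_order(values, [0, 2, 1, 3])

# Floating-point addition is not associative, so the totals differ bit-for-bit.
print(run_a, run_b, run_a == run_b)
```

Because the rounding of each partial sum depends on what was added before it, a fixed accumulation order is needed to get bit-exact results.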
Proposed implementation
PR #1355 implements this as wp.config.deterministic, with deterministic launch handling for supported atomic patterns:
- scatter/sort/reduce for floating-point accumulation atomics
- two-pass count/scan/execute for consumed-return counter patterns
- tests and documentation covering supported behavior and limitations
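The two strategies listed above can be sketched in plain Python. This is a serial illustration of the general techniques under stated assumptions, not the code in PR #1355 or any Warp API. The function names, the (destination, thread_id, value) record layout, and the per-thread request counts are all hypothetical.

```python
from itertools import accumulate


def deterministic_accumulate(num_slots, contributions):
    """Scatter/sort/reduce: contributions are (dest, thread_id, value) records
    collected in arbitrary arrival order instead of being applied atomically."""
    # Sort: order records by destination, then thread id, then value,
    # so the reduction order no longer depends on arrival order.
    records = sorted(contributions)
    out = [0.0] * num_slots
    # Reduce: accumulate each destination's records in the fixed sorted order.
    for dest, _thread_id, value in records:
        out[dest] += value
    return out


def deterministic_allocate(requests):
    """Count/scan/execute: requests[tid] is how many slots thread tid would
    have claimed through an atomic increment whose return value it consumes."""
    # Pass 1 (count): record each thread's request size.
    counts = list(requests)
    # Scan: an exclusive prefix sum assigns every thread a fixed offset.
    offsets = list(accumulate(counts, initial=0))[:-1]
    # Pass 2 (execute) would rerun the kernel, handing each thread its offset
    # in place of the value the atomic increment would have returned.
    return offsets, sum(counts)


records = [(0, 0, 1.0e16), (0, 1, 1.0), (0, 2, -1.0e16), (0, 3, 1.0), (1, 0, 0.5)]
forward = deterministic_accumulate(2, records)
backward = deterministic_accumulate(2, reversed(records))
offsets, total = deterministic_allocate([2, 0, 3, 1])
print(forward, backward, offsets, total)
```

In this sketch both arrival orders produce identical sums, and each thread's offset depends only on its index, at the cost of extra buffering, a sort, and a second pass. That cost is the performance-for-reproducibility trade described under Motivation.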
Tracking PR
Implemented by #1355.