Add deterministic execution mode for atomic operations #1443

@eric-heiden

Description

Add an opt-in deterministic execution mode for supported Warp atomic patterns so CUDA launches can produce bit-exact reproducible results across runs without requiring users to rewrite kernels.

The feature should cover common accumulation and allocator/counter patterns, along with the controls for enabling the mode, including:

  • floating-point accumulation atomics such as wp.atomic_add, wp.atomic_sub, wp.atomic_min, and wp.atomic_max
  • in-place += / -= lowering where it maps to supported atomics
  • counter / allocator patterns that consume the return value of an atomic increment
  • global, module-level, and kernel-level configuration controls

Motivation

Warp users need a supported way to trade performance for reproducibility when kernels use atomics whose execution order can vary on CUDA. This is especially useful for simulation pipelines and downstream projects that need stable run-to-run results for debugging, testing, and reproducible experiments.
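To illustrate why varying execution order affects results at all, here is a minimal pure-Python sketch (it does not use Warp or CUDA). Floating-point addition is not associative, so applying the same contributions in a different order can change the final value. The `accumulate` helper and the sample values are hypothetical and chosen only for illustration.

```python
def accumulate(values, order):
    """Sum values in the given order, mimicking one possible atomic-add schedule."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Same four contributions, two different arrival orders.
values = [1.0e16, 1.0, -1.0e16, 1.0]

# Order A: the first 1.0 is absorbed by 1.0e16 (rounding) before the cancellation.
forward = accumulate(values, [0, 1, 2, 3])

# Order B: the large terms cancel first, so both 1.0 contributions survive.
reordered = accumulate(values, [0, 2, 1, 3])

print(forward, reordered)
```

On hardware where the order of atomic updates is not fixed, either schedule is possible, which is why two runs of the same kernel can differ in the last bits or, as in this contrived case, by a whole unit.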

Proposed implementation

PR #1355 implements this as wp.config.deterministic, with deterministic launch handling for supported atomic patterns:

  • scatter/sort/reduce for floating-point accumulation atomics
  • two-pass count/scan/execute for consumed-return counter patterns
  • tests and documentation covering supported behavior and limitations
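The two strategies named above can be sketched in plain Python. This is a conceptual model only, not the code in PR #1355: the function names, the `(dest, thread_id, value)` record layout, and the sequential loops standing in for GPU passes are all assumptions made for illustration.

```python
def deterministic_scatter_add(num_slots, contributions):
    """Scatter/sort/reduce model.

    contributions: (dest, thread_id, value) records in arbitrary arrival order.
    Sorting fixes one canonical order, so the reduction no longer depends on
    which thread happened to arrive first.
    """
    out = [0.0] * num_slots
    for dest, _thread_id, value in sorted(contributions):
        out[dest] += value
    return out


def deterministic_compact(items, keep):
    """Two-pass count/scan/execute model for a counter/allocator pattern.

    Replaces "slot = atomic increment of a counter" with slots derived from
    the thread index, so each kept item always lands in the same position.
    """
    # Pass 1 (count): each thread reports how many slots it would claim.
    counts = [1 if keep(x) else 0 for x in items]

    # Scan: an exclusive prefix sum assigns each thread a fixed starting slot.
    offsets, total = [], 0
    for c in counts:
        offsets.append(total)
        total += c

    # Pass 2 (execute): write using the precomputed slot instead of a counter.
    out = [None] * total
    for x, c, off in zip(items, counts, offsets):
        if c:
            out[off] = x
    return out


contributions = [(0, 0, 1.0e16), (0, 1, 1.0), (0, 2, -1.0e16), (0, 3, 1.0), (1, 4, 0.5)]
in_order = deterministic_scatter_add(2, contributions)
reversed_arrival = deterministic_scatter_add(2, list(reversed(contributions)))
compacted = deterministic_compact([5, -2, 7, -1, 3], lambda x: x > 0)
```

Both functions produce the same output for any arrival order of their inputs. The cost is extra memory for the staged records and additional passes over the data, which is the performance-for-reproducibility trade described in the motivation.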

Tracking PR

Implemented by #1355.

Metadata

Labels

feature request