Add deterministic execution mode for atomic operations

### Description

Add an opt-in deterministic execution mode for supported Warp atomic patterns so CUDA launches can produce bit-exact reproducible results across runs without requiring users to rewrite kernels.

The feature should cover common accumulation and allocator/counter patterns, along with the controls needed to enable it, including:

- floating-point accumulation atomics such as `wp.atomic_add`, `wp.atomic_sub`, `wp.atomic_min`, and `wp.atomic_max`
- in-place `+=` / `-=` lowering where it maps to supported atomics
- counter / allocator patterns that consume the return value of an atomic increment
- global, module-level, and kernel-level configuration controls

### Motivation

Warp users need a supported way to trade performance for reproducibility when kernels use atomics whose execution order can vary on CUDA. This is especially useful for simulation pipelines and downstream projects that need stable run-to-run results for debugging, testing, and reproducible experiments.
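The underlying cause is that floating-point addition is not associative, so the order in which atomic adds land can change the low bits of the result. A minimal pure-Python sketch (not Warp code; the `accumulate_in_order` helper is hypothetical and only stands in for threads arriving in different orders):

```python
def accumulate_in_order(values, order):
    """Sum `values` by visiting indices in `order`, like atomics landing in that order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]

# Same inputs, two different arrival orders.
forward = accumulate_in_order(values, [0, 1, 2])
backward = accumulate_in_order(values, [2, 1, 0])

# The two sums differ in the last bit even though the inputs are identical.
print(forward == backward, forward, backward)
```

On a GPU the arrival order is decided by scheduling, which is why the same kernel can produce slightly different sums from run to run.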

### Proposed implementation

PR #1355 implements this as `wp.config.deterministic`, with deterministic launch handling for supported atomic patterns:

- scatter/sort/reduce for floating-point accumulation atomics
- two-pass count/scan/execute for consumed-return counter patterns
- tests and documentation covering supported behavior and limitations
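The two strategies listed above can be sketched in plain Python. This is an illustration of the general techniques only, not the code in PR #1355: the function names, the record layout `(dest, thread_id, value)`, and the choice of sort key are all assumptions made for the example.

```python
import random


def deterministic_scatter_add(num_outputs, contributions):
    """Scatter/sort/reduce sketch (hypothetical helper).

    `contributions` holds (dest, thread_id, value) records in arbitrary
    arrival order. Sorting the full tuple fixes the reduction order, so the
    result does not depend on which "thread" arrived first.
    """
    records = sorted(contributions)      # sort by dest, then thread_id, then value
    out = [0.0] * num_outputs
    for dest, _thread_id, value in records:
        out[dest] += value               # sequential reduce in the fixed order
    return out


def deterministic_slots(counts):
    """Count/scan/execute sketch (hypothetical helper).

    `counts[tid]` is the number of slots thread `tid` requested in the
    counting pass. An exclusive prefix sum assigns each thread a fixed
    starting offset, replacing the value an atomic increment would return.
    """
    offsets = []
    running = 0
    for c in counts:
        offsets.append(running)
        running += c
    return offsets, running              # per-thread offsets and total slots


# Scatter/sort/reduce: shuffled arrival order gives bit-identical output.
contributions = [(0, 0, 0.1), (0, 1, 0.2), (0, 2, 0.3), (1, 3, 1.0)]
shuffled = contributions[:]
random.shuffle(shuffled)
reference = deterministic_scatter_add(2, contributions)
reordered = deterministic_scatter_add(2, shuffled)

# Count/scan/execute: the second pass writes into slots fixed by the scan.
counts = [2, 0, 3]                       # pass 1: each thread counts its requests
offsets, total = deterministic_slots(counts)
buffer = [None] * total
for tid, (start, n) in enumerate(zip(offsets, counts)):
    for k in range(n):
        buffer[start + k] = (tid, k)     # pass 2: thread tid fills its own slots
```

Both sketches trade extra work (a sort, or a second pass) for an output that no longer depends on execution order, which matches the performance-for-reproducibility trade-off described in the motivation.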

### Tracking PR

Implemented by #1355.

Add deterministic execution mode for atomic operations #1443

Description

Motivation

Proposed implementation

Tracking PR

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Add deterministic execution mode for atomic operations #1443

Description

Description

Motivation

Proposed implementation

Tracking PR

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions