Satisfy WGSL's memory coherency requirements #9221

# Native shading language guarantees

## MSL

> Memory in the thread address space has thread coherence, and memory in the threadgroup address space has threadgroup coherence. By default, memory in the device address space has threadgroup coherence.
> Metal 3.2 and later support the coherent(device) qualifiers for buffers and memory_coherence_device for textures to indicate that the object has device coherence, that is, memory operations are visible across threads on the device if you properly synchronize them.

from 4.8 Memory Coherency in https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf

Textures have `memory_coherence_threadgroup` by default.
See 2.9 Textures in https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf

**Conclusion**: MSL's threadgroup address space has threadgroup coherence. MSL buffers and textures are threadgroup coherent by default and can optionally be made device coherent (`coherent(device)` for buffers, `memory_coherence_device` for textures; Metal 3.2 and later).

## HLSL

> All prior g# memory reads or writes by this thread in program order are made visible to all threads in the thread group before any subsequent g# memory accesses by this thread.
>
> This applies to all of the current thread group's g# shared memory.

from https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/sync--sm5---asm-#_g

> RWByteAddressBuffer objects can be prefixed with the storage class globallycoherent. This storage class causes memory barriers and syncs to flush data across the entire GPU such that other groups can see writes. Without this specifier, a memory barrier or sync will flush a UAV only within the current group.

from https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/sm5-object-rwbyteaddressbuffer

> You can prefix RWTexture2D objects with the storage class globallycoherent. This storage class causes memory barriers and syncs to flush data across the entire GPU such that other groups can see writes. Without this specifier, a memory barrier or sync will flush only an unordered access view (UAV) within the current group.

from https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/sm5-object-rwtexture2d

**Conclusion**: HLSL's groupshared storage class has group coherence. HLSL buffers/textures (UAVs) are group coherent by default and can optionally be made globally coherent with the `globallycoherent` storage class.

## GLSL & SPIR-V

> Memory accesses to image variables declared using the coherent qualifier are performed coherently with similar accesses from other shader invocations. In particular, when reading a variable declared as coherent, the values returned will reflect the results of previously completed writes performed by other shader invocations. When writing a variable declared as coherent, the values written will be reflected in subsequent coherent reads performed by other shader invocations. As described in section 7.11 “Shader Memory Access” of the OpenGL Specification, shader memory reads and writes complete in a largely undefined order. The built-in function memoryBarrier() can be used if needed to guarantee the completion and relative ordering of memory accesses performed by a single shader invocation.
>
> When accessing memory using variables not declared as coherent, the memory accessed by a shader may be cached by the implementation to service future accesses to the same address. Memory stores may be cached in such a way that the values written might not be visible to other shader invocations accessing the same memory. The implementation may cache the values fetched by memory reads and return the same values to any shader invocation accessing the same memory, even if the underlying memory has been modified since the first memory read. While variables not declared as coherent might not be useful for communicating between shader invocations, using non-coherent accesses may result in higher performance.

from 4.10 Memory Qualifiers in https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.50.pdf

> Shared variables are implicitly coherent. That is, writes to shared variables from one shader invocation will eventually be seen by other invocations within the same local work group.

from 4.3.8 Shared Variables section in https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.50.pdf

SPIR-V's `Coherent` decoration has the same semantics as GLSL 4.50's `coherent` qualifier, since we use `OpMemoryModel` with the `GLSL450` memory model by default.

The `SPV_KHR_vulkan_memory_model` extension exposes similar functionality and replaces the `Coherent` decoration.

**Conclusion**: GLSL's shared variables & SPIR-V's workgroup storage class variables are implicitly coherent. GLSL's coherent qualifier & SPIR-V's coherent decoration can be used on buffer and texture variables to make them coherent.
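The per-language findings above can be condensed into a small lookup table. The Rust sketch below is illustrative only: `Backend`, `Resource`, `workgroup_coherent_by_default` and `coherence_keyword` are hypothetical names, not an existing compiler API. It records which backends give buffers/textures workgroup coherence without extra annotations, and how each language spells the opt-in qualifier.

```rust
// Hypothetical types for illustration; not an existing compiler API.

/// Native shading languages a WGSL compiler might target.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Msl,
    Hlsl,
    Glsl,
    SpirV,
}

/// Kind of non-workgroup resource being declared.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Resource {
    Buffer,
    Texture,
}

/// Whether buffers/textures are at least workgroup coherent without any
/// extra qualifier, according to the quoted specifications.
fn workgroup_coherent_by_default(backend: Backend) -> bool {
    match backend {
        // MSL: device buffers and textures default to threadgroup coherence.
        // HLSL: barriers flush UAVs within the current group.
        Backend::Msl | Backend::Hlsl => true,
        // GLSL/SPIR-V: non-coherent accesses may be served from caches.
        Backend::Glsl | Backend::SpirV => false,
    }
}

/// How each language spells the opt-in coherence annotation.
fn coherence_keyword(backend: Backend, resource: Resource) -> &'static str {
    match (backend, resource) {
        // Metal 3.2 and later.
        (Backend::Msl, Resource::Buffer) => "coherent(device)",
        (Backend::Msl, Resource::Texture) => "memory_coherence_device",
        // Storage class on UAV declarations.
        (Backend::Hlsl, _) => "globallycoherent",
        // Memory qualifier.
        (Backend::Glsl, _) => "coherent",
        // Decoration on the variable.
        (Backend::SpirV, _) => "Coherent",
    }
}
```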

# Satisfying WGSL's non-atomic memory coherency requirements

- WGSL non-atomic accesses to workgroup variables are required to be workgroup coherent. See https://www.w3.org/TR/WGSL/#private-vs-non-private.
  - All of the native shading languages above satisfy this by default.
- WGSL non-atomic buffer/texture accesses are required to be workgroup coherent. See https://www.w3.org/TR/WGSL/#private-vs-non-private.
  - MSL & HLSL satisfy this by default. GLSL variables need the coherent qualifier. SPIR-V variables need the coherent decoration.
- WGSL only has workgroup-scoped barriers (`workgroupBarrier()`, `storageBarrier()`, `textureBarrier()`). See https://www.w3.org/TR/WGSL/#sync-builtin-functions.
  - Memory that is not coherent can only be meaningfully observed after a barrier. See https://github.com/gpuweb/gpuweb/issues/1621#issuecomment-816879400.

**Conclusion**: If a compute shader contains a `storageBarrier()`/`textureBarrier()` barrier, the GLSL & SPIR-V backends need to qualify/decorate storage buffer/texture variables as coherent.
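This conclusion can be expressed as a small decision function. The Rust sketch below assumes a hypothetical compiler that has already collected the barrier built-ins reachable from the compute entry point; `Backend`, `Barrier` and `needs_coherent` are illustrative names, not an existing API.

```rust
// Hypothetical types for illustration; not an existing compiler API.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Msl,
    Hlsl,
    Glsl,
    SpirV,
}

/// WGSL barrier built-ins; all of them have workgroup scope.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Barrier {
    Workgroup, // workgroupBarrier()
    Storage,   // storageBarrier()
    Texture,   // textureBarrier()
}

/// Returns true if storage buffer/texture variables must be qualified
/// (`coherent` in GLSL) or decorated (`Coherent` in SPIR-V).
/// `barriers` is assumed to list every barrier reachable from the entry point.
fn needs_coherent(backend: Backend, barriers: &[Barrier]) -> bool {
    // Non-coherent memory can only be meaningfully observed after a barrier,
    // and only storageBarrier()/textureBarrier() cover buffers/textures.
    let has_storage_or_texture_barrier = barriers
        .iter()
        .any(|b| matches!(b, Barrier::Storage | Barrier::Texture));
    match backend {
        // Already workgroup coherent by default.
        Backend::Msl | Backend::Hlsl => false,
        // Need the qualifier/decoration only when such a barrier is present.
        Backend::Glsl | Backend::SpirV => has_storage_or_texture_barrier,
    }
}
```

A shader that only calls `workgroupBarrier()` therefore keeps its buffers and textures undecorated, which preserves any performance benefit of non-coherent accesses.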

# Satisfying WGSL's atomic memory coherency requirements

WGSL's atomic built-in functions include atomic read/write and read-modify-write functions. See https://www.w3.org/TR/WGSL/#atomic-builtin-functions.

> The contents of the memory being updated by the atomic operation are guaranteed not to be modified by any other assignment or atomic memory function in any shader invocation between the time the original value is read and the time the new value is written.

from 8.11 Atomic Memory Functions of https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.50.pdf

> If a Compute Shader thread in a given thread group needs to perform loads of data that was written by atomics or stores in another thread group, the UAV slot where the data resides must be tagged upon declaration in the shader as "globally coherent", so the implementation can ignore the local cache. Otherwise, this form of cross-thread group data sharing will produce undefined results. 

> Atomic read-modify-write operations do not have this constraint (even though a part of the operation is a read/load), because a byproduct of the hardware honoring atomicity is that the entire system sees the operation, whereas simple loads on some implementations may only go to a local cache that has no knowledge of external updates. 

> Importantly, for many scenarios where cross thread-group communication or reduction (such as histograms) can be accomplished using only atomic operations (no cross thread-group loads involved), there is no problem since atomic operations are implemented by all hardware in a globally coherent way, regardless of whether the UAV has been tagged as "globally coherent" or not.

from https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.14.4%20Global%20vs%20Group/Local%20Coherency%20on%20Non-Atomic%20UAV%20Reads

MSL has atomic types and atomic operations including load/store operations. See 6.15.4 Atomic Functions in https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf.

SPIR-V has atomic operations including load/store operations. See `OpAtomicLoad`/`OpAtomicStore` in https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html.

GLSL's coherent qualifier and HLSL's globallycoherent storage class are not generally needed if a variable is only accessed atomically.

**Conclusion**: GLSL and HLSL don't have atomic load/store operations, so when a WGSL variable is accessed with atomic load/store operations, the GLSL and HLSL backends need to emit a plain load/store and add the `coherent` qualifier (GLSL) or the `globallycoherent` storage class (HLSL) to the variable.
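A minimal sketch of that lowering rule for WGSL `atomicLoad`/`atomicStore`, written in Rust. The `Backend`, `AtomicAccess`, `Emit`, `Lowering` and `lower_atomic_access` names are hypothetical. The MSL and SPIR-V entries use `atomic_load_explicit`/`atomic_store_explicit` and `OpAtomicLoad`/`OpAtomicStore`, while GLSL and HLSL fall back to a plain access plus a variable annotation.

```rust
// Hypothetical types for illustration; not an existing compiler API.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Msl,
    Hlsl,
    Glsl,
    SpirV,
}

/// The WGSL atomic built-ins this sketch covers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AtomicAccess {
    Load,  // atomicLoad
    Store, // atomicStore
}

/// What the backend emits at the access site.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Emit {
    /// A native atomic function or instruction.
    Native(&'static str),
    /// An ordinary, non-atomic load or store.
    Plain,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Lowering {
    emit: Emit,
    /// Qualifier or storage class the variable declaration must carry.
    var_qualifier: Option<&'static str>,
}

fn lower_atomic_access(backend: Backend, access: AtomicAccess) -> Lowering {
    use AtomicAccess::*;
    // Native atomics need no extra annotation on the variable.
    let native = |name: &'static str| Lowering { emit: Emit::Native(name), var_qualifier: None };
    match (backend, access) {
        (Backend::Msl, Load) => native("atomic_load_explicit"),
        (Backend::Msl, Store) => native("atomic_store_explicit"),
        (Backend::SpirV, Load) => native("OpAtomicLoad"),
        (Backend::SpirV, Store) => native("OpAtomicStore"),
        // No atomic load/store: plain access on a coherent variable.
        (Backend::Glsl, _) => Lowering { emit: Emit::Plain, var_qualifier: Some("coherent") },
        (Backend::Hlsl, _) => Lowering { emit: Emit::Plain, var_qualifier: Some("globallycoherent") },
    }
}
```

Atomic read-modify-write operations are left out on purpose: per the quotes above, they are seen coherently by the whole device without the qualifier.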

