Native shading language guarantees
MSL
Memory in the thread address space has thread coherence, and memory in the threadgroup address space has threadgroup coherence. By default, memory in the device address space has threadgroup coherence.
Metal 3.2 and later support the coherent(device) qualifiers for buffers and memory_coherence_device for textures to indicate that the object has device coherence, that is, memory operations are visible across threads on the device if you properly synchronize them.
from 4.8 Memory Coherency in https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf
Textures have memory_coherence_threadgroup by default.
See 2.9 Textures in https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf
Conclusion: MSL's threadgroup address space has threadgroup coherence. MSL's device buffers and textures are threadgroup coherent by default and, in Metal 3.2 and later, can optionally be made device coherent.
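As a rough illustration, the MSL rules above can be encoded as a lookup from storage kind to coherence scope. This is a minimal Python sketch. The `Scope` enum, the `msl_coherence` function and the storage-kind strings are hypothetical names and are not part of any Metal API.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Coherence scopes, ordered from narrowest to widest (hypothetical model)."""

    THREAD = 0
    THREADGROUP = 1
    DEVICE = 2


def msl_coherence(storage: str, device_coherent: bool = False) -> Scope:
    """Return the coherence scope MSL gives to `storage` (hypothetical helper).

    `device_coherent` models the Metal 3.2+ coherent(device) qualifier for
    buffers and memory_coherence_device for textures. It is ignored for
    thread and threadgroup memory.
    """
    if storage == "thread":
        return Scope.THREAD
    if storage == "threadgroup":
        return Scope.THREADGROUP
    if storage in ("device_buffer", "texture"):
        # Threadgroup coherent by default, device coherent only when opted in.
        return Scope.DEVICE if device_coherent else Scope.THREADGROUP
    raise ValueError(f"unknown storage kind: {storage!r}")


print(msl_coherence("device_buffer").name)  # prints THREADGROUP
```

Ordering the scopes as an `IntEnum` lets a translator check whether a guarantee is wide enough with a simple comparison, for example `msl_coherence("texture") < Scope.DEVICE`.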
HLSL
All prior g# memory reads or writes by this thread in program order are made visible to all threads in the thread group before any subsequent g# memory accesses by this thread.
This applies to all of the current thread group's g# shared memory.
from https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/sync--sm5---asm-#_g
RWByteAddressBuffer objects can be prefixed with the storage class globallycoherent. This storage class causes memory barriers and syncs to flush data across the entire GPU such that other groups can see writes. Without this specifier, a memory barrier or sync will flush a UAV only within the current group.
from https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/sm5-object-rwbyteaddressbuffer
You can prefix RWTexture2D objects with the storage class globallycoherent. This storage class causes memory barriers and syncs to flush data across the entire GPU such that other groups can see writes. Without this specifier, a memory barrier or sync will flush only an unordered access view (UAV) within the current group.
from https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/sm5-object-rwtexture2d
Conclusion: HLSL's groupshared storage class has group coherence. HLSL's buffers/textures (UAVs) are group coherent by default and can optionally be made globally coherent with the globallycoherent storage class.
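A backend that wants globally coherent UAVs only has to prefix the declaration with the storage class. The following minimal Python sketch assumes a hypothetical helper, `hlsl_uav_declaration`, that emits HLSL source text. The resource names and register numbers are illustrative.

```python
def hlsl_uav_declaration(
    resource_type: str, name: str, register: int, globally_coherent: bool = False
) -> str:
    """Emit an HLSL UAV declaration line (hypothetical helper).

    Without `globallycoherent`, memory barriers only flush the UAV within the
    current thread group. With it, writes are flushed across the entire GPU.
    """
    prefix = "globallycoherent " if globally_coherent else ""
    return f"{prefix}{resource_type} {name} : register(u{register});"


print(hlsl_uav_declaration("RWByteAddressBuffer", "counters", 0, globally_coherent=True))
# globallycoherent RWByteAddressBuffer counters : register(u0);
print(hlsl_uav_declaration("RWTexture2D<float4>", "output", 1))
# RWTexture2D<float4> output : register(u1);
```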
GLSL & SPIR-V
Memory accesses to image variables declared using the coherent qualifier are performed coherently with similar accesses from other shader invocations. In particular, when reading a variable declared as coherent, the values returned will reflect the results of previously completed writes performed by other shader invocations. When writing a variable declared as coherent, the values written will be reflected in subsequent coherent reads performed by other shader invocations. As described in section 7.11 “Shader Memory Access” of the OpenGL Specification, shader memory reads and writes complete in a largely undefined order. The built-in function memoryBarrier() can be used if needed to guarantee the completion and relative ordering of memory accesses performed by a single shader invocation.
When accessing memory using variables not declared as coherent, the memory accessed by a shader may be cached by the implementation to service future accesses to the same address. Memory stores may be cached in such a way that the values written might not be visible to other shader invocations accessing the same memory. The implementation may cache the values fetched by memory reads and return the same values to any shader invocation accessing the same memory, even if the underlying memory has been modified since the first memory read. While variables not declared as coherent might not be useful for communicating between shader invocations, using non-coherent accesses may result in higher performance.
from 4.10 Memory Qualifiers in https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.50.pdf
Shared variables are implicitly coherent. That is, writes to shared variables from one shader invocation will eventually be seen by other invocations within the same local work group.
from 4.3.8 Shared Variables section in https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.50.pdf
SPIR-V's Coherent decoration, as used with Vulkan, has the same semantics as GLSL 4.50's coherent qualifier, since we use OpMemoryModel GLSL450 by default.
SPV_KHR_vulkan_memory_model exposes similar functionality as a replacement for the coherent decoration.
Conclusion: GLSL's shared variables & SPIR-V's workgroup storage class variables are implicitly coherent. GLSL's coherent qualifier & SPIR-V's coherent decoration can be used on buffer and texture variables to make them coherent.
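Similarly, a GLSL emitter only needs to add the coherent qualifier to the buffer block or image declaration. This is a minimal Python sketch, assuming hypothetical helpers `glsl_storage_buffer` and `glsl_storage_image` that emit GLSL 4.50 source text. The block, member, format and binding values are illustrative.

```python
def _coherent_prefix(coherent: bool) -> str:
    return "coherent " if coherent else ""


def glsl_storage_buffer(
    block: str, instance: str, members: str, binding: int, coherent: bool = False
) -> str:
    """Emit a GLSL 4.50 shader storage block declaration (hypothetical helper)."""
    return (
        f"layout(std430, binding = {binding}) {_coherent_prefix(coherent)}"
        f"buffer {block} {{ {members} }} {instance};"
    )


def glsl_storage_image(
    image_format: str, image_type: str, name: str, binding: int, coherent: bool = False
) -> str:
    """Emit a GLSL 4.50 image uniform declaration (hypothetical helper)."""
    return (
        f"layout({image_format}, binding = {binding}) "
        f"{_coherent_prefix(coherent)}uniform {image_type} {name};"
    )


print(glsl_storage_buffer("Data", "data", "uint values[];", 0, coherent=True))
# layout(std430, binding = 0) coherent buffer Data { uint values[]; } data;
print(glsl_storage_image("rgba8", "image2D", "img", 1, coherent=True))
# layout(rgba8, binding = 1) coherent uniform image2D img;
```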
Satisfying WGSL's non-atomic memory coherency requirements
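Conclusion: WGSL's storageBarrier() and textureBarrier() only synchronize invocations within a single workgroup. MSL's threadgroup coherent device memory and HLSL's group coherent UAVs already meet that requirement by default. GLSL and SPIR-V storage buffers/textures that are not coherent, however, may be cached so that writes are never seen by other invocations. So if a compute shader contains a storageBarrier()/textureBarrier() barrier, the GLSL & SPIR-V backends need to qualify/decorate storage buffer/texture variables as coherent.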
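The decision can be sketched as a small predicate over the backend and the built-ins a shader calls. This is a minimal Python sketch. The backend names, the `STORAGE_COHERENT_BY_DEFAULT` table and `needs_coherent_storage` are hypothetical, and a real translator would inspect the shader's IR rather than a list of names.

```python
from collections.abc import Iterable

# Hypothetical table: does the backend's default storage buffer/texture
# already have (at least) workgroup-scope coherence?
STORAGE_COHERENT_BY_DEFAULT = {
    "msl": True,     # device memory is threadgroup coherent by default
    "hlsl": True,    # UAVs are group coherent by default
    "glsl": False,   # needs the coherent qualifier
    "spirv": False,  # needs the Coherent decoration (OpMemoryModel GLSL450)
}

STORAGE_BARRIERS = frozenset({"storageBarrier", "textureBarrier"})


def needs_coherent_storage(backend: str, builtins_called: Iterable[str]) -> bool:
    """True if storage buffer/texture variables must be made coherent."""
    uses_storage_barrier = not STORAGE_BARRIERS.isdisjoint(builtins_called)
    return uses_storage_barrier and not STORAGE_COHERENT_BY_DEFAULT[backend]


print(needs_coherent_storage("glsl", ["storageBarrier", "atomicAdd"]))  # True
print(needs_coherent_storage("hlsl", ["storageBarrier"]))  # False
```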
Satisfying WGSL's atomic memory coherency requirements
WGSL's atomic built-in functions include atomic read/write and read-modify-write functions. See https://www.w3.org/TR/WGSL/#atomic-builtin-functions.
The contents of the memory being updated by the atomic operation are guaranteed not to be modified by any other assignment or atomic memory function in any shader invocation between the time the original value is read and the time the new value is written.
from 8.11 Atomic Memory Functions of https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.50.pdf
If a Compute Shader thread in a given thread group needs to perform loads of data that was written by atomics or stores in another thread group, the UAV slot where the data resides must be tagged upon declaration in the shader as "globally coherent", so the implementation can ignore the local cache. Otherwise, this form of cross-thread group data sharing will produce undefined results.
Atomic read-modify-write operations do not have this constraint (even though a part of the operation is a read/load), because a byproduct of the hardware honoring atomicity is that the entire system sees the operation, whereas simple loads on some implementations may only go to a local cache that has no knowledge of external updates.
Importantly, for many scenarios where cross thread-group communication or reduction (such as histograms) can be accomplished using only atomic operations (no cross thread-group loads involved), there is no problem since atomic operations are implemented by all hardware in a globally coherent way, regardless of whether the UAV has been tagged as "globally coherent" or not.
from https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.14.4%20Global%20vs%20Group/Local%20Coherency%20on%20Non-Atomic%20UAV%20Reads
MSL has atomic types and atomic operations including load/store operations. See 6.15.4 Atomic Functions in https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf.
SPIR-V has atomic operations including load/store operations. See OpAtomicLoad/OpAtomicStore in https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html.
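GLSL's coherent qualifier and HLSL's globallycoherent storage class are not generally needed if a variable is only accessed with atomic read-modify-write functions, because, as the D3D11 specification notes above, hardware performs atomic operations in a globally coherent way.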
Conclusion: GLSL and HLSL don't have atomic load/store operations. So if a WGSL variable is accessed with atomicLoad()/atomicStore(), the GLSL and HLSL translations need to emit a plain load/store and add the coherent qualifier (GLSL) or the globallycoherent storage class (HLSL) to the variable.
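A minimal Python sketch of that fallback, assuming a translator that works on source strings. `FALLBACK_QUALIFIER`, `lower_atomic_load` and `lower_atomic_store` are hypothetical names, and the lvalues are illustrative GLSL/HLSL expressions.

```python
from typing import Tuple

# Hypothetical table. MSL (atomic_load_explicit/atomic_store_explicit) and
# SPIR-V (OpAtomicLoad/OpAtomicStore) have native atomic load/store, so they
# need no fallback. GLSL and HLSL use a plain access plus a qualifier.
FALLBACK_QUALIFIER = {
    "msl": None,
    "spirv": None,
    "glsl": "coherent",
    "hlsl": "globallycoherent",
}


def _fallback_qualifier(backend: str) -> str:
    qualifier = FALLBACK_QUALIFIER[backend]
    if qualifier is None:
        raise ValueError(f"{backend} has native atomic load/store; no fallback needed")
    return qualifier


def lower_atomic_load(backend: str, lvalue: str) -> Tuple[str, str]:
    """Lower WGSL atomicLoad(&lvalue) to (plain load expression, required qualifier)."""
    return lvalue, _fallback_qualifier(backend)


def lower_atomic_store(backend: str, lvalue: str, value: str) -> Tuple[str, str]:
    """Lower WGSL atomicStore(&lvalue, value) to (plain store statement, required qualifier)."""
    return f"{lvalue} = {value};", _fallback_qualifier(backend)


print(lower_atomic_load("glsl", "flags.ready"))  # ('flags.ready', 'coherent')
print(lower_atomic_store("hlsl", "ready[0]", "1u"))  # ('ready[0] = 1u;', 'globallycoherent')
```

Raising for MSL and SPIR-V is a design choice in this sketch. Those backends should emit their native atomic load/store instead of the plain-access fallback.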