[API Proposal]: Volatile barrier APIs

### Background and motivation

This API proposal exposes methods to perform non-atomic volatile memory operations. Our volatile semantics are explained in our [memory model](https://github.com/dotnet/runtime/blob/main/docs/design/specs/Memory-model.md#order-of-memory-operations), but I will outline the tl;dr of the relevant parts here:
- Usually memory accesses can be re-ordered and combined by the C# compiler, JIT, and CPU
- Volatile writes disallow the write to occur earlier than specified (i.e., prior memory accesses must complete before this executes)
- Volatile reads disallow the read to occur later than specified (i.e., subsequent memory accesses can only begin after this completes)
- Memory accesses to primitive types are always atomic if they're properly aligned (unless `unaligned.` is used), and either 1) the size of the type is at most the size of the pointer, or 2) a method on `Volatile` or `Interlocked` such as `Volatile.Write(double&, double)` has been called
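
As a minimal sketch of the ordering rules above, using only the existing `Volatile.Read`/`Volatile.Write` APIs: the `Mailbox` type and its members are hypothetical names chosen for illustration and are not part of the proposal.

```csharp
using System.Threading;

// Hypothetical type for illustration only.
public sealed class Mailbox
{
    private int _payload;   // ordinary field, accessed with plain reads/writes
    private bool _ready;    // flag, accessed with volatile semantics

    // Producer: the volatile write cannot occur earlier than the payload write before it.
    public void Publish(int payload)
    {
        _payload = payload;
        Volatile.Write(ref _ready, true);
    }

    // Consumer: the volatile read cannot occur later than the payload read after it.
    public bool TryTake(out int payload)
    {
        if (Volatile.Read(ref _ready))
        {
            payload = _payload;
            return true;
        }

        payload = 0;
        return false;
    }
}
```

This works because both fields are small enough to be accessed atomically; the rest of this proposal is about the case where the payload is not.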

Currently, we expose APIs on `Volatile` for atomic memory accesses, but there is no way to perform the equivalent operations for non-atomic types. With volatile barrier APIs, these operations become easy to write, and it is clear which memory operations can move past the barrier and in which direction.


### API Proposal

```csharp
namespace System.Threading;

public static class Volatile
{
    public static void ReadBarrier();
    public static void WriteBarrier();
}
```

#### Desired semantics

- `Volatile.ReadBarrier()`

Provides a `Read-ReadWrite` barrier.
All reads preceding the barrier will need to complete before any subsequent memory operation.

`Volatile.ReadBarrier()` matches the semantics of `Volatile.Read` in terms of ordering reads relative to _all_ subsequent operations in program order.

The important difference from `Volatile.Read(ref x)` is that `Volatile.ReadBarrier()` affects all preceding reads and not just a single read of `x`.


- `Volatile.WriteBarrier()`

Provides a `ReadWrite-Write` barrier.
All memory operations preceding the barrier will need to complete before any subsequent write.

`Volatile.WriteBarrier()` matches the semantics of `Volatile.Write` in terms of ordering writes relative to _all_ preceding operations in program order.

The important difference from `Volatile.Write(ref x)` is that `Volatile.WriteBarrier()` affects all subsequent writes and not just a single write of `x`.

The actual implementation will depend on the underlying platform:
- On TSO architectures (x86/x64) it would have only compiler optimization-preventing effects. 
- On weaker architectures some ordering instructions will be emitted in addition - as appropriate and available in the given ISA.
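
To make the intended ordering concrete, here is a rough sketch that publishes a struct too wide to be written atomically. `BarrierShim`, `Transform`, and `TransformSlot` are hypothetical names. The shim uses the existing `Interlocked.MemoryBarrier()` as a stand-in on the assumption that a full fence is at least as strong as either proposed barrier; it is not how the proposed APIs would be implemented.

```csharp
using System.Threading;

// Hypothetical stand-in for the proposed Volatile.ReadBarrier/WriteBarrier.
// Assumption: a full fence is stronger than needed, but never weaker.
public static class BarrierShim
{
    public static void ReadBarrier() => Interlocked.MemoryBarrier();
    public static void WriteBarrier() => Interlocked.MemoryBarrier();
}

// Hypothetical payload with several fields; it cannot be written in one atomic access.
public struct Transform
{
    public double X, Y, Z, W;
}

public sealed class TransformSlot
{
    private Transform _data;
    private bool _published;

    // Writer: all field writes before the barrier complete before the flag write after it.
    public void Publish(Transform value)
    {
        _data = value;
        BarrierShim.WriteBarrier();
        _published = true;
    }

    // Reader: the flag read before the barrier completes before the field reads after it.
    public bool TryRead(out Transform value)
    {
        bool published = _published;
        BarrierShim.ReadBarrier();
        value = published ? _data : default;
        return published;
    }
}
```

With a single flag, `Volatile.Write`/`Volatile.Read` on `_published` would give the same ordering; the barrier form matters once several writes must all stay after the barrier, or several reads must all stay before it.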

### API Usage

The runtime uses an internal API, `Interlocked.ReadMemoryBarrier()`, in 2 places ([here](https://github.com/dotnet/runtime/blob/89a18512f81a2c388e26871bdf9044c5b6c1a5c3/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastCache.cs#L175) and [here](https://github.com/dotnet/runtime/blob/89a18512f81a2c388e26871bdf9044c5b6c1a5c3/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/GenericCache.cs#L165)) to batch multiple reads on both CoreCLR and NativeAOT; it is supported on all platforms. This ability would also be useful to third-party developers (such as me, in my example below), but it is currently not possible to write efficiently.
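
For illustration, batching several reads behind one barrier might look like the version-checked cell below. This is a loose, seqlock-style sketch and not the runtime's code: `VersionedCell` and its members are hypothetical, a single writer is assumed, and `Interlocked.MemoryBarrier()` stands in for the proposed barriers.

```csharp
using System.Threading;

// Hypothetical versioned cell; single writer, any number of readers.
public sealed class VersionedCell
{
    private int _version;      // even = stable, odd = write in progress
    private long _a, _b, _c;   // several fields that are read as one batch

    public void Write(long a, long b, long c)
    {
        int v = _version;
        _version = v + 1;                     // mark the cell as being written
        Interlocked.MemoryBarrier();          // stand-in for Volatile.WriteBarrier()
        _a = a;
        _b = b;
        _c = c;
        Volatile.Write(ref _version, v + 2);  // field writes complete before this write
    }

    // Returns false if a write was in progress or completed during the read;
    // the out values are then unreliable and the caller should retry.
    public bool TryRead(out long a, out long b, out long c)
    {
        int before = Volatile.Read(ref _version); // field reads begin after this read
        a = _a;
        b = _b;
        c = _c;
        Interlocked.MemoryBarrier();          // stand-in for Volatile.ReadBarrier()
        int after = _version;
        return (before & 1) == 0 && before == after;
    }
}
```

The single barrier before the second version read covers all three field reads, which is the batching that one `Volatile.Read` per field cannot express as cheaply.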

An example where non-atomic volatile operations would be useful is as follows. Consider a game which wants to save its state, ideally while continuing to run; these are the most obvious options:
- Save on the main thread (can cause lag spikes if lots of data to save)
- Create a copy of the data and then save on a separate thread (less time spent on main thread, but potentially still non-trivial, and a large number of allocations most likely)
- Use locks around access to the memory (massive overhead)
- Just save it on another thread without any sync (saves an inconsistent state)

But there is actually another option which utilises non-atomic volatile semantics:
- Has low overhead, especially on x86 and x64
- Saves a consistent state

```csharp
//main thread sets IsSaving to true and increments SavingVersion before starting the saving thread, and sets IsSaving to false once it's definitely done (e.g., on next frame)
//saving thread performs a full memory barrier before starting (when required, since starting a brand new thread every time isn't ideal), to ensure that _value is up-to-date
//memory synchronisation works because _value is always read before any saving information, and it's always written after the saving information
//if the version we read on the saving thread is not the current version, then our read from _value is correct, otherwise our read from _savingValue will be correct
//in the rare case that we loop to saving version == 0, then we can manually write all _savingVersion values to 0, skip to version == 1, and go from there (excluded from here though for clarity)

static class SavingState
{
    public static bool IsSaving { get; set; }
    public static nuint SavingVersion { get; set; }
}

struct SaveableHolder<T>
{
    nuint _savingVersion;
    T _value;
    T _savingValue;

    //Called only from main thread
    public T Value
    {
        get => _value;
        set
        {
            if (SavingState.IsSaving)
            {
                if (_savingVersion != SavingState.SavingVersion)
                {
                    _savingValue = _value;

                    //ensure the saving value is written before the saving version, so that we read it in the correct order
                    Volatile.Write(ref _savingVersion, SavingState.SavingVersion);
                }

                //_value can only become torn or incorrect after we have written our saving value and version
                Volatile.WriteBarrier();
                _value = value; //write must occur after prior writes
            }
            else
            {
                _value = value;
            }
        }
    }

    //Called only from saving thread while SavingState.IsSaving with a higher SavingState.SavingVersion than last time
    public T SavingValue
    {
        get
        {
            var value = Value; //read must occur before subsequent reads
            Volatile.ReadBarrier();

            //_savingVersion must be read after _value is, so if it's visibly changed/changing then we will catch it here
            if (Volatile.Read(in _savingVersion) != SavingState.SavingVersion) return value;

            //volatile read on _savingVersion ensures we get an up-to-date _savingValue since it's written first
            return _savingValue;
        }
    }
}
```


### Alternative Designs

- We could expose read/write APIs instead:
```csharp
namespace System.Threading;

public static class Volatile
{
    public static T ReadNonAtomic<T>(ref readonly T location) where T : allows ref struct
    {
        //ldarg.0
        //volatile.
        //ldobj !!T
    }

    public static void WriteNonAtomic<T>(ref T location, T value) where T : allows ref struct
    {
        //ldarg.0
        //ldarg.1
        //volatile.
        //stobj !!T
    }
}
```

We do have IL instructions, but they're currently broken and not exposed; see https://github.com/dotnet/runtime/issues/91530. The proposal here was originally to expose APIs for `volatile. ldobj` and `volatile. stobj` plus the unaligned variants (as seen above), and to fix the instructions (or implement these without the instructions and have the instructions call these APIs; not much of a difference really). It was changed based on feedback to expose barrier APIs, which can provide equivalent semantics but also allow additional scenarios. It is also clearer which memory operations can be reordered with the barrier APIs.

- We could expose APIs on Interlocked instead:
```csharp
public static class Interlocked
{
    // Existing API
    public static void MemoryBarrier();
    // New APIs
    public static void MemoryBarrierAcquire(); //volatile read semantics
    public static void MemoryBarrierRelease(); //volatile write semantics
}
```
- We could expose the APIs on `Unsafe` instead:
```csharp
namespace System.Runtime.CompilerServices;

public static class Unsafe
{
    public static T ReadVolatile<T>(ref readonly T location) where T : allows ref struct;
    public static void WriteVolatile<T>(ref T location, T value) where T : allows ref struct;
}
```
- We could add unaligned overloads:
```csharp
namespace System.Runtime.CompilerServices;

public static class Unsafe
{
    public static T ReadVolatileUnaligned<T>(ref readonly byte location) where T : allows ref struct;
    public static void WriteVolatileUnaligned<T>(ref byte location, T value) where T : allows ref struct;
}
```
- We could also expose APIs for the other operations which allow `volatile.` (`initblk` and `cpblk`), since people may have use for these as well:
```csharp
namespace System.Runtime.CompilerServices;

public static class Unsafe
{
    public static void CopyBlockVolatile(ref byte destination, ref readonly byte source, uint byteCount);
    public static void CopyBlockVolatileUnaligned(ref byte destination, ref readonly byte source, uint byteCount);
    public static void InitBlockVolatile(ref byte startAddress, byte value, uint byteCount);
    public static void InitBlockVolatileUnaligned(ref byte startAddress, byte value, uint byteCount);
}
```
- We could expose APIs similar to what C++ has: https://en.cppreference.com/w/cpp/atomic/memory_order


### Open Questions

There is a question as to whether we should have `Read-ReadWrite`/`ReadWrite-Write` barriers or `Read-Read`/`Write-Write` barriers. I was initially in favour of the former (as it matches our current memory model), but now think the latter is probably better, since there are many scenarios (including in my example API usage, and the runtime's uses too) where the additional guarantees provided by the former are unnecessary, and thus may cause unnecessary overhead. We could also just provide both if we think they're both useful.


### Risks

No more than other volatile/interlocked APIs really, other than potential misunderstanding of what they do.

[API Proposal]: Volatile barrier APIs #98837