Background and motivation
This API proposal exposes methods to perform non-atomic volatile memory operations. Our volatile semantics are explained in our memory model, but I will outline the tl;dr of the relevant parts here:
- Usually memory accesses can be re-ordered and combined by the C# compiler, JIT, and CPU
- Volatile writes disallow the write to occur earlier than specified (i.e., prior memory accesses must complete before this executes)
- Volatile reads disallow the read to occur later than specified (i.e., subsequent memory accesses can only begin after this completes)
- Memory accesses to primitive types are always atomic if they're properly aligned (unless the unaligned. prefix is used), and either 1) the size of the type is at most the size of the pointer, or 2) a method on Volatile or Interlocked such as Volatile.Write(double&, double) has been called
Currently, we expose APIs on Volatile for atomic memory accesses, but there is no way to perform the equivalent operations on non-atomic types. With Volatile barrier APIs, such operations become easy to write, and the barriers make it clear which memory operations can move past them and in which direction.
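As a point of reference, the existing atomic case can be sketched as follows. This is only an illustration of the ordering rules listed above, using the existing Volatile.Read and Volatile.Write overloads; the Mailbox type and its member names are hypothetical and not part of the proposal.

```csharp
using System.Threading;

// Hypothetical type, for illustration only.
public sealed class Mailbox
{
    private int _payload;
    private bool _ready;

    // Producer: the volatile write cannot move before the prior write,
    // so _payload is written before _ready becomes visible as true.
    public void Publish(int payload)
    {
        _payload = payload;
        Volatile.Write(ref _ready, true);
    }

    // Consumer: the volatile read cannot move after the later read,
    // so once _ready is observed as true, _payload holds the published value.
    public bool TryReceive(out int payload)
    {
        if (Volatile.Read(ref _ready))
        {
            payload = _payload;
            return true;
        }

        payload = 0;
        return false;
    }
}
```

This works because the flag is a single atomic field. The proposal targets the case where the data being ordered is larger than a pointer or spans several plain accesses.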
API Proposal
```csharp
namespace System.Threading;

public static class Volatile
{
    public static void ReadBarrier();
    public static void WriteBarrier();
}
```
=== Desired semantics:
Volatile.ReadBarrier()
Provides a Read-ReadWrite barrier.
All reads preceding the barrier will need to complete before any subsequent memory operation.
Volatile.ReadBarrier() matches the semantics of Volatile.Read in terms of ordering reads relative to all operations that follow in program order.
The important difference from Volatile.Read(ref x) is that Volatile.ReadBarrier() affects all preceding reads, not just a single read of x.
Volatile.WriteBarrier()
Provides a ReadWrite-Write barrier.
All memory operations preceding the barrier will need to complete before any subsequent write.
Volatile.WriteBarrier() matches the semantics of Volatile.Write in terms of ordering writes relative to all operations that precede it in program order.
The important difference from Volatile.Write(ref x) is that Volatile.WriteBarrier() affects all subsequent writes, not just a single write of x.
The actual implementation will depend on the underlying platform.
- On TSO architectures (x86/x64), it would only prevent reordering by the compiler; no instructions would be emitted.
- On weaker architectures, ordering instructions will additionally be emitted, as appropriate and available in the given ISA.
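To make the intended ordering concrete, here is a minimal sketch of a single-writer sequence lock over a value too large to access atomically. It is written against existing APIs only: Interlocked.MemoryBarrier() (a full fence, so stronger than needed) stands in at the two points where the proposed Volatile.WriteBarrier() and Volatile.ReadBarrier() would be sufficient. The Vector3d and SeqLockBox names are hypothetical, and the sketch assumes the value contains only unmanaged fields.

```csharp
using System.Threading;

// Hypothetical 24-byte value: too large to be read or written atomically.
public struct Vector3d
{
    public double X, Y, Z;
}

// Hypothetical single-writer, multi-reader holder, for illustration only.
public sealed class SeqLockBox
{
    private int _sequence;   // even = stable, odd = write in progress
    private Vector3d _value;

    // Must only be called from one writer thread.
    public void Write(Vector3d value)
    {
        int seq = _sequence;
        _sequence = seq + 1;                    // mark "write in progress"
        Interlocked.MemoryBarrier();            // stand-in for Volatile.WriteBarrier():
                                                // the odd sequence is written before the data
        _value = value;                         // non-atomic write
        Volatile.Write(ref _sequence, seq + 2); // data writes complete before the even sequence
    }

    public Vector3d Read()
    {
        while (true)
        {
            int before = Volatile.Read(ref _sequence); // data reads cannot move above this
            Vector3d copy = _value;                    // non-atomic read, possibly torn
            Interlocked.MemoryBarrier();               // stand-in for Volatile.ReadBarrier():
                                                       // data reads complete before the re-check
            int after = _sequence;

            // Accept the copy only if no write started or finished in between.
            if (before == after && (before & 1) == 0)
            {
                return copy;
            }
        }
    }
}
```

On x86/x64 the full fence emits a real instruction, whereas the proposed barriers would only constrain the compiler there, which is the efficiency gap this proposal aims to close.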
API Usage
The runtime uses an internal API, Interlocked.ReadMemoryBarrier(), in two places to batch multiple reads on both CoreCLR and NativeAOT, and it is supported on all platforms. This ability is also useful to third-party developers (such as me, in my example below), but is currently not possible to write efficiently.
An example where non-atomic volatile operations would be useful is as follows. Consider a game which wants to save its state, ideally while continuing to run; these are the most obvious options:
- Save on the main thread (can cause lag spikes if lots of data to save)
- Create a copy of the data and then save on a separate thread (less time spent on main thread, but potentially still non-trivial, and a large number of allocations most likely)
- Use locks around access to the memory (massive overhead)
- Just save it on another thread without any sync (saves an inconsistent state)
But there is actually another option which utilises non-atomic volatile semantics:
- Has low overhead, especially on x86 and x64
- Saves a consistent state
```csharp
// The main thread sets IsSaving to true and increments SavingVersion before starting the saving thread, and sets IsSaving to false once it's definitely done (e.g., on the next frame)
// The saving thread performs a full memory barrier before starting (when required, since starting a brand new thread every time isn't ideal), to ensure that _value is up-to-date
// Memory synchronisation works because _value is always read before any saving information, and it's always written after the saving information
// If the version we read on the saving thread is not the current version, then our read from _value is correct, otherwise our read from _savingValue will be correct
// In the rare case that we loop to saving version == 0, we can manually write all _savingVersion values to 0, skip to version == 1, and go from there (excluded here for clarity)
static class SavingState
{
    public static bool IsSaving { get; set; }
    public static nuint SavingVersion { get; set; }
}

struct SaveableHolder<T>
{
    nuint _savingVersion;
    T _value;
    T _savingValue;

    // Called only from the main thread
    public T Value
    {
        get => _value;
        set
        {
            if (SavingState.IsSaving)
            {
                if (_savingVersion != SavingState.SavingVersion)
                {
                    _savingValue = _value;
                    // Ensure the saving value is written before the saving version, so that we read it in the correct order
                    Volatile.Write(ref _savingVersion, SavingState.SavingVersion);
                }
                // _value can only become torn or incorrect after we have written our saving value and version
                Volatile.WriteBarrier();
                _value = value; // write must occur after prior writes
            }
            else
            {
                _value = value;
            }
        }
    }

    // Called only from the saving thread while SavingState.IsSaving, with a higher SavingState.SavingVersion than last time
    public T SavingValue
    {
        get
        {
            var value = _value; // read must occur before subsequent reads
            Volatile.ReadBarrier();
            // _savingVersion must be read after _value is, so if it's visibly changed/changing then we will catch it here
            if (Volatile.Read(in _savingVersion) != SavingState.SavingVersion) return value;
            // The volatile read on _savingVersion ensures we get an up-to-date _savingValue, since it's written first
            return _savingValue;
        }
    }
}
```
Alternative Designs
- We could expose read/write APIs instead:
```csharp
namespace System.Threading;

public static class Volatile
{
    public static T ReadNonAtomic<T>(ref readonly T location) where T : allows ref struct
    {
        //ldarg.0
        //volatile.
        //ldobj !!T
    }

    public static void WriteNonAtomic<T>(ref T location, T value) where T : allows ref struct
    {
        //ldarg.0
        //ldarg.1
        //volatile.
        //stobj !!T
    }
}
```
We do have IL instructions, but they're currently broken and not exposed (see #91530). The proposal here was originally to expose APIs for volatile. ldobj and volatile. stobj, plus the unaligned variants (as seen above), and to fix the instructions (or implement these APIs without the instructions and have the instructions call them; not much of a difference really). It was changed based on feedback to expose barrier APIs, which can provide equivalent semantics but also allow additional scenarios. It is also clearer which memory operations can be reordered with the barrier APIs.
- We could expose APIs on Interlocked instead:
```csharp
public static class Interlocked
{
    // Existing API
    public static void MemoryBarrier();

    // New APIs
    public static void MemoryBarrierAcquire(); //volatile read semantics
    public static void MemoryBarrierRelease(); //volatile write semantics
}
```
- We could expose the APIs on Unsafe instead:
```csharp
namespace System.Runtime.CompilerServices;

public static class Unsafe
{
    public static T ReadVolatile<T>(ref readonly T location) where T : allows ref struct;
    public static void WriteVolatile<T>(ref T location, T value) where T : allows ref struct;
}
```
- We could add unaligned overloads:
```csharp
namespace System.Runtime.CompilerServices;

public static class Unsafe
{
    public static T ReadVolatileUnaligned<T>(ref readonly byte location) where T : allows ref struct;
    public static void WriteVolatileUnaligned<T>(ref byte location, T value) where T : allows ref struct;
}
```
- We could also expose APIs for the other operations which allow the volatile. prefix (initblk and cpblk); people may have use for these also:
```csharp
namespace System.Runtime.CompilerServices;

public static class Unsafe
{
    public static void CopyBlockVolatile(ref byte destination, ref readonly byte source, uint byteCount);
    public static void CopyBlockVolatileUnaligned(ref byte destination, ref readonly byte source, uint byteCount);
    public static void InitBlockVolatile(ref byte startAddress, byte value, uint byteCount);
    public static void InitBlockVolatileUnaligned(ref byte startAddress, byte value, uint byteCount);
}
```
- We could expose APIs similar to what C++ has: https://en.cppreference.com/w/cpp/atomic/memory_order
Open Questions
There is a question as to whether we should have Read-ReadWrite/ReadWrite-Write barriers or Read-Read/Write-Write barriers. I was initially in favour of the former (as it matches our current memory model), but now think the latter is probably better, since there are many scenarios (including in my example API usage, and the runtime's uses too) where the additional guarantees provided by the former are unnecessary, and thus may cause unnecessary overhead. We could also just provide both if we think they're both useful.
Risks
No more than other volatile/interlocked APIs really, other than potential misunderstanding of what they do.