Description
I'm not an expert, but I believe dmb st is not useful for either the acquire or the release semantics of volatile.
I believe the ARMv8 acq/rel variants of load/store instructions are exactly what we want for ARM64.
From the ARMv8 Architecture Reference Manual https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf
A read or a write RW1 is Barrier-ordered-before a read or a write RW2 from the same
Observer if and only if RW1 appears in program order before RW2 and any of the
following cases apply:
• RW1 appears in program order before a DMB FULL that appears in program order
before RW2.
• RW1 is a write W1 generated by a Store-Release instruction and RW2 is a read R2
generated by a Load-Acquire instruction.
• RW1 is a read R1 and either:
— R1 appears in program order before a DMB LD that appears in program order
before RW2.
— R1 is generated by a Load-Acquire instruction.
• RW2 is a write W2 and either:
— RW1 is a write W1 appearing in program order before a DMB ST that appears in
program order before W2.
— W2 is generated by a Store-Release instruction.
— RW1 appears in program order before a write W3 generated by a Store-Release
instruction and W2 is Coherence-after W3.
If you read this carefully, you will notice that these sequences are functionally identical for our purposes.
- Load-Acquire; Load ~ Load; DMB LD; Load
- Load-Acquire; Store ~ Load; DMB LD; Store
- Load; Store-Release ~ Load; DMB FULL; Store
- Store; Store-Release ~ Store; DMB FULL; Store
There is one exception, but I am asserting it is not important for our purposes
- Store-Release; Load-Acquire; (ordered) !~ DMB FULL; Store; Load; DMB LD (unordered)
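The pairings above, and the one exception, can be checked mechanically against the quoted rules. The following is a minimal sketch, not anything taken from the JIT: the `Op` enum and the `barrierOrderedBefore` helper are hypothetical names introduced for illustration. It deliberately leaves out the last rule (the one involving a write W3 and Coherence-after), since that depends on coherence order, which a single instruction sequence cannot express.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the quoted "Barrier-ordered-before" rules.
// The Coherence-after sub-case (W3) is intentionally not modeled.
enum class Op { Load, Store, LoadAcquire, StoreRelease, DmbFull, DmbLd, DmbSt };

inline bool isRead(Op op)  { return op == Op::Load || op == Op::LoadAcquire; }
inline bool isWrite(Op op) { return op == Op::Store || op == Op::StoreRelease; }

// True if `barrier` appears strictly between positions i and j.
inline bool hasBetween(const std::vector<Op>& seq, std::size_t i, std::size_t j, Op barrier) {
    for (std::size_t k = i + 1; k < j; ++k) {
        if (seq[k] == barrier) return true;
    }
    return false;
}

// `seq` is in program order; i < j must both index memory accesses.
inline bool barrierOrderedBefore(const std::vector<Op>& seq, std::size_t i, std::size_t j) {
    const Op rw1 = seq[i];
    const Op rw2 = seq[j];

    // RW1; DMB FULL; RW2
    if (hasBetween(seq, i, j, Op::DmbFull)) return true;

    // Store-Release followed by Load-Acquire
    if (rw1 == Op::StoreRelease && rw2 == Op::LoadAcquire) return true;

    // RW1 is a read: either it is a Load-Acquire, or a DMB LD follows it
    if (isRead(rw1) && (rw1 == Op::LoadAcquire || hasBetween(seq, i, j, Op::DmbLd))) return true;

    // RW2 is a write: either it is a Store-Release, or W1; DMB ST; W2
    if (isWrite(rw2)) {
        if (rw2 == Op::StoreRelease) return true;
        if (isWrite(rw1) && hasBetween(seq, i, j, Op::DmbSt)) return true;
    }
    return false;
}
```

For example, `barrierOrderedBefore({Op::LoadAcquire, Op::Store}, 0, 1)` and `barrierOrderedBefore({Op::Load, Op::DmbLd, Op::Store}, 0, 2)` both hold, while the store and load in `{Op::DmbFull, Op::Store, Op::Load, Op::DmbLd}` (positions 1 and 2) are not ordered with respect to each other under this model.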
Therefore
- Load-Acquire; ~ Load; DMB LD;
- Store-Release ~ DMB FULL; Store
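For readers more used to C++ than to ARM64 assembly, the same two equivalences can be written with `std::atomic`. This is only an analogy, under the assumption that the compiler lowers these operations the way GCC and Clang generally do for AArch64 (acquire load to `ldar`, release store to `stlr`, acquire fence to a load barrier, release fence to a full barrier). It is not how the JIT expresses volatile accesses, and the function names are made up for illustration.

```cpp
#include <atomic>

// Load-Acquire form: a single acquiring load.
inline int loadAcquireForm(const std::atomic<int>& x) {
    return x.load(std::memory_order_acquire);
}

// "Load; DMB LD" form: a plain (relaxed) load followed by an acquire fence.
inline int loadThenBarrierForm(const std::atomic<int>& x) {
    int v = x.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return v;
}

// Store-Release form: a single releasing store.
inline void storeReleaseForm(std::atomic<int>& x, int v) {
    x.store(v, std::memory_order_release);
}

// "DMB FULL; Store" form: a release fence followed by a plain (relaxed) store.
inline void barrierThenStoreForm(std::atomic<int>& x, int v) {
    std::atomic_thread_fence(std::memory_order_release);
    x.store(v, std::memory_order_relaxed);
}
```

Either member of each pair gives the acquire or release ordering that volatile needs. The choice between them comes down to which instruction forms are available for the access in question.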
However, the Load-Acquire and Store-Release instructions are less flexible:
- Only the most basic addressing form is supported, i.e. `ldar xt, [xn]` or `stlr xt, [xn]`
- Must use aligned accesses
- No support for loading into floating point registers
- No sign extended forms
So I am proposing
1. Replace `dmb {sy}` with `dmb ld` when appropriate. This would be done by adding a parameter to `instGen_MemoryBarrier()` which defaulted to full.
2. Use `ldar*`/`stlr*` forms only when they are drop-in replacements for the `ldr*`/`str*` forms:
   - Not contained (address in a register)
   - Not loading into floating point registers
   - Not sign extending
   - Aligned:
     - 2.1 `ldarb`, `stlrb` byte size forms
     - 2.2 Not `GTF_IND_UNALIGNED` (if we believe it guarantees aligned access)
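The shape of both proposals can be sketched as follows. None of this is the JIT's actual code: `BarrierKind`, `memoryBarrierMnemonic`, `VolatileAccess` and `canUseAcquireRelease` are hypothetical names, and `markedUnaligned` merely stands in for the `GTF_IND_UNALIGNED` flag. The sketch assumes, as 2.2 does, that an access not marked unaligned really is aligned.

```cpp
#include <string>

// Proposal 1 (sketch): a barrier-kind parameter that defaults to a full barrier,
// so existing callers keep their current behavior.
enum class BarrierKind { Full, Load };

inline std::string memoryBarrierMnemonic(BarrierKind kind = BarrierKind::Full) {
    return kind == BarrierKind::Load ? "dmb ld" : "dmb sy";
}

// Proposal 2 (sketch): a hypothetical description of one volatile access.
struct VolatileAccess {
    bool addressInRegister;     // not contained: the address is already in a register
    bool targetsFloatRegister;  // value is loaded into a floating point register
    bool signExtends;           // load would need a sign-extending form
    unsigned sizeInBytes;       // access width
    bool markedUnaligned;       // stand-in for GTF_IND_UNALIGNED
};

// True when an acquire/release form would be a drop-in replacement.
inline bool canUseAcquireRelease(const VolatileAccess& a) {
    // 2.1: byte accesses are always aligned; 2.2: otherwise trust the flag.
    const bool aligned = (a.sizeInBytes == 1) || !a.markedUnaligned;
    return a.addressInRegister && !a.targetsFloatRegister && !a.signExtends && aligned;
}
```

Any access for which the predicate is false would keep the existing `ldr*`/`str*` plus barrier sequence, using the cheaper load barrier where only acquire ordering is required.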
Plan
I had been working on using the load-acquire/store-release forms more extensively. This proposal represents my abandonment of that brute-force attempt.
I will implement 1, 2.1 and then 2.2 if it works.
category:correctness
theme:barriers
skill-level:intermediate
cost:medium