
[Arm64] Use Half barriers and Load-Acquire Store Release to Implement IL Volatile Prefix #8072

@sdmaclea


@briansull @RussKeldorph

I'm not an expert, but I believe dmb st is not useful for either the acquire or the release semantics of volatile.
I believe the ARMv8 acq/rel variants of load/store instructions are exactly what we want for ARM64.

From the ARMv8 Architecture Reference Manual (https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf):

A read or a write RW1 is Barrier-ordered-before a read or a write RW2 from the same 
Observer if and only if RW1 appears in program order before RW2 and any of the 
following cases apply: 

    • RW1 appears in program order before a DMB FULL that appears in program order 
            before RW2. 
    • RW1 is a write W1 generated by a Store-Release instruction and RW2 is a read R2 
            generated  by a Load-Acquire instruction. 
    • RW1 is a read R1 and either: 
        — R1 appears in program order before a DMB LD that appears in program order 
                before RW2. 
        — R1 is generated by a Load-Acquire instruction. 
    • RW2 is a write W2 and either: 
        — RW1 is a write W1 appearing in program order before a DMB ST that appears in 
                program order before W2. 
        — W2 is generated by a Store-Release instruction. 
        — RW1 appears in program order before a write W3 generated by a Store-Release
                instruction and W2 is Coherence-after W3.

If you read this carefully, you will notice that the following pairs of sequences are functionally identical for our purposes:

  • Load-Acquire; Load ~ Load; DMB LD; Load
  • Load-Acquire; Store ~ Load; DMB LD; Store
  • Load; Store-Release ~ Load; DMB FULL; Store
  • Store; Store-Release ~ Store; DMB FULL; Store

There is one exception, but I assert that it is not important for our purposes:

  • Ordered Store-Release; Load-Acquire; !~ Unordered DMB FULL; Store; Load; DMB LD

Therefore

  • Load-Acquire; ~ Load; DMB LD;
  • Store-Release ~ DMB FULL; Store
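
The two equivalences above can be illustrated with C++ atomics. This is only a sketch of the same idea at the source level, not JIT code: the function names are hypothetical, and the instruction mappings in the comments describe what compilers typically emit for ARM64 rather than a guarantee.

```cpp
#include <atomic>
#include <cassert>

// Load-Acquire form: typically lowered to a single ldar on ARM64.
int load_acquire(const std::atomic<int>& v)
{
    return v.load(std::memory_order_acquire);
}

// "Load; DMB LD" form: a plain load followed by an acquire fence
// (typically a load barrier on ARM64).
int load_then_fence(const std::atomic<int>& v)
{
    int value = v.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return value;
}

// Store-Release form: typically lowered to a single stlr on ARM64.
void store_release(std::atomic<int>& v, int x)
{
    v.store(x, std::memory_order_release);
}

// "DMB FULL; Store" form: a release fence followed by a plain store.
// The fence must order earlier loads as well as earlier stores, which is
// why a store-only barrier (dmb st) would not be sufficient here.
void fence_then_store(std::atomic<int>& v, int x)
{
    std::atomic_thread_fence(std::memory_order_release);
    v.store(x, std::memory_order_relaxed);
}

// Single-threaded sanity check: both forms move the same values.
// It does not (and cannot) demonstrate the ordering guarantees themselves.
void demo()
{
    std::atomic<int> flag{0};
    store_release(flag, 1);
    assert(load_acquire(flag) == 1);
    fence_then_store(flag, 2);
    assert(load_then_fence(flag) == 2);
}
```

Each pair gives the ordering a volatile access needs. The single-instruction forms simply avoid a separate barrier instruction.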

However, the Load-Acquire and Store-Release instructions are less flexible:

  • Only the most basic addressing form is supported i.e. ldar xt, [xn] or stlr xt, [xn]
  • Must use aligned accesses
  • No support for loading into floating point registers
  • No sign extended forms

So I am proposing

  1. Replace dmb {sy} with dmb ld when appropriate. This would be done by adding a parameter to instGen_MemoryBarrier() that defaults to full.
  2. Use ldar*/stlr* forms only when they are drop-in replacements for ldr*/str*:
    • Not contained (address in a register)
    • Not loading into floating point registers
    • Not sign extending
    • Aligned:
      2.1 ldarb, stlrb byte-size forms
      2.2 Not GTF_IND_UNALIGNED (if we believe it guarantees aligned access)
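
The selection rules in the proposal could be sketched roughly as follows. Every type and function name here (BarrierKind, VolatileAccess, Lowering, canUseAcquireRelease, chooseLowering, barrierMnemonic) is hypothetical and exists only for illustration; none of them are the JIT's actual data structures, and the real instGen_MemoryBarrier() signature is not shown in this issue.

```cpp
#include <cassert>
#include <string>

// Hypothetical barrier selector, mirroring item 1: a parameter that defaults to full.
enum class BarrierKind { Full, Load };

// Hypothetical description of one volatile memory access.
struct VolatileAccess
{
    bool     isLoad;            // load (true) or store (false)
    bool     addressInRegister; // not a contained addressing mode
    bool     targetsFloatReg;   // value lives in a floating point register
    bool     signExtends;       // load needs sign extension
    bool     mayBeUnaligned;    // e.g. flagged GTF_IND_UNALIGNED
    unsigned sizeBytes;         // 1, 2, 4 or 8
};

enum class Lowering { AcquireReleaseInstr, LoadThenLoadBarrier, FullBarrierThenStore };

// Item 2: ldar*/stlr* only when they are drop-in replacements for ldr*/str*.
bool canUseAcquireRelease(const VolatileAccess& a)
{
    assert(a.sizeBytes == 1 || a.sizeBytes == 2 || a.sizeBytes == 4 || a.sizeBytes == 8);
    // Byte accesses are always aligned (2.1); wider ones rely on the flag (2.2).
    bool aligned = (a.sizeBytes == 1) || !a.mayBeUnaligned;
    return a.addressInRegister && !a.targetsFloatReg && !a.signExtends && aligned;
}

// Fall back to the barrier sequences when the single instruction does not fit.
Lowering chooseLowering(const VolatileAccess& a)
{
    if (canUseAcquireRelease(a))
        return Lowering::AcquireReleaseInstr;
    return a.isLoad ? Lowering::LoadThenLoadBarrier : Lowering::FullBarrierThenStore;
}

// Barrier text as written in the proposal: full by default, load-only on request.
std::string barrierMnemonic(BarrierKind kind = BarrierKind::Full)
{
    return kind == BarrierKind::Load ? "dmb ld" : "dmb sy";
}
```

Under these assumptions, a 4-byte aligned integer load from a register address would take the ldar path, while the same load into a floating point register would fall back to a load followed by dmb ld.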

Plan

I had been working on using the load-acquire/store-release forms more extensively. This proposal represents my abandonment of that brute-force attempt.

I will implement 1, 2.1 and then 2.2 if it works.
category:correctness
theme:barriers
skill-level:intermediate
cost:medium

Labels: arch-arm64, area-CodeGen-coreclr, enhancement, optimization, tenet-performance
