JIT HWIntrinsics: Bad codegen with contained loads #12365

@saucecontrol

Using current master (including dotnet/coreclr#23511) with the following test code:

using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

class Program
{
    struct vec
    {
        public float f1;
        public float f2;
        public float f3;
        public float f4;
    }

    static unsafe float fmaTest()
    {
        vec a;
        var b = Vector128.Create(1f);
        var c = Vector128.Create(2f);
        var d = Vector128.Create(3f);

        c = Fma.MultiplyAdd(Sse.LoadVector128((float*)&a), b, c);

        return Sse.Add(c, d).ToScalar();
    }

    static void Main(string[] args)
    {
        Console.WriteLine(fmaTest());
    }
}

fmaTest compiles to:

; Assembly listing for method Program:fmaTest():float
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 loc0         [V00    ] (  1,  1   )  struct (16) [rsp+0x08]   do-not-enreg[XS] must-init addr-exposed ld-addr-op
;  V01 loc1         [V01,T01] (  2,  2   )  simd16  ->  mm0
;  V02 loc2         [V02,T00] (  4,  4   )  simd16  ->  mm1
;* V03 loc3         [V03    ] (  0,  0   )  simd16  ->  zero-ref
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;  V05 tmp1         [V05,T02] (  2,  2   )  simd16  ->  mm0         "Inline return value spill temp"
;  V06 tmp2         [V06,T03] (  2,  2   )  simd16  ->  mm0         "Inline stloc first use temp"
;  V07 tmp3         [V07,T04] (  2,  2   )  simd16  ->  mm1         "Inline return value spill temp"
;  V08 tmp4         [V08,T05] (  2,  2   )  simd16  ->  mm1         "Inline stloc first use temp"
;  V09 tmp5         [V09,T06] (  2,  2   )  simd16  ->  mm2         "Inline return value spill temp"
;  V10 tmp6         [V10,T07] (  2,  2   )  simd16  ->  mm2         "Inline stloc first use temp"
;
; Lcl frame size = 24

G_M51181_IG01:
       sub      rsp, 24
       vzeroupper
       xor      rax, rax
       mov      qword ptr [rsp+08H], rax
       mov      qword ptr [rsp+10H], rax

G_M51181_IG02:
       vmovss   xmm0, dword ptr [reloc @RWD00]
       vbroadcastss xmm0, xmm0
       vmovss   xmm1, dword ptr [reloc @RWD04]
       vbroadcastss xmm1, xmm1
       vmovss   xmm2, dword ptr [reloc @RWD08]
       vbroadcastss xmm2, xmm2
       vfmadd231ps xmm1, xmm0, xmmword ptr [0000H]
       vaddps   xmm0, xmm1, xmm2

G_M51181_IG03:
       add      rsp, 24
       ret

RWD00  dd       3F800000h
RWD04  dd       40000000h
RWD08  dd       40400000h

; Total bytes of code 77, prolog size 19 for method Program:fmaTest():float
; ============================================================

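The contained address of vec a is coming out null: the vfmadd231ps reads from xmmword ptr [0000H] instead of [rsp+08H], where V00 (loc0) is allocated.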
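For reference, the value the repro should print can be worked out with a scalar model of lane 0. This is only a sketch, and it assumes that a is zero-initialized (the listing marks V00 as must-init and the prolog zeroes it). The names ExpectedValue and Lane0 are hypothetical helpers, not part of the repro.

```csharp
using System;

public static class ExpectedValue
{
    // Scalar model of lane 0 of fmaTest.
    // Fma.MultiplyAdd(a, b, c) computes (a * b) + c per lane,
    // and Sse.Add then adds d to that result.
    public static float Lane0(float a, float b, float c, float d)
    {
        float fma = MathF.FusedMultiplyAdd(a, b, c);
        return fma + d;
    }

    public static void Main()
    {
        // Assumption: a.f1 == 0 because the local is zero-initialized.
        // b, c and d are the constants passed to Vector128.Create.
        Console.WriteLine(Lane0(0f, 1f, 2f, 3f)); // (0 * 1) + 2 + 3 = 5
    }
}
```

With correct codegen, fmaTest should therefore return 5; with the null memory operand shown above, the load never touches a at all.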

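Also of interest is that the broadcasts are still not using containment: each Vector128.Create constant is loaded with vmovss and then broadcast from a register, rather than with a single vbroadcastss that takes the constant as a memory operand.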

cc @CarolEidt

Labels: area-CodeGen-coreclr
