Math.FusedMultiplyAdd suboptimal codegen #13129

@EgorBo

I was implementing the popular Lerp math function using FusedMultiplyAdd (see this post in NVIDIA's blog) and noticed that MathF.FusedMultiplyAdd always emits vfmadd213ss: negated operands are not folded into the instruction, so each negation costs a separate vxorps.

So my Lerp is:

static float Lerp(float v0, float v1, float t)
{
    return MathF.FusedMultiplyAdd(t, v1, MathF.FusedMultiplyAdd(-t, v0, v0));
}
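
For reference, the two nested calls compute t * v1 + (v0 - t * v0), which is algebraically (1 - t) * v0 + t * v1. A minimal sketch of a sanity check follows, assuming finite inputs; the class name `LerpSketch` and the helper `EndpointsExact` are hypothetical and only serve the illustration.

```csharp
using System;

static class LerpSketch
{
    // Same body as the Lerp above, split into two steps for readability.
    public static float Lerp(float v0, float v1, float t)
    {
        float inner = MathF.FusedMultiplyAdd(-t, v0, v0); // v0 - t * v0, rounded once
        return MathF.FusedMultiplyAdd(t, v1, inner);      // t * v1 + inner, rounded once
    }

    // Hypothetical helper: true when t = 0 returns v0 and t = 1 returns v1 exactly.
    // Assumes v0 and v1 are finite (no NaN or infinity).
    public static bool EndpointsExact(float v0, float v1) =>
        Lerp(v0, v1, 0f) == v0 && Lerp(v0, v1, 1f) == v1;
}
```

With t = 1 the inner call yields 0 and the outer call returns v1 unchanged; with t = 0 both calls pass v0 through. This is why the negated first operand matters here: it is the part a single vfnmadd could absorb.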

RyuJIT:

; Method Lerp(float,float,float):float

G_M3843_IG01:
       vzeroupper 

G_M3843_IG02:
       vmovaps  xmm3, xmm2
       vmovss   xmm4, dword ptr [reloc @RWD00]
       vxorps   xmm2, xmm4
       vmovaps  xmm4, xmm0
       vfmadd213ss xmm2, xmm4, xmm0
       vmovaps  xmm0, xmm3
       vfmadd213ss xmm0, xmm1, xmm2

G_M3843_IG03:
       ret      
RWD00  dd	80000000h
; Total bytes of code: 38

Clang/gcc:

       vfnmadd213ss xmm0, xmm2, xmm0 # xmm0 = -(xmm2 * xmm0) + xmm0
       vfmadd231ss xmm0, xmm2, xmm1 # xmm0 = (xmm2 * xmm1) + xmm0
       ret

So even a simple:

MathF.FusedMultiplyAdd(a, -b, -c);

generates redundant xors:

       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm1, xmm3
       vmovss   xmm3, dword ptr [reloc @RWD00]
       vxorps   xmm2, xmm3
       vfmadd213ss xmm0, xmm1, xmm2

while it could be just:

vfnmsub213ss

Depending on the signs of the operands, it can be one of:

vfnmadd213ss
vfnmsub213ss
vfmadd213ss
vfmsub213ss
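
The mapping between sign patterns and these four instructions can be sketched as below. This illustrates the proposed folding and is not a description of what the JIT currently emits; the class and method names are hypothetical. Each method is written the way user code expresses it today, and the comment names the single FMA3 instruction that computes the same value.

```csharp
using System;

static class FmaSignVariants
{
    //  (a * b) + c  ->  vfmadd213ss
    public static float MulAdd(float a, float b, float c) =>
        MathF.FusedMultiplyAdd(a, b, c);

    //  (a * b) - c  ->  vfmsub213ss (negated addend)
    public static float MulSub(float a, float b, float c) =>
        MathF.FusedMultiplyAdd(a, b, -c);

    // -(a * b) + c  ->  vfnmadd213ss (one negated multiplicand)
    public static float NegMulAdd(float a, float b, float c) =>
        MathF.FusedMultiplyAdd(-a, b, c);

    // -(a * b) - c  ->  vfnmsub213ss (one negated multiplicand and negated addend)
    public static float NegMulSub(float a, float b, float c) =>
        MathF.FusedMultiplyAdd(a, -b, -c);
}
```

Negating both multiplicands cancels out, so `FusedMultiplyAdd(-a, -b, c)` would fall back to the plain vfmadd form.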

PS: I think all C# game engines have a Lerp function, so it probably makes sense to add one to Math. Examples:
OpenTK
Unity3d
Xenko

category:cq
theme:intrinsics
skill-level:expert
cost:medium

Labels: area-CodeGen-coreclr, optimization
