Closed
dotnet/coreclr
#27060
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), optimization
Description
I was implementing the popular Lerp math function using FusedMultiplyAdd (see this post on Nvidia's blog) and noticed that MathF.FusedMultiplyAdd only ever generates vfmadd213ss: negated operands are not folded into the instruction, so each negation costs a separate xor.
So my Lerp is:
static float Lerp(float v0, float v1, float t)
{
    return MathF.FusedMultiplyAdd(t, v1, MathF.FusedMultiplyAdd(-t, v0, v0));
}
RyuJIT:
; Method Lerp(float,float,float):float
G_M3843_IG01:
vzeroupper
G_M3843_IG02:
vmovaps xmm3, xmm2
vmovss xmm4, dword ptr [reloc @RWD00]
vxorps xmm2, xmm4
vmovaps xmm4, xmm0
vfmadd213ss xmm2, xmm4, xmm0
vmovaps xmm0, xmm3
vfmadd213ss xmm0, xmm1, xmm2
G_M3843_IG03:
ret
RWD00 dd 80000000h
; Total bytes of code: 38
Clang/gcc:
vfnmadd213ss xmm0, xmm2, xmm0 # xmm0 = -(xmm2 * xmm0) + xmm0
vfmadd231ss xmm0, xmm2, xmm1 # xmm0 = (xmm2 * xmm1) + xmm0
ret
So even a simple:
MathF.FusedMultiplyAdd(a, -b, -c);
generates redundant xors:
vmovss xmm3, dword ptr [reloc @RWD00]
vxorps xmm1, xmm3
vmovss xmm3, dword ptr [reloc @RWD00]
vxorps xmm2, xmm3
vfmadd213ss xmm0, xmm1, xmm2
while it could be just:
vfnmsub213ss
Depending on the signs of the operands, it can be one of:
vfnmadd213ss
vfnmsub213ss
vfmadd213ss
vfmsub213ss
PS: I think all C# game engines have a Lerp function, so it probably makes sense to add one to Math, e.g.:
OpenTK
Unity3d
Xenko
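Until the JIT folds negations into MathF.FusedMultiplyAdd, the four sign combinations listed above can be requested explicitly through the hardware intrinsics in System.Runtime.Intrinsics.X86.Fma. The following is only a minimal sketch of that workaround, not what the JIT does or should do internally: the class name FmaSigns and the helpers NegMulAdd and NegMulSub are hypothetical names introduced here for illustration, and the exact instruction form (132/213/231) is left to the JIT.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper class (not part of the BCL), shown for illustration only.
public static class FmaSigns
{
    // Computes -(a * b) + c; expected to use the vfnmadd*ss family when FMA is available.
    public static float NegMulAdd(float a, float b, float c)
    {
        if (Fma.IsSupported)
        {
            return Fma.MultiplyAddNegatedScalar(
                Vector128.CreateScalarUnsafe(a),
                Vector128.CreateScalarUnsafe(b),
                Vector128.CreateScalarUnsafe(c)).ToScalar();
        }
        // Fallback: same value, but the negation is a separate operation.
        return MathF.FusedMultiplyAdd(-a, b, c);
    }

    // Computes -(a * b) - c; expected to use the vfnmsub*ss family when FMA is available.
    public static float NegMulSub(float a, float b, float c)
    {
        if (Fma.IsSupported)
        {
            return Fma.MultiplySubtractNegatedScalar(
                Vector128.CreateScalarUnsafe(a),
                Vector128.CreateScalarUnsafe(b),
                Vector128.CreateScalarUnsafe(c)).ToScalar();
        }
        return MathF.FusedMultiplyAdd(-a, b, -c);
    }

    // Lerp from the issue: t * v1 + (-(t * v0) + v0).
    public static float Lerp(float v0, float v1, float t)
    {
        return MathF.FusedMultiplyAdd(t, v1, NegMulAdd(t, v0, v0));
    }
}
```

For example, FmaSigns.Lerp(2f, 4f, 0.25f) evaluates -(0.25 * 2) + 2 = 1.5 first and then 0.25 * 4 + 1.5 = 2.5. The remaining two sign combinations, a * b + c and a * b - c, correspond to Fma.MultiplyAddScalar and Fma.MultiplySubtractScalar.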
category:cq
theme:intrinsics
skill-level:expert
cost:medium