Description
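Calling Vector128.Create(float, float, float, float) with constant arguments, two or more of which share the same value, results in poor codegen: the vector is assembled element by element instead of being loaded as a single constant, as demonstrated below.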
Sharplab
using System;
using System.Runtime.Intrinsics;

public class C {
    static Vector128<float> M1() {
        return Vector128.Create(1f, 2f, 4f, 4f);
    }

    static Vector128<float> M2() {
        return Vector128.Create(1f, 2f, 3f, 4f);
    }
}
Output assembly (CoreCLR 6.0.21.52210 on amd64)
C.M1()
    L0000: vzeroupper
    L0003: vmovss xmm0, [0x7ffb22bb0480]
    L000b: vmovss xmm1, [0x7ffb22bb0484]
    L0013: vinsertps xmm0, xmm0, xmm1, 0x10
    L0019: vmovss xmm1, [0x7ffb22bb0488]
    L0021: vmovaps xmm2, xmm1
    L0025: vinsertps xmm0, xmm0, xmm2, 0x20
    L002b: vinsertps xmm0, xmm0, xmm1, 0x30
    L0031: vmovupd [rcx], xmm0
    L0035: mov rax, rcx
    L0038: ret

C.M2()
    L0000: vzeroupper
    L0003: vmovupd xmm0, [0x7ffb22bb04c0]
    L000b: vmovupd [rcx], xmm0
    L000f: mov rax, rcx
    L0012: ret
The same problem seems to affect the other VectorXXX.Create() overloads as well, but only when the element type is float or double.
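To check that observation, something like the following could be pasted into SharpLab next to the repro above. It is only a sketch: the class and method names (MoreCases, F256, D256, I128) are made up for illustration, and the generated assembly has not been verified here. The expectation, based on the report above, is that the float and double methods show the element-by-element pattern while the int method loads a single constant.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical repro class; all names here are illustrative only.
public static class MoreCases {
    // float elements, 256-bit vector: the last two constants are equal.
    public static Vector256<float> F256() =>
        Vector256.Create(1f, 2f, 3f, 4f, 5f, 6f, 8f, 8f);

    // double elements, 256-bit vector: the last two constants are equal.
    public static Vector256<double> D256() =>
        Vector256.Create(1d, 2d, 4d, 4d);

    // int elements, 128-bit vector: same duplicate pattern,
    // included as the integer case for comparison.
    public static Vector128<int> I128() =>
        Vector128.Create(1, 2, 4, 4);
}
```

Comparing the three outputs side by side should make it easy to tell whether the duplicate-constant handling differs between floating-point and integer element types.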