Closed
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
enhancement: Product code improvement that does NOT require public API changes/additions
optimization
Description
As per the comment here: https://github.com/dotnet/corefx/pull/31779/files#r210758497
The SSE implementation of Matrix4x4.Transpose is doing:
var row1 = Sse.LoadVector128(&matrix.M11);
var row2 = Sse.LoadVector128(&matrix.M21);
var row3 = Sse.LoadVector128(&matrix.M31);
var row4 = Sse.LoadVector128(&matrix.M41);

Which leads to the following codegen:
mov rax, rdx
vmovups xmm0, xmmword ptr [rax]
lea rax, bword ptr [rdx+16]
mov r8, rax
vmovups xmm1, xmmword ptr [r8]
lea r8, bword ptr [rdx+32]
mov r9, r8
vmovups xmm2, xmmword ptr [r9]
lea r9, bword ptr [rdx+48]
mov r10, r9
vmovups xmm3, xmmword ptr [r10]

Ideally, we should be generating the following instead:
vmovups xmm0, xmmword ptr [rdx]
vmovups xmm1, xmmword ptr [rdx+16]
vmovups xmm2, xmmword ptr [rdx+32]
vmovups xmm3, xmmword ptr [rdx+48]
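
For anyone wanting to inspect this codegen locally, here is a minimal standalone sketch of the same load pattern. The class and method names (`LoadRowsRepro`, `SumRows`) are hypothetical and not taken from the corefx change; the four rows are summed only so that the loads are not eliminated as dead code. It assumes the project is compiled with unsafe blocks enabled, and the exact registers and instructions emitted will depend on the runtime version and platform.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class LoadRowsRepro
{
    // NoInlining keeps this method as its own unit so its disassembly is easy to find.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static unsafe Vector128<float> SumRows(Matrix4x4 matrix)
    {
        // Same pattern as in the issue: one load per row, each taking the
        // address of the first field of that row.
        var row1 = Sse.LoadVector128(&matrix.M11);
        var row2 = Sse.LoadVector128(&matrix.M21);
        var row3 = Sse.LoadVector128(&matrix.M31);
        var row4 = Sse.LoadVector128(&matrix.M41);

        // Combine the rows so that none of the loads is dead.
        return Sse.Add(Sse.Add(row1, row2), Sse.Add(row3, row4));
    }

    public static void Main()
    {
        if (!Sse.IsSupported)
        {
            Console.WriteLine("SSE is not supported on this machine.");
            return;
        }

        var m = new Matrix4x4(
             1,  2,  3,  4,
             5,  6,  7,  8,
             9, 10, 11, 12,
            13, 14, 15, 16);

        // Prints the element-wise sum of the four rows.
        Console.WriteLine(SumRows(m));
    }
}
```

On recent .NET versions, setting the `DOTNET_JitDisasm` environment variable to the method name (here `SumRows`) should print the generated assembly, which can then be compared against the two listings above.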