Codegen for LoadVector128 for a field of a struct is "poor" #10915

@tannergooding

As per the comment here: https://github.com/dotnet/corefx/pull/31779/files#r210758497

The SSE implementation of Matrix4x4.Transpose loads the four rows as follows:

```csharp
var row1 = Sse.LoadVector128(&matrix.M11);
var row2 = Sse.LoadVector128(&matrix.M21);
var row3 = Sse.LoadVector128(&matrix.M31);
var row4 = Sse.LoadVector128(&matrix.M41);
```

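This leads to the following codegen, in which each address is materialized in a separate register (with a redundant `mov`) rather than folded into the load: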

```asm
mov      rax, rdx
vmovups  xmm0, xmmword ptr [rax]
lea      rax, bword ptr [rdx+16]
mov      r8, rax
vmovups  xmm1, xmmword ptr [r8]
lea      r8, bword ptr [rdx+32]
mov      r9, r8
vmovups  xmm2, xmmword ptr [r9]
lea      r9, bword ptr [rdx+48]
mov      r10, r9
vmovups  xmm3, xmmword ptr [r10]
```

Ideally, we should be generating the following instead:

```asm
vmovups  xmm0, xmmword ptr [rdx]
vmovups  xmm1, xmmword ptr [rdx+16]
vmovups  xmm2, xmmword ptr [rdx+32]
vmovups  xmm3, xmmword ptr [rdx+48]
```
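
For anyone wanting to look at this outside of Matrix4x4.Transpose, here is a minimal standalone sketch of the same load pattern. The class name `LoadRowsRepro` and the helper `SumRows` are hypothetical and not taken from the corefx PR. The rows are summed only so that the four loads are not dead code. It assumes a runtime with `System.Runtime.Intrinsics` available and a project compiled with unsafe blocks enabled. The exact registers and instructions the JIT emits for it may differ from the listings above.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static unsafe class LoadRowsRepro
{
    // Hypothetical helper: takes the address of a struct field for each row,
    // exactly like the Transpose snippet, then adds the rows together.
    // NoInlining keeps the method body easy to find in a disassembly.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Vector128<float> SumRows(Matrix4x4 matrix)
    {
        var row1 = Sse.LoadVector128(&matrix.M11);
        var row2 = Sse.LoadVector128(&matrix.M21);
        var row3 = Sse.LoadVector128(&matrix.M31);
        var row4 = Sse.LoadVector128(&matrix.M41);

        return Sse.Add(Sse.Add(row1, row2), Sse.Add(row3, row4));
    }

    public static void Main()
    {
        if (!Sse.IsSupported)
        {
            Console.WriteLine("SSE is not supported on this machine.");
            return;
        }

        var m = new Matrix4x4(
             1,  2,  3,  4,
             5,  6,  7,  8,
             9, 10, 11, 12,
            13, 14, 15, 16);

        // Each lane holds the sum of one column of the matrix.
        Console.WriteLine(SumRows(m));
    }
}
```

Inspecting the JIT output for `SumRows` should show whether each field offset is folded into the load's addressing mode or computed into a separate register first.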

Labels: area-CodeGen-coreclr, enhancement, optimization
