-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Closed
Labels
area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMICLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMIenhancementProduct code improvement that does NOT require public API changes/additionsProduct code improvement that does NOT require public API changes/additionsoptimizationtenet-performancePerformance related issuePerformance related issue
Milestone
Description
In the code posted below RyuJIT loads double values in the loop from memory each iteration. This is not necessary and slow. They should be in registers. This appears to be either a bug or a missing optimization. I request that this issue be investigated and fixed if possible.
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
public static class CSharpTest
{
private const int ITERATIONS = 1000 * 1000 * 100;
public static void Main()
{
Test1();
}
private static void Test1()
{
Point a = new Point(1, 1);
Point b = new Point(1, 1);
for (int i = 0; i < ITERATIONS; i++)
a = AddByVal(a, b);
GC.KeepAlive(a.X + b.X + a.Y + b.Y); //consume result
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static Point AddByVal(Point a, Point b)
{
return new Point(a.X + b.Y, a.Y + b.X);
}
}
public struct Point
{
public Point(double x, double y) { X = x; Y = y; }
public readonly double X;
public readonly double Y;
}
Point a = new Point(1, 1);
00000000 sub rsp,48h
00000004 movaps xmmword ptr [rsp+30h],xmm6
00000009 movaps xmmword ptr [rsp+20h],xmm7
0000000e movsd xmm6,mmword ptr [0000000000000098h]
00000016 movsd xmm7,mmword ptr [00000000000000A0h]
for (int i = 0; i < ITERATIONS; i++)
0000001e xor ecx,ecx
a = AddByVal(a, b);
00000020 movaps xmm0,xmm6
00000023 addsd xmm0,mmword ptr [00000000000000A8h] //This should not exist
0000002b movsd xmm6,xmm0
0000002f movaps xmm0,xmm7
00000032 addsd xmm0,mmword ptr [00000000000000B0h] //This should not exist
0000003a movsd xmm7,xmm0
0000003e movsd xmm0,xmm6
00000042 movsd xmm6,xmm0
for (int i = 0; i < ITERATIONS; i++)
00000046 inc ecx
00000048 cmp ecx,5F5E100h
0000004e jl 0000000000000020
GC.KeepAlive(a.X + b.X + a.Y + b.Y); //consume result
00000050 mov rcx,7FFCC2205688h
0000005a call 000000005F65E850
0000005f mov rcx,rax
00000062 movaps xmm0,xmm6
00000065 addsd xmm0,mmword ptr [00000000000000B8h]
0000006d addsd xmm0,xmm7
00000071 addsd xmm0,mmword ptr [00000000000000C0h]
00000079 movsd mmword ptr [rcx+8],xmm0
0000007e call 000000005E63F020
00000083 nop
}
00000084 movaps xmm6,xmmword ptr [rsp+30h]
00000089 movaps xmm7,xmmword ptr [rsp+20h]
0000008e add rsp,48h
00000092 ret
Also, I think the loop should be unrolled at least 2 times. 2x unrolling seems like a conservative behavior that can be applied liberally. Especially in tight floating point loops.
Metadata
Metadata
Assignees
Labels
area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMICLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMIenhancementProduct code improvement that does NOT require public API changes/additionsProduct code improvement that does NOT require public API changes/additionsoptimizationtenet-performancePerformance related issuePerformance related issue