Skip to content

RyuJIT optimization problem with double constants in a loop #4440

@GSPP

Description

@GSPP

In the code posted below RyuJIT loads double values in the loop from memory each iteration. This is not necessary and slow. They should be in registers. This appears to be either a bug or a missing optimization. I request that this issue be investigated and fixed if possible.

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class CSharpTest
{
    private const int ITERATIONS = 1000 * 1000 * 100;

    public static void Main()
    {
        Test1();
    }

    private static void Test1()
    {
        Point a = new Point(1, 1);
        Point b = new Point(1, 1);

        for (int i = 0; i < ITERATIONS; i++)
            a = AddByVal(a, b);

        GC.KeepAlive(a.X + b.X + a.Y + b.Y); //consume result
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static Point AddByVal(Point a, Point b)
    {
        return new Point(a.X + b.Y, a.Y + b.X);
    }
}

public struct Point
{
    public Point(double x, double y) { X = x; Y = y; }
    public readonly double X;
    public readonly double Y;
}
        Point a = new Point(1, 1);
00000000  sub         rsp,48h 
00000004  movaps      xmmword ptr [rsp+30h],xmm6 
00000009  movaps      xmmword ptr [rsp+20h],xmm7 
0000000e  movsd       xmm6,mmword ptr [0000000000000098h] 
00000016  movsd       xmm7,mmword ptr [00000000000000A0h] 

        for (int i = 0; i < ITERATIONS; i++)
0000001e  xor         ecx,ecx 
            a = AddByVal(a, b);
00000020  movaps      xmm0,xmm6 
00000023  addsd       xmm0,mmword ptr [00000000000000A8h] //This should not exist
0000002b  movsd       xmm6,xmm0 
0000002f  movaps      xmm0,xmm7 
00000032  addsd       xmm0,mmword ptr [00000000000000B0h]  //This should not exist
0000003a  movsd       xmm7,xmm0 
0000003e  movsd       xmm0,xmm6 
00000042  movsd       xmm6,xmm0 
        for (int i = 0; i < ITERATIONS; i++)
00000046  inc         ecx 
00000048  cmp         ecx,5F5E100h 
0000004e  jl          0000000000000020 

        GC.KeepAlive(a.X + b.X + a.Y + b.Y); //consume result
00000050  mov         rcx,7FFCC2205688h 
0000005a  call        000000005F65E850 
0000005f  mov         rcx,rax 
00000062  movaps      xmm0,xmm6 
00000065  addsd       xmm0,mmword ptr [00000000000000B8h] 
0000006d  addsd       xmm0,xmm7 
00000071  addsd       xmm0,mmword ptr [00000000000000C0h] 
00000079  movsd       mmword ptr [rcx+8],xmm0 
0000007e  call        000000005E63F020 
00000083  nop 
    }
00000084  movaps      xmm6,xmmword ptr [rsp+30h] 
00000089  movaps      xmm7,xmmword ptr [rsp+20h] 
0000008e  add         rsp,48h 
00000092  ret 

Also, I think the loop should be unrolled at least 2 times. 2x unrolling seems like a conservative behavior that can be applied liberally. Especially in tight floating point loops.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMIenhancementProduct code improvement that does NOT require public API changes/additionsoptimizationtenet-performancePerformance related issue

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions