JIT doesn't always fold 'add' instructions into 'lea' which immediately follows it #51599

@GrabYourPitchforks

Repro code:

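using System;
using System.Runtime.CompilerServices;

// Namespace name taken from the method header in the disassembly below.
namespace ConsoleApp62
{
    class Program
    {
        static int ReadInt32Field(object @this, nuint offset)
        {
            // Reference to the first instance field of the object.
            ref byte b = ref Unsafe.As<RawObjData>(@this).Data;
            // Step back by one pointer size, then forward by the caller's offset.
            b = ref Unsafe.Subtract(ref b, IntPtr.Size);
            b = ref Unsafe.AddByteOffset(ref b, (nint)offset);
            return Unsafe.ReadUnaligned<int>(ref b);
        }
    }

    sealed class RawObjData
    {
        public byte Data;
    }
}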

Codegen:

; Method ConsoleApp62.Program:ReadInt32Field(System.Object,long):int
G_M62650_IG01:
						;; bbWeight=1    PerfScore 0.00

G_M62650_IG02:
       cmp      dword ptr [rcx], ecx
       add      rcx, 8
       mov      eax, dword ptr [rcx+rdx-8]
						;; bbWeight=1    PerfScore 4.25

G_M62650_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00
; Total bytes of code: 11

In this codegen, the add and mov instructions can be collapsed into a single mov eax, dword ptr [rcx + rdx] instruction, because the +8 and the -8 cancel each other out.
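
To make the cancellation concrete, here is a minimal sketch of the same address arithmetic at the source level. It is an illustration only: the names FoldDemo, ReadTwoStep and ReadFolded are hypothetical, and it uses a byte[] instead of the raw object layout from the repro. It shows only that the two-step and folded forms read the same location. It makes no claim about what the JIT emits for either method.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper type, for illustration only.
static class FoldDemo
{
    // Same shape as the repro: start at a fixed displacement from the base,
    // back that displacement out, then add a variable offset.
    // Assumes buffer.Length > IntPtr.Size and offset + 4 <= buffer.Length.
    public static int ReadTwoStep(byte[] buffer, nuint offset)
    {
        ref byte b = ref buffer[IntPtr.Size];              // base + pointer size
        b = ref Unsafe.Subtract(ref b, IntPtr.Size);       // back to base
        b = ref Unsafe.AddByteOffset(ref b, (nint)offset); // base + offset
        return Unsafe.ReadUnaligned<int>(ref b);
    }

    // The folded form: the +size and -size cancel, leaving base + offset.
    // Assumes buffer is non-empty and offset + 4 <= buffer.Length.
    public static int ReadFolded(byte[] buffer, nuint offset)
    {
        ref byte b = ref Unsafe.AddByteOffset(ref buffer[0], (nint)offset);
        return Unsafe.ReadUnaligned<int>(ref b);
    }
}
```

For any buffer and offset that satisfy the stated assumptions, ReadTwoStep and ReadFolded return the same value. That equivalence is what would let the add be absorbed into the addressing mode of the load.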

We see this sometimes with very low-level object manipulation in corelib. For example, the codegen listed at the top of #51548 (working on an Array.Clear optimization) has this pattern. We even have a specialized intrinsic as a temporary workaround in extremely hot code paths. (This intrinsic is used in Memory<T>.get_Span, for instance.)

It'd be great if the JIT could collapse these sequences wherever they occur. That would ideally result in better codegen and would allow us to remove this intrinsic from the code base.

Labels

Priority:1 (Work that is critical for the release, but we could probably ship without)
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
tenet-performance (Performance related issue)