
Jit double zeros for async init method returning ValueTask<T> #38070

@benaadams


Follow up to #2325

An optimization was added in #36918 ("Optimization to remove redundant zero initializations"); however, it doesn't kick in for the initiation methods of async methods returning a large ValueTask<T>, which still double zero.

Given:

```csharp
using System.Buffers;
using System.Threading.Tasks;

public class DoubleZero
{
    public async ValueTask<ReadResult> SingleMethod()
    {
        return default;
    }

    public readonly struct ReadResult
    {
        internal readonly ReadOnlySequence<byte> _resultBuffer;
        internal readonly ResultFlags _resultFlags;
    }

    public enum ResultFlags : byte { }
}
```

It produces the following asm, which double zeros:

```asm
; Assembly listing for method DoubleZero:SingleMethod():System.Threading.Tasks.ValueTask`1[ReadResult]:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd
;  V01 RetBuf       [V01,T00] (  4,  4   )   byref  ->  rsi        
;  V02 loc0         [V02    ] (  4,  4   )  struct (48) [rsp+0x20]   do-not-enreg[XSFB] must-init addr-exposed ld-addr-op
;  V03 OutArgs      [V03    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;* V04 tmp1         [V04    ] (  0,  0   )  struct (40) zero-ref    do-not-enreg[SB] ld-addr-op "Inline ldloca(s) first use temp"

G_M9604_IG01:
       push     rsi
       sub      rsp, 80
       vzeroupper
       vxorps   xmm4, xmm4                  ; zeros
       vmovdqa  xmmword ptr [rsp+20H], xmm4 ;   |
       vmovdqa  xmmword ptr [rsp+30H], xmm4 ;   |
       vmovdqa  xmmword ptr [rsp+40H], xmm4 ;   v
       mov      rsi, rdx

G_M9604_IG02:
       xor      ecx, ecx                    ; re-zeros
       vxorps   xmm0, xmm0                  ;    |
       vmovdqu  xmmword ptr [rsp+28H], xmm0 ;    |
       vmovdqu  xmmword ptr [rsp+38H], xmm0 ;    |
       mov      qword ptr [rsp+48H], rcx    ;    V
       mov      dword ptr [rsp+20H], -1
       lea      rcx, [rsp+20H]
       call     System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start(byref)
```

Preferably it would be (skipping the second zeroing):

```asm
G_M9604_IG01:
       push     rsi
       sub      rsp, 80
       vzeroupper
       vxorps   xmm4, xmm4                  ; zeros
       vmovdqa  xmmword ptr [rsp+20H], xmm4 ;   |
       vmovdqa  xmmword ptr [rsp+30H], xmm4 ;   |
       vmovdqa  xmmword ptr [rsp+40H], xmm4 ;   v
       mov      rsi, rdx

G_M9604_IG02:
       mov      dword ptr [rsp+20H], -1
       lea      rcx, [rsp+20H]
       call     System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start(byref)
```

As an aside, I've added an optimization to Roslyn (dotnet/roslyn#45262) that moves the stores of parameters into the async state machine later in the method, after the second zeroing rather than before it, since storing them first may have blocked the optimization. However, as seen above, this issue occurs even when no parameters are passed.
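To map the two zeroings back to source, here is a hand-written approximation of the initiation (kickoff) method Roslyn generates for `SingleMethod`, with the state machine written out explicitly. This is a sketch, not the compiler's exact output: the type name `SingleMethodStateMachine` and the field names `State` and `Builder` are hypothetical stand-ins for the compiler's unspeakable names (such as `<>1__state` and `<>t__builder`). The reading it illustrates is an assumption based on the listing above: the state-machine local (`loc0`, 48 bytes) is zeroed once in the prolog because it is must-init, and the store of `AsyncValueTaskMethodBuilder<ReadResult>.Create()`, which appears to be inlined as a default-valued temp (`V04 tmp1`, 40 bytes), writes zeros over the builder field a second time before the state is set to -1 and `Start` is called.

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public class DoubleZeroLowered
{
    public readonly struct ReadResult
    {
        internal readonly ReadOnlySequence<byte> _resultBuffer;
        internal readonly ResultFlags _resultFlags;
    }

    public enum ResultFlags : byte { }

    // Hypothetical stand-in for the compiler-generated state machine struct.
    private struct SingleMethodStateMachine : IAsyncStateMachine
    {
        public int State;                                       // in the listing: [rsp+20H]
        public AsyncValueTaskMethodBuilder<ReadResult> Builder; // in the listing: the re-zeroed [rsp+28H]..[rsp+4FH]

        public void MoveNext()
        {
            ReadResult result;
            try
            {
                result = default; // body of SingleMethod: return default;
            }
            catch (Exception ex)
            {
                State = -2;
                Builder.SetException(ex);
                return;
            }
            State = -2;
            Builder.SetResult(result);
        }

        public void SetStateMachine(IAsyncStateMachine stateMachine) =>
            Builder.SetStateMachine(stateMachine);
    }

    // Approximate shape of the kickoff method for SingleMethod.
    public ValueTask<ReadResult> SingleMethod()
    {
        // First zeroing: the JIT zero-inits this must-init local in the prolog.
        SingleMethodStateMachine stateMachine;
        // Second zeroing: Create() yields a default builder that is stored over the field.
        stateMachine.Builder = AsyncValueTaskMethodBuilder<ReadResult>.Create();
        stateMachine.State = -1;                  // mov dword ptr [rsp+20H], -1
        stateMachine.Builder.Start(ref stateMachine);
        return stateMachine.Builder.Task;
    }
}
```

Since the prolog has already zeroed the whole 48-byte local, the only store in the kickoff path that changes memory is the `-1` written to the state field, which is why the preferred listing keeps just that store before the call to `Start`.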

/cc @erozenfeld

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue)