Closed
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
tenet-performance: Performance related issue
Description
Follow up to #2325
An optimization was added in #36918 ("Optimization to remove redundant zero initializations"); however, it doesn't kick in for the initiation methods of async methods returning a large ValueTask<T>, which still zero the same memory twice.
Given:
using System.Buffers;
using System.Threading.Tasks;

public class DoubleZero
{
    public async ValueTask<ReadResult> SingleMethod()
    {
        return default;
    }

    public readonly struct ReadResult
    {
        internal readonly ReadOnlySequence<byte> _resultBuffer;
        internal readonly ResultFlags _resultFlags;
    }

    public enum ResultFlags : byte { }
}

It produces the following asm, which zeros twice:
; Assembly listing for method DoubleZero:SingleMethod():System.Threading.Tasks.ValueTask`1[ReadResult]:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 this [V00 ] ( 0, 0 ) ref -> zero-ref this class-hnd
; V01 RetBuf [V01,T00] ( 4, 4 ) byref -> rsi
; V02 loc0 [V02 ] ( 4, 4 ) struct (48) [rsp+0x20] do-not-enreg[XSFB] must-init addr-exposed ld-addr-op
; V03 OutArgs [V03 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] "OutgoingArgSpace"
;* V04 tmp1 [V04 ] ( 0, 0 ) struct (40) zero-ref do-not-enreg[SB] ld-addr-op "Inline ldloca(s) first use temp"
G_M9604_IG01:
push rsi
sub rsp, 80
vzeroupper
vxorps xmm4, xmm4 ; zeros
vmovdqa xmmword ptr [rsp+20H], xmm4 ; |
vmovdqa xmmword ptr [rsp+30H], xmm4 ; |
vmovdqa xmmword ptr [rsp+40H], xmm4 ; v
mov rsi, rdx
G_M9604_IG02:
xor ecx, ecx ; re-zeros
vxorps xmm0, xmm0 ; |
vmovdqu xmmword ptr [rsp+28H], xmm0 ; |
vmovdqu xmmword ptr [rsp+38H], xmm0 ; |
mov qword ptr [rsp+48H], rcx ; V
mov dword ptr [rsp+20H], -1
lea rcx, [rsp+20H]
call System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start(byref)

Preferably it would be (skipping the second zeroing):
G_M9604_IG01:
push rsi
sub rsp, 80
vzeroupper
vxorps xmm4, xmm4 ; zeros
vmovdqa xmmword ptr [rsp+20H], xmm4 ; |
vmovdqa xmmword ptr [rsp+30H], xmm4 ; |
vmovdqa xmmword ptr [rsp+40H], xmm4 ; v
mov rsi, rdx
G_M9604_IG02:
mov dword ptr [rsp+20H], -1
lea rcx, [rsp+20H]
call System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start(byref)

As an aside, I've added an optimization to Roslyn (dotnet/roslyn#45262) that moves the stores of parameters into the async state machine to later in the method, after the second zeroing rather than before it, since the earlier placement may have blocked this optimization. However, the issue occurs even when no parameters are passed, as seen above.
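For context, here is a rough hand-written sketch of the initiation method the C# compiler generates for SingleMethod. The names SingleMethodStateMachine, State and Builder are hypothetical stand-ins for the compiler-generated identifiers, and the body of MoveNext is simplified. The second zeroing appears to come from storing the result of the inlined AsyncValueTaskMethodBuilder<ReadResult>.Create() call, which returns a default (all-zero) builder, into a local that the prolog has already zeroed; this would match the 40-byte "Inline ldloca(s) first use temp" (V04) in the listing.

```csharp
using System.Buffers;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public class DoubleZeroLowered
{
    public readonly struct ReadResult
    {
        internal readonly ReadOnlySequence<byte> _resultBuffer;
        internal readonly ResultFlags _resultFlags;
    }

    public enum ResultFlags : byte { }

    // Hypothetical stand-in for the compiler-generated state machine struct.
    private struct SingleMethodStateMachine : IAsyncStateMachine
    {
        public int State;
        public AsyncValueTaskMethodBuilder<ReadResult> Builder;

        public void MoveNext()
        {
            // Simplified: the real generated code also wraps this in try/catch.
            State = -2;
            Builder.SetResult(default);
        }

        public void SetStateMachine(IAsyncStateMachine stateMachine)
            => Builder.SetStateMachine(stateMachine);
    }

    // Approximation of the initiation method shown in the asm above.
    public ValueTask<ReadResult> SingleMethod()
    {
        // The local is zeroed in the prolog (must-init in the listing).
        SingleMethodStateMachine stateMachine;

        // Create() returns a default builder; once inlined, this store
        // writes zeros over memory the prolog already cleared.
        stateMachine.Builder = AsyncValueTaskMethodBuilder<ReadResult>.Create();

        // Corresponds to: mov dword ptr [rsp+20H], -1
        stateMachine.State = -1;

        // Corresponds to the call to AsyncMethodBuilderCore:Start(byref).
        stateMachine.Builder.Start(ref stateMachine);
        return stateMachine.Builder.Task;
    }
}
```

In this sketch the only non-zero store before Start is the State field, which is why the preferred codegen above keeps just the prolog zeroing and the write of -1.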
/cc @erozenfeld