This repository was archived by the owner on Jan 23, 2023. It is now read-only.

[WIP] Enable cast operand containment #12676

Closed
mikedn wants to merge 9 commits into dotnet:master from mikedn:long-cast

Conversation

@mikedn commented Jul 7, 2017

This improves code generation for (u)int -> (u)long casts without overflow when the cast operand is a GT_IND node. Currently this is a two-step process: first the value is loaded from memory (and widened to TYP_INT if it is a small int type), and then it is widened to TYP_LONG. The extra widening step can be avoided by containing the cast operand.

Fixes #12595
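The pattern in question can be sketched in C (illustrative only; the function names are made up and this is not JIT code):

```c
#include <stdint.h>

/* Sketch of the pattern this change targets. On x64 a plain 32-bit mov
 * already zero-extends the destination register, and movsxd can load and
 * sign-extend in one instruction, so once the indir is contained in the
 * cast no separate widening instruction is needed. */
uint64_t widen_load_u32(const uint32_t *p)
{
    return *p; /* ideally a single: mov eax, dword ptr [rcx] */
}

int64_t widen_load_i32(const int32_t *p)
{
    return *p; /* ideally a single: movsxd rax, dword ptr [rcx] */
}
```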

@mikedn commented Jul 8, 2017

jit-diff summary:

Total bytes of diff: -10079 (-0.05% of base)
    diff is an improvement.
Top file improvements by size (bytes):
       -2377 : System.Private.CoreLib.dasm (-0.06% of base)
       -1220 : Microsoft.CodeAnalysis.CSharp.dasm (-0.06% of base)
       -1052 : System.Private.Xml.dasm (-0.03% of base)
        -851 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.04% of base)
        -840 : System.Reflection.Metadata.dasm (-0.58% of base)
78 total files with size differences (78 improved, 0 regressed), 51 unchanged.
Top method regressions by size (bytes):
          60 ( 2.97% of base) : System.Private.Xml.dasm - ReflectionXmlSerializationWriter:WriteStructMethod(ref,ref,ref,ref,bool,bool):this
           3 ( 0.37% of base) : Microsoft.CodeAnalysis.CSharp.dasm - LocalRewriter:MakeArguments(ref,struct,ref,ref,bool,struct,byref,byref,bool,ubyte):struct:this
           2 ( 7.14% of base) : System.Private.CoreLib.dasm - UIntPtr:ToUInt32():int:this
           2 ( 7.41% of base) : System.Private.CoreLib.dasm - UIntPtr:op_Explicit(long):int
Top method improvements by size (bytes):
        -103 (-0.78% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this
        -101 (-0.96% of base) : System.Private.CoreLib.dasm - DefaultBinder:BindToMethod(int,ref,byref,ref,ref,ref,byref):ref:this
         -82 (-3.21% of base) : Microsoft.VisualBasic.dasm - OverloadResolution:MoreSpecificProcedure(ref,ref,ref,ref,int,byref,bool):ref
         -64 (-6.24% of base) : Microsoft.DotNet.ProjectModel.dasm - CommonCompilerOptions:Equals(ref):bool:this
         -63 (-2.89% of base) : Microsoft.CodeAnalysis.CSharp.dasm - LocalRewriter:MakeConversion(ref,ref,ref,ubyte,ref,bool,bool,bool,bool,ref,ref):ref:this
Top method regressions by size (percentage):
           2 ( 7.41% of base) : System.Private.CoreLib.dasm - UIntPtr:op_Explicit(long):int
           2 ( 7.14% of base) : System.Private.CoreLib.dasm - UIntPtr:ToUInt32():int:this
          60 ( 2.97% of base) : System.Private.Xml.dasm - ReflectionXmlSerializationWriter:WriteStructMethod(ref,ref,ref,ref,bool,bool):this
           3 ( 0.37% of base) : Microsoft.CodeAnalysis.CSharp.dasm - LocalRewriter:MakeArguments(ref,struct,ref,ref,bool,struct,byref,byref,bool,ubyte):struct:this
Top method improvements by size (percentage):
          -2 (-18.18% of base) : System.Net.HttpListener.dasm - HttpListenerTimeoutManager:get_MinSendBytesPerSecond():long:this
          -2 (-18.18% of base) : System.Net.NetworkInformation.dasm - SystemIcmpV4Statistics:get_MessagesSent():long:this
          -2 (-18.18% of base) : System.Net.NetworkInformation.dasm - SystemIcmpV4Statistics:get_MessagesReceived():long:this
          -2 (-18.18% of base) : System.Net.NetworkInformation.dasm - SystemIcmpV4Statistics:get_ErrorsSent():long:this
          -2 (-18.18% of base) : System.Net.NetworkInformation.dasm - SystemIcmpV4Statistics:get_ErrorsReceived():long:this
2220 total methods with size differences (2216 improved, 4 regressed), 110665 unchanged.

Sample diffs:

        movzx    r8, byte  ptr [rcx+17]
-       mov      r8d, r8d

        mov      edx, dword ptr [rsi+rcx+40]
-       mov      edx, edx

-       movsx    rcx, word  ptr [rsi+8]
-       movsxd   rax, ecx
+       movsx    rax, word  ptr [rsi+8]

-       mov      ecx, dword ptr [rsi]
-       movsxd   rcx, ecx
+       movsxd   rcx, dword ptr [rsi]

Code gen for the LUTMap function from the associated issue:

G_M55886_IG01:
       C5F877               vzeroupper
G_M55886_IG02:
       4963C1               movsxd   rax, r9d
       488D4401FC           lea      rax, [rcx+rax-4]
       483BC8               cmp      rcx, rax
       7750                 ja       SHORT G_M55886_IG04
G_M55886_IG03:
       440FB609             movzx    r9, byte  ptr [rcx]
       C4817A100488         vmovss   xmm0, dword ptr [r8+4*r9]
       C4E17A1102           vmovss   dword ptr [rdx], xmm0
       440FB64901           movzx    r9, byte  ptr [rcx+1]
       C4817A100488         vmovss   xmm0, dword ptr [r8+4*r9]
       C4E17A114204         vmovss   dword ptr [rdx+4], xmm0
       440FB64902           movzx    r9, byte  ptr [rcx+2]
       C4817A100488         vmovss   xmm0, dword ptr [r8+4*r9]
       C4E17A114208         vmovss   dword ptr [rdx+8], xmm0
       440FB64903           movzx    r9, byte  ptr [rcx+3]
       C4817A100488         vmovss   xmm0, dword ptr [r8+4*r9]
       C4E17A11420C         vmovss   dword ptr [rdx+12], xmm0
       4883C104             add      rcx, 4
       4883C210             add      rdx, 16
       483BC8               cmp      rcx, rax
       76B1                 jbe      SHORT G_M55886_IG03
G_M55886_IG04:
       C3                   ret

; Total bytes of code 96, prolog size 3 for method Program:Test(long,long,long,int)

@mikedn commented Jul 8, 2017

It's worth pointing out that casts involving lclvars have a similar issue:

static ulong Test(byte x) => x;

generates

0FB6C1               movzx    rax, cl ; eax actually but it doesn't really matter
8BC0                 mov      eax, eax

but the IR that causes this is different, so a separate fix will be needed:

N005 (  2,  2) [000001] ------------ t1 =    LCL_VAR   int    V00 arg0         u:2 rcx (last use) REG rcx $80
                                                 /--*  t1     int
N007 (  3,  4) [000005] ------------ t5 = *  CAST      int <- ubyte <- int REG rax $100
                                                 /--*  t5     int
N009 (  4,  6) [000002] ---------U-- t2 = *  CAST      long <- ulong <- uint REG rax $140
                                                 /--*  t2     long
N011 (  5,  7) [000003] ------------ *  RETURN    long   REG NA $180
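For reference, a C analog of the lclvar case above (hypothetical helper name, for illustration only):

```c
#include <stdint.h>

/* C analog of the C# example above: widening a byte parameter to ulong.
 * movzx already zero-extends the full 64-bit register, so the extra
 * "mov eax, eax" currently emitted afterwards is redundant. */
uint64_t widen_byte(uint8_t x)
{
    return x; /* ideally just: movzx eax, cl */
}
```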

@mikedn commented Jul 12, 2017

@CarolEidt Is it correct for a cast with overflow to store its result in the target register before checking for overflow?

I assume that it is, because in the case of an exception the result can only be observed if we're in a try block and the destination is a lclvar. But if we're in a try block the lclvar won't be enregistered, so the result of the cast won't be stored in the lclvar by the cast node itself but by a subsequent GT_STORE_LCL_VAR.
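The ordering being asked about, sketched as a hypothetical C helper (not the JIT's actual code): the narrowed result is produced first, and only afterwards is the overflow condition checked.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch of an overflow-checked long -> int cast with the
 * ordering described above: the result is written first ("to the target
 * register"), and the overflow check runs after (where the JIT would
 * raise an exception). Truncation of out-of-range values wraps on
 * typical two's-complement targets. */
int32_t checked_narrow(int64_t value, bool *overflowed)
{
    int32_t result = (int32_t)value;          /* result produced first */
    *overflowed = ((int64_t)result != value); /* check happens after   */
    return result;
}
```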

@CarolEidt

> Is it correct for a cast with overflow to store its result in the target register before checking for overflow?

I believe that it would only be incorrect, as you suggest, if the register is the home of the lclVar and is live outside of the try block. But, as you say, that won't be the case in a try. Even when we enable write-thru of EH variables, as is currently being worked on, the memory location will be the definitive copy. If/when we consider keeping these lclVars in registers across exception edges, this would be a concern - but I don't see that happening in the near term.

@mikedn force-pushed the long-cast branch 2 times, most recently from fb40135 to edbec97 (July 13, 2017)
@mikedn changed the title from "[WIP] Improve int->long cast codegen" to "[WIP] Enable cast operand containment" (Jul 25, 2017)
@mikedn force-pushed the long-cast branch 3 times, most recently from 8f3c0e4 to 9ee914e (August 20, 2017)
@mikedn force-pushed the long-cast branch 2 times, most recently from 4e6ede9 to 8f44ae6 (August 26, 2017)
@karelz (Member) commented Aug 30, 2017

What is the status of this PR? What are the next steps to move it forward?

@mikedn commented Aug 30, 2017

Waiting for #13501 to be sorted out

@karelz added the "blocked (Issue/PR is blocked on something - see comments)" label (Aug 30, 2017)
@CarolEidt

> Waiting for #13501 to be sorted out

Now that #13501 is addressed, we should look at this for 2.1

@CarolEidt self-requested a review (April 12, 2018)
@mikedn force-pushed the long-cast branch 3 times, most recently from b670617 to 0c5ac0c (August 17, 2018)
@mikedn force-pushed the long-cast branch 2 times, most recently from b302389 to 3636ab6 (September 12, 2018)
@sandreenko

@mikedn what about this one, is it ready for review?

@mikedn commented Sep 14, 2018

@sandreenko Ha ha, as much as I would love for this particular one to be ready it still isn't. The ARM version is busted. For one thing, ARM64's emitter unexpectedly emits ldrsw instead of ldr. And then there appears to be some other issue that leads to asserts during crossgen.

@mikedn commented Sep 14, 2018

The ARM story is pretty hairy:

  • ARM32 asserts when attempting to emit ldrb/ldrsb/ldrh/ldrsh with EA_4BYTE. IMO the assert is bogus, the instruction size is really 4 bytes, ARM does not have 1/2 byte instructions like xarch has.
  • Lack of support for reg-optional on ARM: the relevant code in genConsumeRegs is ifdefed out, even though SetRegOptional is available.
  • OptimizeConstCompare removes a cast node and does not call ClearContained on the cast operand. We end up with a compare instruction with a contained operand, inherited from the cast. Naturally, that fails on ARM.
  • Broken register requirements on casts with contained operand. The contained indir may require a temporary register to deal with unsupported address modes. BuildCast doesn't know that.
  • emitInsLoadStoreOp emits a 64 bit ldr when asked to emit a 32 bit ldr. There seems to be some confusion throughout ARM64 codegen - for TYP_INT indirs it emits sxtw instead of a 32 bit ldr. This deviates from X64 codegen and other ARM64 compilers.

I'll see what I can do about all this.

The last one is pretty dubious to fix. But it's probably a good idea to fix it rather than attempt to work around it. Having emitInsLoadStoreOp unexpectedly emit a 64 bit ldr is asking for trouble.

@mikedn commented Sep 25, 2018

On hold again until #20126 is done.

This is only useful in code shared between ARMARCH and XARCH, otherwise it's simpler to just use the required instructions.
genCodeForArrIndex and genCodeForArrOffset emit an ldrsw on ARM64 but that's not necessary. Array lengths are always positive, so just use ldr on both ARM32 and ARM64.
genCodeForIndexAddr's range check generation is messed up:
- It uses ldrsw to load the array length on ARM64. Not needed, the length is positive.
- It uses sxtw to sign extend the array length if the index is native int. But it was already extended by the load.
- It creates IND and LEA nodes out of thin air. Just call the appropriate emitter functions.
- It always generates a TYP_I_IMPL cmp, even when the index is TYP_INT. Well, that's just bogus.
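The positivity argument above can be sketched in C (illustrative, hypothetical names): for any non-negative 32-bit value, sign extension and zero extension agree on the 64-bit result, so the plain zero-extending load suffices for array lengths.

```c
#include <stdint.h>

/* For non-negative 32-bit values - such as array lengths - sign
 * extension (ldrsw/movsxd) and zero extension (ldr/mov) produce the
 * same 64-bit result, so the zero-extending load is always safe here. */
uint64_t zero_extend32(uint32_t v) { return v; }
int64_t  sign_extend32(int32_t v)  { return v; }
```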
The scaling of immediate operands is a property of the instruction and its encoding. It doesn't make sense to throw the instruction size (emitAttr) into the mix; that's a codegen/emitter-specific concept. On ARM32 it's also almost meaningless, at least for integer types - all instructions are really EA_4BYTE, even ldrb, ldrh, etc.

The ARM64 emitter already extracts the scaling factor from the instruction. It can't use the instruction size because on ARM64 the size is used to select between 32 bit and 64 bit instructions, so it's never EA_1BYTE/EA_2BYTE.
ARM64's ins_Load returns INS_ldrsw for TYP_INT, but there's nothing in the JIT type system that requires sign extending TYP_INT values on load. The fact that an indir node has TYP_INT doesn't even imply that the value to load is signed; it may be unsigned, and indir nodes never have type TYP_UINT nor have the GTF_UNSIGNED flag set.

XARCH uses a mov (rather than movsxd, the equivalent of ldrsw), so it zero extends. There's no reason for ARM64 to behave differently, and doing so makes it more difficult to share codegen logic between XARCH and ARM64.

Other ARM64 compilers also use ldr rather than ldrsw.

This requires patching up emitInsAdjustLoadStoreAttr so EA_4BYTE loads don't end up using EA_8BYTE, which ldrsw requires.
In particular, clean up selection of acquire/release instructions. The existing code starts by selecting a "normal" instruction, only to throw it away and redo the type/size logic in the volatile case. And it gets it wrong in the process: it required that "targetType" be TYP_UINT or TYP_LONG to use ldar, but there are no TYP_UINT indirs.

Also rename "targetType" to "type"; using "target" is misleading. The real target type would be genActualType(tree->TypeGet()).
- Require EA_4BYTE for byte/halfword instructions on ARM32.
- Remove emitInsAdjustLoadStoreAttr on ARM64. Getting the correct instruction size simply requires using emitActualTypeSize, which provides the correct size on both ARM32 and ARM64.
- Replace emitTypeSize with emitActualTypeSize as needed.
@karelz (Member) commented May 4, 2019

What's the plan for this PR, @RussKeldorph? Maybe we should close it until it is ready - it's almost 2 years old now, with the last changes from 7 months ago ...

@mikedn commented May 4, 2019

OK, I'm going to close this until #20126 is merged. I should have probably ignored ARM and finished this for x86/64, but I wasn't expecting that it would take so long. Oh well, it's too late for 3.0 anyway.

Labels

area-CodeGen; blocked (Issue/PR is blocked on something - see comments)

Development

Successfully merging this pull request may close these issues:

[RyuJit/x64] Generates extra movsxd when using a byte value as an address offset