JIT: modify box/unbox/isinst/castclass expansions for fast jitting #13188
Conversation
|
Still exploring options here, but feedback welcome. @adiaaida what's the magic phrase to test minopts TP?
|
Regular jit-diffs shows some small improvement from the rare block logic. This also helps clean up some of the craziness seen in #13187, as boxes of large structs on cold paths are now kept compact (while still doing way more work than needed to feed GetType). Forcing on minopts and running jit-diffs shows larger wins, so hoping maybe these show up in the new minopts TP runs. Still some losses to look at: small structs may benefit from inline expansion, and the best expansion depends both on how the box value is produced and on how the box is consumed.
|
Trying to reverse engineer it; looks like the phrase might be @dotnet-bot test Windows_NT x64 min_opt Throughput (or maybe min_opts; the PR-side naming seems inconsistent).
|
Trying @dotnet-bot test Windows_NT x64 min_opts Throughput |
|
Trying @dotnet-bot test Windows_NT x64 Throughput |
|
The TP job data was inconclusive; not surprising given the relatively small expected impact. To get a more detailed look, I used a variant of the instructions-retired explorer to mine ETL on Windows. Looking just at corelib:
So there is no TP impact in regular opts, and a small win in minopts. FWIW, the raw data looks like this: With full opt, jit time almost doubles over minopts:
|
Added similar checks to the isinst/castclass helpers. These kick in a fair amount. Updated stats:
So on corelib, about a 1% TP improvement for minopts, with 1.9% smaller code. Jit-diffs under-reports the impact on other assemblies because R2R overrides all this logic already. Need to look at fragile prejitting for a broader take, or else assess on desktop. We can extend this approach to UNBOX and possibly more. Still investigating.
CarolEidt
left a comment
LGTM - and I really appreciate the restructuring and helpful comments and dumps.
src/jit/importer.cpp
Outdated
```
// case the call can construct its return value directly into the
// box payload, saving possibly some up-front zeroing.
//
// Currently primitive type boxes get inline expanded. We may
```
Just for clarity, you might want to say "get inline expanded when not in a constrained mode."
Primitive type boxes are always expanded inline currently. Will update the comment.
Not sure if this is always the best way when size constrained -- some more experimenting may be needed here.
|
Added similar logic to unbox. This kicks in more broadly in jit-diffs since there is no R2R special casing. Net effect of all 3 on jit-diffs (with MINOPTS) is now as follows, and the updated corelib TP data (reordered a bit from above) is:
There is now a slight regression in non-MINOPTS TP, so it may be that simple unboxes (say, primitives and structs with one field) should always be expanded inline, as we are doing for simple boxes.
|
TP impact might be measurable now, so will give the job another shot. Also retesting the arm legs that should now pass. @dotnet-bot retest Tizen armel Cross Release Build
|
TP job shows about a 1% net improvement for minopts, and also a regression for non-minopts. The regression is a bit harder to understand, as the only non-minopts impact should come from rarely run blocks. Looks like most of the regression cases are in small assemblies, so this could be in part an artifact of how the overall aggregate score is computed. Will pick one or two and drill in further.

There is a hypothesis of mine that (for minopts anyway, and even generally for maxopt) the time spent jitting is proportional to the size of the jit output. Let's see how well that holds up here. Looking just at S.P.Corelib, where we have detailed data, there is a net 2.45% decrease in native code size. Jit time is around 60% of crossgen time. So if jit time is directly proportional to native code size, we'd expect to see around a (2.45 * 0.6) = 1.5% improvement in overall throughput. Inst retired data shows a 1.4% improvement. So the hypothesis that native code size ~ jit time seems to hold up pretty well here.

This can be used as a proxy for evaluating other similar approaches to decreasing minopts time, or more generally for trying to get the jit to generate code as fast as possible, e.g. for debuggable codegen or as a new Tier0 jitting mode. Approaches include:
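The back-of-the-envelope estimate above can be written down directly. This is a minimal sketch using the numbers quoted in this thread (2.45% code size reduction, jit at ~60% of crossgen time); the function name is invented for illustration.

```cpp
#include <cmath>

// If jit time is directly proportional to native code size, only the jit's
// share of total crossgen time scales with the code size reduction.
double ExpectedThroughputWin(double codeSizeReduction, double jitShareOfTime)
{
    return codeSizeReduction * jitShareOfTime;
}
```

With the thread's numbers, `ExpectedThroughputWin(0.0245, 0.60)` gives 0.0147, i.e. about 1.5%, which lines up with the measured 1.4% instructions-retired improvement.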
|
It would be interesting to see what the relationship is between the size of the IR at various points in the phase order and the size of the JIT output. I would guess that there is a pretty direct relationship there as well.
|
(note that we already have support in the CSV log for reporting the IR size at various phases if |
We currently expand `GT_INDEX` nodes during morph into an explicit bounds check followed by a load. For example, this tree:

```
[000059] ------------  /--* LCL_VAR   int    V09 loc6
[000060] R--XG-------  /--* INDEX     ref
[000058] ------------  |  \--* LCL_VAR   ref    V00 arg0
[000062] -A-XG-------  *  ASG       ref
[000061] D------N----  \--* LCL_VAR   ref    V10 loc7
```

is expanded into this tree:

```
[000060] R--XG+------  /--* IND       ref
[000491] -----+------  |  |  /--* CNS_INT   long   16 Fseq[#FirstElem]
[000492] -----+------  |  \--* ADD       byref
[000488] -----+-N----  |  |  |  /--* CNS_INT   long   3
[000489] -----+------  |  |  /--* LSH       long
[000487] -----+------  |  |  |  \--* CAST      long <- int
[000484] i----+------  |  |  |  \--* LCL_VAR   int    V09 loc6
[000490] -----+------  |  \--* ADD       byref
[000483] -----+------  |  \--* LCL_VAR   ref    V00 arg0
[000493] ---XG+------  /--* COMMA     ref
[000486] ---X-+------  |  \--* ARR_BOUNDS_CHECK_Rng void
[000059] -----+------  |  +--* LCL_VAR   int    V09 loc6
[000485] ---X-+------  |  \--* ARR_LENGTH int
[000058] -----+------  |  \--* LCL_VAR   ref    V00 arg0
[000062] -A-XG+------  *  ASG       ref
[000061] D----+-N----  \--* LCL_VAR   ref    V10 loc7
```

Even in this simple case where both the array object and the index are lclVars, this represents a rather large increase in the size of the IR. In the worst case, the JIT introduces an additional lclVar for both the array object and the index, adding several additional nodes to the tree. When optimizing, exposing the structure of the array access may be helpful, as it may allow the compiler to better analyze the program. When we are not optimizing, however, the expansion serves little purpose besides constraining the IR shapes that must be handled by the backend. Due to its need for lclVars in the worst case, this expansion may even bloat the size of the generated code, as all lclVar references are generated as loads/stores from/to the stack when we are not optimizing.

In the case above, the expanded tree generates the following x64 assembly:

```
IN0018: 000092 mov      rdi, gword ptr [V00 rbp-10H]
IN0019: 000096 mov      edi, dword ptr [rdi+8]
IN001a: 000099 cmp      dword ptr [V09 rbp-48H], edi
IN001b: 00009C jae      G_M5106_IG38
IN001c: 0000A2 mov      rdi, gword ptr [V00 rbp-10H]
IN001d: 0000A6 mov      esi, dword ptr [V09 rbp-48H]
IN001e: 0000A9 movsxd   rsi, esi
IN001f: 0000AC mov      rdi, gword ptr [rdi+8*rsi+16]
IN0020: 0000B1 mov      gword ptr [V10 rbp-50H], rdi
```

Inspired by other recent experiments (e.g. dotnet#13188), this change introduces a new node that replaces the above expansion in MinOpts. This node, `GT_INDEX_ADDR`, represents the bounds check and address computation involved in an array access, and returns the address of the element that is to be loaded or stored. Using this node, the example tree given above expands to the following:

```
[000489] a--XG+------  /--* IND       ref
[000059] -----+------  |  |  /--* LCL_VAR   int    V09 loc6
[000060] R--XG+--R---  |  \--* INDEX_ADDR byref
[000058] -----+------  |  \--* LCL_VAR   ref    V00 arg0
[000062] -A-XG+------  *  ASG       ref
[000061] D----+-N----  \--* LCL_VAR   ref    V10 loc7
```

This expansion requires only the addition of the `GT_IND` node that represents the memory access itself. This savings in IR size translates to about a 2% decrease in instructions retired during non-optimizing compilation. Furthermore, this expansion tends to generate smaller code; for example, the tree given above is generated in 29 rather than 35 bytes:

```
IN0018: 000092 mov      edi, dword ptr [V09 rbp-48H]
IN0019: 000095 mov      rsi, gword ptr [V00 rbp-10H]
IN001a: 000099 cmp      rdi, qword ptr [rsi+8]
IN001b: 00009D jae      G_M5106_IG38
IN001c: 0000A3 lea      rsi, bword ptr [rsi+8*rdi+16]
IN001d: 0000A8 mov      rdi, gword ptr [rsi]
IN001e: 0000AB mov      gword ptr [V10 rbp-50H], rdi
```
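For readers less familiar with the dumps, the semantics of the expanded access can be sketched in plain C++: a bounds check against the length stored in the array object, then an address computation of base + first-element offset + scaled index. This is an illustration only, not JIT code; the `FakeArray` layout and `LoadElement` name are invented, with offsets (length at +8, elements at +16) chosen to match the x64 dump above.

```cpp
#include <cstdint>
#include <stdexcept>

struct FakeArray            // hypothetical stand-in for an object[] layout
{
    void*   methodTable;    // +0
    int64_t length;         // +8  (what ARR_LENGTH reads)
    void*   elements[1];    // +16 (Fseq[#FirstElem])
};

void* LoadElement(FakeArray* arr, int32_t index)
{
    // ARR_BOUNDS_CHECK_Rng: an unsigned compare covers index < 0 as well
    if (static_cast<uint32_t>(index) >= static_cast<uint64_t>(arr->length))
        throw std::out_of_range("IndexOutOfRangeException");

    // CAST long <- int, LSH by 3, ADD base, ADD 16: &arr->elements[index]
    char* addr = reinterpret_cast<char*>(arr) + 16 +
                 (static_cast<int64_t>(index) << 3);
    return *reinterpret_cast<void**>(addr);  // the final IND ref
}
```

`GT_INDEX_ADDR` in effect stops this sketch after the `addr` computation, leaving the trailing load (or a store) to a separate `GT_IND` node.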
|
From the minopts TP job, the geomean diff/base time ratio is 0.985 and the net diff/base ratio is 0.988. No obvious size-related bias in the results, though smaller runs appear noisier (not surprising). Each data point here is the average of 5 runs. But for this kind of thing averaging can be misleading, since we don't have any expectation as to the underlying "noise" distribution. For example, the worst data point comes from crossgenning System.Memory, which is reported as being 1.6x slower. The per-iteration times are:
So the diff's Run 5 trashes the diff results. If we'd used the median instead to summarize results, we'd have reported a 1.1x slowdown, which seems more representative.
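The mean-versus-median effect above is easy to demonstrate: with 5 iterations, one outlier shifts the mean by outlier/5, while the median ignores it entirely. The run times in the test below are made up for illustration, not the actual job data.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

double Mean(const std::vector<double>& v)
{
    double sum = 0;
    for (double x : v) sum += x;
    return sum / v.size();
}

double Median(std::vector<double> v)   // takes a copy so it can sort
{
    std::sort(v.begin(), v.end());
    return v[v.size() / 2];            // middle element for odd-sized input
}
```

With base times of {10, 10, 10, 10, 10} and diff times of {11, 11, 11, 11, 40}, the mean reports a ~1.7x slowdown while the median reports 1.1x.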
|
Looked at the two slowest full-opts cases and the slowest min-opt case from the TP job. Could not repro the slowdowns locally. Doing some desktop testing and if that looks good I'll remove the WIP.
|
@dotnet-bot test Windows_NT minopts
When the jit is generating code in debug/minopts/rare-block modes, we'd prefer it to generate code more quickly and worry less about overall generated code performance. Generally speaking, smaller intermediate and final code should correlate well with faster jitting.

This change alters the expansions of box, unbox, isinst, and castclass when generating code for minopts, debug, or rarely run blocks. In such modes the jit estimates whether an inline sequence or a general helper call would result in more compact code, and then chooses the smaller sequence.

This reduces generated code size around 2.5% in a variety of scenarios, and roughly translates to a 1.5% improvement in time spent jitting. Similar strategies can be applied to other complex operations during importation. That work is forthcoming.
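The decision described above can be sketched as a simple size comparison. This is a hypothetical illustration of the shape of the choice, not the actual importer code; the `Expansion` type, function name, and byte estimates are all invented.

```cpp
enum class Expansion { Inline, HelperCall };

// When jitting for size (minopts/debug/rarely run blocks), compare the
// estimated code size of the inline expansion against that of a general
// helper call, and keep the smaller sequence.
Expansion ChooseBoxExpansion(int inlineSizeEstimate,
                             int helperCallSizeEstimate,
                             bool optimizingForSize)
{
    if (optimizingForSize && (helperCallSizeEstimate < inlineSizeEstimate))
    {
        return Expansion::HelperCall;
    }
    // Otherwise prefer the inline form: it keeps the operation's structure
    // visible to later phases, which matters in full opt (see the note in
    // this thread about the box(value) == null optimization).
    return Expansion::Inline;
}
```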
Force-pushed from 56d8362 to 6e33c44.
|
Desktop testing underway, no issues so far. Just pushed up a squashed version. Removing the WIP. @dotnet/jit-contrib PTAL
|
Desktop tests passed, but after looking over diffs I'm going to back away from using the box helper for rarely run blocks in full opt. It seems tricky to get the logic right here; using the helper call can break the box(value) == null optimization and also may push zeroinits into the prolog. Since the value of this in full opt is less clear, it seems prudent to just defer for now. isinst/castclass/unbox don't seem to have such issues, but will revisit just to be sure, once the diffs from unbox are out of the picture.
|
No full-opt diffs in jit-diffs. Desktop now shows 23599 total methods with size differences (23530 improved, 69 regressed) and 25 unchanged; the diffs are minor. Think this is ready to go. @JosephTremoulet PTAL


When the jit is generating code in debug/minopts/rare-block/Tier0 modes,
we'd prefer it to generate code more quickly and worry less about overall
generated code performance.
Generally speaking smaller intermediate and final code should correlate
well with faster jitting, as to first order jit time is proportional to the
amount of IR it has to carry around.
This is preliminary work to experiment with different box expansion strategies
to see if there's any notable impact on jit throughput.