Add SPMI benchmarks run collections for tiered and tiered pgo by AndyAyersMS · Pull Request #84483 · dotnet/runtime

AndyAyersMS · 2023-04-07T15:37:14Z

Add two new run configurations for SPMI benchmarks: tiered and tiered pgo. So benchmark runs now have 3 separate collections.

The new ones are named "run_tiered" and "run_pgo", e.g.:

benchmarks.run.windows.x64.checked.mch
benchmarks.run_tiered.windows.x64.checked.mch
benchmarks.run_pgo.windows.x64.checked.mch

Fixes #68179.
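
For readers scripting against these collections, the naming pattern above can be sketched as a small helper. This is only an illustration: `mch_name` and its parameters are hypothetical names, not functions from `superpmi.py`.

```python
def mch_name(run_type: str, os_name: str = "windows", arch: str = "x64",
             flavor: str = "checked", collection: str = "benchmarks") -> str:
    """Build a collection file name following the pattern in the PR description.

    Hypothetical helper for illustration; not part of the SuperPMI scripts.
    """
    allowed = ("run", "run_tiered", "run_pgo")
    if run_type not in allowed:
        raise ValueError(f"unknown run type: {run_type!r}")
    return f"{collection}.{run_type}.{os_name}.{arch}.{flavor}.mch"


# The three collections produced by one benchmarks run.
names = [mch_name(rt) for rt in ("run", "run_tiered", "run_pgo")]
```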

ghost · 2023-04-07T15:37:26Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Author:	AndyAyersMS
Assignees:	AndyAyersMS
Labels:	`area-CodeGen-coreclr`
Milestone:	-

EgorBo · 2023-04-07T15:40:16Z

Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

AndyAyersMS · 2023-04-07T15:40:43Z

@BruceForstall PTAL
cc @dotnet/jit-contrib

Ran this on the internal pipeline with temp guid 16ec4bcb-fe04-4c0b-9a72-617e2a2a9aee. Here is a summary of results for one set of three:

;; "classic" collection with tiering disabled
;; mcs.exe -jitflags benchmarks.run.windows.x64.checked.mch

Grouped Flag Appearances (35507 contexts)

bits                count  percent  parsed
0000000048800000     5168   14.55%  SKIP_VERIFICATION IL_STUB BBOPT
0000000248800000       11    0.03%  SKIP_VERIFICATION IL_STUB BBOPT PUBLISH_SECRET_PARAM
0000000040800010    30033   84.58%  DEBUG_INFO SKIP_VERIFICATION BBOPT
0000002040800010        4    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT REVERSE_PINVOKE
00000000c0800010      287    0.81%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED
0000000000800030        1    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION
0000000080800030        3    0.01%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION FRAMED

Individual Flag Appearances

   30328   85.41%  DEBUG_INFO
       4    0.01%  MIN_OPT
   35507  100.00%  SKIP_VERIFICATION
    5179   14.59%  IL_STUB
   35503   99.99%  BBOPT
     290    0.82%  FRAMED
      11    0.03%  PUBLISH_SECRET_PARAM
       4    0.01%  REVERSE_PINVOKE

;; "current default" collection with tiering enabled
;; mcs.exe -jitflags benchmarks.run_tiered.windows.x64.checked.mch

Grouped Flag Appearances (57701 contexts)

bits                count  percent  parsed
0000000048800000     5720    9.91%  SKIP_VERIFICATION IL_STUB BBOPT
0000000248800000        7    0.01%  SKIP_VERIFICATION IL_STUB BBOPT PUBLISH_SECRET_PARAM
0000008000800010    34989   60.64%  DEBUG_INFO SKIP_VERIFICATION TIER0
0000000040800010     8608   14.92%  DEBUG_INFO SKIP_VERIFICATION BBOPT
0000010040800010     7521   13.03%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1
0000002040800010        4    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT REVERSE_PINVOKE
0000008080800010      288    0.50%  DEBUG_INFO SKIP_VERIFICATION FRAMED TIER0
00000100c0800010       36    0.06%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED TIER1
0000010040802010      510    0.88%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1
00000100c0802010       13    0.02%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT FRAMED TIER1
0000000000800030        2    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION
0000000080800030        3    0.01%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION FRAMED

Individual Flag Appearances

   51974   90.07%  DEBUG_INFO
       5    0.01%  MIN_OPT
     523    0.91%  OSR
   57701  100.00%  SKIP_VERIFICATION
    5727    9.93%  IL_STUB
   22419   38.85%  BBOPT
     340    0.59%  FRAMED
       7    0.01%  PUBLISH_SECRET_PARAM
       4    0.01%  REVERSE_PINVOKE
   35277   61.14%  TIER0
    8080   14.00%  TIER1

;; "pgo" collection with tiering and tieredPgo enabled
;; mcs.exe -jitflags benchmarks.run_pgo.windows.x64.checked.mch

Grouped Flag Appearances (88881 contexts)

bits                count  percent  parsed
0000000048800000     5842    6.57%  SKIP_VERIFICATION IL_STUB BBOPT
0000000248800000       14    0.02%  SKIP_VERIFICATION IL_STUB BBOPT PUBLISH_SECRET_PARAM
0000008100800010    34987   39.36%  DEBUG_INFO SKIP_VERIFICATION BBINSTR_IF_LOOPS TIER0
0000008020800010    11041   12.42%  DEBUG_INFO SKIP_VERIFICATION BBINSTR TIER0
0000000040800010     8608    9.68%  DEBUG_INFO SKIP_VERIFICATION BBOPT
0000010040800010     2865    3.22%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1
8600010040800010        8    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
a400010040800010      240    0.27%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
c400010040800010    16514   18.58%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_DYNAMIC_PROFILE
c600010040800010      335    0.38%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
e400010040800010     6643    7.47%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
e600010040800010       28    0.03%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
0000002040800010        4    0.00%  DEBUG_INFO SKIP_VERIFICATION BBOPT REVERSE_PINVOKE
0000008180800010      287    0.32%  DEBUG_INFO SKIP_VERIFICATION FRAMED BBINSTR_IF_LOOPS TIER0
00000080a0800010       41    0.05%  DEBUG_INFO SKIP_VERIFICATION BBINSTR FRAMED TIER0
00000100c0800010       22    0.02%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED TIER1
c4000100c0800010      203    0.23%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_DYNAMIC_PROFILE
e4000100c0800010        5    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
c400010040802010      783    0.88%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_DYNAMIC_PROFILE
c600010040802010       74    0.08%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
e400010040802010      285    0.32%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
e600010040802010       33    0.04%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
c4000100c0802010       13    0.01%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT FRAMED TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_DYNAMIC_PROFILE
e4000100c0802010        1    0.00%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT FRAMED TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
0000000000800030        2    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION
0000000080800030        3    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION FRAMED

Individual Flag Appearances

   83025   93.41%  DEBUG_INFO
       5    0.01%  MIN_OPT
    1189    1.34%  OSR
   88881  100.00%  SKIP_VERIFICATION
    5856    6.59%  IL_STUB
   11082   12.47%  BBINSTR
   42520   47.84%  BBOPT
     575    0.65%  FRAMED
   35274   39.69%  BBINSTR_IF_LOOPS
      14    0.02%  PUBLISH_SECRET_PARAM
       4    0.00%  REVERSE_PINVOKE
   46356   52.16%  TIER0
   28052   31.56%  TIER1
     478    0.54%  HAS_METHOD_PROFILE
   25165   28.31%  HAS_DYNAMIC_PROFILE
    7235    8.14%  HAS_CLASS_PROFILE
   24917   28.03%  HAS_EDGE_PROFILE
   25165   28.31%  HAS_PGO
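
The `bits` column in the summaries above can be decoded by hand. Here is a rough sketch, assuming bit positions inferred only from comparing the `bits` and `parsed` columns in this comment (they are not copied from the JIT headers, so treat the table as an assumption). `FLAG_BITS` and `decode_flags` are hypothetical names.

```python
# Bit positions inferred from the mcs output in this thread (an assumption,
# not taken from the JIT source).
FLAG_BITS = {
    4: "DEBUG_INFO",
    5: "MIN_OPT",
    13: "OSR",
    23: "SKIP_VERIFICATION",
    27: "IL_STUB",
    29: "BBINSTR",
    30: "BBOPT",
    31: "FRAMED",
    32: "BBINSTR_IF_LOOPS",
    33: "PUBLISH_SECRET_PARAM",
    37: "REVERSE_PINVOKE",
    39: "TIER0",
    40: "TIER1",
    57: "HAS_METHOD_PROFILE",
    58: "HAS_DYNAMIC_PROFILE",
    61: "HAS_CLASS_PROFILE",
    62: "HAS_EDGE_PROFILE",
    63: "HAS_PGO",
}


def decode_flags(bits_hex: str) -> list[str]:
    """Return flag names for a 64-bit hex value, lowest bit first."""
    value = int(bits_hex, 16)
    names = []
    for bit in range(64):
        if value & (1 << bit):
            # Bits not seen in this thread are reported rather than dropped.
            names.append(FLAG_BITS.get(bit, f"UNKNOWN_BIT_{bit}"))
    return names
```

The names come out in ascending bit order, which may differ from the order `mcs.exe` prints them in.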

AndyAyersMS · 2023-04-07T15:57:55Z

> Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

The benchmarks get split into 30 sub pieces and farmed out to helix, so I think we may run them all?

AndyAyersMS · 2023-04-07T16:24:19Z

Also we are still collecting benchmark runs with R2R disabled -- wonder if we should revisit that?

AndyAyersMS · 2023-04-07T16:24:39Z

I was wondering why with tiering enabled we see a reasonably high fraction of methods bypassing tiering. Note that we don't have an explicit flag for this, so you have to deduce it by the absence of other flags. One such bin is

0000000040800010     8608    9.68%  DEBUG_INFO SKIP_VERIFICATION BBOPT

@EgorBo reminded me that dynamic methods (like the classic regex) are still not eligible for tiered compilation.
See #73594

Sure enough regex seems to be a major contributor to this bin.
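
The "deduce it by the absence of other flags" step can be sketched as a small script over the grouped rows. The bucket names and the classification rule below are assumptions made for illustration; the rows are copied from the `run_tiered` summary earlier in the thread.

```python
from collections import Counter


def parse_row(line: str):
    """Split one 'bits count percent flags...' row into (count, flag set)."""
    _bits, count, _percent, *flags = line.split()
    return int(count), set(flags)


def tier_bucket(flags: set) -> str:
    """Assumed rule: no tier flag, not a stub, not MIN_OPT => bypassed tiering."""
    if "TIER0" in flags:
        return "tier0"
    if "TIER1" in flags:
        return "tier1"
    if "IL_STUB" in flags:
        return "il_stub"
    if "MIN_OPT" in flags:
        return "min_opt"
    return "bypassed_tiering"


def summarize(rows):
    totals = Counter()
    for line in rows:
        count, flags = parse_row(line)
        totals[tier_bucket(flags)] += count
    return totals


# Grouped rows from the run_tiered collection summary.
tiered_rows = [
    "0000000048800000     5720    9.91%  SKIP_VERIFICATION IL_STUB BBOPT",
    "0000000248800000        7    0.01%  SKIP_VERIFICATION IL_STUB BBOPT PUBLISH_SECRET_PARAM",
    "0000008000800010    34989   60.64%  DEBUG_INFO SKIP_VERIFICATION TIER0",
    "0000000040800010     8608   14.92%  DEBUG_INFO SKIP_VERIFICATION BBOPT",
    "0000010040800010     7521   13.03%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1",
    "0000002040800010        4    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT REVERSE_PINVOKE",
    "0000008080800010      288    0.50%  DEBUG_INFO SKIP_VERIFICATION FRAMED TIER0",
    "00000100c0800010       36    0.06%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED TIER1",
    "0000010040802010      510    0.88%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1",
    "00000100c0802010       13    0.02%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT FRAMED TIER1",
    "0000000000800030        2    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION",
    "0000000080800030        3    0.01%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION FRAMED",
]

totals = summarize(tiered_rows)
```

Under this rule the reverse P/Invoke bin lands in the same bucket as the plain `BBOPT` bin, since neither carries a tier flag.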

BruceForstall

Just a few requests.

Actually, I'm happy to see not many changes were required.

src/coreclr/scripts/superpmi.py

src/coreclr/scripts/superpmi_benchmarks.py

src/coreclr/scripts/superpmi_collect_setup.py

BruceForstall · 2023-04-07T16:35:42Z

superpmi_benchmarks.py includes:

collection_command = f"{dotnet_exe} {benchmarks_dll}  --filter \"*\" --corerun {os.path.join(core_root, corerun_exe)} --partition-count {partition_count} " \
                         f"--partition-index {partition_index} --envVars DOTNET_JitName:{shim_name} " \
                         " DOTNET_ZapDisable:1  DOTNET_ReadyToRun:0 " \
                         "--iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 --strategy ColdStart --logBuildOutput"

So, R2R is disabled.

Do the other BDN arguments make sense for these new collections?
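
To make the question concrete, here is a rough sketch of how one command per run type might be assembled. It is not the code in `superpmi_benchmarks.py`: `RUN_TYPE_ENV` and `build_collection_command` are hypothetical, and the mapping of run types to `DOTNET_TieredCompilation` / `DOTNET_TieredPGO` values is an assumption about what each collection needs. The remaining arguments mirror the command quoted above.

```python
import os

# Assumed mapping from run type to runtime knobs (illustrative only).
RUN_TYPE_ENV = {
    "run": {"DOTNET_TieredCompilation": "0"},
    "run_tiered": {"DOTNET_TieredCompilation": "1", "DOTNET_TieredPGO": "0"},
    "run_pgo": {"DOTNET_TieredCompilation": "1", "DOTNET_TieredPGO": "1"},
}


def build_collection_command(run_type, dotnet_exe, benchmarks_dll, core_root,
                             corerun_exe, shim_name, partition_count,
                             partition_index):
    """Assemble a BenchmarkDotNet command line for one partition."""
    # Settings shared by all run types, as in the quoted command.
    env = {
        "DOTNET_JitName": shim_name,
        "DOTNET_ZapDisable": "1",
        "DOTNET_ReadyToRun": "0",
    }
    env.update(RUN_TYPE_ENV[run_type])
    # BDN takes environment variables as space-separated KEY:VALUE pairs.
    env_args = " ".join(f"{key}:{value}" for key, value in env.items())
    corerun = os.path.join(core_root, corerun_exe)
    return (
        f'{dotnet_exe} {benchmarks_dll} --filter "*" --corerun {corerun} '
        f"--partition-count {partition_count} --partition-index {partition_index} "
        f"--envVars {env_args} "
        "--iterationCount 1 --warmupCount 0 --invocationCount 1 "
        "--unrollFactor 1 --strategy ColdStart --logBuildOutput"
    )
```

Keeping the per-run-type differences in one table makes it easy to see that only the tiering knobs vary, while the cold-start BDN arguments are shared by all three collections.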

AndyAyersMS · 2023-04-07T16:50:46Z

> Do the other BDN arguments make sense for these new collections?

Hmm, good point. Let me see if we can afford to run the benchmarks "normally" or if it takes too long. I may leave things as is for nontiered as the codegen is not timing sensitive.

We are also using a checked runtime which tiers up aggressively and uses the wrong corelib (#60947). So these collections are not as representative of what actually happens as I'd like.

kunalspathak · 2023-04-07T23:36:52Z

>> Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

> The benchmarks get split into 30 sub pieces and farmed out to helix, so I think we may run them all?

Yes, we run them all with --iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 so they complete quickly.

EgorBo · 2023-04-08T00:25:26Z

>>> Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

>> The benchmarks get split into 30 sub pieces and farmed out to helix, so I think we may run them all?

> Yes, we run them all with --iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 so they complete quickly.

Meaning that they don't tier up properly?

AndyAyersMS · 2023-04-08T02:20:40Z

>>>> Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

>>> The benchmarks get split into 30 sub pieces and farmed out to helix, so I think we may run them all?

>> Yes, we run them all with --iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 so they complete quickly.

> Meaning that they don't tier up properly?

Certainly seems possible, though with the aggressive tiering up that a checked runtime does, perhaps they do?

Let me first try and get this revamped so we are measuring a release runtime/SPC with a checked jit, and then we can see if we can afford to run the benchmarks more realistically.

AndyAyersMS · 2023-04-08T02:22:45Z

Another option is to just measure all release bits, but that probably makes the collection too brittle (?).

I suppose we could also enable the extra queries in release mode.

BruceForstall · 2023-04-08T03:15:59Z

> Let me first try and get this revamped so we are measuring a release runtime/SPC with a checked jit, and then we can see if we can afford to run the benchmarks more realistically.

Is it worth getting this checked in now, basically as-is, since it already is showing additional code variety, and trying to implement Release as a follow-up?

> Another option is to just measure all release bits, but that probably makes the collection too brittle (?).
> I suppose we could also enable the extra queries in release mode.

Enabling extra queries in Release is an interesting idea. It would be useful to do that, get collections, then ensure they can be replayed (with JitDisasm/JitDump).

AndyAyersMS · 2023-04-08T16:02:21Z

>> Let me first try and get this revamped so we are measuring a release runtime/SPC with a checked jit, and then we can see if we can afford to run the benchmarks more realistically.

> Is it worth getting this checked in now, basically as-is, since it already is showing additional code variety, and trying to implement Release as a follow-up?

Not a bad idea, as having something now is probably better than having nothing...

AndyAyersMS · 2023-04-10T01:34:46Z

@BruceForstall addressed your feedback, so take another look when you can.

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 7, 2023

ghost assigned AndyAyersMS Apr 7, 2023

AndyAyersMS requested a review from BruceForstall April 7, 2023 15:37

BruceForstall suggested changes Apr 7, 2023

ghost added the needs-author-action label (an issue or pull request that requires more info or actions from the author) Apr 7, 2023

ghost removed the needs-author-action label Apr 7, 2023

review feedback

d10ac60

BruceForstall approved these changes Apr 10, 2023

AndyAyersMS merged commit cf1c8b0 into dotnet:main Apr 10, 2023

ghost locked as resolved and limited conversation to collaborators May 11, 2023
