
Add SPMI benchmarks run collections for tiered and tiered pgo #84483

Merged
AndyAyersMS merged 2 commits into dotnet:main from AndyAyersMS:SPMIBenchmarksTieredPGO
Apr 10, 2023

Conversation

@AndyAyersMS
Member

@AndyAyersMS AndyAyersMS commented Apr 7, 2023

Add two new run configurations for SPMI benchmarks: tiered and tiered pgo. So benchmark runs now have 3 separate collections.

The new ones are named "run_tiered" and "run_pgo", e.g.:

benchmarks.run.windows.x64.checked.mch
benchmarks.run_tiered.windows.x64.checked.mch
benchmarks.run_pgo.windows.x64.checked.mch
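
A minimal sketch of how these file names appear to be composed, inferred only from the three examples above. The helper `mch_name` and its parameter names are hypothetical and are not taken from superpmi.py.

```python
# Hypothetical helper: the field order is inferred from the example names only.
RUN_TYPES = ("run", "run_tiered", "run_pgo")


def mch_name(run_type, os_name="windows", arch="x64", build="checked"):
    """Compose a benchmarks collection file name for one run configuration."""
    if run_type not in RUN_TYPES:
        raise ValueError(f"unknown run type: {run_type}")
    return f"benchmarks.{run_type}.{os_name}.{arch}.{build}.mch"


for run_type in RUN_TYPES:
    print(mch_name(run_type))
```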

Fixes #68179.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 7, 2023
@ghost ghost assigned AndyAyersMS Apr 7, 2023
@ghost

ghost commented Apr 7, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member

EgorBo commented Apr 7, 2023

Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

@AndyAyersMS
Member Author

@BruceForstall PTAL
cc @dotnet/jit-contrib

Ran this on the internal pipeline with temp guid 16ec4bcb-fe04-4c0b-9a72-617e2a2a9aee. Here is a summary of results for one set of three:

;; "classic" collection with tiering disabled
;; mcs.exe -jitflags benchmarks.run.windows.x64.checked.mch

Grouped Flag Appearances (35507 contexts)

bits                count  percent  parsed
0000000048800000     5168   14.55%  SKIP_VERIFICATION IL_STUB BBOPT
0000000248800000       11    0.03%  SKIP_VERIFICATION IL_STUB BBOPT PUBLISH_SECRET_PARAM
0000000040800010    30033   84.58%  DEBUG_INFO SKIP_VERIFICATION BBOPT
0000002040800010        4    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT REVERSE_PINVOKE
00000000c0800010      287    0.81%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED
0000000000800030        1    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION
0000000080800030        3    0.01%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION FRAMED

Individual Flag Appearances

   30328   85.41%  DEBUG_INFO
       4    0.01%  MIN_OPT
   35507  100.00%  SKIP_VERIFICATION
    5179   14.59%  IL_STUB
   35503   99.99%  BBOPT
     290    0.82%  FRAMED
      11    0.03%  PUBLISH_SECRET_PARAM
       4    0.01%  REVERSE_PINVOKE

;; "current default" collection with tiering enabled
;; mcs.exe -jitflags benchmarks.run_tiered.windows.x64.checked.mch

Grouped Flag Appearances (57701 contexts)

bits                count  percent  parsed
0000000048800000     5720    9.91%  SKIP_VERIFICATION IL_STUB BBOPT
0000000248800000        7    0.01%  SKIP_VERIFICATION IL_STUB BBOPT PUBLISH_SECRET_PARAM
0000008000800010    34989   60.64%  DEBUG_INFO SKIP_VERIFICATION TIER0
0000000040800010     8608   14.92%  DEBUG_INFO SKIP_VERIFICATION BBOPT
0000010040800010     7521   13.03%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1
0000002040800010        4    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT REVERSE_PINVOKE
0000008080800010      288    0.50%  DEBUG_INFO SKIP_VERIFICATION FRAMED TIER0
00000100c0800010       36    0.06%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED TIER1
0000010040802010      510    0.88%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1
00000100c0802010       13    0.02%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT FRAMED TIER1
0000000000800030        2    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION
0000000080800030        3    0.01%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION FRAMED

Individual Flag Appearances

   51974   90.07%  DEBUG_INFO
       5    0.01%  MIN_OPT
     523    0.91%  OSR
   57701  100.00%  SKIP_VERIFICATION
    5727    9.93%  IL_STUB
   22419   38.85%  BBOPT
     340    0.59%  FRAMED
       7    0.01%  PUBLISH_SECRET_PARAM
       4    0.01%  REVERSE_PINVOKE
   35277   61.14%  TIER0
    8080   14.00%  TIER1

;; "pgo" collection with tiering and tieredPgo enabled
;; mcs.exe -jitflags benchmarks.run_pgo.windows.x64.checked.mch

Grouped Flag Appearances (88881 contexts)

bits                count  percent  parsed
0000000048800000     5842    6.57%  SKIP_VERIFICATION IL_STUB BBOPT
0000000248800000       14    0.02%  SKIP_VERIFICATION IL_STUB BBOPT PUBLISH_SECRET_PARAM
0000008100800010    34987   39.36%  DEBUG_INFO SKIP_VERIFICATION BBINSTR_IF_LOOPS TIER0
0000008020800010    11041   12.42%  DEBUG_INFO SKIP_VERIFICATION BBINSTR TIER0
0000000040800010     8608    9.68%  DEBUG_INFO SKIP_VERIFICATION BBOPT
0000010040800010     2865    3.22%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1
8600010040800010        8    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
a400010040800010      240    0.27%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
c400010040800010    16514   18.58%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_DYNAMIC_PROFILE
c600010040800010      335    0.38%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
e400010040800010     6643    7.47%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
e600010040800010       28    0.03%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
0000002040800010        4    0.00%  DEBUG_INFO SKIP_VERIFICATION BBOPT REVERSE_PINVOKE
0000008180800010      287    0.32%  DEBUG_INFO SKIP_VERIFICATION FRAMED BBINSTR_IF_LOOPS TIER0
00000080a0800010       41    0.05%  DEBUG_INFO SKIP_VERIFICATION BBINSTR FRAMED TIER0
00000100c0800010       22    0.02%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED TIER1
c4000100c0800010      203    0.23%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_DYNAMIC_PROFILE
e4000100c0800010        5    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
c400010040802010      783    0.88%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_DYNAMIC_PROFILE
c600010040802010       74    0.08%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
e400010040802010      285    0.32%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
e600010040802010       33    0.04%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_METHOD_PROFILE HAS_DYNAMIC_PROFILE
c4000100c0802010       13    0.01%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT FRAMED TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_DYNAMIC_PROFILE
e4000100c0802010        1    0.00%  DEBUG_INFO OSR SKIP_VERIFICATION BBOPT FRAMED TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_CLASS_PROFILE HAS_DYNAMIC_PROFILE
0000000000800030        2    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION
0000000080800030        3    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION FRAMED

Individual Flag Appearances

   83025   93.41%  DEBUG_INFO
       5    0.01%  MIN_OPT
    1189    1.34%  OSR
   88881  100.00%  SKIP_VERIFICATION
    5856    6.59%  IL_STUB
   11082   12.47%  BBINSTR
   42520   47.84%  BBOPT
     575    0.65%  FRAMED
   35274   39.69%  BBINSTR_IF_LOOPS
      14    0.02%  PUBLISH_SECRET_PARAM
       4    0.00%  REVERSE_PINVOKE
   46356   52.16%  TIER0
   28052   31.56%  TIER1
     478    0.54%  HAS_METHOD_PROFILE
   25165   28.31%  HAS_DYNAMIC_PROFILE
    7235    8.14%  HAS_CLASS_PROFILE
   24917   28.03%  HAS_EDGE_PROFILE
   25165   28.31%  HAS_PGO
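
Each "Individual Flag Appearances" table is the per-flag sum of the grouped rows in the same summary. A minimal sketch of that tally, using the grouped rows of the "classic" collection; `tally_flags` is a hypothetical helper, not part of mcs.

```python
from collections import Counter

# Grouped rows copied from the "classic" collection summary.
CLASSIC_ROWS = """\
0000000048800000     5168   14.55%  SKIP_VERIFICATION IL_STUB BBOPT
0000000248800000       11    0.03%  SKIP_VERIFICATION IL_STUB BBOPT PUBLISH_SECRET_PARAM
0000000040800010    30033   84.58%  DEBUG_INFO SKIP_VERIFICATION BBOPT
0000002040800010        4    0.01%  DEBUG_INFO SKIP_VERIFICATION BBOPT REVERSE_PINVOKE
00000000c0800010      287    0.81%  DEBUG_INFO SKIP_VERIFICATION BBOPT FRAMED
0000000000800030        1    0.00%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION
0000000080800030        3    0.01%  DEBUG_INFO MIN_OPT SKIP_VERIFICATION FRAMED
""".splitlines()


def tally_flags(rows):
    """Add the context count of every grouped row to each flag that row lists."""
    totals = Counter()
    for row in rows:
        _bits, count, _percent, *flags = row.split()
        for flag in flags:
            totals[flag] += int(count)
    return totals


totals = tally_flags(CLASSIC_ROWS)
contexts = sum(int(row.split()[1]) for row in CLASSIC_ROWS)
for flag, count in totals.items():
    print(f"{count:8d} {100 * count / contexts:7.2f}%  {flag}")
```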

@AndyAyersMS
Member Author

Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

The benchmarks get split into 30 sub pieces and farmed out to helix, so I think we may run them all?
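
A rough sketch of what a 30-way split looks like in terms of per-work-item arguments. The `--partition-count` and `--partition-index` options are the ones that appear in the superpmi_benchmarks.py excerpt quoted later in this thread; the helper name and the zero-based indexing are assumptions.

```python
# Hypothetical sketch: only the two option names and the count of 30 come from this thread.
PARTITION_COUNT = 30


def partition_args(partition_count=PARTITION_COUNT):
    """Return one argument string per partition (indices assumed zero-based)."""
    return [
        f"--partition-count {partition_count} --partition-index {index}"
        for index in range(partition_count)
    ]


args = partition_args()
print(len(args), args[0], args[-1], sep="\n")
```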

@AndyAyersMS
Member Author

Also, we are still collecting benchmark runs with R2R disabled -- wonder if we should revisit that?

@AndyAyersMS
Member Author

I was wondering why with tiering enabled we see a reasonably high fraction of methods bypassing tiering. Note that we don't have an explicit flag for this, so you have to deduce it from the absence of other flags. One such bin is

0000000040800010     8608    9.68%  DEBUG_INFO SKIP_VERIFICATION BBOPT

@EgorBo reminded me that dynamic methods (like the classic regex) are still not eligible for tiered compilation.
See #73594

Sure enough regex seems to be a major contributor to this bin.
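
A minimal sketch of that deduction: treat a grouped row as having bypassed tiering when its parsed flags mention neither TIER0 nor TIER1. The helper names are hypothetical, and the sample is a four-row subset of the pgo summary above. Note that this simple rule also picks up IL stubs, not just dynamic methods.

```python
# Hypothetical helpers: not part of mcs or superpmi.py.
TIER_FLAGS = {"TIER0", "TIER1"}


def parse_row(line):
    """Split one 'Grouped Flag Appearances' row into (bits, count, flags)."""
    bits, count, _percent, *flags = line.split()
    return bits, int(count), set(flags)


def untiered_rows(lines):
    """Return the rows whose parsed flags contain neither TIER0 nor TIER1."""
    rows = [parse_row(line) for line in lines if line.strip()]
    return [row for row in rows if not (row[2] & TIER_FLAGS)]


# Subset of the grouped rows from the pgo collection summary.
sample = """\
0000000048800000     5842    6.57%  SKIP_VERIFICATION IL_STUB BBOPT
0000008100800010    34987   39.36%  DEBUG_INFO SKIP_VERIFICATION BBINSTR_IF_LOOPS TIER0
0000000040800010     8608    9.68%  DEBUG_INFO SKIP_VERIFICATION BBOPT
0000010040800010     2865    3.22%  DEBUG_INFO SKIP_VERIFICATION BBOPT TIER1
""".splitlines()

for bits, count, flags in untiered_rows(sample):
    print(bits, count, " ".join(sorted(flags)))
```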

Contributor

@BruceForstall BruceForstall left a comment

Just a few requests.

Actually, I'm happy to see not many changes were required.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 7, 2023
@BruceForstall
Contributor

superpmi_benchmarks.py includes:

collection_command = f"{dotnet_exe} {benchmarks_dll}  --filter \"*\" --corerun {os.path.join(core_root, corerun_exe)} --partition-count {partition_count} " \
                         f"--partition-index {partition_index} --envVars DOTNET_JitName:{shim_name} " \
                         " DOTNET_ZapDisable:1  DOTNET_ReadyToRun:0 " \
                         "--iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 --strategy ColdStart --logBuildOutput"

So, R2R is disabled.

Do the other BDN arguments make sense for these new collections?
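
For illustration only, a sketch of how the `--envVars` portion of such a command could vary per run type. It assumes the three collections differ by the runtime settings DOTNET_TieredCompilation and DOTNET_TieredPGO. The mapping, the helper name, and the example shim file name are assumptions, not the code in this PR.

```python
# Assumption: each run type is selected through these runtime settings.
# This mapping is illustrative and is not copied from superpmi_benchmarks.py.
RUN_TYPE_ENV = {
    "run": {"DOTNET_TieredCompilation": "0"},  # tiering disabled
    "run_tiered": {"DOTNET_TieredPGO": "0"},   # tiering on, PGO off
    "run_pgo": {"DOTNET_TieredPGO": "1"},      # tiering and PGO on
}

# Settings shared by all run types, as in the excerpt above (R2R disabled).
COMMON_ENV = {"DOTNET_ZapDisable": "1", "DOTNET_ReadyToRun": "0"}


def env_vars_argument(run_type, shim_name):
    """Build the '--envVars KEY:VALUE ...' fragment for one run type."""
    env = {"DOTNET_JitName": shim_name, **COMMON_ENV, **RUN_TYPE_ENV[run_type]}
    return "--envVars " + " ".join(f"{key}:{value}" for key, value in env.items())


# The shim file name here is only an example value.
print(env_vars_argument("run_pgo", "superpmi-shim-collector.dll"))
```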

@AndyAyersMS
Member Author

Do the other BDN arguments make sense for these new collections?

Hmm, good point. Let me see if we can afford to run the benchmarks "normally" or if it takes too long. I may leave things as is for nontiered as the codegen is not timing sensitive.

We are also using a checked runtime which tiers up aggressively and uses the wrong corelib (#60947). So these collections are not as representative of what actually happens as I'd like.

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 7, 2023
@kunalspathak
Contributor

Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

The benchmarks get split into 30 sub pieces and farmed out to helix, so I think we may run them all?

Yes, we run them all with --iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 so they complete quickly.

@EgorBo
Member

EgorBo commented Apr 8, 2023

Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

The benchmarks get split into 30 sub pieces and farmed out to helix, so I think we may run them all?

Yes, we run them all with --iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 so they complete quickly.

Meaning that they don't tier up properly?

@AndyAyersMS
Member Author

Is there a list of benchmarks we run to get a collection? afair the full run of them takes 8 hours or so

The benchmarks get split into 30 sub pieces and farmed out to helix, so I think we may run them all?

Yes, we run them all with --iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 so they complete quickly.

Meaning that they don't tier up properly?

Certainly seems possible, though with the aggressive tiering up that a checked runtime does, perhaps they do?

Let me first try and get this revamped so we are measuring a release runtime/SPC with a checked jit, and then we can see if we can afford to run the benchmarks more realistically.

@AndyAyersMS
Member Author

Another option is to just measure all release bits, but that probably makes the collection too brittle (?).

I suppose we could also enable the extra queries in release mode.

@BruceForstall
Contributor

Let me first try and get this revamped so we are measuring a release runtime/SPC with a checked jit, and then we can see if we can afford to run the benchmarks more realistically.

Is it worth getting this checked in now, basically as-is, since it already is showing additional code variety, and trying to implement Release as a follow-up?

Another option is to just measure all release bits, but that probably makes the collection too brittle (?).
I suppose we could also enable the extra queries in release mode.

Enabling extra queries in Release is an interesting idea. It would be useful to do that, get collections, then ensure they can be replayed (with JitDisasm/JitDump).

@AndyAyersMS
Member Author

Let me first try and get this revamped so we are measuring a release runtime/SPC with a checked jit, and then we can see if we can afford to run the benchmarks more realistically.

Is it worth getting this checked in now, basically as-is, since it already is showing additional code variety, and trying to implement Release as a follow-up?

Not a bad idea, as having something now is probably better than having nothing...

@AndyAyersMS
Member Author

@BruceForstall addressed your feedback, so take another look when you can.

@AndyAyersMS AndyAyersMS merged commit cf1c8b0 into dotnet:main Apr 10, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 11, 2023

Development

Successfully merging this pull request may close these issues.

SuperPMI benchmarks run should enable tiered compilation and pgo

4 participants