Skip to content

Analyzer driver perf optimization#38793

Merged
mavasani merged 5 commits intodotnet:masterfrom
mavasani:AnalyzerPerfOptimization
Oct 8, 2019
Merged

Analyzer driver perf optimization#38793
mavasani merged 5 commits intodotnet:masterfrom
mavasani:AnalyzerPerfOptimization

Conversation

@mavasani
Copy link
Contributor

@mavasani mavasani commented Sep 22, 2019

This change optimizes the analyzer driver to avoid redundant analyzer callbacks in presence of source file based suppressions to improve the overall performance for analyzer execution. Current editorconfig design allows user to configure diagnostic severities for specific source file(s). If user has suppressed all diagnostics that an analyzer can report on a specific file then executing the analyzer on such a file is redundant and can be avoided by the driver, except for the below cases:

  1. Non-configurable analyzers: If an analyzer reports any non-configurable diagnostic, skipping any callbacks for this analyzer is not possible. User configurations will anyway not be respected and reported diagnostics cannot be suppressed.
  2. Stateful analyzers (Compilation Start/End analyzers and Symbol Start/End analyzers): These analyzers have an analysis scope that is broader than a single file. They are handled separately based on whether or not they have registered to analyzer generate code:
    1. Registered to analyze generated code: Skipping any analyzer callbacks for these analyzers is not possible, it will break the analyzers as they need to analyze all parts of the symbol/compilation to function properly. See analyzer API for configuring generated code callbacks.
    2. Not registered to analyze generated code: Skipping analyzer callbacks for suppressed files for these analyzers is fine. By definition, such analyzers do not require to analyze all files in compilation as it anyways skips generated code.
  3. Symbol analyzers: Symbol analysis callbacks can only be skipped if the analyzer is suppressed on all files corresponding to symbol's declaration locations.

@mavasani mavasani marked this pull request as ready for review September 25, 2019 16:33
@mavasani mavasani requested a review from a team as a code owner September 25, 2019 16:33
@mavasani mavasani requested a review from sharwell September 25, 2019 16:33
@mavasani mavasani requested a review from genlu September 25, 2019 16:33
@mavasani mavasani added this to the 16.4 milestone Sep 25, 2019
@agocke
Copy link
Member

agocke commented Oct 2, 2019

Have you performed any perf analysis on this change?

@mavasani
Copy link
Contributor Author

mavasani commented Oct 2, 2019

Have you performed any perf analysis on this change?

I have verified that build time improves in presence of top level suppression entries (at repo root) for some expensive DFA analyzers. Previously, we executed all callbacks for these analyzers, but now we just do the initialization.

@agocke agocke self-assigned this Oct 3, 2019
@agocke
Copy link
Member

agocke commented Oct 3, 2019

I have verified that build time improves in presence of top level suppression entries (at repo root) for some expensive DFA analyzers. Previously, we executed all callbacks for these analyzers, but now we just do the initialization.

Have you measured if there is an effect on the compile time when no analyzers are suppressed?

@mavasani
Copy link
Contributor Author

mavasani commented Oct 3, 2019

Have you measured if there is an effect on the compile time when no analyzers are suppressed?

I did not measure that. I can do it today.

@agocke
Copy link
Member

agocke commented Oct 3, 2019

@mavasani I just completed some perf tests for analyzer testing. I can run them for you on this PR if you want

@mavasani
Copy link
Contributor Author

mavasani commented Oct 3, 2019

I just completed some perf tests for analyzer testing. I can run them for you on this PR if you want

Great, that would be very helpful! Thanks.

@agocke
Copy link
Member

agocke commented Oct 4, 2019

Got the perf numbers. The timing doesn't seem particularly different, within noise thresholds on my machine, but the allocation numbers are a little worrying. Without this change we allocate 1.48 GB, with it we allocate 1.54 GB. I think this is worth looking at traces to see what's causing such a big jump.

@mavasani
Copy link
Contributor Author

mavasani commented Oct 4, 2019

@agocke Thanks. Would you be able to share both the before and after traces?

@agocke
Copy link
Member

agocke commented Oct 4, 2019

The benchmark by default doesn't gather traces, so I don't have any at the moment. You should be able to generate them yourself though by following the instructions at https://github.com/dotnet/roslyn/wiki/Measuring-Compiler-Performance and passing an extra -- -p etw parameter to the dotnet run command in the instructions.

@mavasani
Copy link
Contributor Author

mavasani commented Oct 8, 2019

@agocke I repeated your perf testing and confirmed the allocation jump with the commits in this PR - thanks, the benchmarks are really handy.

I have pushed an additional commit that brings down the allocations back to original for my local testing. Perf numbers from my 3 test runs:

  1. Perf numbers for this branch with all commits in this PR reverted:
|                      Method | UnrollFactor |        Mean |    Error |   StdDev |      Median |         Min |         Max |       Gen 0 |      Gen 1 |     Gen 2 |  Allocated |
|---------------------------- |------------- |------------:|---------:|---------:|------------:|------------:|------------:|------------:|-----------:|----------:|-----------:|
|       CompileMethodsAndEmit |           16 |  4,519.2 ms | 12.84 ms | 12.01 ms |  4,518.5 ms |  4,502.7 ms |  4,546.4 ms |  82000.0000 | 22000.0000 |         - |   500.3 MB |
|           SerializeMetadata |           16 |    260.2 ms |  2.56 ms |  3.76 ms |    259.7 ms |    255.1 ms |    269.7 ms |   3000.0000 |  1000.0000 |         - |    33.2 MB |
|              GetDiagnostics |            1 |  3,394.6 ms | 19.15 ms | 16.97 ms |  3,389.2 ms |  3,365.5 ms |  3,423.3 ms |  75000.0000 | 19000.0000 |         - |  382.22 MB |
| GetDiagnosticsWithAnalyzers |            1 | 12,561.9 ms | 70.38 ms | 62.39 ms | 12,542.6 ms | 12,485.6 ms | 12,720.5 ms | 255000.0000 | 70000.0000 | 3000.0000 | 1528.55 MB |
  1. Perf numbers with all but the last commit (state when we first did perf testing) - confirms ~600 MB allocation regression.
|                      Method | UnrollFactor |        Mean |     Error |    StdDev |      Median |         Min |         Max |       Gen 0 |      Gen 1 |     Gen 2 |  Allocated |
|---------------------------- |------------- |------------:|----------:|----------:|------------:|------------:|------------:|------------:|-----------:|----------:|-----------:|
|       CompileMethodsAndEmit |           16 |  4,756.6 ms |  16.79 ms |  14.02 ms |  4,757.3 ms |  4,731.3 ms |  4,778.0 ms |  82000.0000 | 22000.0000 |         - |  499.53 MB |
|           SerializeMetadata |           16 |    278.1 ms |   2.75 ms |   6.03 ms |    277.0 ms |    265.9 ms |    292.7 ms |   3000.0000 |  1000.0000 |         - |    33.2 MB |
|              GetDiagnostics |            1 |  3,927.2 ms | 101.67 ms | 298.17 ms |  3,980.9 ms |  3,384.3 ms |  4,678.1 ms |  76000.0000 | 20000.0000 |         - |  382.22 MB |
| GetDiagnosticsWithAnalyzers |            1 | 12,799.7 ms |  71.95 ms |  67.31 ms | 12,783.7 ms | 12,713.2 ms | 12,922.1 ms | 265000.0000 | 72000.0000 | 3000.0000 | 1588.16 MB |
  1. Perf numbers after my latest commit that fixes allocations:
|                      Method | UnrollFactor |        Mean |     Error |    StdDev |      Median |         Min |         Max |       Gen 0 |      Gen 1 |     Gen 2 |  Allocated |
|---------------------------- |------------- |------------:|----------:|----------:|------------:|------------:|------------:|------------:|-----------:|----------:|-----------:|
|       CompileMethodsAndEmit |           16 |  4,567.2 ms |  17.05 ms |  14.24 ms |  4,562.1 ms |  4,549.9 ms |  4,598.3 ms |  82000.0000 | 23000.0000 |         - |  499.53 MB |
|           SerializeMetadata |           16 |    269.1 ms |   2.17 ms |   2.03 ms |    269.1 ms |    265.3 ms |    272.6 ms |   3000.0000 |  1000.0000 |         - |   33.21 MB |
|              GetDiagnostics |            1 |  3,409.7 ms |  34.93 ms |  32.68 ms |  3,400.8 ms |  3,368.1 ms |  3,496.5 ms |  75000.0000 | 19000.0000 |         - |  382.36 MB |
| GetDiagnosticsWithAnalyzers |            1 | 12,612.7 ms | 124.58 ms | 110.44 ms | 12,570.0 ms | 12,482.5 ms | 12,908.0 ms | 255000.0000 | 70000.0000 | 3000.0000 | 1527.76 MB |

Copy link
Member

@agocke agocke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I'm glad the benchmarks were helpful. New numbers LGTM

@mavasani
Copy link
Contributor Author

mavasani commented Oct 8, 2019

@agocke Is there plan to make these benchmarks run by default on all PRs, or atleast the ones that touch the compiler code?

@agocke
Copy link
Member

agocke commented Oct 8, 2019

@mavasani Unfortunately VMs are way too flaky and we don't have physical machines. Maybe Helix will change this in the future.

@mavasani mavasani merged commit 8a97dda into dotnet:master Oct 8, 2019
@mavasani mavasani deleted the AnalyzerPerfOptimization branch October 8, 2019 20:18
@sharwell
Copy link
Contributor

@agocke @mavasani Thank you both for everything here. Great work. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants