Remove the use of Parallel.ForEach in ClsComplianceChecker #23519

Merged
sharwell merged 4 commits into dotnet:dev15.6.x from sharwell:optimize-clscompliant
Feb 2, 2018

Conversation


@sharwell sharwell commented Dec 1, 2017

Customer scenario

Running analyzers during a build is slower than it should be, with the analyzer driver contributing substantial overhead even when the analyzers themselves are lightweight.

Bugs this fixes

Fixes #23459

Workarounds, if any

None needed

Risk

Low.

Performance impact

15-20% reduction in allocations for running IDE analyzers. The benefits extend to other analyzers that call GetDiagnostics, with a 30-70% reduction in execution time for analyzers that depend on compiler diagnostics.

Is this a regression from a previous update?

No.

Root cause analysis

AnalyzerRunner is a new tool for helping us test analyzer performance in isolation.

How was the bug found?

AnalyzerRunner.

Test documentation updated?

No.

@sharwell sharwell added this to the 15.6 milestone Dec 1, 2017
@sharwell sharwell self-assigned this Dec 1, 2017
@sharwell sharwell requested review from a team December 1, 2017 17:05
Contributor

Should we consider using a hybrid approach by changing the original check to `if (_compilation.Options.ConcurrentBuild && symbol.IsGlobalNamespace)`?

Contributor

I also noted that we are kicking off a parallel foreach for every single named type: http://source.roslyn.io/#Microsoft.CodeAnalysis.CSharp/Compiler/ClsComplianceChecker.cs,207. I think that is compounding the allocations.

We ran into similar issues when trying to make the analyzer driver concurrently process compilation events. Attempting to process all events concurrently caused a huge allocation and performance hit, and we saw optimal performance when using a worker queue limited by a certain maximum worker count: http://source.roslyn.io/#Microsoft.CodeAnalysis/DiagnosticAnalyzer/AnalyzerDriver.cs,799. Probably something to consider here?
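The bounded worker-queue pattern described above can be sketched roughly as follows. This is an illustrative toy with hypothetical names (`ProcessAllAsync`), not the actual AnalyzerDriver code: a fixed number of worker tasks drain a shared queue, instead of scheduling one unit of parallel work per item.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class BoundedWorkerQueue
{
    // Drain a shared queue with a fixed number of worker tasks instead of
    // creating one parallel work item per element.
    static async Task ProcessAllAsync<T>(ConcurrentQueue<T> queue, Action<T> process, int maxWorkers)
    {
        var workers = Enumerable.Range(0, maxWorkers)
            .Select(_ => Task.Run(() =>
            {
                // Each worker pulls items until the queue is empty.
                while (queue.TryDequeue(out var item))
                {
                    process(item);
                }
            }))
            .ToArray();
        await Task.WhenAll(workers);
    }

    static async Task Main()
    {
        var queue = new ConcurrentQueue<int>(Enumerable.Range(1, 1000));
        long sum = 0;
        await ProcessAllAsync(queue, i => Interlocked.Add(ref sum, i),
                              maxWorkers: Environment.ProcessorCount);
        Console.WriteLine(sum); // 500500
    }
}
```

The key property is that the number of allocated tasks is bounded by the worker count rather than by the number of items.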

Contributor Author

Should we consider using a hybrid approach

I did investigate an approach where the VisitNamespace method was not changed but VisitNamedType was. The result showed no improvement over baseline. I stopped looking for hybrid approaches because I couldn't find a scenario that was worse than the one I have here.

Contributor Author

➡️ Updated the Visual Basic version of the same item.

Contributor

So now we are completely removing the parallelism in this CLS compliance visitor? Seems fine to me as long as there is no build time regression, but probably someone from @dotnet/roslyn-compiler should take a look.

Member

Perhaps we should prioritize this: #3679

Contributor Author

@tmat I left a comment on that issue, suggesting that other scenarios would benefit from such an approach.

@heejaechang

heejaechang commented Dec 4, 2017

I am not sure whether the issue is allocation or tasks being starved, but I believe the caller of this method is probably already running in parallel (when the concurrent build option is on), so there is probably no reason for its nested work to be parallel again. Parallel.ForEach is especially problematic since it is a blocking API: if another parallel operation is already running, it may not find a thread to run some of its work and can block for quite a bit of time waiting for a task thread to become available.

My experience making some of our features concurrent is that if you apply multiple levels of concurrency using the Parallel extensions, the outcome is slower rather than faster. Most of the time, applying concurrency only at the outermost level is better.
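The "parallelize only at the outermost level" guidance can be illustrated with a minimal sketch (hypothetical data and names, not Roslyn code): the outer loop is parallel, and the per-group inner work stays sequential, since the outer loop already saturates the available cores.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class OuterOnlyParallelism
{
    static long ProcessGroup(int[] group)
    {
        // Inner work is sequential on purpose; nesting another
        // Parallel.ForEach here would mostly add scheduling overhead
        // and contention rather than speedup.
        long sum = 0;
        foreach (var x in group) sum += x;
        return sum;
    }

    static void Main()
    {
        // 8 groups of 100 consecutive integers covering 0..799.
        int[][] groups = Enumerable.Range(0, 8)
            .Select(g => Enumerable.Range(g * 100, 100).ToArray())
            .ToArray();

        long total = 0;
        // Parallelism applied once, at the outermost level only.
        Parallel.ForEach(groups, group =>
            Interlocked.Add(ref total, ProcessGroup(group)));

        Console.WriteLine(total); // 319600
    }
}
```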

@sharwell sharwell force-pushed the optimize-clscompliant branch from 43e626f to 6537a7b Compare December 5, 2017 12:56
@sharwell sharwell requested a review from a team as a code owner December 5, 2017 12:56

sharwell commented Dec 5, 2017

@Pilchie for ask mode


sharwell commented Dec 5, 2017

@dotnet/roslyn-compiler I need reviews from you as well

/// </remarks>
[Conditional("EMIT_CODE_ANALYSIS_ATTRIBUTES")]
[AttributeUsage(AttributeTargets.Constructor | AttributeTargets.Method | AttributeTargets.Property | AttributeTargets.Field, AllowMultiple = true, Inherited = false)]
internal sealed class PerformanceSensitiveAttribute : Attribute
Member

Cool!


sharwell commented Dec 6, 2017

@dotnet/roslyn-compiler for reviews

/// </remarks>
[Conditional("EMIT_CODE_ANALYSIS_ATTRIBUTES")]
[AttributeUsage(AttributeTargets.Constructor | AttributeTargets.Method | AttributeTargets.Property | AttributeTargets.Field, AllowMultiple = true, Inherited = false)]
internal sealed class PerformanceSensitiveAttribute : Attribute
Member

PerformanceSensitiveAttribute

It looks like this file already exists in the repo at that same location. Is it just a case of concurrent PRs?

Contributor Author

➡️ It's the same commit. If we want to wait for a new set of builds I can tell the PR to update itself and this addition will disappear from the diff.

Member

Ok, that's fine as long as you merge and don't squash ;-)


jcouv commented Dec 8, 2017

There is a possibility for a performance regression for compilation-only scenarios with no analyzers involved, for very large projects on concurrent systems. However, during testing I was unable to find a solution for which the parallelization benefits outweighed the allocation overhead of the concurrent approach.

Thanks for the analysis.
Could you share more details about how Parallel.ForEach creates this overhead? I understand that it would cause more allocations (Tasks), but it is surprising that there would be that many allocations and that, on net, the runtime would be reduced by running serially.

@jcouv jcouv left a comment

LGTM (with question to understand why Parallel.ForEach has such negative impact here)

CheckMemberDistinctness(symbol);
}

if (_compilation.Options.ConcurrentBuild)
@AlekseyTs AlekseyTs Dec 8, 2017

I am not comfortable with removing this parallelization completely. Even if we didn't add it based on a specific scenario, a similar approach has definitely been proven to improve throughput in other similar cases (when we need to visit all namespaces and types recursively and do some analysis on them). Could we use some heuristic to disable parallelization only when it is likely to cause a problem? For example, only when we have a cancellable _cancellationToken and probably a non-null _filterTree. #Closed

@AlekseyTs AlekseyTs left a comment

I think we should find a way to keep parallelism for compilation only scenarios. See #23519 (comment) for some approaches to consider.


AlekseyTs commented Dec 8, 2017

Done with review pass (iteration 2). #Closed


sharwell commented Dec 8, 2017

Could you share more details about how the Parallel.ForEach creates those overheads.

For a reference run which is 1/4 of the size of the run described in #23582, here is some of the overhead caused by Parallel.ForEach:

  • ParallelForReplicaTask: 911MB
  • IndexRange[]: 258MB
  • System.Threading.Tasks.Shared<long>: 257MB
  • ContingentProperties: 151MB
  • Action<object>: 128MB
  • Action<Symbol>: 122MB
  • <>c__DisplayClass17_0<object>: 104MB
  • ParallelForReplicatingTask: 88MB
  • Action<int>: 76MB
  • <>c__DisplayClass31_0<Symbol, Object>: 64MB
  • Action: 63MB
  • CancellationCallbackInfo: 60MB
  • <>c__DisplayClass176_0: 54MB
  • ParallelOptions: 43MB
  • RangeManager: 40MB
  • <>c__DisplayClass6_0<Symbol>: 32MB
  • ParallelLoopStateFlags32: 23MB
  • SetOnInvokeMres: 2MB


sharwell commented Dec 11, 2017

Over the weekend I ran elapsed time evaluations to compare four approaches:

  1. Baseline
  2. _filterTree: Skip parallelization when _filterTree is not null
  3. Sequential: No parallelization
  4. _filterTree+NSOnly: Only parallelize across namespaces, and skip parallelization when _filterTree is not null

Since these are time measurements (more subject to noise than allocation measurements), I used /iter:10 instead of the normal /iter:4 to help stabilize the results. Here are the results, with all times in milliseconds:

| Analyzer | Baseline | _filterTree | Sequential | _filterTree+NSOnly |
| --- | --- | --- | --- | --- |
| CSharpAddAccessibilityModifiersDiagnosticAnalyzer | 1635 | 1930 | 1735 | 1888 |
| CSharpAddBracesDiagnosticAnalyzer | 9272 | 8368 | 8176 | 8165 |
| CSharpAsAndNullCheckDiagnosticAnalyzer | 25340 | 22453 | 19569 | 20745 |
| CSharpInlineDeclarationDiagnosticAnalyzer | 225978 | 171636 | 159502 | 168476 |
| CSharpIsAndCastCheckDiagnosticAnalyzer | 685 | 705 | 306 | 492 |
| CSharpNamingStyleDiagnosticAnalyzer | 120882 | 110507 | 90558 | 114473 |
| CSharpOrderModifiersDiagnosticAnalyzer | 2650 | 2879 | 3057 | 2994 |
| CSharpPreferFrameworkTypeDiagnosticAnalyzer | 36416 | 35503 | 29867 | 35437 |
| CSharpQualifyMemberAccessDiagnosticAnalyzer | 38703 | 33185 | 28413 | 33537 |
| CSharpRemoveUnnecessaryCastDiagnosticAnalyzer | 27258 | 25254 | 23712 | 24008 |
| CSharpRemoveUnnecessaryImportsDiagnosticAnalyzer | 194789 | 45457 | 35945 | 44266 |
| CSharpRemoveUnreachableCodeDiagnosticAnalyzer | 178197 | 38627 | 29539 | 39477 |
| CSharpSimplifyTypeNamesDiagnosticAnalyzer | 2036460 | 1871650 | 1721344 | 1812623 |
| CSharpUnboundIdentifiersDiagnosticAnalyzer | 2800 | 2489 | 2554 | 2927 |
| CSharpUseCoalesceExpressionDiagnosticAnalyzer | 3020 | 2369 | 1919 | 2146 |
| CSharpUseCoalesceExpressionForNullableDiagnosticAnalyzer | 3244 | 2953 | 2525 | 3232 |
| CSharpUseCollectionInitializerDiagnosticAnalyzer | 45378 | 40457 | 38683 | 42377 |
| CSharpUseDeconstructionDiagnosticAnalyzer | 35481 | 33847 | 30305 | 34423 |
| CSharpUseDefaultLiteralDiagnosticAnalyzer | 4423 | 3960 | 4042 | 3993 |
| CSharpUseExplicitTypeDiagnosticAnalyzer | 97618 | 91163 | 87976 | 88354 |
| CSharpUseImplicitTypeDiagnosticAnalyzer | 20463 | 18624 | 16858 | 17985 |
| CSharpUseInferredMemberNameDiagnosticAnalyzer | 4727 | 3695 | 3982 | 3885 |
| CSharpUseIsNullCheckDiagnosticAnalyzer | 66877 | 64729 | 58624 | 67451 |
| CSharpUseLocalFunctionDiagnosticAnalyzer | 10653 | 9208 | 7224 | 9177 |
| CSharpUseNullPropagationDiagnosticAnalyzer | 5658 | 6074 | 5488 | 5726 |
| CSharpUseObjectInitializerDiagnosticAnalyzer | 10848 | 10756 | 9884 | 9812 |
| CSharpUseThrowExpressionDiagnosticAnalyzer | 23309 | 21823 | 20907 | 20100 |
| CSharpValidateFormatStringDiagnosticAnalyzer | 20877 | 20127 | 19458 | 19776 |
| InvokeDelegateWithConditionalAccessAnalyzer | 11093 | 9241 | 8540 | 9133 |
| UseExpressionBodyDiagnosticAnalyzer | 50178 | 51688 | 40819 | 48971 |
| Total | 3314912 | 2761357 | 2511511 | 2696049 |

The wall-clock elapsed time numbers lead to a similar, though less pronounced, conclusion:

| Baseline | _filterTree | Sequential | _filterTree+NSOnly |
| --- | --- | --- | --- |
| 233517 | 223185 | 216334 | 218797 |

@heejaechang

@sharwell can you be a little more specific about "Sequential: No parallelization"?

Is it no parallelization either between analyzers or within an analyzer — for example, running one analyzer at a time and running all of that analyzer's actions sequentially?

Or are you saying the analyzers run concurrently, but each analyzer runs sequentially?

Or that the analyzers run sequentially, but each analyzer's actions run concurrently?

Or are you talking specifically about this ClsComplianceChecker being sequential or parallel?

@sharwell

@heejaechang Sequential is the current state of the pull request.


mavasani commented Dec 11, 2017

or are you talking specifically about this ClsComplianceChecker being sequential or parallel?

I also feel that we need the before-and-after results for a batch compilation with the CLS compliance checker but without any analyzers. I presume this is the case the compiler team cares about most.

@sharwell sharwell force-pushed the optimize-clscompliant branch from 6537a7b to b99d0b3 Compare January 31, 2018 16:45
@sharwell sharwell requested a review from a team as a code owner January 31, 2018 16:45
@sharwell sharwell changed the base branch from master to dev15.6.x January 31, 2018 16:50
@sharwell

@AlekseyTs @VSadov Please review the updated change. I made the following changes:

  • Re-enable parallelization across types
  • Disable parallelization when information is requested only for a single syntax tree
  • Use the lightweight fork-join approach from MethodCompiler instead of using Parallel.ForEach

The resulting numbers are similar to the _filterTree+NSOnly numbers, but with lower overall allocations.
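As a rough illustration of the fork-join shape referenced above (the real implementation lives in MethodCompiler and the updated ClsComplianceChecker; `VisitAll` and its parameters here are hypothetical): tasks are pushed onto a collection as they are forked and joined once with Task.WaitAll at the end, avoiding Parallel.ForEach's replica-task and range-partitioning machinery.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ForkJoinSketch
{
    // Fork one task per top-level item, remember it, and join at the end.
    // Each item costs a single Task, with no ParallelForReplicaTask,
    // IndexRange, or loop-state allocations.
    static void VisitAll(int[] items, Action<int> visit, bool concurrent)
    {
        if (!concurrent)
        {
            // Sequential path, e.g. when ConcurrentBuild is off.
            foreach (var item in items) visit(item);
            return;
        }

        var pending = new ConcurrentStack<Task>();
        foreach (var item in items)
        {
            var captured = item;
            pending.Push(Task.Run(() => visit(captured)));
        }
        Task.WaitAll(pending.ToArray());
    }

    static void Main()
    {
        long total = 0;
        VisitAll(Enumerable.Range(1, 100).ToArray(),
                 i => Interlocked.Add(ref total, i),
                 concurrent: true);
        Console.WriteLine(total); // 5050
    }
}
```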

@VSadov

VSadov commented Jan 31, 2018

It is likely that the last change improves the batch compiler perf as well. Parallel.ForEach was a bit too heavy a tool for the job there.

I am not sure by how much – CLS checks do not take a lot of E2E time, as long as they run concurrently, and from the look of the change that behavior is preserved while implementation is now lighter. It might be interesting to run compiler benchmark – just to see if we got any wins on E2E time. Probably not much if at all, but there should not be any regressions either.

The new change does not make me worried.

@VSadov VSadov left a comment

LGTM

Next
End If

For Each m In symbol.Modules
Contributor Author

❗️ This needs to be updated to match the update in the C# compiler.

{
var queue = new ConcurrentQueue<Diagnostic>();
var checker = new ClsComplianceChecker(compilation, filterTree, filterSpanWithinTree, queue, cancellationToken);
if (compilation.Options.ConcurrentBuild)
Contributor

Why not check the option inside the constructor of ClsComplianceChecker? It seems okay to create the stack in the ctor as a readonly field.

Contributor Author

I was following the pattern of MethodCompiler. I see that the VB implementation of the same feature initializes the collection in the constructor. I'll move it there for the C# one.


_declaredOrInheritedCompliance = new ConcurrentDictionary<Symbol, Compliance>();

if (compilation.Options.ConcurrentBuild)
Contributor

Consider also checking if _filterTree is null.

Contributor Author

➡️ I thought about it, but figured it risked regression if the conditions ever changed. The one thing we can count on is that it's only needed during concurrent builds.

Contributor Author

📝 If this is important to you, let me know and I'll extract the condition to a property.

Contributor

let me know and I'll extract the condition to a property

This feels like the right thing to do. Why allocate when we don't need to?



@AlekseyTs AlekseyTs dismissed their stale review February 1, 2018 18:41

Obsolete

@AlekseyTs AlekseyTs left a comment

LGTM (iteration 3)

@jinujoseph

Approved to merge. Merge pending validation + RPS verification.


sharwell commented Feb 2, 2018

Validation builds passed too 👍

@sharwell sharwell merged commit 644cb4f into dotnet:dev15.6.x Feb 2, 2018
@sharwell sharwell deleted the optimize-clscompliant branch February 2, 2018 18:04