Remove the use of Parallel.ForEach in ClsComplianceChecker#23519
Remove the use of Parallel.ForEach in ClsComplianceChecker#23519sharwell merged 4 commits intodotnet:dev15.6.xfrom
Conversation
There was a problem hiding this comment.
Should we consider using a hybrid approach by changing the original check to:
if (_compilation.Options.ConcurrentBuild && symbol.IsGlobalNamespace)?
There was a problem hiding this comment.
We also need matching changes in VB: http://source.roslyn.io/#Microsoft.CodeAnalysis.VisualBasic/Compilation/ClsComplianceChecker.vb,97
There was a problem hiding this comment.
I also noted that we are kicking off parallel foreach for every single named type: http://source.roslyn.io/#Microsoft.CodeAnalysis.CSharp/Compiler/ClsComplianceChecker.cs,207. I think that is compounding the allocations.
We ran into similar issues when trying to make the analyzer driver concurrently process compilation events. Attempting to process all events concurrently caused huge allocation and perf hit, and we saw optimal performance with when using a worker queue limited by a certain max worker count: http://source.roslyn.io/#Microsoft.CodeAnalysis/DiagnosticAnalyzer/AnalyzerDriver.cs,799. Probably something to consider here?
There was a problem hiding this comment.
Should we consider using a hybrid approach
I did investigate an approach where the VisitNamespace method was not changed but VisitNamedType was. The result showed no improvement over baseline. I stopped looking for hybrid approaches because I couldn't find a scenario that was worse than the one I have here.
There was a problem hiding this comment.
➡️ Updated the Visual Basic version of the same item.
There was a problem hiding this comment.
So now we are completely removing the parallelism in this CLS compliance visitor? Seems fine to me as long as there is no build time regression, but probably someone from @dotnet/roslyn-compiler should take a look.
There was a problem hiding this comment.
@tmat I left a comment on that issue, suggesting that other scenarios would benefit from such an approach.
|
I am not sure whether the issue is allocation or task being starved. but I believe caller of this method is probably already running in parallel (when concurrent build option is on) so, probably no reason for its nested works to be parallel again. especially using Parallel.ForEach since it is blocking API and it can take quite a bit of time if there is another parallel running since it can't find task to run some of its work. block waiting task thread to available. my experience of making some of our feature concurrent is that, if one do multiple level of concurrent using Parallel extension. outcome is slower rather than faster. most of time, doing concurrent from outer most is better. |
43e626f to
6537a7b
Compare
|
@Pilchie for ask mode |
|
@dotnet/roslyn-compiler I need reviews from you as well |
| /// </remarks> | ||
| [Conditional("EMIT_CODE_ANALYSIS_ATTRIBUTES")] | ||
| [AttributeUsage(AttributeTargets.Constructor | AttributeTargets.Method | AttributeTargets.Property | AttributeTargets.Field, AllowMultiple = true, Inherited = false)] | ||
| internal sealed class PerformanceSensitiveAttribute : Attribute |
|
@dotnet/roslyn-compiler for reviews |
| /// </remarks> | ||
| [Conditional("EMIT_CODE_ANALYSIS_ATTRIBUTES")] | ||
| [AttributeUsage(AttributeTargets.Constructor | AttributeTargets.Method | AttributeTargets.Property | AttributeTargets.Field, AllowMultiple = true, Inherited = false)] | ||
| internal sealed class PerformanceSensitiveAttribute : Attribute |
There was a problem hiding this comment.
PerformanceSensitiveAttribute [](start = 26, length = 29)
It looks like this file already exists in the repo at that same location. Is it just a case of concurrent PRs?
There was a problem hiding this comment.
➡️ It's the same commit. If we want to wait for a new set of builds I can tell the PR to update itself and this addition will disappear from the diff.
There was a problem hiding this comment.
Ok, that's fine as long as you merge and don't squash ;-)
Thanks for the analysis. |
jcouv
left a comment
There was a problem hiding this comment.
LGTM (with question to understand why Parallel.ForEach has such negative impact here)
| CheckMemberDistinctness(symbol); | ||
| } | ||
|
|
||
| if (_compilation.Options.ConcurrentBuild) |
There was a problem hiding this comment.
if (_compilation.Options.ConcurrentBuild) [](start = 12, length = 41)
I am not comfortable with removing this parallelization completely. Even if we didn't add it based on specific scenario, similar approach was definitely proven to improve throughput for other similar cases (when we need to visit all namespaces and types recursively and do some analysis for them). Could we use some heuristic to disable parallelization only when it is likely cause a problem? For example, only when we have cancellable _cancellationToken and probably non-null _filterTree. #Closed
AlekseyTs
left a comment
There was a problem hiding this comment.
I think we should find a way to keep parallelism for compilation only scenarios. See #23519 (comment) for some approaches to consider.
|
Done with review pass (iteration 2). #Closed |
For a reference run which is 1/4 of the size of the run described in #23582, here is some of the overhead caused by
|
|
Over the weekend I ran elapsed time evaluations to compare four approaches:
Since these are time measurements (more subject to noise than allocation measurements), I used
The wall-clock elapsed time numbers lead to a similar, though less pronounced, conclusion:
|
|
@sharwell can you be little bit more specific on "Sequential: No parallelization". is it no parallelization between analyzers and for a analyzer? for example, running 1 analyzer at a time and run all actions sequentially for 1 analyzer? or are you talking about analyzers are running concurrently, but each analyzer run sequentially? or are you talking about analyzer runs sequentially but each analyzer actions run concurrently? or are you talking about specifically this ClsCompilanceChecker being sequential or parallel? |
|
@heejaechang Sequential is the current state of the pull request. |
I also feel that we need to get the before and after results in batch compilation with CLS compliance checker but without any analyzers. I presume this is the case that compiler team cares the most about. |
6537a7b to
b99d0b3
Compare
|
@AlekseyTs @VSadov Please review the updated change. I made the following changes:
The resulting numbers are similar to the _filterTree+NSOnly numbers, but with lower overall allocations. |
|
It is likely that the last change improves the batch compiler perf as well. ParallelFor was a bit too heavy tool for the job there. I am not sure by how much – CLS checks do not take a lot of E2E time, as long as they run concurrently, and from the look of the change that behavior is preserved while implementation is now lighter. It might be interesting to run compiler benchmark – just to see if we got any wins on E2E time. Probably not much if at all, but there should not be any regressions either. The new change does not make me worried. |
| Next | ||
| End If | ||
|
|
||
| For Each m In symbol.Modules |
There was a problem hiding this comment.
❗️ This needs to be updated to match the update in the C# compiler.
| { | ||
| var queue = new ConcurrentQueue<Diagnostic>(); | ||
| var checker = new ClsComplianceChecker(compilation, filterTree, filterSpanWithinTree, queue, cancellationToken); | ||
| if (compilation.Options.ConcurrentBuild) |
There was a problem hiding this comment.
why not check the option in side of constructor of ClsCompilanceChecker? seems it is okay to create the stack in the ctor with readonly?
There was a problem hiding this comment.
I was following the pattern of MethodCompiler. I see that the VB implementation of the same feature initializes the collection in the constructor. I'll move it there for the C# one.
…nefits from C# compiler
|
|
||
| _declaredOrInheritedCompliance = new ConcurrentDictionary<Symbol, Compliance>(); | ||
|
|
||
| if (compilation.Options.ConcurrentBuild) |
There was a problem hiding this comment.
compilation.Options.ConcurrentBuild [](start = 16, length = 35)
Consider also checking if _filterTree is null.
There was a problem hiding this comment.
➡️ I thought about it, but figured it risked regression if the conditions ever changed. The one thing we can count on is it's only needed during concurrent builds.
There was a problem hiding this comment.
📝 If this is important to you, let me know and I'll extract the condition to a property.
There was a problem hiding this comment.
let me know and I'll extract the condition to a property
This feels like the right thing to do. Why allocate when we don't need to?
In reply to: 165447222 [](ancestors = 165447222)
|
Approved to merge via link |
|
Validation builds passed too 👍 |
Customer scenario
Running analyzer during a build is slower than it should be, with the analyzer driver contributing substantial overhead even when the analyzers themselves are lightweight.
Bugs this fixes
Fixes #23459
Workarounds, if any
None needed
Risk
Low.
Performance impact
15-20% reduction in allocations for running IDE analyzers. The benefits extend to other analyzers that call GetDiagnostics, with a 30-70% reduction in execution time for analyzers that depend on compiler diagnostics.
Is this a regression from a previous update?
No.
Root cause analysis
AnalyzerRunner is a new tool for helping us test analyzer performance in isolation.
How was the bug found?
AnalyzerRunner.
Test documentation updated?
No.