Cache identical source generator results by sharwell · Pull Request #50171 · dotnet/roslyn

sharwell · 2020-12-30T17:20:16Z

Avoid reparsing a SourceText produced by a source generator when it uses the same ParseOptions and hint name.

This change automatically optimizes IDE scenarios for source generators that cache their results, e.g. the form in #50172.

For the source generator in Roslyn.sln, this change addresses ~65% of the overhead for repeated invocations of the source generator.
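
As a rough illustration of the idea (not the code in this PR), the sketch below caches a parsed tree per source text instance in a ConditionalWeakTable and reuses it only when the parse options and file path also match. `FakeSourceText`, `FakeParseOptions`, `FakeSyntaxTree`, `ParsedSourceCache` and `ParseCount` are hypothetical stand-ins rather than Roslyn types, and the ordinal path comparison is an assumption.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-ins for SourceText, ParseOptions and SyntaxTree.
// They are NOT the Roslyn types, only enough surface to show the lookup.
public sealed class FakeSourceText
{
    public FakeSourceText(string content) => Content = content;
    public string Content { get; }
}

public sealed record FakeParseOptions(string LanguageVersion);

public sealed class FakeSyntaxTree
{
    public FakeSyntaxTree(FakeSourceText text, FakeParseOptions options, string filePath)
    {
        Text = text;
        Options = options;
        FilePath = filePath;
    }

    public FakeSourceText Text { get; }
    public FakeParseOptions Options { get; }
    public string FilePath { get; }
}

public static class ParsedSourceCache
{
    // Keys are compared by reference and held weakly, so an entry can go
    // away once nothing else references the text instance.
    private static readonly ConditionalWeakTable<FakeSourceText, FakeSyntaxTree> s_cache = new();

    // Counts simulated parses so the effect of the cache is observable.
    public static int ParseCount { get; private set; }

    public static FakeSyntaxTree GetOrParse(FakeSourceText text, FakeParseOptions options, string filePath)
    {
        // Reuse the tree only if the same text instance was parsed before
        // with equal options and the same path (ordinal is an assumption).
        if (s_cache.TryGetValue(text, out var existing)
            && object.Equals(options, existing.Options)
            && StringComparer.Ordinal.Equals(filePath, existing.FilePath))
        {
            return existing;
        }

        ParseCount++; // stands in for the real (expensive) parse
        var tree = new FakeSyntaxTree(text, options, filePath);
        s_cache.Remove(text); // drop a stale entry for this text, if any
        s_cache.Add(text, tree);
        return tree;
    }
}
```

A generator that returns the same text instance on repeated runs would hit the cache; a generator that allocates a new text each time would not, which is why the benefit depends on generators caching their own results.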

CyrusNajmabadi · 2020-12-30T21:01:35Z

+            s_parsedGeneratedSources.Add(input.Text, tree)
+#End If
+
+            Return tree


Minor nit, but it seems like we could have an abstract base class for these that holds the logic once.

I moved the logic to the base class.

jaredpar · 2021-01-04T15:46:14Z

    /// </remarks>
    public abstract class GeneratorDriver
    {
+        private static readonly ConditionalWeakTable<SourceText, SyntaxTree> s_parsedGeneratedSources = new();


Consider that recently we evolved the AnalyzerDriver from a model where we implemented caching via ConditionalWeakTable to an explicit provider model. Why are we choosing to use ConditionalWeakTable here instead of an explicit provider? The latter seems to be the direction we want to go to give more control to the host (IDE or compiler) on how caching should be implemented.

I wasn't aware of the other work (for reference it was completed in #46508). I used a SyntaxTreeProvider, but it's not clear to me how the IDE would use a different path from compilation considering these are internal APIs.

jaredpar · 2021-01-04T15:46:54Z

+        {
+            if (s_parsedGeneratedSources.TryGetValue(input.Text, out var existingTree)
+                && Equals(_state.ParseOptions, existingTree.Options)
+                && Equals(fileName, existingTree.FilePath))


Please use an explicit named comparer when comparing string values that represent file paths.

➡️ This is now fixed.
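
For readers unfamiliar with the request, a minimal sketch of what a named comparer for paths can look like follows. `GeneratedPathComparison`, `PathComparer` and `SamePath` are hypothetical names, and the choice of `StringComparer.Ordinal` is an assumption; the thread does not say which comparer the fix uses.

```csharp
using System;

public static class GeneratedPathComparison
{
    // Assumption: ordinal, case-sensitive comparison. Naming the comparer in
    // one place makes the policy explicit and easy to change.
    public static readonly StringComparer PathComparer = StringComparer.Ordinal;

    // Compares two file paths using the named comparer; nulls are handled
    // by StringComparer.Equals.
    public static bool SamePath(string left, string right)
        => PathComparer.Equals(left, right);
}
```

Compared with a bare `Equals(a, b)`, this makes the case-sensitivity decision visible at the call site.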

mavasani · 2021-01-14T17:05:33Z

Where do we explicitly clear this cache/CWT? For semantic model provider used by the analyzer driver, we have:

roslyn/src/Compilers/Core/Portable/DiagnosticAnalyzer/CachingSemanticModelProvider.cs

Lines 37 to 48 in 6352ca4

    internal void ClearCache(SyntaxTree tree, Compilation compilation)
    {
        if (_providerCache.TryGetValue(compilation, out var provider))
        {
            provider.ClearCachedSemanticModel(tree);
        }
    }

    internal void ClearCache(Compilation compilation)
    {
        _providerCache.Remove(compilation);
    }

There is a clear lifetime when all compilation events have finished processing for each compilation unit, and so we drop the cached semantic model, which provides much stronger control over the cache size. Unless we can do something similar for this caching provider, this seems no better than just using a CWT directly in the generator driver.

Where do we explicitly clear this cache/CWT?

We don't, which is the basis for this cache working.

... this seems no better than just using a CWT directly in the generator driver.

Functionally, it's exactly the same right now. A difference would only apply if we disabled the cache altogether for command line build scenarios (which doesn't seem to matter but could still be allowed).

I don't see how a provider-based approach helps for your scenario then. Hooking the semantic model provider up to the compilation for analyzer execution serves two purposes: first, the public API Compilation.GetSemanticModel shares semantic models (so all analyzers share models); second, the analyzer driver itself can clear the cache when the events for each tree are fully processed. If there is just a single client for the cache and no way to clear it (which seems to be your scenario), then using a CWT directly inside the generator driver seems better than having this intermediate provider.

I don't see how a provider based approach helps for your scenario then

It addresses the blocking change request in #50171 (comment). I found the result to be a bit less readable than before but hopefully we can use it as a step towards a conclusion.

Stated more empirically: #46508, which introduced the semantic model provider, showed a clear performance win in various analyzer execution benchmarks. I believe any approach taken by this PR needs to demonstrate similar performance wins to justify it over the currently checked-in implementation or any alternative approach.

It addresses the blocking change request in #50171 (comment).

I believe @jaredpar's request was not purely about changing the implementation to use a provider, but about actually mimicking the stronger lifetime control over the cache that the analyzer driver's semantic model provider has, which optimized cache size and gave real performance improvements over a purely CWT-based approach. Your change seems to just introduce an artificial provider, which uses a CWT under the covers but without any of the above benefits, so I am not sure it addresses the core concern.

For me the issue is about policy and who is controlling it.

From the perspective of batch compilation this cache isn't really needed. The addition of the cache is being driven by IDE needs. The IDE is in the best place to dictate what the lifetime of this cache is, when it should be evicted, etc ...

Pretty much every time we've put a CWT in the compiler layer to do a cache for IDE purposes we seem to regret it. Instead it seems better to structure it such that the IDE can be in control and make better decisions about their caches. That is why I had my recommendation about looking at the provider approach that we've taken elsewhere in the code.

the stronger lifetime control over the cache that is done in the analyzer driver's semantic model provider

I'm not sure that scenario is relevant to this change. Semantic models never need to carry forward from one Compilation to another, while the entire value of this pull request is allowing SyntaxTree implementations to carry forward. The only good time to remove an item from the cache is when the SG is no longer going to provide the same SourceText instance.

To restate: this is an IDE cache we're adding into the compiler layer. The IDE has no direct, or even indirect, control over the lifetime or semantics of it. From a conceptual level that feels wrong. All the bugs here (cache too big, elements live too long, etc ...) will be IDE bugs and we'll end up fixing everything in the compiler.

That is why my mind went back to the semantic model provider solution. It's putting the control, the policy, etc ... for an IDE cache in the hands of the IDE team.

I'm not 100% set on a path forward, but my concerns about the compiler dictating IDE caches keep coming up.
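
To make the policy argument concrete, here is a minimal sketch of a provider shape in which the host decides whether and how long to cache. Every name here (`ParsedTreeProvider`, `NoCachingProvider`, `HostControlledProvider`, `Evict`, `Clear`) is hypothetical; this is not the API from #46508 or from this PR, and thread safety is deliberately left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction: the compiler layer asks the provider for a tree
// and does not know or care whether anything is cached.
public abstract class ParsedTreeProvider<TText, TTree> where TText : class
{
    public abstract TTree GetOrCreate(TText text, Func<TText, TTree> parse);
}

// What a batch (command line) host might plug in: always parse, never cache.
public sealed class NoCachingProvider<TText, TTree> : ParsedTreeProvider<TText, TTree>
    where TText : class
{
    public override TTree GetOrCreate(TText text, Func<TText, TTree> parse) => parse(text);
}

// What an IDE host might plug in: it owns the cache and decides when entries
// are evicted. Not thread-safe; a real host would need synchronization.
public sealed class HostControlledProvider<TText, TTree> : ParsedTreeProvider<TText, TTree>
    where TText : class
{
    // Uses the default comparer, which is reference equality for key types
    // that do not override Equals.
    private readonly Dictionary<TText, TTree> _cache = new Dictionary<TText, TTree>();

    public int Count => _cache.Count;

    public override TTree GetOrCreate(TText text, Func<TText, TTree> parse)
    {
        if (!_cache.TryGetValue(text, out var tree))
        {
            tree = parse(text);
            _cache[text] = tree;
        }

        return tree;
    }

    // Eviction policy lives with the host, not the compiler.
    public bool Evict(TText text) => _cache.Remove(text);

    public void Clear() => _cache.Clear();
}
```

The trade-off discussed in the thread shows up here: the strong dictionary keeps keys alive until the host evicts them, whereas a ConditionalWeakTable releases entries on its own but gives the host no say.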

My understanding of the plan now is we use this simple approach to mitigate the performance problems in 16.9, and then work to replace it with #50490 for 16.10.

sharwell · 2021-01-29T16:06:49Z

Closing this since I don't want to take it through QB for 16.9

sharwell requested a review from a team as a code owner December 30, 2020 17:20

Dotnet-GitSync-Bot added the Area-Compilers label Dec 30, 2020

sharwell marked this pull request as draft December 30, 2020 17:22

sharwell mentioned this pull request Dec 30, 2020

Avoid recomputing generated sources for the same input #50172

Merged

Avoid reparsing the same generated source

7fd7fc9

sharwell force-pushed the cache-sg branch from ce4b14c to 7fd7fc9 Compare December 30, 2020 17:24

sharwell marked this pull request as ready for review December 30, 2020 17:25

CyrusNajmabadi reviewed Dec 30, 2020

CyrusNajmabadi approved these changes Dec 30, 2020

Move parsed tree caching to GeneratorDriver

c826253

jaredpar requested changes Jan 4, 2021

sharwell force-pushed the cache-sg branch from 319d817 to 8661957 Compare January 14, 2021 02:13

mavasani reviewed Jan 14, 2021

Fix handling of file paths in GeneratorDriver caching

be60917

sharwell force-pushed the cache-sg branch from 8661957 to be60917 Compare January 14, 2021 23:00

sharwell mentioned this pull request Jan 14, 2021

Replace SyntaxTree caching with IDE document state preservation #50490

Merged

sharwell closed this Jan 29, 2021

sharwell deleted the cache-sg branch January 29, 2021 16:06
