Skip to content

Cache identical source generator results#50171

Closed
sharwell wants to merge 3 commits intodotnet:masterfrom
sharwell:cache-sg
Closed

Cache identical source generator results#50171
sharwell wants to merge 3 commits intodotnet:masterfrom
sharwell:cache-sg

Conversation

@sharwell
Copy link
Copy Markdown
Contributor

@sharwell sharwell commented Dec 30, 2020

Avoid reparsing a SourceText produced by a source generator when it uses the same ParseOptions and hint name.

This change automatically optimizes IDE scenarios for source generators that cache their results, e.g. the form in #50172.

For the source generator in Roslyn.sln, this change addresses ~65% of the overhead for repeated invocations of the source generator.

@sharwell sharwell marked this pull request as ready for review December 30, 2020 17:25
s_parsedGeneratedSources.Add(input.Text, tree)
#End If

Return tree
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit. but it seems like we could have an abstract base class for these that has the logic once.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the logic to the base class.

/// </remarks>
public abstract class GeneratorDriver
{
private static readonly ConditionalWeakTable<SourceText, SyntaxTree> s_parsedGeneratedSources = new();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider that recently we evolved the AnalyzerDriver from a model where we implemented caching via ConditionalWeakTable to an explicit provider model. Why are we choosing to use ConditionalWeakTable here instead of an explicit provider? The latter seems to be the direction we want to go to give more control to the host (IDE or compiler) on how caching should be implemented.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of the other work (for reference it was completed in #46508). I used a SyntaxTreeProvider, but it's not clear to me how the IDE would use a different path from compilation considering these are internal APIs.

{
if (s_parsedGeneratedSources.TryGetValue(input.Text, out var existingTree)
&& Equals(_state.ParseOptions, existingTree.Options)
&& Equals(fileName, existingTree.FilePath))
Copy link
Copy Markdown
Member

@jaredpar jaredpar Jan 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use an explicit named comparer when comparing string values that represent file paths.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➡️ This is now fixed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we explicitly clear this cache/CWT? For semantic model provider used by the analyzer driver, we have:

internal void ClearCache(SyntaxTree tree, Compilation compilation)
{
if (_providerCache.TryGetValue(compilation, out var provider))
{
provider.ClearCachedSemanticModel(tree);
}
}
internal void ClearCache(Compilation compilation)
{
_providerCache.Remove(compilation);
}

There is a clear lifetime when all compilation events have finished processing for each compilation unit, and so we drop the cached semantic model, which provides much stronger control over the cache size. Unless we can do something similar for this caching provider, this seems no better then just using a CWT directly in the generator driver.

Copy link
Copy Markdown
Contributor Author

@sharwell sharwell Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we explicitly clear this cache/CWT?

We don't, which is the basis for this cache working.

... this seems no better then just using a CWT directly in the generator driver.

Functionally, it's exactly the same right now. A difference would only apply if we disabled the cache altogether for command line build scenarios (which doesn't seem to matter but could still be allowed).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how a provider based approach helps for your scenario then. There are dual purposes of semantic model provider being hooked up to compilation for analyzer execution - one is that the public API Compilation.GetSemanticModel shares semantic models (so all analyzers share models), other that the analyzer driver itself can clear the cache when events for each tree are fully processed. If there is just a single client for the cache and no way to clear cache (which seems to be your scenario), then using a CWT directly inside generator driver seems better than having this intermediate provider.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how a provider based approach helps for your scenario then

It addresses the blocking change request in #50171 (comment). I found the result to be a bit less readable than before but hopefully we can use it as a step towards a conclusion.

Copy link
Copy Markdown
Contributor

@mavasani mavasani Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stating more empirically, #46508 that introduced semantic model provider showed a clear performance win for various analyzer execution benchmarks. I believe any approach taken by this PR needs to demonstrate similar performance wins to justify it over the current checked in implementation or any alternate approaches.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It addresses the blocking change request in #50171 (comment).

I believe @jaredpar's request was not purely in terms of changing just the implementation to use a provider, but actually mimicing the stronger lifetime control over the cache that is done is analyzer driver's semantic model provider, which optimized cache size and gave real performance improvements over a purely CWT based approach. Your change seems to just introduce an artificial provider, which uses a CWT under the covers but without any of the above benefits, so I am not sure if addresses the core concern.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the issue is about policy and who is controlling it.

From the perspective of batch compilation this cache isn't really needed. The addition of the cache is being driven by IDE needs. The IDE is in the best place to dictate what the lifetime of this cache is, when it should be evicted, etc ...

Pretty much every time we've put a CWT in the compiler layer to do a cache for IDE purposes we seem to regret it. Instead it seems better to structure it such that the IDE can be in control and make better decisions about their caches. That is why I had my recommendation about looking at the provider approach that we've taken elsewhere in the code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the stronger lifetime control over the cache that is done is analyzer driver's semantic model provider

I'm not sure that scenario is relevant to this change. Semantic models never need to carry forward from one Compilation to another, while the entire value of this pull request is allowing SyntaxTree implementations to carry forward. The only good time to remove an item from the cache is when the SG is no longer going to provide the same SourceText instance.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To restate: this is an IDE cache we're adding into the compiler layer. The IDE has no direct, or even indirect, control over the lifetime or semantics of it. From a conceptual level that feels wrong. All the bugs here (cache too big, elements live too long, etc ...) will be IDE bugs and we'll end up fixing everything in the compiler.

That is why my mind went back to the semantic model provider solution. It's putting the control, the policy, etc ... for an IDE cache in the hands of the IDE team.

I'm not 100% set on this on a path forward but my feelings around compiler dictating IDE caches keep coming up.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the plan now is we use this simple approach to mitigate the performance problems in 16.9, and then work to replace it with #50490 for 16.10.

@sharwell
Copy link
Copy Markdown
Contributor Author

Closing this since I don't want to take it through QB for 16.9

@sharwell sharwell closed this Jan 29, 2021
@sharwell sharwell deleted the cache-sg branch January 29, 2021 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants