Cache identical source generator results#50171
Cache identical source generator results#50171sharwell wants to merge 3 commits intodotnet:masterfrom
Conversation
| s_parsedGeneratedSources.Add(input.Text, tree) | ||
| #End If | ||
|
|
||
| Return tree |
There was a problem hiding this comment.
minor nit. but it seems like we could have an abstract base class for these that has the logic once.
There was a problem hiding this comment.
I moved the logic to the base class.
| /// </remarks> | ||
| public abstract class GeneratorDriver | ||
| { | ||
| private static readonly ConditionalWeakTable<SourceText, SyntaxTree> s_parsedGeneratedSources = new(); |
There was a problem hiding this comment.
Consider that recently we evolved the AnalyzerDriver from a model where we implemented caching via ConditionalWeakTable to an explicit provider model. Why are we choosing to use ConditionalWeakTable here instead of an explicit provider? The latter seems to be the direction we want to go to give more control to the host (IDE or compiler) on how caching should be implemented.
There was a problem hiding this comment.
I wasn't aware of the other work (for reference it was completed in #46508). I used a SyntaxTreeProvider, but it's not clear to me how the IDE would use a different path from compilation considering these are internal APIs.
| { | ||
| if (s_parsedGeneratedSources.TryGetValue(input.Text, out var existingTree) | ||
| && Equals(_state.ParseOptions, existingTree.Options) | ||
| && Equals(fileName, existingTree.FilePath)) |
There was a problem hiding this comment.
Please use an explicit named comparer when comparing string values that represent file paths.
There was a problem hiding this comment.
➡️ This is now fixed.
There was a problem hiding this comment.
Where do we explicitly clear this cache/CWT? For semantic model provider used by the analyzer driver, we have:
There is a clear lifetime when all compilation events have finished processing for each compilation unit, and so we drop the cached semantic model, which provides much stronger control over the cache size. Unless we can do something similar for this caching provider, this seems no better then just using a CWT directly in the generator driver.
There was a problem hiding this comment.
Where do we explicitly clear this cache/CWT?
We don't, which is the basis for this cache working.
... this seems no better then just using a CWT directly in the generator driver.
Functionally, it's exactly the same right now. A difference would only apply if we disabled the cache altogether for command line build scenarios (which doesn't seem to matter but could still be allowed).
There was a problem hiding this comment.
I don't see how a provider based approach helps for your scenario then. There are dual purposes of semantic model provider being hooked up to compilation for analyzer execution - one is that the public API Compilation.GetSemanticModel shares semantic models (so all analyzers share models), other that the analyzer driver itself can clear the cache when events for each tree are fully processed. If there is just a single client for the cache and no way to clear cache (which seems to be your scenario), then using a CWT directly inside generator driver seems better than having this intermediate provider.
There was a problem hiding this comment.
I don't see how a provider based approach helps for your scenario then
It addresses the blocking change request in #50171 (comment). I found the result to be a bit less readable than before but hopefully we can use it as a step towards a conclusion.
There was a problem hiding this comment.
Stating more empirically, #46508 that introduced semantic model provider showed a clear performance win for various analyzer execution benchmarks. I believe any approach taken by this PR needs to demonstrate similar performance wins to justify it over the current checked in implementation or any alternate approaches.
There was a problem hiding this comment.
It addresses the blocking change request in #50171 (comment).
I believe @jaredpar's request was not purely in terms of changing just the implementation to use a provider, but actually mimicing the stronger lifetime control over the cache that is done is analyzer driver's semantic model provider, which optimized cache size and gave real performance improvements over a purely CWT based approach. Your change seems to just introduce an artificial provider, which uses a CWT under the covers but without any of the above benefits, so I am not sure if addresses the core concern.
There was a problem hiding this comment.
For me the issue is about policy and who is controlling it.
From the perspective of batch compilation this cache isn't really needed. The addition of the cache is being driven by IDE needs. The IDE is in the best place to dictate what the lifetime of this cache is, when it should be evicted, etc ...
Pretty much every time we've put a CWT in the compiler layer to do a cache for IDE purposes we seem to regret it. Instead it seems better to structure it such that the IDE can be in control and make better decisions about their caches. That is why I had my recommendation about looking at the provider approach that we've taken elsewhere in the code.
There was a problem hiding this comment.
the stronger lifetime control over the cache that is done is analyzer driver's semantic model provider
I'm not sure that scenario is relevant to this change. Semantic models never need to carry forward from one Compilation to another, while the entire value of this pull request is allowing SyntaxTree implementations to carry forward. The only good time to remove an item from the cache is when the SG is no longer going to provide the same SourceText instance.
There was a problem hiding this comment.
To restate: this is an IDE cache we're adding into the compiler layer. The IDE has no direct, or even indirect, control over the lifetime or semantics of it. From a conceptual level that feels wrong. All the bugs here (cache too big, elements live too long, etc ...) will be IDE bugs and we'll end up fixing everything in the compiler.
That is why my mind went back to the semantic model provider solution. It's putting the control, the policy, etc ... for an IDE cache in the hands of the IDE team.
I'm not 100% set on this on a path forward but my feelings around compiler dictating IDE caches keep coming up.
There was a problem hiding this comment.
My understanding of the plan now is we use this simple approach to mitigate the performance problems in 16.9, and then work to replace it with #50490 for 16.10.
|
Closing this since I don't want to take it through QB for 16.9 |
Avoid reparsing a
SourceTextproduced by a source generator when it uses the sameParseOptionsand hint name.This change automatically optimizes IDE scenarios for source generators that cache their results, e.g. the form in #50172.
For the source generator in Roslyn.sln, this change addresses ~65% of the overhead for repeated invocations of the source generator.