Conversation
|
@dotnet/roslyn-compiler for review please |
| private readonly Action<SourceGeneratorContext> _onExecute; | ||
| private readonly string _sourceOpt; | ||
|
|
||
| public CallbackGenerator(Action<InitializationContext> onInit, Action<SourceGeneratorContext> onExecute, string sourceOpt = "") |
There was a problem hiding this comment.
Rather than sourceOpt why not use string? source = "" and let nullability enforce the optional ness? #Resolved
| { | ||
| private readonly Action<InitializationContext> _onInit; | ||
| private readonly Action<SourceGeneratorContext> _onExecute; | ||
| private readonly string _sourceOpt; |
There was a problem hiding this comment.
Rather than take the source as a parameter to the type why not change _onExecute to be Func<SourceGeneratorContext, string?>? #WontFix
There was a problem hiding this comment.
Most of the uses of this generator don't actually add source. We just increment a counter or whatever to check it was called. Making the callback return source just makes it noiser
In reply to: 434194957 [](ancestors = 434194957)
| public InMemoryAdditionalText(string path, string content) | ||
| { | ||
| Path = path; | ||
| _content = SourceText.From(content); |
There was a problem hiding this comment.
In all other SourceText.From you explicitly passed Encoding.Utf8. Why is this usage different? #Resolved
There was a problem hiding this comment.
Oversight. I'll fix it, but it only actually matters when we're emitting, so we don't hit this in the tests. We have an open tracking note on the board about how to simplify this.
In reply to: 434195219 [](ancestors = 434195219)
| using System.Collections.Generic; | ||
| using System.Text; | ||
| using Microsoft.CodeAnalysis.Text; | ||
| using System.IO; |
src/Compilers/Core/Portable/SourceGeneration/SourceGeneratorContext.cs
Outdated
Show resolved
Hide resolved
| // We pass it to the generators, which will realize any symbols they require. | ||
| compilation = RunGenerators(compilation, Arguments.ParseOptions, generators, additionalTexts, diagnostics); | ||
|
|
||
| // https://github.com/dotnet/roslyn/issues/44087 |
There was a problem hiding this comment.
Why did we remove this comment? Feel like this code still needs to be cleaned up once we have a way to determine if SyntaxTree instances are generated or not.
There was a problem hiding this comment.
So, I started down that path, but it turns out we don't actually need it. @agocke work removing the diagnostics from the syntax tree means we won't have to parse with the config present. Because the syntax trees are deterministically ordered, we should always just be able to count beyond the length of the Arguments.SyntaxTrees to know which were added by generators?
In reply to: 434196389 [](ancestors = 434196389)
There was a problem hiding this comment.
Admittedly, It feels like there is a lot of 'spaghetti' tangle in the way we handle this, various different levels of the compiler adjust the embedded texts and the config options based on their own rules and knowledge of what's coming later. I don't particularly like it, and it feels ripe for a refactor/cleanup pass but this PR at least doesn't make it worse.
In reply to: 434210846 [](ancestors = 434210846,434196389)
There was a problem hiding this comment.
K. Have we checked with @jasonmalinowski on whether or not he feels the IDE team will need to be able to recognize generated SyntaxTree instances over non-generated ones?
There was a problem hiding this comment.
We'll almost definitely need to do it in the future, its just not needed for this particular feature. (The IDE already works with the embedded texts and doesn't need to know they're generated)
In reply to: 434221150 [](ancestors = 434221150)
| if (!sourceFileAnalyzerConfigOptions.IsDefault) | ||
| { | ||
| sourceFileAnalyzerConfigOptions = sourceFileAnalyzerConfigOptions.AddRange(generatedSyntaxTrees.Select(f => analyzerConfigSet.GetOptionsForSourcePath(f.FilePath))); | ||
| } |
There was a problem hiding this comment.
| } | |
| } | |
| ``` #Resolved |
|
|
||
| // Clean up temp files | ||
| CleanupAllGeneratedFiles(src.Path); | ||
| } |
There was a problem hiding this comment.
Need to test some more cases:
- Multiple generated files
- Multiple generated files with identical hint values
- Case where an empty file is returned
Generally do we have tests where the hint path contains known invalid file name characters?
There was a problem hiding this comment.
Added more tests, but ran into #42628, which I've now addresses as part of this PR too.
We aren't currently putting these down on disk, just using them as identifiers in the PDB, so we should be good from a characters perspective. We'll need to address it when we put them to disk though (presumably some form of encoding?)
In reply to: 434220765 [](ancestors = 434220765)
| // We pass it to the generators, which will realize any symbols they require. | ||
| compilation = RunGenerators(compilation, Arguments.ParseOptions, generators, additionalTexts, diagnostics); | ||
|
|
||
| // https://github.com/dotnet/roslyn/issues/44087 |
There was a problem hiding this comment.
K. Have we checked with @jasonmalinowski on whether or not he feels the IDE team will need to be able to recognize generated SyntaxTree instances over non-generated ones?
…he generator name. Add more tests
| [InlineData("partial class D {}", "file1.cs", "partial class D {}", "file2.cs")] // same files, different names | ||
| [InlineData("partial class D {}", "file1.cs", "partial class D {}", "file1.cs")] // same files, same names | ||
| [InlineData("partial class D {}", "file1.cs", "", "file2.cs")] // empty second file | ||
| [InlineData("partial class D {}", "file1.cs", "", " \\ / : * ? \" < > | ")] // illegal path name chars |
There was a problem hiding this comment.
Do we just silently drop the illegal path name characters?
There was a problem hiding this comment.
No, they're allows as a valid hint name. We're just embedding it into the PDB not putting it to disk at the moment. We'll need to encode when we put to disk (filenames, especially on windows, look like a minefield :/).
I'll add text to the second file to make it clear we're using it.
In reply to: 434736894 [](ancestors = 434736894)
There was a problem hiding this comment.
@tmat will portable PDB's handle path names with illegal characters okay?
There was a problem hiding this comment.
@tmat can confirm, but the PDB itself seems fine. Interestingly VS chokes a bit when hitting a file with them: it steps in, and you can actually step over each line and see the output, but no file opens in the IDE itself (no crash or other visible errors though)
In reply to: 434767181 [](ancestors = 434767181)
There was a problem hiding this comment.
I think we should file a follow up item here to resolve for 16.8. Need to take into account the following:
- How does hint path impact the physical file on disk when we're in dump to disk mode
- How do we deal with generators where added files hint paths differ by only illegal file system characters
- How do illegal characters in the path impact other parts of the experience
Don't think we necessarily need to solve this for 16.7 though. One simple approach though is just ban illegal file system characters in hint paths because at the end of the day this is meant to represent a file name on disk 😄
There was a problem hiding this comment.
Portable PDB has no requirements on path strings. Note that all characters except for \0 are valid path characters on Linux.
There was a problem hiding this comment.
BTW, interesting test could be unmatched Unicode surrogates. PDB should handle them as well (and preserve their value). Not sure what happens elsewhere in the system though.
| } | ||
| } | ||
|
|
||
| internal class SimpleGenerator2 : SimpleGenerator |
There was a problem hiding this comment.
Can you explain why we need this class here? It doesn't seem to add anything over SimpleGenerator
|
|
||
| internal GeneratorDriver(GeneratorDriverState state) | ||
| { | ||
| Debug.Assert(state.Generators.GroupBy(s => s.GetType()).Count() == state.Generators.Length); // ensure we don't have duplicate generator types |
There was a problem hiding this comment.
Why do you want to restrict generator types to be unique? Consider the case where someone authors a really handy generator library and two different generators consume it. That seems like a valid case where we would have duplicate generator types.
There was a problem hiding this comment.
So the rationale here was how they're actually loaded from the command line: we pass in a set of references and create the instance ourselves. That means there can only ever be one type per driver instance. That means that we can safely use the type name of the generator as a prefix for disambiguation.
Now we can of course allow multiple of the same type when constructing in tests, but it doesn't actually reflect the way user consume it from the outside. I put it in as a Debug.Assert to ensure that what we're testing is actually what happens in the real case. SourceGeneratorDriver is public and users are free to put the same type in twice if they're using it as an API.
I wasn't sure about this tbh: I'd obviously prefer that we allow multiple of the same type in the tests (it make it much easier) but doesn't reflect the real situation, and gets into weirdness with the prefix. We could make generators provide a unique prefix? But still doesn't stop collisions.
In reply to: 434739560 [](ancestors = 434739560)
There was a problem hiding this comment.
Although writing that I realize that the full name of the type isn't actually enough to be unique right? I can have two types with the same full name in two different assemblies?
In reply to: 434750141 [](ancestors = 434750141,434739560)
There was a problem hiding this comment.
That means there can only ever be one type per driver instance.
Consider the case where I have authored a library Generator.Utility.dll which has an accessible type SimpleGenerator that implements ISourceGenerator. Two different generators could reference this library and expose that type.
Hmmm ... Or do we scan the generator assembly directly and look for derivations of ISourceGenerator? That wouldn't have this problem but if we had a factory model we would have this issue.
There was a problem hiding this comment.
We scan for a combo of derived form ISourceGenerator and marked with the SourceGeneratorAttribute. I'll have to double check, but I believe in your example we'd still only end up with one generator (we use the same mechanism as analyzer discovery, so its what ever that does)
In reply to: 434766641 [](ancestors = 434766641)
|
|
||
| if (AdditionalSources.Contains(hintName)) | ||
| { | ||
| throw new ArgumentException(CodeAnalysisResources.HintNameUniquePerGenerator, nameof(hintName)); |
There was a problem hiding this comment.
Can we file a bug to track this? I don't think this is sustainable long term.
There was a problem hiding this comment.
There was a problem hiding this comment.
Yes. Generators essentially represent a form of distributed development. Any aspect of the implementation which requires them to coordinate, such as not having conflicting hint names, makes authoring reliable generators difficult. The only safe approach would be to always use a GUID at which point it's not much of a hint.
There was a problem hiding this comment.
No, they only have to be unique within a generator. Different generators can have the same hint name.
In reply to: 434765041 [](ancestors = 434765041)
There was a problem hiding this comment.
Gotcha. Assume we have tests for that?
There was a problem hiding this comment.
Yeah Generator_HintName_MustBe_Unique and Generator_HintName_Is_Appended_With_GeneratorName
In reply to: 434770864 [](ancestors = 434770864)
Ensure ordering of syntax trees
|
Well, that ended up being a bigger rabbit hole than I intended to go down, but it did end up fixing a couple of extra bugs (fixes #42627 and #42593). I've ended up going the route that we limit hint names to a specific set of characters ( |
| for (int i = 0; i < _sourcesAdded.Count; i++) | ||
| { | ||
| // check the hashcodes, as thats the comparison we're using to put the names into _hintNames | ||
| if (_sourcesAdded[i].HintName.GetHashCode() == hintName.GetHashCode()) |
There was a problem hiding this comment.
I considered making the GeneratedSourceTexts implement equals and hashcode in terms of the hintName, but they're only equal in the context of a single generator (which is equivalent to a single AdditionalSourcesCollection)
I suppose we could use a custom comparer just in this class and check the hintName? That would allows us to get rid of the two collections and locks? EDIT: we can, but it means we don't get O(1) lookup for things in the collection, because we have to iterate over each time (and thus still need to lock anyway) so i'm not sure it's worth it?
There was a problem hiding this comment.
This seems a bit complicated. Why not just use a Dictionary<string, GeneratedSourceText> instead of having parallel collections here? Could event use a ConcurrentDictionary and avoid all of the lock usage.
There was a problem hiding this comment.
The reason was that Dictionary isn't ordered. This ensures that we get syntax trees back in the order in which they were added. Presumably we could use an ordered dictionary and sort by the hintName, which would at least be consistent, but not the order they were added in. Do you think it matters, as long as its deterministic?
In reply to: 435565199 [](ancestors = 435565199)
There was a problem hiding this comment.
I ended up simplifying and just keeping an array builder. That way we'll keep the ordering as they're added, but don't need the locks of the second collection. It means Remove() and Contains() (and by extension Add()) are all O(N) operations now, as we have to check the whole collection for collisions. If it turns out to be a perf bottleneck we can fix it later on.
This also makes us not-thread safe as a generator could theoretically create a race condition that gets two additions into the collection with the same hintName. This could cause the compiler to crash further on (when adding to the PDBs for instance), but for now I think we're willing to live with that. I mean they could just Environment.FailFast if that was their aim.
In reply to: 435637087 [](ancestors = 435637087,435565199)
| internal AdditionalSourcesCollection() | ||
| { | ||
| _sourcesAdded = PooledDictionary<string, SourceText>.GetInstance(); | ||
| _hintNames = PooledHashSet<string>.GetInstance(); |
There was a problem hiding this comment.
Should we provide an explicit string comparer here? #Resolved
src/Compilers/Core/Portable/SourceGeneration/AdditionalSourcesCollection.cs
Show resolved
Hide resolved
| @@ -13,44 +14,108 @@ namespace Microsoft.CodeAnalysis | |||
| { | |||
| internal sealed class AdditionalSourcesCollection | |||
There was a problem hiding this comment.
Is this a per generator collection?
There was a problem hiding this comment.
Yes, we create a new one for every invocation of every generator.
In reply to: 435563892 [](ancestors = 435563892)
| for (int i = 0; i < _sourcesAdded.Count; i++) | ||
| { | ||
| // check the hashcodes, as thats the comparison we're using to put the names into _hintNames | ||
| if (_sourcesAdded[i].HintName.GetHashCode() == hintName.GetHashCode()) |
There was a problem hiding this comment.
This seems a bit complicated. Why not just use a Dictionary<string, GeneratedSourceText> instead of having parallel collections here? Could event use a ConcurrentDictionary and avoid all of the lock usage.
| for (int i = 0; i < _sourcesAdded.Count; i++) | ||
| { | ||
| // check the hashcodes, as thats the comparison we're using to put the names into _hintNames | ||
| if (_sourcesAdded[i].HintName.GetHashCode() == hintName.GetHashCode()) |
There was a problem hiding this comment.
Can't compare HashCode here because you can get false matches.
There was a problem hiding this comment.
But we're comparing the hash of the string, it'll always be the same if they're considered equal, no? Otherwise the HashSet wouldn't work either?
In reply to: 435565433 [](ancestors = 435565433)
|
|
||
| private static string AppendExtensionIfRequired(string hintName) | ||
| { | ||
| if (!Path.GetExtension(hintName).Equals(".cs", StringComparison.OrdinalIgnoreCase)) |
There was a problem hiding this comment.
The code for comparing hint names in general is ordinal equality but the path is ignore case. Should be consistent here.
There was a problem hiding this comment.
Good catch. I've made all the hint path comparison be case insensitive as we'll need them to be to support windows anyway. Added more test case theory data to cover it.
In reply to: 435565786 [](ancestors = 435565786)
- Simplify AdditionalSourceCollection - Ensure we always compare case-insensitively for hintPaths
| private readonly PooledDictionary<string, SourceText> _sourcesAdded; | ||
| private readonly ArrayBuilder<GeneratedSourceText> _sourcesAdded; | ||
|
|
||
| private readonly StringComparer _hintNameComparer = StringComparer.OrdinalIgnoreCase; |
There was a problem hiding this comment.
Consider making static readonly #Resolved
| { | ||
| if (_hintNameComparer.Equals(_sourcesAdded[i].HintName, hintName)) | ||
| { | ||
| _sourcesAdded.Remove(_sourcesAdded[i]); |
There was a problem hiding this comment.
| _sourcesAdded.Remove(_sourcesAdded[i]); | |
| _sourcesAdded.RemoveAt(i); | |
| ``` #Resolved |
src/Compilers/Core/Portable/SourceGeneration/AdditionalSourcesCollection.cs
Show resolved
Hide resolved
| // https://github.com/dotnet/roslyn/issues/42627: This needs to be consistently ordered | ||
| ArrayBuilder<GeneratedSourceText> builder = ArrayBuilder<GeneratedSourceText>.GetInstance(); | ||
| foreach (var (hintName, sourceText) in _sourcesAdded) | ||
| if (!Path.GetExtension(hintName).Equals(".cs", StringComparison.OrdinalIgnoreCase)) |
There was a problem hiding this comment.
Did you consider using hintName.EndsWith(...) here? Using GetExtension means that operations like Contains end up being allocating. #Resolved
|
I don't understand why the tests keep passing Encoding.UTF8 to |
|
Becuase the PDB code checks that a SourceText with a file path must have an encoding, or they'll fail at emit. We've talked about making it a default, but not done yet. In reply to: 639759958 [](ancestors = 639759958) |
|
Got it, so the purpose is to write out the source text as UTF-8 when embedding into the PDB? Is there a place where we validate that every incoming source text has an encoding? I couldn't easily find it in this PR. |
|
The actual check happens here in Compilation http://sourceroslyn.io/#Microsoft.CodeAnalysis/Compilation/Compilation.cs,2076 We have another issue about making this better, but it's not actually directly related to the embedded requirement. We're thinking we'll just make SourceText.From use UTF8 by default if it's not specified. In reply to: 639827040 [](ancestors = 639827040) |
That's fine. But what we should avoid is only giving a diagnostic when the sources are embedded as we don't want generators to function when embedding is turned off, then start to break when it's suddenly turned on. |
Nah, it was doing it before anyway, so no change there. Thanks for the review! |
Embed generated source texts in the portable PDB. Allows stepping in etc. in the IDE as well as a dump debugging after compilation.
Also some minor code cleanup that I noticed as I was implementing.