Do not use SourceText indexer when parsing#61662
Do not use SourceText indexer when parsing#61662Neme12 wants to merge 21 commits intodotnet:mainfrom
SourceText indexer when parsing#61662Conversation
| token = Lex(" <<<<<<< ").First(); | ||
| Assert.Equal(SyntaxKind.LessThanLessThanToken, token.Kind()); | ||
| Assert.True(token.HasLeadingTrivia); | ||
| Assert.True(token.LeadingTrivia.Single().Kind() == SyntaxKind.WhitespaceTrivia); |
There was a problem hiding this comment.
Note: I added these lines because the other 4 lines above don't actually verify the check that the conflict marker must be at the beginning of a line, because even if it was at the beginning of the line there, it still wouldn't parse as a conflict marker because of the missing space at the end.
|
|
||
| // Keep the new line check last because TextWindow.Reset might need to recopy the buffer, | ||
| // although that's rare in practice. | ||
| return TextWindow.Position == 0 || SyntaxFacts.IsNewLine(getLastChar()); |
There was a problem hiding this comment.
Really rare. At first, I kept this check at the top and tried debugging when parsing Roslyn's Syntax.xml.Internal.Generated.cs and a few other large files, and never hit the breakpoint in TextWindow.Reset where it had to copy stuff (and the benchmark results were the same). So I could copy the check back to the top if someone feels strongly that that is better.
|
Can you extract the nrt work into its own PR? |
|
|
| { | ||
| var position = TextWindow.Position; | ||
| var text = TextWindow.Text; | ||
| if (position == 0 || SyntaxFacts.IsNewLine(text[position - 1])) |
There was a problem hiding this comment.
can you keep the logic the same, but just use the sliding window? this seems more complex to understand.
There was a problem hiding this comment.
The logic is the same and there is even less lines of code here. Which specific part do you feel is more complex?
There was a problem hiding this comment.
the logic before was to check that we were after a newline. Now the logic checks the character we are on first. I'd prefer to keep the old logic as it makes the most sense to me. These markers must be after newlines, so that makes sense as the first thing to check.
src/Compilers/VisualBasic/Portable/CommandLine/CommandLineDiagnosticFormatter.vb
Outdated
Show resolved
Hide resolved
|
Done with pass. |
|
NO need to force push. Just remove those commits as followups. We'll be squashing whatever ends up finally being approved. |
|
@RikkiGibson @jcouv Could I get another pair of eyes on this? Thanks. |
|
i can look at this next week :) |
|
@Neme12 do you still want to talk this through? |
|
Moving ot draft for now. @Neme12 If you'd like to pick this back up again, let us know! |
Currently,
IsConflictMarkerTriviais the only place in the lexer that uses the indexer ofSourceTextas opposed toPeekChar()and other methods onSlidingTextWindow. Depending on the implementation ofSourceText, indexing might not be optimal - that's whySlidingTextWindoweven exists and why it copies the data from theSourceTextinto its own buffer. This matters especially forStringBuilderText, which is what's returned bySyntaxNode.GetText(), which is what source generators sometimes use to get aSourceTextfrom a compilation root they built up using the syntax APIs.This PR removes the usage of the indexer in the first commit.
In the second commit, I enabled nullability in a few related files. In the 4th and 5th commits, I fixed some minor bugs related to conflict marker parsing that I noticed in the code. I recommend reviewing commit by commit.EDIT: I reverted the nullability changes and other changes to put them in separate PRs based on feedback.
Here's a benchmark showing the difference in calling
CSharpSyntaxTree.ParseTexton aStringBuilderTextfrom Roslyn's Syntax.xml.Internal.Generated.cs file. The difference isn't huge, but it's noticable.Source code of the benchmark
Note: for the purpose of this benchmark, I added a boolean flag to
Lexerto switch between the old and the new implementation ofIsConflictMarkerTriviato be able to compare them in a single benchmark.