Skip to content

Do not use SourceText indexer when parsing#61662

Closed
Neme12 wants to merge 21 commits intodotnet:mainfrom
Neme12:sourceTextIndexer
Closed

Do not use SourceText indexer when parsing#61662
Neme12 wants to merge 21 commits intodotnet:mainfrom
Neme12:sourceTextIndexer

Conversation

@Neme12
Copy link
Copy Markdown
Contributor

@Neme12 Neme12 commented Jun 2, 2022

Currently, IsConflictMarkerTrivia is the only place in the lexer that uses the indexer of SourceText as opposed to PeekChar() and other methods on SlidingTextWindow. Depending on the implementation of SourceText, indexing might not be optimal - that's why SlidingTextWindow even exists and why it copies the data from the SourceText into its own buffer. This matters especially for StringBuilderText, which is what's returned by SyntaxNode.GetText(), which is what source generators sometimes use to get a SourceText from a compilation root they built up using the syntax APIs.

This PR removes the usage of the indexer in the first commit. In the second commit, I enabled nullability in a few related files. In the 4th and 5th commits, I fixed some minor bugs related to conflict marker parsing that I noticed in the code. I recommend reviewing commit by commit.
EDIT: I reverted the nullability changes and other changes to put them in separate PRs based on feedback.

Here's a benchmark showing the difference in calling CSharpSyntaxTree.ParseText on a StringBuilderText from Roslyn's Syntax.xml.Internal.Generated.cs file. The difference isn't huge, but it's noticable.

|   Method |                  Job |              Runtime |     Mean |   Error |  StdDev | Ratio | RatioSD |
|--------- |--------------------- |--------------------- |---------:|--------:|--------:|------:|--------:|
| ParseOld |             .NET 6.0 |             .NET 6.0 | 125.6 ms | 2.43 ms | 2.49 ms |  1.00 |    0.00 |
| ParseNew |             .NET 6.0 |             .NET 6.0 | 117.6 ms | 2.13 ms | 1.99 ms |  0.93 |    0.03 |
|          |                      |                      |          |         |         |       |         |
| ParseOld |        .NET Core 3.1 |        .NET Core 3.1 | 138.5 ms | 1.91 ms | 1.79 ms |  1.00 |    0.00 |
| ParseNew |        .NET Core 3.1 |        .NET Core 3.1 | 131.3 ms | 2.06 ms | 1.93 ms |  0.95 |    0.02 |
|          |                      |                      |          |         |         |       |         |
| ParseOld | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 143.8 ms | 2.87 ms | 2.82 ms |  1.00 |    0.00 |
| ParseNew | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 137.8 ms | 1.60 ms | 1.49 ms |  0.96 |    0.01 |
Source code of the benchmark

Note: for the purpose of this benchmark, I added a boolean flag to Lexer to switch between the old and the new implementation of IsConflictMarkerTrivia to be able to compare them in a single benchmark.

using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

BenchmarkRunner.Run<ParseTextBenchmark>();

[SimpleJob(RuntimeMoniker.Net60)]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
[SimpleJob(RuntimeMoniker.Net472)]
public class ParseTextBenchmark
{
    private readonly SourceText _sourceText;

    public ParseTextBenchmark()
    {
        SourceText sourceText;

        using (var fileStream = File.OpenRead(@"C:\Users\{username}\source\roslyn\src\Compilers\CSharp\Portable\Generated\CSharpSyntaxGenerator\CSharpSyntaxGenerator.SourceGenerator\Syntax.xml.Internal.Generated.cs"))
            sourceText = SourceText.From(fileStream);

        var syntaxTree = CSharpSyntaxTree.ParseText(sourceText);
        _sourceText = syntaxTree.GetRoot().GetText(Encoding.UTF8);
    }

    [Benchmark(Baseline = true)]
    public SyntaxTree ParseOld()
    {
        return CSharpSyntaxTree.ParseText(
            _sourceText,
            options: null,
            path: "",
            diagnosticOptions: null,
            isGeneratedCode: null,
            toggleUseTextWindow: false,
            cancellationToken: default);
    }

    [Benchmark]
    public SyntaxTree ParseNew()
    {
        return CSharpSyntaxTree.ParseText(
            _sourceText,
            options: null,
            path: "",
            diagnosticOptions: null,
            isGeneratedCode: null,
            toggleUseTextWindow: true,
            cancellationToken: default);
    }
}

@Neme12 Neme12 requested a review from a team as a code owner June 2, 2022 19:05
@ghost ghost added Community The pull request was submitted by a contributor who is not a Microsoft employee. Area-Compilers labels Jun 2, 2022
token = Lex(" <<<<<<< ").First();
Assert.Equal(SyntaxKind.LessThanLessThanToken, token.Kind());
Assert.True(token.HasLeadingTrivia);
Assert.True(token.LeadingTrivia.Single().Kind() == SyntaxKind.WhitespaceTrivia);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I added these lines because the other 4 lines above don't actually verify the check that the conflict marker must be at the beginning of a line, because even if it was at the beginning of the line there, it still wouldn't parse as a conflict marker because of the missing space at the end.


// Keep the new line check last because TextWindow.Reset might need to recopy the buffer,
// although that's rare in practice.
return TextWindow.Position == 0 || SyntaxFacts.IsNewLine(getLastChar());
Copy link
Copy Markdown
Contributor Author

@Neme12 Neme12 Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really rare. At first, I kept this check at the top and tried debugging when parsing Roslyn's Syntax.xml.Internal.Generated.cs and a few other large files, and never hit the breakpoint in TextWindow.Reset where it had to copy stuff (and the benchmark results were the same). So I could copy the check back to the top if someone feels strongly that that is better.

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor

Can you extract the nrt work into its own PR?

@Neme12
Copy link
Copy Markdown
Contributor Author

Neme12 commented Jun 2, 2022

Ok, I'll have to rebase and do a force push then, give me a second. Ok never mind, you started reviewing anyway so I shouldn't force push right now.

{
var position = TextWindow.Position;
var text = TextWindow.Text;
if (position == 0 || SyntaxFacts.IsNewLine(text[position - 1]))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you keep the logic the same, but just use the sliding window? this seems more complex to understand.

Copy link
Copy Markdown
Contributor Author

@Neme12 Neme12 Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is the same and there is even less lines of code here. Which specific part do you feel is more complex?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic before was to check that we were after a newline. Now the logic checks the character we are on first. I'd prefer to keep the old logic as it makes the most sense to me. These markers must be after newlines, so that makes sense as the first thing to check.

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor

Done with pass.

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor

NO need to force push. Just remove those commits as followups. We'll be squashing whatever ends up finally being approved.

@Neme12
Copy link
Copy Markdown
Contributor Author

Neme12 commented Jun 3, 2022

@RikkiGibson @jcouv Could I get another pair of eyes on this? Thanks.

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor

i can look at this next week :)

@jcouv jcouv requested a review from CyrusNajmabadi January 11, 2023 03:15
@CyrusNajmabadi
Copy link
Copy Markdown
Contributor

@Neme12 do you still want to talk this through?

@CyrusNajmabadi
Copy link
Copy Markdown
Contributor

Moving ot draft for now. @Neme12 If you'd like to pick this back up again, let us know!

@CyrusNajmabadi CyrusNajmabadi marked this pull request as draft February 3, 2023 18:34
@Neme12 Neme12 closed this Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-Compilers Community The pull request was submitted by a contributor who is not a Microsoft employee.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants