Do not use `SourceText` indexer when parsing by Neme12 · Pull Request #61662 · dotnet/roslyn

Neme12 · 2022-06-02T19:05:12Z

Currently, `IsConflictMarkerTrivia` is the only place in the lexer that uses the `SourceText` indexer instead of `PeekChar()` and the other methods on `SlidingTextWindow`. Depending on the `SourceText` implementation, indexing can be slow. That's why `SlidingTextWindow` exists in the first place, and why it copies the data from the `SourceText` into its own buffer. This matters especially for `StringBuilderText`, which is what `SyntaxNode.GetText()` returns. Source generators sometimes use that method to get a `SourceText` from a compilation root they built up with the syntax APIs.
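To illustrate why buffering helps, here is a minimal standalone sketch. It is not Roslyn's `SlidingTextWindow` and does not depend on Roslyn. It reads a `StringBuilder` through a small `char[]` window that is refilled in blocks with `StringBuilder.CopyTo`, so most reads hit the array rather than the `StringBuilder` indexer, which may have to walk the builder's internal chunks. The class name `BufferedCharWindow`, the block size, and the use of `char.MaxValue` as an end-of-text sentinel are assumptions made for this example.

```csharp
using System;
using System.Text;

// Hypothetical sketch, not Roslyn's SlidingTextWindow: serves characters from a
// private char[] buffer that is refilled in blocks from the StringBuilder.
public sealed class BufferedCharWindow
{
    // Assumed sentinel returned for offsets outside the text.
    public const char EndOfText = char.MaxValue;

    private readonly StringBuilder _text;
    private readonly char[] _buffer;
    private int _bufferStart;   // text offset of _buffer[0]
    private int _bufferLength;  // number of valid chars in _buffer

    public BufferedCharWindow(StringBuilder text, int blockSize = 2048)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        _text = text;
        _buffer = new char[blockSize];
    }

    public int Position { get; private set; }

    // Number of times the buffer was refilled from the StringBuilder.
    public int Refills { get; private set; }

    public void Advance(int count = 1) => Position += count;

    // Returns the character at Position + delta (delta may be negative, i.e. a
    // look-behind), refilling the buffer only when that offset is outside it.
    public char PeekChar(int delta = 0)
    {
        int offset = Position + delta;
        if (offset < 0 || offset >= _text.Length)
            return EndOfText;

        if (offset < _bufferStart || offset >= _bufferStart + _bufferLength)
            Fill(offset);

        return _buffer[offset - _bufferStart];
    }

    private void Fill(int offset)
    {
        // One bulk copy instead of one StringBuilder indexer call per character.
        _bufferStart = offset;
        _bufferLength = Math.Min(_buffer.Length, _text.Length - offset);
        _text.CopyTo(offset, _buffer, 0, _bufferLength);
        Refills++;
    }
}
```

Reading a builder sequentially through a 256-character window touches the `StringBuilder` once per 256 characters; a look-behind such as `PeekChar(-1)` is free as long as the previous character is still in the buffer.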

The first commit removes the use of the indexer. In the second commit, I enabled nullable reference types in a few related files. In the 4th and 5th commits, I fixed some minor bugs in conflict marker parsing that I noticed while reading the code. I recommend reviewing commit by commit.
EDIT: Based on feedback, I reverted the nullability changes and the other unrelated changes so they can go into separate PRs.

Here's a benchmark comparing the cost of calling `CSharpSyntaxTree.ParseText` on a `StringBuilderText` built from Roslyn's Syntax.xml.Internal.Generated.cs file. The difference isn't huge, but it's noticeable: the mean parse time drops by about 4% to 7%, depending on the runtime.

|   Method |                  Job |              Runtime |     Mean |   Error |  StdDev | Ratio | RatioSD |
|--------- |--------------------- |--------------------- |---------:|--------:|--------:|------:|--------:|
| ParseOld |             .NET 6.0 |             .NET 6.0 | 125.6 ms | 2.43 ms | 2.49 ms |  1.00 |    0.00 |
| ParseNew |             .NET 6.0 |             .NET 6.0 | 117.6 ms | 2.13 ms | 1.99 ms |  0.93 |    0.03 |
|          |                      |                      |          |         |         |       |         |
| ParseOld |        .NET Core 3.1 |        .NET Core 3.1 | 138.5 ms | 1.91 ms | 1.79 ms |  1.00 |    0.00 |
| ParseNew |        .NET Core 3.1 |        .NET Core 3.1 | 131.3 ms | 2.06 ms | 1.93 ms |  0.95 |    0.02 |
|          |                      |                      |          |         |         |       |         |
| ParseOld | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 143.8 ms | 2.87 ms | 2.82 ms |  1.00 |    0.00 |
| ParseNew | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 137.8 ms | 1.60 ms | 1.49 ms |  0.96 |    0.01 |

Source code of the benchmark

Note: for the purpose of this benchmark, I added a boolean flag to Lexer to switch between the old and the new implementation of IsConflictMarkerTrivia to be able to compare them in a single benchmark.

```cs
using System.IO;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

BenchmarkRunner.Run<ParseTextBenchmark>();

[SimpleJob(RuntimeMoniker.Net60)]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
[SimpleJob(RuntimeMoniker.Net472)]
public class ParseTextBenchmark
{
    private readonly SourceText _sourceText;

    public ParseTextBenchmark()
    {
        SourceText sourceText;

        // Replace {username} so the path points at a local Roslyn clone.
        using (var fileStream = File.OpenRead(@"C:\Users\{username}\source\roslyn\src\Compilers\CSharp\Portable\Generated\CSharpSyntaxGenerator\CSharpSyntaxGenerator.SourceGenerator\Syntax.xml.Internal.Generated.cs"))
            sourceText = SourceText.From(fileStream);

        // Round-trip through a syntax tree so that _sourceText is the
        // StringBuilderText returned by SyntaxNode.GetText().
        var syntaxTree = CSharpSyntaxTree.ParseText(sourceText);
        _sourceText = syntaxTree.GetRoot().GetText(Encoding.UTF8);
    }

    [Benchmark(Baseline = true)]
    public SyntaxTree ParseOld()
    {
        // toggleUseTextWindow is the temporary benchmark-only flag mentioned in the note above;
        // it is not part of the public ParseText API.
        return CSharpSyntaxTree.ParseText(
            _sourceText,
            options: null,
            path: "",
            diagnosticOptions: null,
            isGeneratedCode: null,
            toggleUseTextWindow: false,
            cancellationToken: default);
    }

    [Benchmark]
    public SyntaxTree ParseNew()
    {
        return CSharpSyntaxTree.ParseText(
            _sourceText,
            options: null,
            path: "",
            diagnosticOptions: null,
            isGeneratedCode: null,
            toggleUseTextWindow: true,
            cancellationToken: default);
    }
}
```


Neme12 · 2022-06-02T19:07:15Z

src/Compilers/CSharp/Test/Syntax/LexicalAndXml/LexicalTests.cs

```diff
+            token = Lex(" <<<<<<< ").First();
+            Assert.Equal(SyntaxKind.LessThanLessThanToken, token.Kind());
+            Assert.True(token.HasLeadingTrivia);
+            Assert.True(token.LeadingTrivia.Single().Kind() == SyntaxKind.WhitespaceTrivia);
```


Note: I added these lines because the four existing lines above them don't actually verify that a conflict marker must be at the beginning of a line. Even if the marker were at the start of the line there, it still wouldn't be lexed as a conflict marker, because it's missing the trailing space.
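The two conditions the new test separates (the marker must start a line, and `<<<<<<<` must be followed by a space) can be modeled in a small standalone function. This is a simplified, hypothetical sketch over a plain `string`, not the lexer's `IsConflictMarkerTrivia`. The name `IsConflictMarkerAt`, the set of marker characters, the list of line terminators, and the rule that `=======` needs no trailing space are assumptions made for illustration.

```csharp
// Hypothetical, simplified model of the conflict-marker rules discussed in this PR
// (not Roslyn's Lexer.IsConflictMarkerTrivia). Assumes 0 <= position <= text.Length.
public static class ConflictMarkerSketch
{
    private const int MarkerLength = 7;

    // Assumed set of line terminators.
    private static bool IsNewLine(char ch) =>
        ch == '\r' || ch == '\n' || ch == '\u0085' || ch == '\u2028' || ch == '\u2029';

    public static bool IsConflictMarkerAt(string text, int position)
    {
        // Rule 1: the marker must start at the beginning of a line.
        if (position != 0 && !IsNewLine(text[position - 1]))
            return false;

        // Rule 2: seven identical marker characters.
        if (position + MarkerLength > text.Length)
            return false;

        char first = text[position];
        if (first != '<' && first != '=' && first != '>')
            return false;

        for (int i = 1; i < MarkerLength; i++)
        {
            if (text[position + i] != first)
                return false;
        }

        // Rule 3 (the '=' exception is an assumption): "<<<<<<<" and ">>>>>>>"
        // must be followed by a space; "=======" is accepted on its own.
        if (first == '=')
            return true;

        int next = position + MarkerLength;
        return next < text.Length && text[next] == ' ';
    }
}
```

Under this model, `" <<<<<<< "` at position 1 fails only the start-of-line rule, which is the case the added test isolates, while `"<<<<<<<"` at position 0 fails only because the space is missing.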

Neme12 · 2022-06-02T19:13:36Z

src/Compilers/CSharp/Portable/Parser/Lexer.cs

```diff
+
+            // Keep the new line check last because TextWindow.Reset might need to recopy the buffer,
+            // although that's rare in practice.
+            return TextWindow.Position == 0 || SyntaxFacts.IsNewLine(getLastChar());
```


Really rare. At first, I kept this check at the top and debugged parsing of Roslyn's Syntax.xml.Internal.Generated.cs and a few other large files, and I never hit the breakpoint in TextWindow.Reset on the path where it has to copy data (the benchmark results were the same, too). So I could move the check back to the top if someone feels strongly that that's better.
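The ordering question here comes down to short-circuit evaluation: with `&&`, the check that comes first decides how often the other one runs. The hypothetical sketch below (plain `string`, simplified seven-identical-characters test, invented names `NewLineFirst` and `NewLineLast`) counts how often the previous character is read under each ordering. Both orderings return the same result; they differ only in the number of look-behinds. In the real lexer a look-behind is usually served from the window's buffer anyway, which is consistent with the re-copy path never being hit in practice.

```csharp
// Hypothetical illustration (not Roslyn code): the same condition written with the
// start-of-line check first or last. Only the number of look-behinds differs.
public sealed class OrderingSketch
{
    private const int MarkerLength = 7;
    private readonly string _text;

    public OrderingSketch(string text) => _text = text;

    // How many times the previous character was read.
    public int LookBehinds { get; private set; }

    // Stand-in for the look-behind that, in a sliding window, could force a buffer re-copy.
    private char PreviousChar(int position)
    {
        LookBehinds++;
        return _text[position - 1];
    }

    private static bool IsNewLine(char ch) => ch == '\r' || ch == '\n';

    // Simplified marker test: seven identical characters starting at position.
    private bool HasMarkerChars(int position)
    {
        if (position + MarkerLength > _text.Length)
            return false;
        for (int i = 1; i < MarkerLength; i++)
        {
            if (_text[position + i] != _text[position])
                return false;
        }
        return true;
    }

    // Original ordering: start-of-line check first.
    public bool NewLineFirst(int position) =>
        (position == 0 || IsNewLine(PreviousChar(position))) && HasMarkerChars(position);

    // Ordering proposed in this PR: start-of-line check last.
    public bool NewLineLast(int position) =>
        HasMarkerChars(position) && (position == 0 || IsNewLine(PreviousChar(position)));
}
```

For the text `"a < b\n<<<<<<< x"`, evaluating both methods at every `<` gives identical results, but `NewLineFirst` reads the previous character 8 times and `NewLineLast` only once. Whether that saving is worth the less obvious ordering is the readability trade-off raised in the review.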

CyrusNajmabadi · 2022-06-02T19:13:59Z

Can you extract the nrt work into its own PR?

Neme12 · 2022-06-02T19:16:20Z

~~Ok, I'll have to rebase and do a force push then, give me a second.~~ Ok never mind, you started reviewing anyway so I shouldn't force push right now.

CyrusNajmabadi · 2022-06-02T19:16:39Z

src/Compilers/CSharp/Portable/Parser/Lexer.cs

```diff
         {
-            var position = TextWindow.Position;
-            var text = TextWindow.Text;
-            if (position == 0 || SyntaxFacts.IsNewLine(text[position - 1]))
```


can you keep the logic the same, but just use the sliding window? this seems more complex to understand.

The logic is the same, and there are even fewer lines of code here. Which specific part do you feel is more complex?

the logic before was to check that we were after a newline. Now the logic checks the character we are on first. I'd prefer to keep the old logic as it makes the most sense to me. These markers must be after newlines, so that makes sense as the first thing to check.

src/Compilers/VisualBasic/Portable/CommandLine/CommandLineDiagnosticFormatter.vb

src/Compilers/CSharp/Portable/Parser/Lexer.cs

CyrusNajmabadi · 2022-06-02T19:21:05Z

Done with pass.

CyrusNajmabadi · 2022-06-02T19:21:44Z

No need to force push. Just remove those commits and do them as followups. We'll be squashing whatever ends up finally being approved.


Neme12 · 2022-06-03T21:47:39Z

@RikkiGibson @jcouv Could I get another pair of eyes on this? Thanks.

CyrusNajmabadi · 2022-06-03T22:03:22Z

i can look at this next week :)


CyrusNajmabadi · 2023-01-11T03:25:17Z

@Neme12 do you still want to talk this through?

CyrusNajmabadi · 2023-02-03T18:34:21Z

Moving to draft for now. @Neme12 If you'd like to pick this back up again, let us know!

Neme12 added 5 commits June 2, 2022 16:19

Do not use SourceText's indexer when parsing conflict markers

65ec2a8

Enable nullable in the lexer and a few other places

46f9c29

Add an assert and advance position immediately without looping

2257e17

Fix bug with char.MaxValue being treated as end of line

f2da6ce

Fix bug with multiple line endings being treated as a single EndOfLineTrivia

b51149a

Neme12 requested a review from a team as a code owner June 2, 2022 19:05

ghost added the Community (the pull request was submitted by a contributor who is not a Microsoft employee) and Area-Compilers labels Jun 2, 2022

Neme12 commented Jun 2, 2022

CyrusNajmabadi reviewed Jun 2, 2022

src/Compilers/VisualBasic/Portable/CommandLine/CommandLineDiagnosticFormatter.vb (outdated)

CyrusNajmabadi reviewed Jun 2, 2022

src/Compilers/CSharp/Portable/Parser/Lexer.cs (outdated)

CyrusNajmabadi reviewed Jun 2, 2022

src/Compilers/CSharp/Portable/Parser/Lexer.cs

Neme12 added 3 commits June 2, 2022 22:00

Revert "Enable nullable in the lexer and a few other places"

12d068f

This reverts commit 46f9c29.

Fix issue with VB ScanConflictMarkerEndOfLine change

507552d

Add an assert

9f89605

Neme12 mentioned this pull request Jun 3, 2022

Nullable annotate the lexer and a few related files #61688

Merged

Neme12 added 3 commits June 3, 2022 22:57

Remove BannedSymbols addition and prevent the lexer from accessing `SourceText` instead

af8bc88

Remove double evaluation of PeekChar in many places

dce6180

Remove culture sensitive operations

76beb3c

Neme12 added 3 commits June 4, 2022 00:06

Revert suppressions in CommandLineDiagnosticFormatter

4a32e55

Merge remote-tracking branch 'upstream/main' into sourceTextIndexer

9fa0ceb

Revert "Remove culture sensitive operations"

e25fa43

This reverts commit 76beb3c.

Neme12 added 7 commits June 4, 2022 23:17

Revert "Remove double evaluation of PeekChar in many places"

5cd6c1f

This reverts commit dce6180.

Revert "Add an assert"

8e47ce4

This reverts commit 9f89605.

Revert "Fix issue with VB ScanConflictMarkerEndOfLine change"

0cdce2b

This reverts commit 507552d.

Revert "Fix bug with multiple line endings being treated as a single EndOfLineTrivia"

ef93f33

This reverts commit b51149a.

Revert "Fix bug with char.MaxValue being treated as end of line"

53d6e4c

This reverts commit f2da6ce.

Revert "Add an assert and advance position immediately without looping"

5f2bdf8

This reverts commit 2257e17.

Simplify

e90f444

jcouv requested a review from CyrusNajmabadi January 11, 2023 03:15

CyrusNajmabadi marked this pull request as draft February 3, 2023 18:34

Neme12 closed this Apr 30, 2024
