[LSP] Add semantic tokens benchmarks + improve edit algorithm by allisonchou · Pull Request #57231 · dotnet/roslyn

allisonchou · 2021-10-19T09:48:25Z

This PR accomplishes two main tasks:

Adds benchmarking for the LSP semantic token edits algorithm.
Improves perf of the edits algorithm by:
- Switching over to Razor's LCS algorithm, which has shown to be better perf-wise for semantic tokens.
- Partitioning tokens into smaller sets so we can call the edits processing algorithm concurrently. This should also fix the OutOfMemoryExceptions that we've been seeing reports of recently.

Fixes:
dotnet/razor#4308
https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1411329/

Thanks to Tomas and Ryan for their advice in this area 😄

Perf Results

Current Algorithm Results: The ideal partition size was found to be 50. With two different sets of 600k tokens each and partitioning, the results are as follows:

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
RunLSPSemanticTokensBenchmark_10k	1.947 ms	0.0134 ms	0.0118 ms	355.4688	160.1563	56.6406	2 MB
RunLSPSemanticTokensBenchmark_20k	3.954 ms	0.0844 ms	0.2474 ms	656.2500	367.1875	121.0938	5 MB
RunLSPSemanticTokensBenchmark_100k	16.261 ms	0.3094 ms	0.3913 ms	2750.0000	1093.7500	468.7500	22 MB
RunLSPSemanticTokensBenchmark_250k	36.942 ms	0.7235 ms	0.6767 ms	6076.9231	1692.3077	538.4615	51 MB
RunLSPSemanticTokensBenchmark_AllTokens	99.239 ms	1.9669 ms	2.5576 ms	17800.0000	6000.0000	2000.0000	139 MB

Previously, we would throw an OOM and wouldn't even be able to process the above case. The new algorithm (Razor's) without partitioning took about ~6mins, although unfortunately it seems like I've lost the exact numbers.

Other Cases Tested:
Partition size: I also tested sizes of 150, 100, 75, and 25. Interestingly, if we stuck with using Roslyn's LCS algorithm, 100 would be the ideal partition size instead of 50.

Roslyn vs. Razor algorithms: Razor's LCS algorithm beat out Roslyn's when testing with both partitioning and non-partitioning. The below is Roslyn's best run with partitioning (partition size 100). Somehow I lost the "AllTokens" case when copying down the results, but I think the difference is clear nonetheless:

Method	Mean	Error	StdDev	Median	Gen 0	Gen 1	Gen 2	Allocated
RunLSPSemanticTokensBenchmark_10k	7.985 ms	0.1555 ms	0.1454 ms	8.005 ms	5210.9375	1914.0625	164.0625	32 MB
RunLSPSemanticTokensBenchmark_20k	13.878 ms	0.2695 ms	0.4427 ms	13.974 ms	10093.7500	3859.3750	250.0000	62 MB
RunLSPSemanticTokensBenchmark_100k	63.833 ms	1.2761 ms	2.5778 ms	63.223 ms	49888.8889	22000.0000	1111.1111	310 MB
RunLSPSemanticTokensBenchmark_250k	128.247 ms	2.5580 ms	7.2148 ms	126.326 ms	122500.0000	53750.0000	2000.0000	759 MB

More Perf Details:

Details

Editing to add more perf details as some indicated it might be valuable to include. As mentioned before, unfortunately it looks like I lost the "before" results, but tl;dr Roslyn's original algorithm hit OOM in these cases and Razor's took 6+ minutes.

Razor Algorithm - Partition Size 150:

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
RunLSPSemanticTokensBenchmarkAsync_500	297.534 ms	5.8520 ms	13.0889 ms	292.530 ms	292000.0000	131500.0000	2500.0000
RunLSPSemanticTokensBenchmark_RazorImpl_10	2.225 ms	0.0441 ms	0.0471 ms	2.201 ms	351.5625	136.7188	35.1563
RunLSPSemanticTokensBenchmark_RazorImpl_20	4.429 ms	0.0852 ms	0.1046 ms	4.421 ms	687.5000	375.0000	109.3750
RunLSPSemanticTokensBenchmark_RazorImpl_100	19.487 ms	0.3163 ms	0.2804 ms	19.504 ms	2968.7500	1093.7500	437.5000
RunLSPSemanticTokensBenchmark_RazorImpl_250	44.846 ms	0.8845 ms	1.1186 ms	44.922 ms	6727.2727	1636.3636	545.4545
RunLSPSemanticTokensBenchmark_RazorImpl_AllTokens	113.969 ms	2.2534 ms	2.9301 ms	114.253 ms	19200.0000	5600.0000	1600.0000

Razor Algorithm - Partition Size 75:

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
RunLSPSemanticTokensBenchmark_RazorImpl_10	2.111 ms	0.0381 ms	0.0535 ms	363.2813	156.2500	50.7813	2 MB
RunLSPSemanticTokensBenchmark_RazorImpl_20	4.785 ms	0.0953 ms	0.1367 ms	679.6875	382.8125	132.8125	5 MB
RunLSPSemanticTokensBenchmark_RazorImpl_100	18.345 ms	0.3557 ms	0.4499 ms	2812.5000	1093.7500	468.7500	22 MB
RunLSPSemanticTokensBenchmark_RazorImpl_250	41.753 ms	0.8174 ms	1.4947 ms	6083.3333	1500.0000	416.6667	52 MB
RunLSPSemanticTokensBenchmark_RazorImpl_AllTokens	112.337 ms	2.2328 ms	4.7582 ms	18400.0000	6000.0000	1800.0000	147 MB

Razor Algorithm - Partition Size 50 (IDEAL):

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
RunLSPSemanticTokensBenchmark_10k	1.947 ms	0.0134 ms	0.0118 ms	355.4688	160.1563	56.6406	2 MB
RunLSPSemanticTokensBenchmark_20k	3.954 ms	0.0844 ms	0.2474 ms	656.2500	367.1875	121.0938	5 MB
RunLSPSemanticTokensBenchmark_100k	16.261 ms	0.3094 ms	0.3913 ms	2750.0000	1093.7500	468.7500	22 MB
RunLSPSemanticTokensBenchmark_250k	36.942 ms	0.7235 ms	0.6767 ms	6076.9231	1692.3077	538.4615	51 MB
RunLSPSemanticTokensBenchmark_AllTokens	99.239 ms	1.9669 ms	2.5576 ms	17800.0000	6000.0000	2000.0000	139 MB

Razor Algorithm - Partition Size 25:

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
RunLSPSemanticTokensBenchmark_RazorImpl_10	1.574 ms	0.0230 ms	0.0264 ms	357.4219	148.4375	42.9688	2 MB
RunLSPSemanticTokensBenchmark_RazorImpl_20	3.816 ms	0.0502 ms	0.0470 ms	679.6875	390.6250	113.2813	5 MB
RunLSPSemanticTokensBenchmark_RazorImpl_100	18.396 ms	0.3663 ms	0.7315 ms	2968.7500	1125.0000	500.0000	24 MB
RunLSPSemanticTokensBenchmark_RazorImpl_250	44.935 ms	0.8936 ms	1.9987 ms	6833.3333	2500.0000	833.3333	57 MB
RunLSPSemanticTokensBenchmark_RazorImpl_AllTokens	122.697 ms	2.4280 ms	5.1743 ms	18500.0000	6750.0000	1500.0000	138 MB

allisonchou · 2021-10-19T09:49:24Z

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/DiffEdit.cs

+
+namespace Microsoft.CodeAnalysis.LanguageServer.Handler.SemanticTokens
+{
+    internal readonly struct DiffEdit


This file is a direct copy of Razor's: https://github.com/dotnet/razor-tooling/blob/main/src/Razor/src/Microsoft.AspNetCore.Razor.LanguageServer/DiffEdit.cs
We might eventually want to consider unifying Roslyn/Razor's implementations to avoid code duplication.

allisonchou · 2021-10-19T09:50:26Z

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/TextDiffer.cs

+    /// since Razor's version of the algorithm performs better for semantic tokens. We may want to
+    /// consider unifying the two implementations in the future.
+    /// </remarks>
+    internal abstract class TextDiffer


As noted in the comment above, this class is also from Razor: https://github.com/dotnet/razor-tooling/blob/main/src/Razor/src/Microsoft.AspNetCore.Razor.LanguageServer/TextDiffer.cs

agree that we should definitely unify this, especially since its fairly complex code

also, are there tests anywhere for this specifically on the razor side? I know we have the semantic tokens tests, but those might not exhaustively verify the algorithm here

Filed #57245 to track unifying the implementations. I'm thinking it might make more sense logistics-wise to do the unifying work in a separate PR since it might take some time for the changes to propagate between Roslyn and Razor.

Razor has its own semantic tokens tests although I'm unaware of any that directly test the TextDiffer class itself. @ryanbrandenburg Would you know if these exist today? I agree it would be valuable to add tests if there aren't any - we could possibly include it as part of the above linked issue.

Would also be in favour of sharing this, though one thing to note is that the partitioning logic should be moved to this base class if that happens, as Razor has the SourceTextDiffer which uses this, but might want a different partition size.

@allisonchou I think these are the closest we have to tests for the TextDiffer.

I agree that we need to separate the Partitioning logic from the TextDiffer, that way if anything does need a strictly minimal TextDiff in the future it's possible.

dibarbet

that's a nice improvement in allocations!

dibarbet · 2021-10-19T17:59:25Z

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/TextDiffer.cs

+    /// since Razor's version of the algorithm performs better for semantic tokens. We may want to
+    /// consider unifying the two implementations in the future.
+    /// </remarks>
+    internal abstract class TextDiffer


agree that we should definitely unify this, especially since its fairly complex code

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs

dibarbet · 2021-10-19T18:58:52Z

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/TextDiffer.cs

+    /// since Razor's version of the algorithm performs better for semantic tokens. We may want to
+    /// consider unifying the two implementations in the future.
+    /// </remarks>
+    internal abstract class TextDiffer


also, are there tests anywhere for this specifically on the razor side? I know we have the semantic tokens tests, but those might not exhaustively verify the algorithm here

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsHandler.cs

dibarbet · 2021-10-19T20:04:16Z

src/Tools/IdeCoreBenchmarks/LSPSemanticTokensBenchmarks.cs

+// The .NET Foundation licenses this file to you under the MIT license.
+// See the LICENSE file in the project root for more information.
+
+#nullable disable


could be removed

Hmm without this we get a ton of nullability warnings throughout the file. Other benchmarks explicitly disable nullable as well. I think #nullable enabled is the default now?

src/Tools/IdeCoreBenchmarks/LSPSemanticTokensBenchmarks.cs

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/TextDiffer.cs

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs

tmat · 2021-10-22T17:43:36Z

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/TextDiffer.cs

+                var (middleX, middleY) = FindMiddleSnake(lowA, highA, lowB, highB, vf, vr);
+
+                // Recursively find the midpoint of the left half.
+                ComputeDiffRecursive(edits, lowA, middleX, lowB, middleY, vf, vr);


ComputeDiffRecursive

Instead of recursion, can we use a Stack here? That would make it possible to write this function as an iterator (IEnumerable<DiffEdit>), which would save allocation of the List.

I'm not sure if I'm interpreting your comment correctly. In my latest push, I changed the List to an ArrayBuilder, which I think fixes the allocation issue? Or are you referring to the IReadOnlyList returned by the method?

I was refering to the returned list. Feel free to do follow up on that in a separate PR.

Gotcha, filed #57433 as a follow-up item since we're attempting to get this PR in for M2.

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs

tmat

🕐

…kensBenchmark2

ryanbrandenburg

Changes look good to me. I agree both that we should eventually unify the code here across Roslyn and Razor and that it needn't be in this PR.

tmat

allisonchou · 2021-10-27T19:51:15Z

@jinujoseph for m2 approval

CyrusNajmabadi · 2021-10-27T21:51:06Z

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs

+        protected override bool ContentEquals(
+            IReadOnlyList<int> oldSemanticTokens,
+            int oldIndex,
+            IReadOnlyList<int> newSemanticTokens,


reason we're using IReadOnlyList instead of some more concrete type without virtual dispatch?

We're technically passing in ArraySegments which according to docs can't be compared via index unless cast to IReadOnlyList. Is there an alternate approach that would be preferable here instead? (If so, I'll likely address in a follow-up PR since we're trying to get this in for M2.)

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs

…PSemanticTokensBenchmark2

allisonchou added 8 commits October 19, 2021 02:18

Add LSP semantic tokens benchmark

776f86f

Add LSP semantic tokens benchmark

d1010d6

Partition array

2bd7158

Compare Roslyn vs. Razor

0ada376

Disable Roslyn algorithm benchmark

4d56547

Fix edit logic

dc7eb46

Cleanup

8e8e50f

Further cleanup

2927900

allisonchou added Area-IDE LSP issues related to the roslyn language server protocol implementation labels Oct 19, 2021

allisonchou requested review from davidwengier, dibarbet, ryanbrandenburg and tmat October 19, 2021 09:48

allisonchou requested review from a team as code owners October 19, 2021 09:48

allisonchou commented Oct 19, 2021

View reviewed changes

dibarbet reviewed Oct 19, 2021

View reviewed changes

allisonchou mentioned this pull request Oct 19, 2021

[LSP] Unify Razor/Roslyn LCS implementations #57245

Closed

Review feedback

dc47611

allisonchou mentioned this pull request Oct 21, 2021

[LSP] Add feature flag to enable LSP classifications in local scenarios #57291

Merged

tmat reviewed Oct 22, 2021

View reviewed changes

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/TextDiffer.cs Outdated Show resolved Hide resolved

tmat reviewed Oct 22, 2021

View reviewed changes

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/TextDiffer.cs Outdated Show resolved Hide resolved

tmat reviewed Oct 22, 2021

View reviewed changes

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs Outdated Show resolved Hide resolved

tmat reviewed Oct 22, 2021

View reviewed changes

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs Outdated Show resolved Hide resolved

tmat requested changes Oct 22, 2021

View reviewed changes

allisonchou added 3 commits October 22, 2021 16:12

Preliminary code review feedback

32f963d

Merge branch 'main' of github.com:dotnet/roslyn into AddLSPSemanticTo…

1353053

…kensBenchmark2

Use pooled array

cca03ef

List -> ArrayBuilder

8635710

allisonchou requested a review from tmat October 25, 2021 21:48

Merge branch 'main' of github.com:dotnet/roslyn into AddLSPSemanticTo…

7924961

…kensBenchmark2

runfoapp bot mentioned this pull request Oct 25, 2021

Timeout on Microsoft.VisualStudio.LanguageServices.CSharp.UnitTests.dll tests #55638

Open

allisonchou changed the base branch from main to release/dev17.1 October 26, 2021 01:08

ryanbrandenburg approved these changes Oct 26, 2021

View reviewed changes

dibarbet approved these changes Oct 27, 2021

View reviewed changes

tmat approved these changes Oct 27, 2021

View reviewed changes

jinujoseph added the Approved to merge label Oct 27, 2021

CyrusNajmabadi reviewed Oct 27, 2021

View reviewed changes

src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs Show resolved Hide resolved

allisonchou added 2 commits October 27, 2021 16:06

Merge branch 'release/dev17.1' of github.com:dotnet/roslyn into AddLS…

1e159b6

…PSemanticTokensBenchmark2

Support edits capability

2068a59

allisonchou mentioned this pull request Oct 28, 2021

[LSP] Use stack instead of recursion in semantic tokens differ #57433

Closed

Use immutable array instead of allocating new array

f3b6b0b

mkmibrahim approved these changes Oct 28, 2021

View reviewed changes

runfoapp bot mentioned this pull request Oct 28, 2021

Failure in LspMiscellaneousFilesWorkspaceTests.TestLooseFile_MovedToRegisteredWorkspace #57422

Closed

allisonchou merged commit 7b99150 into dotnet:release/dev17.1 Oct 28, 2021

allisonchou deleted the AddLSPSemanticTokensBenchmark2 branch October 28, 2021 19:28

This was referenced Nov 1, 2021

Add Roslyn LSP semantic tokens benchmarks dotnet/razor#4308

Closed

Semantic tokens endpoint throwing exception and failing to colorize dotnet/razor#5656

Closed

Conversation

allisonchou commented Oct 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Perf Results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dibarbet left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tmat left a comment

Choose a reason for hiding this comment

Uh oh!

ryanbrandenburg left a comment

Choose a reason for hiding this comment

Uh oh!

tmat left a comment

Choose a reason for hiding this comment

Uh oh!

allisonchou commented Oct 27, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

allisonchou commented Oct 19, 2021 •

edited

Loading