Skip to content

[LSP] Add semantic tokens benchmarks + improve edit algorithm#57231

Merged
allisonchou merged 17 commits intodotnet:release/dev17.1from
allisonchou:AddLSPSemanticTokensBenchmark2
Oct 28, 2021
Merged

[LSP] Add semantic tokens benchmarks + improve edit algorithm#57231
allisonchou merged 17 commits intodotnet:release/dev17.1from
allisonchou:AddLSPSemanticTokensBenchmark2

Conversation

@allisonchou
Copy link
Copy Markdown
Contributor

@allisonchou allisonchou commented Oct 19, 2021

This PR accomplishes two main tasks:

  1. Adds benchmarking for the LSP semantic token edits algorithm.
  2. Improves perf of the edits algorithm by:
    • Switching over to Razor's LCS algorithm, which has shown to be better perf-wise for semantic tokens.
    • Partitioning tokens into smaller sets so we can call the edits processing algorithm concurrently. This should also fix the OutOfMemoryExceptions that we've been seeing reports of recently.

Fixes:
dotnet/razor#4308
https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1411329/

Thanks to Tomas and Ryan for their advice in this area 😄


Perf Results

Current Algorithm Results: The ideal partition size was found to be 50. With two different sets of 600k tokens each and partitioning, the results are as follows:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
RunLSPSemanticTokensBenchmark_10k 1.947 ms 0.0134 ms 0.0118 ms 355.4688 160.1563 56.6406 2 MB
RunLSPSemanticTokensBenchmark_20k 3.954 ms 0.0844 ms 0.2474 ms 656.2500 367.1875 121.0938 5 MB
RunLSPSemanticTokensBenchmark_100k 16.261 ms 0.3094 ms 0.3913 ms 2750.0000 1093.7500 468.7500 22 MB
RunLSPSemanticTokensBenchmark_250k 36.942 ms 0.7235 ms 0.6767 ms 6076.9231 1692.3077 538.4615 51 MB
RunLSPSemanticTokensBenchmark_AllTokens 99.239 ms 1.9669 ms 2.5576 ms 17800.0000 6000.0000 2000.0000 139 MB

Previously, we would throw an OOM and wouldn't even be able to process the above case. The new algorithm (Razor's) without partitioning took about ~6mins, although unfortunately it seems like I've lost the exact numbers.

Other Cases Tested:
Partition size: I also tested sizes of 150, 100, 75, and 25. Interestingly, if we stuck with using Roslyn's LCS algorithm, 100 would be the ideal partition size instead of 50.

Roslyn vs. Razor algorithms: Razor's LCS algorithm beat out Roslyn's when testing with both partitioning and non-partitioning. The below is Roslyn's best run with partitioning (partition size 100). Somehow I lost the "AllTokens" case when copying down the results, but I think the difference is clear nonetheless:

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
RunLSPSemanticTokensBenchmark_10k 7.985 ms 0.1555 ms 0.1454 ms 8.005 ms 5210.9375 1914.0625 164.0625 32 MB
RunLSPSemanticTokensBenchmark_20k 13.878 ms 0.2695 ms 0.4427 ms 13.974 ms 10093.7500 3859.3750 250.0000 62 MB
RunLSPSemanticTokensBenchmark_100k 63.833 ms 1.2761 ms 2.5778 ms 63.223 ms 49888.8889 22000.0000 1111.1111 310 MB
RunLSPSemanticTokensBenchmark_250k 128.247 ms 2.5580 ms 7.2148 ms 126.326 ms 122500.0000 53750.0000 2000.0000 759 MB

More Perf Details:

Details Editing to add more perf details as some indicated it might be valuable to include. As mentioned before, unfortunately it looks like I lost the "before" results, but tl;dr Roslyn's original algorithm hit OOM in these cases and Razor's took 6+ minutes.

Razor Algorithm - Partition Size 150:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
RunLSPSemanticTokensBenchmarkAsync_500 297.534 ms 5.8520 ms 13.0889 ms 292.530 ms 292000.0000 131500.0000 2500.0000
RunLSPSemanticTokensBenchmark_RazorImpl_10 2.225 ms 0.0441 ms 0.0471 ms 2.201 ms 351.5625 136.7188 35.1563
RunLSPSemanticTokensBenchmark_RazorImpl_20 4.429 ms 0.0852 ms 0.1046 ms 4.421 ms 687.5000 375.0000 109.3750
RunLSPSemanticTokensBenchmark_RazorImpl_100 19.487 ms 0.3163 ms 0.2804 ms 19.504 ms 2968.7500 1093.7500 437.5000
RunLSPSemanticTokensBenchmark_RazorImpl_250 44.846 ms 0.8845 ms 1.1186 ms 44.922 ms 6727.2727 1636.3636 545.4545
RunLSPSemanticTokensBenchmark_RazorImpl_AllTokens 113.969 ms 2.2534 ms 2.9301 ms 114.253 ms 19200.0000 5600.0000 1600.0000

Razor Algorithm - Partition Size 75:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
RunLSPSemanticTokensBenchmark_RazorImpl_10 2.111 ms 0.0381 ms 0.0535 ms 363.2813 156.2500 50.7813 2 MB
RunLSPSemanticTokensBenchmark_RazorImpl_20 4.785 ms 0.0953 ms 0.1367 ms 679.6875 382.8125 132.8125 5 MB
RunLSPSemanticTokensBenchmark_RazorImpl_100 18.345 ms 0.3557 ms 0.4499 ms 2812.5000 1093.7500 468.7500 22 MB
RunLSPSemanticTokensBenchmark_RazorImpl_250 41.753 ms 0.8174 ms 1.4947 ms 6083.3333 1500.0000 416.6667 52 MB
RunLSPSemanticTokensBenchmark_RazorImpl_AllTokens 112.337 ms 2.2328 ms 4.7582 ms 18400.0000 6000.0000 1800.0000 147 MB

Razor Algorithm - Partition Size 50 (IDEAL):

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
RunLSPSemanticTokensBenchmark_10k 1.947 ms 0.0134 ms 0.0118 ms 355.4688 160.1563 56.6406 2 MB
RunLSPSemanticTokensBenchmark_20k 3.954 ms 0.0844 ms 0.2474 ms 656.2500 367.1875 121.0938 5 MB
RunLSPSemanticTokensBenchmark_100k 16.261 ms 0.3094 ms 0.3913 ms 2750.0000 1093.7500 468.7500 22 MB
RunLSPSemanticTokensBenchmark_250k 36.942 ms 0.7235 ms 0.6767 ms 6076.9231 1692.3077 538.4615 51 MB
RunLSPSemanticTokensBenchmark_AllTokens 99.239 ms 1.9669 ms 2.5576 ms 17800.0000 6000.0000 2000.0000 139 MB

Razor Algorithm - Partition Size 25:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
RunLSPSemanticTokensBenchmark_RazorImpl_10 1.574 ms 0.0230 ms 0.0264 ms 357.4219 148.4375 42.9688 2 MB
RunLSPSemanticTokensBenchmark_RazorImpl_20 3.816 ms 0.0502 ms 0.0470 ms 679.6875 390.6250 113.2813 5 MB
RunLSPSemanticTokensBenchmark_RazorImpl_100 18.396 ms 0.3663 ms 0.7315 ms 2968.7500 1125.0000 500.0000 24 MB
RunLSPSemanticTokensBenchmark_RazorImpl_250 44.935 ms 0.8936 ms 1.9987 ms 6833.3333 2500.0000 833.3333 57 MB
RunLSPSemanticTokensBenchmark_RazorImpl_AllTokens 122.697 ms 2.4280 ms 5.1743 ms 18500.0000 6750.0000 1500.0000 138 MB

@allisonchou allisonchou added Area-IDE LSP issues related to the roslyn language server protocol implementation labels Oct 19, 2021
@allisonchou allisonchou requested review from a team as code owners October 19, 2021 09:48

namespace Microsoft.CodeAnalysis.LanguageServer.Handler.SemanticTokens
{
internal readonly struct DiffEdit
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is a direct copy of Razor's: https://github.com/dotnet/razor-tooling/blob/main/src/Razor/src/Microsoft.AspNetCore.Razor.LanguageServer/DiffEdit.cs
We might eventually want to consider unifying Roslyn/Razor's implementations to avoid code duplication.

/// since Razor's version of the algorithm performs better for semantic tokens. We may want to
/// consider unifying the two implementations in the future.
/// </remarks>
internal abstract class TextDiffer
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree that we should definitely unify this, especially since its fairly complex code

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, are there tests anywhere for this specifically on the razor side? I know we have the semantic tokens tests, but those might not exhaustively verify the algorithm here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #57245 to track unifying the implementations. I'm thinking it might make more sense logistics-wise to do the unifying work in a separate PR since it might take some time for the changes to propagate between Roslyn and Razor.

Razor has its own semantic tokens tests although I'm unaware of any that directly test the TextDiffer class itself. @ryanbrandenburg Would you know if these exist today? I agree it would be valuable to add tests if there aren't any - we could possibly include it as part of the above linked issue.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also be in favour of sharing this, though one thing to note is that the partitioning logic should be moved to this base class if that happens, as Razor has the SourceTextDiffer which uses this, but might want a different partition size.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@allisonchou I think these are the closest we have to tests for the TextDiffer.

I agree that we need to separate the Partitioning logic from the TextDiffer, that way if anything does need a strictly minimal TextDiff in the future it's possible.

Copy link
Copy Markdown
Member

@dibarbet dibarbet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a nice improvement in allocations!

/// since Razor's version of the algorithm performs better for semantic tokens. We may want to
/// consider unifying the two implementations in the future.
/// </remarks>
internal abstract class TextDiffer
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree that we should definitely unify this, especially since its fairly complex code

/// since Razor's version of the algorithm performs better for semantic tokens. We may want to
/// consider unifying the two implementations in the future.
/// </remarks>
internal abstract class TextDiffer
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, are there tests anywhere for this specifically on the razor side? I know we have the semantic tokens tests, but those might not exhaustively verify the algorithm here

// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

#nullable disable
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be removed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm without this we get a ton of nullability warnings throughout the file. Other benchmarks explicitly disable nullable as well. I think #nullable enabled is the default now?

var (middleX, middleY) = FindMiddleSnake(lowA, highA, lowB, highB, vf, vr);

// Recursively find the midpoint of the left half.
ComputeDiffRecursive(edits, lowA, middleX, lowB, middleY, vf, vr);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ComputeDiffRecursive

Instead of recursion, can we use a Stack here? That would make it possible to write this function as an iterator (IEnumerable<DiffEdit>), which would save allocation of the List.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I'm interpreting your comment correctly. In my latest push, I changed the List to an ArrayBuilder, which I think fixes the allocation issue? Or are you referring to the IReadOnlyList returned by the method?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was refering to the returned list. Feel free to do follow up on that in a separate PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, filed #57433 as a follow-up item since we're attempting to get this PR in for M2.

Copy link
Copy Markdown
Member

@tmat tmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@allisonchou allisonchou requested a review from tmat October 25, 2021 21:48
Copy link
Copy Markdown
Contributor

@ryanbrandenburg ryanbrandenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. I agree both that we should eventually unify the code here across Roslyn and Razor and that it needn't be in this PR.

Copy link
Copy Markdown
Member

@tmat tmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@allisonchou
Copy link
Copy Markdown
Contributor Author

@jinujoseph for m2 approval

protected override bool ContentEquals(
IReadOnlyList<int> oldSemanticTokens,
int oldIndex,
IReadOnlyList<int> newSemanticTokens,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reason we're using IReadOnlyList instead of some more concrete type without virtual dispatch?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're technically passing in ArraySegments which according to docs can't be compared via index unless cast to IReadOnlyList. Is there an alternate approach that would be preferable here instead? (If so, I'll likely address in a follow-up PR since we're trying to get this in for M2.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Approved to merge Area-IDE LSP issues related to the roslyn language server protocol implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants