[LSP] Add semantic tokens benchmarks + improve edit algorithm#57231
[LSP] Add semantic tokens benchmarks + improve edit algorithm#57231allisonchou merged 17 commits intodotnet:release/dev17.1from
Conversation
|
|
||
| namespace Microsoft.CodeAnalysis.LanguageServer.Handler.SemanticTokens | ||
| { | ||
| internal readonly struct DiffEdit |
There was a problem hiding this comment.
This file is a direct copy of Razor's: https://github.com/dotnet/razor-tooling/blob/main/src/Razor/src/Microsoft.AspNetCore.Razor.LanguageServer/DiffEdit.cs
We might eventually want to consider unifying Roslyn/Razor's implementations to avoid code duplication.
| /// since Razor's version of the algorithm performs better for semantic tokens. We may want to | ||
| /// consider unifying the two implementations in the future. | ||
| /// </remarks> | ||
| internal abstract class TextDiffer |
There was a problem hiding this comment.
As noted in the comment above, this class is also from Razor: https://github.com/dotnet/razor-tooling/blob/main/src/Razor/src/Microsoft.AspNetCore.Razor.LanguageServer/TextDiffer.cs
There was a problem hiding this comment.
agree that we should definitely unify this, especially since its fairly complex code
There was a problem hiding this comment.
also, are there tests anywhere for this specifically on the razor side? I know we have the semantic tokens tests, but those might not exhaustively verify the algorithm here
There was a problem hiding this comment.
Filed #57245 to track unifying the implementations. I'm thinking it might make more sense logistics-wise to do the unifying work in a separate PR since it might take some time for the changes to propagate between Roslyn and Razor.
Razor has its own semantic tokens tests although I'm unaware of any that directly test the TextDiffer class itself. @ryanbrandenburg Would you know if these exist today? I agree it would be valuable to add tests if there aren't any - we could possibly include it as part of the above linked issue.
There was a problem hiding this comment.
Would also be in favour of sharing this, though one thing to note is that the partitioning logic should be moved to this base class if that happens, as Razor has the SourceTextDiffer which uses this, but might want a different partition size.
There was a problem hiding this comment.
@allisonchou I think these are the closest we have to tests for the TextDiffer.
I agree that we need to separate the Partitioning logic from the TextDiffer, that way if anything does need a strictly minimal TextDiff in the future it's possible.
dibarbet
left a comment
There was a problem hiding this comment.
that's a nice improvement in allocations!
| /// since Razor's version of the algorithm performs better for semantic tokens. We may want to | ||
| /// consider unifying the two implementations in the future. | ||
| /// </remarks> | ||
| internal abstract class TextDiffer |
There was a problem hiding this comment.
agree that we should definitely unify this, especially since its fairly complex code
src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs
Show resolved
Hide resolved
src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs
Outdated
Show resolved
Hide resolved
| /// since Razor's version of the algorithm performs better for semantic tokens. We may want to | ||
| /// consider unifying the two implementations in the future. | ||
| /// </remarks> | ||
| internal abstract class TextDiffer |
There was a problem hiding this comment.
also, are there tests anywhere for this specifically on the razor side? I know we have the semantic tokens tests, but those might not exhaustively verify the algorithm here
src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs
Outdated
Show resolved
Hide resolved
src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsHandler.cs
Outdated
Show resolved
Hide resolved
| // The .NET Foundation licenses this file to you under the MIT license. | ||
| // See the LICENSE file in the project root for more information. | ||
|
|
||
| #nullable disable |
There was a problem hiding this comment.
Hmm without this we get a ton of nullability warnings throughout the file. Other benchmarks explicitly disable nullable as well. I think #nullable enabled is the default now?
src/Features/LanguageServer/Protocol/Handler/SemanticTokens/TextDiffer.cs
Outdated
Show resolved
Hide resolved
src/Features/LanguageServer/Protocol/Handler/SemanticTokens/TextDiffer.cs
Outdated
Show resolved
Hide resolved
src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs
Outdated
Show resolved
Hide resolved
| var (middleX, middleY) = FindMiddleSnake(lowA, highA, lowB, highB, vf, vr); | ||
|
|
||
| // Recursively find the midpoint of the left half. | ||
| ComputeDiffRecursive(edits, lowA, middleX, lowB, middleY, vf, vr); |
There was a problem hiding this comment.
I'm not sure if I'm interpreting your comment correctly. In my latest push, I changed the List to an ArrayBuilder, which I think fixes the allocation issue? Or are you referring to the IReadOnlyList returned by the method?
There was a problem hiding this comment.
I was refering to the returned list. Feel free to do follow up on that in a separate PR.
There was a problem hiding this comment.
Gotcha, filed #57433 as a follow-up item since we're attempting to get this PR in for M2.
src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs
Outdated
Show resolved
Hide resolved
ryanbrandenburg
left a comment
There was a problem hiding this comment.
Changes look good to me. I agree both that we should eventually unify the code here across Roslyn and Razor and that it needn't be in this PR.
|
@jinujoseph for m2 approval |
| protected override bool ContentEquals( | ||
| IReadOnlyList<int> oldSemanticTokens, | ||
| int oldIndex, | ||
| IReadOnlyList<int> newSemanticTokens, |
There was a problem hiding this comment.
reason we're using IReadOnlyList instead of some more concrete type without virtual dispatch?
There was a problem hiding this comment.
We're technically passing in ArraySegments which according to docs can't be compared via index unless cast to IReadOnlyList. Is there an alternate approach that would be preferable here instead? (If so, I'll likely address in a follow-up PR since we're trying to get this in for M2.)
src/Features/LanguageServer/Protocol/Handler/SemanticTokens/SemanticTokensEditsDiffer.cs
Show resolved
Hide resolved
…PSemanticTokensBenchmark2
This PR accomplishes two main tasks:
Fixes:
dotnet/razor#4308
https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1411329/
Thanks to Tomas and Ryan for their advice in this area 😄
Perf Results
Current Algorithm Results: The ideal partition size was found to be 50. With two different sets of 600k tokens each and partitioning, the results are as follows:
Previously, we would throw an OOM and wouldn't even be able to process the above case. The new algorithm (Razor's) without partitioning took about ~6mins, although unfortunately it seems like I've lost the exact numbers.
Other Cases Tested:
Partition size: I also tested sizes of 150, 100, 75, and 25. Interestingly, if we stuck with using Roslyn's LCS algorithm, 100 would be the ideal partition size instead of 50.
Roslyn vs. Razor algorithms: Razor's LCS algorithm beat out Roslyn's when testing with both partitioning and non-partitioning. The below is Roslyn's best run with partitioning (partition size 100). Somehow I lost the "AllTokens" case when copying down the results, but I think the difference is clear nonetheless:
More Perf Details:
Details
Editing to add more perf details as some indicated it might be valuable to include. As mentioned before, unfortunately it looks like I lost the "before" results, but tl;dr Roslyn's original algorithm hit OOM in these cases and Razor's took 6+ minutes.Razor Algorithm - Partition Size 150:
Razor Algorithm - Partition Size 75:
Razor Algorithm - Partition Size 50 (IDEAL):
Razor Algorithm - Partition Size 25: