Add support for draft of semantic tokens LSIF spec to LSIF generator by gundermanc · Pull Request #65860 · dotnet/roslyn

gundermanc · 2022-12-08T02:09:39Z

Adds support for draft of Semantic Tokens LSIF protocol to the LSIF generator.

This adds two additional bits of content to the LSIF output:

'semanticTokensProvidier': string[] - a new 'string[]' typed property in the capabilities vertex. Each item in the array represents a unique token type that maps to one or more color, font, and style presentation details.

{
  "hoverProvider": true,
  "declarationProvider": false,
  "definitionProvider": true,
  "referencesProvider": true,
  "typeDefinitionProvider": false,
  "documentSymbolProvider": false,
  "foldingRangeProvider": true,
  "diagnosticProvider": false,
  "semanticTokensProvider": [
    "tokenTypes": [
      "namespace",
      "type",
      "class",
      "enum",
      "interface",
      "struct",
      ...
    ]
  ],
  "id": 1,
  "type": "vertex",
  "label": "capabilities"
}

'textDocument/semanticTokens/full' - A new single vertex per document off of the document node describing the semantic tokens for the full document. These data are emitted in the exact same format as LSP semantic tokens.

{
    "outV":4,
    "inV":12,
    "id":15,
    "type":"edge",
    "label":"textDocument/semanticTokens/full"
}

{
    "result":
    {
        "data": [0,0,20,17,0,1,0,5,15,0,0,6,6,48,0,0,6,1,54,0,1,0,5,15,0,0,6,6,48,0,0,6,1,21,0,0,1,10,48,0,0,10,1,54,0,1,0,1,54,0,0,1,8,15,0,0,8,1,54,0,0,2,6,15,0,0,6,2,21,0,0,2,6,48,0,0,6,1,21,0,0,1,7,48,0,0,7,1,21,0,0,1,10,48,0,0,10,1,21,0,0,1,24,22,0,0,24,1,54,0,0,1,30,18,0,0,30,1,54,0,0,2,20,53,0,0,21,1,21,0,0,2,22,18,0,0,22,1,54,0,0,1,1,54,0]
    },
    "id":12,
    "type":"vertex",
    "label": "semanticTokensResult"
}

These changes increase the size of the LSIF output by approximately 6.5 percent in my tests generating LSIF on the LSP client repository and increases the generation duration by 15 percent.

jasonmalinowski · 2022-12-10T01:05:34Z


            AssertEx.EqualOrDiff(
-"{""hoverProvider"":true,""declarationProvider"":false,""definitionProvider"":true,""referencesProvider"":true,""typeDefinitionProvider"":false,""documentSymbolProvider"":false,""foldingRangeProvider"":true,""diagnosticProvider"":false,""id"":1,""type"":""vertex"",""label"":""capabilities""}
+"{""hoverProvider"":true,""declarationProvider"":false,""definitionProvider"":true,""referencesProvider"":true,""typeDefinitionProvider"":false,""documentSymbolProvider"":false,""foldingRangeProvider"":true,""diagnosticProvider"":false,""semanticTokensProvider"":{""tokenTypes"":[""namespace"",""type"",""class"",""enum"",""interface"",""struct"",""typeParameter"",""parameter"",""variable"",""property"",""enumMember"",""event"",""function"",""method"",""macro"",""keyword"",""modifier"",""comment"",""string"",""number"",""regexp"",""operator"",""class name"",""constant name"",""delegate name"",""enum member name"",""enum name"",""event name"",""excluded code"",""extension method name"",""field name"",""interface name"",""json - array"",""json - comment"",""json - constructor name"",""json - keyword"",""json - number"",""json - object"",""json - operator"",""json - property name"",""json - punctuation"",""json - string"",""json - text"",""keyword - control"",""label name"",""local name"",""method name"",""module name"",""namespace name"",""operator - overloaded"",""parameter name"",""preprocessor keyword"",""preprocessor text"",""property name"",""punctuation"",""record class name"",""record struct name"",""regex - alternation"",""regex - anchor"",""regex - character class"",""regex - comment"",""regex - grouping"",""regex - other escape"",""regex - quantifier"",""regex - self escaped character"",""regex - text"",""string - escape character"",""string - verbatim"",""struct name"",""text"",""type parameter name"",""whitespace"",""xml doc comment - attribute name"",""xml doc comment - attribute quotes"",""xml doc comment - attribute value"",""xml doc comment - cdata section"",""xml doc comment - comment"",""xml doc comment - delimiter"",""xml doc comment - entity reference"",""xml doc comment - name"",""xml doc comment - processing instruction"",""xml doc comment - text"",""xml literal - attribute name"",""xml literal - attribute quotes"",""xml literal - attribute value"",""xml literal - cdata section"",""xml literal - comment"",""xml literal - delimiter"",""xml literal - embedded expression"",""xml literal - entity reference"",""xml literal - name"",""xml literal - processing instruction"",""xml literal - text""],""tokenModifiers"":[]},""id"":1,""type"":""vertex"",""label"":""capabilities""}


I worry here that any time we update the list of semantic token types we'll have to go update this test. Can we update this test to itself grab the list of semantic token types and construct the expected string that way?

I think there's value to having at least one test that checks the actual list of token types because 1) there's a shared code path between LSP and LSIF here and 2) LSP and LSIF have slightly different expectations of which tokens get returned.

I want to make sure that evolution of LSP doesn't accidentally regress the returning of all token types in LSIF.

How about if I replace these two instances of the list with SemanticTokensHelpers.AllTokenTypes? That type is internal though so it will require IVT.

How about if I replace these two instances of the list with SemanticTokensHelpers.AllTokenTypes? That type is internal though so it will require IVT.

That'd be perfect; adding IVT is fine here.

Are you willing to go without this one? Assembling correctly indented JSON from a combination of strings and a list of strings is going to be sort of a pain though it's easy enough to do for the regular LSIF output.

jasonmalinowski · 2022-12-10T01:06:18Z

    ""documentSymbolProvider"": false,
    ""foldingRangeProvider"": true,
    ""diagnosticProvider"": false,
+    ""semanticTokensProvider"": {


Ditto -- it might worth finding a way not trying to hard code this since it'll be unfortunate having to change this any time we add a new type.

jasonmalinowski · 2022-12-10T01:07:32Z

+            ' - Empty file - generates empty data
+            ' - Single line
+            ' - Multi-line
+            ' - Roslyn custom token type ('x' in 'int x' uses a Roslyn custom token type.
+            ' - Token in last line with no trailing newline (ensure sufficient range was passed to LSP generator).


Can we line these comments up with the InlineDatas?

What do you mean? Adjust the indentation? The comments are within the function, so it seems more stylistically consistent to match the indentation of the code within, no?

So my assumption reading this comment was there's a 1:1 mapping to the InlineData arguments, so the comments could just be mixed together with the InlineDatas rather than having the lists in two places. And actually I'm seeing there's not a 1:1 mapping since there's five bullets here and four InlineDatas, so something seems funky.

If trying to smash the comments together is too hard, consider using a MemberData where another method can provide the test cases here.

there's five bullets here and four InlineDatas, so something seems funky

The last inline data covers both cases. The point of this comment though is to describe things that are determined specifically by the LSIF generator, not the shared SemanticTokensHelper code.

Ah that wasn't clear at all. Can we break this into a MemberData w/ the applicable comments before each?

jasonmalinowski · 2022-12-12T22:11:03Z

+    using System.Threading.Tasks;
+    using Newtonsoft.Json;
+
+    internal sealed class SemanticTokensCapabilities


Should we be using the SemanticTokensLegend type that already exists in LSP for this?

I see two possible outcomes over the evolution of this tool:

We never add any more semantic tokens capabilities - in that case, yes, using SemanticTokensLegend would eliminate one boilerplate class.

We add more semantic tokens capabilities - these will almost definitely be different than the LSP ones requiring a custom type anyways.

I don't think it's worth spending any more time on this. When we standardize Semantic Tokens in the LSIF spec, this will likely require changes regardless.

Can we track this question in the spec somewhere if it's not already? Or does the spec not care and this becomes a issue for the LSP library?

Yeah this will be part of the Semantic Tokens LSIF spec proposal. Currently this exists as just a draft in my document.

I'll send a spec addition proposal on the LSIF site once I have finished validating the approach works.

I need an LSIF generator NuPkg for that :)

jasonmalinowski · 2022-12-12T22:12:47Z

+            ' - Empty file - generates empty data
+            ' - Single line
+            ' - Multi-line
+            ' - Roslyn custom token type ('x' in 'int x' uses a Roslyn custom token type.
+            ' - Token in last line with no trailing newline (ensure sufficient range was passed to LSP generator).


So my assumption reading this comment was there's a 1:1 mapping to the InlineData arguments, so the comments could just be mixed together with the InlineDatas rather than having the lists in two places. And actually I'm seeing there's not a 1:1 mapping since there's five bullets here and four InlineDatas, so something seems funky.

If trying to smash the comments together is too hard, consider using a MemberData where another method can provide the test cases here.

CyrusNajmabadi · 2022-12-12T22:31:40Z

    ""scope"": ""project"",
    ""data"": 2,
-    ""id"": 11,
+    ""id"": 13,


if we added semantic tokens to the end, these woudln't have to update, right?

jasonmalinowski

Signing off, but there are still some test cleanups I'd appreciate:

Trying to make the tests not hardcode the classification list.
Clarifying the comments in TestSemanticTokensData for which scenarios apply to what tests.

Obviously both of these can be follow-ups. The first I suspect will be forced if/when the moment @CyrusNajmabadi wants to add another embedded language. We're getting close to the holidays, so I presume that's imminent.

gundermanc · 2022-12-13T21:33:10Z

Looks like another infrastructure issue? The integration tests failed.

ghost added Area-IDE Community The pull request was submitted by a contributor who is not a Microsoft employee. labels Dec 8, 2022

gundermanc changed the title ~~Dev/chgund/semantic tokens~~ Add support for draft of semantic tokens LSIF spec to LSIF generator Dec 8, 2022

gundermanc marked this pull request as ready for review December 8, 2022 02:30

gundermanc requested a review from a team as a code owner December 8, 2022 02:30

CyrusNajmabadi reviewed Dec 8, 2022

View reviewed changes

Comment thread src/Features/Lsif/Generator/Generator.cs Outdated

CyrusNajmabadi reviewed Dec 8, 2022

View reviewed changes

Comment thread src/Features/Lsif/Generator/Generator.cs Outdated

CyrusNajmabadi reviewed Dec 8, 2022

View reviewed changes

Comment thread src/Features/Lsif/Generator/Generator.cs

NTaylorMullen reviewed Dec 8, 2022

View reviewed changes

Comment thread src/Features/Lsif/Generator/Generator.cs Outdated

Comment thread src/Features/Lsif/Generator/Generator.cs

Comment thread src/Features/Lsif/Generator/Generator.cs Outdated

Comment thread src/Features/Lsif/Generator/Graph/Capabilities.cs Outdated

CyrusNajmabadi reviewed Dec 8, 2022

View reviewed changes

Comment thread src/Features/Lsif/GeneratorTest/OutputFormatTests.vb Outdated

gundermanc force-pushed the dev/chgund/SemanticTokens branch from 38ebc65 to 61a1bb3 Compare December 9, 2022 00:00

jasonmalinowski reviewed Dec 10, 2022

View reviewed changes

gundermanc force-pushed the dev/chgund/SemanticTokens branch from 61a1bb3 to 5abce19 Compare December 11, 2022 19:18

gundermanc requested a review from jasonmalinowski December 11, 2022 19:27

gundermanc force-pushed the dev/chgund/SemanticTokens branch 2 times, most recently from e3f5601 to f64e453 Compare December 12, 2022 18:43

gundermanc requested review from CyrusNajmabadi and removed request for jasonmalinowski December 12, 2022 20:49

jasonmalinowski self-requested a review December 12, 2022 21:07

jasonmalinowski reviewed Dec 12, 2022

View reviewed changes