Skip to content

Add support for draft of semantic tokens LSIF spec to LSIF generator#65860

Merged
dibarbet merged 2 commits into
dotnet:mainfrom
gundermanc:dev/chgund/SemanticTokens
Dec 13, 2022
Merged

Add support for draft of semantic tokens LSIF spec to LSIF generator#65860
dibarbet merged 2 commits into
dotnet:mainfrom
gundermanc:dev/chgund/SemanticTokens

Conversation

@gundermanc

@gundermanc gundermanc commented Dec 8, 2022

Copy link
Copy Markdown
Contributor

Adds support for draft of Semantic Tokens LSIF protocol to the LSIF generator.

This adds two additional bits of content to the LSIF output:

  • 'semanticTokensProvidier': string[] - a new 'string[]' typed property in the capabilities vertex. Each item in the array represents a unique token type that maps to one or more color, font, and style presentation details.
{
  "hoverProvider": true,
  "declarationProvider": false,
  "definitionProvider": true,
  "referencesProvider": true,
  "typeDefinitionProvider": false,
  "documentSymbolProvider": false,
  "foldingRangeProvider": true,
  "diagnosticProvider": false,
  "semanticTokensProvider": [
    "tokenTypes": [
      "namespace",
      "type",
      "class",
      "enum",
      "interface",
      "struct",
      ...
    ]
  ],
  "id": 1,
  "type": "vertex",
  "label": "capabilities"
}
  • 'textDocument/semanticTokens/full' - A new single vertex per document off of the document node describing the semantic tokens for the full document. These data are emitted in the exact same format as LSP semantic tokens.
{
    "outV":4,
    "inV":12,
    "id":15,
    "type":"edge",
    "label":"textDocument/semanticTokens/full"
}

{
    "result":
    {
        "data": [0,0,20,17,0,1,0,5,15,0,0,6,6,48,0,0,6,1,54,0,1,0,5,15,0,0,6,6,48,0,0,6,1,21,0,0,1,10,48,0,0,10,1,54,0,1,0,1,54,0,0,1,8,15,0,0,8,1,54,0,0,2,6,15,0,0,6,2,21,0,0,2,6,48,0,0,6,1,21,0,0,1,7,48,0,0,7,1,21,0,0,1,10,48,0,0,10,1,21,0,0,1,24,22,0,0,24,1,54,0,0,1,30,18,0,0,30,1,54,0,0,2,20,53,0,0,21,1,21,0,0,2,22,18,0,0,22,1,54,0,0,1,1,54,0]
    },
    "id":12,
    "type":"vertex",
    "label": "semanticTokensResult"
}

These changes increase the size of the LSIF output by approximately 6.5 percent in my tests generating LSIF on the LSP client repository and increases the generation duration by 15 percent.

@ghost ghost added Area-IDE Community The pull request was submitted by a contributor who is not a Microsoft employee. labels Dec 8, 2022
@gundermanc gundermanc changed the title Dev/chgund/semantic tokens Add support for draft of semantic tokens LSIF spec to LSIF generator Dec 8, 2022
@gundermanc gundermanc marked this pull request as ready for review December 8, 2022 02:30
@gundermanc gundermanc requested a review from a team as a code owner December 8, 2022 02:30
Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
Comment thread src/Features/Lsif/Generator/Generator.cs
Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
Comment thread src/Features/Lsif/Generator/Generator.cs
Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
Comment thread src/Features/Lsif/Generator/Graph/Capabilities.cs Outdated
Comment thread src/Features/Lsif/GeneratorTest/OutputFormatTests.vb Outdated
@gundermanc gundermanc force-pushed the dev/chgund/SemanticTokens branch from 38ebc65 to 61a1bb3 Compare December 9, 2022 00:00
Comment thread src/Features/Lsif/Generator/CompilerInvocation.cs Outdated
Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
Comment thread src/Features/Lsif/Generator/Graph/Capabilities.cs Outdated

AssertEx.EqualOrDiff(
"{""hoverProvider"":true,""declarationProvider"":false,""definitionProvider"":true,""referencesProvider"":true,""typeDefinitionProvider"":false,""documentSymbolProvider"":false,""foldingRangeProvider"":true,""diagnosticProvider"":false,""id"":1,""type"":""vertex"",""label"":""capabilities""}
"{""hoverProvider"":true,""declarationProvider"":false,""definitionProvider"":true,""referencesProvider"":true,""typeDefinitionProvider"":false,""documentSymbolProvider"":false,""foldingRangeProvider"":true,""diagnosticProvider"":false,""semanticTokensProvider"":{""tokenTypes"":[""namespace"",""type"",""class"",""enum"",""interface"",""struct"",""typeParameter"",""parameter"",""variable"",""property"",""enumMember"",""event"",""function"",""method"",""macro"",""keyword"",""modifier"",""comment"",""string"",""number"",""regexp"",""operator"",""class name"",""constant name"",""delegate name"",""enum member name"",""enum name"",""event name"",""excluded code"",""extension method name"",""field name"",""interface name"",""json - array"",""json - comment"",""json - constructor name"",""json - keyword"",""json - number"",""json - object"",""json - operator"",""json - property name"",""json - punctuation"",""json - string"",""json - text"",""keyword - control"",""label name"",""local name"",""method name"",""module name"",""namespace name"",""operator - overloaded"",""parameter name"",""preprocessor keyword"",""preprocessor text"",""property name"",""punctuation"",""record class name"",""record struct name"",""regex - alternation"",""regex - anchor"",""regex - character class"",""regex - comment"",""regex - grouping"",""regex - other escape"",""regex - quantifier"",""regex - self escaped character"",""regex - text"",""string - escape character"",""string - verbatim"",""struct name"",""text"",""type parameter name"",""whitespace"",""xml doc comment - attribute name"",""xml doc comment - attribute quotes"",""xml doc comment - attribute value"",""xml doc comment - cdata section"",""xml doc comment - comment"",""xml doc comment - delimiter"",""xml doc comment - entity reference"",""xml doc comment - name"",""xml doc comment - processing instruction"",""xml doc comment - text"",""xml literal - attribute name"",""xml literal - attribute quotes"",""xml literal - attribute value"",""xml literal - cdata section"",""xml literal - comment"",""xml literal - delimiter"",""xml literal - embedded expression"",""xml literal - entity reference"",""xml literal - name"",""xml literal - processing instruction"",""xml literal - text""],""tokenModifiers"":[]},""id"":1,""type"":""vertex"",""label"":""capabilities""}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry here that any time we update the list of semantic token types we'll have to go update this test. Can we update this test to itself grab the list of semantic token types and construct the expected string that way?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's value to having at least one test that checks the actual list of token types because 1) there's a shared code path between LSP and LSIF here and 2) LSP and LSIF have slightly different expectations of which tokens get returned.

I want to make sure that evolution of LSP doesn't accidentally regress the returning of all token types in LSIF.

How about if I replace these two instances of the list with SemanticTokensHelpers.AllTokenTypes? That type is internal though so it will require IVT.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if I replace these two instances of the list with SemanticTokensHelpers.AllTokenTypes? That type is internal though so it will require IVT.

That'd be perfect; adding IVT is fine here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you willing to go without this one? Assembling correctly indented JSON from a combination of strings and a list of strings is going to be sort of a pain though it's easy enough to do for the regular LSIF output.

""documentSymbolProvider"": false,
""foldingRangeProvider"": true,
""diagnosticProvider"": false,
""semanticTokensProvider"": {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto -- it might worth finding a way not trying to hard code this since it'll be unfortunate having to change this any time we add a new type.

Comment on lines +28 to +32
' - Empty file - generates empty data
' - Single line
' - Multi-line
' - Roslyn custom token type ('x' in 'int x' uses a Roslyn custom token type.
' - Token in last line with no trailing newline (ensure sufficient range was passed to LSP generator).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we line these comments up with the InlineDatas?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Adjust the indentation? The comments are within the function, so it seems more stylistically consistent to match the indentation of the code within, no?

@jasonmalinowski jasonmalinowski Dec 12, 2022

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my assumption reading this comment was there's a 1:1 mapping to the InlineData arguments, so the comments could just be mixed together with the InlineDatas rather than having the lists in two places. And actually I'm seeing there's not a 1:1 mapping since there's five bullets here and four InlineDatas, so something seems funky.

If trying to smash the comments together is too hard, consider using a MemberData where another method can provide the test cases here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's five bullets here and four InlineDatas, so something seems funky

The last inline data covers both cases. The point of this comment though is to describe things that are determined specifically by the LSIF generator, not the shared SemanticTokensHelper code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that wasn't clear at all. Can we break this into a MemberData w/ the applicable comments before each?

Comment thread src/Features/Lsif/GeneratorTest/SemanticTokensTests.vb Outdated
@gundermanc gundermanc force-pushed the dev/chgund/SemanticTokens branch from 61a1bb3 to 5abce19 Compare December 11, 2022 19:18
@gundermanc gundermanc force-pushed the dev/chgund/SemanticTokens branch 2 times, most recently from e3f5601 to f64e453 Compare December 12, 2022 18:43
@gundermanc gundermanc requested review from CyrusNajmabadi and removed request for jasonmalinowski December 12, 2022 20:49
@jasonmalinowski jasonmalinowski self-requested a review December 12, 2022 21:07
Comment thread src/Features/Lsif/Generator/CompilerInvocation.cs Outdated
Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
using System.Threading.Tasks;
using Newtonsoft.Json;

internal sealed class SemanticTokensCapabilities

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be using the SemanticTokensLegend type that already exists in LSP for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two possible outcomes over the evolution of this tool:

  • We never add any more semantic tokens capabilities - in that case, yes, using SemanticTokensLegend would eliminate one boilerplate class.

  • We add more semantic tokens capabilities - these will almost definitely be different than the LSP ones requiring a custom type anyways.

I don't think it's worth spending any more time on this. When we standardize Semantic Tokens in the LSIF spec, this will likely require changes regardless.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we track this question in the spec somewhere if it's not already? Or does the spec not care and this becomes a issue for the LSP library?

@gundermanc gundermanc Dec 13, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this will be part of the Semantic Tokens LSIF spec proposal. Currently this exists as just a draft in my document.

I'll send a spec addition proposal on the LSIF site once I have finished validating the approach works.

I need an LSIF generator NuPkg for that :)

Comment on lines +28 to +32
' - Empty file - generates empty data
' - Single line
' - Multi-line
' - Roslyn custom token type ('x' in 'int x' uses a Roslyn custom token type.
' - Token in last line with no trailing newline (ensure sufficient range was passed to LSP generator).

@jasonmalinowski jasonmalinowski Dec 12, 2022

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my assumption reading this comment was there's a 1:1 mapping to the InlineData arguments, so the comments could just be mixed together with the InlineDatas rather than having the lists in two places. And actually I'm seeing there's not a 1:1 mapping since there's five bullets here and four InlineDatas, so something seems funky.

If trying to smash the comments together is too hard, consider using a MemberData where another method can provide the test cases here.

Comment thread src/Features/Lsif/Generator/Generator.cs Outdated
Comment thread src/Features/Lsif/GeneratorTest/SemanticTokensTests.vb
Comment thread src/Features/Lsif/GeneratorTest/Utilities/TestLsifOutput.vb Outdated
Comment thread src/Features/Lsif/GeneratorTest/SemanticTokensTests.vb Outdated
""scope"": ""project"",
""data"": 2,
""id"": 11,
""id"": 13,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we added semantic tokens to the end, these woudln't have to update, right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably.

@jasonmalinowski jasonmalinowski left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signing off, but there are still some test cleanups I'd appreciate:

  1. Trying to make the tests not hardcode the classification list.
  2. Clarifying the comments in TestSemanticTokensData for which scenarios apply to what tests.

Obviously both of these can be follow-ups. The first I suspect will be forced if/when the moment @CyrusNajmabadi wants to add another embedded language. We're getting close to the holidays, so I presume that's imminent.

@gundermanc gundermanc force-pushed the dev/chgund/SemanticTokens branch 2 times, most recently from 82b51ff to 6f4a036 Compare December 13, 2022 00:17
@gundermanc gundermanc force-pushed the dev/chgund/SemanticTokens branch 2 times, most recently from c600f45 to e97fbe6 Compare December 13, 2022 01:08
@gundermanc gundermanc force-pushed the dev/chgund/SemanticTokens branch from e97fbe6 to b9a0e72 Compare December 13, 2022 02:08
@gundermanc gundermanc force-pushed the dev/chgund/SemanticTokens branch from 703a3db to b4e97fd Compare December 13, 2022 18:59
@gundermanc

Copy link
Copy Markdown
Contributor Author

Looks like another infrastructure issue? The integration tests failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-IDE Community The pull request was submitted by a contributor who is not a Microsoft employee.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants