Set an encoding on source text by davidwengier · Pull Request #49436 · dotnet/roslyn

davidwengier · 2020-11-17T10:54:33Z

Fixes AB#1244286 (or possibly, works around?)

The remote workspace caches solutions based on their checksum, and so LUT can sometimes see our LSP solutions if its text is the same as the workspace, but can't use them because LUT needs to do a build, and build can't work without an encoding.

This unblocks LUT by setting an encoding. I think something should change with OOP that makes this not necessary, but that's a longer term fix and this doesn't seem to cost anything.

FYI @shyamnamboodiripad @drognanar

shyamnamboodiripad · 2020-11-17T17:59:05Z

@davidwengier So based on the investigation that @drognanar performed, it seemed like the VS-side snapshots that LUT is being told about via Workspace events were the 'real' snapshots (i.e. not LSP-based). The problem was that when it tried to retrieve the corresponding snapshot on the OOP side, LUT would get an LSP-based snapshot instead.

The current change would certainly workaround the problem. But it seems to me that a better fix may be to make the LSP-based snapshots to have a different checksum from the 'real' ones so that features (not only LUT, but also other analysis) that don't need to see the LSP-based snapshots on the remote side won't see them.

Now I'm not sure how different the LSP-based snapshots are from the real ones - if they are identical except for the encoding perhaps the current change is good enough. But if there are other differences that are not captured in the checksum OR if there are scenarios where the real (user-specified) encoding is significant for some analysis (or in this case LUT's build), we could run into problems in the future....

CyrusNajmabadi · 2020-11-17T18:27:38Z

Now I'm not sure how different the LSP-based snapshots are from the real ones

How do you define a 'real one' here?

shyamnamboodiripad · 2020-11-17T18:42:06Z

How do you define a 'real one' here?

I am thinking of this as the snapshot where the user-specified encoding is preserved currently. This is the snapshot LUT sees currently when it listens to Workspace events on the VS side.

CyrusNajmabadi · 2020-11-17T18:47:13Z

I am thinking of this as the snapshot where the user-specified encoding is preserved currently. This is the snapshot LUT sees when it listens to Workspace events on the VS side.

Hrmm. @davidwengier would it be somehow possible for LUT to use the LSP document? Similar to what we're doing now with Diagnostics where we have diags just use LSP's doc instead?

shyamnamboodiripad · 2020-11-17T19:10:39Z

@CyrusNajmabadi @davidwengier On the VS side, LUT is listening to workspace changed events (i.e. events on VisualStudioWorkspace). Not sure if VisualStudioWorkspace events can / should switch to the LSP documents (and what other features that might impact).

However, even if LUT were to only see the LSP documents and snapshots, after the current change we would be hardcoding the LSP document encoding to UTF8 as opposed to the user-specified encoding for the original document and I'm not sure if that can lead to problems down the line... Also not sure if this can impact any other (3rd party?) analyzers out there that actually care about the user-specified encoding for documents...

drognanar · 2020-11-17T23:19:41Z

I tried out this PR and LUT ran successfully and no longer has any compilation errors.

drognanar · 2020-11-17T23:20:16Z

I also don't know how the documents are loaded into the Solution(s) in the codespaces scenario.
As long as they synchronize with the content from the VS client and have the same encoding, then we could use the LSP document as well.

It's still somewhat surprising to me that the VisualStudioWorkspace's solution did define an encoding, but not the OOP process. Are there any other things that these solutions could differ on because of LSP?

davidwengier · 2020-11-18T00:13:17Z

make the LSP-based snapshots to have a different checksum from the 'real' ones so that features (not only LUT, but also other analysis) that don't need to see the LSP-based snapshots on the remote side won't see them.

I thought about this, but ultimately that would just be defeating a caching mechanism. If the checksums are the same, then that means the text is the same, so there should be no reason not to let multiple OOP requests use the same snapshot. The checksums actually handle the "no encoding" case by using UTF8 anyway, so this is really just making it explicit for the build tasks.

Now I'm not sure how different the LSP-based snapshots are from the real ones

They're not, otherwise the checksums wouldn't match. It's certainly possible, from a LUT point of view, that the LSP solution could be behind and not have the up to date text, but in that case the checksum would be different so this issue wouldn't occur.

if there are other differences that are not captured in the checksum

I would rather leave this for the future, if its an issue, to be fixed in OOP in general. LSP isn't doing anything weird here, or using internal APIs, so its possible this issue will come up in general (eg, what happens if the user supplies a Source Generator that adds source without an encoding?) but it doesn't make sense to me to block this fix.

would it be somehow possible for LUT to use the LSP document?

That is what is happening, and why this issue occurs. The interesting part is that it only gets the LSP document if LSP is caught up with the workspace, since the workspace event is what triggers LUT to ask for the documents. The real question is should LUT be listening to didChange and using that as a trigger to rebuild. At the moment there is a potential timing issue where LUT results could be out of date due to the workspace being a bit behind, but I suspect the immediacy expectations of LUT are very different to something like diagnostics, so it might not matter in the real world.

we would be hardcoding the LSP document encoding to UTF8 as opposed to the user-specified encoding

This is hard coding UTF-8 because that is what the LSP client sends us. If the user, on the client, specifies UTF-7 or UTF-32, that is either not supported by LSP, or is converted to UTF-8 before being sent to us. I don't know which of those is true, but its a client issue to deal with, per the spec. It's worth noting that where this change is being made, we're receiving the whole document text, and we always throw away what is in the workspace for that document when creating out solution, so there is no concern here of mixing encodings.

Are there any other things that these solutions could differ on because of LSP?

No, at the moment the only state that LSP holds is text content for documents that are open on the client. We have talked at times about other approaches, like having a dedicated LSP workspace, but we've never quite needed it, and it could cause lots of other issues - like breaking LUT probably :)

davidwengier · 2020-11-18T00:18:31Z

I'm merging this, but I've tried to reply to everything inline. If anyone wants to chat more, feel free to reach out over teams. It gets complicated trying to talk about all of this in text form :)

shyamnamboodiripad · 2020-11-18T01:11:05Z

@davidwengier Based on your responses, I agree the workaround is reasonable. We'll keep an eye out for any future feedback for LUT that may be related to encoding / LSP 😄

Also your comment around the immediacy expectation for LUT being lower is more or less accurate. LUT already operates on a delay so even though its listening to all workspace events, it only kicks off a build after a short delay after user has stopped typing. Assuming that the lag you mention is reasonable (e.g. that the workspace is not behind by several seconds), the main user expectation for LUT is that the build (and test results) will be eventually correct...

Set an encoding on source text

98bff01

davidwengier added the Area-IDE label Nov 17, 2020

davidwengier requested a review from a team as a code owner November 17, 2020 10:54

CyrusNajmabadi approved these changes Nov 17, 2020

View reviewed changes

davidwengier merged commit fb777bd into dotnet:master Nov 18, 2020

davidwengier deleted the SetEncodingOnLSPText branch November 18, 2020 00:18

ghost added this to the Next milestone Nov 18, 2020

allisonchou modified the milestones: Next, 16.9.P2 Nov 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set an encoding on source text#49436

Set an encoding on source text#49436
davidwengier merged 1 commit intodotnet:masterfrom
davidwengier:SetEncodingOnLSPText

davidwengier commented Nov 17, 2020 •

edited by Pilchie

Loading

Uh oh!

shyamnamboodiripad commented Nov 17, 2020

Uh oh!

CyrusNajmabadi commented Nov 17, 2020

Uh oh!

shyamnamboodiripad commented Nov 17, 2020 •

edited

Loading

Uh oh!

CyrusNajmabadi commented Nov 17, 2020

Uh oh!

shyamnamboodiripad commented Nov 17, 2020 •

edited

Loading

Uh oh!

drognanar commented Nov 17, 2020

Uh oh!

drognanar commented Nov 17, 2020

Uh oh!

davidwengier commented Nov 18, 2020 •

edited

Loading

Uh oh!

davidwengier commented Nov 18, 2020

Uh oh!

shyamnamboodiripad commented Nov 18, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

davidwengier commented Nov 17, 2020 • edited by Pilchie Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shyamnamboodiripad commented Nov 17, 2020

Uh oh!

CyrusNajmabadi commented Nov 17, 2020

Uh oh!

shyamnamboodiripad commented Nov 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CyrusNajmabadi commented Nov 17, 2020

Uh oh!

shyamnamboodiripad commented Nov 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

drognanar commented Nov 17, 2020

Uh oh!

drognanar commented Nov 17, 2020

Uh oh!

davidwengier commented Nov 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davidwengier commented Nov 18, 2020

Uh oh!

shyamnamboodiripad commented Nov 18, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

davidwengier commented Nov 17, 2020 •

edited by Pilchie

Loading

shyamnamboodiripad commented Nov 17, 2020 •

edited

Loading

shyamnamboodiripad commented Nov 17, 2020 •

edited

Loading

davidwengier commented Nov 18, 2020 •

edited

Loading