Set an encoding on source text #49436
Conversation
@davidwengier So based on the investigation that @drognanar performed, it seemed that the VS-side snapshots that LUT is being told about via Workspace events were the 'real' snapshots (i.e. not LSP-based). The problem was that when LUT tried to retrieve the corresponding snapshot on the OOP side, it would get an LSP-based snapshot instead. The current change would certainly work around the problem, but it seems to me that a better fix may be to make the LSP-based snapshots have a different checksum from the 'real' ones, so that features (not only LUT, but also other analysis) that don't need to see the LSP-based snapshots on the remote side won't see them. Now, I'm not sure how different the LSP-based snapshots are from the real ones - if they are identical except for the encoding, perhaps the current change is good enough. But if there are other differences that are not captured in the checksum, OR if there are scenarios where the real (user-specified) encoding is significant for some analysis (or, in this case, LUT's build), we could run into problems in the future...
How do you define a 'real one' here?
I am thinking of it as the snapshot where the user-specified encoding is preserved. This is the snapshot LUT currently sees when it listens to Workspace events on the VS side.
Hrmm. @davidwengier would it be somehow possible for LUT to use the LSP document? Similar to what we're doing now with Diagnostics, where we have diags just use LSP's doc instead?
|
@CyrusNajmabadi @davidwengier On the VS side, LUT is listening to workspace changed events. However, even if LUT were to only see the LSP documents and snapshots, after the current change we would be hardcoding the LSP document encoding to UTF-8 as opposed to the user-specified encoding for the original document, and I'm not sure whether that can lead to problems down the line... Also, I'm not sure whether this can impact any other (3rd party?) analyzers out there that actually care about the user-specified encoding for documents...
|
I tried out this PR and LUT ran successfully and no longer has any compilation errors. |
|
I also don't know how the documents are loaded into the Solution(s) in the codespaces scenario. It's still somewhat surprising to me that the VisualStudioWorkspace's solution did define an encoding, but the OOP process's did not. Are there other ways these solutions could differ because of LSP?
I thought about this, but ultimately that would just be defeating a caching mechanism. If the checksums are the same, then the text is the same, so there should be no reason not to let multiple OOP requests use the same snapshot. The checksums actually handle the "no encoding" case by using UTF-8 anyway, so this change really just makes that explicit for the build tasks.
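To make the caching point concrete, here is a minimal sketch (in Python, purely illustrative - this is not Roslyn's actual checksum code) of a checksum that falls back to UTF-8 when a document has no encoding. Under that rule, a snapshot with no encoding and one with an explicit UTF-8 encoding hash identically, which is why the remote cache can hand LUT an LSP-based snapshot for the same text:

```python
import hashlib

def text_checksum(text, encoding=None):
    """Hypothetical text checksum: fall back to UTF-8 when no encoding is set."""
    effective = encoding or "utf-8"
    return hashlib.sha256(text.encode(effective)).hexdigest()

# A document with no encoding and one explicitly marked UTF-8 collide,
# so a checksum-keyed cache treats them as the same snapshot.
assert text_checksum("class C { }") == text_checksum("class C { }", "utf-8")

# Different text still produces a different checksum, so a stale LSP
# solution would simply miss the cache rather than be served incorrectly.
assert text_checksum("class C { }") != text_checksum("class D { }", "utf-8")
```

This is the trade-off discussed above: making the LSP snapshots hash differently would avoid the collision, but only by defeating the cache for texts that are genuinely identical.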
They're not, otherwise the checksums wouldn't match. It's certainly possible, from a LUT point of view, that the LSP solution could be behind and not have the up to date text, but in that case the checksum would be different so this issue wouldn't occur.
I would rather leave this for the future, if it's an issue, to be fixed in OOP in general. LSP isn't doing anything weird here, or using internal APIs, so it's possible this issue will come up in general (e.g., what happens if the user supplies a source generator that adds source without an encoding?), but it doesn't make sense to me to block this fix.
That is what is happening, and why this issue occurs. The interesting part is that it only gets the LSP document if LSP is caught up with the workspace, since the workspace event is what triggers LUT to ask for the documents. The real question is whether LUT should be listening to didChange and using that as a trigger to rebuild. At the moment there is a potential timing issue where LUT results could be out of date due to the workspace being a bit behind, but I suspect the immediacy expectations of LUT are very different from something like diagnostics, so it might not matter in the real world.
This is hardcoding UTF-8 because that is what the LSP client sends us. If the user, on the client, specifies UTF-7 or UTF-32, that is either not supported by LSP, or is converted to UTF-8 before being sent to us. I don't know which of those is true, but it's a client issue to deal with, per the spec. It's worth noting that where this change is being made, we're receiving the whole document text, and we always throw away what is in the workspace for that document when creating our solution, so there is no concern here about mixing encodings.
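The client-side conversion described above can be sketched as follows (a hypothetical illustration, not the LSP client's actual code): whatever encoding the file has on disk, the client decodes it to a string before putting it on the wire, so by the time the server sees the text, the original encoding is already gone:

```python
# A file saved on disk with a user-specified encoding (UTF-32 here).
raw_bytes = "caf\u00e9".encode("utf-32")

# The client decodes using the file's own encoding...
decoded = raw_bytes.decode("utf-32")

# ...and the text crosses the wire as a plain string, which the server
# can only round-trip by picking an encoding itself (hence UTF-8).
wire_payload = decoded.encode("utf-8")
assert wire_payload.decode("utf-8") == "café"
```

This is why the server hardcoding UTF-8 is consistent with the spec: the user-specified encoding is a client-side concern that never reaches the server.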
No, at the moment the only state that LSP holds is text content for documents that are open on the client. We have talked at times about other approaches, like having a dedicated LSP workspace, but we've never quite needed it, and it could cause lots of other issues - like breaking LUT, probably :)
|
I'm merging this, but I've tried to reply to everything inline. If anyone wants to chat more, feel free to reach out over teams. It gets complicated trying to talk about all of this in text form :) |
|
@davidwengier Based on your responses, I agree the workaround is reasonable. We'll keep an eye out for any future feedback for LUT that may be related to encoding / LSP 😄 Also, your comment about the immediacy expectation for LUT being lower is more or less accurate. LUT already operates on a delay, so even though it's listening to all workspace events, it only kicks off a build a short while after the user has stopped typing. Assuming that the lag you mention is reasonable (e.g. that the workspace is not behind by several seconds), the main user expectation for LUT is that the build (and test results) will be eventually correct...
Fixes AB#1244286 (or, possibly, works around it?)
The remote workspace caches solutions based on their checksum, and so LUT can sometimes see our LSP solutions if their text is the same as the workspace's, but it can't use them because LUT needs to do a build, and the build can't work without an encoding.
This unblocks LUT by setting an encoding. I think something should change in OOP to make this unnecessary, but that's a longer-term fix, and this one doesn't seem to cost anything.
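The core of the failure mode can be sketched like this (a hypothetical Python model, not Roslyn's `SourceText` API): a build ultimately has to produce bytes (e.g. to write files or compute debug checksums), and a bare string can't become bytes without some encoding being chosen. Setting one, as this PR does, is what lets the build proceed:

```python
class SourceText:
    """Hypothetical stand-in for a source-text snapshot with an optional encoding."""

    def __init__(self, text, encoding=None):
        self.text = text
        self.encoding = encoding

    def emit_bytes(self):
        # A build needs concrete bytes; without an encoding there is no
        # well-defined way to produce them, so the build fails.
        if self.encoding is None:
            raise ValueError("cannot emit source text without an encoding")
        return self.text.encode(self.encoding)

broken = SourceText("class C { }")           # what LUT saw: no encoding, build fails
fixed = SourceText("class C { }", "utf-8")   # this PR: explicitly set UTF-8
assert fixed.emit_bytes() == b"class C { }"
```

Under the checksum fallback discussed in the thread, `broken` and `fixed` carry the same text and would hash the same, which is exactly why the cache could hand LUT the encoding-less snapshot.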
FYI @shyamnamboodiripad @drognanar