Fix remote service invoker#11430
Conversation
The first time that a remote service call is made by RemoteServiceInvoker, it calls into Roslyn OOP to initialize itself. However, the CancellationToken that's passed for initialization is the same one that is passed by the original service caller. If that token is cancelled by the caller during initialization, RemoteServiceInvoker will get stuck and never actually make a remote service call because the initialization task will forever be in a cancelled state. The fix for this issue is to make RemoteServiceInvoker disposable, and add a CancellationTokenSource that is cancelled when the invoker itself is dispose. Then, pass that CancellationTokenSource's token to initialization.
1. Use AsyncLazy<RazorRemoteHostClient> for both the MessagePack and JSON remote clients. This means that RemoteServiceInvoker no longer acquires a RazorRemoteHostClient for every single service call, but shares the same client throughout its lifetime. 2. Call InitializeAsync before retrieving a client. This simplifies some awkward code. 3. Rework initialization for clarity and performance.
I had added this assert in a previous PR, but it turns out that it can be hit in perfectly normal scenarios during solution load.
| _isInitializedTask = remoteClient.TryInvokeAsync<IRemoteClientInitializationService>( | ||
| (s, ct) => s.InitializeAsync(initParams, ct), | ||
| cancellationToken); | ||
| _disposeTokenSource.Token); |
There was a problem hiding this comment.
🐞 Passing cancellationToken here (and below) is the bug. If the cancellationToken passed in by the caller is cancelled during this call, that leaves _initializedTask in a cancelled state. So, attempts to await the task just throw a TaskCanceledException and the remote service call never actually happens. Subsequent remote service calls note that initialization hasn't succeeded yet, so they try to initialize again and fail in the same way.
And on and on and on...
| var oopInitialized = _initializeOOPTask is { Status: TaskStatus.RanToCompletion }; | ||
| var lspInitialized = _initializeLspTask is { Status: TaskStatus.RanToCompletion }; |
There was a problem hiding this comment.
When these tasks are in the faulted state, they will permanently stay that way and this will effectively rethrow. Is that the intent here? Assume so but wanted to check.
There was a problem hiding this comment.
Yeah, I was thinking about that and decided I'd rather they just throw, at least for now. If we fail to initialize, that means that we couldn't get a remote client or the remote initialization call itself failed somehow (e.g. the process gets torn down or something like that).
I think we can make this more resilient and fail gracefully, but we need to change the TryInvokeAsync method above to do it. Currently, it just returns TResult?, so there's no way for a caller to tell the difference between an expected null result and a null-because-we're-in-bad-state result. So, instead of returning TResult?, I would propose returning Roslyn's Optional<TResult>, which is what RemoteClientHost.TryInvokeAsync is returning to us anyway. However, that's a much larger change to update all call sites. I'll file a bug for that and clean it up in a follow up.
| var remoteClient = await _lazyJsonClient | ||
| .GetValueAsync(_disposeTokenSource.Token) | ||
| .ConfigureAwait(false); |
There was a problem hiding this comment.
Happy to see remoteClient acquisition moving into the method vs. being passed in. Previously it was passed in but only conditionally used based which felt a bit odd. Now it's still conditionally used but the context is much clearer about when that occurs.
| { | ||
| return; | ||
| } | ||
| if (!lspInitialized && _clientCapabilitiesService.CanGetClientCapabilities) |
There was a problem hiding this comment.
I spent a lot of time looking at the relationship between CanGetClientCapabilities and GetClientCapabilities. Essentially, seeing if behavior in any of the implementations could break this code path. Particularly given that their appear to be threading concerns in this code (hence the lock) and that the property is explicitly mutable. Seemed like I could get things to break if I could find a place where we evere went from CanGetClientCapabilities being true to being false.
After digging through the implementations I can't see a case where this could occur. At the same though I was wondering if you considered just changing the interface to more simply:
internal interface IClientCapabilitiesService
{
VSInternalClientCapabilities? ClientCapabilities { get; }
}Basically let null be the "cannot provide" state? That is what all the implementations seem to do.
There was a problem hiding this comment.
You're correct. CanGetClientCapabilities can't really ever transition from true to false. (Unless the language server is torn down and started up again, but that's a whole other problem!)
Unfortunately, changing the interface to let null be the signal is a much more substantial change. There is a lot of code that touches that property that would need to be visited and updated (see below). Also, the vast majority of the code paths that touch the ClientCapabilities will only ever be called when the property is guaranteed not to be null by the LSP contract. E.g., the LSP client won't call them until it has called the "initialize" handler, which is what sets this value. So, while it's a bit more awkward and looks a little funny, we should probably leave CanGetClientCapabilities in place, at least for now.
|
It appears that the only integration test that failed is the same one that has been failing in all integration test runs since #11425 was merged. I won't block merging this PR on that particular test. cc @phil-allen-msft |
|
My test insertion shows a regression, but it's the same regression we're seeing in the latest Razor insertion from main. So, it doesn't appear to be related to this change. |
|
/backport to release/dev17.13 |
|
Started backporting to release/dev17.13: https://github.com/dotnet/razor/actions/runs/13445478219 |
|
@phil-allen-msft an error occurred while backporting to "release/dev17.13", please check the run log for details! Error: @phil-allen-msft is not a repo collaborator, backporting is not allowed. If you're a collaborator please make sure your dotnet team membership visibility is set to Public on https://github.com/orgs/dotnet/people?query=phil-allen-msft |
|
/backport to release/dev17.13 |
|
Started backporting to release/dev17.13: https://github.com/dotnet/razor/actions/runs/13445535977 |
|
@phil-allen-msft backporting to "release/dev17.13" failed, the patch most likely resulted in conflicts: $ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch
Applying: Fix issue where RemoteServiceInvoker never initializes
Applying: Simplify RemoteServiceInvoker implementation
Applying: Remove unnecessary Debug.Fail(...)
Using index info to reconstruct a base tree...
A src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/ProjectSystem/CompilationHelpers.cs
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/ProjectSystem/CompilationHelpers.cs deleted in HEAD and modified in Remove unnecessary Debug.Fail(...). Version Remove unnecessary Debug.Fail(...) of src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/ProjectSystem/CompilationHelpers.cs left in tree.
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0003 Remove unnecessary Debug.Fail(...)
Error: The process '/usr/bin/git' failed with exit code 128Please backport manually! |
Backport of #11430 to 17.13
Backport of #11430 to 17.13

This fixes a bug in
RemoteServiceInvokerthat can cause its initialization step to be cancelled incorrectly, leaving it in a bad state. In this state, tag helpers/components never compute in Roslyn OOP for the lifetime of Visual Studio.I recommend reviewing this change commit-by-commit. The description of the bug and the fix is in the first commit.
CI Build: https://dev.azure.com/dnceng/internal/_build/results?buildId=2631845&view=results
Test Insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/607258