Skip to content

Fix remote service invoker#11430

Merged
DustinCampbell merged 3 commits intodotnet:mainfrom
DustinCampbell:fix-remote-service-invoker
Jan 31, 2025
Merged

Fix remote service invoker#11430
DustinCampbell merged 3 commits intodotnet:mainfrom
DustinCampbell:fix-remote-service-invoker

Conversation

@DustinCampbell
Copy link
Copy Markdown
Member

@DustinCampbell DustinCampbell commented Jan 30, 2025

This fixes a bug in RemoteServiceInvoker that can cause its initialization step to be cancelled incorrectly, leaving it in a bad state. In this state, tag helpers/components never compute in Roslyn OOP for the lifetime of Visual Studio.

I recommend reviewing this change commit-by-commit. The description of the bug and the fix is in the first commit.

CI Build: https://dev.azure.com/dnceng/internal/_build/results?buildId=2631845&view=results
Test Insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/607258

The first time that a remote service call is made by RemoteServiceInvoker, it calls into Roslyn OOP to initialize itself. However, the CancellationToken that's passed for initialization is the same one that is passed by the original service caller. If that token is cancelled by the caller during initialization, RemoteServiceInvoker will get stuck and never actually make a remote service call because the initialization task will forever be in a cancelled state.

The fix for this issue is to make RemoteServiceInvoker disposable, and add a CancellationTokenSource that is cancelled when the invoker itself is dispose. Then, pass that CancellationTokenSource's token to initialization.
1. Use AsyncLazy<RazorRemoteHostClient> for both the MessagePack and JSON remote clients. This means that RemoteServiceInvoker no longer acquires a RazorRemoteHostClient for every single service call, but shares the same client throughout its lifetime.
2. Call InitializeAsync before retrieving a client. This simplifies some awkward code.
3. Rework initialization for clarity and performance.
I had added this assert in a previous PR, but it turns out that it can be hit in perfectly normal scenarios during solution load.
@DustinCampbell DustinCampbell requested a review from a team as a code owner January 30, 2025 23:10
_isInitializedTask = remoteClient.TryInvokeAsync<IRemoteClientInitializationService>(
(s, ct) => s.InitializeAsync(initParams, ct),
cancellationToken);
_disposeTokenSource.Token);
Copy link
Copy Markdown
Member Author

@DustinCampbell DustinCampbell Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐞 Passing cancellationToken here (and below) is the bug. If the cancellationToken passed in by the caller is cancelled during this call, that leaves _initializedTask in a cancelled state. So, attempts to await the task just throw a TaskCanceledException and the remote service call never actually happens. Subsequent remote service calls note that initialization hasn't succeeded yet, so they try to initialize again and fail in the same way.

And on and on and on...

Copy link
Copy Markdown
Member

@davidwengier davidwengier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you <3

Comment on lines +130 to +131
var oopInitialized = _initializeOOPTask is { Status: TaskStatus.RanToCompletion };
var lspInitialized = _initializeLspTask is { Status: TaskStatus.RanToCompletion };
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When these tasks are in the faulted state, they will permanently stay that way and this will effectively rethrow. Is that the intent here? Assume so but wanted to check.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about that and decided I'd rather they just throw, at least for now. If we fail to initialize, that means that we couldn't get a remote client or the remote initialization call itself failed somehow (e.g. the process gets torn down or something like that).

I think we can make this more resilient and fail gracefully, but we need to change the TryInvokeAsync method above to do it. Currently, it just returns TResult?, so there's no way for a caller to tell the difference between an expected null result and a null-because-we're-in-bad-state result. So, instead of returning TResult?, I would propose returning Roslyn's Optional<TResult>, which is what RemoteClientHost.TryInvokeAsync is returning to us anyway. However, that's a much larger change to update all call sites. I'll file a bug for that and clean it up in a follow up.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +142 to +144
var remoteClient = await _lazyJsonClient
.GetValueAsync(_disposeTokenSource.Token)
.ConfigureAwait(false);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see remoteClient acquisition moving into the method vs. being passed in. Previously it was passed in but only conditionally used based which felt a bit odd. Now it's still conditionally used but the context is much clearer about when that occurs.

{
return;
}
if (!lspInitialized && _clientCapabilitiesService.CanGetClientCapabilities)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a lot of time looking at the relationship between CanGetClientCapabilities and GetClientCapabilities. Essentially, seeing if behavior in any of the implementations could break this code path. Particularly given that their appear to be threading concerns in this code (hence the lock) and that the property is explicitly mutable. Seemed like I could get things to break if I could find a place where we evere went from CanGetClientCapabilities being true to being false.

After digging through the implementations I can't see a case where this could occur. At the same though I was wondering if you considered just changing the interface to more simply:

internal interface IClientCapabilitiesService
{
    VSInternalClientCapabilities? ClientCapabilities { get; }
}

Basically let null be the "cannot provide" state? That is what all the implementations seem to do.

Copy link
Copy Markdown
Member Author

@DustinCampbell DustinCampbell Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct. CanGetClientCapabilities can't really ever transition from true to false. (Unless the language server is torn down and started up again, but that's a whole other problem!)

Unfortunately, changing the interface to let null be the signal is a much more substantial change. There is a lot of code that touches that property that would need to be visited and updated (see below). Also, the vast majority of the code paths that touch the ClientCapabilities will only ever be called when the property is guaranteed not to be null by the LSP contract. E.g., the LSP client won't call them until it has called the "initialize" handler, which is what sets this value. So, while it's a bit more awkward and looks a little funny, we should probably leave CanGetClientCapabilities in place, at least for now.

image

@DustinCampbell
Copy link
Copy Markdown
Member Author

DustinCampbell commented Jan 31, 2025

It appears that the only integration test that failed is the same one that has been failing in all integration test runs since #11425 was merged. I won't block merging this PR on that particular test. cc @phil-allen-msft

@DustinCampbell
Copy link
Copy Markdown
Member Author

My test insertion shows a regression, but it's the same regression we're seeing in the latest Razor insertion from main. So, it doesn't appear to be related to this change.

@phil-allen-msft
Copy link
Copy Markdown
Member

/backport to release/dev17.13

@github-actions
Copy link
Copy Markdown
Contributor

Started backporting to release/dev17.13: https://github.com/dotnet/razor/actions/runs/13445478219

@github-actions
Copy link
Copy Markdown
Contributor

@phil-allen-msft an error occurred while backporting to "release/dev17.13", please check the run log for details!

Error: @phil-allen-msft is not a repo collaborator, backporting is not allowed. If you're a collaborator please make sure your dotnet team membership visibility is set to Public on https://github.com/orgs/dotnet/people?query=phil-allen-msft

@phil-allen-msft
Copy link
Copy Markdown
Member

/backport to release/dev17.13

@github-actions
Copy link
Copy Markdown
Contributor

Started backporting to release/dev17.13: https://github.com/dotnet/razor/actions/runs/13445535977

@github-actions
Copy link
Copy Markdown
Contributor

@phil-allen-msft backporting to "release/dev17.13" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Fix issue where RemoteServiceInvoker never initializes
Applying: Simplify RemoteServiceInvoker implementation
Applying: Remove unnecessary Debug.Fail(...)
Using index info to reconstruct a base tree...
A	src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/ProjectSystem/CompilationHelpers.cs
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/ProjectSystem/CompilationHelpers.cs deleted in HEAD and modified in Remove unnecessary Debug.Fail(...). Version Remove unnecessary Debug.Fail(...) of src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/ProjectSystem/CompilationHelpers.cs left in tree.
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0003 Remove unnecessary Debug.Fail(...)
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants