-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Description
Services should be resilient to unexpected exceptions and unexpected exceptions coming from IPC should not be any different than unexpected exceptions coming from other code. Services should handle them and recover where possible.
The client should indicate failure in the UI. The UI should indicate if the error is likely to be persistent or intermittent. An example of a persistent error is failure to calculate the result on the server due to a bug (unexpected exception) in the feature implementation. Attempting to recalculate with the same inputs will produce the same error again. The server may cache this failure. An intermittent error would be e.g. a network error. Results produced in presence of these errors should not be cached. The user may try to invoke the feature again if the error seems intermittent. The server may still produce a partial result with an indication of an error.
Currently unexpected exceptions thrown on the server side propagate to the client side, where a SoftCrashException is thrown (pretending the error is a cancellation), an info-bar is displayed suggesting the user to restart VS due to possible corruption.
Proposal
Implement a workspace service that provides a shutdown cancellation token. In VS this would be the disposal token of a Roslyn package.
We will require that remote calls do not throw expected (“by design”) exceptions. Expected failures (e.g. can’t find a symbol, etc.) need to be returned as a regular result of the service (e.g. a diagnostic, error code, success bool, etc.).
There are essentially 3 kinds of “unexpected” remote invocation failures (i.e. the call does not return a value). In all these cases the RemoteHostClient.TryRunRemote will return false (or default(Optional<T>)):
-
IO/network error
This shouldn't happen right now (or very rarely) since we are using named pipes on the same machine. In future the service might be on a different machine. A gold bar will be displayed byRemoteHostClient.TryRunRemote: “Feature xyz is currently unavailable due to network issues”.
We need to change our FatalError filters to let network exceptions (IOExceptions in general?) thru without reporting NFW since they are not unexpected exceptions anymore. -
Unexpected exception thrown by the service implementation due to a bug.
The service itself may recover from the exception. It should report NFW and make sure the error is reported to the user in some way. E.g. a diagnostic is included in the result of the service and then surfaced on the client. The service may also cache the error if it’d hit it again next time it’s invoked with the same arguments. If an unexpected exception leaks from a service implementation all the way to the server side of the remote call, the NFW is reported there and the exception is propagated to the client (already implemented in ServiceBase.RunServiceAsync). This failure surfaces on the client side as RemoteInvocationException. The remote invocation will catch this exception. A gold bar will be displayed byRemoteHostClient.TryRunRemote: “Feature xyz is currently unavailable due to an internal error [Details]”. -
Shutdown cancellation.
Ideally all entry points to all our features use the shutdown cancellation token or a token that’s linked with it. If a feature did not link shutdown cancellation token into its token the RPC invocation might throw ObjectDisposedException as it gets disposed when disconnected. TryRunRemote will catch this exception and check if the shutdown cancellation token is signaled. If it is it will not display any gold bar and just return false. If shutdown has not started then gold bar is displayed “Feature xyz is currently unavailable” (this may happen when our ServiceHub process is terminated for some reason – e.g. stack overflow, or the process is killed). In future we might automatically restart the ServiceHub process.
This assumes that each caller of a remote service can recover from a failure (indicated by TryRunRemote returning false) at the point it makes the remote call by doing “nothing”. For example, returning empty results of find references, etc. If, for some reason, it can't recover then it can throw some custom exception (or propagate the failure in some other way) to unwind to a frame where it can recover.
Remote callbacks
A callback implementation should report NFW on unexpected exceptions and propagate them. Reporting NFW captures state of the client at the time of the failure. Propagation of the exception results in RemoteInvocationException thrown on the server side (that invoked the callback mehtod). The service implementation reports NFW in ServiceBase.RunServiceAsync or in an earlier filter because RemoteInvocationException is unexpected. It propagates the exception back to the client where gold bar is displayed (scenario #2 above). The gold bar reporting needs to handle nested RemoteInvocationExceptions.
Serialization issues that occur during callback invocation will manifest as any other unexpected exception on the server side and get reported by NFW filters.
When a client drops connection during callback invocation the service should cancel the work on the request since the client has disconnected. NFW should not be reported for these issues. The assumption is that the connection is also dropped in the opposite direction (from client to the service), so there is no way (and no point) to tell the client about the issue.