Fixing a concurrency bug in ManagedWebSocket which may cause memory corruption #87966

zlatanov · 2023-06-23T13:55:39Z

When a websocket with compression enabled is created we have two additional zlib streams inside - one for sending and one for receiving. They should only be accessed through Read and Write methods of the websocket.

However, since they contain unmanaged resources (zlib), when the websocket is disposed, those objects will be disposed as well. But Dispose and respectfully Abort methods are allowed to be called at any time for the websocket. This would cause all kinds of troubles if there are inflight send or receive operations which use the zlib streams.

I wrote a test that checks the implementation works for the receive case where I can use the test websocket stream and stimulate artificial delays. Without the current implementation the test will fail.

However for the send case it's very hard to do this, because the websocket needs to be disposed exactly when there is a message being compressed.

Fixes #86960

//cc @CarnaViire @stephentoub

ghost · 2023-06-23T13:55:49Z

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

When a websocket with compression enabled is created we have two additional zlib streams inside - one for sending and one for receiving. They should only be accessed through Read and Write methods of the websocket.

However, since they contain unmanaged resources (zlib), when the websocket is disposed, those objects will be disposed as well. But Dispose and respectfully Abort methods are allowed to be called at any time for the websocket. This would cause all kinds of troubles if there are inflight send or receive operations which use the zlib streams.

I wrote a test that checks the implementation works for the receive case where I can use the test websocket stream and stimulate artificial delays. Without the current implementation the test will fail.

However for the send case it's very hard to do this, because the websocket needs to be disposed exactly when there is a message being compressed.

Fixes #86960

//cc @CarnaViire @stephentoub

Author:	zlatanov
Assignees:	-
Labels:	`area-System.Net`
Milestone:	-

CarnaViire · 2023-06-23T16:17:58Z

src/libraries/System.Net.WebSockets/src/System/Net/WebSockets/ManagedWebSocket.cs

@stephentoub could you please take a look at the dispose pattern here?

stephentoub · 2023-06-24T02:33:47Z

src/libraries/System.Net.WebSockets/src/System/Net/WebSockets/ManagedWebSocket.cs

This will end up queueing the clean up of resources such that it could end up happening after Dispose completes. We typically try hard to avoid that, as it means the consumer calling Dispose can't trust that everything has quiesced.

This change is to handle an erroneous case, right? The ManagedWebSocket is being disposed of while there are outstanding operations? And the actual resources involved here are all guarded at some level by SafeHandles, yes? Assuming that's all the case, I'd be inclined to say that Dispose should Dispose of the resources if it's safe to do (e.g. if it can immediately acquire the mutex), but otherwise it should just avoid disposing the resources and let finalization handle it in this erroneous case.

I agree with what you said, however there are cases when calling Dispose is not an erroneous case. Such is an Abort() call triggered for example by a cancellation token backed by a timeout. We could have different implementation for Abort and Dispose I guess, but that will only complicate things.

Another thing is that even if we do what you say, we still need to release the mutex with an asynchronous continuation. If we don't do this, we risk some user code never completing. Here is a simple example of how that might happen:

// User code calls send twice, concurrently in different threads // Thread 1: await socket.SendAsync(); // Thread 2: await socket.SendAsync(); // Thread 1 takes the lock and starts the send operation. // Before Thread 2 is queued in the AsyncMutex, // an Abort() or Dispose() happens in another thread which triggers the new code which // tries to acquire the lock and fails and doesn't release the resources, but also doesn't release the lock. // Thread 2 is queued to wait

In this case since, because we don't release the AsyncMutex, Thread 2 code will never complete.

This also can be avoided, if we implement TryLock method on the AsyncMutex so we can have a fully synchronous path for lock / unlock.

If you decide we should go down that path, I would advise that we implement the disposal of the unmanaged resources in the Send / Receive methods when they exit - we can check if _disposed is true. The reason for this is that usually the websocket without compression takes about 500 bytes of memory. Using compression this increases up ot 500KB depending on the window size. I am sure that there is user code out there that doesn't release strong references to websockets early, this would incur a tremendous penalty on the memory used by an application. I am even sure that there is code out there leaking websockets, and 500 bytes is not very much but 500KB is.

This can even be DDOS'ed by having many websockets connect to an endpoint and have then behave in a such a way to trigger Aborts by the server side.

My point being is that this implementation, in the end, will have the same net result as the current code, but the current code is far simpler to understand.

This also can be avoided, if we implement TryLock method on the AsyncMutex so we can have a fully synchronous path for lock / unlock.

Right, that's what I had in mind.

But taking a step back, I'm not clear on how we end up in this situation in the first place. Disposing of anything shouldn't lead to AVs... that's one of the main purposes of SafeHandle, where it defers releasing the underlying handle while a resource is still in use. Are we missing DangerousAddRef/Release calls around unsafe use of the underlying handle?

The issue here is not the SafeHandle, it's that it is being disposed while being used by another thread to inflate / deflate.

Edit: Actually, let me take a look, I misunderstood what you meant.

@stephentoub You were correct. I've pushed a new implementation which uses DangerousAddRef/Release to tackle the issue. Let me know what you think.

zlatanov · 2023-06-27T13:31:37Z

@stephentoub did you have time to look at the changes?

stephentoub · 2023-07-05T02:53:01Z

I took a peek. I'm a little disheartened, not by your change, but by how SafeHandle is being used today in System.IO.Compression. It's not really being used how it's intended to be used. It's not storing a pointer to the data that's actually being used by the native code, so the SafeHandle isn't itself being passed through the LibraryImports, which means it's not being ref counted via DangerousAdd/Release the way it's meant to be as part of P/Invokes. As such, using DangerousAdd/Release here isn't necessarily achieving the goal.

We might want to revert "temporarily" to your previous workaround, and then separately overhaul how SafeHandle is used by all of this code.

…uld corrupt state if there is pending send or receive operation and compression is enabled.

CarnaViire · 2023-07-18T17:29:35Z

src/libraries/System.Net.WebSockets/src/System/Net/WebSockets/ManagedWebSocket.cs

+
+                if (lockTask.IsCompleted)
+                {
+                    resource.Dispose();


Should we double-check for null here and below?

Suggested change

resource.Dispose();

resource?.Dispose();

It is not needed, because the resource can never become null. We only need to check once at the beginning of the method.

Ah, I see, that's what you removed in the first place, right? once assigned, it will now stay assigned

CarnaViire

LGTM, thanks

ghost added area-System.Net community-contribution Indicates that the PR has been added by a community member labels Jun 23, 2023

CarnaViire reviewed Jun 23, 2023

View reviewed changes

stephentoub reviewed Jun 24, 2023

View reviewed changes

karelz added this to the 8.0.0 milestone Jul 3, 2023

Fixing concurrency issue in ManagedWebSocket where calling Dispose co…

e861917

…uld corrupt state if there is pending send or receive operation and compression is enabled.

zlatanov force-pushed the websocket-dispose-thread-safety branch from 794f671 to e861917 Compare July 9, 2023 09:21

zlatanov requested a review from stephentoub July 9, 2023 09:22

karelz assigned CarnaViire Jul 18, 2023

CarnaViire reviewed Jul 18, 2023

View reviewed changes

CarnaViire approved these changes Jul 18, 2023

View reviewed changes

CarnaViire merged commit 21f07e1 into dotnet:main Jul 18, 2023

zlatanov deleted the websocket-dispose-thread-safety branch July 24, 2023 10:28

CarnaViire mentioned this pull request Jul 25, 2023

Fix SafeHandle usage in System.IO.Compression #89445

Closed

ghost locked as resolved and limited conversation to collaborators Aug 23, 2023

Fixing a concurrency bug in ManagedWebSocket which may cause memory corruption #87966

Fixing a concurrency bug in ManagedWebSocket which may cause memory corruption #87966

Uh oh!

Conversation

zlatanov commented Jun 23, 2023

Uh oh!

ghost commented Jun 23, 2023

Uh oh!

CarnaViire Jun 23, 2023

Choose a reason for hiding this comment

Uh oh!

stephentoub Jun 24, 2023

Choose a reason for hiding this comment

Uh oh!

zlatanov Jun 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stephentoub Jun 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zlatanov Jun 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zlatanov Jun 24, 2023

Choose a reason for hiding this comment

Uh oh!

zlatanov commented Jun 27, 2023

Uh oh!

stephentoub commented Jul 5, 2023

Uh oh!

CarnaViire Jul 18, 2023

Choose a reason for hiding this comment

Uh oh!

zlatanov Jul 18, 2023

Choose a reason for hiding this comment

Uh oh!

CarnaViire Jul 18, 2023

Choose a reason for hiding this comment

Uh oh!

CarnaViire left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zlatanov Jun 24, 2023 •

edited

Loading

stephentoub Jun 24, 2023 •

edited

Loading

zlatanov Jun 24, 2023 •

edited

Loading