Skip to content

Conversation

@zlatanov
Copy link
Contributor

When a websocket with compression enabled is created we have two additional zlib streams inside - one for sending and one for receiving. They should only be accessed through Read and Write methods of the websocket.

However, since they contain unmanaged resources (zlib), when the websocket is disposed, those objects will be disposed as well. But Dispose and respectfully Abort methods are allowed to be called at any time for the websocket. This would cause all kinds of troubles if there are inflight send or receive operations which use the zlib streams.

I wrote a test that checks the implementation works for the receive case where I can use the test websocket stream and stimulate artificial delays. Without the current implementation the test will fail.

However for the send case it's very hard to do this, because the websocket needs to be disposed exactly when there is a message being compressed.

Fixes #86960

//cc @CarnaViire @stephentoub

@ghost ghost added area-System.Net community-contribution Indicates that the PR has been added by a community member labels Jun 23, 2023
@ghost
Copy link

ghost commented Jun 23, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

When a websocket with compression enabled is created we have two additional zlib streams inside - one for sending and one for receiving. They should only be accessed through Read and Write methods of the websocket.

However, since they contain unmanaged resources (zlib), when the websocket is disposed, those objects will be disposed as well. But Dispose and respectfully Abort methods are allowed to be called at any time for the websocket. This would cause all kinds of troubles if there are inflight send or receive operations which use the zlib streams.

I wrote a test that checks the implementation works for the receive case where I can use the test websocket stream and stimulate artificial delays. Without the current implementation the test will fail.

However for the send case it's very hard to do this, because the websocket needs to be disposed exactly when there is a message being compressed.

Fixes #86960

//cc @CarnaViire @stephentoub

Author: zlatanov
Assignees: -
Labels:

area-System.Net

Milestone: -

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub could you please take a look at the dispose pattern here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will end up queueing the clean up of resources such that it could end up happening after Dispose completes. We typically try hard to avoid that, as it means the consumer calling Dispose can't trust that everything has quiesced.

This change is to handle an erroneous case, right? The ManagedWebSocket is being disposed of while there are outstanding operations? And the actual resources involved here are all guarded at some level by SafeHandles, yes? Assuming that's all the case, I'd be inclined to say that Dispose should Dispose of the resources if it's safe to do (e.g. if it can immediately acquire the mutex), but otherwise it should just avoid disposing the resources and let finalization handle it in this erroneous case.

Copy link
Contributor Author

@zlatanov zlatanov Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with what you said, however there are cases when calling Dispose is not an erroneous case. Such is an Abort() call triggered for example by a cancellation token backed by a timeout. We could have different implementation for Abort and Dispose I guess, but that will only complicate things.

Another thing is that even if we do what you say, we still need to release the mutex with an asynchronous continuation. If we don't do this, we risk some user code never completing. Here is a simple example of how that might happen:

// User code calls send twice, concurrently in different threads
// Thread 1:
await socket.SendAsync();

// Thread 2:
await socket.SendAsync();

// Thread 1 takes the lock and starts the send operation.
// Before Thread 2 is queued in the AsyncMutex, 
// an Abort() or Dispose() happens in another thread which triggers the new code which 
// tries to acquire the lock and fails and doesn't release the resources, but also doesn't release the lock.

// Thread 2 is queued to wait

In this case since, because we don't release the AsyncMutex, Thread 2 code will never complete.

This also can be avoided, if we implement TryLock method on the AsyncMutex so we can have a fully synchronous path for lock / unlock.

If you decide we should go down that path, I would advise that we implement the disposal of the unmanaged resources in the Send / Receive methods when they exit - we can check if _disposed is true. The reason for this is that usually the websocket without compression takes about 500 bytes of memory. Using compression this increases up ot 500KB depending on the window size. I am sure that there is user code out there that doesn't release strong references to websockets early, this would incur a tremendous penalty on the memory used by an application. I am even sure that there is code out there leaking websockets, and 500 bytes is not very much but 500KB is.

This can even be DDOS'ed by having many websockets connect to an endpoint and have then behave in a such a way to trigger Aborts by the server side.

My point being is that this implementation, in the end, will have the same net result as the current code, but the current code is far simpler to understand.

Copy link
Member

@stephentoub stephentoub Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also can be avoided, if we implement TryLock method on the AsyncMutex so we can have a fully synchronous path for lock / unlock.

Right, that's what I had in mind.

But taking a step back, I'm not clear on how we end up in this situation in the first place. Disposing of anything shouldn't lead to AVs... that's one of the main purposes of SafeHandle, where it defers releasing the underlying handle while a resource is still in use. Are we missing DangerousAddRef/Release calls around unsafe use of the underlying handle?

Copy link
Contributor Author

@zlatanov zlatanov Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is not the SafeHandle, it's that it is being disposed while being used by another thread to inflate / deflate.

Edit: Actually, let me take a look, I misunderstood what you meant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub You were correct. I've pushed a new implementation which uses DangerousAddRef/Release to tackle the issue. Let me know what you think.

@zlatanov
Copy link
Contributor Author

@stephentoub did you have time to look at the changes?

@karelz karelz added this to the 8.0.0 milestone Jul 3, 2023
@stephentoub
Copy link
Member

I took a peek. I'm a little disheartened, not by your change, but by how SafeHandle is being used today in System.IO.Compression. It's not really being used how it's intended to be used. It's not storing a pointer to the data that's actually being used by the native code, so the SafeHandle isn't itself being passed through the LibraryImports, which means it's not being ref counted via DangerousAdd/Release the way it's meant to be as part of P/Invokes. As such, using DangerousAdd/Release here isn't necessarily achieving the goal.

We might want to revert "temporarily" to your previous workaround, and then separately overhaul how SafeHandle is used by all of this code.

…uld corrupt state if there is pending send or receive operation and compression is enabled.
@zlatanov zlatanov force-pushed the websocket-dispose-thread-safety branch from 794f671 to e861917 Compare July 9, 2023 09:21
@zlatanov zlatanov requested a review from stephentoub July 9, 2023 09:22

if (lockTask.IsCompleted)
{
resource.Dispose();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we double-check for null here and below?

Suggested change
resource.Dispose();
resource?.Dispose();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not needed, because the resource can never become null. We only need to check once at the beginning of the method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, that's what you removed in the first place, right? once assigned, it will now stay assigned

Copy link
Member

@CarnaViire CarnaViire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@CarnaViire CarnaViire merged commit 21f07e1 into dotnet:main Jul 18, 2023
@zlatanov zlatanov deleted the websocket-dispose-thread-safety branch July 24, 2023 10:28
@ghost ghost locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Net community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Frequent WebSocket Compression Segfault

4 participants