-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Fixing a concurrency bug in ManagedWebSocket which may cause memory corruption #87966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing a concurrency bug in ManagedWebSocket which may cause memory corruption #87966
Conversation
|
Tagging subscribers to this area: @dotnet/ncl Issue DetailsWhen a websocket with compression enabled is created we have two additional zlib streams inside - one for sending and one for receiving. They should only be accessed through Read and Write methods of the websocket. However, since they contain unmanaged resources (zlib), when the websocket is disposed, those objects will be disposed as well. But Dispose and respectfully Abort methods are allowed to be called at any time for the websocket. This would cause all kinds of troubles if there are inflight send or receive operations which use the zlib streams. I wrote a test that checks the implementation works for the receive case where I can use the test websocket stream and stimulate artificial delays. Without the current implementation the test will fail. However for the send case it's very hard to do this, because the websocket needs to be disposed exactly when there is a message being compressed. Fixes #86960 //cc @CarnaViire @stephentoub
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub could you please take a look at the dispose pattern here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will end up queueing the clean up of resources such that it could end up happening after Dispose completes. We typically try hard to avoid that, as it means the consumer calling Dispose can't trust that everything has quiesced.
This change is to handle an erroneous case, right? The ManagedWebSocket is being disposed of while there are outstanding operations? And the actual resources involved here are all guarded at some level by SafeHandles, yes? Assuming that's all the case, I'd be inclined to say that Dispose should Dispose of the resources if it's safe to do (e.g. if it can immediately acquire the mutex), but otherwise it should just avoid disposing the resources and let finalization handle it in this erroneous case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with what you said, however there are cases when calling Dispose is not an erroneous case. Such is an Abort() call triggered for example by a cancellation token backed by a timeout. We could have different implementation for Abort and Dispose I guess, but that will only complicate things.
Another thing is that even if we do what you say, we still need to release the mutex with an asynchronous continuation. If we don't do this, we risk some user code never completing. Here is a simple example of how that might happen:
// User code calls send twice, concurrently in different threads
// Thread 1:
await socket.SendAsync();
// Thread 2:
await socket.SendAsync();
// Thread 1 takes the lock and starts the send operation.
// Before Thread 2 is queued in the AsyncMutex,
// an Abort() or Dispose() happens in another thread which triggers the new code which
// tries to acquire the lock and fails and doesn't release the resources, but also doesn't release the lock.
// Thread 2 is queued to waitIn this case since, because we don't release the AsyncMutex, Thread 2 code will never complete.
This also can be avoided, if we implement TryLock method on the AsyncMutex so we can have a fully synchronous path for lock / unlock.
If you decide we should go down that path, I would advise that we implement the disposal of the unmanaged resources in the Send / Receive methods when they exit - we can check if _disposed is true. The reason for this is that usually the websocket without compression takes about 500 bytes of memory. Using compression this increases up ot 500KB depending on the window size. I am sure that there is user code out there that doesn't release strong references to websockets early, this would incur a tremendous penalty on the memory used by an application. I am even sure that there is code out there leaking websockets, and 500 bytes is not very much but 500KB is.
This can even be DDOS'ed by having many websockets connect to an endpoint and have then behave in a such a way to trigger Aborts by the server side.
My point being is that this implementation, in the end, will have the same net result as the current code, but the current code is far simpler to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also can be avoided, if we implement TryLock method on the AsyncMutex so we can have a fully synchronous path for lock / unlock.
Right, that's what I had in mind.
But taking a step back, I'm not clear on how we end up in this situation in the first place. Disposing of anything shouldn't lead to AVs... that's one of the main purposes of SafeHandle, where it defers releasing the underlying handle while a resource is still in use. Are we missing DangerousAddRef/Release calls around unsafe use of the underlying handle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is not the SafeHandle, it's that it is being disposed while being used by another thread to inflate / deflate.
Edit: Actually, let me take a look, I misunderstood what you meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephentoub You were correct. I've pushed a new implementation which uses DangerousAddRef/Release to tackle the issue. Let me know what you think.
|
@stephentoub did you have time to look at the changes? |
|
I took a peek. I'm a little disheartened, not by your change, but by how SafeHandle is being used today in System.IO.Compression. It's not really being used how it's intended to be used. It's not storing a pointer to the data that's actually being used by the native code, so the SafeHandle isn't itself being passed through the LibraryImports, which means it's not being ref counted via DangerousAdd/Release the way it's meant to be as part of P/Invokes. As such, using DangerousAdd/Release here isn't necessarily achieving the goal. We might want to revert "temporarily" to your previous workaround, and then separately overhaul how SafeHandle is used by all of this code. |
…uld corrupt state if there is pending send or receive operation and compression is enabled.
794f671 to
e861917
Compare
|
|
||
| if (lockTask.IsCompleted) | ||
| { | ||
| resource.Dispose(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we double-check for null here and below?
| resource.Dispose(); | |
| resource?.Dispose(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed, because the resource can never become null. We only need to check once at the beginning of the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, that's what you removed in the first place, right? once assigned, it will now stay assigned
CarnaViire
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
When a websocket with compression enabled is created we have two additional zlib streams inside - one for sending and one for receiving. They should only be accessed through Read and Write methods of the websocket.
However, since they contain unmanaged resources (zlib), when the websocket is disposed, those objects will be disposed as well. But Dispose and respectfully Abort methods are allowed to be called at any time for the websocket. This would cause all kinds of troubles if there are inflight send or receive operations which use the zlib streams.
I wrote a test that checks the implementation works for the receive case where I can use the test websocket stream and stimulate artificial delays. Without the current implementation the test will fail.
However for the send case it's very hard to do this, because the websocket needs to be disposed exactly when there is a message being compressed.
Fixes #86960
//cc @CarnaViire @stephentoub